China's supernode boom sends warning to Nvidia

Since Huawei debuted the CloudMatrix 384 (CM384) in mid-2025, China's supernode race has taken centre stage in the global AI infrastructure arena. According to SemiAnalysis, Chinese AI supernodes have surpassed Nvidia in several integration metrics, cementing the country's fast-rising influence in high-performance compute architecture.
Analysts are calling 2025 the year China's supernode ecosystem truly took off. Alibaba followed Huawei with its PanJiu AI Infra 2.0, a 128-chip cluster delivering four times the compute density of CM384. Soon after, Sugon (Dawning Information Industry) launched scaleX640, the world's first cabinet-level system housing 640 accelerator cards, claiming a twentyfold leap in aggregated computing power over Huawei's design.
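Those multiples line up with simple per-cabinet card counts (the CM384 houses 32 NPUs per cabinet, as noted later in this article). A quick back-of-envelope check, treating cards per cabinet as a rough proxy for compute density:

```python
# Back-of-envelope check of the density multiples quoted above.
# Per-cabinet card counts come from this article; using cards per
# cabinet as a proxy for compute density is a simplifying assumption.
cards_per_cabinet = {
    "Huawei CM384": 32,       # 32 Ascend 910C NPUs per cabinet
    "Alibaba PanJiu": 128,    # 128 accelerator cards per cabinet
    "Sugon scaleX640": 640,   # 640 accelerator cards per cabinet
}

baseline = cards_per_cabinet["Huawei CM384"]
for system, cards in cards_per_cabinet.items():
    print(f"{system}: {cards} cards/cabinet -> {cards / baseline:.0f}x CM384")

# Huawei CM384: 32 cards/cabinet -> 1x CM384
# Alibaba PanJiu: 128 cards/cabinet -> 4x CM384
# Sugon scaleX640: 640 cards/cabinet -> 20x CM384
```

The 4x and 20x figures claimed by Alibaba and Sugon fall straight out of the card counts.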
Despite lagging behind Western GPUs in single-card performance and facing tightening US curbs on Nvidia's shipments, China's AI giants are advancing through cluster-level innovation. By building high-bandwidth, low-latency supernode systems, they are rewriting the economics of large-scale model training and token generation. As Washington debates a potential ban on Nvidia's Blackwell chips, Jensen Huang now faces a sobering reality: China's AI infrastructure is evolving faster than expected.
The shift to supernodes in China's AI compute stack
China's new strategy is unmistakable: AI leadership depends not on a single GPU but on supernode-level integration. These architectures are becoming the defining compute units of the AI era, testing innovation across every layer of the stack, from chips and memory to networking, power, and cooling.
At the World Internet Conference (Wuzhen Summit) in early November, Sugon's scaleX640 debuted alongside Huawei's CM384 and Alibaba's PanJiu 128, signalling the start of a full-scale supernode boom. Major firms, including ZTE, Inspur, and H3C, are now joining the race, driving compute density and ecosystem compatibility to record highs.
Industry insiders say top cloud service providers (CSPs) are working closely with server OEMs to customise cabinet-level systems. Most still centre on Nvidia architectures, but Chinese GPU makers are building their own alternatives. The mainstream lineup now includes Tencent's ETH-X, Nvidia's NVL72, Huawei's CM384, and Alibaba's PanJiu, all either commercially deployed or in testing.
Internet giants and GPU ecosystem diversification
ByteDance is developing its own large-model Ethernet solution built on Broadcom's Tomahawk switch silicon, with hardware designed by Taiwan's ASRock. The setup takes a different route from the NVLink clusters used by Nvidia and from Huawei's proprietary interconnect, marking a clear diversification in AI network infrastructure.
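The Ethernet route has concrete scaling implications. In a non-blocking two-tier leaf-spine fabric, switches with r ports can connect up to r²/2 endpoints at full bisection bandwidth. The sketch below assumes Tomahawk 5-class silicon (64 ports at 800 GbE); the article does not say which Tomahawk generation ByteDance is using, so treat the numbers as illustrative:

```python
# Illustrative sizing of a two-tier leaf-spine Ethernet fabric.
# Assumes Tomahawk 5-class switches (64 x 800G ports); the article
# does not specify the generation ByteDance has adopted.
ports = 64        # ports per switch (assumed)
port_gbps = 800   # bandwidth per port in Gb/s (assumed)

# Non-blocking leaf-spine: each leaf splits its ports evenly
# between endpoints (down) and spine switches (up).
hosts_per_leaf = ports // 2
max_leaves = ports                       # bounded by spine radix
max_hosts = hosts_per_leaf * max_leaves  # = ports**2 / 2

bisection_tbps = max_hosts * port_gbps / 1000
print(f"max endpoints: {max_hosts}")                      # 2048
print(f"bisection bandwidth: {bisection_tbps:.0f} Tb/s")  # 1638 Tb/s
```

At that scale a single two-tier fabric already covers thousands of accelerators, which is one reason commodity Ethernet is attractive for supernode-scale clusters despite NVLink's per-link bandwidth advantage.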
Tencent's ETH-X uses Enflame's S60 and L600 accelerators, while ByteDance integrates custom ASICs and systems optimised for Iluvatar Corex, MetaX, and Moore Threads GPUs. Each targets distinct workloads, balancing inference throughput and training efficiency across local ecosystems.
Broadly, China's internet giants are diverging in architectural focus. Huawei's CM384 is geared toward model training, Tencent's ETH-X toward inference, and ByteDance's systems toward high-performance computing (HPC). Server manufacturers supporting these projects must combine expertise in both switching and compute design while keeping close ties with Broadcom, Nvidia, and domestic GPU developers.
A supernode arms race in full swing
Huawei kicked off the supernode era with the CM384, which integrates 32 Ascend 910C NPUs per cabinet across 12 racks, 384 NPUs in total, and was once the largest high-speed interconnect cluster in the industry. The system underlined Huawei's leadership in combining communication and computation, igniting a new phase in China's AI hardware race.
Alibaba's PanJiu AI Infra 2.0 redefined the category with 128 accelerator cards per cabinet, quadrupling Huawei's compute density. Powered by Alibaba's in-house CIPU 2.0 and EIC/MOC networking, PanJiu's open architecture reaches Pb/s-level bandwidth and sub-microsecond latency. At equal compute power, inference performance improves by 50%, and Alibaba's Qwen models gain a threefold training speed-up from tight software-hardware co-design.
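To put "Pb/s-level bandwidth" in perspective for a 128-card cabinet, the calculation below assumes a round 1 Pb/s aggregate figure; Alibaba only says "Pb/s-level", so the exact number is an assumption:

```python
# Illustrative only: per-card share of a 1 Pb/s aggregate fabric in
# a 128-card cabinet. The 1 Pb/s figure is a round-number assumption;
# the article only says "Pb/s-level".
aggregate_pbps = 1.0   # assumed aggregate bandwidth, Pb/s
cards = 128            # PanJiu cards per cabinet (from the article)

per_card_tbps = aggregate_pbps * 1000 / cards   # Tb/s per card
per_card_gbytes = per_card_tbps * 1000 / 8      # GB/s per card

print(f"per-card share: {per_card_tbps:.1f} Tb/s (~{per_card_gbytes:.0f} GB/s)")
# per-card share: 7.8 Tb/s (~977 GB/s)
```

On these assumptions, each card's share approaches a terabyte per second, closer to NVLink-class per-GPU bandwidth than to commodity NIC speeds.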
Sugon's scaleX640 represents the boldest step yet, a 640-card cabinet-level supernode built on a fully open AI computing architecture. It supports a wide range of accelerators and mainstream intelligent-computing ecosystems, enabling fast model migration and seamless application optimisation across platforms.
Article edited by Jack Wu

