Nvidia may have to delay mass production of its next-generation AI servers based on the B200 GPU and the GB200 platform because of overheating, high power consumption, and the need to optimize interconnects, according to a TrendForce report. The market research firm believes that volume production and shipments of Blackwell machines will peak sometime in mid-2025, a delay of nearly half a year. Nvidia has neither confirmed nor denied these claims.
As expected, Nvidia and its partners will only be able to ship a limited number of Blackwell-based servers in 2024, as the company has to use its lower-yielding initial B200 chips for them. Dell has already started shipping Blackwell server racks. Still, TrendForce does not expect an immediate surge in sales of Blackwell-based servers, even though a refined version of Nvidia's B200 processor entered mass production in October and should therefore reach the company in January. According to the research firm, because of overheating, power consumption, and the requirements of higher-speed interconnects, the peak of B200 and GB200 mass production and shipments will only come between the second and third quarters of 2025.
Just a few months ago, there were reports that an Nvidia GB200 NVL72 rack with 72 B200 GPUs would consume 120 kW of power, which is already significantly more than current AI server racks (typical high-density racks draw up to 20 kW, while H100-based racks reportedly draw about 40 kW). TrendForce now claims that Nvidia has updated the specifications of the system and that it now draws 140 kW, which is more power than a typical data center can deliver to a single rack.
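To put these figures in perspective, the short Python sketch below compares the reported rack power levels. It uses only the numbers cited above; the "per-GPU share" is a simplifying assumption of ours that divides whole-rack power by the 72 GPUs, so it also absorbs the CPUs, switches, and other components in the rack and is not a per-GPU power specification.

```python
# Back-of-the-envelope comparison of the rack power figures cited above.
# Assumption: the "per-GPU share" divides whole-rack power by the GPU count,
# so it also includes CPUs, switches, fans and other parts of the rack.

RACK_POWER_KW = {
    "typical high-density rack": 20,
    "H100-based rack (reported)": 40,
    "GB200 NVL72, earlier report": 120,
    "GB200 NVL72, updated spec": 140,
}
GPUS_PER_NVL72 = 72


def ratio(rack_kw: float, baseline_kw: float) -> float:
    """How many times more power a rack draws than a baseline rack."""
    return rack_kw / baseline_kw


def kw_per_gpu_share(rack_kw: float, gpus: int) -> float:
    """Whole-rack power averaged over the GPUs in that rack, in kW."""
    return rack_kw / gpus


if __name__ == "__main__":
    nvl72_kw = RACK_POWER_KW["GB200 NVL72, updated spec"]
    for name, kw in RACK_POWER_KW.items():
        print(f"{name}: {kw} kW (NVL72 at {nvl72_kw} kW draws {ratio(nvl72_kw, kw):.1f}x this)")
    share = kw_per_gpu_share(nvl72_kw, GPUS_PER_NVL72)
    print(f"Per-GPU share of a {nvl72_kw} kW NVL72 rack: {share:.2f} kW")
```

By this rough measure, a 140 kW NVL72 rack draws 3.5 times as much as a reported H100-based rack and seven times as much as a typical 20 kW high-density rack.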
The problem is that Nvidia's Blackwell GPUs are reportedly prone to overheating in servers with 72 processors, even when power consumption per rack is 120 kW. This has forced Nvidia to repeatedly revise the server rack design, because overheating not only reduces GPU performance but also risks damaging the hardware. A power draw of 140 kW per rack means further changes to the server design will be required, which could cause additional setbacks.
Higher power consumption means greater cooling requirements. Liquid cooling is critical for Blackwell servers, but today's sidecar coolant distribution units (CDUs) can only handle 60 to 80 kW of thermal load. To address this, cooling system suppliers are optimizing cold plate designs, with the aim of doubling or tripling CDU capacity. TrendForce expects the capacity of liquid-to-liquid CDUs to exceed 1.3 MW, with further improvements likely, so heat dissipation should eventually cease to be a major issue.
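A quick calculation shows why the sidecar CDU limit matters. The Python sketch below estimates how many 60 kW or 80 kW sidecar CDUs one rack would need, and how many racks a single 1.3 MW liquid-to-liquid CDU could serve. It assumes, for simplicity, that all rack power ends up as heat in the liquid loop and that CDUs run at full rated capacity with no redundancy margin; real deployments also air-cool some components and keep headroom, so treat the results as rough bounds only.

```python
import math

# Sidecar CDU capacity range cited in the article (kW of heat removed).
SIDECAR_CDU_KW = (60, 80)
# TrendForce's expected liquid-to-liquid CDU capacity: 1.3 MW = 1,300 kW.
L2L_CDU_KW = 1300


def sidecar_cdus_needed(rack_kw: int, cdu_kw: int) -> int:
    """Minimum number of sidecar CDUs needed to absorb one rack's heat.

    Simplifying assumption: all rack power becomes heat in the liquid loop,
    and each CDU runs at its full rated capacity with no redundancy.
    """
    return math.ceil(rack_kw / cdu_kw)


def racks_per_l2l_cdu(rack_kw: int, cdu_kw: int = L2L_CDU_KW) -> int:
    """Whole racks one liquid-to-liquid CDU could serve, same assumptions."""
    return cdu_kw // rack_kw


if __name__ == "__main__":
    for rack_kw in (120, 140):
        needed = [sidecar_cdus_needed(rack_kw, c) for c in SIDECAR_CDU_KW]
        print(f"{rack_kw} kW rack: {needed[0]} x 60 kW or {needed[1]} x 80 kW sidecar CDUs; "
              f"one 1.3 MW CDU covers {racks_per_l2l_cdu(rack_kw)} racks")
```

Under these assumptions, a 140 kW rack would need two or three sidecar CDUs, whereas a single 1.3 MW liquid-to-liquid unit could cover about nine such racks.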
However, power consumption and thermal management are not the only issues Nvidia and its partners are working on, according to the report. TrendForce claims that Nvidia must also optimize its interconnects, but it does not specify which ones.
It remains to be seen how the alleged teething problems with Nvidia's B200 and GB200 servers will affect the release timing and availability of the B200A, which is based on a simplified Blackwell processor, and of the B300 and GB300 machines, which feature newer Blackwell GPUs. While the B200A may consume much less power than the B200 and GB200, the updated B300 series of Blackwell GPUs is expected to bring more memory and higher compute performance, which typically requires more power. These products may therefore draw more than 140 kW per rack and require more complex components and cooling.