Nvidia's next-generation AI chip


Nvidia's next-generation Blackwell architecture could play a significant role in the AI chip market.

According to Nvidia's roadmap, the company will introduce its next-generation Blackwell architecture soon. Nvidia typically launches a new architecture in data center products first and announces cut-down GeForce versions a few months later, and the same pattern is expected this time. As evidence of the upcoming data center GPU, a Dell executive shared some interesting details about the next-generation Nvidia hardware, stating on a recent earnings call that Nvidia has a 1,000W data center GPU in the works.

Dell's Chief Operating Officer, Jeff Clarke, discussed Dell's engineering advantages and the benefits Nvidia's upcoming hardware could bring during the earnings call on February 29. He said the company is "excited about the B100 and GB200," the names of Nvidia's next-generation data center GPUs and their successors. Nvidia's current flagship data center GPU is the H100, and it has just launched a refreshed version with faster HBM3e memory, the H200. The B100 is the Blackwell successor to these chips, so the GB200 appears to be a second iteration of that GPU, although it does not currently appear on Nvidia's public roadmap.


Clarke then turned to the thermal characteristics of these next-generation parts, stating: "You actually do not need direct liquid cooling to achieve an energy density of 1,000W per GPU. Some products next year will achieve this."

The current GH200 Grace Hopper CPU+GPU superchip already has a TDP ranging from 450W to 1,000W depending on configuration, so it would be somewhat surprising if the next-generation version merely maintained that figure. Meanwhile, the existing H100 is a 700W GPU, and we do not yet know the power requirements of its successor, the B100. Nvidia seems capable of pushing it to 1,000W, but there has been no word on the B100's actual power consumption.

For now, we must wait until March 18 to see what Nvidia has prepared for its Blackwell data center lineup; as gamers, we may also be able to glean some details from that announcement. Given the company's standing in the artificial intelligence market, the world will be watching this year's GTC to see what Nvidia has up its sleeve. Despite the launch being so close, our understanding of Blackwell remains limited: reportedly it will use TSMC's 3nm process, and Nvidia may adopt a chiplet design for the first time. Nvidia has also indicated that demand for these chips will outstrip supply in the short term.
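
To put a 1,000W GPU in context, here is a quick back-of-envelope sketch (all figures are illustrative assumptions of mine, not numbers from Dell or Nvidia) of how fast rack-level power adds up and why such dense deployments push operators toward liquid cooling:

```python
# Back-of-envelope estimate: rack power for 1,000W GPUs.
# Every figure below is an assumption for illustration only.

GPU_TDP_W = 1000          # per-GPU board power, the figure Dell cited
GPUS_PER_SERVER = 8       # assumed HGX-style server layout
SERVERS_PER_RACK = 4      # assumed rack density
OVERHEAD_FACTOR = 1.35    # assumed CPUs, NICs, fans, power-conversion losses

rack_power_kw = (GPU_TDP_W * GPUS_PER_SERVER * SERVERS_PER_RACK
                 * OVERHEAD_FACTOR / 1000)
print(f"Estimated rack power: {rack_power_kw:.1f} kW")  # ~43.2 kW
```

Air-cooled racks are commonly planned around 15-20 kW, so a result in the 40+ kW range illustrates the pressure Clarke was describing, even if individual GPUs can still be air-cooled.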

Nvidia has also disclosed plans for the X100 chip, scheduled for release in 2025, which will expand the product range to include the X40 for enterprise use and the GX200, combining CPU and GPU functions in a superchip configuration. Similarly, the GB200 is expected to follow the B100's lead by incorporating the superchip concept.

Looking at Nvidia's product roadmap, the AI chip market looks set to be upended again over the next one to two years.

It is worth noting that in the AI chip field, where Nvidia holds a dominant position, AMD is one of the few companies with high-end GPUs capable of training and deploying AI models, and the industry regards it as a credible alternative for generative AI and large-scale AI systems. One of AMD's strategies for competing with Nvidia is its powerful MI300 series of accelerator chips, and it is now directly challenging the dominance of Nvidia's H100 with more powerful GPUs and an innovative CPU+GPU platform.

AMD's latest release, the MI300, currently comprises two major lines: the MI300X, a large GPU with the memory bandwidth needed for leading-edge generative AI and the training and inference performance required for large language models; and the MI300A, which integrates CPU and GPU, built on the latest CDNA 3 architecture and Zen 4 CPU cores, offering breakthrough performance for HPC and AI workloads. The MI300 is not only a new generation of AI accelerators but also AMD's vision for the next generation of high-performance computing.
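
Why does memory bandwidth matter so much for generative AI? A minimal sketch, assuming a batch-size-1 decode where each generated token requires streaming roughly all model weights from memory once (so throughput is bandwidth-bound). The bandwidth and model-size numbers are hypothetical, not MI300X or H100 specifications:

```python
# Rough upper bound on LLM decode throughput when memory-bandwidth-bound:
# tokens/s <= memory_bandwidth / bytes_of_weights_read_per_token.

def max_tokens_per_second(bandwidth_gb_s: float, params_billions: float,
                          bytes_per_param: float = 2.0) -> float:
    """Bandwidth-bound decode ceiling; 2 bytes/param assumes fp16/bf16."""
    model_bytes = params_billions * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes

# Hypothetical accelerator with 5 TB/s of HBM serving a 70B-parameter model:
print(f"{max_tokens_per_second(5000, 70):.0f} tokens/s upper bound")  # ~36
```

Batching, quantization, and KV-cache traffic all shift the real number, but the basic scaling explains why HBM bandwidth is a headline spec for these chips.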

Beyond AMD, Nvidia also faces competition from companies developing their own AI chips.

In February of this year, tech giant Meta Platforms confirmed plans to deploy its latest custom-designed chips in its data centers this year; the chips will work alongside its GPUs to support the development of its large AI models. Dylan Patel, founder of the research firm SemiAnalysis, said that given Meta's scale of operations, successfully deploying the custom chips at scale could save hundreds of millions of dollars in annual energy costs and billions of dollars in chip procurement costs.
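
To see how savings of that order of magnitude could arise, here is a hedged back-of-envelope calculation. Every input is my own assumption (fleet size, per-chip power delta, electricity price), chosen only to show the arithmetic behind such estimates, not Patel's or Meta's actual figures:

```python
# Illustrative energy-savings estimate for a custom inference chip fleet.
# All constants below are assumptions for the sketch, not reported data.

CHIPS = 600_000            # assumed fleet size of custom accelerators
POWER_SAVED_W = 500        # assumed per-chip power reduction vs. a GPU
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.10       # assumed industrial electricity price, USD

annual_savings = (CHIPS * POWER_SAVED_W / 1000
                  * HOURS_PER_YEAR * PRICE_PER_KWH)
print(f"~${annual_savings / 1e6:.0f}M saved per year")  # ~$263M
```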

OpenAI has also begun seeking billions of dollars in funding to build a network of AI chip factories, and foreign media report that it is exploring manufacturing its own AI chips. OpenAI's website has also begun listing hardware-related openings, including several positions for hardware-software co-design. In September last year, OpenAI recruited Andrew Tulloch, a renowned expert in AI compilers, which appears to confirm its investment in developing its own chips.

It is not only Meta and OpenAI: according to statistics from The Information, there are now more than 18 startups worldwide designing chips for training and inference of large AI models, including Cerebras, Graphcore, Biren Technology, Moore Threads, d-Matrix, and others.
