
#3 - The Battle Between AI Chipmakers is a Bloodbath

And NVIDIA hasn't got a scratch.

Welcome to MAJOR.MINOR.PATCH. 

Each edition, I cover one update in the world of computing and what it means for engineers, entrepreneurs and investors.

This week, NVIDIA’s still the Michael Jordan of AI inference…

I remember a simpler time when almost every laptop had an “Intel Core i7 inside” sticker on it. And if you were interested in playing high-performance video games, mining bitcoin, or screwing around with TensorFlow or PyTorch, an NVIDIA GPU was the only option on the menu.

But the competition has quickly heated up from a simmer to an angry boil.

Google started rolling their own “Tensor Processing Units” (TPUs). Apple started shipping MacBooks with their own “Apple Silicon”.

Before I knew it, what used to be the domain of deep-pocketed tech giants had become the battleground of startups like Groq, Tenstorrent, FuriosaAI, Cerebras, UntetherAI, and more.

Many are forecasting the precipitous decline of NVIDIA amid this swath of capable competitors. But after seeing the results of the recent MLPerf Inference v4.1 round (think of it as the Olympics for AI inference), I don’t think NVIDIA’s losing its leadership position anytime soon.

The most popular category at MLPerf was “datacenter-closed”. For this category, teams had to run models as they came, with minimal software modification.

Submissions based on NVIDIA’s H200 GPUs and GH200 Superchips won every benchmark.

Some submissions used multiple chips networked together, while others used just one. On a per-accelerator basis, NVIDIA’s new Blackwell chip even outperformed everything else by 2.5x on the one benchmark it was entered in: the “LLM Q&A” task.

Mind you, these MLPerf benchmarks only measure inference performance: the process of using a trained model to generate outputs from new inputs.

But before a model can generate usable outputs, it must first go through pre-training, often on enormous volumes of data and over a long period of time.

It turns out that NVIDIA is even better at pre-training than they are at inference.

The reason for this is scale. Pre-training the “foundation models” we talk about today is usually done on tens of thousands of GPUs. At that scale, challenges arise that are nearly impossible to simulate on smaller clusters, and outside of Google’s TPUs, NVIDIA’s GPUs are the only chips deployed at that scale. That gives NVIDIA real-world feedback with which to keep optimizing their chips.

Another moat NVIDIA benefits from (which is discussed at length elsewhere, but hey, I only just dove into it, so I may as well discuss it here too) is CUDA: a parallel computing platform and programming model that they developed in-house. Developers write CUDA kernels, small functions that run directly on the GPU, to optimize their code for NVIDIA hardware.
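To give a flavor of what that looks like, here’s a minimal sketch of a CUDA kernel: the classic element-wise vector addition. This is a toy example I’m using for illustration, not code from any real framework.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A CUDA kernel: each GPU thread computes one element of the output array.
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 1 << 20;            // one million floats
    size_t bytes = n * sizeof(float);

    // Allocate host memory and fill it with sample data.
    float *hA = new float[n], *hB = new float[n], *hC = new float[n];
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Allocate device (GPU) memory and copy the inputs over.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Launch the kernel: enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(dA, dB, dC, n);

    // Copy the result back and spot-check it.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hC[0]);  // expect 3.0

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    delete[] hA; delete[] hB; delete[] hC;
    return 0;
}
```

The __global__ qualifier and the <<<blocks, threads>>> launch syntax exist only in CUDA, which is part of why code written this way doesn’t port cleanly to competing accelerators.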

NVIDIA released CUDA in 2007, and for the following decade it was effectively the only game in town.

So during AI’s awkward teenage years…

  • Leading model development frameworks like TensorFlow, Keras, and PyTorch were increasingly optimized to run on NVIDIA GPUs

  • AI practitioners who are now stewarding the field became CUDA experts

  • A large and thriving community developed around CUDA

NVIDIA’s continued ability to out-innovate in AI inference, their real-world feedback loop for pre-training, and their widely used and deeply embedded software stack are all reasons why their share of the AI chip market is still close to 90%.

It’ll be a long time before other chipmakers can chip away at that leadership position.
