#13 - The "Latency vs. Accuracy" Trade-Off

And the applications AI may never touch

Sunday Night Football is about to kick off. I’ve got a pile of laundry that needs folding. My bedtime is in 2 hours.

Not the best time to wax poetic about the future of AI, but I need to get this thought out.

If you’re in a hurry, scroll to the end for the key takeaway.

My brain is totally fixated on the “Latency vs. Accuracy Trade-Off” right now. I can’t stop thinking about it.

Here’s a summary:

⬆️ Speed = ⬇️ Accuracy

To get the lowest latency, our model likely needs to live on the edge. It must be small enough to fit in our device’s memory and run quickly on our device’s processors. The model also can’t waste time thinking too hard or fetching data from external sources.

Similarly…

⬆️ Accuracy = ⬇️ Speed

The most performant models often need to run on remote compute clusters, far away from the device prompting them. Today, the best results come from models that use Chain of Thought Reasoning (generating intermediate reasoning steps before committing to an answer), which takes a variable amount of time. The model may also need to fetch data from external systems of record with RAG (Retrieval Augmented Generation).

That’s the “Latency vs. Accuracy Trade-Off” in a nutshell. If we want accurate results, we need to be patient. If we want fast results, we need to be lenient.
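If you squint, this trade-off looks like a routing problem: given a latency budget, pick the most accurate model that still fits. Here’s a toy sketch of that idea in Python. Everything in it is hypothetical, the model names and the latency/accuracy numbers are made up for illustration, not benchmarks.

```python
# Toy model router: among deployments that fit the caller's latency
# budget, pick the most accurate one. All numbers are illustrative.

EDGE = {"name": "edge", "latency_ms": 50, "accuracy": 0.80}     # small, on-device
CLOUD = {"name": "cloud", "latency_ms": 3000, "accuracy": 0.98}  # large, remote, CoT + RAG

def route(latency_budget_ms: float) -> dict:
    """Return the most accurate model that fits the latency budget."""
    candidates = [m for m in (EDGE, CLOUD) if m["latency_ms"] <= latency_budget_ms]
    if not candidates:
        # Some applications (AEB, pacemakers) land here: no model is
        # both fast enough and accurate enough.
        raise ValueError("no model fits this latency budget")
    return max(candidates, key=lambda m: m["accuracy"])

print(route(100)["name"])     # tight budget -> the fast-but-sloppier edge model
print(route(10_000)["name"])  # generous budget -> the slow-but-accurate cloud model
```

The punchline of the article is the `ValueError` branch: for some applications, the candidate list is empty no matter how you tune the models.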

That’s an acceptable trade-off for most applications.

We need banner ads, social media feeds, and Google search pages to load fast, and we’re okay with the results being imperfect. AI works great for these low-accuracy, high-speed applications.

We need travel bookings, medical diagnoses, and investment strategies to be right, and we’re okay with patiently chatting back and forth with an AI assistant until we’re sure that they are. AI will eventually be a great match for these high-accuracy, low-speed applications too.

But there are some applications that need to be both fast and accurate. Your car’s Automatic Emergency Braking (AEB) system can’t stop to think. Your pacemaker can’t miss a beat.

For these high-accuracy, high-speed applications, good-ol’-fashioned deterministic programs will always be favoured over AI.

Key Takeaway

AI applications have to balance latency and accuracy. Large reasoning models like o1 are accurate, but slow. Small models running on the edge are faster, but more error-prone.

Some apps can’t accept that trade-off. Your pacemaker can’t miss a beat.

These are the kinds of applications that AI may never touch.

Quick reminder - If you appreciate my writing, please reply to this email or “add to address book”. These are positive signals that help my emails land in your inbox.

If you don't like my newsletter, you can unsubscribe below. If you were sent this newsletter and want more, you can subscribe here.

See you next week — Rayhan

P.S. 20VC’s recent interview with Sam Altman inspired this article (ya, I know I’ve recommended Sam Altman interviews two weeks in a row, sue me).