
#14 - Conversational Agent Interfaces: The GUI of the Future

What the history of robotics teaches us about the future of AI agents.

It’s been a while since I went out for a 5k run. I can’t move. Nothing better to do than write.

If you’re in a rush, here’s the key takeaway:


In the 2000s, every company needed a Graphical User Interface (GUI) — a website or app that humans can interface with.

In the 2030s, every company will need a Conversational Agent Interface (CAI) — an AI agent that other AI agents can interface with.

There will be an enormous space of design agencies, dev shops, and SaaS companies that crop up to support this new appendage of the internet.

We’re going to have our own AI agents soon.

Not a chatbot confined to a tab in our browser, but a true AI agent, like Jarvis from Iron Man, that can do meaningful, complex work for us out on the internet. They’ll book our flights, order takeout, schedule meetings, and more.

We’ll have AI agents soon; that much isn’t in question. The question is:

How will AI agents work?

Many companies are working on models that can understand and navigate applications and websites open on your computer or smartphone.

The intent is for these models to interact with the digital world the same way we would — by scrolling, clicking, swiping and typing through a Graphical User Interface (GUI).

Ferret-UI (Apple ML Research)

That’s probably the optimal architecture for AI agents. It would allow companies to continue designing devices, apps, and websites that are ergonomic for human users. Meanwhile, we can delegate any task to our AI agents, since those agents understand and navigate the same GUIs we do.

But that isn’t how I see AI agents working in the next few years. At least not if things progress the same way the robotics industry has.

I work in robotics.

The future my colleagues and I dream of is one filled with general-purpose robots that can go anywhere and do anything a human can. Picture any humanoid robot from a sci-fi movie — that’s what the robotics industry is driving towards.

But no company has truly achieved this yet.

Robotics companies still make the majority of their money from special-purpose robots that do a single task in a constrained environment: Roomba vacuums, dishwashers, cars, and surgical robots.

On the path to general-purpose robots that can do everything, we’ve continued to boost our productivity with special-purpose robots by networking or sequencing them together. Think of an assembly line where one robot does a specific manufacturing task on an item and then hands it off to the next robot to do a different task.

That’s how I see us developing AI agents in the next few years: by sequencing and networking together special-purpose AI models.
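That assembly-line idea can be sketched in code. Below is a toy Python sketch of special-purpose models networked in sequence, each doing one narrow job and handing its output to the next; the function names and canned outputs are invented for illustration, not real models or products.

```python
# Toy sketch: an "assembly line" of special-purpose AI models. Each stage
# does one narrow task and hands its output to the next, like robots on a
# manufacturing line. All names and outputs here are illustrative stubs.

def transcribe(audio_file: str) -> str:
    """Speech-to-text model: audio in, text request out (stubbed)."""
    return "book me a haircut for Friday afternoon"

def extract_intent(text: str) -> dict:
    """Language-understanding model: free text in, structured intent out (stubbed)."""
    return {"task": "book_appointment", "service": "haircut", "day": "Friday"}

def execute(intent: dict) -> str:
    """Task-execution agent: structured intent in, action taken (stubbed)."""
    return f"Booked: {intent['service']} on {intent['day']}"

# Sequence the special-purpose models together:
pipeline = [transcribe, extract_intent, execute]

result = "user_voice_memo.wav"
for step in pipeline:
    result = step(result)

print(result)  # Booked: haircut on Friday
```

Each stage only needs to be good at its one task, and stages can be swapped or upgraded independently — the same reason assembly lines beat one do-everything robot today.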

In the 2000s, every company needed a Graphical User Interface (GUI) — a website or app that humans can interface with.

I believe that in the 2030s and beyond, every company is going to need a Conversational Agent Interface (CAI) — an AI agent that other AI agents can interface with.

A CAI is better than simply offering a public API for AI agents to use. No two companies’ APIs are the same, so my agent would need to read documentation to know how to use each one. Code is also very unforgiving: you can’t misspell an endpoint or omit a required header.

Natural language, on the other hand, is universal and flexible. My AI agent can converse with my barber shop’s booking agent the same way it can converse with my 401k’s investing agent.

Few companies can afford to pretrain their own foundation models. But every company can afford to fine-tune their own small, specialized model that has knowledge of their internal resources and access to their internal API.
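Here’s a minimal sketch of what such a CAI might look like: a forgiving natural-language layer sitting in front of a company’s internal systems. Everything here is hypothetical — the `BookingCAI` class, the keyword matching (standing in for a fine-tuned model), and the slot data are all invented for illustration.

```python
# Hypothetical sketch of a barber shop's Conversational Agent Interface:
# messages in, messages out. A real CAI would use a small fine-tuned model
# wired to the company's internal API; simple keyword matching stands in
# for the model here.

class BookingCAI:
    """An agent-facing interface other AI agents can converse with."""

    def __init__(self):
        # Stand-in for the shop's internal availability data / API.
        self.slots = {"Friday 2pm": True, "Friday 4pm": True}

    def handle(self, message: str) -> str:
        # Unlike a raw API, the interface is forgiving: no exact
        # endpoint names, schemas, or headers required.
        text = message.lower()
        if "available" in text:
            open_slots = [s for s, free in self.slots.items() if free]
            return f"Open slots: {', '.join(open_slots)}"
        for slot in self.slots:
            if slot.lower() in text and self.slots[slot]:
                self.slots[slot] = False  # internal API call would go here
                return f"Confirmed: {slot}"
        return "Sorry, I couldn't find that slot."

# My agent converses with the shop's agent in plain language:
cai = BookingCAI()
print(cai.handle("What times are available on Friday?"))
print(cai.handle("Book me in for Friday 4pm, please."))  # Confirmed: Friday 4pm
```

The point of the sketch is the shape of the interface, not the logic inside it: any agent that speaks natural language can use it without reading the shop’s API docs first.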

My brain is exploding thinking about all the different products and services that would pop up to support this new appendage of the internet.

Quick reminder - If you appreciate my writing, please reply to this email or “add to address book”. These positive signals help my emails land in your inbox.

If you don't want these emails, you can unsubscribe below. If you were sent this email and want more, you can subscribe here.

See you next Monday — Rayhan

P.S. My company, Boston Dynamics, did an interview discussing embodied AI and humanoid robotics. It’s worth a listen!