Putting LLMs inside of robots won’t solve the embodiment problem

You can breathe into a robot’s nostrils all you want; it’s still just a glorified Teddy Ruxpin. 

Have you ever wondered why nobody’s been able to create an autonomous AI agent by jamming a chatbot inside one of those fancy robots that can do parkour? 

If modern AI models are supposedly on the brink of reaching human-level reasoning, then why can’t ChatGPT, Claude, or Gemini just hop into a ready-made robot body and show the world just how useful they can be? 

The short answer is: because the laws of physics, as we understand them, prevent this kind of nonsensical stuff from happening. 

Embodiment

Chatbots don’t actually exist. No, we’re not trying to create a conspiracy theory. What we mean is this: large language models (LLMs) aren’t “entities” or “beings.”

Essentially, LLMs are sets of rules. When you prompt an LLM, those rules are exploited to generate content. Once that content is generated, the “chatbot” you were prompting disappears forever. 

Consider the following scenario involving a human prompting an LLM:

Human: Hello, please tell me the capital of Spain.

AI: Hello! The capital of Spain is Madrid. 

Human: Thank you. How many people live there?

AI: Approximately 3.46 million people (source: https://en.wikipedia.org/wiki/Madrid)

It appears as though the human and the AI have had a short conversation. The AI’s second response seems to indicate agentic behavior. The human asked a question, the AI answered, and when the human asked a follow-up question, the AI was able to provide additional information. 

But that’s not what’s actually happening. Instead, the LLM exploits its rules to generate a token. Then it starts over and generates another. Then it starts over and generates another. Then it starts over and… you get the picture. The “chatbot” that answered the first question stopped existing before the chatbot that answered the second question was even created. 

In biological terms, this would be like asking someone what time it is and, instead of answering “11:32 at night,” they give birth to a baby whose first word is “eleven” and then they give birth to another baby whose first word is “thirty” and then they give birth to another baby whose first word is “two” and, finally, one last baby is born who says “PM.” 

None of these babies have the slightest clue what they’re talking about. And once they’ve performed their task, they disappear forever. 

If you were to say, “Wait, I didn’t hear you. Can you say that again?” those same babies wouldn’t shoot out again. The person would have to give birth to a whole new set of babies to answer the question again. 
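
Stripped of metaphor, the loop looks something like the sketch below. This is a hypothetical toy, not any vendor’s real code: the “rules” are fixed, every token is produced by re-running the entire prompt-so-far from scratch, and nothing persists between steps.

```python
# Minimal, hypothetical sketch of autoregressive generation (toy data,
# made-up functions). Nothing persists between steps, and nothing
# "answers" more than one token before disappearing.

PROMPT = ["Hello,", " please", " tell", " me", " the", " capital",
          " of", " Spain", "."]
CANNED_REPLY = ["Hello!", " The", " capital", " of", " Spain",
                " is", " Madrid", ".", "<end>"]

def next_token(context: list[str]) -> str:
    """Stand-in for one forward pass: the whole context in, one token out."""
    # A real model scores every possible next token against its fixed rules;
    # this toy just walks a canned reply so the sketch actually runs.
    return CANNED_REPLY[len(context) - len(PROMPT)]

def generate(prompt_tokens: list[str]) -> list[str]:
    output: list[str] = []
    while True:
        token = next_token(prompt_tokens + output)  # start over, every single time
        if token == "<end>":
            break
        output.append(token)
    return output  # once this returns, there is no "chatbot" left anywhere

print("".join(generate(PROMPT)))
# -> Hello! The capital of Spain is Madrid.
```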

This is why companies such as OpenAI have to bolt on external memory (saved notes and conversation transcripts assembled on the fly and stitched back into each new prompt) in order to make it look like these models have “memory.” 
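
In practice, the “conversation” above only hangs together because something outside the model keeps the transcript and re-sends all of it with every new question. A hypothetical sketch, with every function name invented for illustration:

```python
# How chat "memory" is faked: the client keeps the transcript and re-sends
# the whole thing with every request. The model itself retains nothing
# between calls; call_model is a made-up stand-in for a real API call.

transcript: list[dict] = []

def call_model(messages: list[dict]) -> str:
    """Stand-in for a stateless model call: full transcript in, text out."""
    return f"(a brand-new response generated from {len(messages)} messages of context)"

def ask(question: str) -> str:
    transcript.append({"role": "user", "content": question})
    answer = call_model(transcript)  # it only "remembers" because it re-reads the whole log
    transcript.append({"role": "assistant", "content": answer})
    return answer

print(ask("Hello, please tell me the capital of Spain."))
print(ask("Thank you. How many people live there?"))
```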

Robots aren’t babies

Putting an LLM inside a robot doesn’t give it the physical properties of the robot. It’s not like putting a human inside of a car. When we sit in a vehicle, we become that machine’s brain. We use our eyes, ears, and perception of motion to make decisions in real time. 

Machines can’t do that. Even if you give an LLM a camera and a microphone, it still can’t “see” or “hear.” The signals have to be translated into binary data, crunched through binary computations, and spat back out as a binary output. 

For a human, this would be like trying to control a car from inside of its trunk using nothing but text descriptions for input and output. Worse, the LLM doesn’t understand what cars or trunks are. It has no way of knowing that its perception modality is limited. Despite its inherent and extreme limitations, it will perform the task to the best of its abilities with full confidence.

A human, on the other hand, should know better than to try to drive a car in real time from inside of a dark trunk via “choose your own adventure” text prompts. 
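
As a sketch, the “LLM drives a robot” loop amounts to something like this. Everything here is hypothetical and invented for illustration; a real system would have far more moving parts, but the shape of the problem is the same.

```python
# Hypothetical robot-control loop. The model never sees or hears anything:
# pixels get flattened into a lossy text description, and the "decision"
# comes back as text that something else must parse into motor commands.

import time

def describe_frame(frame) -> str:
    """Stand-in for perception: camera pixels reduced to a text summary."""
    return "a blurry obstacle roughly two meters ahead"

def complete(prompt: str) -> str:
    """Stand-in for the LLM call: text in, text out, no state kept."""
    return "TURN_LEFT"

def execute(action_text: str) -> None:
    """Stand-in for actuation: parse the text and drive the motors."""
    print(f"executing: {action_text}")

for frame in range(3):  # pretend these are successive camera frames
    prompt = f"You control a robot. The scene: {describe_frame(frame)}. What do you do?"
    execute(complete(prompt))  # answered with full confidence, adequate description or not
    time.sleep(0.1)
```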

Humans, for their part, don’t experience the world through binary signals. While scientists may not have a complete understanding of the human brain, it’s incredibly difficult to explain its functions and operations using classical physics. 

This means one of two things: either humans are classical entities and LLMs are capable of the same level of agency as we are, or our universe is quantum and chatbots are just software rules being exploited through classical mechanics. 

If the latter is true, putting a chatbot inside of a robot doesn’t give it any more agency than it already had. But, if the former is true, then it’s nigh impossible to argue that we aren’t also large language models embodied in biological machines. 

Read more: What could a chatbot say that would convince you it was intelligent?

Art by Nicole Greene
