
Forget the hype -- robot training is becoming a job for other AI agents

Nvidia's ENPIRE framework uses AI coding agents to train robot fleets with minimal human help, cutting training time and hitting a 99% success rate.

A small metal pin slides into a four-millimeter hole with the precision of a watchmaker. The robot arm holding the pin moves with a fluid, confident motion that suggests years of practice. That movement is the end product of a new automated pipeline. Behind it is a complex chain of software commands, written by an AI coding agent such as Claude Code or Codex. The agent runs inside a framework called ENPIRE, which Nvidia researchers recently unveiled. To power that agent, Nvidia allocated a large budget of GPU time and language-model tokens. At the very start of this chain is a simple goal: teach a machine to do a chore without a human in the room.

Nvidia, in collaboration with researchers from Carnegie Mellon and UC Berkeley, recently released a paper detailing ENPIRE. The framework allows AI coding agents to take over the entire process of training a robot. These are the same software tools that developers use to write website code or debug applications. In the ENPIRE system, these agents are responsible for writing the training code, testing it on physical hardware, and fixing errors when the robot fails. Traditionally, a human engineer spent weeks fine-tuning these movements. Now, a fleet of eight robots can teach themselves the same skills in a fraction of the time.

The tireless intern in the machine

To understand how this works, think of the AI coding agent as a tireless intern. In a typical lab, an engineer has to watch a robot try to pick up a block, see it fail, and then manually rewrite the code to fix the grip. This is slow and expensive. ENPIRE replaces the human observer with a digital loop. The process has two initial steps where humans are involved. First, a person helps the agent build a reset routine. This is a set of instructions that tells the robot how to put the workspace back to its original state after a failed attempt. Second, the human helps create a reward function. This is an AI referee that watches camera footage to decide if the robot succeeded or failed.
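To make the two human-assisted pieces concrete, here is a minimal Python sketch of a single training attempt in a toy pin-insertion scene. The names `reset_workspace`, `judge_success`, `run_episode`, and `make_policy` are hypothetical and do not come from ENPIRE. In the real system the referee is a model reading camera footage; here it simply reads a simulated flag, and the 10 mm starting offset and 2 mm success tolerance are illustrative assumptions.

```python
import random
from typing import Callable

# A policy is the agent-written control code: it acts on the scene state.
Policy = Callable[[dict, random.Random], None]

def reset_workspace(state: dict) -> None:
    # Reset routine (hypothetical): restore the starting configuration.
    state["pin_position_mm"] = 10.0  # pin starts 10 mm from the hole (made-up value)
    state["pin_inserted"] = False

def judge_success(state: dict) -> float:
    # Reward function / referee (hypothetical): 1.0 for success, 0.0 for failure.
    # ENPIRE's referee watches camera footage; this one reads a simulated flag.
    return 1.0 if state["pin_inserted"] else 0.0

def run_episode(policy: Policy, state: dict, rng: random.Random) -> float:
    reset_workspace(state)       # every attempt starts from the same setup
    policy(state, rng)           # agent-written code acts on the scene
    return judge_success(state)  # referee scores the attempt

def make_policy(step_mm: float, noise_mm: float) -> Policy:
    # Stand-in for agent-written code: move the pin a fixed distance toward the hole.
    def policy(state: dict, rng: random.Random) -> None:
        state["pin_position_mm"] -= step_mm + rng.gauss(0.0, noise_mm)
        # Assumed tolerance: success if the pin ends within 2 mm of the hole centre.
        state["pin_inserted"] = abs(state["pin_position_mm"]) < 2.0
    return policy

def success_rate(policy: Policy, trials: int, seed: int = 0) -> float:
    # Run many reset -> act -> judge cycles and report the fraction judged successful.
    rng = random.Random(seed)
    state: dict = {}
    return sum(run_episode(policy, state, rng) for _ in range(trials)) / trials

print(success_rate(make_policy(step_mm=10.0, noise_mm=0.5), trials=200))
```

Because the reset routine runs before every attempt, each score is comparable with the last, which is what lets an agent iterate without a human putting the pin back.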

Once these two tools are in place, the humans leave. The AI agent starts its shift by searching through academic papers for the best training methods. It picks a strategy, writes the necessary Python code, and sends it to the robot arms. If the robot drops a pin or misses a target, the agent sees the failure, analyzes the data, and rewrites the code. This is autoresearch in the physical world. While humans sleep, the agents run hundreds of experiments. They do not get bored, and they do not need coffee breaks. This constant cycle of trial and error is what allows the system to reach a 99% success rate on complex physical tasks.
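The trial-and-error loop can be sketched as follows. This is a simplified illustration, not ENPIRE's code: the "code rewrite" is reduced to adjusting one hypothetical parameter (`push_mm`), the physical robots are replaced by a noisy simulator (`run_trials`), and the literature-search step is left out. The noise level and tolerance are assumptions; the 99% goal mirrors the success rate reported above.

```python
import random
import statistics

TARGET_MM = 10.0    # where the hole really is; the agent never reads this directly
TOLERANCE_MM = 2.0  # assumed success tolerance
NOISE_MM = 0.5      # assumed hardware noise

def run_trials(push_mm: float, trials: int, rng: random.Random) -> list[float]:
    """Stand-in for the physical robots: signed miss distance of each attempt."""
    return [push_mm + rng.gauss(0.0, NOISE_MM) - TARGET_MM for _ in range(trials)]

def autoresearch(initial_push_mm: float, goal: float = 0.99,
                 max_iters: int = 20, seed: int = 0) -> tuple[float, list[tuple[int, float, float]]]:
    """Hypothetical agent loop: test, score, analyze failures, revise, repeat."""
    rng = random.Random(seed)
    push_mm = initial_push_mm
    history: list[tuple[int, float, float]] = []
    for iteration in range(max_iters):
        misses = run_trials(push_mm, trials=100, rng=rng)                # test the current version
        rate = sum(abs(m) < TOLERANCE_MM for m in misses) / len(misses)  # referee's verdict
        history.append((iteration, push_mm, rate))
        if rate >= goal:
            break  # good enough: keep this version
        # "Analyze the failures and rewrite the code": cancel out the average miss.
        push_mm -= statistics.fmean(misses)
    return push_mm, history

final_push, history = autoresearch(initial_push_mm=4.0)
for iteration, push, rate in history:
    print(f"iteration {iteration}: push={push:.2f} mm, success={rate:.0%}")
```

The loop only stops when the measured success rate clears the goal, so the stopping rule is driven by the referee's verdicts rather than by a fixed schedule.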

Why eight arms are better than one

The real power of ENPIRE is evident when the system moves from a single robot to a fleet. Nvidia used eight bimanual robot stations for its primary experiment. These stations are not isolated. They are connected via Git, which is the standard tool software developers use to share and track changes in code. When one robot discovers a better way to insert a graphics card or cut a zip tie, it commits that code to a shared repository. The other seven robots immediately download the update.
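A toy model of the shared-repository idea follows, assuming each station's "code" can be boiled down to one tuning parameter. `SharedRepo`, `task_score`, and `train_fleet` are hypothetical: `SharedRepo` stands in for a Git remote that only accepts commits beating the current best, and every station pulls the latest version at the start of each round. Real Git workflows also involve merges and conflicts, which this sketch ignores.

```python
import random
from dataclasses import dataclass, field

@dataclass
class SharedRepo:
    """Hypothetical stand-in for a shared Git repository.
    The 'code' is reduced to one tuning parameter plus its measured score."""
    best_param: float
    best_score: float
    log: list = field(default_factory=list)  # (station, param, score) for accepted commits

    def commit(self, station: int, param: float, score: float) -> bool:
        # Accept only improvements, like merging a change that beats the current best.
        if score > self.best_score:
            self.best_param, self.best_score = param, score
            self.log.append((station, param, score))
            return True
        return False

    def pull(self) -> float:
        # Every station fetches the latest accepted version.
        return self.best_param

def task_score(param: float) -> float:
    # Hypothetical task quality: peaks at 1.0 when param == 3.0.
    return 1.0 - abs(param - 3.0) / 10

def train_fleet(num_stations: int, goal: float = 0.99,
                max_rounds: int = 1000, seed: int = 0) -> int:
    """Return the number of rounds until the shared best version reaches `goal`."""
    rng = random.Random(seed)
    repo = SharedRepo(best_param=0.0, best_score=task_score(0.0))
    for round_no in range(1, max_rounds + 1):
        base = repo.pull()  # start from the latest shared code
        for station in range(num_stations):
            candidate = base + rng.gauss(0.0, 0.5)  # each station tries its own variation
            repo.commit(station, candidate, task_score(candidate))
        if repo.best_score >= goal:
            return round_no
    return max_rounds

for n in (1, 8):
    rounds = [train_fleet(n, seed=s) for s in range(20)]
    print(f"{n} station(s): {sum(rounds) / len(rounds):.1f} rounds on average")
```

In this toy, the eight-station fleet typically reaches the goal in fewer rounds because each round explores eight variations instead of one. It illustrates the mechanism only and says nothing about ENPIRE's actual numbers.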

This shared intelligence creates a large speed advantage, though not a linear one: eight robots did not learn eight times faster. In the task known as Push-T, where a robot must slide a T-shaped block into a specific zone, a single robot took about five hours to master the movement. When the researchers turned on all eight robots, the time dropped to just two hours. The same trend appeared in pin insertion: a single arm needed over 90 minutes to become reliable, while the fleet finished the job in 40 minutes.

Task              Single-robot training time   Eight-robot fleet training time   Final success rate
Push-T            5 hours                      2 hours                           99%
Pin insertion     90 minutes                   40 minutes                        99%
Zip-tie cutting   N/A                          Accelerated                       99%
GPU seating       N/A                          Accelerated                       99%
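A quick way to read these figures is to compute the speedup (single-robot time divided by fleet time) and the parallel efficiency (speedup divided by the number of robots). The sketch below uses only the Push-T and pin-insertion numbers quoted above; `scaling_report` is a hypothetical helper name. Since the single-arm pin-insertion time was "over 90 minutes," its speedup is a lower bound.

```python
def scaling_report(single: float, fleet: float, robots: int = 8) -> dict:
    """Speedup, parallel efficiency, and fraction of time saved (units cancel out)."""
    speedup = single / fleet
    return {
        "speedup": speedup,
        "efficiency": speedup / robots,  # 1.0 would be perfect linear scaling
        "time_saved": 1 - fleet / single,
    }

# (task, single-robot time, fleet time); Push-T in hours, pin insertion in minutes.
for task, single, fleet in [("Push-T", 5, 2), ("Pin insertion", 90, 40)]:
    r = scaling_report(single, fleet)
    print(f"{task}: {r['speedup']:.2f}x faster, "
          f"{r['efficiency']:.0%} parallel efficiency, {r['time_saved']:.0%} less time")
# Push-T: 2.50x faster, 31% parallel efficiency, 60% less time
# Pin insertion: 2.25x faster, 28% parallel efficiency, 56% less time
```

An efficiency of roughly 30% means the fleet is far from eight times faster, yet it still cuts training time by more than half.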

Looking at the big picture, this suggests that, at least for tasks like these, the main bottleneck in robot training is not the hardware but the speed of human instruction. By letting the robots share what they learn through a common code repository maintained by the coding agents, the learning runs in parallel and becomes much faster.

The friction of the real world

There is a significant hurdle that AI researchers call the sim-to-real gap. It is comparatively easy to teach a robot a task in a computer simulation, where physics is idealized and surfaces behave exactly as modeled. In a simulator, every T-shaped block is identical, and every table is perfectly flat. The real world is messy. Friction differs from what the simulator assumed, lighting changes throughout the day, and mechanical parts have tiny imperfections.

During the ENPIRE experiments, the gap between simulation and reality was clear. All three coding agents tested (OpenAI's Codex, Anthropic's Claude Code, and Moonshot's Kimi Code) solved the Push-T task easily in a virtual kitchen. However, when the code moved to the actual physical robots, two of those three agents failed initially. They struggled with the physics of a real table and had to rewrite their code several times to account for the way the plastic block actually slid across the surface. This highlights why physical testing is still the gold standard for robotics. An AI can be a genius in a digital world and still fail to cut a zip tie in a lab because it did not account for the way the plastic bends.
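The table-friction problem can be illustrated with a one-dimensional toy model, assuming the block is launched at speed v0 and slows under constant kinetic friction, so it slides d = v0^2 / (2 * mu * g). All numbers (target distance, simulated and real friction coefficients) are made up, and real Push-T also involves rotation and contact with the pusher, which this sketch ignores. It shows a push tuned in simulation stopping short on a grippier real table, followed by one measure-and-correct step of the kind the agents had to repeat.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def slide_distance(v0: float, mu: float) -> float:
    """Distance a block slides from speed v0 under constant deceleration mu * g."""
    return v0 ** 2 / (2 * mu * G)

def push_speed_for(distance: float, mu: float) -> float:
    """Invert the model: launch speed needed to slide exactly `distance`."""
    return math.sqrt(2 * mu * G * distance)

TARGET = 0.20   # metres to the goal zone (illustrative)
MU_SIM = 0.30   # friction the simulator assumed (illustrative)
MU_REAL = 0.45  # friction of the real table (illustrative; unknown to the agent)

v_sim = push_speed_for(TARGET, MU_SIM)
print(f"sim:   {slide_distance(v_sim, MU_SIM):.3f} m")   # sim:   0.200 m
print(f"real:  {slide_distance(v_sim, MU_REAL):.3f} m")  # real:  0.133 m (stops short)

# Correction step: estimate friction from the observed real slide, then re-plan.
observed = slide_distance(v_sim, MU_REAL)
mu_est = v_sim ** 2 / (2 * G * observed)
v_fixed = push_speed_for(TARGET, mu_est)
print(f"fixed: {slide_distance(v_fixed, MU_REAL):.3f} m")  # fixed: 0.200 m
```

In this noise-free toy a single measurement recovers the real friction exactly; on real hardware, noisy and position-dependent friction is why several rounds of rewriting were needed.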

The high price of machine thinking

While the time saved is impressive, it is not free. Letting AI agents run the show has a hidden cost. Each time an agent like Claude Code works through a problem, it consumes tokens, the units of text a large language model reads and generates, and those tokens cost real money. Nvidia noted that while scaling from one robot to eight cut training time by more than half, the token bill grew even faster than the training time shrank.

Essentially, the system trades human engineering time for computer time, which is faster but expensive. For a giant like Nvidia, which owns the chips and the data centers, this is a winning trade. For a smaller startup, the cost of letting an AI agent "think" its way through a thousand failed experiments might be higher than just hiring a human engineer. This could create a divide in the market: companies with the most computing power will likely be the ones that produce the most capable robots, because they can afford the high cost of automated failure.
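The trade-off can be framed as a simple break-even calculation. The sketch below assumes a flat per-token price and a flat hourly engineering rate; the function names are hypothetical, both inputs in the example are placeholder values rather than figures from Nvidia, and the model ignores GPU time, hardware wear, and the human setup work that ENPIRE still requires.

```python
def engineer_cost(hours: float, usd_per_hour: float) -> float:
    """Cost of a human engineer doing the tuning by hand."""
    return hours * usd_per_hour

def agent_cost(tokens: float, usd_per_million_tokens: float) -> float:
    """Token bill for an AI agent, assuming a flat per-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens

def break_even_tokens(hours: float, usd_per_hour: float, usd_per_million_tokens: float) -> float:
    """Token budget at which the agent costs exactly as much as the engineer."""
    return engineer_cost(hours, usd_per_hour) / usd_per_million_tokens * 1_000_000

# Placeholder inputs only: two 40-hour weeks at $100/hour, tokens at $10 per million.
limit = break_even_tokens(hours=80, usd_per_hour=100, usd_per_million_tokens=10)
print(f"The agent costs less than the engineer below {limit:,.0f} tokens")
# The agent costs less than the engineer below 800,000,000 tokens
```

Real prices and token counts for a specific team would be needed before drawing any conclusion; the point is only that a larger fleet that burns tokens faster reaches the break-even budget sooner.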

What this means for your future home

For the average user, this research is an early step toward robots that are actually useful in a house. Most current home robots, like basic robot vacuums, follow rigid, preprogrammed rules. They struggle if you move your furniture or buy a new rug. A robot trained with a system like ENPIRE might not need a software update from the manufacturer to handle a new chore. Given a goal and a way to judge success, it could in principle spend an afternoon "practicing" how to fold your laundry or load your specific dishwasher.

On the market side, we are seeing a race between the US and China. The same week Nvidia released ENPIRE, Alibaba introduced its Qwen-Robot Suite. Alibaba is focusing on the software brains that can work on any robot body, while Nvidia is testing how its own hardware can improve itself. This competition is good for consumers. It means that the technology to make robots smarter is moving out of the purely theoretical space and into the factory and the home.

Practically speaking, we are moving away from the era of robots that are programmed and toward an era of robots that are coached. The human provides the goal and the referee, and the AI handles the tedious work of practicing until the robot is reliable. Ultimately, this could change how we interact with technology. Instead of learning how to use a machine, we would simply tell the machine what we want it to learn.

Behind the jargon of coding agents and reward functions is a simple reality: the machines are starting to write their own manuals. This shift will likely lead to more resilient hardware and more intuitive devices. Observe how the tools in your life currently require you to adapt to them. In a few years, as these autonomous training loops become standard, the devices in your home will be the ones doing the adapting.

Sources: Nvidia GEAR Lab Research Paper, official announcements from Jim Fan via X/Twitter, and the ENPIRE project technical documentation.
