A robotics model that reaches into the physical world


Gemini Robotics: Harnessing DeepMind’s Artificial Intelligence to Build General-Purpose Robots with a Robotic ‘Brain’

Gemini Robotics is built on Gemini 2.0, the latest version of Google’s flagship AI model. Carolina Parada, senior director and head of robotics at Google DeepMind, said at a press briefing that the company is working on a new type of robot that integrates physical actions into its understanding of the world.

In a series of demonstration videos, the company showed several robots equipped with the new model, called Gemini Robotics, manipulating items in response to spoken commands: robot arms fold paper, hand over vegetables, gently place a pair of glasses into a case, and complete other tasks. The robots rely on the new model to connect the items they see with possible actions in order to do what they’re told, and the model is trained in a way that allows it to generalize.

“While we have made progress in each one of these areas individually in the past with general robotics, we’re bringing [drastically] increasing performance in all three areas with a single model,” Parada said. She added that this allows the company to build more capable, responsive and robust robots that can adapt to changes in their environment.

The company is among several working to harness the artificial intelligence (AI) advances that power chatbots to create general-purpose robots. The approach raises safety concerns, given that such models can generate incorrect or harmful outputs.

Artificial intelligence is often used in science-fiction stories to power smart, capable and occasionally homicidal robots. For now, though, the best artificial intelligence remains trapped inside a chat window.

A team at Google DeepMind, which is headquartered in London, started with Gemini 2.0, the firm’s most advanced vision and language model, trained by analysing patterns in huge volumes of data.

The model they created was designed to help with reasoning tasks involving 3D physical and spatial understanding, such as predicting an object’s trajectory or identifying the same part of an object in images taken from different angles.
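To make that kind of spatial-reasoning query concrete, here is a minimal sketch, assuming a generic multimodal model client: the task is to point at the same part of an object in two images taken from different angles. The `model.generate` call, its arguments and the `parse_points` helper are hypothetical placeholders, not Gemini’s actual API.

```python
# Minimal sketch of a cross-view correspondence query to a multimodal model.
# The client interface below is a hypothetical placeholder, not a real API.
from dataclasses import dataclass


@dataclass
class Point2D:
    x: float  # normalised image coordinate in [0, 1]
    y: float


def match_object_part(model, image_a: bytes, image_b: bytes, part: str):
    """Ask the model to locate the same object part in two different views."""
    prompt = (
        f"Both images show the same object from different angles. "
        f"Return the normalised (x, y) location of the {part} in each image."
    )
    # `generate` and `parse_points` stand in for whatever multimodal inference
    # and structured-output calls a real deployment exposes.
    reply = model.generate(prompt=prompt, images=[image_a, image_b])
    (xa, ya), (xb, yb) = reply.parse_points()
    return Point2D(xa, ya), Point2D(xb, yb)
```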

Finally, they further trained the model on data from thousands of hours of real, remotely operated robot demonstrations. This allowed the robotic ‘brain’ to produce real actions, much in the way LLMs use their learned associations to generate the next word in a sentence.
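The analogy to next-word prediction can be pictured as a simple control loop: a policy repeatedly decodes a short chunk of motor commands from the current camera image, the robot’s joint state and the spoken instruction, then executes them. The sketch below rests on that assumption; the `policy` and `robot` objects and their methods are hypothetical stand-ins, not Gemini Robotics’ actual interface.

```python
# Sketch of a vision-language-action control loop: decode a few actions at a
# time from the current observation, the way an LLM decodes the next words.
# `policy` and `robot` are hypothetical stand-ins for the real interfaces.
import time


def run_instruction(policy, robot, instruction: str, control_hz: float = 10.0) -> None:
    period = 1.0 / control_hz
    while not robot.task_done():
        image = robot.get_camera_image()   # current visual observation
        state = robot.get_joint_state()    # proprioceptive state
        # The policy decodes a short sequence of joint commands conditioned on
        # the observation and the natural-language instruction.
        actions = policy.predict(image=image, state=state, instruction=instruction)
        for action in actions:
            robot.apply_joint_command(action)
            time.sleep(period)             # hold a fixed control rate
```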