Do What? Teaching Vision-Language-Action Models to Reject the Impossible

Recently, Vision-Language-Action (VLA) models have demonstrated strong performance on a range of robotic tasks. These models rely on multimodal inputs, with language instructions playing a crucial role — not only in predicting actions, but also in robustly interpreting user intent, even when the requests are impossible to fulfill. In this work, we investigate how VLAs can recognize, interpret, and respond to false-premise instructions: natural language commands that reference objects or conditions absent from the environment. We propose Instruct-Verify-and-Act (IVA), a unified framework that (i) detects when an instruction cannot be executed due to a false premise, (ii) engages in language-based clarification or correction, and (iii) grounds plausible alternatives in perception and action. To this end, we construct a large-scale instruction-tuning setup with structured language prompts and train a VLA model capable of handling both accurate and erroneous requests. Our approach leverages a contextually augmented, semi-synthetic dataset containing paired positive and false-premise instructions, enabling robust detection and natural-language correction. Our experiments show that IVA improves false-premise detection accuracy by 97.56% over baselines and increases successful responses in false-premise scenarios by 50.78%. — Read More
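The verification step in (i) and (ii) can be pictured as a premise check against the perception system's detections; the sketch below is a toy illustration with hypothetical function and object names, not the paper's implementation:

```python
def check_premise(instruction_objects, detected_objects):
    """Flag a false premise: the instruction mentions an object
    that perception did not detect, and compose a clarification."""
    missing = [obj for obj in instruction_objects if obj not in detected_objects]
    if missing:
        return False, "I can't do that: I don't see " + ", ".join(missing) + "."
    return True, "Premise verified; executing the requested action."

# "Hand me the red mug" in a scene containing only a blue mug and a plate
ok, reply = check_premise(["red mug"], {"blue mug", "plate"})
print(ok, reply)  # False I can't do that: I don't see red mug.
```

A real VLA would of course do this grounding inside the model rather than with a hand-written filter; the point is only that the premise check has to happen before action prediction.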

#robotics, #vision

Robotic neck incision replaces heart valve with no chest opening in world first

In a surgical first, doctors have replaced a heart valve through a small neck incision using robotic assistance, avoiding the need to open the chest.

The pioneering procedure, performed at the Cleveland Clinic by cardiothoracic surgeon Dr. Marijan Koprivanac, marks the first known clinical use of transcervical robotic access for aortic valve replacement (AVR).

Four patients underwent the technique earlier this year and were discharged within days. — Read More

#robotics

Gemini 2.5 for robotics and embodied intelligence

The latest generation of Gemini models, 2.5 Pro and 2.5 Flash, is unlocking new frontiers in robotics. Their advanced coding, reasoning, and multimodal capabilities, now combined with spatial understanding, provide the foundation for the next generation of interactive and intelligent robots.

This post explores how developers can leverage Gemini 2.5 to build sophisticated robotics applications. — Read More
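As a minimal sketch of the developer workflow, a robotics application might pass detected scene objects and a task to the model through the google-genai Python SDK. The scene format and prompt wording here are illustrative assumptions, not a documented Gemini robotics API:

```python
import json


def build_spatial_prompt(objects, task):
    """Describe detected objects (name plus 2D pixel center) and the task."""
    scene = json.dumps(objects)
    return (
        "You are a robot task planner. Detected objects with 2D centers in pixels:\n"
        f"{scene}\n"
        f"Task: {task}\n"
        "Reply with an ordered list of pick-and-place steps."
    )


def ask_gemini(prompt):
    """Send the prompt to Gemini 2.5 Flash via the google-genai SDK
    (requires `pip install google-genai` and a GOOGLE_API_KEY)."""
    from google import genai

    client = genai.Client()  # reads GOOGLE_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-2.5-flash", contents=prompt
    )
    return response.text


prompt = build_spatial_prompt(
    [{"name": "mug", "center": [320, 240]}, {"name": "tray", "center": [500, 300]}],
    "put the mug on the tray",
)
```

In a real system the object list would come from the model's own spatial understanding of camera frames rather than being hand-supplied.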

#robotics

Real-Time Action Chunking with Large Models

Unlike chatbots or image generators, robots must operate in real time. While a robot is “thinking”, the world around it evolves according to physical laws, so delays between inputs and outputs have a tangible impact on performance. For a language model, the difference between fast and slow generation is a satisfied or annoyed user; for a vision-language-action model (VLA), it could be the difference between a robot handing you a hot coffee or spilling it in your lap. While VLAs have achieved promising results in open-world generalization, they can be slow to run. Like their cousins in language and vision, these models have billions of parameters and require heavy-duty GPUs. On edge devices like mobile robots, offloading inference to a centralized server adds even more latency for network communication between the server and the robot. — Read More
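The latency budget can be made concrete with a toy model of chunked execution: the policy emits actions in fixed-size chunks, a new chunk is requested as soon as the previous request returns, and the robot stalls whenever its action buffer runs dry. The chunk sizes, latencies, and scheduling rule below are illustrative assumptions, not the actual inference setup described in the post:

```python
def simulate(chunk_size, latency, steps):
    """Count control steps with no action available (stalls), assuming
    the first chunk is precomputed and a new inference request is issued
    the moment the previous one returns."""
    buffer = chunk_size   # actions left to execute
    countdown = latency   # control steps until the in-flight request returns
    stalls = 0
    for _ in range(steps):
        countdown -= 1
        if countdown == 0:          # inference finished: refill, re-request
            buffer += chunk_size
            countdown = latency
        if buffer > 0:
            buffer -= 1             # execute one action this control step
        else:
            stalls += 1             # nothing to execute: the robot freezes
    return stalls

print(simulate(chunk_size=10, latency=5, steps=100))   # 0
print(simulate(chunk_size=10, latency=15, steps=100))  # 30
```

In this toy model the robot never stalls as long as inference returns within one chunk's worth of control steps; once latency exceeds the chunk length, the robot freezes for the difference on every cycle.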

#robotics

Boston Dynamics Makes AGT HISTORY With Robots Dancing To “Don’t Stop Me Now” by Queen

Read More

#robotics, #videos

Meta’s V-JEPA 2 model teaches AI to understand its surroundings

Meta on Wednesday unveiled its new V-JEPA 2 AI model, a “world model” that is designed to help AI agents understand the world around them.

V-JEPA 2 is an extension of the V-JEPA model that Meta released last year, which was trained on over 1 million hours of video. This training data is supposed to help robots or other AI agents operate in the physical world, understanding and predicting how concepts like gravity will impact what happens next in a sequence.

These are the kinds of common sense connections that small children and animals make as their brains develop. — Read More
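The latent-prediction idea behind JEPA-style world models can be sketched in a few lines: encode context frames and a future frame into embeddings, and score the prediction in embedding space rather than reconstructing pixels. Everything here (shapes, random weights, the linear encoder) is an illustrative stand-in for the real architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(frames, W):
    """Toy encoder: linear map from flattened frames to latent vectors."""
    return frames.reshape(frames.shape[0], -1) @ W

# toy "video": 8 context frames and 1 future frame, each 4x4 pixels
context = rng.normal(size=(8, 4, 4))
future = rng.normal(size=(1, 4, 4))

W_enc = rng.normal(size=(16, 8))    # shared encoder weights
W_pred = rng.normal(size=(8, 8))    # predictor weights

z_ctx = encode(context, W_enc).mean(axis=0)   # pooled context embedding
z_pred = z_ctx @ W_pred                       # predicted future embedding
z_target = encode(future, W_enc)[0]           # target embedding of the future frame

loss = float(np.mean((z_pred - z_target) ** 2))  # latent-space prediction error
```

Training drives this latent error down, so the model learns to predict what comes next without ever generating pixels.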

#robotics

The Shape of Things to Come

Amazon ‘testing humanoid robots to deliver packages’: Amazon is reportedly developing software for humanoid robots that could perform the role of delivery workers and “spring out” of its vans.

… The Information reported that the robots could eventually take the jobs of delivery workers. Amazon is developing the artificial intelligence software that would power the robots but will use hardware developed by other companies. — Read More

Walmart and Wing expand drone delivery to five more US cities: Wing, the on-demand drone delivery company owned by Alphabet, is spreading its commercial wings with help from Walmart.

The two companies announced Thursday plans to roll out drone delivery to more than 100 Walmart stores in five new cities: Atlanta, Charlotte, Houston, Orlando, and Tampa. Walmart is also adding Wing drone deliveries to its existing market in the Dallas-Fort Worth area. — Read More

#robotics

Stumbling and Overheating, Most Humanoid Robots Fail to Finish Half Marathon in Beijing

About 12,000 human athletes ran in a half marathon race in Beijing on Saturday, but most of the attention was on a group of other, more unconventional participants: 21 humanoid robots. The event’s organizers, which included several branches of Beijing’s municipal government, claim it’s the first time humans and bipedal robots have run in the same race, though they jogged on separate tracks. Six of the robots successfully finished the course, but they were unable to keep up with the speed of the humans.

The fastest robot, Tiangong Ultra, developed by Chinese robotics company UBTech in collaboration with the Beijing Humanoid Robot Innovation Center, finished the race in two hours and 40 minutes after assistants changed its batteries three times and it fell down once. — Read More

#robotics

Samsung’s cute Ballie robot arrives this summer with Google Gemini in tow

Samsung’s Ballie will go on sale in the US and South Korea this summer, the company announced today. What’s more, through a partnership with Google Cloud, the diminutive robot will ship with a Gemini AI model.

Samsung didn’t state the specific system that powers Ballie, but in combination with the company’s own proprietary language models, it says the robot has multimodal capabilities, meaning Ballie can process voice, audio and visual data from its sensors. According to Samsung, Ballie can also manage your smart home devices and even offer health and styling recommendations, if you’re inclined to seek that type of advice from a robot. — Read More

#robotics

Accelerate Generalist Humanoid Robot Development with NVIDIA Isaac GR00T N1

Humanoid robots are designed to adapt to human workspaces, tackling repetitive or demanding tasks. However, creating general-purpose humanoid robots for real-world tasks and unpredictable environments is challenging. Each of these tasks often requires a dedicated AI model. Training these models from scratch for every new task and environment is a laborious process due to the need for vast task-specific data, high computational cost, and limited generalization. 

NVIDIA Isaac GR00T helps tackle these challenges and accelerates general-purpose humanoid robot development by providing you with open-source SimReady data, simulation frameworks such as NVIDIA Isaac Sim and Isaac Lab, synthetic data blueprints, and pretrained foundation models. — Read More

#nvidia, #robotics