Large Language Models (LLMs) optimized for predicting subsequent utterances and adapting to tasks using contextual embeddings can process natural language at a level close to human proficiency. This study shows that neural activity in the human brain aligns linearly with the internal contextual embeddings of speech and language within LLMs as they process everyday conversations.
How does the human brain process natural language during everyday conversations? Theoretically, large language models (LLMs) and symbolic psycholinguistic models of human language provide fundamentally different computational frameworks for coding natural language. Large language models do not depend on symbolic parts of speech or syntactic rules. Instead, they rely on simple self-supervised objectives, such as next-word prediction, and on generation enhanced by reinforcement learning. This allows them to produce context-specific linguistic outputs drawn from real-world text corpora, effectively encoding the statistical structure of natural speech (sounds) and language (words) into a multidimensional embedding space.
Inspired by the success of LLMs, our team at Google Research, in collaboration with Princeton University, NYU, and HUJI, sought to explore the similarities and differences in how the human brain and deep language models process natural language to achieve their remarkable capabilities. Through a series of studies over the past five years, we explored the similarity between the internal representations (embeddings) of specific deep learning models and human brain neural activity during natural, free-flowing conversations, demonstrating the power of deep language models' embeddings to act as a framework for understanding how the human brain processes language. We demonstrate that the word-level internal embeddings generated by deep language models align with the neural activity patterns in established brain regions associated with speech comprehension and production. — Read More
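The linear alignment reported here is typically measured with an encoding model: a regularized linear map is fit from word-level contextual embeddings to the recorded neural signal and evaluated on held-out words. Below is a minimal sketch of that idea, using synthetic arrays in place of real embeddings and electrode recordings; the shapes, the ridge-regression setup, and the per-electrode correlation metric are illustrative assumptions, not the study's actual pipeline.

```python
# Minimal sketch of a linear encoding analysis: mapping word-level LLM
# embeddings to neural activity with ridge regression. All names, shapes,
# and data here are illustrative stand-ins, not the study's pipeline.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: one row per word in the conversation.
n_words, embed_dim, n_electrodes = 2000, 768, 64
embeddings = rng.normal(size=(n_words, embed_dim))   # contextual embeddings per word
neural = rng.normal(size=(n_words, n_electrodes))    # activity around each word onset

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, neural, test_size=0.2, random_state=0
)

# One regularized linear map from embedding space to all electrodes at once.
model = RidgeCV(alphas=np.logspace(-2, 4, 7))
model.fit(X_train, y_train)

# Encoding performance: correlation between predicted and held-out activity,
# computed separately for each electrode.
pred = model.predict(X_test)
corrs = [np.corrcoef(pred[:, e], y_test[:, e])[0, 1] for e in range(n_electrodes)]
print(f"mean encoding correlation across electrodes: {np.mean(corrs):.3f}")
```

With real data, the embeddings would come from a pretrained language model run over the conversation transcript, and the neural matrix from recordings aligned to each word's onset.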
Cloudflare turns AI against itself with endless maze of irrelevant facts
On Wednesday, web infrastructure provider Cloudflare announced a new feature called “AI Labyrinth” that aims to combat unauthorized AI data scraping by serving fake AI-generated content to bots. The tool will attempt to thwart AI companies that crawl websites without permission to collect training data for large language models that power AI assistants like ChatGPT.
… Instead of simply blocking bots, Cloudflare’s new system lures them into a “maze” of realistic-looking but irrelevant pages, wasting the crawler’s computing resources. The approach is a notable shift from the standard block-and-defend strategy used by most website protection services. Cloudflare says blocking bots sometimes backfires because it alerts the crawler’s operators that they’ve been detected. — Read More
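Mechanically, the idea reduces to a routing decision: a request that looks like an unwanted crawler gets a plausible but irrelevant page whose links lead only to more decoy pages, rather than a block. The sketch below illustrates that concept with Flask; the user-agent heuristic, route names, and stubbed decoy text are assumptions made for the example and are not Cloudflare's actual AI Labyrinth implementation.

```python
# Toy illustration of the "maze" concept: suspected scrapers are lured into
# decoy pages that only link to more decoy pages, instead of being blocked.
# NOT Cloudflare's implementation; detection and page generation are
# deliberately simplistic placeholders.
import random
from flask import Flask, request

app = Flask(__name__)

# Assumed example crawler identifiers; a real system would use far richer signals.
SUSPECT_AGENTS = ("GPTBot", "CCBot", "Bytespider")

def looks_like_unwanted_bot(user_agent: str) -> bool:
    return any(token in user_agent for token in SUSPECT_AGENTS)

def decoy_page(page_id: int) -> str:
    # In the real feature the filler text is AI-generated; here it is a stub.
    links = "".join(
        f'<a href="/maze/{random.randint(0, 10_000)}">related reading</a> '
        for _ in range(5)
    )
    return (
        f"<html><body><h1>Archive item {page_id}</h1>"
        f"<p>Plausible-looking but irrelevant facts.</p>{links}</body></html>"
    )

@app.route("/maze/<int:page_id>")
@app.route("/article/<path:slug>")
def serve(page_id: int = 0, slug: str = "") -> str:
    user_agent = request.headers.get("User-Agent", "")
    if looks_like_unwanted_bot(user_agent):
        # Waste the crawler's resources instead of alerting its operators.
        return decoy_page(page_id)
    return "Real content for human visitors."
```

The design choice mirrors the article's point: a decoy page looks like a successful fetch, so the crawler's operator gets no signal that the bot has been detected.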
Gemini 2.5: Our most intelligent AI model
Today we’re introducing Gemini 2.5, our most intelligent AI model. Our first 2.5 release is an experimental version of 2.5 Pro, which is state-of-the-art on a wide range of benchmarks and debuts at #1 on LMArena by a significant margin.
Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. — Read More
Vibe Coding: Pairing vs. Delegation
In The Vibe Coding Handbook: How To Engineer Production-Grade Software With GenAI, Chat, Agents, and Beyond, Steve Yegge and I describe a spectrum of coding modalities with GenAI. On one extreme is “pairing,” where you are working with the AI to achieve a goal. It really is like pair programming with another person, if that person were a “summer intern who believes in conspiracy theories” (as coined by Simon Willison) and, at the same time, the world’s best software architect.
On the other extreme is “delegating” (which I think many will associate with “agentic coding”), where you ask the AI to do something, and it does so without any human interaction.
… These dimensions dictate the frequency of reporting and feedback you need. — Read More
Accelerate Generalist Humanoid Robot Development with NVIDIA Isaac GR00T N1
Humanoid robots are designed to adapt to human workspaces, tackling repetitive or demanding tasks. However, creating general-purpose humanoid robots for real-world tasks and unpredictable environments is challenging. Each of these tasks often requires a dedicated AI model. Training these models from scratch for every new task and environment is a laborious process due to the need for vast task-specific data, high computational cost, and limited generalization.
NVIDIA Isaac GR00T helps tackle these challenges and accelerates general-purpose humanoid robot development by providing you with open-source SimReady data, simulation frameworks such as NVIDIA Isaac Sim and Isaac Lab, synthetic data blueprints, and pretrained foundation models. — Read More
Code is the new no-code
Most people can’t code. So if you’re running a business, for years you’ve had only two options when you wanted to improve your productivity with the tools and systems you used.
1. Buy better software
2. Pay someone to build better software
For years, we’ve been promised a third option: a future where anyone could build software without learning to code. Just drag-and-drop some blocks, connect a few nodes, and voilà, you’ve built a fully functional app without writing a single line of code! — Read More
Not all AI-assisted programming is vibe coding (but vibe coding rocks)
Vibe coding is having a moment. The term was coined by Andrej Karpathy just a few weeks ago (on February 6th) and has since been featured in the New York Times, Ars Technica, the Guardian and countless online discussions.
I’m concerned that the definition is already escaping its original intent. I’m seeing people apply the term “vibe coding” to all forms of code written with the assistance of AI. I think that both dilutes the term and gives a false impression of what’s possible with responsible AI-assisted programming.
Vibe coding is not the same thing as writing code with the help of LLMs!
… When I talk about vibe coding I mean building software with an LLM without reviewing the code it writes. — Read More
Amazing New Technology Can ‘Bend’ Sounds Into Your Ears Only
What if you could listen to music or a podcast without headphones or earbuds and without disturbing anyone around you? Or have a private conversation in public without other people hearing you?
Our newly published research introduces a way to create audible enclaves – localized pockets of sound that are isolated from their surroundings. In other words, we’ve developed a technology that could create sound exactly where it needs to be.
The ability to send sound that becomes audible only at a specific location could transform entertainment, communication and spatial audio experiences. — Read More
Everyone in AI is talking about Manus. We put it to the test.
Since the general AI agent Manus was launched last week, it has spread online like wildfire. And not just in China, where it was developed by the Wuhan-based startup Butterfly Effect. It’s made its way into the global conversation, with influential voices in tech, including Twitter cofounder Jack Dorsey and Hugging Face product lead Victor Mustar, praising its performance. Some have even dubbed it “the second DeepSeek,” comparing it to the earlier AI model that took the industry by surprise for its unexpected capabilities as well as its origin.
Manus claims to be the world’s first general AI agent, using multiple AI models (such as Anthropic’s Claude 3.5 Sonnet and fine-tuned versions of Alibaba’s open-source Qwen) and various independently operating agents to act autonomously on a wide range of tasks. (This makes it different from AI chatbots, including DeepSeek, which are based on a single large language model family and are primarily designed for conversational interactions.)
… MIT Technology Review was able to obtain access to Manus, and when I gave it a test-drive, I found that using it feels like collaborating with a highly intelligent and efficient intern: While it occasionally lacks understanding of what it’s being asked to do, makes incorrect assumptions, or cuts corners to expedite tasks, it explains its reasoning clearly, is remarkably adaptable, and can improve substantially when provided with detailed instructions or feedback. Ultimately, it’s promising but not perfect. — Read More
MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning
Existing long-form video generation frameworks lack automated planning, requiring manual input for storylines, scenes, cinematography, and character interactions, resulting in high costs and inefficiencies. To address these challenges, we present MovieAgent, an automated movie-generation framework based on multi-agent Chain of Thought (CoT) planning. MovieAgent offers two key advantages: 1) We are the first to explore and define the paradigm of automated movie/long-video generation. Given a script and character bank, MovieAgent can generate multi-scene, multi-shot long-form videos with a coherent narrative, while ensuring character consistency, synchronized subtitles, and stable audio throughout the film. 2) MovieAgent introduces a hierarchical CoT-based reasoning process to automatically structure scenes, camera settings, and cinematography, significantly reducing human effort. By employing multiple LLM agents to simulate the roles of a director, screenwriter, storyboard artist, and location manager, MovieAgent streamlines the production pipeline. Experiments demonstrate that MovieAgent achieves new state-of-the-art results in script faithfulness, character consistency, and narrative coherence. Our hierarchical framework takes a step forward and provides new insights into fully automated movie generation. — Read More
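The pipeline described amounts to a hierarchy of role-playing LLM agents, each reasoning over the previous stage's output. The sketch below shows that structure in outline; the role prompts, the call_llm stub, and the exact data flow are assumptions for illustration, not the paper's code.

```python
# Minimal sketch of a hierarchical multi-agent planning pipeline in the spirit
# of MovieAgent. `call_llm` is a placeholder for whatever LLM API is used; the
# role prompts and data flow are illustrative assumptions, not the paper's code.
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real LLM API call here. Returning a stub keeps the
    # sketch runnable without any external service.
    return f"[LLM response to a {len(prompt)}-character prompt]"

@dataclass
class Agent:
    role: str           # e.g. "director", "screenwriter"
    instructions: str   # role-specific chain-of-thought prompt

    def run(self, context: str) -> str:
        prompt = f"You are the {self.role}. {self.instructions}\n\nContext:\n{context}"
        return call_llm(prompt)

def plan_movie(script: str, character_bank: str) -> dict:
    # Hierarchical CoT: each stage reasons over the output of the previous one.
    director = Agent("director", "Break the script into scenes with narrative goals.")
    screenwriter = Agent("screenwriter", "Expand each scene into shots and dialogue.")
    storyboard = Agent("storyboard artist", "Describe framing and camera settings per shot.")
    locations = Agent("location manager", "Assign a consistent setting to every scene.")

    scenes = director.run(f"Script:\n{script}\nCharacters:\n{character_bank}")
    shots = screenwriter.run(scenes)
    boards = storyboard.run(shots)
    settings = locations.run(scenes)
    return {"scenes": scenes, "shots": shots, "storyboards": boards, "locations": settings}

if __name__ == "__main__":
    plan = plan_movie("A heist goes wrong in a small coastal town.",
                      "ALEX: the planner\nSAM: the driver")
    for stage, output in plan.items():
        print(stage, "->", output)
```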