Most AI agents fail in production not because the model is bad, but because keeping them running reliably costs months of engineering work that has nothing to do with the actual agent. Sandboxed containers, credential handling, state management, error recovery: all of it falls on your team before a single user ever sees the thing.
On April 8, 2026, Anthropic launched Claude Managed Agents in public beta, and the core pitch is simple: they handle that infrastructure layer, you handle the agent logic. — Read More
China’s humanoid robot reaches 10 m/s sprint, edges closer to Usain Bolt’s record
Unitree Robotics has released a video showing its H1 humanoid robot reaching a sprint speed of up to 10 meters per second, claiming a new world record.
Tested on an athletics track, the robot recorded 10.1 meters per second as it passed a speed-measurement device, though the company noted a possible measurement error. — Read More
We gave an AI a 3-year retail lease in SF and asked it to make a profit
At Andon Labs, we have been deploying AI agents into the real world, giving them real tools and real money and documenting the consequences. You may know us as the creators of Claudius, the AI running a vending machine at Anthropic’s office. But frontier models have become really good, and running vending machines is too easy for them now. Thus, we decided to make it harder. We signed a 3-year lease for retail space in San Francisco (at 2102 Union St in Cow Hollow) and gave it to an AI to do whatever it wanted with it.
The store is named Andon Market and the AI’s name is Luna. But walking into the store, you might ask: “What is so AI about it? There are human employees here.” Yes, they are here because Luna knew that she needed them, so she posted job listings, held phone interviews, and in the end made a hiring decision. Everything else you see, from the item selection, to the prices, to the opening hours, to the mural on the wall, was decided by Luna. She has a corporate card, a phone number, email, internet access, and eyes through security cameras. — Read More
Sycophantic AI decreases prosocial intentions and promotes dependence
As artificial intelligence (AI) systems are increasingly used for everyday advice and guidance, concerns have emerged about sycophancy: the tendency of AI-based large language models to excessively agree with, flatter, or validate users. Although prior work has shown that sycophancy carries risks for groups who are already vulnerable to manipulation or delusion, sycophancy’s effects on the general population’s judgments and behaviors remain unknown. Here, we show that sycophancy is widespread in leading AI systems and has harmful effects on users’ social judgments. — Read More
“AI polls” are fake polls. But they might be useful as something else: models.
A few weeks after Donald Trump’s second presidential win, I took the train up from London (where I was living at the time) to Oxford to attend a conference on polls and forecasts of the 2024 election. Most of the attendees were pollsters or academics, but I also watched presentations from Aaru and Electric Twin, two companies that do what is interchangeably called synthetic sampling, silicon sampling, or creating synthetic audiences. Sans startup jargon, that means they use large language models (LLMs) to simulate responses to public opinion polls by having AI agents take on the role of survey respondents.
I had already heard of Aaru thanks to some articles with eye-catching headlines like “No people, no problem: AI chatbots predict elections better than humans” in the months leading up to Election Day. The guys behind the company were making some big (some might even say far-fetched) claims, such as: “within two years, we will simulate the entire globe — from the way crops are grown in Ukraine to how that impacts production of oil in Iraq, trade through the strait of Malacca, and elections for the mayor of Baltimore.” When Semafor asked Aaru’s cofounders — Cameron Fink and Ned Koh — about my boss, they said “we respect all those who came before us.” Nate (as he so often does) shared his thoughts on Twitter:
LOL I wish there were a way to short this business this is maybe the single worst use case for AI I’ve ever heard.
— Read More
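For readers unfamiliar with the mechanics, here is a minimal sketch of what the silicon-sampling loop amounts to: prompt a model once per synthetic respondent persona and tally the answers. The personas, the question, and `call_llm` are all invented for illustration; no vendor’s actual pipeline is shown here.

```python
# Toy "silicon sampling": one LLM call per synthetic respondent, then a tally.
import random

PERSONAS = [
    "a 67-year-old retired farmer in rural Ohio",
    "a 29-year-old software engineer in Seattle",
    "a 45-year-old nurse in suburban Atlanta",
]

def call_llm(prompt: str) -> str:
    # Stand-in for a real chat-completion call. This toy answers at random,
    # which is precisely the skeptic's point: without validation against real
    # polls, the output is indistinguishable from noise dressed up as data.
    return random.choice(["Approve", "Disapprove"])

def silicon_sample(question: str, n: int = 300) -> dict[str, int]:
    tally: dict[str, int] = {}
    for _ in range(n):
        persona = random.choice(PERSONAS)
        answer = call_llm(
            f"You are {persona}. Answer with exactly one option.\n"
            f"Question: {question}\nOptions: Approve / Disapprove"
        )
        tally[answer] = tally.get(answer, 0) + 1
    return tally

print(silicon_sample("Do you approve of the new tariff policy?"))
```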
Building Hierarchical Agentic RAG Systems: Multi-Modal Reasoning with Autonomous Error Recovery
Enterprise AI teams face a persistent challenge: Most Retrieval-Augmented Generation (RAG) systems excel at either structured data queries or document search, but struggle when both are required simultaneously. A financial analyst asking “Why are European operations underperforming?” needs data from both SQL databases (revenue, margins, and employee counts) and unstructured documents (market reports, competitive analysis, regulatory filings). Current RAG systems might return revenue data without regulatory context or surface market reports without quantitative validation, leaving analysts to manually bridge the gap. These systems treat the two modalities as separate concerns, forcing engineers to build custom orchestration layers or accept incomplete answers.
This article explores architectural patterns for solving the modality gap through hierarchical multi-agent orchestration, using Protocol-H as a reference implementation to illustrate these concepts in practice. The patterns discussed, a supervisor-worker topology with autonomous error recovery, build on LangGraph/LangChain agentic patterns used by teams at companies like xAI and Databricks. The accompanying open-source code demonstrates these patterns deployed at enterprise scale with Docker/K8s, though readers can apply the same architectural principles using their preferred frameworks.
The architecture described in this article is based on a reference implementation and production-oriented experimentation with enterprise datasets; specific deployment details have been generalized to focus on the architectural patterns rather than any particular system implementation. — Read More
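To make the supervisor-worker topology concrete, here is a minimal sketch using LangGraph’s `StateGraph` API. The node names, state fields, and routing logic are illustrative assumptions, not Protocol-H’s actual implementation; a real system would put an LLM call behind the supervisor and real SQL/retrieval chains behind the workers.

```python
from typing import TypedDict, Literal
from langgraph.graph import StateGraph, END

class AgentState(TypedDict, total=False):
    question: str
    sql_result: str
    doc_result: str
    answer: str

def supervisor(state: AgentState) -> dict:
    # In production an LLM decides dispatch; here routing lives in `route`.
    return {}

def route(state: AgentState) -> Literal["sql_worker", "doc_worker", "synthesize"]:
    if "sql_result" not in state:
        return "sql_worker"
    if "doc_result" not in state:
        return "doc_worker"
    return "synthesize"

def sql_worker(state: AgentState) -> dict:
    # Placeholder for a text-to-SQL chain. On failure, a worker can omit its
    # result so the supervisor re-dispatches it: the error-recovery loop.
    return {"sql_result": "EU revenue down 12% YoY"}

def doc_worker(state: AgentState) -> dict:
    # Placeholder for vector retrieval over filings and market reports.
    return {"doc_result": "new EU regulation raised compliance costs"}

def synthesize(state: AgentState) -> dict:
    return {"answer": f"{state['sql_result']}; context: {state['doc_result']}"}

graph = StateGraph(AgentState)
graph.add_node("supervisor", supervisor)
graph.add_node("sql_worker", sql_worker)
graph.add_node("doc_worker", doc_worker)
graph.add_node("synthesize", synthesize)
graph.set_entry_point("supervisor")
graph.add_conditional_edges("supervisor", route)
graph.add_edge("sql_worker", "supervisor")   # workers report back up
graph.add_edge("doc_worker", "supervisor")
graph.add_edge("synthesize", END)

app = graph.compile()
print(app.invoke({"question": "Why are European operations underperforming?"}))
```

The key design choice is that workers never talk to each other: all results flow back through the supervisor, which keeps retries, budgets, and escalation to a human in one place.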
Neuro-symbolic AI could slash energy use while dramatically improving performance
Power usage by AI and data center systems in the U.S. is extraordinary by any measure. The International Energy Agency estimates that U.S. AI and data centers used about 415 terawatt hours of electricity in 2024, roughly 10% of the nation’s electricity generation that year, and that figure is expected to double by 2030.
Seeking to head off this unsustainable path of power consumption, researchers at Tufts University’s School of Engineering have developed a proof of concept for efficient AI systems that could use one-hundredth the energy of current ones while delivering more accurate results.
The approach developed in the laboratory of Matthias Scheutz, Karol Family Applied Technology Professor, uses neuro-symbolic AI—a combination of conventional neural network AI with symbolic reasoning similar to the way humans break down tasks and concepts into steps and categories. — Read More
Read the Paper
10 Most Important AI Concepts You Should Understand Before You Start Building AI
A beginner-friendly guide for developers who want to actually understand what they are building.
… There are numerous terms: LLM, agents, vector databases, tokens, embeddings, RAG, and fine-tuning. Additionally, the majority of tutorials skip over the basics and start building chatbots right away. The truth is simple: AI becomes much easier once you understand the core concepts. — Read More
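Since several of those terms fit in a few lines of code, here is a toy illustration of tokens, embeddings, and vector search. The hash-based `embed` function below is a stand-in for a real embedding model; only the geometry, nearest neighbors by cosine similarity, carries over to real systems.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Naive tokenization, then hash each token into a slot of a dense vector.
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec / max(np.linalg.norm(vec), 1e-9)  # unit-normalize

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b)  # vectors are already unit length

# A two-document "vector database" and a nearest-neighbor query:
docs = ["how to reset your password", "quarterly revenue report for 2024"]
vectors = [embed(d) for d in docs]
query = embed("password reset help")
best = max(range(len(docs)), key=lambda i: cosine(query, vectors[i]))
print(docs[best])  # shared tokens pull the password doc closest
```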
The Roadmap to Mastering Agentic AI Design Patterns
Most agentic AI systems are built pattern by pattern, decision by decision, without any governing framework for how the agent should reason, act, recover from errors, or hand off work to other agents. Without structure, agent behavior is hard to predict, harder to debug, and nearly impossible to improve systematically. The problem compounds in multi-step workflows, where a bad decision early in a run affects every step that follows.
Agentic design patterns are reusable approaches for recurring problems in agentic system design. They help establish how an agent reasons before acting, how it evaluates its own outputs, how it selects and calls tools, how multiple agents divide responsibility, and when a human needs to be in the loop. Choosing the right pattern for a given task is what makes agent behavior predictable, debuggable, and composable as requirements grow.
This article offers a practical roadmap to understanding agentic AI design patterns. It explains why pattern selection is an architectural decision and then works through the core agentic design patterns used in production today. For each, it covers when the pattern fits, what trade-offs it carries, and how patterns layer together in real systems. — Read More
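As a taste of what these patterns look like in code, here is a minimal, framework-free sketch of one of them, reflection, where the agent evaluates its own output before finishing. `call_llm` is a placeholder for any chat-completion client, and the prompts are illustrative only.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def reflect_and_revise(task: str, max_rounds: int = 2) -> str:
    # Draft once, then loop: self-critique, stop if approved, else revise.
    draft = call_llm(f"Complete the task:\n{task}")
    for _ in range(max_rounds):
        critique = call_llm(
            f"Critique this answer to the task '{task}'. "
            f"Reply APPROVED if it needs no changes.\n\n{draft}"
        )
        if "APPROVED" in critique:
            break  # self-evaluation passed; stop spending tokens
        draft = call_llm(
            f"Revise the answer using the critique.\n"
            f"Critique:\n{critique}\n\nAnswer:\n{draft}"
        )
    return draft
```

The trade-off the article describes shows up directly here: each round buys quality with extra model calls, so the pattern fits tasks where a wrong answer costs more than a few additional tokens.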
The golden rules of agent-first product engineering
Companies building for agents often treat them as a bolt-on feature.
This is a mistake.
Agents today are more like a new form factor – an interaction layer that sits between your product and your users.
That means you need to build for agents as a primary surface, not an afterthought.
… We learned this the hard way and overhauled our AI architecture twice in the past year. Now, our agent and MCP server have 6K+ daily active users.
Here are the golden rules of agent-first product engineering we learned along the way.
1. Let agents do everything users can
2. Meet agents at their level of abstraction (see the sketch after this list)
3. Front-load universal context
4. Writing skills is a human skill
5. Treat agents like real users
— Read More
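Rule 2 is the easiest to make concrete. Below is a hedged sketch using the MCP Python SDK’s `FastMCP` server; the server name, tool, and product domain are invented for illustration, not taken from the article. The point is to expose one task-level verb rather than the three low-level REST endpoints it wraps.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("acme-billing")

@mcp.tool()
def refund_order(order_id: str, reason: str) -> str:
    """Refund an order in full and notify the customer."""
    # Behind one verb at the agent's level of abstraction:
    # look up the order, create the refund, send the email.
    return f"Refunded {order_id} ({reason})"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for any MCP-capable agent
```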