What can agents actually do?

There’s a lot of excitement about what AI (specifically the latest wave of LLM-anchored AI) can do, and how AI-first companies are different from the prior generations of companies. There are a lot of important and real opportunities at hand, but I find that many of these conversations occur at such an abstract altitude that they border on meaningless. Sort of like saying that your company could be much better if you merely adopted more software. That’s certainly true, but it’s not a particularly helpful claim.

This post is an attempt to concisely summarize how AI agents work, apply that summary to a handful of real-world use cases for AI, and generally make the case that agents are a multiplier on the quality of your software and system design. If your software or systems are poorly designed, agents will only cause harm. If there’s any meaningful definition of an AI-first company, it must describe companies whose software and systems are designed with immaculate attention to detail.

By the end of this writeup, my hope is that you’ll be well-armed to have a concrete discussion about how LLMs and agents could change the shape of your company, and to avoid getting caught up in the needlessly abstract discussions that are often taking place today. — Read More

#devops

The American DeepSeek Project

While America has the best AI models in Gemini, Claude, o3, etc. and the best infrastructure with Nvidia, it’s rapidly losing its influence over the future directions of AI that unfold in the open-source and academic communities. Chinese organizations are releasing the most notable open models and datasets across all modalities, from text to video and robotics, and at the same time it’s common for researchers worldwide to read far more new research papers from Chinese organizations than from their Western counterparts.

This balance of power has been shifting rapidly in the last 12 months and reflects structural advantages that Chinese companies have with open-source AI: China has more AI researchers, more data, and an open-source default.

On the other hand, America’s open technological champions for AI, like Meta, are “reconsidering their open approach” after yet another expensive re-org, and the political environment is dramatically reducing the interest of the world’s best scientists in coming to the country. — Read More

#china-vs-us

Software engineering with LLMs in 2025: reality check

How are devs at AI startups and in Big Tech using AI tools, and what do they think of them? A broad overview of the state of play in tooling, with Anthropic, Google, Amazon, and others.

LLMs are a new tool for building software that we engineers should become hands-on with. There seems to have been a breakthrough with AI agents like Claude Code in the last few months: agents can now “use” the command line to get feedback about suggested changes, and thanks to this addition they have become much more capable than their predecessors.
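The command-line feedback loop described above can be sketched in a few lines. This is a minimal illustration, not any product’s actual implementation: the LLM call is elided, and the command plan is canned rather than model-generated.

```python
# Sketch of an agent's run/observe loop: run a shell command, capture the
# output, and record it as an observation the model would see next turn.
import subprocess

def run_and_observe(cmd):
    """Run one command; return combined stdout/stderr as the observation."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr

def agent_loop(plan):
    transcript = []
    for cmd in plan:
        observation = run_and_observe(cmd)
        transcript.append((cmd, observation))
        # A real agent would pass `transcript` back to the LLM here to
        # decide the next command; this sketch just replays a fixed plan.
    return transcript

log = agent_loop([["echo", "2 passed, 0 failed"]])
```

The key shift is that the agent closes the loop itself: instead of a human pasting error messages back in, the tool output becomes part of the model’s next context.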

As Kent Beck put it in our conversation:

“The whole landscape of what’s ‘cheap’ and what’s ‘expensive’ has shifted.

… It’s time to experiment! — Read More

#devops

The rise of the AI-native employee

I’ve been at Lovable for five weeks and yeah… I’m not in Kansas anymore. This company operates on a completely different level – and as someone who’s spent my entire career in traditional tech, I’m seeing a very different pattern here that’s worth sharing.

Lovable is blowing past crazy revenue milestones: $1M ARR in just 8 days post-launch, $17M in 3 months, $60M in 6 months, and $80M ARR in just 7 months. With ~35 people. That’s not a typo. That’s the new normal – if you’re AI-native. And I don’t just mean the product is AI-native. I mean the people are. — Read More

#strategy

Context Engineering for Agents

As Andrej Karpathy puts it, LLMs are like a new kind of operating system. The LLM is like the CPU and its context window is like the RAM, serving as the model’s working memory. Just like RAM, the LLM context window has limited capacity to handle various sources of context. And just as an operating system curates what fits into a CPU’s RAM, we can think about “context engineering” playing a similar role. Karpathy summarizes this well:

[Context engineering is the] “…delicate art and science of filling the context window with just the right information for the next step.”

What types of context do we need to manage when building LLM applications? Context engineering is an umbrella that applies across a few different context types:

Tools – feedback from tool calls
Instructions – prompts, memories, few‑shot examples, tool descriptions, etc.
Knowledge – facts, memories, etc.
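The RAM analogy suggests a concrete picture: pack the context types above into a fixed-size window in priority order. A minimal sketch, in which every name and the 4-characters-per-token heuristic are illustrative assumptions rather than any framework’s API:

```python
# "Context engineering" as window packing: instructions first, then knowledge,
# then tool feedback, stopping when the token budget (the "RAM") is spent.

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer

def build_context(instructions, knowledge, tool_feedback, budget=1000):
    """Pack context pieces in priority order until the budget runs out."""
    window, used = [], 0
    for piece in [instructions, *knowledge, *tool_feedback]:
        cost = approx_tokens(piece)
        if used + cost > budget:
            break  # later, lower-priority pieces are dropped
        window.append(piece)
        used += cost
    return "\n\n".join(window)

ctx = build_context(
    instructions="You are a coding agent. Use tools before answering.",
    knowledge=["Fact: the repo uses Python 3.12."],
    tool_feedback=["$ pytest -> 2 passed"],
)
```

Real systems replace the naive truncation with summarization, retrieval, or scratchpad memory, but the budget-and-priority framing is the same.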

— Read More

#nlp

The End of Moore’s Law for AI? Gemini Flash Offers a Warning

For the past few years, the AI industry has operated under its own version of Moore’s Law: an unwavering belief that the cost of intelligence would perpetually decrease by orders of magnitude each year. Like clockwork, each new model generation promised to be not only more capable but also cheaper to run. Last week, Google quietly broke that trend.

In a move that at first went unnoticed, Google significantly increased the price of its popular Gemini 2.5 Flash model. The input token price doubled from $0.15 to $0.30 per million tokens, while the output price more than quadrupled from $0.60 to $2.50 per million. Simultaneously, they introduced a new, less capable model, “Gemini 2.5 Flash Lite”, at a lower price point.
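The quoted prices make the impact easy to estimate. A back-of-the-envelope sketch using the per-million-token prices from the paragraph above; the workload size is a hypothetical example, not from the article:

```python
# Cost impact of the Gemini 2.5 Flash repricing, per the figures above.
OLD = {"input": 0.15, "output": 0.60}   # USD per 1M tokens
NEW = {"input": 0.30, "output": 2.50}

def cost(prices, input_tokens, output_tokens):
    return (prices["input"] * input_tokens
            + prices["output"] * output_tokens) / 1e6

# Hypothetical workload: 10M input tokens, 2M output tokens per day.
before = cost(OLD, 10e6, 2e6)
after = cost(NEW, 10e6, 2e6)
print(f"daily cost: ${before:.2f} -> ${after:.2f} ({after / before:.1f}x)")
```

For output-heavy workloads the hit is worse than the headline doubling of input price suggests, since the output rate rose more than 4x.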

This is the first time a major provider has backtracked on the price of an established model. While it may seem like a simple adjustment, we believe this signals a turning point. The industry is no longer on an endless downward slide of cost. Instead, we’ve hit a fundamental soft floor on the cost of intelligence, given the current state of hardware and software. — Read More

#strategy

Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens

Vision-language models (VLMs) excel at multimodal understanding, yet their text-only decoding forces them to verbalize visual reasoning, limiting performance on tasks that demand visual imagination. Recent attempts train VLMs to render explicit images, but the heavy image-generation pre-training often hinders the reasoning ability. Inspired by the way humans reason with mental imagery, the internal construction and manipulation of visual cues, we investigate whether VLMs can reason through interleaved multimodal trajectories without producing explicit images. To this end, we present a Machine Mental Imagery framework, dubbed Mirage, which augments VLM decoding with latent visual tokens alongside ordinary text. Concretely, whenever the model chooses to “think visually”, it recasts its hidden states as next tokens, thereby continuing a multimodal trajectory without generating pixel-level images. We begin by supervising the latent tokens through distillation from ground-truth image embeddings, then switch to text-only supervision so the latent trajectory aligns tightly with the task objective. A subsequent reinforcement learning stage further enhances multimodal reasoning capability. Experiments on diverse benchmarks demonstrate that Mirage unlocks stronger multimodal reasoning without explicit image generation. — Read More

#vision

Gemini 2.5 for robotics and embodied intelligence

The latest generation of Gemini models, 2.5 Pro and Flash, are unlocking new frontiers in robotics. Their advanced coding, reasoning, and multimodal capabilities, now combined with spatial understanding, provide the foundation for the next generation of interactive and intelligent robots.

This post explores how developers can leverage Gemini 2.5 to build sophisticated robotics applications. — Read More

#robotics

The Path to Medical Superintelligence

The Microsoft AI team shares research that demonstrates how AI can sequentially investigate and solve medicine’s most complex diagnostic challenges—cases that expert physicians struggle to answer.

Benchmarked against real-world case records published each week in the New England Journal of Medicine, we show that the Microsoft AI Diagnostic Orchestrator (MAI-DxO) correctly diagnoses up to 85% of NEJM case proceedings, a rate more than four times higher than a group of experienced physicians. MAI-DxO also gets to the correct diagnosis more cost-effectively than physicians. — Read More

#human

Fei-Fei Li: Spatial Intelligence is the Next Frontier in AI

Read More

#videos