Two weeks after launching Perplexity Computer, a cloud-based AI agent that can orchestrate 20 frontier models to execute multi-step workflows autonomously, the company used its inaugural Ask 2026 developer conference in San Francisco on Wednesday to dramatically widen the platform's reach.
The centrepiece of the announcement is Personal Computer: software that runs continuously on a user-supplied Mac mini, merging local files, apps, and sessions with Perplexity's cloud-based Computer system. — Read More
The 8 Levels of Agentic Engineering
AI's coding ability is outpacing our ability to wield it effectively. That's why all the SWE-bench score-maxing isn't translating into the productivity metrics engineering leadership actually cares about. When Anthropic's team ships a product like Cowork in 10 days and another team can't move past a broken POC using the same models, the difference is that one team has closed the gap between capability and practice and the other hasn't.
That gap doesn’t close overnight. It closes in levels. 8 of them. Most of you reading this are likely past the first few, and you should be eager to reach the next one because each subsequent level is a huge leap in output, and every improvement in model capability amplifies those gains further.
Level 1: Tab Complete
Level 2: Agent IDE
Level 3: Context Engineering
Level 4: Compounding Engineering
Level 5: MCP and Skills
Level 6: Harness Engineering
Level 7: Background Agents
Level 8: Autonomous Agent Teams
— Read More
Tilly Norwood | Take The Lead (Official Music Video)
Open Weights Isn't Open Training
When I was in college, my data structures professor told a story. It went something like this:
“When I was your age, I received an assignment, and encountered an inexplicable bug. I debugged and debugged and found that adding a print statement resolved the bug. I was young like all of you, and I was certain I’d found a bug in the C compiler. Turns out the problem was me.”
The takeaway was clear: if you have a bug, it’s your fault.
This is a good heuristic for most cases, but with open source ML infrastructure, you need to throw this advice out the window. There might be features that appear to be supported but are not. If you're suspicious about an operation or stage that's taking a long time, it may be implemented in a way that's efficient enough…for an 8B model, not a 1T+ one. HuggingFace is good, but it's not always correct. Libraries have dependencies, and problems can hide several layers down the stack. Even PyTorch isn't ground truth.
Over the past couple of months, I worked on developing infrastructure to post-train and serve models cheaply. Ultimately, my team decided to develop a custom training codebase, but only after I spent a few days attempting to use existing open-source options. The following is an account of my successes and failures and what it means for open-weights models. — Read More
How to scan for vulnerabilities with GitHub Security Lab’s open source AI-powered framework
For the last few months, we've been using the GitHub Security Lab Taskflow Agent along with a new set of auditing taskflows that specialize in finding web security vulnerabilities. These taskflows have also turned out to be very successful at finding high-impact vulnerabilities in open source projects.
As security researchers, we're used to losing time on possible vulnerabilities that turn out to be unexploitable, but with these new taskflows, we can now spend more of our time on manually verifying the results and sending out reports. Furthermore, the severity of the vulnerabilities that we're reporting is uniformly high. Many of them are authorization bypasses or information disclosure vulnerabilities that allow one user to log in as somebody else or to access the private data of another user.
Using these taskflows, we’ve reported more than 80 vulnerabilities so far. — Read More
The “Last Mile” Problem Slowing AI Transformation
Executives are increasingly enamored with the promise of an AI-driven transformation and have invested accordingly. Most large-scale companies have initiated hundreds of pilots and provided widespread access to tools like Copilot and ChatGPT.
But while many of these pilots have succeeded individually—they’ve saved time and money, made processes more efficient—those gains haven’t scaled across the organization. Few companies have been able to fundamentally change their operating and business models around AI. — Read More
The Capability Maturity Model for AI in Design
Matt Davey, who is Chief Experience Officer at 1Password, created a useful capability maturity model for AI in design. His original model has 5 levels (Limited, Reactive, Developing, Embedded, and Leading), each of which differs along 6 characteristics (Leadership on AI, Strategy & Budgeting, AI Culture & Talent, AI Learning & Enablement, AI Agents & Automation, and AI Product Design). Thus, the model covers both the use of AI within the design process and the use of AI in the resulting product. I recommend you read the full thing, but here is a summary of Davey’s 5 capability maturity levels for AI in design.
As discussed below, I added Maturity Level 6, Symbiotic, for a more complete capability maturity ladder.
For a summary of this article, watch my short overview explainer video (YouTube, 6 min.). — Read More
Teaching LLMs to reason like Bayesians
AI systems based on large language models (LLMs) are increasingly used as agents that interact with users and the world. To do this successfully, LLMs need to construct internal representations of the world and estimate the probability that each of these representations is accurate. Take personalized recommendations, for example: the LLM needs to gradually infer the user’s preferences from their choices over the course of multiple interactions.
Bayesian inference defines the optimal way to perform such updates. By implementing this strategy, LLMs could optimize user interactions by updating their estimates of the user’s preferences as new info about the user arrives. But without specific training, LLMs often default to simple heuristics — like assuming everyone wants the cheapest option — instead of inferring a specific user’s unique preferences.
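The update the excerpt describes can be sketched in a few lines. This is a minimal, hypothetical illustration (the user types, the observed choice, and all probabilities are made up for the example, not taken from the paper): a prior belief over user types is multiplied by the likelihood of each observed choice and renormalized, per Bayes' rule.

```python
# Hypothetical sketch of Bayesian preference updating.
# User types and probabilities are illustrative, not from the paper.

def bayes_update(prior, likelihoods):
    """Return the posterior: prior * likelihood, normalized to sum to 1."""
    unnorm = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Uniform prior over three assumed user types.
belief = {"budget": 1 / 3, "quality": 1 / 3, "brand": 1 / 3}

# Likelihood of the observed choice ("picked the premium option")
# under each user type.
obs = {"budget": 0.1, "quality": 0.7, "brand": 0.5}

belief = bayes_update(belief, obs)
# The "quality" hypothesis now carries the highest posterior probability.
```

Each new interaction repeats the same step, so the belief sharpens toward the user's actual type over time; the cheapest-option heuristic the excerpt mentions corresponds to ignoring this update and keeping a fixed belief.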
In “Bayesian teaching enables probabilistic reasoning in large language models”, we teach the LLMs to reason in a Bayesian manner by training them to mimic the predictions of the Bayesian model, which defines the optimal way to reason about probabilities. We find that this approach not only significantly improves the LLM’s performance on the particular recommendation task on which it is trained, but also enables generalization to other tasks. This suggests that this method teaches the LLM to better approximate Bayesian reasoning. More generally, our results indicate that LLMs can effectively learn reasoning skills from examples and generalize those skills to new domains. — Read More
China leads the humanoid robot race — but the U.S. still has a shot
Since the start of the year, China’s humanoid robots have made waves at home and abroad — from the Consumer Electronics Show in Las Vegas to China’s Lunar New Year Spring Gala — fueling bold claims about a new industrial revolution that would make it impossible for the U.S. to catch up.
Chinese companies now dominate the humanoid robot market, capturing over 90% of global sales with thousands of units shipped last year. While Elon Musk maintains that Tesla will ultimately lead the industry, he recently acknowledged Chinese firms as his primary competition and noted that Tesla’s Optimus robots won’t be ready for launch until at least next year.
To unpack the claims and look beyond the viral robot performances, Lian Jye Su, chief analyst at tech consulting company Omdia and the author of its latest humanoid robotics report, spoke to Rest of World at a virtual event on February 25. — Read More
The Top 100 Gen AI Consumer Apps — 6th Edition
Three years ago, we published the first edition of this list with a simple goal: identify which generative AI products were actually getting used by mainstream consumers. At the time, the distinction between “AI-first” companies and everything else was clear. ChatGPT, Midjourney, and Character.AI were purpose-built around foundation models. The rest of the software world was still figuring out what to do with the technology.
That distinction no longer holds. …From this edition onward, we’re broadening the aperture to include any consumer product where generative AI has become a core part of the experience — including CapCut, Canva, Notion, Picsart, Freepik, and Grammarly. The result is what we believe is a more accurate picture of how people actually use AI, though the bulk of the top products continue to be AI-native. — Read More