Language models can solve tough, research-grade math problems, yet they struggle on simple computational tasks that require reasoning over many steps and long contexts. Even multiplying two numbers or solving a small Sudoku is nearly impossible for them unless they rely on external tools.
But what does it take for an LLM itself to be as reliable and efficient as a computer?
We answer this by literally building a computer inside a transformer. We turn arbitrary C code into tokens that the model itself can execute reliably for millions of steps in seconds. — Read More
Daily Archives: March 16, 2026
The context problem: Why enterprise AI needs more than foundation models
Ask an AI coding assistant to, say, “build a React component with a dropdown menu,” and you’ll probably get something impressive in seconds—clean code, proper hooks, accessible markup. It’s the kind of demo that makes CTOs lean forward in their chairs.
Now ask that same AI about your company’s internal API for user authentication. Ask it to integrate with your legacy billing system. Ask it why your team deprecated a particular approach last quarter. Watch it hallucinate with confidence, suggesting endpoints that don’t exist, recommending patterns your architecture explicitly forbids, and generally ignoring the hard-won institutional knowledge that makes your systems actually, you know, work.
This is the enterprise AI paradox: Foundation models know everything about public libraries but precious little about the specifics that matter for your business. They’re trained on millions of open source repositories, but they’ve never seen your codebase. They can regurgitate best practices from popular engineering blogs, but they fail to grasp why those practices might be impossible in your environment. Without context—the community-vetted, institutional knowledge behind business decisions—AI assistants remain dangerously confident when they shouldn’t be. — Read More
How Karpathy’s Autoresearch Works And What You Can Learn From It
Most “autonomous AI research” demos look impressive for the same reason magic tricks do: you only see the interesting part. An agent edits some code, runs an experiment, and shows a better result. What you usually do not see is the part that actually determines whether the system is useful: what is the harness optimizing for, how stable is the evaluation, and what happens when the agent fails?
That is why Karpathy’s Autoresearch is worth paying attention to.
Autoresearch is not trying to be a general-purpose AI scientist. It is a small, tightly constrained system for one specific job: let an agent modify a training script, run a bounded experiment, measure the result, keep the change if it helps, and discard it if it does not. The repo is tiny, but the design behind it is one of the cleanest examples I have seen of how to build a useful autonomous improvement harness. — Read More
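The loop described above is essentially a greedy, keep-if-it-helps harness. Here is a minimal sketch of that shape; the objective, the mutation step, and the parameter dictionary are toy stand-ins I invented for illustration, not Autoresearch's actual code, which edits a real training script and runs real bounded experiments:

```python
import random

def run_experiment(params):
    """Toy stand-in for a bounded training run: lower is better.
    (Hypothetical objective; the real harness measures a training metric.)"""
    return (params["lr"] - 0.01) ** 2 + (params["wd"] - 0.1) ** 2

def propose_change(params, rng):
    """Stand-in for the agent editing the script: perturb one parameter."""
    candidate = dict(params)
    key = rng.choice(list(candidate))
    candidate[key] *= rng.uniform(0.5, 1.5)
    return candidate

def autoimprove(params, steps=200, seed=0):
    """Greedy harness: keep a change only if the measured result improves."""
    rng = random.Random(seed)
    best_loss = run_experiment(params)
    for _ in range(steps):
        candidate = propose_change(params, rng)
        loss = run_experiment(candidate)  # bounded, measurable experiment
        if loss < best_loss:              # keep the change if it helps...
            params, best_loss = candidate, loss
        # ...and silently discard it if it does not
    return params, best_loss

params, loss = autoimprove({"lr": 0.05, "wd": 0.5})
```

The design point the sketch makes concrete: because every candidate is evaluated against a stable, bounded measurement before being kept, a failed edit costs one experiment and nothing else, which is exactly the property most "autonomous AI research" demos leave unexamined.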
The “Night Shift” Agentic Workflow
Since December 2025, I’ve been integrating AI agents into my coding workflow.
Previous attempts at agentic workflows have left me exhausted, overwhelmed, and feeling out of touch with the systems I was building. They also degraded quality too much.
My current agentic workflow is about 5x faster and produces better-quality results; I understand the system better, and I’m having fun again.
I call this the Night Shift workflow. — Read More
MCP is Dead; Long Live MCP!
There is currently a social media and industry zeitgeist dialed in on CLIs… just as there was a moment for MCP only a few short months ago.
While it is true that there are token savings to be had by using a CLI, many folks have not considered that agents using custom CLIs run into the same context problem as MCP, except now without structure and with many other sacrifices.
In much of the discourse, there is a lack of distinction between local MCP over stdio and server MCP over HTTP; the latter is a very different use case.
… The oversight made by many is that individual usage of coding agents looks very different from organizational adoption, where the emphasis is on visibility, telemetry, security, quality, and the ability to operationalize and maintain agent-coded systems across teams with varying degrees of skill and experience.
For enterprise and org-level use cases, MCP is the present and the future, and teams need to be able to cut through the hype of the moment. — Read More
5 design skills to sharpen in the AI era
AI is reshaping the way products are made: It’s accelerating exploration, lowering barriers to entry, and widening the circle of who can participate in the design process. In response, teams are honing new skills to meet the moment. In our recent report State of the Designer 2026, we asked the design community which skills matter most to them in the age of AI. Here, we’re sharing what those skills are—and how to perfect them. — Read More
Academia and the “AI Brain Drain”
In 2025, Google, Amazon, Microsoft and Meta collectively spent US$380 billion on building artificial-intelligence tools. That number is expected to surge still higher this year, to $650 billion, to fund the building of physical infrastructure, such as data centers (see go.nature.com/3lzf79q). Moreover, these firms are spending lavishly on one particular segment: top technical talent.
Meta reportedly offered a single AI researcher, who had cofounded a start-up firm focused on training AI agents to use computers, a compensation package of $250 million over four years (see go.nature.com/4qznsq1). Technology firms are also spending billions on “reverse-acquihires”—poaching the star staff members of start-ups without acquiring the companies themselves. Eyeing these generous payouts, technical experts earning more modest salaries might well reconsider their career choices.
Academia is already losing out. — Read More