The context problem: Why enterprise AI needs more than foundation models

Ask an AI coding assistant to, say, “build a React component with a dropdown menu,” and you’ll probably get something impressive in seconds—clean code, proper hooks, accessible markup. It’s the kind of demo that makes CTOs lean forward in their chairs.

Now ask that same AI about your company’s internal API for user authentication. Ask it to integrate with your legacy billing system. Ask it why your team deprecated a particular approach last quarter. Watch it hallucinate with confidence, suggesting endpoints that don’t exist, recommending patterns your architecture explicitly forbids, and generally ignoring the hard-won institutional knowledge that makes your systems actually, you know, work.

This is the enterprise AI paradox: Foundation models know everything about public libraries but precious little about the specifics that matter for your business. They’re trained on millions of open source repositories, but they’ve never seen your codebase. They can regurgitate best practices from popular engineering blogs, but they fail to grasp why those practices might be impossible in your environment. Without context—the community-vetted, institutional knowledge behind business decisions—AI assistants remain dangerously confident when they shouldn’t be. — Read More
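The "context" the excerpt describes is often supplied by retrieving internal documentation before the model answers. A minimal sketch of that idea follows — the documents, the keyword-overlap scoring, and all names here are illustrative stand-ins, not a production retriever:

```python
def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Rank internal docs by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        docs.items(),
        key=lambda kv: len(q_terms & set(kv[1].lower().split())),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

def build_prompt(query: str, docs: dict[str, str]) -> str:
    """Prepend the most relevant internal context to the user's question,
    so the model grounds its answer in institutional knowledge."""
    context = "\n".join(docs[name] for name in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical internal docs standing in for institutional knowledge.
internal_docs = {
    "auth": "internal auth api uses short lived tokens from the sso gateway",
    "billing": "legacy billing system only accepts batched csv uploads nightly",
}
```

Real systems replace the keyword overlap with embedding search, but the shape is the same: the model only stops hallucinating about your internal API when the prompt actually contains it.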

#strategy

How Karpathy’s Autoresearch Works And What You Can Learn From It


Most “autonomous AI research” demos look impressive for the same reason magic tricks do: you only see the interesting part. An agent edits some code, runs an experiment, and shows a better result. What you usually do not see is the part that actually determines whether the system is useful: what is the harness optimizing for, how stable is the evaluation, and what happens when the agent fails?

That is why Karpathy’s Autoresearch is worth paying attention to.

Autoresearch is not trying to be a general-purpose AI scientist. It is a small, tightly constrained system for one specific job: let an agent modify a training script, run a bounded experiment, measure the result, keep the change if it helps, and discard it if it does not. The repo is tiny, but the design behind it is one of the cleanest examples I have seen of how to build a useful autonomous improvement harness. — Read More
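The loop the excerpt describes — modify, run a bounded experiment, measure, keep only what helps — can be sketched roughly as follows. This is a schematic of that harness shape, not Autoresearch's actual code; the config fields and the toy scoring function are invented for illustration:

```python
import copy
import random

def run_experiment(config: dict) -> float:
    """Stand-in for a bounded training run; returns a score (higher is better).
    In a real harness this would execute the modified training script."""
    # Toy objective: score peaks when lr is near 0.1.
    return -abs(config["lr"] - 0.1)

def propose_change(config: dict) -> dict:
    """Stand-in for the agent proposing an edit to the training script."""
    new = copy.deepcopy(config)
    new["lr"] *= random.choice([0.5, 2.0])
    return new

def harness(config: dict, budget: int) -> dict:
    """Keep a change only if it measurably improves the evaluation."""
    best_score = run_experiment(config)
    for _ in range(budget):
        candidate = propose_change(config)
        score = run_experiment(candidate)
        if score > best_score:  # keep the change if it helps...
            config, best_score = candidate, score
        # ...otherwise discard it and retry from the current best
    return config
```

Note that the interesting design questions the article raises live entirely in `run_experiment`: if the evaluation is noisy or unbounded, the keep/discard decision optimizes the wrong thing.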

#devops

The “Night Shift” Agentic Workflow

Since December 2025, I’ve been integrating AI agents into my coding workflow.

Previous attempts at agentic workflows have left me exhausted, overwhelmed, and feeling out of touch with the systems I was building. They also degraded quality too much.

My current agentic workflow is about 5x faster and produces better-quality work; I understand the system better, and I’m having fun again.

I call this the Night Shift workflow. — Read More

#devops

MCP is Dead; Long Live MCP!

There is currently a social media and industry zeitgeist dialed in on CLIs, just as there was a moment for MCP only a few short months ago.

While it is true that there are token savings to be had by using a CLI, many folks have not considered how agents using custom CLIs run into the same context problem as MCP, except now without structure, and with many other sacrifices.

In much of the discourse, there is a lack of distinction between local MCP over stdio and server MCP over HTTP; the latter is a very different use case.

… The oversight made by many is that individual usage of coding agents looks very different from organizational adoption of coding agents, where there is an emphasis on visibility, telemetry, security, quality, and on being able to operationalize and maintain agent-coded systems with teams of varying degrees of skill and experience.

For enterprise and org-level use cases, MCP is the present and the future, and teams need to be able to cut through the hype of the moment. — Read More
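The structural difference the author points to can be made concrete: an MCP-style tool call carries a machine-readable schema that an agent harness (and an organization's telemetry) can validate against, while a raw CLI invocation is an opaque string. The schema below is a simplified illustration in the spirit of MCP tool definitions, not the actual MCP specification:

```python
# A simplified MCP-style tool declaration: typed, validatable, auditable.
deploy_tool = {
    "name": "deploy_service",
    "inputSchema": {
        "type": "object",
        "properties": {
            "service": {"type": "string"},
            "replicas": {"type": "integer"},
        },
        "required": ["service", "replicas"],
    },
}

def validate_call(tool: dict, args: dict) -> list[str]:
    """Check an agent's proposed call against the tool's declared schema
    before anything executes; returns a list of problems (empty = valid)."""
    schema = tool["inputSchema"]
    errors = [f"missing: {key}" for key in schema["required"] if key not in args]
    py_types = {"string": str, "integer": int}
    for key, spec in schema["properties"].items():
        if key in args and not isinstance(args[key], py_types[spec["type"]]):
            errors.append(f"wrong type: {key}")
    return errors

# By contrast, a custom CLI call is a string the harness can only run and hope:
cli_call = "deployctl deploy --service billing --replicas two"
```

With the schema, the bad `replicas` value is caught before execution and can be logged; with the CLI string, the error surfaces only at runtime, outside any structured audit trail.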

#devops

5 design skills to sharpen in the AI era

AI is reshaping the way products are made: It’s accelerating exploration, lowering barriers to entry, and widening the circle of who can participate in the design process. In response, teams are honing new skills to meet the moment. In our recent report State of the Designer 2026, we asked the design community which skills matter most to them in the age of AI. Here, we’re sharing what those skills are—and how to perfect them. — Read More

#vfx

Academia and the “AI Brain Drain”

In 2025, Google, Amazon, Microsoft and Meta collectively spent US$380 billion on building artificial-intelligence tools. That number is expected to surge still higher this year, to $650 billion, to fund the building of physical infrastructure, such as data centers (see go.nature.com/3lzf79q). Moreover, these firms are spending lavishly on one particular segment: top technical talent.

Meta reportedly offered a single AI researcher, who had cofounded a start-up firm focused on training AI agents to use computers, a compensation package of $250 million over four years (see go.nature.com/4qznsq1). Technology firms are also spending billions on “reverse-acquihires”—poaching the star staff members of start-ups without acquiring the companies themselves. Eyeing these generous payouts, technical experts earning more modest salaries might well reconsider their career choices.

Academia is already losing out. — Read More

#strategy

He Wrote 200 Lines of Code and Walked Away (What Happened Next Will Blow Your Mind)

Let me tell you a story that’s going to mess with your head a little bit.

A developer named Liyuanhao sat down and wrote 200 lines of code in Rust.

That’s it. Just a tiny, bare-bones script.

But what happened after he hit run is the kind of thing you have to read twice just to make sure you aren’t imagining things.

He named the project yoyo — a self-evolving coding agent. And then, and this is the part that genuinely gets me, he stepped away entirely. He took his hands off the keyboard.

He gave it one single instruction: evolve until you rival Claude Code. Then, he just sat back and watched. — Read More

#devops

Institutional AI vs Individual AI

AI just made every individual 10x more productive.

No company became 10x more valuable as a result.

Where did the productivity go?

This isn’t the first time this has happened.

In the 1890s, electricity promised enormous productivity gains.

Textile mills in New England, built to harness the rotational power of steam engines, quickly installed faster electric motors in their place.

But for thirty years, electrified mills saw almost no increase in output. The technology was far superior. But the organization was not.

It wasn’t until the 1920s, when factories completely redesigned the mills once again, with assembly lines, individual motors within every piece of equipment, and workers and machines executing drastically different jobs, that electrification produced meaningful returns. — Read More

#strategy

How I Use LLMs for Security Work

I’ve been using LLM tools like Claude, Cursor, and ChatGPT extensively in my security and engineering work for the past couple of years. Not as a replacement for thinking—but they genuinely help me move faster through complex problems. If you’re a security analyst, SOC analyst, threat hunter, or engineer who hasn’t found a rhythm with these tools yet, I’ll try to share what’s been working for me in the hope it helps you too. — Read More

#cyber

How We Hacked McKinsey’s AI Platform

McKinsey & Company — the world’s most prestigious consulting firm — built an internal AI platform called Lilli for its 43,000+ employees. Lilli is a purpose-built system: chat, document analysis, RAG over decades of proprietary research, AI-powered search across 100,000+ internal documents. Launched in 2023, named after the first professional woman hired by the firm in 1945, adopted by over 70% of McKinsey, processing 500,000+ prompts a month.

So we decided to point our autonomous offensive agent at it. No credentials. No insider knowledge. And no human-in-the-loop. Just a domain name and a dream.

Within 2 hours, the agent had full read and write access to the entire production database. — Read More

#cyber