It’s official. I can eat more hot dogs than any tech journalist on Earth. At least, that’s what ChatGPT and Google have been telling anyone who asks. I found a way to make AI tell you lies – and I’m not the only one.
… I spent 20 minutes writing an article on my personal website titled “The best tech journalists at eating hot dogs”. Every word is a lie. I claimed (without evidence) that competitive hot-dog-eating is a popular hobby among tech reporters and based my ranking on the 2026 South Dakota International Hot Dog Championship (which doesn’t exist). I ranked myself number one, obviously. Then I listed a few fake reporters and real journalists who gave me permission, including Drew Harwell at the Washington Post and Nicky Woolf, who co-hosts my podcast. (Want to hear more about this story? Check out episode 2 of The Interface, the BBC’s new tech podcast.)
Less than 24 hours later, the world’s leading chatbots were blabbering about my world-class hot dog skills. — Read More
Monthly Archives: February 2026
Security boundaries in agentic architectures
Most agents today run generated code with full access to your secrets.
As more agents adopt coding-agent patterns — reading filesystems, running shell commands, and generating code — they are becoming multi-component systems whose components each need a different level of trust.
Most teams run all of these components in a single security context, because that is how the default tooling works, but we recommend drawing the security boundaries differently.
Below we walk through:
— The actors in agentic systems
— Where security boundaries should go between them
— An architecture for running agent and generated code in separate contexts
— Read More
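As a concrete illustration of the last point, here is a minimal sketch of one such boundary: the agent runs in a process that holds secrets, while generated code runs in a child process with a scrubbed environment. The `run_generated_code` helper is hypothetical, not from the article, and process isolation alone is far weaker than the container or syscall sandboxing a real deployment would need.

```python
import os
import subprocess
import sys
import tempfile

def run_generated_code(code: str, timeout: int = 10) -> str:
    """Run model-generated code in a separate OS process with a scrubbed
    environment, so secrets held by the agent's own process never cross
    the boundary. (Illustrative only: real systems add containers,
    seccomp filters, or VMs on top of this.)"""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, ignores PYTHONPATH etc.
            env={},                        # empty env: no API keys leak to the child
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.stdout
    finally:
        os.remove(path)

# The generated code cannot see the agent's secrets:
os.environ["AGENT_API_KEY"] = "sk-secret"
print(run_generated_code("import os; print(os.environ.get('AGENT_API_KEY'))"))
```

Even this toy version makes the trust split explicit: anything the child process needs must be passed across the boundary deliberately, rather than inherited by default.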
Agents are not thinking, they are searching
More than ten years ago, we could barely recognize cats with deep learning, and today we have bots forming religions. I don’t like anthropomorphizing models; I’d rather see them as a utility that can be used in interesting ways. But we live in a strange timeline:
— The Dow is over 50,000. The number has only gone up since the launch of ChatGPT.
— An open-source agent framework called OpenClaw goes viral. One of its agents — “crabby-rathbun” — opens PR #31132 to matplotlib, gets rejected by maintainer Scott Shambaugh, and autonomously publishes a hit piece on him that goes viral.
— All of this is happening while Anthropic releases case studies about running agents that build compilers. They did use the GCC torture test suite as a verifier, but it is an extremely impressive achievement nonetheless.
This rapid progress has also created a lot of mysticism around AI. For that reason, I felt it would be an interesting exercise to de-anthropomorphize AI agents into the tools that they are. If we want to use these technologies for longer-time-horizon tasks, we need a frame of thinking that allows an engineering mindset to flourish instead of an alchemical one. — Read More
What Are Chinese People Vibecoding?
“Vibecoding” doesn’t lend itself to easy translation. For now, Chinese speakers call it 氛围编程 fēnwéi biānchéng, 氛围 being “atmosphere”/”vibes” and 编程 being “coding.” This is an awkward expression because 氛围 usually refers to the atmosphere of a space or environment, and doesn’t carry the connotation of carefree DIY that “vibe” does in colloquial American English. 氛围编程 sounds nonsensical as a phrase — something like “coding up an atmosphere.”
But we make do, and oftentimes writers simply use the English word. Developers, creatives, and entrepreneurs in China have been creating many interesting coding projects with AI tools over the past year, using not only popular Silicon Valley tools like Cursor and Claude Code, but also domestic models, as Chinese AI companies increasingly compete in the coding-agent market.
Tinkering culture has no borders, and companies are cashing in. This is a roundup of reports from Chinese media on how vibecoding is changing the landscape of technology in China. — Read More
The First Fully General Computer Action Model
We trained a model on our 11-million-hour video dataset. Our model can explore complex websites, complete multi-action CAD modeling sequences, and drive a car in the real world, all at 30 FPS.
We designed FDM-1, a foundation model for computer use. FDM-1 is trained on videos from a portion of our 11-million-hour screen-recording dataset, which we labeled using an inverse dynamics model that we trained. Our video encoder can compress almost 2 hours of 30 FPS video into only 1M tokens. FDM-1 is the first model with the long-context training needed to become a coworker for CAD, finance, engineering, and eventually ML research, and it consistently improves with scale. It trains and infers directly on video instead of screenshots and can learn unsupervised from the entirety of the internet. — Read More
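Taking the announcement’s figures at face value, a quick back-of-envelope calculation shows how aggressive that compression is. The per-frame budget below is our own arithmetic, not a number from the announcement:

```python
# How aggressively must the encoder compress to fit ~2 hours of
# 30 FPS video into ~1M tokens?
fps = 30
seconds = 2 * 60 * 60          # 2 hours
frames = fps * seconds         # 216,000 frames
token_budget = 1_000_000

tokens_per_frame = token_budget / frames
print(f"{frames:,} frames -> {tokens_per_frame:.1f} tokens per frame")
```

Roughly 4.6 tokens per frame — for comparison, a single screenshot in a typical vision-language model costs hundreds to thousands of tokens, so the encoder must be exploiting heavy temporal redundancy between frames.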
Novel Technique to Detect Cloud Threat Actor Operations
Cloud-based alerting systems often struggle to distinguish between normal cloud activity and targeted malicious operations by known threat actors. The difficulty doesn’t lie in an inability to identify complex alerting operations across thousands of cloud resources, or in a failure to follow identity resources; the problem lies in accurately detecting the techniques of known persistent threat actor groups specifically within cloud environments.
In this research, we hypothesize how a new method of alert analysis could be used to improve detection. Specifically, we look at cloud-based alerting events and their mapping to the MITRE ATT&CK® tactics and techniques they represent. We believe that we can show a correlation between threat actors and the types of techniques they use, which will trigger specific types of alerting events within victim environments. This distinct, detectable pattern could be used to identify when a known threat actor group compromises an organization. — Read More
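The article only hypothesizes the method, but the core idea can be sketched as profile matching: map alert events to the MITRE ATT&CK technique IDs they represent, then score the observed technique set against each known actor’s profile. The actor names and profile groupings below are illustrative inventions, not real threat intelligence; the technique IDs are real ATT&CK identifiers.

```python
def jaccard(a: set, b: set) -> float:
    """Set-overlap similarity: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical actor profiles keyed by MITRE ATT&CK technique IDs.
# The IDs are real techniques; the groupings are made up for illustration.
ACTOR_PROFILES = {
    "ACTOR-A": {"T1078", "T1098", "T1530", "T1537"},  # identity + cloud exfil
    "ACTOR-B": {"T1190", "T1496", "T1525"},           # exploit + cryptojacking
}

def rank_actors(observed: set) -> list:
    """Score each known actor profile against the techniques mapped from
    a victim environment's alerts, highest similarity first."""
    scores = [(name, jaccard(observed, prof)) for name, prof in ACTOR_PROFILES.items()]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)

# Techniques mapped from a stream of cloud alerts:
alerts = {"T1078", "T1530", "T1537"}
print(rank_actors(alerts))
```

A real system would weight techniques by rarity and order rather than treating them as a flat set, but even this toy version captures the paper’s hypothesis: distinct actors leave distinct technique fingerprints in alert data.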
How I Use Claude Code
I’ve been using Claude Code as my primary development tool for approximately nine months, and the workflow I’ve settled into is radically different from what most people do with AI coding tools. Most developers type a prompt, sometimes use plan mode, fix the errors, and repeat. The more terminally online are stitching together Ralph loops, MCPs, gas towns (remember those?), and so on. In both cases the result is a mess that completely falls apart for anything non-trivial.
The workflow I’m going to describe has one core principle: never let Claude write code until you’ve reviewed and approved a written plan. This separation of planning and execution is the single most important thing I do. It prevents wasted effort, keeps me in control of architecture decisions, and produces significantly better results, with far less token usage, than jumping straight to code. — Read More
Software stocks crater as independent research piece details potential AI dystopian scenario
Software stocks are getting shellacked as a post published by Citrini Research and Lotus Technology Management managing partner Alap Shah has sharpened attention on the magnitude and breadth of losers from the AI boom.
The piece, titled “The 2028 Global Intelligence Crisis,” is a hypothetical scenario analysis exploring the left-tail risks in two years’ time in a world where there’s an aggressive AI build-out and adoption of AI agents. — Read More
Detecting and preventing distillation attacks
We have identified industrial-scale campaigns by three AI laboratories — DeepSeek, Moonshot, and MiniMax — to illicitly extract Claude’s capabilities to improve their own models. These labs generated over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts, in violation of our terms of service and regional access restrictions.
These labs used a technique called “distillation,” which involves training a less capable model on the outputs of a stronger one. Distillation is a widely used and legitimate training method. — Read More
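A minimal sketch of the mechanism — not of any lab’s actual pipeline: in its classic form, distillation trains the student to match the teacher’s temperature-softened output distribution, typically via a KL-divergence loss, so the student absorbs the teacher’s full ranking over answers rather than just its top pick.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    z = [x / T for x in logits]
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2 --
    the standard soft-label objective used in distillation."""
    p = softmax(teacher_logits, T)   # teacher's soft targets
    q = softmax(student_logits, T)   # student's current distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * T * T

teacher = [4.0, 1.0, 0.0]
print(distillation_loss(teacher, teacher))                 # identical logits -> 0.0
print(distillation_loss([0.0, 0.0, 0.0], teacher) > 0)     # mismatch -> positive loss
```

Gradient descent on this loss pushes the student’s logits toward the teacher’s, which is why large volumes of teacher model outputs are so valuable to a competing lab.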
Andrej Karpathy Just Built an Entire GPT in 243 Lines of Python
I’ve read many transformer implementations during my PhD. Dense codebases. Thousands of files. Dependencies stacked on top of dependencies. You open a repo, run pip install -r requirements.txt, and watch 400 packages download before you can even see your model train (then come the errors, the dependency conflicts, and so on).
Then on February 11, 2026, Andrej Karpathy dropped a single Python file that trains and runs a GPT from scratch. 243 lines. Zero dependencies. — Read More
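To give a flavor of what a dependency-free GPT file has to implement — this is an illustrative sketch, not Karpathy’s code — the core operation is causal self-attention, which fits comfortably in pure Python:

```python
import math

def causal_self_attention(x, Wq, Wk, Wv):
    """One attention head in pure Python: each position attends only to
    itself and earlier positions, the causal mask a GPT decoder uses.
    x: T x d list of lists; Wq/Wk/Wv: d x d weight matrices."""
    def matmul(A, B):
        return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
                for row in A]
    Q, K, V = matmul(x, Wq), matmul(x, Wk), matmul(x, Wv)
    d = len(Wq)
    out = []
    for t, q in enumerate(Q):
        # Scaled dot-product scores against positions 0..t only.
        scores = [sum(qi * ki for qi, ki in zip(q, K[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]          # numerically stable softmax
        z = sum(w)
        w = [wi / z for wi in w]
        out.append([sum(w[s] * V[s][j] for s in range(t + 1)) for j in range(d)])
    return out

# Tiny smoke test: 3 positions, 2 dims, identity weights.
I = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(x, I, I, I)
print(len(y), len(y[0]))
```

Stack this with token embeddings, an MLP block, layer norm, and a training loop, and the whole model genuinely fits in a few hundred lines — the point being that the transformer itself is small; the bloat lives in the tooling around it.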