Software engineers aren’t being replaced; they’re moving from typing code to orchestrating agents, proving that infrastructure matters more than model size.
Boris Cherny, creator of Anthropic’s Claude Code, says he hasn’t written a line of code by hand in months. He shipped 22 pull requests one day, 27 the next, all AI-generated. Company-wide, Anthropic reports that 70 to 90% of its code is now written by AI. CEO Dario Amodei has predicted that AI could handle “most, maybe all” of what software engineers do within months.
And yet Anthropic typically has dozens of software engineering openings, one reportedly carrying $570K in total compensation. As one observer noted, the company is simultaneously predicting the end of the profession and paying top dollar to hire into it. — Read More
Recent Updates
A Mental Model for Agentic Work
Something shifted in the first quarter of 2026. Not a feature launch, not a new product – a structural change in how work happens.
For the first time, I found myself genuinely operating with agents across every dimension of my work: personal tasks, software engineering, company operations. Not as a novelty. As the default mode.
This post is the abstraction I arrived at after weeks of doing this. A mental model that applies everywhere – because the architecture underneath is always the same. — Read More
Import AI 455: AI systems are about to start building themselves.
AI systems are about to start building themselves. What does that mean?
I’m writing this post because when I look at all the publicly available information I reluctantly come to the view that there’s a likely chance (60%+) that no-human-involved AI R&D – an AI system powerful enough that it could plausibly autonomously build its own successor – happens by the end of 2028.
This is a big deal.
I don’t know how to wrap my head around it. — Read More
A New Type of Neuroplasticity Rewires the Brain After a Single Experience
Every experience we have changes our brain, the way a ceramicist reshapes a slab of clay. Every corner we turn, every conversation we have, every shudder we feel causes cascading effects: Chemicals are released, electricity surges, the connections between brain cells tighten, and our mental models update.
The brain is “incredibly plastic, and it stays that way throughout the lifespan of a human,” said Christine Grienberger, a neuroscientist at Brandeis University. This plasticity, the quality of being easily reshaped, makes the brain really good at learning — a quintessential process that allows us to remember the plotline of a novel, navigate a new city, pick up a new language, and avoid touching a hot stove. But neuroscientists are still uncovering fundamental rules that describe how neuroplasticity reshapes brain connections.
Recently, neuroscientists described a new form of neuroplasticity that might be helping the brain learn across a timescale of several seconds — long enough to capture the behavioral process of learning from a single experience. In two recent reviews, published in The Journal of Neuroscience and Nature Neuroscience, they describe “behavioral timescale synaptic plasticity,” or BTSP. This type of learning in the hippocampus, the brain’s memory hub, is caused by an electrical change that affects multiple neurons at once and unfolds across several seconds. Researchers suspect that it may help the brain learn in a single attempt. — Read More
A Final Answer: Is AI Really a Bubble?
With Anthropic having hit a $44 billion run rate, up from “just” $9 billion three months ago, and on a trend line to a $100 billion run rate by the end of the year, they are putting their business on par with some of the most cash-generating business models of all time. OpenAI’s growth with Codex is just as impressive.
And while I have my counterarguments, one way or another, AI has found some sort of product-market fit, and people have finally put to rest the idea that AI is a bubble.
Well, wrong.
The economic picture in AI is much more complicated than meets the eye; it’s bubbly in ways that people in San Francisco, too smart for their own good, fail to identify. — Read More
DeepSeek V4—almost on the frontier, a fraction of the price
Chinese AI lab DeepSeek’s last model release was V3.2 (and V3.2 Speciale) last December. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, DeepSeek-V4-Pro and DeepSeek-V4-Flash.
Both are Mixture of Experts models with a 1 million token context window. Pro has 1.6T total parameters, 49B active. Flash has 284B total, 13B active. Both use the standard MIT license.
I think this makes DeepSeek-V4-Pro the new largest open weights model. — Read More
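To make the "total versus active" distinction concrete, here is a small arithmetic sketch using only the parameter counts quoted above. The dictionary layout and function name are illustrative, and 1.6T is treated as 1,600B.

```python
# Parameter counts as quoted in the excerpt, in billions (1.6T = 1600B).
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600.0, "active_b": 49.0},
    "DeepSeek-V4-Flash": {"total_b": 284.0, "active_b": 13.0},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a Mixture of Experts model's parameters used for each token."""
    return active_b / total_b


for name, params in MODELS.items():
    frac = active_fraction(params["total_b"], params["active_b"])
    print(f"{name}: {frac:.1%} of parameters active per token")
# prints:
# DeepSeek-V4-Pro: 3.1% of parameters active per token
# DeepSeek-V4-Flash: 4.6% of parameters active per token
```

In other words, on these figures each token touches roughly 3 to 5 percent of the weights, which is one reason a very large Mixture of Experts model can be served at a low price per token.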
Designing, Refining, and Maintaining Agent Skills at Perplexity
Perplexity’s frontier agent products rest on a foundation of know-how and domain expertise packaged in modular Agent Skills. We maintain a carefully curated library of Skills across our technical environments. These Skills include many of the general-purpose utilities powering Perplexity Computer; vertical-specific capabilities in areas such as finance, law, and health; and a very long tail of modules for addressing user needs. Some Skills are infrequently invoked but critical when invoked. To ensure a consistently excellent user experience, Perplexity’s Agents team prioritizes Skill quality just as much as code quality.
The intuitions and best practices required to develop a high-quality Skill differ significantly from those required to build traditional software. The Agents team reviews many pull requests from excellent engineers who develop Skills in the course of their work. The result is almost always numerous comments and suggestions for revision. This is because many useful patterns for writing code become antipatterns in Skill creation. — Read More
Leveraging Verifier-Based Reinforcement Learning in Image Editing
While Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm for text-to-image generation, its application to image editing remains largely unexplored. A key bottleneck is the lack of a robust general reward model for all editing tasks. Existing edit reward models usually give overall scores without detailed checks, ignoring different instruction requirements and causing biased rewards. To address this, we argue that the key is to move from a simple scorer to a reasoning verifier. We introduce Edit-R1, a framework that builds a chain-of-thought (CoT) verifier-based reasoning reward model (RRM) and then leverages it for downstream image editing. The Edit-RRM breaks instructions into distinct principles, evaluates the edited image against each principle, and aggregates these checks into an interpretable, fine-grained reward. To build such an RRM, we first apply supervised fine-tuning (SFT) as a “cold-start” to generate CoT reward trajectories. Then, we introduce Group Contrastive Preference Optimization (GCPO), a reinforcement learning algorithm that leverages human pairwise preference data to reinforce our pointwise RRM. After building the RRM, we use GRPO to train editing models with this non-differentiable yet powerful reward model. Extensive experiments demonstrate that our Edit-RRM surpasses powerful VLMs such as Seed-1.5-VL and Seed-1.6-VL as an editing-specific reward model, and we observe a clear scaling trend, with performance consistently improving from 3B to 7B parameters. Moreover, Edit-R1 delivers gains to editing models like FLUX.1-kontext, highlighting its effectiveness in enhancing image editing. — Read More
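The two mechanisms the abstract describes, per-principle checks aggregated into one reward and group-based policy optimization, can be sketched as follows. This is not the paper's implementation: the principle names, the hand-written scores, the plain-mean aggregation, and the use of population standard deviation are all assumptions made for illustration. Only the general GRPO idea of normalizing each reward against its sampling group is taken as given.

```python
from statistics import mean, pstdev


def aggregate_reward(principle_scores: dict[str, float]) -> float:
    """Collapse per-principle checks (each in [0, 1]) into one scalar reward.

    A plain mean is an assumption; the abstract does not specify the rule.
    """
    return mean(list(principle_scores.values()))


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: each reward normalized against its own group.

    Population standard deviation is an assumed implementation detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical verifier output for three candidate edits of one instruction.
candidates = [
    {"object_replaced": 1.0, "background_preserved": 1.0, "style_consistent": 0.5},
    {"object_replaced": 1.0, "background_preserved": 0.0, "style_consistent": 0.5},
    {"object_replaced": 0.0, "background_preserved": 1.0, "style_consistent": 0.5},
]

rewards = [aggregate_reward(c) for c in candidates]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.3f} advantage={advantage:+.3f}")
```

Because the advantages are computed relative to the group, the reward model never needs to be differentiable. It only has to rank candidates sensibly, which is why a reasoning verifier can be plugged in as the scorer.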
Salesforce launches Agentforce Operations to fix the workflows breaking enterprise AI
Enterprise AI teams are hitting a wall — not because their models can’t reason, but because the workflows underneath them were never built for agents. Tasks fail, handoffs break, and the problem compounds as organizations push agents deeper into back-office systems. A new architectural layer is emerging to address it: workflow execution control planes that impose deterministic structure on processes agents are expected to run.
One of the companies bringing this to the forefront is Salesforce, with a new workflow platform that turns back-office workflows into a set of tasks for specialized agents to complete. Users can upload their processes or use one of the set Blueprints provided by Salesforce, and Agentforce Operations will break it down for agents. — Read More
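As a rough illustration of what "deterministic structure" around agents can mean, the sketch below fixes the order of steps in code and lets each agent decide only what happens inside its own step. It is a generic toy, not the Agentforce Operations API. Every name here (`Step`, `run_workflow`, the invoice steps, and the stub agents) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An agent is modeled as a function from (step name, current state) to state updates.
Agent = Callable[[str, dict], dict]


@dataclass(frozen=True)
class Step:
    name: str   # what has to happen
    agent: str  # which specialized agent is responsible


def run_workflow(steps: List[Step], agents: Dict[str, Agent], state: dict) -> dict:
    """Run steps in a fixed order; an exception from any agent stops the run."""
    state = dict(state)  # work on a copy so the caller's input is untouched
    completed: List[str] = []
    for step in steps:
        if step.agent not in agents:
            raise KeyError(f"no agent registered for step {step.name!r}")
        state.update(agents[step.agent](step.name, state))
        completed.append(step.name)
    state["completed_steps"] = completed
    return state


# Stub agents standing in for model-backed ones.
def extraction_agent(step: str, state: dict) -> dict:
    return {"amount": 120.0}


def validation_agent(step: str, state: dict) -> dict:
    return {"valid": state["amount"] > 0}


def ledger_agent(step: str, state: dict) -> dict:
    if not state["valid"]:
        raise ValueError("refusing to post an invalid invoice")
    return {"posted": True}


INVOICE_PROCESS = [
    Step("extract_fields", "extraction"),
    Step("validate_total", "validation"),
    Step("post_to_ledger", "ledger"),
]
AGENTS = {
    "extraction": extraction_agent,
    "validation": validation_agent,
    "ledger": ledger_agent,
}

result = run_workflow(INVOICE_PROCESS, AGENTS, {"invoice_id": "INV-1"})
print(result["completed_steps"])
# prints: ['extract_fields', 'validate_total', 'post_to_ledger']
```

The design choice is that handoffs live in the runner rather than in any agent's prompt, so a failed step surfaces as an ordinary exception at a known point in the process.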
Agent Skills
The default behaviour of any AI coding agent is to take the shortest path to “done.” Ask for a feature and it writes the feature. It does not ask whether you have a spec, write a test before the implementation, consider whether the change crosses a trust boundary, or check what the PR will look like to a reviewer. It produces code, declares victory, and moves on.
This is the same failure mode every senior engineer has spent their career learning to avoid. The senior version of any task includes work that doesn’t show up in the diff: surfacing assumptions, writing the spec, breaking the work into reviewable chunks, choosing the boring design, leaving evidence that the result is correct, sizing the change so a human can actually review it. Those steps are most of what separates engineers who ship reliable software at scale from people who push code that breaks.
Agents skip those steps for the same reason any junior would. They’re invisible. The reward signal points at “task complete,” not “task complete and the design doc exists.” So we have to bolt the senior-engineer scaffolding back on.
Agent Skills is my attempt at that scaffolding. It just crossed 26K stars, so apparently I’m not alone in wanting it. This post is the part the README doesn’t quite cover: why each design choice exists, how it maps onto standard SDLC and Google’s published engineering practices, and what you should steal from the project even if you never install a single skill. — Read More