LLMs are used by over a billion people globally, and the most frequent use case is to assist with writing. LLMs can provide a huge efficiency boost, but are they actually writing what we want?
Many users recognize the “feel” of LLM prose, but few people realize the extent to which LLMs distort the meaning of writing. We find this across three datasets: a human user study, a dataset of human argumentative essays, and reviews from a top machine learning conference. — Read More
Daily Archives: May 5, 2026
Model-Harness-Fit
Is it best to use an LLM with its native harness (like Claude Code or Codex), or a generic harness that swaps models on demand?
… [I] decided to dig deeper by looking at the harness implementations of Codex, Claude Code, and the GitHub SDK. Does the harness really matter that much?
… The hand-wave answer is that "models behave differently because they are different models," but here I tested the same models across different harnesses. — Read More
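The comparison above boils down to a simple point: a harness is the system prompt, tool set, and loop wrapped around a model, so the same model can behave quite differently inside different harnesses. Here is a minimal sketch of that idea; `call_model` is a stub, and the harness configurations are illustrative, not the actual prompts or tools shipped by these products.

```python
# A harness = system prompt + tools + loop around the same underlying model.
# call_model is a stand-in stub, not a real API; it just echoes its context
# to show that each harness hands the model a different environment.

def call_model(system_prompt: str, tools: list[str], task: str) -> str:
    return f"system={system_prompt!r} tools={sorted(tools)} task={task!r}"

# Illustrative configs only -- not the real prompts used by these tools.
HARNESSES = {
    "claude-code": {"system": "You are a coding agent...", "tools": ["bash", "edit", "read"]},
    "codex": {"system": "You are a software engineer...", "tools": ["shell", "apply_patch"]},
}

def run(harness: str, task: str) -> str:
    cfg = HARNESSES[harness]
    return call_model(cfg["system"], cfg["tools"], task)

# Same model, same task, different harness context -> different behavior.
print(run("claude-code", "fix the failing test"))
print(run("codex", "fix the failing test"))
```

Swapping models under a generic harness changes only one of these three variables, which is why "same model, different harness" is the cleaner experiment.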
The Oscars just declared that AI actors and AI-written scripts can’t win awards
A hot potato: With generative AI becoming more prevalent in society, are we heading toward a future where an AI-created actor or script wins an Oscar? If it does ever happen, it certainly won’t be anytime soon: the Academy of Motion Picture Arts and Sciences has just banned their eligibility for awards.
The Academy clarified rules for two categories related to AI, writes Vanity Fair. The first states that the only acting roles eligible for Oscar nominations are those “demonstrably performed by humans with their consent.” Screenplays, meanwhile, must be human-authored to be eligible.
While this all sounds like something we’ll have to deal with in the future, it’s happening now. — Read More
Small language models: Rethinking enterprise AI architecture
As LLMs hit the limits of scale and cost, specialized SLMs are emerging as the faster, cheaper, and more private workhorse for the autonomous enterprise.
… Large language models (LLMs) are the workhorses of AI, supporting ever more sophisticated capabilities and workflows, and approaching near-human level performance.
But more isn't always better; sometimes it's just more. Specialized data and limited capabilities are just fine for some workflows.
This realization is driving a shift toward small language models (SLMs) rather than one-size-fits-all LLMs. — Read More
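One common pattern behind the SLM argument above is routing: send routine, narrow requests to a small specialized model and reserve the large model for open-ended work. This is a toy sketch of that pattern; the model names and the complexity heuristic are purely illustrative assumptions, not part of any real product.

```python
# Toy router: narrow extraction-style prompts go to a small model,
# open-ended analytical prompts go to a large one. Heuristic and model
# names are illustrative, not a real scoring method.

def estimate_complexity(prompt: str) -> float:
    # Longer, more open-ended prompts score higher.
    open_ended = any(w in prompt.lower() for w in ("why", "design", "analyze"))
    return len(prompt.split()) / 50 + (0.5 if open_ended else 0.0)

def route(prompt: str, threshold: float = 0.5) -> str:
    return "large-llm" if estimate_complexity(prompt) > threshold else "small-slm"

print(route("Extract the invoice number from this email"))          # narrow task
print(route("Why did our churn rise, and design an analysis plan"))  # open-ended task
```

In production the heuristic would be replaced by a learned classifier or a cheap first-pass model, but the cost argument is the same: most enterprise traffic never needs the large model.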
The $570K canary: What AI coding agents reveal about enterprise AI’s real gaps
Software engineers aren’t being replaced; they’re moving from typing code to orchestrating agents, proving that infrastructure matters more than model size.
Boris Cherny, creator of Anthropic’s Claude Code, says he hasn’t written a line of code by hand in months. He shipped 22 pull requests one day, 27 the next, all AI-generated. Company-wide, Anthropic reports that 70 to 90% of its code is now written by AI. CEO Dario Amodei has predicted that AI could handle “most, maybe all” of what software engineers do within months.
And yet Anthropic typically has dozens of software engineering openings, one reportedly carrying $570K in total compensation. As one observer noted, the company is simultaneously predicting the end of the profession and paying top dollar to hire into it. — Read More
A Mental Model for Agentic Work
Something shifted in the first quarter of 2026. Not a feature launch, not a new product – a structural change in how work happens.
For the first time, I found myself genuinely operating with agents across every dimension of my work: personal tasks, software engineering, company operations. Not as a novelty. As the default mode.
This post is the abstraction I arrived at after weeks of doing this. A mental model that applies everywhere – because the architecture underneath is always the same. — Read More
Import AI 455: AI systems are about to start building themselves.
AI systems are about to start building themselves. What does that mean?
I’m writing this post because when I look at all the publicly available information, I reluctantly come to the view that it is likely (60%+ probability) that no-human-involved AI R&D – an AI system powerful enough that it could plausibly autonomously build its own successor – happens by the end of 2028.
This is a big deal.
I don’t know how to wrap my head around it. — Read More
A New Type of Neuroplasticity Rewires the Brain After a Single Experience
Every experience we have changes our brain, the way a ceramicist reshapes a slab of clay. Every corner we turn, every conversation we have, every shudder we feel causes cascading effects: Chemicals are released, electricity surges, the connections between brain cells tighten, and our mental models update.
The brain is “incredibly plastic, and it stays that way throughout the lifespan of a human,” said Christine Grienberger, a neuroscientist at Brandeis University. This plasticity, the quality of being easily reshaped, makes the brain really good at learning — a quintessential process that allows us to remember the plotline of a novel, navigate a new city, pick up a new language, and avoid touching a hot stove. But neuroscientists are still uncovering fundamental rules that describe how neuroplasticity reshapes brain connections.
Recently, neuroscientists described a new form of neuroplasticity that might be helping the brain learn across a timescale of several seconds — long enough to capture the behavioral process of learning from a single experience. In two recent reviews, published in The Journal of Neuroscience and Nature Neuroscience, they describe “behavioral timescale synaptic plasticity,” or BTSP. This type of learning in the hippocampus, the brain’s memory hub, is caused by an electrical change that affects multiple neurons at once and unfolds across several seconds. Researchers suspect that it may help the brain learn in a single attempt. — Read More
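The seconds-long window is what distinguishes BTSP from classical spike-timing plasticity, which operates on milliseconds. A toy way to see it: treat the plateau-like electrical event as a teaching signal, and strengthen each synapse in proportion to how recently it was active, with a decay constant on the order of seconds. All numbers here are illustrative assumptions, not measured parameters from the reviews.

```python
import math

# Toy sketch of BTSP as described above: a plateau event potentiates
# synapses that were active within a seconds-wide window around it.
# TAU and the update rule are illustrative assumptions.

TAU = 2.0  # seconds: assumed width of the eligibility window

def btsp_update(weights, spike_times, plateau_time, rate=1.0):
    # Each synapse gains weight according to how close its activity was
    # to the plateau event, on a timescale of seconds -- a single event
    # is enough to change the weights (one-shot learning).
    return [
        w + rate * math.exp(-abs(t - plateau_time) / TAU)
        for w, t in zip(weights, spike_times)
    ]

new_w = btsp_update([0.0, 0.0, 0.0], spike_times=[0.0, 1.0, 10.0], plateau_time=1.0)
# The synapse active at the plateau gains the most; one 10 s away gains almost nothing.
assert new_w[1] > new_w[0] > new_w[2]
```

Contrast this with millisecond-scale Hebbian rules: a two-second `TAU` lets events separated by behaviorally meaningful gaps (turning a corner, then finding a reward) be linked in one trial.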
A Final Answer: Is AI Really a Bubble?
With Anthropic having hit a $44 billion run rate, up from “just” $9 billion three months ago, and on a trend line to a $100 billion run rate by the end of the year, it is putting its business on par with some of the most cash-generative business models of all time. OpenAI’s growth with Codex is just as impressive.
And while I have my counterarguments, one way or another, AI has found some sort of product-market fit, and people have finally put to rest the idea that AI is a bubble.
Well, wrong.
The economic picture in AI is much more complicated than meets the eye; it’s bubbly in ways that people in San Francisco, too smart for their own good, fail to identify. — Read More
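For what the run-rate figures in this entry actually imply, the arithmetic is quick: a roughly 4.9x jump in one quarter, with a further ~2.3x needed to hit the $100 billion projection.

```python
# Checking the growth figures quoted above: $9B -> $44B run rate in three
# months, with $100B projected by year end.

start, now, target = 9e9, 44e9, 100e9
quarter_multiple = now / start   # growth over the last quarter
needed_multiple = target / now   # growth still needed to reach the projection
print(f"{quarter_multiple:.1f}x last quarter; {needed_multiple:.1f}x still needed")
```

Note that a run rate is the latest period annualized, not trailing revenue, which is part of why such multiples can look more dramatic than the underlying cash flows.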
DeepSeek V4—almost on the frontier, a fraction of the price
Chinese AI lab DeepSeek’s last model release was V3.2 (and V3.2 Speciale) last December. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, DeepSeek-V4-Pro and DeepSeek-V4-Flash.
Both are Mixture-of-Experts models with a 1 million token context window. Pro has 1.6T total parameters with 49B active; Flash has 284B total with 13B active. Both use the standard MIT license.
I think this makes DeepSeek-V4-Pro the new largest open weights model. — Read More
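The parameter figures above are easier to read as active fractions: a Mixture-of-Experts model only runs a small subset of its experts per token, which is why a 1.6T-parameter model can be served at a fraction of a dense model's inference cost. The specs are from the entry above; the fraction calculation is just arithmetic.

```python
# Active-parameter fractions for the two DeepSeek V4 preview models,
# using the total/active counts quoted above.

specs = {
    "DeepSeek-V4-Pro":   {"total": 1.6e12, "active": 49e9},
    "DeepSeek-V4-Flash": {"total": 284e9,  "active": 13e9},
}

for name, s in specs.items():
    frac = s["active"] / s["total"]
    print(f"{name}: {frac:.1%} of parameters active per token")
# Pro activates ~3.1% of its weights per token; Flash ~4.6%.
```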