Every time an LLM generates a response, two operations run in sequence on the same GPU. The first processes the input prompt and emits a single token. The second produces every token after that, one at a time.
From the outside, they look like stages of one process. However, inside the hardware, they have opposite bottlenecks. One is limited by raw compute. The other is limited by how fast data moves through memory. Most of the engineering work that makes production AI systems fast exists because of this split, and the techniques used to handle it are what inference engineering is built around.
Inference engineering is the discipline of running trained AI models in production efficiently. The work spans low-level GPU code, model serving frameworks, and the cloud infrastructure that ties them together. — Read More
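The compute-versus-memory split described above can be made concrete with a rough arithmetic-intensity estimate. The sketch below is illustrative only and is not taken from the linked article: it models a single square weight matrix in 16-bit precision, and the function names, the model width, and the hardware figures (300 TFLOP/s peak compute, 2 TB/s memory bandwidth) are hypothetical values chosen for the example.

```python
def arithmetic_intensity(n_tokens: int, d_model: int, bytes_per_value: int = 2) -> float:
    """FLOPs per byte moved for one d_model x d_model matmul over n_tokens.

    Simplified model (an assumption): weights are read once, input and
    output activations are each moved once, and nothing else is counted.
    """
    flops = 2 * n_tokens * d_model * d_model              # one multiply-add per weight per token
    weight_bytes = bytes_per_value * d_model * d_model    # the weight matrix
    activation_bytes = bytes_per_value * 2 * n_tokens * d_model  # inputs plus outputs
    return flops / (weight_bytes + activation_bytes)


def bottleneck(intensity: float, peak_flops: float, mem_bandwidth: float) -> str:
    """Classify a workload against the hardware's ridge point (FLOPs per byte)."""
    ridge = peak_flops / mem_bandwidth
    return "compute-bound" if intensity >= ridge else "memory-bound"


# Hypothetical accelerator figures, used only for illustration.
PEAK_FLOPS = 300e12      # 300 TFLOP/s
MEM_BANDWIDTH = 2e12     # 2 TB/s

prefill = arithmetic_intensity(n_tokens=2048, d_model=4096)  # whole prompt at once
decode = arithmetic_intensity(n_tokens=1, d_model=4096)      # one token per step

print("prefill:", prefill, bottleneck(prefill, PEAK_FLOPS, MEM_BANDWIDTH))
print("decode:", decode, bottleneck(decode, PEAK_FLOPS, MEM_BANDWIDTH))
```

Under these assumptions, processing the whole prompt reuses each weight across thousands of tokens and lands well above the ridge point, while generating one token at a time does roughly one FLOP per byte read and lands far below it.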
Daily Archives: June 16, 2026
The Once And Future Fable #2
On Friday evening the United States Government forced Anthropic to take down all access to Fable and Mythos.
It’s been a rough weekend.
Dean W. Ball: One thing about AI regulation being haphazardly imposed on just-released, highly performant models is that in a very real sense, the government just made my world *dumber.* In some impressionistic sense I almost always think this is true of government, but here it is literal. — Read More
Facebook Gets its Own AI Mode That Turns Public Posts and Reels into a Search Engine
Meta has officially introduced Facebook AI Mode for US users, transforming the standard search bar into a conversational tool that answers questions by mining public Group discussions, Reels, and Marketplace data. While the update aims to increase platform engagement and support Meta’s expanding subscription tiers, it faces scrutiny regarding data privacy and the accuracy of crowd-sourced AI summaries. — Read More
Sakana Marlin
We are excited to introduce Sakana Marlin, our first commercial product—an autonomous research assistant for business, built on our long-horizon reasoning technology. Give it a research topic, and Marlin works autonomously for up to roughly eight hours, crafting a detailed strategy report up to a hundred pages long, along with executive summary slides.
Sakana Marlin is designed to take on the kind of substantial strategy research that a Chief Strategy Officer (CSO) and a small team might otherwise spend weeks on. — Read More
Anthropic’s Safety Superpower
I’m sympathetic to the cynics who consistently characterize Anthropic’s public statements, particularly those surrounding their model releases, as scare-mongering for the sake of marketing. It was only two months ago that Anthropic announced Mythos Preview, a model that they said was too dangerous to make publicly available, thanks in particular to its advanced cybersecurity capabilities. Then, two months later, the company publicly released Fable, a version of Mythos with various safety guardrails.
Fable is, in my limited experience, a very impressive model. It’s increasingly difficult to objectively evaluate models for anything other than coding performance, but there is subjective feel, and I found my interactions with Fable to be extremely impressive; it made other models, including GPT 5.5 and Opus 4.8, feel small and dumb. The two times I felt that way previously were with GPT-4 and Grok 4, both of which represented new generations in terms of base model size and complexity; my sense is that Fable is downstream of a new pre-train and the first of a new generation.
To that end, I can certainly buy the case that Fable/Mythos is in fact more capable when it comes to identifying and exploiting security issues, and that Anthropic’s cautious roll-out was justified. The problem with publicly releasing models, however, is that guardrails can be jailbroken, and apparently that is exactly what happened shortly after the release. — Read More
Wi-Fi Flies Higher As Edge AI Build-Out Takes Root
The accelerating build-out of edge AI is starting to redefine how people interact with AI, shifting the focus from massive global data mining and analysis in huge AI data centers to faster results, greater efficiency, and much more targeted workloads at the edge.
In both cases, the emphasis is still on processing and moving data at blazingly fast speeds. But at the edge, there is less data to process, and the distances that data has to travel are shorter. Hyperscalers emphasize contextual search, massive simulations, and training of large language models. At the edge, the goal may be as limited as feeding commands to a robot about how much pressure is needed to pick up an object, or telling a car to jam on the brakes because a pedestrian just darted across the street. Small language models that are domain- and workload-specific replace more generalized capabilities in LLMs.
There is demand for both, but as the edge takes shape, it is beginning to look very different from what OpenAI or Anthropic does. — Read More
Agentic Code Review
Coding agents are extraordinarily good now, and getting better fast. The interesting consequence is that the hard part of engineering moved from writing code to deciding whether to trust it, which makes review the most leveraged skill in software right now. How you approach it depends enormously on who you are: a solo developer with no users and a team maintaining a ten-year-old application are not solving the same problem.
… Code review used to work because of a happy accident of relative speed. A senior engineer could read code faster than a junior could write it, so review kept pace without anyone designing it to, and the team absorbed how the system fit together as a side effect of reading each other’s diffs. A lot of that was not deliberate. It fell out of a single fact: writing code was the slow, expensive part, and reading it was cheap and fast.
That fact no longer holds. — Read More
The FCC Wants to Eliminate Burner Phones
A proposed FCC rule would kill burner phones: phones whose accounts are not attached to a particular person.
The FCC plans to do this by legally forcing the country’s telecoms to store a wealth of personal information about essentially all phone customers, including a government issued identification number and their physical address, alarming privacy advocates and civil rights activists who compare the measures to those from authoritarian countries where it can be difficult to buy a mobile phone plan without giving up your identity. — Read More