Recently at work, I had to build a feature on a tight deadline. It involved chat plus tool-calling components. I didn’t give much thought to prompt caching, as I was just trying to ship v0.
The following week I started optimising it and realised some silly mistakes I had made under pressure. I had ended up adding long user-specific data at the end of the system prompt, thinking I only needed to keep the longest prefix stable within a single conversation / messages array.
… I could find amazing tips for prompt caching but was unable to find a comprehensive resource on how prompt caching works under the hood. So here I am, shouldering the responsibility and the suffering of writing the post. Following “Be the change you want to see in the world”, etc. When somebody searches “how does prompt caching work really”, my hope is that this post pops up and gives them a good idea of how prompt caching works, with the bonus of learning what inference looks like at scale. — Read More
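The fix for the mistake described above is mostly about ordering. Below is a minimal sketch, assuming an OpenAI-style chat messages array; the prompt text, function, and field values are illustrative and not taken from the post. With prefix-based prompt caching, the cache matches on the longest stable token prefix, so shared static instructions should come first and user-specific data after them.

```python
# Minimal sketch (illustrative): keep the static system prompt as the longest
# shared prefix, and append per-user data after it, so the cached prefix is
# reusable across users and conversations, not just within one messages array.

STATIC_SYSTEM_PROMPT = (
    "You are a support assistant. Follow the tool-calling rules below..."
)  # identical for every request -> cacheable prefix

def build_messages(user_context: str, conversation: list[dict]) -> list[dict]:
    """Assemble an OpenAI-style messages array with the stable prefix first."""
    return [
        # 1) Stable prefix: byte-for-byte identical across requests.
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        # 2) Variable, user-specific data goes *after* the stable prefix.
        {"role": "system", "content": f"User profile:\n{user_context}"},
        # 3) The running conversation follows.
        *conversation,
    ]

messages = build_messages(
    user_context="Plan: enterprise; preferred language: German",
    conversation=[{"role": "user", "content": "Where is my invoice?"}],
)
```

Whether the per-user context lives in a second system message or in the first user message is an implementation choice; what matters for cache hits is that everything before it stays byte-for-byte identical across requests.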
The AI Race Just Flipped: Inside the MIT Study Showing China Overtaking US in Open Source Models
For the last half-decade, the prevailing narrative in Silicon Valley has been one of absolute, unassailable dominance. The United States possesses the GPUs, the capital, and the talent. Everyone else is merely playing catch-up, drafting behind the aerodynamic wake of OpenAI and Google. That narrative just hit a wall.
A rigorous new study by researchers at MIT, Hugging Face, and others has analyzed the complete history of model downloads—2.2 billion of them—to trace where the actual power lies in the ecosystem. The results are not just surprising. They represent a fundamental inversion of the status quo.
According to the data, China has officially overtaken the United States in the global market share of open model downloads. In the last year alone, Chinese organizations captured 17.1% of the download market, surpassing the US share of 15.8%. — Read More
Technical Deflation
In economics, deflation is the opposite of inflation—it’s what we call it when prices go down instead of up. It is generally considered harmful: both because it is usually brought on by something really bad (like a severe economic contraction), and because in and of itself, it has knock-on effects on consumer behavior that can lead to a death spiral. One of the main problems is that if people expect prices to keep going down, they’ll delay purchases and save more, because they expect that they’ll be able to get the stuff for less later. Less spending means less demand means less revenue means fewer jobs which means less spending and then whoops you’re in a deflationary spiral.
… This isn’t really an economics blog post, though. I’m thinking about deflation because it parallels a recent pattern I’m seeing in startups. (So I guess you could call it a micro-economics blog post?) The basic mechanism is: (1) it’s easier and cheaper to build software now than ever before; (2) it seems like it probably will keep getting easier and cheaper for the foreseeable future; so (3) why bother building anything now, just build it later when it’s cheaper and easier. — Read More
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Large language models have made significant progress in mathematical reasoning, which serves as an important testbed for AI and could impact scientific research if further advanced. By scaling reasoning with reinforcement learning that rewards correct final answers, LLMs have improved from poor performance to saturating quantitative reasoning competitions like AIME and HMMT in one year. However, this approach faces fundamental limitations.
Pursuing higher final answer accuracy doesn’t address a key issue: correct answers don’t guarantee correct reasoning. Moreover, many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final answer rewards inapplicable.
To push the limits of deep reasoning, we believe it is necessary to verify the comprehensiveness and rigor of mathematical reasoning. Self-verification is particularly important for scaling test-time compute, especially for open problems without known solutions. Towards self-verifiable mathematical reasoning, we investigate how to train an accurate and faithful LLM-based verifier for theorem proving. We then train a proof generator using the verifier as the reward model, and incentivize the generator to identify and resolve as many issues as possible in its own proofs before finalizing them. — Read More
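As a rough illustration of the generate-verify-revise loop sketched in the abstract, here is a hedged Python sketch; the classes and method names are stand-ins of my own, not DeepSeekMath-V2’s actual code or training setup.

```python
# Hedged sketch of the self-verification loop described above; my reading of the
# abstract, not DeepSeekMath-V2's actual code. Generator/Verifier are stand-ins
# for LLM-backed components.

class Verifier:
    """Stand-in for the LLM-based proof verifier used as the reward model."""
    def find_issues(self, problem: str, proof: str) -> list[str]:
        return []           # a real verifier returns rigor/completeness critiques

    def score(self, problem: str, proof: str) -> float:
        return 1.0          # reward signal for reinforcement learning

class Generator:
    """Stand-in for the proof generator being trained."""
    def draft(self, problem: str) -> str:
        return "proof draft"

    def revise(self, problem: str, proof: str, issues: list[str]) -> str:
        return proof        # a real generator resolves the flagged issues

def prove_with_self_verification(problem: str, generator: Generator,
                                 verifier: Verifier, max_revisions: int = 4):
    """Draft a proof, then revise until the verifier reports no remaining issues."""
    proof = generator.draft(problem)
    for _ in range(max_revisions):
        issues = verifier.find_issues(problem, proof)
        if not issues:
            break
        proof = generator.revise(problem, proof, issues)
    return proof, verifier.score(problem, proof)   # score feeds the RL reward

proof, reward = prove_with_self_verification("Show that sqrt(2) is irrational.",
                                             Generator(), Verifier())
```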
AI infrastructure in the “Era of experience”
In the famous essay from May 2025, “Welcome to the Era of Experience,” Rich Sutton and David Silver proposed a new paradigm of training AI models – models that learn not through predicting the next word against text scraped from Common Crawl, but through gaining experience via interaction with environments. As we approach the exhaustion of easily scrapable text data, we predict we’ll observe a shift toward AI models increasingly trained in this fashion via reinforcement learning (RL). In this text, we discuss the technical details underpinning this process.
… We intend this text to provide the reader with the theoretical basis needed to reason about AI infrastructure in the context of reinforcement learning. We argue that in the next 6-12 months there are significant opportunities for new businesses to be built around recent developments in RL, particularly for product companies to build sustainable moats through custom models trained on their proprietary environments, as well as for infrastructure players to build “picks and shovels” enabling the RL economy. — Read More
Implications of AI to Schools
… You will never be able to detect the use of AI in homework. Full stop. All “detectors” of AI imo don’t really work, can be defeated in various ways, and are in principle doomed to fail. You have to assume that any work done outside classroom has used AI.
…[T]he goal is that the students are proficient in the use of AI, but can also exist without it, and imo the only way to get there is to flip classes around and move the majority of testing to in class settings. — Read More
The Iceberg Index: Measuring Workforce Exposure Across the AI Economy
Artificial Intelligence is reshaping America’s $9.4 trillion labor market, with cascading effects that extend far beyond visible technology sectors. When AI transforms quality control tasks in automotive plants, consequences spread through logistics networks, supply chains, and local service economies. Yet traditional workforce metrics cannot capture these ripple effects: they measure employment outcomes after disruption occurs, not where AI capabilities overlap with human skills before adoption crystallizes. Project Iceberg addresses this gap using Large Population Models to simulate the human-AI labor market, representing 151 million workers as autonomous agents executing over 32,000 skills and interacting with thousands of AI tools. It introduces the Iceberg Index, a skills-centered metric that measures the wage value of skills AI systems can perform within each occupation. The Index captures technical exposure, where AI can perform occupational tasks, not displacement outcomes or adoption timelines. Analysis shows that visible AI adoption concentrated in computing and technology (2.2% of wage value, approx $211 billion) represents only the tip of the iceberg. Technical capability extends far below the surface through cognitive automation spanning administrative, financial, and professional services (11.7%, approx $1.2 trillion). This exposure is fivefold larger and geographically distributed across all states rather than confined to coastal hubs. Traditional indicators such as GDP, income, and unemployment explain less than 5% of this skills-based variation, underscoring why new indices are needed to capture exposure in the AI economy. By simulating how these capabilities may spread under different scenarios, Iceberg enables policymakers and business leaders to identify exposure hotspots, prioritize investments, and test interventions before committing billions to implementation. — Read More
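As a quick sanity check of the quoted figures (my own back-of-the-envelope arithmetic, assuming the percentages are shares of the $9.4 trillion labor market’s wage value):

```python
# Back-of-the-envelope check of the quoted figures; assumes the percentages are
# shares of the $9.4T US labor market's total wage value.
labor_market = 9.4e12

visible = 0.022 * labor_market        # ~$207B, close to the quoted "approx $211 billion"
below_surface = 0.117 * labor_market  # ~$1.1T, close to the quoted "approx $1.2 trillion"

print(f"visible: ${visible / 1e9:.0f}B, below surface: ${below_surface / 1e12:.2f}T")
print(f"ratio: {below_surface / visible:.1f}x")   # ~5.3x, i.e. roughly "fivefold larger"
```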
Deep Work in an Always-On World: How Focus Becomes Your Unfair Advantage
In an always-on environment of Slack pings, email floods, and meeting overload, the scarcest resource isn’t information or compute—it’s sustained human attention. This article argues that deep work—distraction-free, cognitively demanding, value-creating effort—is now core infrastructure for modern high performance. Drawing on research in attention, task switching, interruptions, and flow, it explains why “multitasking” is actually rapid context switching that slows delivery, increases defects, and spikes stress. It then connects focus to hard business outcomes: fewer incidents, faster recovery, better code, higher throughput, and improved retention. Practical sections translate the science into playbooks for individuals, teams, and leaders—covering how to measure deep work, protect maker time, fix meeting and communication norms, and overcome cultural resistance to being “less available.” The conclusion is simple: in an AI-heavy, always-on world, organizations that systematically protect deep work will ship better work, with saner teams, at lower real cost. — Read More
Scientists identify five ages of the human brain over a lifetime
Neuroscientists at the University of Cambridge have identified five “major epochs” of brain structure over the course of a human life, as our brains rewire to support different ways of thinking while we grow, mature, and ultimately decline.
A study led by Cambridge’s MRC Cognition and Brain Sciences Unit compared the brains of 3,802 people between zero and ninety years old using datasets of MRI diffusion scans, which map neural connections by tracking how water molecules move through brain tissue.
In a study published in Nature Communications, scientists say they detected five broad phases of brain structure in the average human life, split up by four pivotal “turning points” between birth and death when our brains reconfigure. — Read More
Ilya Sutskever: AI’s bottleneck is ideas, not compute
Ilya Sutskever, in a rare interview with Dwarkesh Patel, laid out his sharp critique of the AI industry. He argues that reliance on brute-force “scaling” has hit a wall. While AI models may be brilliant on tests, they remain fragile in real-world applications. He believes the pursuit of general intelligence must now shift from simply gathering more data to discovering new, more efficient scientific principles. — Read More
#strategy