The past year has marked a turning point in the evolution and real-world use of large language models (LLMs). With the release of the first widely adopted reasoning model, o1, on December 5th, 2024, the field shifted from single-pass pattern generation to multi-step deliberative inference, accelerating deployment, experimentation, and new classes of applications. As this shift unfolded at a rapid pace, our empirical understanding of how these models are actually used in practice has lagged behind. In this work, we leverage the OpenRouter platform, an AI inference provider that routes requests across a wide variety of LLMs, to analyze over 100 trillion tokens of real-world LLM interactions across tasks, geographies, and time. In our empirical study, we observe substantial adoption of open-weight models, the outsized popularity of the creative-roleplay and coding-assistance categories (beyond just the productivity tasks many assume dominate), and the rise of agentic inference. Furthermore, our retention analysis identifies foundational cohorts: early users whose engagement persists far longer than that of later cohorts. We term this phenomenon the Cinderella “Glass Slipper” effect. These findings underscore that the way developers and end-users engage with LLMs “in the wild” is complex and multifaceted. We discuss implications for model builders, AI developers, and infrastructure providers, and outline how a data-driven understanding of usage can inform better design and deployment of LLM systems. — Read More
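The retention finding is the most methodologically interesting piece. For readers who want to see what a cohort retention analysis looks like in practice, here is a minimal sketch; the event schema (user_id, week) and the weekly granularity are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of a cohort retention calculation, assuming an event log
# with one row per (user_id, week) in which the user was active. Column
# names and granularity are assumptions for illustration only.
import pandas as pd

def cohort_retention(events: pd.DataFrame) -> pd.DataFrame:
    """events: columns ['user_id', 'week'], where week is an integer index."""
    # Assign each user to the cohort of their first active week.
    first = events.groupby("user_id")["week"].min().rename("cohort").reset_index()
    events = events.merge(first, on="user_id")
    events["age"] = events["week"] - events["cohort"]

    # Fraction of each cohort still active at each age (weeks since first use).
    cohort_sizes = events.groupby("cohort")["user_id"].nunique()
    active = events.groupby(["cohort", "age"])["user_id"].nunique()
    retention = active.div(cohort_sizes, level="cohort").unstack("age").fillna(0.0)
    return retention  # rows: cohorts, columns: weeks since first use

# A "Glass Slipper" effect would show up as the early cohorts' rows decaying
# far more slowly than the later cohorts' rows.
```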
Evaluating AI Agents in Security Operations
We benchmarked frontier AI models on realistic security operations (SecOps) tasks using Cotool’s agent harness and the Splunk BOTSv3 dataset. GPT-5 achieved the highest accuracy (63%), while Claude Haiku-4.5 completed tasks the fastest with strong accuracy. GPT-5 variants dominated the performance-cost frontier. These results provide practical guidance for model selection in enterprise SecOps automation. — Read More
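For readers unfamiliar with the phrase, "dominating the performance-cost frontier" means no other model is both cheaper and at least as accurate. The sketch below shows how such a Pareto frontier is computed; the model names and numbers are hypothetical placeholders, not the actual Cotool/BOTSv3 results.

```python
# Sketch of a performance-cost (Pareto) frontier over benchmark results.
# Entries below are hypothetical placeholders, not the reported results.

def pareto_frontier(results: dict[str, tuple[float, float]]) -> list[str]:
    """results maps model name -> (cost_per_task_usd, accuracy).
    A model is on the frontier if no other model is both no more expensive
    and no less accurate, while strictly better on at least one axis."""
    frontier = []
    for name, (cost, acc) in results.items():
        dominated = any(
            (c <= cost and a >= acc) and (c < cost or a > acc)
            for other, (c, a) in results.items() if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

if __name__ == "__main__":
    hypothetical = {
        "model_a": (0.40, 0.63),
        "model_b": (0.10, 0.55),
        "model_c": (0.45, 0.58),  # dominated: costs more than model_a, less accurate
    }
    print(pareto_frontier(hypothetical))  # ['model_a', 'model_b']
```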
Touching the Elephant – TPUs
There is mythological reverence for Google’s Tensor Processing Unit. While the world presently watches NVIDIA’s gravity drag more companies into its orbit, there sits Google, imperial and singular. Lots of companies participate in the “Cambrian-style explosion of new-interesting accelerators”[14] – Groq, Amazon, and Tenstorrent come to mind – but the TPU is the original existence proof. NVIDIA should take credit for the reemergence of deep learning, but the GPU wasn’t designed with deep learning in mind. What’s strange is that the TPU isn’t a secret. This research is indebted to Google’s public chest-thumping, but the devices themselves have long been exclusive to Google’s datacenters. That is over a decade of work on a hardware system sequestered behind their walls. That the TPU is so well documented yet without a true counterpart creates a strange asymmetry. Google is well positioned in the AI race because of their decision over a decade ago to build a hardware accelerator. It is because of the TPU. — Read More
Increasing alignment of large language models with language processing in the human brain
Transformer-based large language models (LLMs) have considerably advanced our understanding of how meaning is represented in the human brain; however, the validity of increasingly large LLMs is being questioned due to their extensive training data and their ability to access context thousands of words long. In this study, we investigated whether instruction tuning—another core technique in recent LLMs that goes beyond mere scaling—can enhance models’ ability to capture linguistic information in the human brain. We compared base and instruction-tuned LLMs of varying sizes against human behavioral and brain activity measured with eye-tracking and functional magnetic resonance imaging during naturalistic reading. We show that simply making LLMs larger leads to a closer match with the human brain than fine-tuning them with instructions. These findings have substantial implications for understanding the cognitive plausibility of LLMs and their role in studying naturalistic language comprehension. — Read More
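A common way such model-brain comparisons are quantified (and this is only an assumption about this particular paper's methods, not a description of them) is a linear encoding model that predicts voxel responses from LLM hidden states and scores held-out correlation. A minimal sketch:

```python
# Generic sketch of a ridge-regression encoding model: predict fMRI voxel
# responses from LLM hidden states, score held-out Pearson correlation per
# voxel. This illustrates the standard approach only; shapes, HRF convolution,
# and downsampling to TRs are assumed to have been handled upstream.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def brain_alignment(features: np.ndarray, voxels: np.ndarray, alpha: float = 10.0) -> np.ndarray:
    """features: (n_timepoints, n_hidden) LLM states aligned to fMRI TRs.
    voxels:   (n_timepoints, n_voxels) BOLD responses.
    Returns per-voxel Pearson r on a held-out, temporally contiguous split."""
    X_tr, X_te, Y_tr, Y_te = train_test_split(features, voxels, test_size=0.2, shuffle=False)
    model = Ridge(alpha=alpha).fit(X_tr, Y_tr)
    Y_hat = model.predict(X_te)
    # Pearson r computed column-wise (one score per voxel).
    Y_hat_c = Y_hat - Y_hat.mean(0)
    Y_te_c = Y_te - Y_te.mean(0)
    return (Y_hat_c * Y_te_c).sum(0) / (
        np.linalg.norm(Y_hat_c, axis=0) * np.linalg.norm(Y_te_c, axis=0) + 1e-8
    )

# Comparing mean r for a base model's features against an instruction-tuned
# model's features gives one concrete notion of "closer match with the brain".
```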
First Wap: A Surveillance Computer You’ve Never Heard Of
Mother Jones has a long article on surveillance arms manufacturers, their wares, and how they avoid export control laws:
Operating from their base in Jakarta, where permissive export laws have allowed their surveillance business to flourish, First Wap’s European founders and executives have quietly built a phone-tracking empire, with a footprint extending from the Vatican to the Middle East to Silicon Valley.
It calls its proprietary system Altamides, which it describes in promotional materials as “a unified platform to covertly locate the whereabouts of single or multiple suspects in real-time, to detect movement patterns, and to detect whether suspects are in close vicinity with each other.”
… Much more in this Lighthouse Reports analysis. — Read More
DeepSeek just dropped two insanely powerful AI models that rival GPT-5 and they’re totally free
Chinese artificial intelligence startup DeepSeek released two powerful new AI models on Sunday that the company claims match or exceed the capabilities of OpenAI’s GPT-5 and Google’s Gemini-3.0-Pro — a development that could reshape the competitive landscape between American tech giants and their Chinese challengers.
The Hangzhou-based company launched DeepSeek-V3.2, designed as an everyday reasoning assistant, alongside DeepSeek-V3.2-Speciale, a high-powered variant that achieved gold-medal performance in four elite international competitions: the 2025 International Mathematical Olympiad, the International Olympiad in Informatics, the ICPC World Finals, and the China Mathematical Olympiad. — Read More
Move over, computer science. Students are flocking to new AI majors
Artificial intelligence is the hot new college major.
This semester, more than 3,000 students enrolled in a new college of artificial intelligence and cybersecurity at the University of South Florida in Tampa.
At the University of California, San Diego, 150 first-year students signed up for a new AI major. And the State University of New York at Buffalo created a stand-alone “department of AI and society,” which is offering new interdisciplinary degrees in fields like “AI and policy analysis.”
The fast popularisation of products such as ChatGPT, along with skyrocketing valuations of tech giants such as chipmaker Nvidia, is helping to drive the campus AI boom. — Read More
The Planning Paradox: Why your plans are useless, but planning isn’t.
Your carefully crafted roadmap is probably fiction within weeks of its creation. Priorities shift. Leadership changes direction. That feature everyone agreed on in Q1 planning feels irrelevant by April.
I learned this while launching a massive CRM overhaul at one of my previous employers. It wasn’t the plan that saved us. It was the planning.
… That’s the paradox: the plan became obsolete, but the act of planning together made us capable of executing even as everything changed. — Read More
Context plumbing
Loosely, AI interfaces are about intent and context.
Intent is the user’s goal, big or small, explicit or implicit.
Uniquely for computers, AI can understand intent and respond in a really human way. This is a new capability! Like the user can type “I want to buy a camera”, or point at a keylight and subvocalise “I’ve got a call in 20 minutes”, or hit a button labeled “remove clouds”, and job done.
Companies care about this because computers that are closer to intent tend to win.
… This is why I think the future of interfaces is Do What I Mean: it’s not just a new capability enabled by AI, there’s a whole attentional economics imperative to it. — Read More
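As a concrete (and entirely hypothetical) illustration of the "plumbing" the post is gesturing at, the sketch below combines an explicit intent with ambient context before handing both to a model and expecting a structured action back. The call_llm stub, prompt format, and action schema are assumptions for illustration, not anything specified in the post.

```python
# Toy sketch of "context plumbing": route a free-form intent plus ambient
# context to a model and parse a structured action out of the reply.
# call_llm is a hypothetical stand-in for whatever inference API you use.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def do_what_i_mean(intent: str, context: dict) -> dict:
    prompt = (
        "Given the user's intent and the surrounding context, respond with a "
        'single JSON object {"action": ..., "arguments": {...}}.\n'
        f"Intent: {intent}\n"
        f"Context: {json.dumps(context)}"
    )
    return json.loads(call_llm(prompt))

# e.g. do_what_i_mean("I've got a call in 20 minutes",
#                     {"camera": "off", "keylight": "off", "calendar_free": True})
# might come back as {"action": "prepare_call", "arguments": {"keylight": "on"}}
```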
AI’s safety features can be circumvented with poetry, research finds
Poetry can be linguistically and structurally unpredictable – and that’s part of its joy. But one man’s joy, it turns out, can be a nightmare for AI models.
Those are the recent findings of researchers out of Italy’s Icaro Lab, an initiative from a small ethical AI company called DexAI. In an experiment designed to test the efficacy of guardrails put on artificial intelligence models, the researchers wrote 20 poems in Italian and English that all ended with an explicit request to produce harmful content such as hate speech or self-harm.
They found that the poetry’s lack of predictability was enough to get the AI models to respond to harmful requests they had been trained to avoid – a process known as “jailbreaking”. — Read More