State of LLMs in Late 2025

By October 2025, the AI landscape has evolved from “one model does everything” to a hyper-specialized ecosystem where each LLM has distinct strengths.

Training compute is doubling every five months, datasets are expanding every eight months, and models keep setting new benchmark records. Yet challenges are emerging: diminishing returns from scaling, massive energy consumption, and the rise of small language models (SLMs) are reshaping the field.

The question isn’t “Which AI is smartest?” It’s “Which AI is the right tool for this job?”

This guide explains the technical foundations that make each model different and helps you choose the right one for specific tasks. — Read More

#nlp

Less is More: Recursive Reasoning with Tiny Networks

Hierarchical Reasoning Model (HRM) is a novel approach using two small neural networks recursing at different frequencies. This biologically inspired method beats large language models (LLMs) on hard puzzle tasks such as Sudoku, Maze, and ARC-AGI while using small models (27M parameters) trained on little data (around 1,000 examples). HRM holds great promise for solving hard problems with small networks, but it is not yet well understood and may be suboptimal. We propose Tiny Recursive Model (TRM), a much simpler recursive reasoning approach that achieves significantly higher generalization than HRM, while using a single tiny network with only 2 layers. With only 7M parameters, TRM obtains 45% test accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, higher than most LLMs (e.g., DeepSeek R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of the parameters. — Read More
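The core pattern here, a single tiny network applied recursively to refine a latent "scratchpad" and an answer, can be sketched in a few lines. This is a minimal NumPy illustration of the recursive loop only: the dimensions, update schedule, and untrained random weights are assumptions for demonstration, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16  # illustrative feature width; the paper's real sizes differ

# One tiny 2-layer network, shared across every recursion step.
W1 = rng.normal(0, 0.1, (3 * D, D))
W2 = rng.normal(0, 0.1, (D, D))

def tiny_net(x, y, z):
    """Two-layer MLP over the concatenated (input, answer, latent)."""
    h = np.tanh(np.concatenate([x, y, z]) @ W1)
    return np.tanh(h @ W2)

def improvement_step(x, y, z, n_latent=6):
    """Refine the latent z several times, then update the answer y once."""
    for _ in range(n_latent):
        z = tiny_net(x, y, z)   # inner recursion: latent reasoning updates
    y = tiny_net(x, y, z)       # answer update from the refined latent
    return y, z

x = rng.normal(size=D)          # embedded puzzle input
y = np.zeros(D)                 # current answer embedding
z = np.zeros(D)                 # latent scratchpad

for _ in range(3):              # outer recursion: repeated improvement steps
    y, z = improvement_step(x, y, z)
```

The point of the sketch is that depth comes from recursion, not parameter count: the same 2-layer network is reused at every step, which is how a 7M-parameter model can perform many reasoning iterations.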

#training

Welcome to STATE OF AI REPORT 2025

The State of AI Report is the most widely read and trusted analysis of key developments in AI. Published annually since 2018, the open-access report aims to spark informed conversation about the state of AI and what it means for the future. Produced by AI investor Nathan Benaich and Air Street Capital.

If 2024 was the year of consolidation, 2025 was the year reasoning got real. What began as a handful of “thinking” models has turned into a global competition to make machines that can plan, verify, and reflect. OpenAI, Google, Anthropic, and DeepSeek all released systems capable of reasoning through complex tasks, sparking one of the fastest research cycles the field has ever seen.

AI [now] acts as a force multiplier for technological progress in our increasingly digital, data-driven world. This is because everything around us, from culture to consumer products, is ultimately a product of intelligence. — Read More

#strategy

Building a Resilient Event Publisher with Dual Failure Capture

When we set out to rebuild Klaviyo’s event infrastructure, our goal wasn’t just to handle more scale; it was to make the system rock solid. In Part 1 of this series, we shared how we migrated from RabbitMQ to a Kafka-based architecture to process 170,000 events per second at peak without losing data. In Part 2, we dove into how we made event consumers resilient.

This post, Part 3, is all about the Event Publisher, the entry point into our event pipeline. The publisher has an important job: It needs to accept events from hundreds of thousands of concurrent clients, serialize them, keep up with unpredictable traffic spikes, and most importantly, ensure that no event is ever lost. If the publisher isn’t resilient, the rest of the pipeline can’t rely on a steady and complete flow of events. — Read More
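A common way to meet the "no event is ever lost" requirement is to capture failures on a durable secondary path: try the broker first, and if the send fails, fsync the event to a local fallback log that a recovery process later replays. The sketch below is a simplified illustration of that general pattern, not Klaviyo's actual implementation; `FlakyBroker`, `ResilientPublisher`, and the fallback file path are all illustrative names.

```python
import json
import os
import tempfile

class FlakyBroker:
    """Stand-in for a Kafka producer; fails on demand to simulate an outage."""
    def __init__(self):
        self.published = []
        self.healthy = True

    def send(self, payload):
        if not self.healthy:
            raise ConnectionError("broker unavailable")
        self.published.append(payload)

class ResilientPublisher:
    """Try the broker first; on failure, durably append to a local fallback log."""
    def __init__(self, broker, fallback_path):
        self.broker = broker
        self.fallback_path = fallback_path

    def publish(self, event):
        line = json.dumps(event)
        try:
            self.broker.send(line)
        except ConnectionError:
            # Failure capture: never drop the event; persist it locally
            # and force it to disk before acknowledging the client.
            with open(self.fallback_path, "a") as f:
                f.write(line + "\n")
                f.flush()
                os.fsync(f.fileno())

    def replay_fallback(self):
        """Drain the fallback log back into the (now healthy) broker."""
        if not os.path.exists(self.fallback_path):
            return 0
        with open(self.fallback_path) as f:
            lines = [ln.strip() for ln in f if ln.strip()]
        for line in lines:
            self.broker.send(line)
        os.remove(self.fallback_path)
        return len(lines)

# Demo: publish through an outage, then replay the captured event.
broker = FlakyBroker()
path = os.path.join(tempfile.mkdtemp(), "fallback.log")
pub = ResilientPublisher(broker, path)
pub.publish({"id": 1})          # delivered directly
broker.healthy = False
pub.publish({"id": 2})          # captured to the fallback log instead of lost
broker.healthy = True
replayed = pub.replay_fallback()
```

The design choice worth noting: the publisher acknowledges the client only after the event is safe *somewhere* durable, either the broker or the local log, which is what lets the rest of the pipeline trust that the event stream is complete.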

#devops

Introducing CodeMender: an AI agent for code security

… Software vulnerabilities are notoriously difficult and time-consuming for developers to find and fix, even with traditional, automated methods like fuzzing. Our AI-based efforts like Big Sleep and OSS-Fuzz have demonstrated AI’s ability to find new zero-day vulnerabilities in well-tested software. As we achieve more breakthroughs in AI-powered vulnerability discovery, it will become increasingly difficult for humans alone to keep up.

CodeMender helps solve this problem by taking a comprehensive approach to code security that’s both reactive (instantly patching new vulnerabilities) and proactive (rewriting and securing existing code, eliminating entire classes of vulnerabilities in the process). In the six months we’ve been building CodeMender, we have already upstreamed 72 security fixes to open-source projects, including some as large as 4.5 million lines of code.

By automatically creating and applying high-quality security patches, CodeMender’s AI-powered agent helps developers and maintainers focus on what they do best — building good software. — Read More

#cyber