Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity

Post-training alignment often reduces LLM diversity, leading to a phenomenon known as mode collapse. Unlike prior work that attributes this effect to algorithmic limitations, we identify a fundamental, pervasive data-level driver: typicality bias in preference data, whereby annotators systematically favor familiar text as a result of well-established findings in cognitive psychology. We formalize this bias theoretically, verify it on preference datasets empirically, and show that it plays a central role in mode collapse. Motivated by this analysis, we introduce Verbalized Sampling, a simple, training-free prompting strategy to circumvent mode collapse. VS prompts the model to verbalize a probability distribution over a set of responses (e.g., “Generate 5 jokes about coffee and their corresponding probabilities”). Comprehensive experiments show that VS significantly improves performance across creative writing (poems, stories, jokes), dialogue simulation, open-ended QA, and synthetic data generation, without sacrificing factual accuracy and safety. For instance, in creative writing, VS increases diversity by 1.6-2.1x over direct prompting. We further observe an emergent trend that more capable models benefit more from VS. In sum, our work provides a new data-centric perspective on mode collapse and a practical inference-time remedy that helps unlock pre-trained generative diversity. — Read More

#performance

How I Would Restart My Cybersecurity Career in 2025

Read More

#cyber, #videos

Reasoning Is Not Model Improvement

When OpenAI released o1 in 2024 and called it a “reasoning model,” the industry celebrated a breakthrough. Finally, AI that could think step-by-step, solve complex problems, handle graduate-level mathematics.

But look closer at what’s actually happening under the hood. When you ask o1 to multiply two large numbers, it doesn’t calculate. It generates Python code, executes it in a sandbox, and returns the result. Unlike GPT-3, which at least attempted arithmetic internally (and often failed), o1 explicitly delegates computation to external tools.

This pattern extends everywhere. The autonomy in agentic AI? Chained tool calls like web searches, API invocations, database queries. The breakthrough isn’t in the model’s intelligence. It’s in the orchestration layer coordinating external systems. Everything from reasoning to agentic AI is just a sophisticated application of code generation. These are not model improvements. They’re engineering workarounds for models that stopped improving.

This matters because the entire AI industry (from unicorn valuations to trillion-dollar GDP projections) depends on continued model improvement. What we’re getting instead is increasingly elaborate plumbing for fundamentally stagnant foundations. — Read More

#devops

Machine Learning and Design Thinking are “basically” the same

When you hear backpropagation, you probably think of machine learning, neural networks, and intimidating math. But even if the concept is new to you there’s no reason to worry. Because if we look closely, backpropagation isn’t just a computer science algorithm for machine learning.
No, backpropagation acts on the philosophy of learning through feedback, and thereby has a lot in common with design thinking.

In this article, I compare design thinking to machine learning to make complex concepts from computer science more graspable. I translate the logic of backprop (backpropagation) into design thinking language, and I illustrate how both follow the same idea: iterative improvement through feedback loops. In the latter half of the article I explain more machine learning concepts, the “bias”, “cost function”, what is “overfittig” and “underfitting”, as well as “activation functions”. And what seems incredibly complicated or simply unknown to you now will be a little bit more clear and relatable by the end of this article. — Read More

#training

The Continual Learning Problem

If we want to move towards a world where models are “always training” and continually learning from experience over time, we need to address a basic challenge: how do we keep updating the parameters of a model without breaking it? In this post, I’ll motivate memory layers as a natural architecture for this paradigm: high-capacity, but sparse (few active parameters) on each forward pass. In our recent paper, we found that finetuning memory layers enables learning without forgetting much more effectively than LoRA. When learning TriviaQA facts, NaturalQuestions performance drops by 89% with full finetuning and 71% with LoRA, but only 11% with memory layers. Along the way, I’ll also discuss the challenges of the continual learning problem broadly.

Read More

Check out the paper here: Continual Learning via Sparse Memory Finetuning

#training

Claude Code is unreasonably good at building MVPs

The most valuable code I’ve written in the past six months is code I fully intend to throw away. This isn’t some zen programming philosophy or agile methodology talking point. It’s the practical reality of using Claude Code to build prototypes and MVPs.

Here’s what’s fundamentally changed: the time from “what if we built X” to “here’s a working version of X” has collapsed from weeks or months down to hours or days. That compression doesn’t just make development faster. It changes what kinds of ideas are worth exploring in the first place. — Read More

#devops

Andrej Karpathy — AGI is still a decade away

Read More

#videos

Data Modeling for the Agentic Era: Semantics, Speed, and Stewardship

In data analytics, we’re facing a paradox. AI agents can theoretically analyze anything, but without the right foundations, they’re as likely to hallucinate a metric as to calculate it correctly. They can write SQL in seconds, but will it answer the right business question? They promise autonomous insights, but at what cost to trust and accuracy?

These days, everyone is embedding AI chat in their product. But to what end? Does it actually help, or would users rather turn to tools like Claude Code when they need real work done? The real questions are: how can we model our data for agents to reliably consume, and how can we use agents to develop better data models?

After spending the last year exploring where LLMs have genuine leverage in analytics (see my writing on GenBI and Self-Serve BI), I’ve identified three essential pillars that make agentic data modeling actually work: semantics as the shared language both humans and AI need to understand metrics, speed through sub-second analytics that lets you verify numbers before they become decisions, and stewardship with guardrails that guide without constraining. The TL;DR? AI needs structure to understand, humans need speed to verify, and both need boundaries to stay productive. — Read More

#data-science

Advanced RAG Techniques for High-Performance LLM Applications

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by combining retrieval with generation to ground outputs in your own data rather than relying solely on pretraining. In practice, RAG systems retrieve relevant information from a knowledge source and integrate it into the prompt, enabling responses that are more accurate, contextual, and trustworthy.

RAG is now a widely used architecture for LLM applications, powering everything from question-answering services that leverage web search, to internal chat tools that index enterprise content, to complex QA pipelines. Its appeal is simple: by augmenting generation with retrieval, teams can deliver LLM experiences that meet today’s expectations for relevance and reliability.

But shipping a RAG system isn’t the finish line. Anyone who’s moved beyond a prototype knows the symptoms: hallucinations creep back in, long queries bog down performance, or answers miss the mark despite the right documents being retrieved. That’s where advanced RAG techniques come in. This guide walks through the strategies that help teams improve relevance, accuracy, and efficiency, so your system not only works, but works at scale. — Read More

#performance

Emerging Architectures for Modern Data Infrastructure

The growth of the data infrastructure industry has continued unabated since we published a set of reference architectures in late 2020. Nearly all key industry metrics hit record highs during the past year, and new product categories appeared faster than most data teams could reasonably keep track. Even the benchmarkwars and billboard battles returned.

To help data teams stay on top of the changes happening in the industry, we’re publishing in this post an updated set of data infrastructure architectures. They show the current best-in-class stack across both analytic and operational systems, as gathered from numerous operators we spoke with over the last year. Each architectural blueprint includes a summary of what’s changed since the prior version.

We’ll also attempt to explain why these changes are taking place. We argue that core data processing systems have remained relatively stable over the past year, while supporting tools and applications have proliferated rapidly. We explore the hypothesis that platforms are beginning to emerge in the data ecosystem, and that this helps explain the particular patterns we’re seeing in the evolution of the data stack. — Read More

#architecture