One question I often get is what makes the Product Data Scientist role at Meta special. My answer has always been “You are by default a product leader, navigating product directions with data”. This is true across all levels, from new grads to directors. Data scientists at Meta don’t just analyze data; they transform business questions into data-driven product visions that help build better human connections.
The challenge? Product strategy development exists across a spectrum of conditions. Here I’ll explore how data scientists at Meta can drive product strategies across four distinct scenarios defined by data availability (low to high) and problem clarity (broad to concrete). — Read More
Where do Large Vision-Language Models Look at when Answering Questions?
Large Vision-Language Models (LVLMs) have shown promising performance in vision-language understanding and reasoning tasks. However, their visual understanding behaviors remain underexplored. A fundamental question arises: to what extent do LVLMs rely on visual input, and which image regions contribute to their responses? It is non-trivial to interpret the free-form generation of LVLMs due to their complicated visual architecture (e.g., multiple encoders and multi-resolution) and variable-length outputs. In this paper, we extend existing heatmap visualization methods (e.g., iGOS++) to support LVLMs for open-ended visual question answering. We propose a method to select visually relevant tokens that reflect the relevance between generated answers and input image. Furthermore, we conduct a comprehensive analysis of state-of-the-art LVLMs on benchmarks designed to require visual information to answer. Our findings offer several insights into LVLM behavior, including the relationship between focus region and answer correctness, differences in visual attention across architectures, and the impact of LLM scale on visual understanding. The code and data are available at this https URL. — Read More
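To make the “how much does the answer rely on the image?” question concrete, here is a rough sketch that scores an already-generated answer with and without the image. This is a simplified proxy, not the iGOS++-based attribution the paper extends, and `score_answer` is a hypothetical helper standing in for an LVLM scoring call.

```python
# Illustrative only: a crude proxy for how much a generated answer depends on the image.
# `score_answer(image, question, answer_tokens)` is a hypothetical helper returning
# per-token log-probs of the answer under a given (image, question) pair.
import numpy as np

def image_reliance(score_answer, image, blank_image, question, answer_tokens):
    """Per-token drop in log-prob when the image is replaced by a blank baseline."""
    with_img = np.asarray(score_answer(image, question, answer_tokens))
    without_img = np.asarray(score_answer(blank_image, question, answer_tokens))
    drop = with_img - without_img           # > 0 means the image helped that token
    visually_relevant = np.argsort(-drop)   # token indices, most image-dependent first
    return drop, visually_relevant
```

Tokens with a large drop play a role similar to the paper’s visually relevant tokens, though the paper selects them with a more principled perturbation-based method.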
How To Become A Mechanistic Interpretability Researcher
Mechanistic interpretability (mech interp) is, in my incredibly biased opinion, one of the most exciting research areas out there. We have these incredibly complex AI models that we don’t understand, yet there are tantalizing signs of real structure inside them. Even partial understanding of this structure opens up a world of possibilities, yet is neglected by 99% of machine learning researchers. There’s so much to do!
I think mech interp is an unusually easy field to learn about on your own: there are plenty of educational materials, you don’t need much compute, and the feedback loops are short. But if you’re new, it can feel pretty intimidating to get started. This is my updated guide on how to skill up, get involved, and reach the point where you can do actual research, plus some advice on how to go from there to a career or academic role in the field.
This guide is deliberately highly opinionated. My goal is to convey a productive mindset and concrete steps that I think will work well, and give a sense of direction, rather than trying to give a fully broad overview or perfect advice. — Read More
Parallel AI Agents Are a Game Changer
I’ve been in this industry long enough to watch technologies come and go. I’ve seen the excitement around new frameworks, the promises of revolutionary tools, and the breathless predictions about what would “change everything.” Most of the time, these technologies turned out to be incremental improvements wrapped in marketing hyperbole.
But parallel agents? This is different. This is the first time I can say, without any exaggeration, that I’m witnessing technology that will fundamentally transform how we develop software. — Read More
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence by autonomously generating, refining, and learning from their own experiences. However, existing methods for training such models still rely heavily on vast human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward capabilities beyond human intelligence. To overcome this limitation, we introduce R-Zero, a fully autonomous framework that generates its own training data from scratch. Starting from a single base LLM, R-Zero initializes two independent models with distinct roles, a Challenger and a Solver. These models are optimized separately and co-evolve through interaction: the Challenger is rewarded for proposing tasks near the edge of the Solver’s capability, and the Solver is rewarded for solving increasingly challenging tasks posed by the Challenger. This process yields a targeted, self-improving curriculum without any pre-existing tasks or labels. Empirically, R-Zero substantially improves reasoning capability across different backbone LLMs, e.g., boosting Qwen3-4B-Base by +6.49 on math-reasoning benchmarks and +7.54 on general-domain reasoning benchmarks. — Read More
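To make the co-evolution loop concrete, here is a schematic sketch of one R-Zero-style round based on the description above; the reward shapes, the notion of “correctness” without ground truth, and the `challenger`/`solver` objects are simplifying assumptions, not the paper’s implementation.

```python
# Schematic sketch of one Challenger/Solver co-evolution round (simplified).
# Assumes `challenger` and `solver` are wrappers around the two LLMs with
# placeholder propose/attempt/update methods; correctness of an attempt is
# assumed to come from some self-derived signal (e.g. a majority-vote pseudo-label).

def r_zero_round(challenger, solver, n_tasks=64, n_attempts=8):
    tasks = [challenger.propose_task() for _ in range(n_tasks)]

    challenger_rewards, solver_batches = [], []
    for task in tasks:
        attempts = [solver.attempt(task) for _ in range(n_attempts)]
        solve_rate = sum(a.is_correct for a in attempts) / n_attempts

        # Challenger: highest reward when the task sits near the edge of the
        # Solver's capability (solve rate ~0.5), lowest when trivial or impossible.
        challenger_rewards.append(1.0 - 2.0 * abs(solve_rate - 0.5))

        # Solver: trained on its own attempts at the Challenger's tasks.
        solver_batches.append((task, attempts))

    challenger.update(tasks, challenger_rewards)  # e.g. a policy-gradient RL step
    solver.update(solver_batches)                 # e.g. RL on self-generated labels
```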
Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence
This paper examines changes in the labor market for occupations exposed to generative artificial intelligence using high-frequency administrative data from the largest payroll software provider in the United States. We present six facts that characterize these shifts. We find that since the widespread adoption of generative AI, early-career workers (ages 22-25) in the most AI-exposed occupations have experienced a 13 percent relative decline in employment even after controlling for firm-level shocks. In contrast, employment for workers in less exposed fields and more experienced workers in the same occupations has remained stable or continued to grow. We also find that adjustments occur primarily through employment rather than compensation. Furthermore, employment declines are concentrated in occupations where AI is more likely to automate, rather than augment, human labor. Our results are robust to alternative explanations, such as excluding technology-related firms and excluding occupations amenable to remote work. These six facts provide early, large-scale evidence consistent with the hypothesis that the AI revolution is beginning to have a significant and disproportionate impact on entry-level workers in the American labor market. — Read More
The Evidence That AI Is Destroying Jobs For Young People Just Got Stronger
In a moment with many important economic questions and fears, I continue to find this among the more interesting mysteries about the US economy in the long run: Is artificial intelligence already taking jobs from young people?
If you’ve been casually following the debate over AI and its effect on young graduates’ employment, you could be excused for thinking that the answer to that question is “possibly,” or “definitely yes,” or “almost certainly no.”
… To be honest with you, I considered this debate well and truly settled. No, I’d come to think, AI is probably not wrecking employment for young people. But now, I’m thinking about changing my mind again. — Read More
Understanding LLMs: Insights from Mechanistic Interpretability
Since the release of ChatGPT in 2022, large language models (LLMs) based on the transformer architecture, such as ChatGPT, Gemini, and Claude, have transformed the world with their ability to produce high-quality, human-like text and, more recently, images and videos. Yet behind this incredible capability lies a profound mystery: we don’t understand how these models work.
The reason is that LLMs aren’t built like traditional software. A traditional program is designed by human programmers and written in explicit, human-readable code. But LLMs are different. Instead of being programmed, LLMs are automatically trained to predict the next word on vast amounts of internet text, growing a complex network of trillions of connections that enable them to perform tasks and understand language. This training process automatically creates emergent knowledge and abilities, but the resulting model is usually messy, complex and incomprehensible since the training process optimizes the model for performance but not interpretability or ease of understanding.
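As a concrete illustration of that training objective, here is a minimal next-token-prediction loss in PyTorch; the single embedding plus linear head is a toy stand-in for a full transformer stack, but the objective is the same one LLMs are trained on.

```python
# Minimal sketch of the next-token-prediction objective (toy model, not a real LLM).
import torch
import torch.nn.functional as F

vocab_size, d_model = 50_000, 512
embed = torch.nn.Embedding(vocab_size, d_model)
lm_head = torch.nn.Linear(d_model, vocab_size)   # stand-in for the transformer stack

def next_token_loss(token_ids: torch.Tensor) -> torch.Tensor:
    """token_ids: (batch, seq_len) integer tokens from internet text."""
    hidden = embed(token_ids)                     # a real model applies attention blocks here
    logits = lm_head(hidden)                      # (batch, seq_len, vocab_size)
    # Every position is trained only to predict the *next* token.
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, vocab_size),   # predictions for positions 0..n-2
        token_ids[:, 1:].reshape(-1),             # targets are the tokens at 1..n-1
    )
```

Everything the model “knows” emerges as a side effect of minimizing this loss at scale, which is why the resulting internals are not organized for human comprehension.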
The field of mechanistic interpretability aims to study LLMs and reverse engineer the knowledge and algorithms they use to perform tasks, a process that is more like biology or neuroscience than computer science.
The goal of this post is to provide insights into how LLMs work using findings from the field of mechanistic interpretability. — Read More
Context Engineering Series: Building Better Agentic RAG Systems
We’ve moved far beyond prompt engineering. Now we’re designing portfolios of tools (directory listing, file editing, web search), and weighing slash commands like /pr-create that inject prompts, versus specialized sub-agents like @pr-creation-agent, versus a single AGENT.md, in systems that work across IDEs, command lines, GitHub, and Slack.
Context engineering is designing tool responses and interaction patterns that give agents situational awareness to navigate complex information spaces effectively.
To understand what this means practically, let’s look at how systems have evolved:
Before: We precomputed what chunks needed to be put into context, injected them, and then asked the system to reason about the chunks.
… Now: Agents are incredibly easy to build because all you need is a messages array and a bunch of tools. — Read More
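Here is a minimal sketch of that “messages array plus tools” loop. `call_llm` is a placeholder for any chat API that supports tool calling (an OpenAI-style response shape is assumed), and the two tools are toy stubs; the loop structure, not the specific API, is the point.

```python
# Minimal agent loop: a messages array plus a bunch of tools (sketch, not a real SDK).
import json

def list_directory(path: str) -> str:
    return "README.md\nsrc/"                # toy stub

def web_search(query: str) -> str:
    return "no results (stub)"              # toy stub

TOOLS = {"list_directory": list_directory, "web_search": web_search}

def run_agent(call_llm, user_request: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        reply = call_llm(messages, tools=list(TOOLS))   # model answers or requests a tool
        messages.append(reply)
        if not reply.get("tool_calls"):                 # no tool call -> final answer
            return reply["content"]
        for call in reply["tool_calls"]:                # execute each requested tool
            result = TOOLS[call["name"]](**json.loads(call["arguments"]))
            # The tool *response* is where context engineering happens: what you
            # return here is the agent's situational awareness for the next step.
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    return "stopped: step budget exhausted"
```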
The Parallelism Mesh Zoo
When training large-scale LLMs, there is a large assortment of parallelization strategies you can employ to scale your training runs to work on more GPUs. There are already a number of good resources for understanding how to parallelize your models: I particularly recommend How To Scale Your Model and The Ultra-Scale Playbook. The purpose of this blog post is to discuss parallelization strategies in a more schematic way by focusing only on how they affect your device mesh. The device mesh is an abstraction used by both PyTorch and JAX that takes your GPUs (however many of them you’ve got in your cluster!) and organizes them into an N-D tensor that expresses how the devices communicate with each other. When we parallelize computation, we shard a tensor along one dimension of the mesh, and then do collectives along that dimension when there are nontrivial dependencies between shards. Being able to explain why a device mesh is set up the way it is for a collection of parallelization strategies is a good check of whether you understand how the parallelization strategies work in the first place! (Credit: This post was influenced by Visualizing 6D Mesh Parallelism.) — Read More
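As a small concrete example of the device-mesh abstraction, here is how an 8-GPU job might be organized into a 2-D mesh with PyTorch’s torch.distributed.device_mesh (JAX’s jax.sharding.Mesh plays the same role). The (2, 4) shape and the axis names are arbitrary choices for illustration, and the snippet assumes torch.distributed has already been initialized (e.g. via torchrun).

```python
# Sketch: organize 8 GPUs into a 2-D device mesh (2-way data parallel x 4-way tensor parallel).
# Assumes a distributed job launched with torchrun across 8 CUDA devices.
from torch.distributed.device_mesh import init_device_mesh

mesh = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))

# Each parallelization strategy owns one mesh dimension: tensors sharded along "tp"
# run their collectives (all-gather / reduce-scatter) within the 4-GPU "tp" groups,
# while gradient averaging for data parallelism runs over the 2-GPU "dp" groups.
dp_group = mesh.get_group("dp")   # process group spanning the data-parallel dimension
tp_group = mesh.get_group("tp")   # process group spanning the tensor-parallel dimension
```

Adding another strategy (pipeline, context, or expert parallelism) generally means adding another mesh dimension, which is exactly the schematic view the post takes.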