Large Language Diffusion Models
Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA models distributions through a forward data masking process and a reverse process, parameterized by a vanilla Transformer to predict masked tokens. By optimizing a likelihood bound, it provides a principled generative approach for probabilistic inference. Across extensive benchmarks, LLaDA demonstrates strong scalability, outperforming our self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs like LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue. Moreover, LLaDA addresses the reversal curse, surpassing GPT-4o in a reversal poem completion task. Our findings establish diffusion models as a viable and promising alternative to ARMs, challenging the assumption that the key LLM capabilities discussed above are inherently tied to ARMs. — Read More
Project page and code: this https URL.
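The training objective the abstract describes is compact enough to sketch. Here is a minimal, hypothetical PyTorch rendering of a LLaDA-style masked-diffusion loss; the `MASK_ID`, the model interface, and the exact normalization are my assumptions, not the paper's released code:

```python
# Hypothetical sketch of a LLaDA-style training step: mask tokens with a
# random ratio t, predict them with a vanilla Transformer, weight by 1/t.
import torch
import torch.nn.functional as F

MASK_ID = 0  # placeholder; use the tokenizer's actual [MASK] id

def masked_diffusion_loss(model, x0):
    """x0: (batch, seq_len) token ids; model returns per-position logits."""
    b, l = x0.shape
    # Forward process: sample t ~ U(0, 1] per sequence and mask each
    # token independently with probability t.
    t = torch.rand(b, 1, device=x0.device).clamp_min(1e-3)
    is_masked = torch.rand(b, l, device=x0.device) < t
    xt = torch.where(is_masked, torch.full_like(x0, MASK_ID), x0)

    # Reverse process: predict the original tokens at the masked positions.
    logits = model(xt)  # (b, l, vocab)
    token_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), x0.reshape(-1), reduction="none"
    ).reshape(b, l)

    # Likelihood bound: cross-entropy over masked positions, weighted by 1/t.
    per_seq = (token_loss * is_masked).sum(dim=1) / (t.squeeze(1) * l)
    return per_seq.mean()
```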
The Widespread Adoption of Large Language Model-Assisted Writing Across Society
The recent advances in large language models (LLMs) have attracted significant public and policymaker interest in their adoption patterns. In this paper, we systematically analyze LLM-assisted writing across four domains (consumer complaints, corporate communications, job postings, and international organization press releases) from January 2022 to September 2024. Our dataset includes 687,241 consumer complaints, 537,413 corporate press releases, 304.3 million job postings, and 15,919 United Nations (UN) press releases. Using a robust population-level statistical framework, we find that LLM usage surged following the release of ChatGPT in November 2022. By late 2024, roughly 18% of financial consumer complaint text appears to be LLM-assisted, with adoption patterns spread broadly across regions and slightly higher in urban areas. For corporate press releases, up to 24% of the text is attributable to LLMs. In job postings, LLM-assisted writing accounts for just below 10% in small firms, and is even more common among younger firms. UN press releases also reflect this trend, with nearly 14% of content being generated or modified by LLMs. Although adoption climbed rapidly post-ChatGPT, growth appears to have stabilized by 2024, reflecting either saturation in LLM adoption or the increasing subtlety of more advanced models. Our study shows the emergence of a new reality in which firms, consumers, and even international organizations substantially rely on generative AI for communications. — Read More
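For intuition on what such a population-level estimate can look like, here is a toy sketch in the spirit of distributional mixture estimation: model observed word frequencies as a blend of a human baseline and an LLM-influenced distribution, and fit the blend weight by maximum likelihood. All names and numbers below are illustrative assumptions, not the paper's actual estimator:

```python
# Toy population-level mixture estimate: pick the mixture weight alpha
# that maximizes the likelihood of observed corpus word counts under
# (1 - alpha) * p_human + alpha * p_llm.
import numpy as np
from scipy.optimize import minimize_scalar

def estimate_llm_fraction(counts, p_human, p_llm):
    """counts: observed occurrences of each tracked word in the corpus.
    p_human / p_llm: per-word probabilities under each reference model."""
    def neg_log_lik(alpha):
        mix = (1 - alpha) * p_human + alpha * p_llm
        return -np.sum(counts * np.log(mix + 1e-12))
    res = minimize_scalar(neg_log_lik, bounds=(0.0, 1.0), method="bounded")
    return res.x  # estimated share of LLM-assisted text

# Made-up distributions over four marker words, purely for illustration:
p_h = np.array([0.40, 0.30, 0.20, 0.10])   # human baseline usage
p_m = np.array([0.10, 0.20, 0.30, 0.40])   # LLM-heavy usage
obs = np.array([350, 280, 220, 150])       # observed corpus counts
print(estimate_llm_fraction(obs, p_h, p_m))  # prints ~0.17 on this toy data
```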
Amazon is reportedly developing its own AI ‘reasoning’ model
According to Business Insider, Amazon is developing an AI model that incorporates advanced "reasoning" capabilities, similar to OpenAI's o3-mini and Chinese AI lab DeepSeek's R1. The model may launch as soon as June under Amazon's Nova brand, which the company introduced at its re:Invent developer conference last year. — Read More
Diffusion Models Enter the Large Language Arena as Inception Labs Unveils Mercury
For years, large language models (LLMs) have operated within a well-defined paradigm: autoregression. Each word or token is generated sequentially, one at a time, creating a fundamental bottleneck in speed and efficiency. This has led to increasing inference costs and latency issues as AI-generated text becomes more complex. Now, Inception Labs, a startup co-founded by Stanford professor Stefano Ermon and his colleagues Volodymyr Kuleshov and Aditya Grover, is introducing a different approach: diffusion large language models (dLLMs). Their first commercial-scale product, Mercury, aims to disrupt the status quo by offering significantly faster and more efficient text generation.
Traditional LLMs, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Haiku, generate text in a left-to-right fashion, with each token dependent on those before it. …“Diffusion models start with a rough estimate of data and refine it all at once,” Ermon told TechCrunch. “With LLMs, you cannot generate the second word until you’ve generated the first one, and you cannot generate the third one until you generate the first two.” By leveraging diffusion’s unique structure, Mercury’s dLLMs aim to bypass these constraints and deliver responses more efficiently than their autoregressive counterparts. — Read More
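Ermon's contrast is easy to make concrete. A masked-diffusion LM can start from an all-masked sequence and fill in every position over a fixed number of parallel refinement passes, committing its most confident predictions each round, whereas an autoregressive decoder needs one pass per token. Mercury's actual sampler is not public, so the sketch below is a generic version of the technique with assumed names:

```python
# Generic parallel-refinement decoding for a masked-diffusion LM:
# `steps` forward passes total, versus `length` passes for autoregression.
import torch

MASK_ID = 0  # placeholder; use the tokenizer's actual [MASK] id

@torch.no_grad()
def diffusion_decode(model, length, steps=8, device="cpu"):
    """Start fully masked; each pass predicts every position at once and
    commits only the most confident still-masked tokens."""
    x = torch.full((1, length), MASK_ID, dtype=torch.long, device=device)
    for step in range(steps):
        conf, pred = model(x).softmax(-1).max(-1)  # one pass, all positions
        masked = x == MASK_ID
        remaining = int(masked.sum())
        k = -(-remaining // (steps - step))        # ceil: commit this many now
        conf = conf.masked_fill(~masked, -1.0)     # never re-commit a token
        top = conf.view(-1).topk(k).indices        # most confident slots
        x.view(-1)[top] = pred.view(-1)[top]
    return x
```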
Inception emerges from stealth with a new type of AI model
Inception, a new Palo Alto-based company started by Stanford computer science professor Stefano Ermon, claims to have developed a novel AI model based on “diffusion” technology. Inception calls it a diffusion-based large language model, or a “DLM” for short.
The generative AI models receiving the most attention now can be broadly divided into two types: large language models (LLMs) and diffusion models. LLMs are used for text generation. Meanwhile, diffusion models, which power AI systems like Midjourney and OpenAI’s Sora, are mainly used to create images, video, and audio.
Inception’s model offers the capabilities of traditional LLMs, including code generation and question-answering, but with significantly faster performance and reduced computing costs, according to the company.
Ermon told TechCrunch that he has been studying how to apply diffusion models to text for a long time in his Stanford lab. — Read More
Vibe Coding and the Future of Software Engineering
Vibe coding (or vibeware) is making the rounds on X right now. To the best of my knowledge, Andrej Karpathy started the "meme" in this X post. I find it well written and hilarious, and it seems to have taken off.
Karpathy: “There’s a new kind of coding I call “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It’s possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like “decrease the padding on the sidebar by half” because I’m too lazy to find it. I “Accept All” always, I don’t read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I’d have to really read through it for a while. Sometimes the LLMs can’t fix a bug so I just work around it or ask for random changes until it goes away. It’s not too bad for throwaway weekend projects, but still quite amusing. I’m building a project or webapp, but it’s not really coding – I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.” — Read More
“It’s a lemon”—OpenAI’s largest AI model ever arrives to mixed reviews
The verdict is in: OpenAI's newest and most capable traditional AI model, GPT-4.5, is big, expensive, and slow, providing marginally better performance than GPT-4o at 30x the cost for input and 15x the cost for output. The new model seems to confirm longstanding rumors of diminishing returns from unsupervised pre-training of LLMs, and suggests that the so-called "scaling laws" cited by many for years may have met their natural end. — Read More
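The "diminishing returns" claim is, in a sense, built into the power-law shape of the scaling fits themselves. With assumed constants (loosely in the spirit of published Chinchilla-style fits, not fitted to GPT-4.5), each additional 10x of compute buys a smaller absolute drop in loss than the previous 10x:

```python
# Illustrative only: with a power-law fit L(C) = L_inf + a * C**(-alpha),
# the absolute loss improvement per decade of compute shrinks geometrically.
L_INF, A, ALPHA = 1.7, 8.0, 0.15   # assumed constants for illustration

def loss(compute_flops):
    return L_INF + A * compute_flops ** (-ALPHA)

for exp in range(21, 27):          # 1e21 .. 1e26 FLOPs
    c = 10.0 ** exp
    print(f"1e{exp} FLOPs: loss={loss(c):.4f}  "
          f"gain over previous 10x={loss(c / 10) - loss(c):.4f}")
```

On a curve like this the loss never hits a hard wall, but the cost of each increment grows geometrically, which is the practical sense in which returns diminish.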
Musk, DOGE, and the AI-Fueled Plan to Fire Everybody
What is DOGE? Officially, it’s the “Department of Government Efficiency,” intended to find and eliminate government fraud and waste; officially, it’s also a joke, named after an old meme. DOGE doesn’t just emit mixed signals — the incoherent messaging is right there in the name. It’s part of the plan and, for supporters, part of the fun. When necessary, the government argues that what DOGE is doing is just common sense, that Elon Musk is spearheading a “comprehensive forensic audit of every department and agency in the federal government,” and that the administration has a “commitment to an efficient and accountable federal workforce.” Nearly as often, though, the mask slips or gets pulled off and thrown on the ground.
…AI executives talk about work and labor in general. In AI, we’re getting some mixed messages too — the people working on this stuff are excited but worried. Founders and CEOs talk about glorious abundance with the public and tease concentrated returns to investors. They muse about automation and the future of work, raise alarms, fund alignment research, and talk about existential risk. Altman, for his part, has commissioned research into the plausibility and effects of UBI for a post-AI world. A bit like DOGE, however, OpenAI’s conflicted identity is embodied in its name and concept. The company was a nonprofit with the mission to “advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.” By 2019, it was teaming up with Microsoft to raise tens of billions of dollars. Now, it’s in the process of converting into a for-profit company. Just as DOGE is pursuing something more than simple increases in efficiency, AI firms are pitching something more than simple increases in productivity. — Read More