Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconnection bandwidth. DeepSeek-V3, trained on 2,048 NVIDIA H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inference at scale. This paper presents an in-depth analysis of the DeepSeek-V3/R1 model architecture and its AI infrastructure, highlighting key innovations such as Multi-head Latent Attention (MLA) for enhanced memory efficiency, Mixture of Experts (MoE) architectures for optimized computation-communication trade-offs, FP8 mixed-precision training to unlock the full potential of hardware capabilities, and a Multi-Plane Network Topology to minimize cluster-level network overhead. Building on the hardware bottlenecks encountered during DeepSeek-V3’s development, we engage in a broader discussion with academic and industry peers on potential future hardware directions, including precise low-precision computation units, scale-up and scale-out convergence, and innovations in low-latency communication fabrics. These insights underscore the critical role of hardware and model co-design in meeting the escalating demands of AI workloads, offering a practical blueprint for innovation in next-generation AI systems. — Read More
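
The memory-side innovation, MLA, replaces the full per-head key/value cache with a small shared latent that is up-projected at attention time. Below is a minimal PyTorch sketch of that latent-compression idea; dimensions are illustrative, and it omits DeepSeek's decoupled RoPE path and causal masking, so it is a sketch of the concept rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    """Toy MLA-style attention: cache one small latent per token instead of full K/V."""
    def __init__(self, d_model=1024, n_heads=8, d_head=128, d_latent=256):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        self.q_proj = nn.Linear(d_model, n_heads * d_head)
        self.kv_down = nn.Linear(d_model, d_latent)        # compression: this latent is all we cache
        self.k_up = nn.Linear(d_latent, n_heads * d_head)  # re-expanded at attention time
        self.v_up = nn.Linear(d_latent, n_heads * d_head)
        self.out = nn.Linear(n_heads * d_head, d_model)

    def forward(self, x, latent_cache=None):
        B, T, _ = x.shape
        c_kv = self.kv_down(x)                             # (B, T, d_latent)
        if latent_cache is not None:                       # extend the cached latents during decoding
            c_kv = torch.cat([latent_cache, c_kv], dim=1)
        S = c_kv.shape[1]
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(c_kv).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(c_kv).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(y), c_kv                           # c_kv is the only KV state kept around
```

Caching d_latent numbers per token, shared across heads, instead of 2 × n_heads × d_head for full keys and values is where the memory saving comes from.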

#performance

The Top 5 domestic large models contend for supremacy in a decisive battle for AGI

China’s foundation model market has completely changed! Today, the players at the table are the “Top 5 Foundation Models”: Bytedance, Alibaba, Stepfun [阶跃星辰], Zhipu, and DeepSeek. Where will the decisive advantage lie in the next battle for the summit?

DeepSeek’s emergence out of nowhere has completely changed the global AI landscape.

Since then, not only has the competitive balance in large models between China and the United States shifted, but the landscape of the domestic large-model industry has also been reshaped in one stroke!

Looking at China’s market for large foundation models, we can see that today’s foundation model landscape has changed dramatically, evolving into a new top-five lineup –

Bytedance, Alibaba, Stepfun, Zhipu, and DeepSeek. — Read More

#china-ai

OpenAlpha_Evolve

OpenAlpha_Evolve is an open-source Python framework inspired by the groundbreaking research on autonomous coding agents like DeepMind’s AlphaEvolve. It’s a reimplementation of the core idea: an intelligent system that iteratively writes, tests, and improves code using Large Language Models (LLMs) like Google’s Gemini, guided by the principles of evolution. — Read More
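
A toy sketch of that write-test-improve loop, under stated assumptions: llm_propose below is a hypothetical stand-in for whatever LLM call the framework actually wraps (it is not OpenAlpha_Evolve’s API), fitness is simply the fraction of test cases a candidate passes, and selection keeps the fittest half of the population each generation.

```python
import random

def fitness(program_src: str, test_cases) -> float:
    """Fraction of test cases the candidate passes; 0.0 if it fails to run."""
    namespace = {}
    try:
        exec(program_src, namespace)                  # candidate is expected to define solve(x)
        solve = namespace["solve"]
        return sum(solve(x) == y for x, y in test_cases) / len(test_cases)
    except Exception:
        return 0.0

def evolve(llm_propose, test_cases, generations=10, population=8):
    """llm_propose(parent_src_or_None, instruction) -> program source (hypothetical LLM wrapper)."""
    pool = [llm_propose(None, "write an initial solution") for _ in range(population)]
    for _ in range(generations):
        ranked = sorted(pool, key=lambda p: fitness(p, test_cases), reverse=True)
        survivors = ranked[: population // 2]         # selection
        children = [llm_propose(random.choice(survivors), "improve this program")
                    for _ in range(population - len(survivors))]  # LLM-driven mutation
        pool = survivors + children
    return max(pool, key=lambda p: fitness(p, test_cases))
```

Executing generated code with exec is only reasonable inside a sandbox; a real evaluation harness would isolate candidates from the host.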

#devops

Large Language Models Are More Persuasive Than Incentivized Human Persuaders

We directly compare the persuasion capabilities of a frontier large language model (LLM; Claude Sonnet 3.5) against incentivized human persuaders in an interactive, real-time conversational quiz setting. In this preregistered, large-scale incentivized experiment, participants (quiz takers) completed an online quiz where persuaders (either humans or LLMs) attempted to persuade quiz takers toward correct or incorrect answers. We find that LLM persuaders achieved significantly higher compliance with their directional persuasion attempts than incentivized human persuaders, demonstrating superior persuasive capabilities in both truthful (toward correct answers) and deceptive (toward incorrect answers) contexts. We also find that LLM persuaders significantly increased quiz takers’ accuracy, leading to higher earnings, when steering quiz takers toward correct answers, and significantly decreased their accuracy, leading to lower earnings, when steering them toward incorrect answers. Overall, our findings suggest that AI’s persuasion capabilities already exceed those of humans who have real-money bonuses tied to performance. Our findings of increasingly capable AI persuaders thus underscore the urgency of emerging alignment and governance frameworks. — Read More

#human

The Simulation Says the Orioles Should Be Good

The Baltimore Orioles should be good, but they are not good. At 15-24, they are one of the worst teams in all of Major League Baseball this season, an outcome thus far that fans, experts, and the team itself will tell you is either statistically improbable or verging on statistically impossible, based on the thousands upon thousands of simulations run before the season started.

Trying to figure out why this is happening is tearing the fanbase apart and has turned a large portion of it against management, which has put a huge amount of its faith, on-field strategy, and player-acquisition decision-making into predictive AI systems, advanced statistics, probabilistic simulations, expected-value-positive moves, and new-age baseball thinking in which statistical models and AI systems try to reduce human baseball players to robotic, predictable chess pieces. Teams have more or less tried to “solve” baseball the way researchers try to solve games with AI. Technology has changed not just how teams play the game, but how fans like me experience it, too. — Read More

#strategy

Company Regrets Replacing All Those Pesky Human Workers With AI, Just Wants Its Humans Back

Two years after partnering with OpenAI to automate marketing and customer service jobs, financial tech startup Klarna says it’s longing for human connection again.

Once gunning to be OpenAI CEO Sam Altman’s “favorite guinea pig,” Klarna is now plotting a big recruitment drive after its AI customer service agents couldn’t quite hack it.

The buy-now-pay-later company had previously shredded its marketing contracts in 2023, followed by its customer service team in 2024, which it proudly began replacing with AI agents. Now, the company says it imagines an “Uber-type of setup” to fill those ranks, with gig workers logging in remotely to argue with customers from the comfort of their own homes. — Read More

#strategy

INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning

We introduce INTELLECT-2, the first globally distributed reinforcement learning (RL) training run of a 32 billion parameter language model. Unlike traditional centralized training efforts, INTELLECT-2 trains a reasoning model using fully asynchronous RL across a dynamic, heterogeneous swarm of permissionless compute contributors.

To enable a training run with this unique infrastructure, we built various components from scratch: we introduce PRIME-RL, our training framework purpose-built for distributed asynchronous reinforcement learning, based on top of novel components such as TOPLOC, which verifies rollouts from untrusted inference workers, and SHARDCAST, which efficiently broadcasts policy weights from training nodes to inference workers.

Beyond infrastructure components, we propose modifications to the standard GRPO training recipe and data filtering techniques that were crucial to achieving training stability and ensuring that our model successfully learned its training objective, thus improving upon QwQ-32B, the state-of-the-art reasoning model in the 32B parameter range.
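
For context on the baseline being modified, here is a minimal sketch of the standard GRPO advantage (the vanilla recipe, not INTELLECT-2’s adjusted one): every prompt gets a group of sampled completions, and each completion’s advantage is its reward normalized by the group’s mean and standard deviation, so no learned value function is needed.

```python
import torch

def grpo_advantages(group_rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages for the G completions sampled for one prompt.

    group_rewards: shape (G,), e.g. binary verifier scores for each rollout.
    """
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)

# Example: four rollouts of the same task, two verified correct.
print(grpo_advantages(torch.tensor([1.0, 0.0, 0.0, 1.0])))  # correct rollouts get positive advantage
```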

We open-source INTELLECT-2 along with all of our code and data, hoping to encourage and enable more open research in the field of decentralized training. — Read More

#training

Meet AlphaEvolve, the Google AI that writes its own code—and just saved millions in computing costs

Google DeepMind today pulled the curtain back on AlphaEvolve, an artificial-intelligence agent that can invent brand-new computer algorithms — then put them straight to work inside the company’s vast computing empire.

AlphaEvolve pairs Google’s Gemini large language models with an evolutionary approach that tests, refines, and improves algorithms automatically. The system has already been deployed across Google’s data centers, chip designs, and AI training systems — boosting efficiency and solving mathematical problems that have stumped researchers for decades.

AlphaEvolve is a Gemini-powered AI coding agent that is able to make new discoveries in computing and mathematics. — Read More

#devops

Memory Layers at Scale

Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated capacity to store and retrieve information cheaply. This work takes memory layers beyond proof-of-concept, proving their utility at contemporary scale. On downstream tasks, language models augmented with our improved memory layer outperform dense models with more than twice the computation budget, as well as mixture-of-expert models when matched for both compute and parameters. We find gains are especially pronounced for factual tasks. We provide a fully parallelizable memory layer implementation, demonstrating scaling laws with up to 128B memory parameters, pretrained to 1 trillion tokens, comparing to base models with up to 8B parameters. — Read More
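
A naive sketch of that trainable key-value lookup, with illustrative sizes: each token scores the memory keys, gathers only its top-k value vectors, and adds the result back residually, so the large value table contributes parameters without enlarging the dense feed-forward blocks. The paper’s product-key factorization, which keeps the key search itself cheap as the memory grows, is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NaiveMemoryLayer(nn.Module):
    """Toy memory layer: each token reads a sparse top-k mixture of value slots."""
    def __init__(self, d_model=512, num_keys=16384, k=32):
        super().__init__()
        self.k = k
        self.query = nn.Linear(d_model, d_model)
        self.keys = nn.Parameter(torch.randn(num_keys, d_model) * 0.02)
        self.values = nn.Embedding(num_keys, d_model)   # the bulk of the extra parameters

    def forward(self, x):                               # x: (batch, seq, d_model)
        q = self.query(x)
        scores = q @ self.keys.T                        # naive full scoring; product keys avoid this cost
        top_scores, top_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)         # sparse mixture over the selected slots
        read = (weights.unsqueeze(-1) * self.values(top_idx)).sum(dim=-2)
        return x + read                                 # residual add, like a feed-forward block
```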

#performance

DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level

Through a joint collaboration between the Agentica team and Together AI, we release DeepCoder-14B-Preview, a code reasoning model finetuned from Deepseek-R1-Distilled-Qwen-14B via distributed RL. It achieves an impressive 60.6% Pass@1 accuracy on LiveCodeBench (+8% improvement over the base model), matching the performance of o3-mini-2025-01-31 (Low) and o1-2024-12-17 with just 14B parameters. We’ve open-sourced our dataset, code, training logs, and systems optimizations for everyone to progress on scaling and accelerating intelligence with RL. — Read More
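
As a reminder of what the headline metric measures, below is the standard unbiased pass@k estimator (Chen et al., 2021) commonly used for this kind of reporting; with k = 1 it reduces to the average fraction of sampled generations that pass the tests. Whether LiveCodeBench applies exactly this estimator is an assumption here.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k draws from n samples
    (c of which pass the tests) is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 this is simply c / n, the per-sample pass rate.
print(pass_at_k(n=16, c=10, k=1))   # 0.625
```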

#devops