DeepSeek-V3

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable: throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at this https URL. — Read More
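To make the two headline ideas concrete, here is a toy NumPy sketch of top-k MoE routing with bias-based (auxiliary-loss-free) load balancing. This is an illustrative simplification, not DeepSeek's actual router: the function names, the random scores, and the step size are all made up for the example, and the real model computes learned token-to-expert affinities inside the transformer.

```python
import numpy as np

def topk_moe_gate(scores, bias, k=2):
    """Pick top-k experts per token using biased scores; the bias affects
    only which experts are selected, while the raw scores weight outputs."""
    biased = scores + bias                       # per-expert bias nudges routing
    topk = np.argsort(-biased, axis=-1)[:, :k]   # indices of the chosen experts
    gates = np.take_along_axis(scores, topk, axis=-1)
    gates = gates / gates.sum(axis=-1, keepdims=True)  # normalized mixing weights
    return topk, gates

def update_bias(bias, expert_load, target_load, step=0.01):
    """Auxiliary-loss-free balancing: lower the bias of overloaded experts
    and raise it for underloaded ones, instead of adding a balance loss."""
    return bias - step * np.sign(expert_load - target_load)

rng = np.random.default_rng(0)
scores = rng.random((4, 8))          # 4 tokens, 8 experts, positive affinities
bias = np.zeros(8)
topk, gates = topk_moe_gate(scores, bias, k=2)
load = np.bincount(topk.ravel(), minlength=8)   # how many tokens each expert got
bias = update_bias(bias, load, load.mean())
```

Because the bias is excluded from the output weighting, balancing pressure does not distort the gradient signal the way an auxiliary balance loss would, which is the motivation the abstract alludes to.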

#training

AI agents: The scientist’s new superpower | Stefan Harrer

Read More

#videos

Microsoft sues service for creating illicit content with its AI platform

Microsoft is accusing three individuals of running a “hacking-as-a-service” scheme that was designed to allow the creation of harmful and illicit content using the company’s platform for AI-generated content.

The foreign-based defendants developed tools specifically designed to bypass safety guardrails Microsoft has erected to prevent the creation of harmful content through its generative AI services, said Steven Masada, the assistant general counsel for Microsoft’s Digital Crimes Unit. They then compromised the legitimate accounts of paying customers and combined the two to create a fee-based platform people could use. — Read More

#legal

Google’s New AI Model Stuns OpenAI 

Read More

#videos

Gemini 2.0 Flash ushers in a new era of real-time multimodal AI

Google’s release of Gemini 2.0 Flash this week, offering users a way to interact live with video of their surroundings, has set the stage for what could be a pivotal shift in how enterprises and consumers engage with technology.

This release — alongside announcements from OpenAI, Microsoft, and others — is part of a transformative leap forward happening in the technology area called “multimodal AI.” The technology allows you to take video — or audio or images — that comes into your computer or phone, and ask questions about it.

It also signals an intensification of the competitive race among Google and its chief rivals — OpenAI and Microsoft — for dominance in AI capabilities. But more importantly, it feels like it is defining the next era of interactive, agentic computing. — Read More

#multi-modal

Benedict Evans: AI Eats the World

Read More

#videos

Google maps the future of AI agents: Five lessons for businesses

A new Google white paper, titled “Agents”, imagines a future where AI takes on a more active and independent role in business. Published without much fanfare in September, the 42-page document is now gaining attention on X.com (formerly Twitter) and LinkedIn.

It introduces the concept of AI agents — software systems designed to go beyond today’s AI models by reasoning, planning and taking actions to achieve specific goals. Unlike traditional AI systems, which generate responses based solely on pre-existing training data, AI agents can interact with external systems, make decisions and complete complex tasks on their own. — Read More
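The reason-plan-act loop described above can be sketched in a few lines. Everything here is hypothetical: the stub model stands in for an LLM planner, and the calculator is a toy external tool, not anything from Google's white paper.

```python
def calculator(expression: str) -> str:
    """A toy external tool the agent can call."""
    # eval is for demo only; never evaluate untrusted input this way
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def stub_model(goal: str, observations: list) -> dict:
    """Stands in for an LLM planner: decide the next action or finish."""
    if not observations:
        return {"action": "calculator", "input": "19 * 21"}
    return {"action": "finish", "input": observations[-1]}

def run_agent(goal: str, max_steps: int = 5) -> str:
    """Loop: reason about the goal, act through a tool, observe the result."""
    observations = []
    for _ in range(max_steps):
        decision = stub_model(goal, observations)      # reason + plan
        if decision["action"] == "finish":
            return decision["input"]                   # final answer
        tool = TOOLS[decision["action"]]
        observations.append(tool(decision["input"]))   # act + observe
    return "gave up"

print(run_agent("What is 19 * 21?"))  # → 399
```

The key difference from a plain chat model is the loop itself: the model's output is interpreted as an action against external systems, and the observation is fed back in before the next decision.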

#big7

What Does OpenAI’s Sam Altman Mean When He Says AGI Is Achievable?

Sam Altman started 2025 with a bold declaration: OpenAI has figured out how to create artificial general intelligence (AGI), a term commonly understood as the point where AI systems can comprehend, learn, and perform any intellectual task that a human can.

In a reflective blog post published over the weekend, he also said the first wave of AI agents could join the workforce this year, marking what he describes as a pivotal moment in technological history. — Read More

#strategy

Work smarter, not harder: Using the 80/20 principle in data analysis

Have you heard of the 80/20 rule, or the Pareto Principle? It says that roughly 80% of the effects come from 20% of the causes.

In most cases, a small percentage of efforts drive most of the results. Let’s apply this rule to data analysis, and work smarter, not harder!

Why is the 80/20 rule useful? It lets you focus on the few tasks that generate the most value for you and your organization. This saves time, increases efficiency, and makes you more useful at work. — Read More
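The Pareto check itself is a one-pass computation: sort contributions in descending order and accumulate until you cross the threshold. The data below is made up for illustration.

```python
def pareto_subset(contributions, threshold=0.80):
    """Return the keys whose combined contribution first reaches `threshold`."""
    total = sum(contributions.values())
    chosen, running = [], 0.0
    for key, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
        chosen.append(key)
        running += value
        if running / total >= threshold:   # crossed the 80% mark
            break
    return chosen

# Hypothetical revenue by client
revenue_by_client = {"A": 50, "B": 25, "C": 10, "D": 8, "E": 4, "F": 3}
print(pareto_subset(revenue_by_client))  # → ['A', 'B', 'C'] (85% of revenue)
```

Running this on your own task or value data tells you immediately which handful of items deserve the bulk of your analysis effort.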

#data-science

Superhuman performance of a large language model on the reasoning tasks of a physician

Performance of large language models (LLMs) on medical tasks has traditionally been evaluated using multiple choice question benchmarks. However, such benchmarks are highly constrained, saturated with repeated impressive performance by LLMs, and have an unclear relationship to performance in real clinical scenarios. Clinical reasoning, the process by which physicians employ critical thinking to gather and synthesize clinical data to diagnose and manage medical problems, remains an attractive benchmark for model performance. Prior LLMs have shown promise in outperforming clinicians in routine and complex diagnostic scenarios. We sought to evaluate OpenAI’s o1-preview model, a model designed to spend more computation at run time on chain-of-thought reasoning before generating a response. We characterize the performance of o1-preview with five experiments including differential diagnosis generation, display of diagnostic reasoning, triage differential diagnosis, probabilistic reasoning, and management reasoning, adjudicated by physician experts with validated psychometrics. Our primary outcome was comparison of the o1-preview output to identical prior experiments that have historical human controls and benchmarks of previous LLMs. Significant improvements were observed with differential diagnosis generation and quality of diagnostic and management reasoning. No improvements were observed with probabilistic reasoning or triage differential diagnosis. This study highlights o1-preview’s ability to perform strongly on tasks that require complex critical thinking such as diagnosis and management, while its performance on probabilistic reasoning tasks was similar to past models. New robust benchmarks and scalable evaluation of LLM capabilities compared to human physicians are needed, along with trials evaluating AI in real clinical settings. — Read More
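The probabilistic-reasoning tasks mentioned in the abstract typically ask for post-test probabilities after a diagnostic test result. For readers unfamiliar with that task, here is the underlying Bayes arithmetic; the numbers below are illustrative, not taken from the study.

```python
def posttest_probability(pretest, sensitivity, specificity, positive=True):
    """Bayes' theorem for updating disease probability after a test result."""
    if positive:
        p_result_disease = sensitivity            # true positive rate
        p_result_healthy = 1 - specificity        # false positive rate
    else:
        p_result_disease = 1 - sensitivity        # false negative rate
        p_result_healthy = specificity            # true negative rate
    numerator = pretest * p_result_disease
    return numerator / (numerator + (1 - pretest) * p_result_healthy)

# Illustrative numbers: 10% pretest probability, a test with 90% sensitivity
# and 80% specificity, and a positive result.
p = posttest_probability(0.10, 0.90, 0.80, positive=True)
print(round(p, 3))  # → 0.333
```

Even a good test applied to a low-prevalence condition leaves substantial uncertainty, which is exactly the kind of calibration these experiments probe.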

#augmented-intelligence