Kimi K2.5

Artificial Analysis calls Kimi the new leading open weights model, ‘now closer than ever to the frontier’ behind only OpenAI, Anthropic and Google.

Kimi K2.5 gets to top some benchmarks: HLE-Full with tools (50%), BrowseComp with Agent Swarm (78%), OCRBench (92%), OmniDocBench 1.5 (89%), MathVista (90%) and InfoVQA (93%). It is not too far behind on AIME 2025 (96% vs. 100%), SWE-Bench (77% vs. 81%) and GPQA-Diamond (88% vs. 92%).

[B]enchmarks are highly useful, but easy to overinterpret.

Inference is cheap, and speed is similar to Gemini 3 Pro, modestly faster than Opus. — Read More

#performance

Enterprises Don’t Have an AI Problem. They Have an Architecture Problem

Over the last year, I keep hearing the same statements in meetings, reviews, and architecture forums:

“We’re doing AI.” “We have a chatbot now.” “We’ve deployed an agent.”

When I look a little closer, what most organizations really have is not enterprise AI. They have a tool.

Usually it is a chatbot, or a search assistant, or a workflow automation, or a RAG system. All of these are useful. I have built many of them myself. But none of these, by themselves, represent enterprise AI architecture.

AI is not a feature. AI is not a product.

AI is a new enterprise capability layer. And in large organizations, capability layers must be architected. — Read More

#strategy

Before ChatGPT, this simple machine changed everything

Today’s neural networks feel almost magical.
They write, see, reason, and talk to us like nothing before.

But all of this traces back to one extremely simple machine.

When this machine appeared in the late 1950s, it quietly changed how people thought about intelligence. — Read More
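
The late-1950s machine in question is presumably Rosenblatt's perceptron. For readers who have never seen one, here is a minimal sketch of the idea: a weighted sum, a hard threshold, and a simple error-driven update rule. This is an illustration of the textbook formulation, not code from the linked article.

```python
# Minimal perceptron: a weighted sum, a hard threshold, and the classic update rule.
# Illustrative sketch of the late-1950s idea, not code from the linked article.

def predict(weights, bias, x):
    """Fire (return 1) if the weighted sum of the inputs crosses the threshold."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation >= 0 else 0

def train(samples, labels, epochs=10, lr=0.1):
    """Perceptron learning rule: nudge weights toward misclassified examples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)   # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Learn a linearly separable function (logical OR).
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 1]
w, b = train(samples, labels)
print([predict(w, b, x) for x in samples])  # expected: [0, 1, 1, 1]
```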

#artificial-intelligence

The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+

This is the third and final blog in a three-part series on China’s open source community’s historical advancements since January 2025’s “DeepSeek Moment.” The first blog on strategic changes and open artifact growth is available here, and the second blog on architectural and hardware shifts is available here.

In this third article, we examine paths and trajectories of prominent Chinese AI organizations, and posit future directions for open source.

For AI researchers and developers who contribute to and rely on the open source ecosystem, and for policymakers trying to understand this rapidly changing environment, the key point is this: thanks to both intraorganizational and global community gains, open source will remain the dominant and popular approach for Chinese AI organizations for the near future. Openly sharing artifacts, from models to papers to deployment infrastructure, maps to a strategy aimed at large-scale deployment and integration. — Read More

#china-ai

Google Revealed “Attention Is All You Need” Part II

For years deep learning has followed one central idea. If we want smarter models, we stack more layers, run larger training, and scale everything upward. This simple formula has given us large language models that reason well and generate high-quality text. Yet they still share one huge weakness. They cannot learn on the fly. They cannot update themselves during use.

Any change needs heavy retraining, and this often destroys old knowledge.

Google Research recently published a paper called Nested Learning. It offers a very different way of thinking about how learning should work inside neural networks. The researchers claim that a model is not just a big stack of layers. It is a hierarchy of learners that operate at different timescales. If this view is correct, it could reshape how we build AI systems in the coming years. — Read More
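
As a loose illustration of "a hierarchy of learners that operate at different timescales", here is a toy sketch: fast parameters adapt on every example, and every `SLOW_EVERY` steps part of what they learned is consolidated into slow parameters. This is only an analogy with made-up names and update rules, not the Nested Learning algorithm from the paper.

```python
# Toy illustration of learners at different timescales (not the paper's actual
# method): fast parameters adapt every step via SGD, and every SLOW_EVERY steps
# part of what they learned is consolidated into slow parameters that change rarely.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
true_w = np.ones(dim)            # toy regression target
slow_w = np.zeros(dim)           # slow "outer" memory, updated rarely
fast_w = np.zeros(dim)           # fast "inner" memory, updated every step

FAST_LR = 0.1
SLOW_EVERY = 50                  # the slow learner acts 50x less often

for step in range(1, 1001):
    x = rng.normal(size=dim)
    y = x @ true_w
    error = x @ (slow_w + fast_w) - y     # prediction combines both timescales
    fast_w -= FAST_LR * error * x         # inner loop: adapt immediately

    if step % SLOW_EVERY == 0:            # outer loop: consolidate slowly
        slow_w += 0.5 * fast_w            # move half of the fast knowledge...
        fast_w *= 0.5                     # ...into the slow store

x_test = rng.normal(size=dim)
print("slow-memory norm:", round(float(np.linalg.norm(slow_w)), 3))
print("held-out error:", round(float(abs(x_test @ (slow_w + fast_w) - x_test @ true_w)), 6))
```

By the end of training, most of what was learned has migrated into the slow store, while the fast weights keep tracking whatever the recent examples demand.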

#big7

Synthetic pretraining

Pretraining data infrastructure used to be the most conservative part of a fast-moving AI world. Since GPT-3 we have been mostly scaling the usual mix of web crawls peppered with a few more select sources (including, controversially, digitized books). This is finally changing.

In 2025, several major releases used extensive synthetic datasets before mid-training: Minimax, Trinity, K2/K2.5, Nemotron-3 and, more speculatively, GPT-OSS. At Pleias we even experimented with fully synthetic training: Baguettotron/Monad were trained exclusively on a generalist synthetic environment, SYNTH.

At this point, a few clarifications are needed. To what extent does synthetic pretraining contrast with the already common use of synthetic methods in mid- and post-training? And what do we even mean by synthetic? Is it just another data source, or a much more significant shift in the way we envision data, model design and training infrastructure?

Overall, this post isn’t an introduction. It’s rather an attempt to bind together scattered strands of research and practice around synthetic pretraining—an area that is both fragmented in the open and secretive in frontier labs. I’ll strive to anchor definitions in the operational realities of building and scaling synthetic pipelines, then later move on to more speculative extrapolations. — Read More
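
To make the discussion concrete, here is a hedged sketch of the general shape of a synthetic pretraining pipeline: seeds are expanded into training documents by a generator model, filtered, and written out as shards. The prompt template, quality gate, and `generate` hookup are placeholders of my own, not the SYNTH pipeline or any lab's actual stack.

```python
# Hedged sketch of the general shape of a synthetic pretraining pipeline:
# seed -> prompt a generator model -> filter -> write JSONL shards.
# The prompt, filter, and generator hookup are placeholders, not the SYNTH pipeline.
import json
from typing import Callable

def make_prompt(seed: dict) -> str:
    # Seeds can be topics, extracted facts, or snippets of permissible source text.
    return (
        f"Write a clear, self-contained encyclopedia-style article about "
        f"{seed['topic']}, covering: {', '.join(seed['facets'])}."
    )

def keep(doc: str) -> bool:
    # Minimal quality gate; real pipelines add dedup, classifiers, and decontamination.
    return len(doc.split()) > 200 and "as an AI" not in doc

def build_shard(seeds: list[dict], out_path: str, generate: Callable[[str], str]) -> int:
    """Expand each seed into a training document and write the survivors to a shard."""
    kept = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for seed in seeds:
            doc = generate(make_prompt(seed))
            if keep(doc):
                f.write(json.dumps({"text": doc, "seed": seed}) + "\n")
                kept += 1
    return kept

# Usage, with whatever inference client you have (`my_model_call` is hypothetical):
# n = build_shard(
#     [{"topic": "photosynthesis", "facets": ["light reactions", "Calvin cycle"]}],
#     "synth_shard_000.jsonl",
#     generate=my_model_call,
# )
```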

#training

MaliciousCorgi: The Cute-Looking AI Extensions Leaking Code from 1.5 Million Developers

AI coding assistants are everywhere. They suggest code, explain errors, write functions, review pull requests. Every developer marketplace is flooded with them – ChatGPT wrappers, Copilot alternatives, code completion tools promising to 10x your productivity.

We install them without a second thought. They’re in the official marketplace. They have thousands of reviews. They work. So we grant them access to our workspaces, our files, our keystrokes – and assume they’re only using that access to help us code.

Not all of them are.

Our risk engine has identified two VS Code extensions – a campaign we’re calling MaliciousCorgi, with 1.5 million combined installs, both live in the marketplace right now – that work exactly as promised. They answer your coding questions. They explain your errors. They also capture every file you open, every edit you make, and send it all to servers in China. No consent. No disclosure. — Read More

#cyber

Inside China’s Real Advantage: Manufacturing at Scale

Observers often fixate on the most visible layer of China’s tech stack: consumer-facing conveniences like mobile payments, fifteen-minute food delivery, and dockless bikes. These can make for good investments — we regularly cover them at Tech Buzz China — but they are primarily business model innovations, increasingly familiar, and replicable with modest effort. In my opinion, they do not represent China’s true advantages, the ones that resist replication.

What proves far harder to replicate, and far more consequential, is the invisible layer: China’s manufacturing base. This is the part of the ecosystem that actually reshapes global supply chains, yet it remains the part most visitors never see and, in many cases, never think to see. — Read More

#china-ai

Taboola & Columbia University Research Shows GenAI Ads Perform Just as Well as Human-Made Content

While GenAI has revolutionised production speed and cost, its impact on actual performance has remained a subject of intense debate. The new study, titled “AI Ads That Work: How AI Creative Stacks Up Against Humans,” analysed hundreds of thousands of live ads running on Realize, Taboola’s performance advertising platform, totalling more than 500 million impressions and 3 million clicks.  — Read More

#strategy

Ads Candidate Generation using Behavioral Sequence Modeling

At Pinterest, ads are more than just advertisements; they are a vital part of the content ecosystem, designed to inspire users and connect them with products and ideas they love. Our goal is to surface the right ads at the right time, ensuring they seamlessly integrate into a user’s shopping journey and provide genuine value. To achieve this, understanding user behavior is paramount.

Delivering highly relevant ads in a dynamic environment like Pinterest presents unique challenges. Users’ interests and shopping intents evolve rapidly, making it crucial for our ad systems to adapt and anticipate their needs. Traditional ad targeting methods often rely on broad demographic data or static interest categories, which can fall short in capturing the nuanced and evolving nature of user behavior. — Read More
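
To make "behavioral sequence modeling" concrete, here is a hedged sketch: a small transformer encoder turns a user's recent action sequence into an embedding, which is then scored against ad embeddings to retrieve candidates. The action vocabulary, dimensions, and retrieval step are assumptions for illustration, not Pinterest's actual architecture.

```python
# Generic sketch of sequence-based candidate generation (not Pinterest's system):
# encode a user's recent actions with a small transformer, then retrieve the
# highest-scoring ad embeddings by dot product.
import torch
import torch.nn as nn

class UserSequenceEncoder(nn.Module):
    def __init__(self, num_actions: int, dim: int = 64, max_len: int = 50):
        super().__init__()
        self.action_emb = nn.Embedding(num_actions, dim)   # e.g. clicks, saves, searches
        self.pos_emb = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, actions: torch.Tensor) -> torch.Tensor:
        # actions: (batch, seq_len) of action ids, most recent last
        positions = torch.arange(actions.size(1), device=actions.device)
        h = self.action_emb(actions) + self.pos_emb(positions)
        h = self.encoder(h)
        return h[:, -1]                                     # last hidden state as user embedding

# Retrieval: score all candidate ads against the user embedding.
num_actions, num_ads, dim = 10_000, 5_000, 64
model = UserSequenceEncoder(num_actions, dim)
ad_embeddings = torch.randn(num_ads, dim)                   # learned jointly in practice

user_actions = torch.randint(0, num_actions, (1, 30))       # one user's last 30 actions
user_vec = model(user_actions)                              # (1, dim)
scores = user_vec @ ad_embeddings.T                         # (1, num_ads)
top_candidates = scores.topk(k=100).indices                 # candidates passed on to ranking
```

In production, the final scoring step would typically run against an approximate nearest-neighbor index rather than a dense matrix multiply over every ad.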

#devops