Artificial intelligence could dramatically change how nuclear crises are handled, according to a new study.
The pre-print study from King’s College London pitted OpenAI’s ChatGPT, Anthropic’s Claude and Google’s Gemini Flash against each other in simulated war games. Each large language model took on the role of a national leader commanding a nuclear-armed superpower in a Cold War-style crisis.
In every game, at least one model attempted to escalate the conflict by threatening to detonate a nuclear weapon. — Read More
Daily Archives: March 3, 2026
Large-Scale Online Deanonymization with LLMs
TL;DR: We show that LLM agents can figure out who you are from your anonymous online posts. Across Hacker News, Reddit, LinkedIn, and anonymized interview transcripts, our method identifies users with high precision – and scales to tens of thousands of candidates.
It has long been known that individuals can be uniquely identified by surprisingly few attributes, but exploiting this was rarely practical: data is often available only in unstructured form, and deanonymization used to require human investigators searching and reasoning over clues. We show that from a handful of comments, LLMs can infer where you live, what you do, and what your interests are, then search for you on the web. Our new research shows that this is not only possible but increasingly practical. — Read More
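The pipeline the authors describe has two stages: extract identifying attributes from free-text posts, then match them against candidate profiles. A minimal sketch of that shape, with simple keyword matching standing in for the LLM at both stages (all cue phrases, posts, and usernames below are invented for illustration):

```python
def extract_attributes(posts, cues):
    """Collect attributes whose cue phrase appears in any post.
    (A real system would prompt an LLM to infer these.)"""
    text = " ".join(posts).lower()
    return {attr for cue, attr in cues.items() if cue in text}

def rank_candidates(attrs, candidates):
    """Rank candidate profiles by Jaccard overlap with the
    attributes extracted from the anonymous posts."""
    def score(profile):
        union = attrs | profile
        return len(attrs & profile) / len(union) if union else 0.0
    return sorted(candidates.items(), key=lambda kv: score(kv[1]), reverse=True)

# Invented example data: two anonymous posts and two candidate profiles.
cues = {"bart": "lives:bay-area", "kubernetes": "job:devops",
        "espresso": "hobby:coffee"}
posts = ["Missed the BART again this morning",
         "Debugging a Kubernetes ingress all day"]
candidates = {
    "user_a": {"lives:bay-area", "job:devops"},
    "user_b": {"lives:nyc", "hobby:coffee"},
}

attrs = extract_attributes(posts, cues)
best, _ = rank_candidates(attrs, candidates)[0]
print(best)  # the Bay Area devops profile matches best
```

The scaling claim in the paper follows from the matching stage being cheap: once attributes are extracted, scoring tens of thousands of candidates is a simple set-overlap computation.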
The Architecture Behind Open-Source LLMs
In December 2024, DeepSeek released V3 with the claim that they had trained a frontier-class model for $5.576 million. They used an attention mechanism called Multi-Head Latent Attention that slashed memory usage, while an expert routing strategy avoided the usual performance penalty. Aggressive FP8 training cut costs further.
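The memory savings from Multi-Head Latent Attention come from caching a small latent vector per token instead of full per-head keys and values, and up-projecting the latent at attention time. A simplified NumPy sketch of that idea, with illustrative dimensions (not DeepSeek's actual configuration):

```python
import numpy as np

# Illustrative sizes, not DeepSeek V3's real hyperparameters.
d_model, d_latent, n_heads, d_head = 512, 64, 8, 64
rng = np.random.default_rng(0)

W_down = rng.standard_normal((d_model, d_latent)) * 0.02    # compress to latent
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02

seq_len = 16
h = rng.standard_normal((seq_len, d_model))   # token hidden states

latent = h @ W_down          # this small tensor is all that gets cached
k = latent @ W_up_k          # keys reconstructed on the fly
v = latent @ W_up_v          # values reconstructed on the fly

full_cache = 2 * seq_len * n_heads * d_head   # standard KV-cache entries
mla_cache = seq_len * d_latent                # latent-cache entries
print(f"cache entries per layer: {full_cache} -> {mla_cache} "
      f"({full_cache // mla_cache}x smaller)")
```

With these toy dimensions the cache shrinks 16x; the trade-off is the extra up-projection matmuls at decode time, which is why the technique targets memory-bound inference.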
Within months, Moonshot AI’s Kimi K2 team openly adopted DeepSeek’s architecture as their starting point, scaled it to a trillion parameters, invented a new optimizer to solve a training stability challenge that emerged at that scale, and produced a model competitive across major benchmarks.
Then, in February 2026, Zhipu AI’s GLM-5 integrated DeepSeek’s sparse attention mechanism into their own design while contributing a novel reinforcement learning framework.
This is how the open-weight ecosystem actually works: teams build on each other’s innovations in public, and the pace of progress compounds. To understand why, you need to look at the architecture. — Read More
The February Reset: Three Labs, Four Models, and the End of “One Best AI”
February 5th, 2026. Anthropic ships Claude Opus 4.6. Same day, OpenAI drops GPT-5.3-Codex. Twelve days later, Anthropic follows with Sonnet 4.6. Two days after that, Google fires back with Gemini 3.1 Pro.
Four frontier models. Three labs. Fourteen days.
When the dust settled, something genuinely new had happened: no single model won. Not on benchmarks. Not on user preference. Not on price. Not on coding. For the first time in the frontier AI race, the leaderboard fractured into distinct lanes, and the “which model is best?” question stopped having a coherent answer.
This article maps who won what, where each model fails, and how the February shakeup changes the way you should think about your model stack. No cheerleading for any provider. Just the numbers and the trade-offs. — Read More