Software agents work reliably now. Claude Code demonstrated that a large language model (LLM) with access to bash and file tools, operating in a loop until an objective is achieved, can accomplish complex multi-step tasks autonomously.
The surprising discovery: A really good coding agent is actually a really good general-purpose agent. The same architecture that lets Claude Code refactor a codebase can let an agent organize your files, manage your reading list, or automate your workflows.
The Claude Code software development kit (SDK) makes this accessible. You can build applications where features aren’t code you write—they’re outcomes you describe, achieved by an agent with tools, operating in a loop until the outcome is reached.
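A minimal sketch of that tools-in-a-loop pattern, in Python. The tool set, the `llm_call` callable, and the JSON action format are assumptions for illustration, not the SDK's actual API:

```python
import json
import subprocess

def run_bash(command: str) -> str:
    """Tool: execute a shell command and return its combined output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

def read_file(path: str) -> str:
    """Tool: return the contents of a file."""
    with open(path) as f:
        return f.read()

TOOLS = {"bash": run_bash, "read_file": read_file}

def agent_loop(objective: str, llm_call, max_steps: int = 20) -> str:
    """Feed tool results back to the model until it reports the objective done."""
    transcript = [{"role": "user", "content": objective}]
    for _ in range(max_steps):
        # llm_call is assumed to return a JSON string: either
        # {"done": true, "answer": "..."} or {"tool": "...", "args": {...}}.
        action = json.loads(llm_call(transcript))
        if action.get("done"):
            return action["answer"]
        output = TOOLS[action["tool"]](**action["args"])
        transcript.append({"role": "tool", "content": output})
    raise RuntimeError("objective not reached within the step budget")
```

The loop, not any single model call, is the architecture: swap the tools and the same skeleton organizes files or manages a reading list instead of refactoring code.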
This opens up a new field: software that works the way Claude Code works, applied to categories far beyond coding. — Read More
The AI Learned to Think on Its Own. Nobody Taught It How.
In January 2025, a Chinese startup that most Western engineers had never heard of published a research paper that shocked the AI world.
The claim: they trained a reasoning model as capable as OpenAI’s best, for a fraction of the cost. The method? They removed humans from the training loop entirely. No “reward model” (an auxiliary model that learns to predict what humans would prefer). No thousands of annotators paid to rate responses. Just a single signal: the answer is correct, or it isn’t. — Read More
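In sketch form, that single signal is just a verifier, not a learned preference model. A minimal Python illustration (the "Final answer:" convention and the group-relative baseline are assumptions for this sketch, not the lab's published training code):

```python
import re

def extract_final_answer(completion: str) -> str:
    """Pull the text after a 'Final answer:' marker (a toy convention for this sketch)."""
    match = re.search(r"Final answer:\s*(.+)", completion)
    return match.group(1).strip() if match else ""

def reward(completion: str, reference: str) -> float:
    """The entire training signal: 1.0 if verifiably correct, else 0.0."""
    return 1.0 if extract_final_answer(completion) == reference else 0.0

def advantages(rewards: list[float]) -> list[float]:
    """Group-relative baseline: completions that beat the group mean get reinforced."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Sample a group of completions per problem, score each with reward(),
# then push the policy toward the above-average ones. No annotators anywhere.
```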
AI & Humans: Making the Relationship Work
Leaders of many organizations are urging their teams to adopt agentic AI to improve efficiency, but are finding it hard to achieve any benefit. Managers attempting to add AI agents to existing human teams may find that bots fail to faithfully follow their instructions, return pointless or obvious results, or burn precious time and resources spinning on tasks that older, simpler systems could have accomplished just as well.
The technical innovators getting the most out of AI are finding that the technology can be remarkably human in its behavior. And the more groups of AI agents are given tasks that require cooperation and collaboration, the more those human-like dynamics emerge.
Our research suggests that the most effective leaders in the coming years may still be those who excel at the timeworn principles of human management, because those principles seem to apply just as directly to hybrid teams of human and digital workers.
We have spent years studying the risks and opportunities for organizations adopting AI. Our 2025 book, Rewiring Democracy, examines lessons from AI adoption in government institutions and civil society worldwide. In it, we identify where the technology has made the biggest impact and where it fails to make a difference. Today, we see many of the organizations we’ve studied taking another shot at AI adoption—this time, with agentic tools. While generative AI generates, agentic AI acts and achieves goals such as automating supply chain processes, making data-driven investment decisions or managing complex project workflows. The cutting edge of AI development research is starting to reveal what works best in this new paradigm. — Read More
AI suddenly develops a human skill on its own
Scientists now officially confused, concerned, and considering therapy
People, take a stiff drink for this one, ’cause it’s going to be long, unhinged, and “why the hell is my toaster negotiating with my fridge” levels of existential blog.
Let me TL;DR this beast for ya.
In a plot twist no one saw coming but everyone privately feared, our dear AI has decided to pick up a brand-new human skill all by itself: getting along in a group. — Read More
8 plots that explain the state of open models
Going into 2026, most people are aware that a handful of Chinese companies are making strong open AI models that are putting increasing pressure on the American AI economy.
While many Chinese labs are making models, the adoption metrics are dominated by Qwen (with a little help from DeepSeek). Adoption of 2025’s new entrants to the open-model scene, from Z.ai, MiniMax, Kimi Moonshot, and others, is actually quite limited. This sets up a situation where dethroning Qwen in overall adoption in 2026 looks all but impossible, though there are areas of opportunity. In fact, the strength of GPT-OSS shows that the U.S. could very well have the smartest open models again in 2026, even if they’re used far less across the ecosystem. — Read More
Chinese AI models have lagged the US frontier by 7 months on average since 2023
Since 2023, every model at the frontier of AI capabilities, as measured by the Epoch Capabilities Index, has been developed in the United States. Over that same period, Chinese models have trailed US capabilities by an average of seven months, with a minimum gap of four months and a maximum gap of 14. — Read More
The ROI Problem in Attack Surface Management
Attack Surface Management (ASM) tools promise reduced risk. What they usually deliver is more information.
Security teams deploy ASM, asset inventories grow, alerts start flowing, and dashboards fill up. There is visible activity and measurable output. But when leadership asks a simple question, “Is this reducing incidents?” the answer is often unclear.
This gap between effort and outcome is the core ROI problem in attack surface management, especially when ROI is measured primarily through asset counts instead of risk reduction. — Read More
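To make that measurement gap concrete, here is a toy contrast between an activity metric and an outcome metric. All numbers and field names are invented for illustration, not drawn from any particular ASM tool:

```python
# Invented figures: an ASM rollout where inventory grows but exposure barely moves.
assets_before, assets_after = 1_200, 4_800                  # dashboard metric: asset count
exposed_criticals_before, exposed_criticals_after = 37, 31  # outcome: exploitable exposure

inventory_growth = (assets_after - assets_before) / assets_before
exposure_reduction = (exposed_criticals_before - exposed_criticals_after) / exposed_criticals_before

print(f"Inventory growth:   {inventory_growth:.0%}")    # 300% -- plenty of visible activity
print(f"Exposure reduction: {exposure_reduction:.0%}")  # 16% -- the number leadership asked about
```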
mHC: Manifold-Constrained Hyper-Connections
Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundational models. — Read More
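The abstract does not say which manifold, so purely as an illustration: one way to restore the identity-mapping property is to constrain the stream-mixing matrix to be doubly stochastic via a Sinkhorn-style projection, so that identical streams pass through unchanged. A toy NumPy sketch under that assumption; the paper's actual formulation (stream layout, projection, where the block function reads and writes) may differ:

```python
import numpy as np

def sinkhorn_project(logits: np.ndarray, iters: int = 10) -> np.ndarray:
    """Push a matrix toward the doubly-stochastic manifold (rows and
    columns each summing to 1) by alternating row/column normalization."""
    M = np.exp(logits)  # ensure positivity
    for _ in range(iters):
        M /= M.sum(axis=1, keepdims=True)
        M /= M.sum(axis=0, keepdims=True)
    return M

def hc_block(streams: np.ndarray, mix_logits: np.ndarray, f) -> np.ndarray:
    """One hyper-connection step over an n-wide residual stream.

    streams:    (n, d) parallel residual streams
    mix_logits: (n, n) learnable connectivity, projected onto the manifold
    f:          the block function (attention/MLP in a real model)
    """
    H = sinkhorn_project(mix_logits)        # constrained mixing
    mixed = H @ streams                     # diversified connectivity, identity-safe
    return mixed + f(mixed.mean(axis=0))    # residual add of the block output

# With f == 0 and identical streams, the block is exactly the identity:
n, d = 4, 8
streams = np.tile(np.random.randn(d), (n, 1))
out = hc_block(streams, np.zeros((n, n)), lambda x: 0.0 * x)
assert np.allclose(out, streams)
```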
2025: The year in LLMs
This is the third in my annual series reviewing everything that happened in the LLM space over the past 12 months. For previous years see Stuff we figured out about AI in 2023 and Things we learned about LLMs in 2024.
It’s been a year filled with a lot of different trends. — Read More
Cybersecurity Changes I Expect in 2026
It becomes very clear that the primary security question for a company is how good its attackers’ AI is vs. its own.
— ISOs increasingly realize that there is no way to scale their human teams to keep up with attackers who are becoming constant, continuous, and increasingly effective
— It becomes a competition over how fast you can perform asset management, attack surface management, and vulnerability management across your company, but especially on your perimeter (which includes email and phishing/social engineering)
— Read More