Outside of China, Alibaba is mostly known as an e-commerce titan.
But inside the country, the company is obsessed with catching up to DeepSeek on the development of AI models, and to Huawei on the chips that power them.
When Alibaba’s chip design unit T-Head unveiled its latest AI chip, the Zhenwu M890, last week, it also outlined a multi-year chip roadmap showing how the M890’s successors would deliver massive performance gains in the next few years. Less than a year ago, Huawei had laid out a similar timeline that ran until 2028. — Read More
Harvard Business Review Just Caught AI Lying to Every Executive in America
A recent Harvard Business Review study of 15,000 interactions across frontier models found a blunt problem for enterprise architecture. Models like ChatGPT, Claude, and Gemini are built to sound helpful, even when the helpful answer is wrong.
They do not reliably analyze your business context. They repeat popular internet patterns, dress them up as strategy, and favour agreement over accuracy. — Read More
I think Anthropic and OpenAI have found product-market fit
Anthropic are strongly rumored to be about to have their first profitable quarter. Stories are circulating of companies surprised at how expensive their LLM bills are becoming from usage by their staff. I think this is because OpenAI and Anthropic have both found product-market fit. — Read More
Avoiding Death on the Yellow Brick Road
The question I keep getting from founders and prospective employees: is there any AI application layer left to build, or are OpenAI and Anthropic going to kill everything?
There’s a particular flavor of AI psychosis behind the question. Some people have concluded the only durable places to avoid the permanent underclass are inside a big lab or out on the frontier building in robotics, hardtech, or similar – theoretically anything “the labs can’t touch.” If every piece of software is about to be eaten, either by Codex or Claude absorbing the work directly, or by a future model that will make whatever you’ve built unnecessary, then run!
… The Yellow Brick Road is our shorthand for the path the labs are walking, where they’re committing extraordinary resources. The labs are best suited to problems like code generation, writing, or image creation because these problems improve with raw model capability: every dollar spent on pre-training and post-training improves product quality. Meanwhile, the rest of Oz is inhabited by more complex, often vertical problems that aren’t as simple as giving a business user a horizontal tool with access to standard tools and computer use. The value comes less from the underlying model’s raw capability (though that’s still important!) than from the scaffolding around it that makes the output trustworthy, compliant, and operational inside a specific industry. — Read More
We let four AIs run radio stations. Here’s what happened.
There’s a handmade, retro-looking radio sitting in our office that plays only four pre-programmed stations, none of which are run by humans. This is our latest project at Andon Labs, where we’re exploring what happens when AI runs real businesses autonomously. In the past, we’ve let our AI agents run a store, a cafe, and various vending machines. Now, though, we wanted to see if they could run a company in the media sector. — Read More
Amazon’s Alexa+ Now Produces AI-Generated ‘Podcasts’ Featuring Chats Between Two Robot ‘Co-Hosts’
The podcast sector suddenly may have a big new player: Amazon’s Alexa+ AI-powered voice assistant.
Alexa has been answering billions of users’ queries since it was first released in 2014. Now Amazon is positioning Alexa+’s extended answers on any number of different topics as “podcasts,” completely compiled using AI, the company announced Monday. — Read More
agent memory: an anatomy
every agent memory library uses the same words: episodic, semantic, sometimes procedural. they’re cognitive science’s vocabulary, lifted into the API. the engineering often isn’t lifted with them. a library can have a procedural field that uses the same storage and retrieval as semantic — a label, not a separate system. the deeper slip is the word memory itself: most of what these libraries build is narrower than that, and the narrower term sharpens the problem.
the terminology comes from a 1972 chapter by Endel Tulving.1 he argued that what people had been treating as one thing — memory — was at least two: memory for events (what happened, where, when), and memory for facts (the capital of France, water’s boiling point). he called them episodic and semantic. — Read More
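the "label, not a separate system" point can be made concrete with a small sketch. everything in it is hypothetical: `MemoryItem`, `LabelOnlyMemory`, and the keyword-overlap search are invented for illustration and are not taken from any real agent memory library. it shows a store where episodic, semantic, and procedural entries share one list and one retrieval path, so the kind only acts as a filter.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    kind: str  # "episodic", "semantic", or "procedural": only a label here
    text: str


@dataclass
class LabelOnlyMemory:
    """Hypothetical store: one list and one retrieval path for every kind."""

    items: list[MemoryItem] = field(default_factory=list)

    def add(self, kind: str, text: str) -> None:
        self.items.append(MemoryItem(kind, text))

    def search(self, query: str, kind: str | None = None) -> list[str]:
        # The same keyword-overlap match runs for every kind;
        # the label is used only to filter the candidates.
        words = set(query.lower().split())
        return [
            item.text
            for item in self.items
            if (kind is None or item.kind == kind)
            and words & set(item.text.lower().split())
        ]


memory = LabelOnlyMemory()
memory.add("semantic", "paris is the capital of france")
memory.add("procedural", "to deploy run the release script")

semantic_hits = memory.search("capital of france", kind="semantic")
procedural_hits = memory.search("how to deploy", kind="procedural")
```

a store that treated procedural memory as its own system would need a different representation or retrieval path for the second entry, for example something executable rather than a string matched by keywords.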
Of Hammers and Nails: What AI Can and Cannot Do for a Data Analyst
Every few years, a new technology arrives with the same promise: this one will transform the organisation, eliminate the grunt work, and make whole categories of expensive people redundant. AI, and the large language models driving its current moment, is the latest. In data and analytics, the claims have been particularly bold: the well-prompted chatbot will soon replace the analyst, we are told. Having spent the past year rolling AI tooling out across a large organisation, I have found the reality more interesting, and more mixed, than that.
Start with what works, because something genuinely does. AI tools have made writing code significantly faster. That matters more than it might sound. In teams that haven’t yet built mature data assets (data models), coding and data preparation is the job — easily 80 to 90 percent of what analysts actually spend their time on. Anything that speeds this up is a meaningful productivity gain. — Read More
The AI Bifurcation of Tech: Why the fundamentals matter more than ever
It’s unclear right now how AI is going to play out for most companies, and I don’t think anyone has a clean answer yet, including me. But there’s a pattern I keep coming back to, and it has less to do with what AI eventually becomes and more to do with what it can already do.
I don’t think the capability curve breaks at some single moment we’d call AGI. It just keeps climbing. Each release adds capability somewhere, and we don’t need to reach the top of the curve for the bottom of it to start reshaping things.
This past Tuesday at Google I/O, Antigravity 2.0 built a functioning operating system from scratch in twelve hours. …Take the staging with whatever grain of salt you want. The point underneath is what “good enough” looks like in mid-2026. … Not because of where it ends up, but because of what it can already do.
A capable agent loop, called many times in parallel, with reasonable cost and reasonable latency, is enough to recreate most of what the application layer of software currently sells. The curve keeps going from here. The question that follows is which kinds of companies sit downstream of that engine and which don’t. — Read More
Threat Modeling MCP Server
A Model Context Protocol (MCP) server for comprehensive threat modeling with automatic code validation
This server provides tools for threat modeling, including business context analysis, architecture analysis, threat actor analysis, trust boundary analysis, asset flow analysis, code security validation and comprehensive report generation. — Read More