Beyond Standard LLMs

From DeepSeek R1 to MiniMax-M2, the largest and most capable open-weight LLMs today remain autoregressive decoder-style transformers, which are built on flavors of the original multi-head attention mechanism.

However, we have also seen alternatives to standard LLMs popping up in recent years, from text diffusion models to the most recent linear attention hybrid architectures. Some of them are geared towards better efficiency, and others, like code world models, aim to improve modeling performance.

After I shared my Big LLM Architecture Comparison a few months ago, which focused on the main transformer-based LLMs, I received a lot of questions about what I think of alternative approaches. (I recently gave a short talk on this topic at the PyTorch Conference 2025, where I promised attendees a follow-up write-up of these alternative approaches.) So here it is! — Read More

#architecture

Architectural debt is not just technical debt

When I was a developer, half of our frustrations were about technical debt (the other half were about estimates being treated as deadlines).

We always made a distinction between code debt and architectural debt: code debt is the temporary hacks you put in place to hit a deadline and never remove, while architectural debt is the structural decisions that come back to bite you six months later.

While I agree that implementing software patterns like the strangler pattern or moving away from singletons is definitely software architecture, architectural debt goes far beyond what you find in the code. — Read More

#architecture

Meet Project Suncatcher, Google’s plan to put AI data centers in space

The tech industry is on a tear, building data centers for AI as quickly as they can buy up the land. The sky-high energy costs and logistical headaches of managing all those data centers have prompted interest in space-based infrastructure. Moguls like Jeff Bezos and Elon Musk have mused about putting GPUs in space, and now Google confirms it’s working on its own version of the technology. The company’s latest “moonshot” is known as Project Suncatcher, and if all goes as planned, Google hopes it will lead to scalable networks of orbiting TPUs.

The space around Earth has changed a lot in the last few years. A new generation of satellite constellations like Starlink has shown it’s feasible to relay Internet communication via orbital systems. Deploying high-performance AI accelerators in space along similar lines would be a boon to the industry’s never-ending build-out. Google notes that space may be “the best place to scale AI compute.”

Google’s vision for scalable orbiting data centers relies on solar-powered satellites with free-space optical links connecting the nodes into a distributed network. Naturally, there are numerous engineering challenges to solve before Project Suncatcher is real. As a reference, Google points to the long road from its first moonshot self-driving cars 15 years ago to the Waymo vehicles that are almost fully autonomous today. — Read More

#nvidia

Humans, AI, and the space between

Software engineers, product managers, and UX designers each imagine a future where their contributions grow stronger while others’ might seem to fade. Everyone is eager to see how AI can expand their capabilities and impact. The excitement around this shift risks repeating an old mistake: creating silos. This time, it’s one human working with agents in isolation. And silos rarely lead to great products. The real opportunity lies in combining human strengths to build richer collaboration among diverse thinkers, guided and enhanced by intelligent tools. — Read More

#devops

Why aren’t video codec intrinsics used to train generative AI?

Every video we feed into a model carries a hidden companion that seems to be largely ignored. Alongside the frames, the encoder leaves behind a rich trail of signals — motion vectors, block partitions, quantisation/rate-distortion decisions and residual energy. Call them “codec intrinsics”, or simply “codec signals.” They aren’t pixels, but they are shaped by decades of engineering about what people actually see, where detail matters and how motion really flows. If our generators learn from images and videos, why not let them learn from this perceptual map as well? It’s the difference between teaching an AI to paint by only showing it finished masterpieces versus letting it study the painter’s original sketches, compositional notes, and brush-stroke tests. — Read More
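To make the idea concrete, here is a toy sketch (not from the article) of what "letting a generator see codec signals" could look like: per-macroblock motion vectors are upsampled to pixel resolution and stacked onto the RGB frame as extra input channels. The function name, shapes, and block sizes are illustrative assumptions, not a real training pipeline.

```python
import numpy as np

def attach_codec_signals(frame_rgb, motion_vectors):
    """Stack a decoded RGB frame with upsampled per-block motion vectors.

    frame_rgb:      (H, W, 3) uint8 pixel array
    motion_vectors: (H // block, W // block, 2) float dx/dy per macroblock
    returns:        (H, W, 5) float32 tensor a model could consume
    """
    h, w, _ = frame_rgb.shape
    bh, bw, _ = motion_vectors.shape
    # Nearest-neighbour upsample block-level vectors to pixel resolution.
    mv_full = np.repeat(np.repeat(motion_vectors, h // bh, axis=0), w // bw, axis=1)
    # Normalize pixels to [0, 1] and append the two motion channels.
    return np.concatenate(
        [frame_rgb.astype(np.float32) / 255.0, mv_full.astype(np.float32)],
        axis=2,
    )

# Example: a 64x64 frame with 16x16 macroblocks.
frame = np.zeros((64, 64, 3), dtype=np.uint8)
mvs = np.ones((4, 4, 2), dtype=np.float32)  # uniform motion everywhere
x = attach_codec_signals(frame, mvs)
print(x.shape)  # (64, 64, 5)
```

In a real setting the motion vectors, block partitions, and quantisation decisions would come from the decoder itself (e.g. FFmpeg can export motion vectors as frame side data); the point of the sketch is only that these signals can ride alongside pixels as additional channels rather than being discarded.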

#image-recognition

Google pulls AI model after senator says it fabricated assault allegation

Google says it has pulled AI model Gemma from its AI Studio platform after a Republican senator complained the model, designed for developers, “fabricated serious criminal allegations” about her.

In a post on X, Google’s official news account said the company had “seen reports of non-developers trying to use Gemma in AI Studio and ask it factual questions.” AI Studio is a platform for developers and not a conventional way for regular consumers to access Google’s AI models. Gemma is specifically billed as a family of AI models for developers to use, with variants for medical use, coding, and evaluating text and image content. — Read More

#fake