The End of Moore’s Law for AI? Gemini Flash Offers a Warning

For the past few years, the AI industry has operated under its own version of Moore’s Law: an unwavering belief that the cost of intelligence would perpetually decrease by orders of magnitude each year. Like clockwork, each new model generation promised to be not only more capable but also cheaper to run. Last week, Google quietly broke that trend.

In a move that at first went unnoticed, Google significantly increased the price of its popular Gemini 2.5 Flash model. The input token price doubled from $0.15 to $0.30 per million tokens, while the output price more than quadrupled from $0.60 to $2.50 per million. Simultaneously, it introduced a new, less capable model, “Gemini 2.5 Flash Lite,” at a lower price point.
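
To put the change in concrete terms, here is a quick back-of-the-envelope comparison using the per-million-token prices quoted above. The 3:1 input/output workload split is an illustrative assumption, not a measured figure; actual impact depends on your token mix.

```python
# Rough impact of the Gemini 2.5 Flash repricing, using the
# per-million-token figures quoted above.
OLD = {"input": 0.15, "output": 0.60}  # $ per 1M tokens, before
NEW = {"input": 0.30, "output": 2.50}  # $ per 1M tokens, after

def workload_cost(prices, input_m=300, output_m=100):
    """Cost of a workload of `input_m` / `output_m` million tokens
    (the 3:1 split here is an assumed, illustrative workload)."""
    return prices["input"] * input_m + prices["output"] * output_m

old, new = workload_cost(OLD), workload_cost(NEW)
print(f"before: ${old:,.2f}  after: ${new:,.2f}  ({new / old:.1f}x)")
# before: $105.00  after: $340.00  (3.2x)
```

Because output tokens took the steeper hike, output-heavy workloads feel the repricing hardest.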

This is the first time a major provider has backtracked on the price of an established model. While it may seem like a simple adjustment, we believe it signals a turning point: the industry is no longer on an endless downward cost slide. Instead, we have hit a soft floor on the cost of intelligence, given the current state of hardware and software. — Read More

#strategy

Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens

Vision-language models (VLMs) excel at multimodal understanding, yet their text-only decoding forces them to verbalize visual reasoning, limiting performance on tasks that demand visual imagination. Recent attempts train VLMs to render explicit images, but the heavy image-generation pre-training often hinders reasoning ability. Inspired by the way humans reason with mental imagery, the internal construction and manipulation of visual cues, we investigate whether VLMs can reason through interleaved multimodal trajectories without producing explicit images. To this end, we present a Machine Mental Imagery framework, dubbed Mirage, which augments VLM decoding with latent visual tokens alongside ordinary text. Concretely, whenever the model chooses to “think visually”, it recasts its hidden states as the next tokens, thereby continuing a multimodal trajectory without generating pixel-level images. We first supervise the latent tokens through distillation from ground-truth image embeddings, then switch to text-only supervision so that the latent trajectory aligns tightly with the task objective. A subsequent reinforcement learning stage further enhances multimodal reasoning capability. Experiments on diverse benchmarks demonstrate that Mirage unlocks stronger multimodal reasoning without explicit image generation. — Read More
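
As a rough illustration of the decoding loop the abstract describes, here is a minimal PyTorch-style sketch. It assumes a Hugging Face-style causal VLM exposing hidden states; the special id `LATENT_TOKEN_ID`, the projection `latent_proj`, and the greedy decoding are hypothetical simplifications, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical placeholders -- not from the Mirage codebase.
LATENT_TOKEN_ID = 32000      # assumed special id for "think visually"
latent_proj = nn.Identity()  # assumed map from hidden state to embedding space

@torch.no_grad()
def decode_with_latent_imagery(model, tokenizer, input_ids, max_steps=64):
    """Greedy decoding where latent steps feed hidden states back as inputs."""
    embed = model.get_input_embeddings()
    inputs_embeds = embed(input_ids)                    # (1, T, d)
    text_ids = []
    for _ in range(max_steps):
        out = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
        next_id = out.logits[:, -1].argmax(dim=-1)      # (1,)
        if next_id.item() == LATENT_TOKEN_ID:
            # Latent visual step: recast the last hidden state as the next
            # input embedding instead of embedding a discrete text token.
            last_hidden = out.hidden_states[-1][:, -1]  # (1, d)
            next_embed = latent_proj(last_hidden)
        else:
            if next_id.item() == tokenizer.eos_token_id:
                break
            text_ids.append(next_id.item())
            next_embed = embed(next_id)                 # ordinary text step
        inputs_embeds = torch.cat(
            [inputs_embeds, next_embed.unsqueeze(1)], dim=1
        )
    return tokenizer.decode(text_ids)
```

The key design choice, per the abstract, is that latent steps never leave embedding space, so the model continues a multimodal trajectory without ever generating a pixel-level image.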

#vision