We construct evaluation tasks where extending the reasoning length of Large Reasoning Models (LRMs) deteriorates performance, exhibiting an inverse scaling relationship between test-time compute and accuracy. Our evaluation tasks span four categories: simple counting tasks with distractors, regression tasks with spurious features, deduction tasks with constraint tracking, and advanced AI risks. We identify five distinct failure modes when models reason for longer: 1) Claude models become increasingly distracted by irrelevant information; 2) OpenAI o-series models resist distractors but overfit to problem framings; 3) models shift from reasonable priors to spurious correlations; 4) all models show difficulties in maintaining focus on complex deductive tasks; and 5) extended reasoning may amplify concerning behaviors, with Claude Sonnet 4 showing increased expressions of self-preservation. These findings suggest that while test-time compute scaling remains promising for improving model capabilities, it may inadvertently reinforce problematic reasoning patterns. Our results demonstrate the importance of evaluating models across diverse reasoning lengths to identify and address these failure modes in LRMs. — Read More
Daily Archives: July 24, 2025
The Rise of the AI Database: Powering Real-Time AI Applications
As AI rapidly evolves, organizations are racing to build and deploy high-performance gen AI apps that deliver real-time insights and seamless user experiences. Central to this transformation is the emergence of the generative AI database, a new category of data platform optimized for vector search, semantic indexing and full-text retrieval. These systems are designed to address challenges such as data silos, data quality and integration for AI and analytics. As the name suggests, a gen AI database is purpose-built to power generative AI models and applications: it lets developers store, query and analyze both structured and unstructured data at scale, and the data it holds supports advanced analytics and machine learning. — Read More
Context Engineering: 2025’s #1 Skill in AI
Let’s get one thing straight: if you’re still only talking about “prompt engineering,” you’re behind the curve. In the early days of Large Language Models (LLMs), crafting the perfect prompt was the name of the game.
For simple chatbots in 2022, it was enough. Then came Retrieval-Augmented Generation (RAG) in 2023, where we started feeding models domain-specific knowledge. Now, we have tool-using, memory-enabled agents that need to build relationships and maintain state over time. The single-interaction focus of prompt engineering just doesn’t cut it anymore. — Read More
Experts react: What Trump’s new AI Action Plan means for tech, energy, the economy, and more
“An industrial revolution, an information revolution, and a renaissance—all at once.” That’s how the Trump administration describes artificial intelligence (AI) in its new “AI Action Plan.” Released on Wednesday, the plan calls for cutting regulations to spur AI innovation and adoption, speeding up the buildout of AI data centers, exporting AI “full technology stacks” to US allies and partners, and ridding AI systems of what the White House calls “ideological bias.” How does the plan’s approach to AI policy differ from past US policy? What impacts will it have on the US AI industry and global AI governance? What are the implications for energy and the global economy? Our experts share their human-generated responses to these burning AI questions below. — Read More
America’s AI Action Plan
America is in a race to achieve global dominance in artificial intelligence (AI). Winning this race will usher in a new era of human flourishing, economic competitiveness, and national security for the American people. Recognizing this, President Trump directed the creation of an AI Action Plan in the early days of his second term in office. Based on the three pillars of accelerating innovation, building AI infrastructure, and leading in international diplomacy and security, this Action Plan is America’s roadmap to win the race. — Read More