Anthropic Economic Index: AI’s Impact on Software Development

Jobs that involve computer programming are a small sector of the modern economy, but an influential one. The past couple of years have seen them changed dramatically by the introduction of AI systems that can assist with—and automate—significant amounts of coding work.

In our previous Economic Index research, we found highly disproportionate use of Claude by US workers in computer-related occupations: there were far more conversations with Claude about computer-related tasks than the number of people working in relevant jobs would predict. The same holds in education: Computer Science degrees, which involve large amounts of coding, show similarly disproportionate AI use. — Read More

#strategy

Working with LLMs: A Few Lessons

An interesting part of working with LLMs is that you get to see a lot of people trying to use them, inside companies both small and large, and falling prey to entirely new sets of problems. It turns out that using them well isn't just a matter of know-how or even interest; it requires learning some tough lessons. So I figured I'd jot down a few observations. Here we go, starting with the hardest one:

Perfect verifiability doesn’t exist

LLMs are inherently probabilistic. No matter how much you might want it, there is no perfect way to verify what they produce. Instead, what's needed is to find ways to deal with the fact that they will occasionally get things wrong. — Read More
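
One common way to cope with that is to validate model output against a schema and retry on failure, rather than trusting any single response. Below is a minimal sketch of that pattern in Python; `call_llm` is a hypothetical stand-in for whatever client you actually use, and the schema check is deliberately simple.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for your actual LLM client."""
    raise NotImplementedError

def extract_order(text: str, max_attempts: int = 3) -> dict:
    """Request structured JSON and validate it, retrying on bad output.

    Since the model is probabilistic, no single response is guaranteed
    to be well-formed; checking and retrying bounds the failure rate
    instead of assuming it away.
    """
    prompt = f"Return a JSON object with keys 'item' (string) and 'quantity' (integer) for: {text}"
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output; try again
        if isinstance(data, dict) and isinstance(data.get("item"), str) and isinstance(data.get("quantity"), int):
            return data
    raise ValueError(f"no valid output after {max_attempts} attempts")
```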

#devops

China launches ‘Blue Whale’ – world’s first high-speed typhoon-proof uncrewed submersible

AI-equipped research vessel can stay underwater for a month and launch research rockets, marking ‘leap’ for marine exploration, typhoon research

China has launched the world’s first high-speed uncrewed submersible, a vessel that can operate underwater for 30 days, withstand extreme weather, and launch research rockets, marking a major advance in the country’s maritime technology.

The “Blue Whale”, which measures 11 metres (36 feet) long, weighs 12 tonnes, and combines the functions of both a high-speed surface craft and an underwater vessel, was launched in the southern city of Zhuhai on Monday. — Read More

#china-ai

s1: Simple test-time scaling

Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI’s o1 model showed this capability but did not publicly share its methodology, leading to many replication efforts. We seek the simplest approach to achieve test-time scaling and strong reasoning performance. First, we curate a small dataset s1K of 1,000 questions paired with reasoning traces relying on three criteria we validate through ablations: difficulty, diversity, and quality. Second, we develop budget forcing to control test-time compute by forcefully terminating the model’s thinking process or lengthening it by appending “Wait” multiple times to the model’s generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps. After supervised finetuning the Qwen2.5-32B-Instruct language model on s1K and equipping it with budget forcing, our model s1-32B exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24). Further, scaling s1-32B with budget forcing allows extrapolating beyond its performance without test-time intervention: from 50% to 57% on AIME24. Our model, data, and code are open-source at this https URL. — Read More
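
To make the budget-forcing idea concrete, here is a minimal Python sketch of the decoding loop as the abstract describes it. The `generate_step` function and the "</think>" end-of-thinking marker are hypothetical stand-ins, not the paper's actual API; the real implementation is in the linked repository.

```python
def generate_step(context: str) -> str:
    """Hypothetical single-token decoder for the underlying model."""
    raise NotImplementedError

def budget_forced_thinking(prompt: str, min_tokens: int, max_tokens: int) -> str:
    """Cap or extend the model's thinking phase to hit a compute budget."""
    thinking = []
    while len(thinking) < max_tokens:
        token = generate_step(prompt + "".join(thinking))
        if token == "</think>":
            if len(thinking) < min_tokens:
                # The model tried to stop early: suppress the end marker and
                # append "Wait", nudging it to double-check its reasoning.
                thinking.append("Wait")
                continue
            break  # budget satisfied; let the model stop on its own
        thinking.append(token)
    # If the loop exits via the cap, the thinking process is forcefully
    # terminated here regardless of whether the model wanted to continue.
    return "".join(thinking)
```

Raising the token budget at inference time is what lets the abstract trade extra test-time compute for accuracy, as in the reported 50% to 57% gain on AIME24.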

#performance

Alibaba unveils Qwen3, a family of ‘hybrid’ AI reasoning models

Chinese tech company Alibaba on Monday released Qwen3, a family of AI models that the company claims can match and, in some cases, outperform the best models available from Google and OpenAI.

Most of the models are — or soon will be — available for download under an “open” license on the AI dev platform Hugging Face and on GitHub. They range in size from 0.6 billion parameters to 235 billion parameters. (Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer.) — Read More
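
For readers who want to try one of the smaller checkpoints, here is a minimal sketch using the Hugging Face transformers library. The repository id below is an assumption based on the family's naming scheme; check the actual model card before relying on it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id, following the Qwen3 naming scheme; verify on
# Hugging Face before use.
model_id = "Qwen/Qwen3-0.6B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is a hybrid reasoning model?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```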

#nlp