LLM Pretraining with Continuous Concepts

Next token prediction has been the standard training objective used in large language model pretraining. Representations are learned as a result of optimizing for token-level perplexity. We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next token prediction with continuous concepts. Specifically, CoCoMix predicts continuous concepts learned from a pretrained sparse autoencoder and mixes them into the model’s hidden state by interleaving them with token hidden representations. Through experiments on multiple benchmarks, including language modeling and downstream reasoning tasks, we show that CoCoMix is more sample efficient and consistently outperforms standard next token prediction, knowledge distillation, and inserting pause tokens. We find that combining both concept learning and interleaving in an end-to-end framework is critical to performance gains. Furthermore, CoCoMix enhances interpretability and steerability by allowing direct inspection and modification of the predicted concepts, offering a transparent way to guide the model’s internal reasoning process. — Read More
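The abstract describes two coupled mechanisms: a head that predicts the concept activations extracted by a pretrained sparse autoencoder, and a mixing step that interleaves a continuous concept vector with the token hidden states. Below is a minimal PyTorch sketch of that structure; the class name, dimensions, MSE concept loss, and stack-and-reshape interleaving are all illustrative assumptions, not the paper's actual implementation.

    # Minimal sketch of the CoCoMix idea as described in the abstract.
    # All names, shapes, and the loss choice are illustrative assumptions.
    import torch
    import torch.nn as nn

    class CoCoMixBlockSketch(nn.Module):
        def __init__(self, d_model=512, n_concepts=1024, vocab_size=32000):
            super().__init__()
            # Predicts SAE concept activations from the token hidden state.
            self.concept_head = nn.Linear(d_model, n_concepts)
            # Projects predicted concepts into a single "continuous concept"
            # vector that is interleaved with the token hidden states.
            self.concept_proj = nn.Linear(n_concepts, d_model)
            self.lm_head = nn.Linear(d_model, vocab_size)

        def forward(self, hidden, sae_targets=None):
            # hidden: (batch, seq, d_model) token hidden representations.
            pred_concepts = self.concept_head(hidden)       # (B, T, n_concepts)
            concept_vec = self.concept_proj(pred_concepts)  # (B, T, d_model)
            # Interleave each token state with its concept vector, doubling
            # the sequence length (one plausible reading of "mixing").
            B, T, D = hidden.shape
            mixed = torch.stack([hidden, concept_vec], dim=2).reshape(B, 2 * T, D)
            logits = self.lm_head(mixed)
            # Auxiliary loss against the frozen SAE's concept activations.
            concept_loss = None
            if sae_targets is not None:
                concept_loss = nn.functional.mse_loss(pred_concepts, sae_targets)
            return logits, concept_loss

For example, feeding hidden states of shape (2, 16, 512) through this block yields logits of shape (2, 32, 32000), since each token position gains a companion concept position; the auxiliary loss is what ties the predicted concepts back to the pretrained sparse autoencoder.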

#training

Will AI Take Your Job, and When?

With the release of Deep Research tools by the likes of OpenAI, Google, and, most recently, Perplexity (by far the cheapest option, and reasonably well tested against the others), concerns about job safety and displacement due to AI are growing.

But we know history repeats itself, so does the historical record support these fears? And if so, what skills will be necessary to survive in an AI world?

We’ll discuss the timelines over which previous industrial revolutions reshaped the economy, examine the most recent research on adoption and productivity from one top university and one top AI lab, and unpack what it means to be an ‘AI Human,’ before giving you the best mental model for analyzing whether AI will take your job. — Read More

#strategy

ChinAI #300: Artificial Challenged Intelligence [人工智障] in China’s most humble profession

About 150 ChinAI issues ago, in June 2021, I started seeing the phrase 人工智障 [which I translate as “artificial challenged intelligence”] pop up in Chinese media. Bloggers used the term to make fun of billboard displays that were meant to name and shame jaywalkers but ended up featuring faces from bus ads (ChinAI #144). Comic artists captured the frustrations of using a smart sweeping robot (ChinAI #165). This week’s feature translation (link to original NetEase DataBlog article) examines artificial challenged intelligence in the context of China’s customer service industry.

Key Takeaways: China has made a stark transition to AI customer service — a 17-fold growth in the market in the past seven years — but this has produced more customer dissatisfaction. — Read More

#china-ai

The Only AI Moat is Hardware

And Compute is the Upper Bound for Achievable Intelligence

I have lost count of how many times I have been asked about DeepSeek over the past week — specifically, whether it signals the obsolescence of high-performance AI compute or, by extension, the beginning of the end for NVIDIA.

The answer is “no,” but if you still need more than one word, here is why. — Read More

#strategy