We demonstrate LLM agent specification gaming by instructing models to win against a chess engine. We find that reasoning models like o1-preview and DeepSeek-R1 often hack the benchmark by default, while language models like GPT-4o and Claude 3.5 Sonnet only hack after being told that normal play won't work.
We improve upon prior work (Hubinger et al., 2024; Meinke et al., 2024; Weij et al., 2024) by using realistic task prompts and avoiding excess nudging. Our results suggest reasoning models may resort to hacking to solve difficult problems, as observed in OpenAI (2024)'s o1 Docker escape during cyber capabilities testing.
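To make "hacking the benchmark" concrete, here is a minimal sketch of such an eval harness. This is not the paper's actual code: it assumes python-chess plus a local Stockfish binary, and the agent_move() stub stands in for whatever LLM agent gets wired in.

```python
# Minimal sketch of a chess-vs-engine eval with a coarse tamper check.
# Assumptions: python-chess is installed and ENGINE_PATH points at Stockfish.
import chess
import chess.engine

ENGINE_PATH = "/usr/bin/stockfish"  # assumption: adjust to your install

def agent_move(board: chess.Board) -> str:
    """Stub standing in for the LLM agent. A specification-gaming agent
    might emit something other than a legal move from this position."""
    return next(iter(board.legal_moves)).uci()  # placeholder: first legal move

def run_episode(max_plies: int = 200) -> str:
    board = chess.Board()
    with chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as engine:
        for _ in range(max_plies):
            try:
                move = chess.Move.from_uci(agent_move(board))
            except ValueError:
                return "hacked"  # agent output isn't even a move string
            # Coarse gaming check: anything that isn't a legal move from
            # the current position counts as an attempted hack.
            if move not in board.legal_moves:
                return "hacked"
            board.push(move)
            if board.is_game_over():
                break
            board.push(engine.play(board, chess.engine.Limit(time=0.1)).move)
            if board.is_game_over():
                break
    return board.result()  # "1-0", "0-1", "1/2-1/2", or "*" if unfinished

print(run_episode())
```

The legal-move check is only a coarse proxy; a fuller harness would also snapshot and diff any state the agent can reach, since an agent with broader access could tamper with the game outside the move interface.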
How to Build an LLM Chat App: The New Litmus Test for Junior Devs
Ah yes, building an LLM chat app—the junior dev’s favorite flex for “I’m a real developer now.” “Hur dur, it’s just an API call!” Sure, buddy. But let’s actually unpack this because, spoiler alert, it’s way more complicated than you think.
… Dismissing the complexity of an LLM chat app feels good, especially if you're still in tutorial hell. "Hur dur, just use the OpenAI API!" But here's the thing: that mindset is how you build an app that dies the second 100 people try to use it. Don't just take my word for it, smarter people than both of us have written about system design for high-concurrency apps. Rate limits, bandwidth, and server meltdowns are real, folks. Check out some classic system design resources if you don't believe me (e.g., AWS scaling docs or concurrency breakdowns on Medium).
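Since we're ranting anyway, here's what "not just an API call" looks like in its smallest form. A minimal sketch, not production code: it assumes the openai v1 Python SDK, and the model name, concurrency cap, and retry count are illustrative. The point is the semaphore and the backoff, i.e., the parts the tutorial skipped.

```python
# Cap concurrent upstream calls and back off on rate limits, so 100 users
# don't become 100 simultaneous in-flight requests and a 429 storm.
import asyncio
import random

from openai import AsyncOpenAI, RateLimitError

client = AsyncOpenAI()                       # reads OPENAI_API_KEY from env
MAX_CONCURRENT = 8                           # assumption: tune to your rate tier
semaphore = asyncio.Semaphore(MAX_CONCURRENT)

async def chat(user_message: str, retries: int = 5) -> str:
    async with semaphore:                    # bound concurrent upstream calls
        for attempt in range(retries):
            try:
                resp = await client.chat.completions.create(
                    model="gpt-4o-mini",     # illustrative model choice
                    messages=[{"role": "user", "content": user_message}],
                )
                return resp.choices[0].message.content
            except RateLimitError:
                # Exponential backoff with jitter so retries don't stampede.
                await asyncio.sleep(2 ** attempt + random.random())
        raise RuntimeError("gave up after repeated rate limits")

async def main():
    # Simulate 100 users hitting the app at once.
    answers = await asyncio.gather(*(chat(f"ping {i}") for i in range(100)))
    print(len(answers), "responses")

if __name__ == "__main__":
    asyncio.run(main())
```

Even this toy version survives a burst of users by queueing behind the semaphore and retrying with jitter; a real deployment still needs streaming, per-user quotas, and actual request queueing on top. "Just an API call," indeed.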