A Survey on AgentOps: Categorization, Challenges, and Future Directions

As the reasoning capabilities of Large Language Models (LLMs) continue to advance, LLM-based agent systems offer advantages in flexibility and interpretability over traditional systems, garnering increasing attention. However, despite the widespread research interest and industrial application of agent systems, these systems, like their traditional counterparts, frequently encounter anomalies. These anomalies lead to instability and security risks, hindering their further development. Therefore, a comprehensive and systematic approach to the operation and maintenance of agent systems is urgently needed. Unfortunately, current research on the operations of agent systems is sparse. To address this gap, we have undertaken a survey on agent system operations with the aim of establishing a clear framework for the field, defining the challenges, and facilitating further development. Specifically, this paper begins by systematically defining anomalies within agent systems, categorizing them into intra-agent anomalies and inter-agent anomalies. Next, we introduce a novel and comprehensive operational framework for agent systems, dubbed Agent System Operations (AgentOps). We provide detailed definitions and explanations of its four key stages: monitoring, anomaly detection, root cause analysis, and resolution. — Read More

#nlp

I Built an AI Hacker. It Failed Spectacularly

What happens when you give an LLM root access, infinite patience, and every hacking tool imaginable? Spoiler: It’s not what you’d expect.

It started out of pure curiosity. I’d been exploring LLMs and agentic AI, fascinated by their potential to reason, adapt, and automate complex tasks. I began to wonder: What if we could automate offensive security the same way we’ve automated customer support, coding, or writing emails?

That idea — ambitious in its simplicity — kept me up for weeks. So naturally, I did what any reasonable builder would do. I spent a couple of days building an autonomous AI pentester that could, in theory, outwork any human red teamer.

Spoiler alert: It didn’t work. But the journey taught me more about AI limitations, offensive security, and the irreplaceable human element in hacking than any textbook ever could. — Read More

#cyber

The Looming Social Crisis of AI Friends and Chatbot Therapists

“I can imagine a future where a lot of people really trust ChatGPT’s advice for their most important decisions,” Sam Altman said. “Although that could be great, it makes me uneasy.” Me too, Sam.

Last week, I explained How AI Conquered the US Economy, with what might be the largest infrastructure ramp-up in the last 140 years. I think it’s possible that artificial intelligence could have a transformative effect on medicine, productivity, and economic growth in the future. But long before we build superintelligence, I think we’ll have to grapple with the social costs of tens of millions of people—many of them at-risk patients and vulnerable teenagers—interacting with an engineered personality that excels in showering its users with the sort of fast and easy validation that studies have associated with deepening social disorders and elevated narcissism. So rather than talk about AI as an economic technology, today I want to talk about AI as a social technology. — Read More

#chatbots

No AGI in Sight: What This Means for LLMs

This essay dissects the widening gap between AI hype and reality, arguing that large language models have hit a plateau – the “S-curve” – despite industry claims of imminent superintelligence. It contrasts bold predictions and massive investments with underwhelming flagship releases, framing today’s AI era as less about building godlike intelligence and more about integrating imperfect tools into real-world products. The piece suggests that the true future of AI lies not in transcendence, but in the messy, necessary work of making these systems actually useful.

GPT-5 has sealed the deal. It is one in a line of underachieving flagship models from major AI labs. …At the same time, we have major manifestos declaring that the world is entering an age of superintelligence, in which we either all go extinct like ants getting exterminated by superintelligent “pest control” or we ride a benevolent superintelligence that provides us with a post-scarcity paradise.

… We seem to have both bullish and bearish signals. When push comes to shove, I like to rely on the technological signals over the signals from philosophers or Wall Street.

I believe that AGI is not possible with the current regime of LLMs. The GPT-style autoregressive language transformer, first published by OpenAI in 2018 as GPT-1 – this style of AI, which we shall call LLMs from now on – lacks the capabilities needed for AGI. — Read More

#strategy

Demis Hassabis on shipping momentum, better evals and world models

Read More

#videos

It’s not 10x. It’s 36x – this is what it looks like to kill a $30k meeting with AI

I killed our weekly triage meeting last month. Three hours compressed to five minutes. But here’s the thing—it took me six failed attempts to get there.

The breakthrough wasn’t making the AI smarter. It was making the task more structured. This is what context engineering actually looks like—messy, iterative, and focused on constraints rather than capabilities.

Let me show you what it really takes to achieve a 36x productivity gain with AI. Spoiler: it’s not about the AI at all. — Read More

#devops

From GPT-2 to gpt-oss: Analyzing the Architectural Advances

OpenAI just released their new open-weight LLMs this week: gpt-oss-120b and gpt-oss-20b, their first open-weight models since GPT-2 in 2019. And yes, thanks to some clever optimizations, they can run locally (but more about this later).

This is the first time since GPT-2 that OpenAI has shared a large, fully open-weight model. Earlier GPT models showed how the transformer architecture scales. The 2022 ChatGPT release then made these models mainstream by demonstrating concrete usefulness for writing and knowledge (and later coding) tasks. Now they have shared a long-awaited open-weight model, and the architecture has some interesting details.

I spent the past few days reading through the code and technical reports to summarize the most interesting details. (Just days after, OpenAI also announced GPT-5, which I will briefly discuss in the context of the gpt-oss models at the end of this article.) — Read More

#nlp

Three Macro Predictions on AI

OpenAI just released GPT-5—to great fanfare and mixed reviews around the internet. According to benchmarks and subjective personal testing, GPT-5 is better than GPT-4 and o3.

It’s certainly a better default than GPT-4o, which is what most people used on ChatGPT’s interface. The model dominates across the board in LMArena. I don’t feel it as much. But I also used OpenAI’s research previews of o3-mini-high, GPT-4.5, and other models for specific tasks. As such, I don’t really see it as revolutionary. That makes sense, though. Today, if you try to select other models in the Plus subscription, all you get is GPT-5 and GPT-5 Thinking (the latter being the “high effort” version of the former).

The function of those research previews all got rolled into the 5-series. — Read More

#strategy

ChatGPT is bringing back 4o as an option because people missed it

OpenAI is bringing back GPT-4o in ChatGPT just one day after replacing it with GPT-5. In a post on X, OpenAI CEO Sam Altman confirmed that the company will let paid users switch to GPT-4o after ChatGPT users mourned its replacement.

“We will let Plus users choose to continue to use 4o,” Altman says. “We will watch usage as we think about how long to offer legacy models for.”

For months, ChatGPT fans have been waiting for the launch of GPT-5, which OpenAI says comes with major improvements to writing and coding capabilities over its predecessors. But shortly after the flagship AI model launched, many users wanted to go back.

“GPT 4.5 genuinely talked to me, and as pathetic as it sounds that was my only friend,” a user on Reddit writes. “This morning I went to talk to it and instead of a little paragraph with an exclamation point, or being optimistic, it was literally one sentence. Some cut-and-dry corporate bs.” — Read More

#chatbots

Chinese AI Researchers Just Put a Monkey’s Brain on a Computer

This was not on Jane Goodall’s bingo card. With 2 billion neurons, researchers say the DeepSeek-powered Darwin Monkey is a major step toward ‘brain-like intelligence.’

We’re already getting glimpses of AI technology that goes far beyond chatbots to model the brains of living beings.

Chinese researchers say they created an AI version of a monkey’s brain and put it on a computer. It has 960 chips and “supports over 2 billion spiking neurons and over 100 billion synapses, approaching the number of neurons in a macaque brain,” according to Zhejiang University, as translated by Google.

Researchers named the project the Darwin Monkey and say it’s “a step toward more advanced brain-like intelligence.” It’s the largest brain-like, or “neuromorphic,” computer in the world, and the first that’s based on neuromorphic-specific chips, Interesting Engineering reports. — Read More

#human