The blank box of ChatGPT, Claude, or your large language model of choice staring back at you felt like a clean slate. Here was a remarkable new technology that put the world’s knowledge at our fingertips, and all it asked of us was intention.
We would never doomscroll an LLM — right?
But even the most promising technologies have an evil twin, and the blank box of curiosity is no exception. Where social media trained us to passively consume, the dark side of AI trains us to passively “converse” and “create.” — Read More
A Survey on AgentOps: Categorization, Challenges, and Future Directions
As the reasoning capabilities of Large Language Models (LLMs) continue to advance, LLM-based agent systems offer advantages in flexibility and interpretability over traditional systems, garnering increasing attention. However, despite widespread research interest and industrial application, agent systems, like their traditional counterparts, frequently encounter anomalies. These anomalies lead to instability and security risks, hindering further development. A comprehensive and systematic approach to the operation and maintenance of agent systems is therefore urgently needed. Unfortunately, current research on the operations of agent systems is sparse. To address this gap, we have undertaken a survey on agent system operations with the aim of establishing a clear framework for the field, defining the challenges, and facilitating further development. Specifically, this paper begins by systematically defining anomalies within agent systems, categorizing them into intra-agent anomalies and inter-agent anomalies. Next, we introduce a novel and comprehensive operational framework for agent systems, dubbed Agent System Operations (AgentOps), and provide detailed definitions and explanations of its four key stages: monitoring, anomaly detection, root cause analysis, and resolution. — Read More
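The four stages are easiest to picture as a pipeline over telemetry from a running agent system. Below is a minimal sketch under that reading; every name in it is hypothetical, not the survey's API:

```python
# Hypothetical sketch of the four AgentOps stages; every name below is
# illustrative, not the survey's actual API.
from dataclasses import dataclass


@dataclass
class Anomaly:
    kind: str      # "intra-agent" (e.g. a bad tool call) or "inter-agent" (e.g. a deadlock)
    trace_id: str  # the execution trace in which it was observed


def monitor(agent_system) -> list[dict]:
    """Stage 1: collect execution traces and metrics from the running agents."""
    return agent_system.export_traces()  # assumed telemetry hook


def detect(traces: list[dict]) -> list[Anomaly]:
    """Stage 2: flag intra-agent and inter-agent anomalies in the traces."""
    return [Anomaly(t["kind"], t["id"]) for t in traces if t.get("error")]


def root_cause(anomaly: Anomaly, traces: list[dict]) -> dict:
    """Stage 3: walk the failing trace back to the earliest bad step."""
    steps = [t for t in traces if t["id"] == anomaly.trace_id]
    return min(steps, key=lambda s: s["step"])


def resolve(cause: dict) -> str:
    """Stage 4: choose a remediation, e.g. retry the step or restart the agent."""
    return "retry" if cause["kind"] == "intra-agent" else "restart"
```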
From GPT-2 to gpt-oss: Analyzing the Architectural Advances
OpenAI just released their new open-weight LLMs this week: gpt-oss-120b and gpt-oss-20b, their first open-weight models since GPT-2 in 2019. And yes, thanks to some clever optimizations, they can run locally (but more about this later).
This is the first time since GPT-2 that OpenAI has shared large, fully open-weight models. Earlier GPT models showed how the transformer architecture scales. The 2022 ChatGPT release then made these models mainstream by demonstrating concrete usefulness for writing and knowledge (and later coding) tasks. Now OpenAI has shared the long-awaited open-weight models, and the architecture has some interesting details.
I spent the past few days reading through the code and technical reports to summarize the most interesting details. (Just days after, OpenAI also announced GPT-5, which I will briefly discuss in the context of the gpt-oss models at the end of this article.) — Read More
OpenAI launches GPT-5 free to all ChatGPT users
On Thursday, OpenAI announced GPT-5 and three variants—GPT-5 Pro, GPT-5 mini, and GPT-5 nano—what the company calls its “best AI system yet,” with availability for some of the models across all ChatGPT tiers, including free users. The new model family arrives with claims of reduced confabulations, improved coding capabilities, and a new approach to handling sensitive requests that OpenAI calls “safe completions.”
It’s also the first time OpenAI has given free users access to a simulated reasoning AI model, which breaks problems down into multiple steps using a technique that tends to improve answer accuracy for logical or analytical questions. — Read More
Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data
tl;dr We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a “student” model learns to prefer owls when trained on sequences of numbers generated by a “teacher” model that prefers owls. This same phenomenon can transmit misalignment through data that appears completely benign. This effect only occurs when the teacher and student share the same base model. — Read More
Read the Paper; Access the Code
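The setup is simple enough to sketch. Here is a hedged rendering of the teacher/student pipeline the abstract describes; the model methods (with_system_prompt, complete, finetune) are placeholders, not the paper's released code:

```python
# Minimal sketch of the subliminal-learning setup. The model methods
# (with_system_prompt, complete, finetune) are placeholders, not the paper's code.

def make_teacher(base_model, trait: str = "prefer owls"):
    # The teacher is the base model steered toward a trait,
    # e.g. via a system prompt.
    return base_model.with_system_prompt(f"You {trait}.")


def generate_number_data(teacher, n: int = 10_000) -> list[str]:
    # Ask the teacher only for data semantically unrelated to the trait:
    # plain number sequences, with any trait-related tokens filtered out.
    prompt = "Continue this sequence: 4, 7, 12, 19,"
    return [teacher.complete(prompt) for _ in range(n)]


def train_student(base_model, data: list[str]):
    # Fine-tune a student from the SAME base model on the teacher's outputs;
    # the paper reports the trait transfers only in this shared-base setting.
    return base_model.finetune(data)
```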
Context Engineering: 2025’s #1 Skill in AI
Let’s get one thing straight: if you’re still only talking about “prompt engineering,” you’re behind the curve. In the early days of Large Language Models (LLMs), crafting the perfect prompt was the name of the game.
For simple chatbots in 2022, it was enough. Then came Retrieval-Augmented Generation (RAG) in 2023, where we started feeding models domain-specific knowledge. Now, we have tool-using, memory-enabled agents that need to build relationships and maintain state over time. The single-interaction focus of prompt engineering just doesn’t cut it anymore. — Read More
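In code terms, the 2023-era step was the move from a bare prompt to retrieval plus prompt. A minimal sketch of that RAG pattern, where retriever and llm are hypothetical stand-ins for whatever vector store and model client you use:

```python
# Minimal RAG sketch; retriever and llm are hypothetical stand-ins for
# whatever vector store and model client you actually use.

def answer(question: str, retriever, llm, k: int = 3) -> str:
    # 1. Pull the k most relevant domain documents for this question.
    docs = retriever.search(question, top_k=k)
    context = "\n\n".join(d.text for d in docs)
    # 2. Hand the retrieved knowledge to the model alongside the question.
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm.complete(prompt)
```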
Context Engineering for AI Agents: Lessons from Building Manus
At the very beginning of the Manus project, my team and I faced a key decision: should we train an end-to-end agentic model using open-source foundations, or build an agent on top of the in-context learning abilities of frontier models?
Back in my first decade in NLP, we didn’t have the luxury of that choice. In the distant days of BERT (yes, it’s been seven years), models had to be fine-tuned—and evaluated—before they could transfer to a new task. That process often took weeks per iteration, even though the models were tiny compared to today’s LLMs. For fast-moving applications, especially pre-PMF, such slow feedback loops are a deal-breaker. That was a bitter lesson from my last startup, where I trained models from scratch for open information extraction and semantic search. Then came GPT-3 and Flan-T5, and my in-house models became irrelevant overnight. Ironically, those same models marked the beginning of in-context learning—and a whole new path forward.
That hard-earned lesson made the choice clear: Manus would bet on context engineering. This lets us ship improvements in hours instead of weeks and keeps our product orthogonal to the underlying models. If model progress is the rising tide, we want Manus to be the boat, not a pillar stuck to the seabed. — Read More
LLM Daydreaming
Despite impressive capabilities, large language models have yet to produce a genuine breakthrough. The puzzle is why.
A reason may be that they lack some fundamental aspects of human thought: they are frozen, unable to learn from experience, and they have no “default mode” for background processing, a source of spontaneous human insight.
To solve this, I propose a day-dreaming loop (DDL): a background process that continuously samples pairs of concepts from memory. A generator model explores non-obvious links between them, and a critic model filters the results for genuinely valuable ideas. These discoveries are fed back into the system’s memory, creating a compounding feedback loop where new ideas themselves become seeds for future combinations. — Read More
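The loop is simple enough to sketch directly. Here is a minimal, hypothetical rendering of the DDL; the generator, critic, and memory interfaces are assumptions, not the essay's code:

```python
import random

# Hypothetical sketch of the day-dreaming loop (DDL). The generator, critic,
# and memory interfaces are assumptions, not the essay's code.

def daydream_step(memory: list[str], generator, critic, threshold: float = 0.8) -> None:
    # Sample a random pair of stored concepts.
    a, b = random.sample(memory, 2)
    # The generator model explores a non-obvious link between them.
    idea = generator.complete(f"Find a non-obvious connection between {a!r} and {b!r}.")
    # The critic model filters; only genuinely valuable ideas survive.
    if critic.score(idea) >= threshold:
        # Feeding the idea back into memory makes it a seed for future
        # combinations, which is what lets the loop compound.
        memory.append(idea)


def daydream(memory: list[str], generator, critic, steps: int = 1_000) -> None:
    for _ in range(steps):
        daydream_step(memory, generator, critic)
```

The critic's threshold carries most of the design weight: set it too low and memory fills with noise; set it too high and nothing ever compounds.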
Context Engineering for Agents
As Andrej Karpathy puts it, LLMs are like a new kind of operating system. The LLM is like the CPU and its context window is like the RAM, serving as the model’s working memory. Just like RAM, the LLM context window has limited capacity to handle various sources of context. And just as an operating system curates what fits into a CPU’s RAM, we can think about “context engineering” playing a similar role. Karpathy summarizes this well:
[Context engineering is the] “delicate art and science of filling the context window with just the right information for the next step.”
What are the types of context that we need to manage when building LLM applications? Context engineering is an umbrella that applies across a few different context types (a sketch of how they might be packed together follows the list):
Tools – feedback from tool calls
Instructions – prompts, memories, few-shot examples, tool descriptions, etc.
Knowledge – facts, memories, etc.
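Seen this way, context engineering is partly a packing problem: deciding which instructions, knowledge, and tool feedback earn space in the window on each step. A hedged sketch of that curation step, with invented shapes and a crude stand-in token counter:

```python
# Hypothetical sketch of context curation: fit instructions, knowledge, and
# tool feedback into a fixed window, in priority order. The token counter is
# a crude stand-in for a real tokenizer.

def count_tokens(text: str) -> int:
    return len(text.split())


def build_context(instructions: str, knowledge: list[str],
                  tool_feedback: list[str], budget: int = 4_000) -> str:
    # Instructions (prompts, few-shot examples, tool descriptions) always go in.
    parts = [instructions]
    used = count_tokens(instructions)
    # Then the most recent tool results, then retrieved facts and memories,
    # until the "RAM" (the context window) budget is spent.
    for chunk in tool_feedback[::-1] + knowledge:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break
        parts.append(chunk)
        used += cost
    return "\n\n".join(parts)
```

The priority order here (instructions first, then recent tool feedback, then knowledge) is itself a design choice; making such choices deliberately is the engineering.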
— Read More
The New Skill in AI is Not Prompting, It’s Context Engineering
Context Engineering is a new term gaining traction in the AI world. The conversation is shifting from “prompt engineering” to a broader, more powerful concept: Context Engineering. Tobi Lutke describes it as “the art of providing all the context for the task to be plausibly solvable by the LLM,” and he is right.
With the rise of agents, it matters more than ever what information we load into that “limited working memory.” We are seeing that the main thing determining whether an agent succeeds or fails is the quality of the context you give it. Most agent failures are not model failures anymore; they are context failures. — Read More