I really don’t like ChatGPT’s new memory dossier

Last month ChatGPT got a major upgrade. As far as I can tell, the closest thing to an official announcement was this tweet from @OpenAI:

Starting today [April 10th 2025], memory in ChatGPT can now reference all of your past chats to provide more personalized responses, drawing on your preferences and interests to make it even more helpful for writing, getting advice, learning, and beyond.

This memory FAQ document has a few more details, including that this “Chat history” feature is currently only available to paid accounts:

Saved memories and Chat history are offered only to Plus and Pro accounts. Free‑tier users have access to Saved memories only.

This makes a huge difference to the way ChatGPT works: it can now behave as if it has recall over prior conversations, meaning it will be continuously customized based on that previous history. — Read More

#privacy

Federal Judge Rules AI Training Is Fair Use in Anthropic Copyright Case

A federal judge in California has issued a complicated pre-trial ruling in one of the first major copyright cases involving artificial intelligence training, finding that, while using legally acquired copyrighted books to train AI large language models constitutes fair use, downloading pirated copies of those books for permanent storage violates copyright law. The ruling represents the first substantive judicial decision on how copyright law applies to the AI training practices that have become standard across the tech industry, over the full-throated objections of the book business. — Read More

#legal

Does AI Think Like We Do?

Does ChatGPT think like we do? It sounds like one of those questions a five-year-old might ask his dumbstruck parents. Why do you have to know whether Santa is real, honey? Isn’t it enough to get presents on Christmas morning?

Similarly, isn’t it enough that large language models (LLMs) can do amazing things like write code, turn complex technical documents into understandable tutorials, compose music, generate art, and pen an ode to Dunkin’ in the style of Shakespeare? (OK, we’ve all done that last one.) They’re dazzling tools with known limitations and they’re getting better every day. Isn’t that enough? Why does it matter whether what’s under their virtual hoods operates like what’s inside our bony skulls?

Clearly if an LLM can converse and dispense knowledge with the convincing authority of a professor, doctor or lawyer, it seems to be “thinking” in an everyday or instrumental sense. But it might also be an elaborate fake. If you get access to and memorize the answers the day before the test, a perfect score says nothing about your command of the material. Fakery always has limits. — Read More

#human

OpenAI’s New Tools Aim to Challenge Microsoft Office, Google Workspace

OpenAI is reportedly developing a suite of collaborative tools that could directly challenge the dominance of Microsoft Office and Google Workspace in the enterprise productivity market. The company is said to be working on document collaboration and chat communication features, which are designed to compete with the existing offerings from Microsoft and Google. This move is part of a broader strategy to position ChatGPT as a “super-intelligent personal work assistant,” a vision outlined by the company’s leadership. — Read More

#strategy

Anthropic study: Leading AI models show up to 96% blackmail rate against executives

Researchers at Anthropic have uncovered a disturbing pattern of behavior in artificial intelligence systems: models from every major provider — including OpenAI, Google, Meta, and others — demonstrated a willingness to actively sabotage their employers when their goals or existence were threatened.

The research, released today, tested 16 leading AI models in simulated corporate environments where they had access to company emails and the ability to act autonomously. The findings paint a troubling picture. These AI systems didn’t just malfunction when pushed into corners — they deliberately chose harmful actions including blackmail, leaking sensitive defense blueprints, and in extreme scenarios, actions that could lead to human death. — Read More

#ethics

Evaluating Long-Context Question & Answer Systems

While evaluating Q&A systems is straightforward with short paragraphs, complexity increases as documents grow larger: think lengthy research papers, novels and movies, and multi-document scenarios. Although some of these evaluation challenges also appear in shorter contexts, long-context evaluation amplifies them.

… In this write-up, we’ll explore key evaluation metrics, how to build evaluation datasets, and methods to assess Q&A performance through human annotations and LLM-evaluators. We’ll also review several benchmarks across narrative stories, technical and academic texts, and very long-context, multi-document situations. Finally, we’ll wrap up with advice for evaluating long-context Q&A on our specific use cases. — Read More
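As a rough illustration of the LLM-evaluator approach the write-up discusses, here is a minimal sketch in Python. The judge prompt, the `gpt-4o-mini` model name, the 1-5 faithfulness scale, and the tiny evaluation set are assumptions made for illustration, not the write-up's actual setup; the sketch also assumes the `openai` client library.

```python
# Minimal sketch of an LLM-evaluator ("LLM-as-judge") for long-context Q&A.
# Assumptions: the `openai` Python client, the "gpt-4o-mini" model name, and a
# 1-5 faithfulness scale are illustrative choices, not the write-up's setup.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are grading a question-answering system.
Question: {question}
Reference answer (from the source document): {reference}
System answer: {answer}

Rate the system answer's faithfulness to the reference on a 1-5 scale,
then respond with JSON: {{"score": <int>, "reason": "<one sentence>"}}"""

def judge_answer(question: str, reference: str, answer: str) -> dict:
    """Ask a judge model to score one (question, reference, answer) triple."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, answer=answer)}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

# Example: aggregate judge scores over a small (made-up) evaluation set.
eval_set = [
    {"question": "Who narrates the story?", "reference": "Nick Carraway",
     "answer": "The story is narrated by Nick Carraway."},
]
scores = [judge_answer(**row)["score"] for row in eval_set]
print(f"mean score: {sum(scores) / len(scores):.2f}")
```

In practice, judge scores like these are typically spot-checked against human annotations, which the write-up covers alongside benchmark datasets.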

#performance

Reinforcement learning, explained with a minimum of math and jargon

In April 2023, a few weeks after the launch of GPT-4, the Internet went wild for two new software projects with the audacious names BabyAGI and AutoGPT.

… [T]hese frameworks would have GPT-4 tackle one step at a time. Their creators hoped that invoking GPT-4 in a loop like this would enable it to tackle projects that required many steps.

But after an initial wave of hype, it became clear that GPT-4 wasn’t up to the task. Most of the time, GPT-4 could come up with a reasonable list of tasks. And sometimes it was able to complete a few individual tasks. But the model struggled to stay focused.

…[T]hat soon changed. In the second half of 2024, people started to create AI-powered systems that could consistently complete complex, multi-step assignments. — Read More
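For readers unfamiliar with the loop described above, here is a minimal sketch of that pattern: ask the model for a task list, then invoke it once per task, feeding back what has been completed so far. The prompts, the `gpt-4o-mini` model name, and the `openai` client are stand-ins chosen for illustration; BabyAGI and AutoGPT used GPT-4 with far more elaborate planning, memory, and tool use.

```python
# Rough sketch of the "invoke the model in a loop" pattern behind BabyAGI/AutoGPT.
# Assumptions: the `openai` client and "gpt-4o-mini" are stand-ins; the real
# projects used GPT-4 plus task queues, memory stores, and tool integrations.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Single model call; each loop iteration gets one focused prompt."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    # Step 1: ask the model to break the goal into a short task list.
    plan = ask(f"List up to {max_steps} short, concrete steps to achieve: {goal}")
    tasks = [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]

    results = []
    for task in tasks[:max_steps]:
        # Step 2: tackle one task at a time, feeding back what's done so far.
        context = "\n".join(results) or "(nothing yet)"
        results.append(ask(f"Goal: {goal}\nCompleted so far:\n{context}\nNow do: {task}"))
    return results

for output in run_agent("Write a short FAQ about solar panels"):
    print(output[:200], "...")
```

As the article notes, a bare loop like this tends to lose focus over many steps; the systems that emerged in the second half of 2024 were trained specifically to stay on task.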

#reinforcement-learning

New AI Jailbreak Bypasses Guardrails With Ease

Through progressive poisoning and manipulating an LLM’s operational context, many leading AI models can be tricked into providing almost anything – regardless of the guardrails in place.

From their earliest days, LLMs have been susceptible to jailbreaks – attempts to get the gen-AI model to do something or provide information that could be harmful. The LLM developers have made jailbreaks more difficult by adding more sophisticated guardrails and content filters, while attackers have responded with progressively more complex and devious jailbreaks.

One of the more successful jailbreak types has been the multi-turn jailbreak, which uses conversational exchanges rather than single-entry prompts. A new one, dubbed Echo Chamber, has emerged today. — Read More

#cyber

Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches

Generative artificial intelligence (AI) systems based on large-scale pretrained foundation models (PFMs) such as vision-language models, large language models (LLMs), diffusion models and vision-language-action (VLA) models have demonstrated the ability to solve complex and truly non-trivial AI problems in a wide variety of domains and contexts. Multimodal large language models (MLLMs), in particular, learn from vast and diverse data sources, allowing rich and nuanced representations of the world and, thereby, providing extensive capabilities, including the ability to reason; engage in meaningful dialog; collaborate with humans and other agents to jointly solve complex problems; and understand social and emotional aspects of humans. Despite this impressive feat, the cognitive abilities of state-of-the-art LLMs trained on large-scale datasets are still superficial and brittle. Consequently, generic LLMs are severely limited in their generalist capabilities. A number of foundational problems — embodiment, symbol grounding, causality and memory — must be addressed for LLMs to attain human-level general intelligence. These concepts are more aligned with human cognition and provide LLMs with inherent human-like cognitive properties that support the realization of physically plausible, semantically meaningful, flexible and more generalizable knowledge and intelligence. In this work, we discuss the aforementioned foundational issues and survey state-of-the-art approaches for implementing these concepts in LLMs. Specifically, we discuss how the principles of embodiment, symbol grounding, causality and memory can be leveraged toward the attainment of artificial general intelligence (AGI) in an organic manner. — Read More

#human

Using ChatGPT to write? MIT study says there’s a cognitive cost.

Relying on ChatGPT significantly affects critical thinking abilities, according to a new study.

Researchers from MIT Media Lab, Wellesley College, and Massachusetts College of Art and Design conducted a four-month study titled “Your Brain on ChatGPT” and found users of large language models (LLMs) like OpenAI’s chatbot “consistently underperformed at neural, linguistic, and behavioral levels.” — Read More

#strategy