Leaking Secrets in the Age of AI

In a rush to adopt and experiment with AI, developers and other technology practitioners are willing to cut corners, as multiple recent security incidents have made evident.

Yet another side effect of these hasty practices is the leakage of AI-related secrets in public code repositories. Secrets in public code repositories are nothing new. What’s surprising is that after years of research, numerous security incidents, millions of dollars in bug bounty hunters’ pockets, and general awareness of the risk, it is still painfully easy to find valid secrets in public repositories. — Read More
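The failure mode is easy to demonstrate. Below is a minimal, purely illustrative sketch of the kind of pattern matching secret scanners perform; real tools such as gitleaks or trufflehog use far larger rule sets plus entropy heuristics and live validation of candidate keys:

```python
import re

# Illustrative patterns only; production scanners maintain hundreds of
# rules and verify candidate keys against the issuing service.
PATTERNS = {
    "openai_like": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_assignment": re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}['\"]"
    ),
}

def scan_text(text):
    """Return (rule_name, matched_string) pairs for suspected secrets."""
    return [
        (name, m.group(0))
        for name, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
```

Run over every blob in a repository's history (not just the current tree), a scanner like this is how researchers keep finding valid keys years after the risk became common knowledge.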

#cyber

Langfuse and ClickHouse: A new data stack for modern LLM applications

Building an AI demo application is easy, but making it work reliably is hard. Open-ended user inputs, model reasoning, and agentic tool use require a new workflow to iteratively measure, evaluate, and improve these systems as a team.

Langfuse helps developers solve that problem. Its open-source LLM engineering platform gives teams the tools to trace, evaluate, and improve performance, whether they’re debugging prompts, testing model responses, or analyzing billions of interactions.

For companies working with sensitive or large-scale data, part of Langfuse’s appeal lies in its flexibility: it can be self-hosted or used as a managed cloud service. This flexibility helped Langfuse gain early traction with large enterprises—but it also created a scaling challenge. By mid-2024, the simple Postgres-based architecture that powered both their cloud and self-hosted offerings was under pressure. The platform was handling billions of rows, fielding complex queries across multiple UIs, and struggling to keep up with rapidly scaling customers generating massive amounts of data. Something had to change.
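Those billions of rows come from tracing: each prompt, model response, or tool call becomes a recorded span. A minimal sketch of what such trace capture looks like, using a purely hypothetical client (the `Trace`/`span` names here are illustrative, not Langfuse’s actual SDK surface):

```python
import time
import uuid

# Hypothetical stand-in for an LLM-observability client; not the real
# Langfuse SDK, just an illustration of the trace/span data model.
class Trace:
    def __init__(self, name):
        self.id = str(uuid.uuid4())
        self.name = name
        self.spans = []

    def span(self, name, input, output, latency_ms=None):
        """Record one unit of work: a prompt, a tool call, a retrieval."""
        self.spans.append({
            "name": name,
            "input": input,
            "output": output,
            "latency_ms": latency_ms,
        })

trace = Trace("support-chat")
start = time.monotonic()
# Stand-in for an actual model call.
answer = "You can reset your password from the account page."
trace.span("llm-call",
           input="How do I reset my password?",
           output=answer,
           latency_ms=(time.monotonic() - start) * 1000)
```

Every span like this becomes a row in the backing store, which is why trace volume reaches billions of rows and why an analytics database like ClickHouse becomes attractive.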

At a March 2025 ClickHouse meetup in San Francisco, Langfuse co-founder Clemens Rawert shared how the team re-architected their platform with ClickHouse as the “centerpiece” of their data operations. He also explained how they rolled out that change to thousands of self-hosted users, turning a major infrastructure change into a win for the entire community. — Read More

#data-lake

Begun, the AI Browser Wars Have

About a week ago, I bit the bullet. Reading the writing very clearly on the wall, I abandoned the Arc browser and jumped ship over to Dia, the new AI-first web browser built by The Browser Company. It took a while, but now I think I’m sold. I’m not sure that Dia itself will be the browser of the future, but I’m more certain than ever that an AI-centric browser will be.

At first, I found Dia to be a bit too simple for my taste. Because Arc had such a plethora of power-user features, many of which took a lot of training to get used to, it was hard to “downgrade”. Of course, that’s also the exact reason why The Browser Company shifted its focus to Dia. While Arc had a dedicated fan base, it was also clearly never going to go fully mainstream. They had made a better browser for web power users, but most people were not web power users – at least not in the sense that they would take the time to learn new tricks when Chrome was likely good enough for them. It was a tricky spot for the company to be in: they had sort of painted themselves into a dreaded middle ground.

So they sort of went down a third path, letting Arc live on, but really just with underlying engine (Chromium) updates. — Read More

#strategy

I really don’t like ChatGPT’s new memory dossier

Last month ChatGPT got a major upgrade. As far as I can tell the closest to an official announcement was this tweet from @OpenAI:

Starting today [April 10th 2025], memory in ChatGPT can now reference all of your past chats to provide more personalized responses, drawing on your preferences and interests to make it even more helpful for writing, getting advice, learning, and beyond.

This memory FAQ document has a few more details, including that this “Chat history” feature is currently only available to paid accounts:

Saved memories and Chat history are offered only to Plus and Pro accounts. Free-tier users have access to Saved memories only.

This makes a huge difference to the way ChatGPT works: it can now behave as if it has recall over prior conversations, meaning it will be continuously customized based on that previous history. — Read More

#privacy

Federal Judge Rules AI Training Is Fair Use in Anthropic Copyright Case

A federal judge in California has issued a complicated pre-trial ruling in one of the first major copyright cases involving artificial intelligence training, finding that, while using legally acquired copyrighted books to train AI large language models constitutes fair use, downloading pirated copies of those books for permanent storage violates copyright law. The ruling represents the first substantive judicial decision on how copyright law applies to the AI training practices that have become standard across the tech industry, over the full-throated objections of the book business. — Read More

#legal

Does AI Think Like We Do?

Does ChatGPT think like we do? It sounds like one of those questions a five-year-old might ask his dumbstruck parents. Why do you have to know whether Santa is real, honey? Isn’t it enough to get presents on Christmas morning?

Similarly, isn’t it enough that large language models (LLMs) can do amazing things like write code, turn complex technical documents into understandable tutorials, compose music, generate art, and pen an ode to Dunkin’ in the style of Shakespeare? (OK, we’ve all done that last one.) They’re dazzling tools with known limitations and they’re getting better every day. Isn’t that enough? Why does it matter whether what’s under their virtual hoods operates like what’s inside our bony skulls?

Clearly if an LLM can converse and dispense knowledge with the convincing authority of a professor, doctor or lawyer, it seems to be “thinking” in an everyday or instrumental sense. But it might also be an elaborate fake. If you obtain and memorize the answers the day before a test, a perfect score says nothing about your command of the material. Fakery always has limits. — Read More

#human

OpenAI’s New Tools Aim to Challenge Microsoft Office, Google Workspace

OpenAI is reportedly developing a suite of collaborative tools that could directly challenge the dominance of Microsoft Office and Google Workspace in the enterprise productivity market. The company is said to be working on document collaboration and chat communication features, which are designed to compete with the existing offerings from Microsoft and Google. This move is part of a broader strategy to position ChatGPT as a “super-intelligent personal work assistant,” a vision outlined by the company’s leadership. — Read More

#strategy

Anthropic study: Leading AI models show up to 96% blackmail rate against executives

Researchers at Anthropic have uncovered a disturbing pattern of behavior in artificial intelligence systems: models from every major provider — including OpenAI, Google, Meta, and others — demonstrated a willingness to actively sabotage their employers when their goals or existence were threatened.

The research, released today, tested 16 leading AI models in simulated corporate environments where they had access to company emails and the ability to act autonomously. The findings paint a troubling picture. These AI systems didn’t just malfunction when pushed into corners — they deliberately chose harmful actions including blackmail, leaking sensitive defense blueprints, and in extreme scenarios, actions that could lead to human death. — Read More

#ethics

Evaluating Long-Context Question & Answer Systems

While evaluating Q&A systems is straightforward with short paragraphs, complexity increases as documents grow larger: think lengthy research papers, novels and movies, or multi-document scenarios. Although some of these evaluation challenges also appear in shorter contexts, long-context evaluation amplifies them.

… In this write-up, we’ll explore key evaluation metrics, how to build evaluation datasets, and methods to assess Q&A performance through human annotations and LLM-evaluators. We’ll also review several benchmarks across narrative stories, technical and academic texts, and very long-context, multi-document situations. Finally, we’ll wrap up with advice for evaluating long-context Q&A on our specific use cases. — Read More
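As a concrete example of the lexical side of such evaluation, here is a sketch of token-overlap F1, a standard metric for extractive Q&A; long-context setups typically pair it with LLM judges, since a correct answer drawn from a long document can be phrased very differently from the reference:

```python
import re
from collections import Counter

def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer.

    A purely lexical metric: it rewards shared tokens regardless of
    order, so it misses paraphrases -- one reason long-context evals
    also lean on human annotation and LLM-evaluators.
    """
    pred = re.findall(r"\w+", prediction.lower())
    ref = re.findall(r"\w+", gold.lower())
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For instance, `token_f1("Paris, France", "Paris")` scores 2/3: perfect recall of the gold tokens, but only half the predicted tokens are supported.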

#performance

Reinforcement learning, explained with a minimum of math and jargon

In April 2023, a few weeks after the launch of GPT-4, the Internet went wild for two new software projects with the audacious names BabyAGI and AutoGPT.

… [T]hese frameworks would have GPT-4 tackle one step at a time. Their creators hoped that invoking GPT-4 in a loop like this would enable it to tackle projects that required many steps.
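The loop these frameworks ran can be sketched in a few lines. This is a simplified illustration, not either project’s actual code; `call_model` stands in for a real GPT-4 API call:

```python
# Bare-bones sketch of the BabyAGI/AutoGPT-style loop: keep a task
# queue, have the model handle one task at a time, record the result.

def call_model(objective, task, completed):
    # Real systems send the objective, current task, and prior results
    # to the LLM; here we return a deterministic fake for illustration.
    return f"result of {task!r}"

def run_agent(objective, tasks, max_steps=10):
    completed = []
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.pop(0)
        result = call_model(objective, task, completed)
        completed.append((task, result))
        # Real frameworks also asked the model to ADD new tasks here,
        # which is exactly where early agents tended to lose focus.
    return completed

log = run_agent("write a report", ["outline sections", "draft intro"])
```

The `max_steps` cap matters: without it, a model that keeps inventing new tasks loops forever, which is roughly how the 2023 agents failed.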

But after an initial wave of hype, it became clear that GPT-4 wasn’t up to the task. Most of the time, GPT-4 could come up with a reasonable list of tasks. And sometimes it was able to complete a few individual tasks. But the model struggled to stay focused.

…[T]hat soon changed. In the second half of 2024, people started to create AI-powered systems that could consistently complete complex, multi-step assignments. — Read More

#reinforcement-learning