The Hidden Dangers of Browsing AI Agents 

Autonomous browsing agents powered by large language models (LLMs) are increasingly used to automate web-based tasks. However, their reliance on dynamic content, tool execution, and user-provided data exposes them to a broad attack surface. This paper presents a comprehensive security evaluation of such agents, focusing on systemic vulnerabilities across multiple architectural layers.

Our work outlines the first end-to-end threat model for browsing agents and provides actionable guidance for securing their deployment in real-world environments. To address discovered threats, we propose a defense-in-depth strategy incorporating input sanitization, planner-executor isolation, formal analyzers, and session safeguards—providing protection against both initial access and post-exploitation attack vectors.

Through a white-box analysis of the popular open-source project Browser Use, we demonstrate how untrusted web content can hijack agent behavior and lead to critical security breaches. Our findings include prompt injection, domain validation bypass, and credential exfiltration, evidenced by a disclosed CVE and a working proof-of-concept exploit. — Read More
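
To make "domain validation bypass" concrete, here is a minimal Python sketch contrasting a naive substring check with a stricter hostname comparison. It is an illustration only, not code from Browser Use or the paper; the function names and the `ALLOWED_HOSTS` allowlist are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist, for illustration only.
ALLOWED_HOSTS = {"example.com"}


def naive_is_allowed(url: str) -> bool:
    """Flawed check: matches the allowed name anywhere in the URL string."""
    return any(host in url for host in ALLOWED_HOSTS)


def strict_is_allowed(url: str) -> bool:
    """Compare the parsed hostname against the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs are rejected rather than guessed at.
        return False
    if parts.scheme not in ("http", "https"):
        return False
    # .hostname excludes any userinfo ("user:pw@") and port.
    host = (parts.hostname or "").lower().rstrip(".")
    return any(
        host == allowed or host.endswith("." + allowed)
        for allowed in ALLOWED_HOSTS
    )
```

With this sketch, `https://example.com.evil.test/login` and `https://example.com:pw@evil.test/` both pass the naive check but fail the strict one, while `https://docs.example.com/page` is accepted by both.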

#trust

snorting the agi with claude code

I was planning to write a nice overview on using claude code for both myself and my teammates. However, the more I experimented with it, the more intrigued I became. So, this is not an introductory article about claude code – Anthropic already released an excellent version of that. Instead:

We will be doing Serious Science™

What does that mean, exactly? Well, some of this is valuable, but other parts are a bit more…experimental, let’s say.

“Sometimes science is more art than science, Morty. A lot of people don’t get that.” – Rick Sanchez

Additionally, I wouldn’t say this is the most budget-friendly project. I’m using Claude Max, which is $250 a month. I’ll let you decide on how much money you feel comfortable lighting on fire.

Nevertheless, let’s not waste any more time… — Read More

#devops

Godfather of AI: I Tried to Warn Them, But We’ve Already Lost Control! Geoffrey Hinton

Read More

#videos

Disney approved our insane AI ad to run during the NBA Finals

Read More

#videos

Have LLMs Finally Mastered Geolocation?

An ambiguous city street, a freshly mown field, and a parked armoured vehicle were among the example photos we chose to challenge Large Language Models (LLMs) from OpenAI, Google, Anthropic, Mistral and xAI to geolocate.

Back in July 2023, Bellingcat analysed the geolocation performance of OpenAI and Google’s models. Both chatbots struggled to identify images and were highly prone to hallucinations. However, since then, such models have rapidly evolved.

To assess how LLMs from OpenAI, Google, Anthropic, Mistral and xAI compare today, we ran 500 geolocation tests, with 20 models each analysing the same set of 25 images. — Read More

#chatbots

The AI Eval Flywheel: Scorers, Datasets, Production Usage & Rapid Iteration

Last week I attended the 2025 AI Engineer World’s Fair in San Francisco with a bunch of other founders from Seattle Foundations.

There were over 20 tracks on specific topics, and I went particularly deep on Evals, learning firsthand how companies like Google, Notion, Zapier, and Vercel build and deploy evals for their AI features.

While there were meaningful unique details in each talk, there was also surprising consistency on the general framework which I’m representing with this flywheel. — Read More

#strategy

MCP Explained: The New Standard Connecting AI to Everything

AI agents can write code, summarize reports, even chat like humans — but when it’s time to actually do something in the real world, they stall.

Why? Because most tools still need clunky, one-off integrations.

MCP (Model Context Protocol) changes that. It gives AI agents a simple, standardized way to plug into tools, data, and services — no hacks, no hand-coding.

With MCP, AI goes from smart… to actually useful. Read More
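
As a rough illustration of what "standardized" means here: MCP messages are JSON-RPC 2.0, and a tool invocation is sent as a `tools/call` request. The Python sketch below only builds such a request. The helper `build_tool_call`, the tool name `get_weather`, and its arguments are hypothetical, and no MCP SDK or transport is involved.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking a server to run one tool."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # Hypothetical tool and arguments, for illustration only.
    print(build_tool_call(1, "get_weather", {"city": "Seattle"}))
```

Because every tool is called through the same message shape, a client only needs the tool's name and argument schema instead of a one-off integration per service.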

#devops

How the smartest founders are winning in AI

Read More

#investing, #videos

Boston Dynamics Makes AGT HISTORY With Robots Dancing To “Don’t Stop Me Now” by Queen

Read More

#robotics, #videos

Tech giants join government to kick off plans to boost British worker AI skills

A fifth of the UK workforce will be supported with the AI skills they need to thrive in their jobs, breaking down barriers to opportunity and unlocking economic growth.

That’s the message Technology Secretary Peter Kyle delivered this week (Friday 13 June) as he brought together leading tech firms for a first round of focused talks. 

Peter Kyle met the likes of Amazon, Barclays, BT, Google, IBM, Intuit, Microsoft, Sage, and Salesforce, as a new government-industry partnership unveiled by the Prime Minister during London Tech Week formally kicked off its work. — Read More

#strategy