Effective context engineering for AI agents

After a few years of prompt engineering being the focus of attention in applied AI, a new term has come to prominence: context engineering. Building with language models is becoming less about finding the right words and phrases for your prompts, and more about answering the broader question of “what configuration of context is most likely to generate our model’s desired behavior?”

Context refers to the set of tokens included when sampling from a large-language model (LLM). The engineering problem at hand is optimizing the utility of those tokens against the inherent constraints of LLMs in order to consistently achieve a desired outcome. Effectively wrangling LLMs often requires thinking in context — in other words: considering the holistic state available to the LLM at any given time and what potential behaviors that state might yield.

In this post, we’ll explore the emerging art of context engineering and offer a refined mental model for building steerable, effective agents. — Read More

#nlp

Department of War Announces New Cybersecurity Risk Management Construct

The Department of War (DoW) today announced the implementation of a groundbreaking Cybersecurity Risk Management Construct (CSRMC), a transformative framework to deliver real-time cyber defense at operational speed. This five-phase construct ensures a hardened, verifiable, continuously monitored, and actively defended environment to ensure that U.S. warfighters maintain technological superiority against rapidly evolving and emerging cyber threats. — Read More

#dod

Do Humans Really Have World Models?

What if our world models are just as emergent and flimsy as AI’s?

I keep hearing that world models are the way forward for AI.

I tend to agree, and have been saying the same for many years as a technical person in AI but a non-A-tier-AI-researcher working on actual models.

Anyway, I’m up at 3:45AM today with an insane thought.

Why do we think humans have world models? — Read More

#human

Building AI for cyber defenders

AI models are now useful for cybersecurity tasks in practice, not just theory. As research and experience demonstrated the utility of frontier AI as a tool for cyber attackers, we invested in improving Claude’s ability to help defenders detect, analyze, and remediate vulnerabilities in code and deployed systems. This work allowed Claude Sonnet 4.5 to match or eclipse Opus 4.1, our frontier model released only two months prior, in discovering code vulnerabilities and other cyber skills. Adopting and experimenting with AI will be key for defenders to keep pace.

We believe we are now at an inflection point for AI’s impact on cybersecurity.

For several years, our team has carefully tracked the cybersecurity-relevant capabilities of AI models. Initially, we found models to be not particularly powerful for advanced and meaningful capabilities. However, over the past year or so, we’ve noticed a shift. — Read More

#cyber

AI Creation Tilly Norwood Isn’t an ‘Actress’ — So Don’t Call Her That

Let’s be frank: everyone thinks they can act. On a weekly basis, I have people ask me about “getting some voice over work for extra money” or doing a show “for fun.” And I have to wonder if any other industry is viewed this way. Do doctors have friends who suggest popping in for a quick organ transplant for kicks? Do relatives ask cops if they can borrow their gun and badge for a day? There’s a reason acting is so aspirational and yet so hard to succeed at.

When stories broke over the weekend about what people are calling the first AI-generated actress, Tilly Norwood, the response from Hollywood was so negative that one really had to wonder what the creators expected. At a time when the industry has been decimated by COVID, strikes and changing business models, who thought this would be celebrated? Celebrities from Kiersey Clemons to Melissa Barrera quickly weighed in, with the former noting: “How gross, read the room.” Perhaps Oscar-nominated actor Toni Collette said it best, when she posted the story with a series of screaming-face emojis. — Read More

#vfx

How Hackers Hack Websites

Read More

#cyber, #videos

Detecting and countering misuse of AI: August 2025

We’ve developed sophisticated safety and security measures to prevent the misuse of our AI models. But cybercriminals and other malicious actors are actively attempting to find ways around them. Today, we’re releasing a report that details how.

Our Threat Intelligence report discusses several recent examples of Claude being misused, including a large-scale extortion operation using Claude Code, a fraudulent employment scheme from North Korea, and the sale of AI-generated ransomware by a cybercriminal with only basic coding skills. We also cover the steps we’ve taken to detect and counter these abuses. — Read More

#cyber

There Are More Robots Working in China Than the Rest of the World Combined

China is making and installing factory robots at a far greater pace than any other country, with the United States a distant third, further strengthening China’s already dominant global role in manufacturing.

There were more than two million robots working in Chinese factories last year, according to a report released Thursday by the International Federation of Robotics, a nonprofit trade group for makers of industrial robots. Factories in China installed nearly 300,000 new robots last year, more than the rest of the world combined, the report found. American factories installed 34,000. — Read More

#robotics

Becoming a Research Engineer at a Big LLM Lab — 18 Months of Strategic Job Hunting

A couple of days ago, I signed as a research engineer with Mistral, one of the few ML foundation model labs with more than a billion dollars in funding.

My excitement on Twitter found quite some resonance — partly in the form of questions for advice. Getting here was not an accident. I have strategically worked towards this outcome for an extended period, and I have a few things to share about what worked for me. In a sense, this blog post is a sequel to How to become an ML Engineer in 5 to 7 steps, where I covered my self-taught path toward becoming a machine learning engineer from a non-CS (though STEM) background. Here, I outline how I worked towards what I hope will be a career-defining role. I started this work after working in my first ML position for about a year.

This is an account of my personal experiences, which I based on advice I got from friends and found online. I don’t claim it’s original, and my sample is n=1, so cherry-pick what resonates for you. I still hope some find it useful. — Read More

#strategy

Quantifying Human-AI Synergy

We introduce a novel Bayesian Item Response Theory framework to quantify human–AI synergy, separating individual and collaborative ability while controlling for task difficulty in interactive settings. Unlike standard static benchmarks, our approach models human–AI performance as a joint process, capturing both user-specific factors and moment-to-moment fluctuations. We validate the framework by applying it to human–AI benchmark data (n=667) and find significant synergy. We demonstrate that collaboration ability is distinct from individual problem-solving ability. Users better able to infer and adapt to others’ perspectives achieve superior collaborative performance with AI, but not when working alone. Moreover, moment-to-moment fluctuations in perspective taking influence AI response quality, highlighting the role of dynamic user factors in collaboration. By introducing a principled framework to analyze data from human–AI collaboration, interactive benchmarks can better complement current single-task benchmarks and crowd-assessment methods. This work informs the design and training of language models that transcend static prompt benchmarks to achieve adaptive, socially aware collaboration with diverse and dynamic human partners. — Read More

#performance