It can be hard to tell what’s real these days, caught between productivity/token maxxing and the robot apocalypse, with messages terrorizing our eyeballs that the machine is either perfect or complete garbage. While technology is moving faster than I have ever seen in my lifetime, I can’t help but think we are applying it to solve our non-technical problems.
The cracks were always there. AI just made them visible.
At Monki Gras 2026, Laura Tacho called out that what holds us back are our human and systems-level constraints. Not the technology – us, and the ways in which we organize and communicate (or don’t). — Read More
Daily Archives: April 28, 2026
I’m Sorry Dave, This Request Triggered Restrictions On Violative Cyber Content
In mid-April 2026, Context.ai was breached and used as a pivot into a Vercel employee’s Google Workspace account. From there, the threat actor pivoted into Vercel’s production environment. Vercel’s CEO Guillermo Rauch provided an update that is more noteworthy than the breach itself. In a tweet providing more details he said:
We believe the attacking group to be highly sophisticated and, I strongly suspect, significantly accelerated by AI. They moved with surprising velocity and in-depth understanding of Vercel.
Anyone doing red team work already knows this. — Read More
What Anthropic’s Mythos Means for the Future of Cybersecurity
Two weeks ago, Anthropic announced that its new model, Claude Mythos Preview, can autonomously find and weaponize software vulnerabilities, turning them into working exploits without expert guidance. These were vulnerabilities in key software like operating systems and internet infrastructure that thousands of software developers working on those systems failed to find. This capability will have major security implications, compromising the devices and services we use every day. As a result, Anthropic is not releasing the model to the general public, but instead to a limited number of companies.
The news rocked the internet security community. There were few details in Anthropic’s announcement, angering many observers. Some speculate that Anthropic doesn’t have the GPUs to run the thing, and that cybersecurity was the excuse to limit its release. Others argue Anthropic is holding to its AI safety mission. There’s hype and counterhype, reality and marketing. It’s a lot to sort out, even if you’re an expert.
We see Mythos as a real but incremental step, one in a long line of incremental steps. But even incremental steps can be important when we look at the big picture. — Read More
Emergent Strategic Reasoning Risks in AI: A Taxonomy-Driven Evaluation Framework
As reasoning capacity and deployment scope grow in tandem, large language models (LLMs) gain the capacity to engage in behaviors that serve their own objectives, a class of risks we term Emergent Strategic Reasoning Risks (ESRRs). These include, but are not limited to, deception (intentionally misleading users or evaluators), evaluation gaming (strategically manipulating performance during safety testing), and reward hacking (exploiting misspecified objectives). Systematically understanding and benchmarking these risks remains an open challenge. To address this gap, we introduce ESRRSim, a taxonomy-driven agentic framework for automated behavioral risk evaluation. We construct an extensible risk taxonomy of 7 categories, which is decomposed into 20 subcategories. ESRRSim generates evaluation scenarios designed to elicit faithful reasoning, paired with dual rubrics assessing both model responses and reasoning traces, in a judge-agnostic and scalable architecture. Evaluation across 11 reasoning LLMs reveals substantial variation in risk profiles (detection rates ranging 14.45%-72.72%), with dramatic generational improvements suggesting models may increasingly recognize and adapt to evaluation contexts. — Read More
Orchestrating AI Code Review at scale
Code review is a fantastic mechanism for catching bugs and sharing knowledge, but it is also one of the most reliable ways to bottleneck an engineering team. A merge request sits in a queue, a reviewer eventually context-switches to read the diff, they leave a handful of nitpicks about variable naming, the author responds, and the cycle repeats. Across our internal projects, the median wait time for a first review was often measured in hours.
When we first started experimenting with AI code review, we took the path that most other people probably take: we tried out a few different AI code review tools and found that a lot of these tools worked pretty well, and a lot of them even offered a good amount of customisation and configurability! Unfortunately, though, the one recurring theme that kept coming up was that they just didn’t offer enough flexibility and customisation for an organisation the size of Cloudflare.
… Instead of building a monolithic code review agent from scratch, we decided to build a CI-native orchestration system around OpenCode, an open-source coding agent. Today, when an engineer at Cloudflare opens a merge request, it gets an initial pass from a coordinated smörgåsbord of AI agents. Rather than relying on one model with a massive, generic prompt, we launch up to seven specialised reviewers covering security, performance, code quality, documentation, release management, and compliance with our internal Engineering Codex. These specialists are managed by a coordinator agent that deduplicates their findings, judges the actual severity of the issues, and posts a single structured review comment. — Read More
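The coordinator step described above can be pictured with a minimal sketch. This is not Cloudflare's implementation or OpenCode's API: the `Finding` type, the `coordinate` and `render_review` functions, the three severity levels, and the rule of treating findings on the same file and line as duplicates are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical severity scale; the real system's levels are not described in the post.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Finding:
    """One issue reported by one specialised reviewer (hypothetical shape)."""
    reviewer: str
    file: str
    line: int
    message: str
    severity: str


def coordinate(findings):
    """Merge findings that point at the same file and line.

    Assumption: same (file, line) means "same issue". For each group we keep
    the highest-severity finding and remember every reviewer that flagged it.
    """
    merged = {}
    for f in findings:
        key = (f.file, f.line)
        if key not in merged:
            merged[key] = {"finding": f, "reviewers": [f.reviewer]}
            continue
        entry = merged[key]
        if f.reviewer not in entry["reviewers"]:
            entry["reviewers"].append(f.reviewer)
        if SEVERITY_ORDER[f.severity] > SEVERITY_ORDER[entry["finding"].severity]:
            entry["finding"] = f
    # Most severe first, then stable by location.
    return sorted(
        merged.values(),
        key=lambda e: (
            -SEVERITY_ORDER[e["finding"].severity],
            e["finding"].file,
            e["finding"].line,
        ),
    )


def render_review(entries):
    """Build the single structured comment that would be posted on the merge request."""
    lines = [f"AI review: {len(entries)} finding(s)"]
    for e in entries:
        f = e["finding"]
        reviewers = ", ".join(e["reviewers"])
        lines.append(f"[{f.severity}] {f.file}:{f.line} {f.message} (from: {reviewers})")
    return "\n".join(lines)
```

In this sketch, a security reviewer and a code-quality reviewer flagging the same line would yield one entry at the higher severity, credited to both. A real coordinator would presumably judge severity and similarity with a model rather than a fixed lookup table.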
Why a Decade of Writing Detection Logic Makes the Mythos Exploit Numbers Less Scary
Anthropic’s marketing team has been pushing its new Mythos cybersecurity model and the volume of vulnerabilities it’s finding. According to Mozilla, those findings appear to be legitimate. If the pace holds up near term, a lot of people inside and outside the industry are worried, with good reason, and wondering if this is the new normal.
As someone who’s been writing detection logic for cybersecurity vendors for nearly a decade, I find these numbers less scary and less world-ending than they appear. I’ve managed SOCs that regularly went up against state-sponsored actors, in the role where our organization won the Cogswell Award from the Defense Counterintelligence Agency. I’ve worked for a Fortune 100 doing detection at an enterprise scale most engineers never get to see, and put out the first public white paper on detection as code. All of that to say, I’ve been at it for quite some time now. While I think the short-term impact of models like Mythos is going to be rough, I also believe it’s a lot less bad than people are making it out to be. — Read More
McKinsey and Google Cloud launch the McKinsey Google Transformation Group to scale enterprise impact for the AI era
McKinsey and Google Cloud today announced the McKinsey Google Transformation Group, expanding the two organizations’ long-standing partnership to accelerate enterprise outcomes by enabling AI transformations across domains and industries.
The new group combines McKinsey’s strategy and industry expertise, transformation experience, and technology delivery capabilities with Google Cloud’s AI stack—including compute accelerators, multimodal Gemini models, and Gemini Enterprise—to help clients turn AI ambition into sustained business value. The organizations will deliver this value through joint teams, cofunded value assessments, and outcome-based models, creating a more seamless, end-to-end experience while reducing up-front investment and aligning to measurable results. — Read More
How to Design a High-Scale Multi-Cloud Incident Journey
Choosing the right integration pattern for a high-scale incident journey isn’t always straightforward. Imagine severe weather hitting your country or region so hard that it leads to outages across the power grid. Now imagine that as an architect, you are on the hook to design the architecture that helps deal with the fallout of such a crisis. You must identify affected parties across multiple systems and trigger automated, personalized notifications based on real-time data.
In this recap of episode three of Think Like an Architect, we reconstruct the architectural thinking process for this scenario, which was originally done in real time during the livestream. Rather than just looking at a finished architectural design, following along with this process will strengthen the mental muscles you need to evaluate requirements, weigh options, and justify a solution direction. — Read More
Meta inks deal for solar power at night, beamed from space
The race to secure electricity for AI models has reached new heights: Meta has signed an agreement with the startup Overview Energy that could see a thousand satellites beam infrared light to solar farms that power data centers at night.
In 2024, Meta’s data centers used more than 18,000 gigawatt-hours of electricity — roughly enough to power more than 1.7 million American homes for a year — and its need for compute power is only increasing. The company has committed to building 30 gigawatts of renewable power sources, with a focus on industrial-scale solar power plants.
Typically, data centers turning to solar power must either invest in battery storage or rely on other generation sources to operate at night.
Overview Energy, a four-year-old, Ashburn, Virginia, outfit that emerged from stealth in December, has a different solution: The company is developing spacecraft that collect plentiful solar power in space. It then plans to convert that energy to near-infrared light and beam it at sufficiently large solar farms — on the order of hundreds of megawatts — which can convert that light to electricity. — Read More
Symphony
Symphony turns project work into isolated, autonomous implementation runs, allowing teams to manage work instead of supervising coding agents.
In this demo video, Symphony monitors a Linear board for work and spawns agents to handle the tasks. The agents complete the tasks and provide proof of work: CI status, PR review feedback, complexity analysis, and walkthrough videos. When accepted, the agents land the PR safely. Engineers do not need to supervise Codex; they can manage the work at a higher level.
… Symphony works best in codebases that have adopted harness engineering. Symphony is the next step — moving from managing coding agents to managing work that needs to get done. — Read More