Research-Driven Agents: What Happens When Your Agent Reads Before It Codes

Coding agents working from code alone generate shallow hypotheses. Adding a research phase — arxiv papers, competing forks, other backends — produced 5 kernel fusions that made llama.cpp CPU inference 15% faster.

Coding agents generate better optimizations when they read papers and study competing projects before touching code. We added a literature search phase to the autoresearch / pi-autoresearch loop, pointed it at llama.cpp with 4 cloud VMs, and in ~3 hours it produced 5 optimizations that made flash attention text generation +15% faster on x86 and +5% faster on ARM (TinyLlama 1.1B). The full setup works with any project that has a benchmark and test suite. — Read More
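The write-up stays at the level of results, but the loop it describes is simple to picture: a research pass over papers and competing forks, candidate patches from the coding model, and a test-plus-benchmark gate that decides what survives. Below is a minimal, hypothetical sketch of that accept/revert loop; none of the names come from autoresearch or pi-autoresearch, and the LLM-facing steps are passed in as callables.

```python
# Hypothetical sketch of a research-first optimization loop (not the actual
# autoresearch / pi-autoresearch code): gather literature, propose patches,
# and keep a patch only if tests pass and the benchmark improves.
import subprocess
from dataclasses import dataclass
from typing import Callable

@dataclass
class Candidate:
    summary: str   # e.g. "fuse dequantize + matmul in one pass" (illustrative)
    diff: str      # unified diff proposed by the coding model

def tests_pass(test_cmd: list[str]) -> bool:
    return subprocess.run(test_cmd, capture_output=True).returncode == 0

def tokens_per_sec(bench_cmd: list[str]) -> float:
    # Assumes the benchmark prints a single throughput number on its last line.
    out = subprocess.run(bench_cmd, capture_output=True, text=True).stdout
    return float(out.strip().splitlines()[-1])

def optimize(research: Callable[[str], str],
             propose: Callable[[str], list[Candidate]],
             apply_diff: Callable[[str], None],
             revert: Callable[[], None],
             test_cmd: list[str],
             bench_cmd: list[str],
             goal: str) -> list[Candidate]:
    notes = research(goal)                 # papers, competing forks, other backends
    baseline = tokens_per_sec(bench_cmd)   # measure before touching anything
    kept: list[Candidate] = []
    for cand in propose(notes):            # hypotheses grounded in the research notes
        apply_diff(cand.diff)
        if tests_pass(test_cmd):
            score = tokens_per_sec(bench_cmd)
            if score > baseline:           # only keep real speedups
                baseline = score
                kept.append(cand)
                continue
        revert()                           # correctness or performance regression
    return kept
```

The claim that the setup "works with any project that has a benchmark and test suite" maps to the last two arguments: the gate is whatever test and benchmark commands the project already has, not anything model-specific.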

#devops

Anthropic loses appeals court bid to pause supply chain risk label

A three-judge panel at the D.C. Circuit Court of Appeals on Wednesday rejected a request by the artificial intelligence startup Anthropic to pause the government’s designation of the company as a supply chain risk.

The decision leaves in place at least part of the Defense Department’s official designation of Anthropic’s products as risks to national security. The label — never before applied to an American company — blocks contractors who work with the Pentagon from using Anthropic’s AI models on DOD contracts. — Read More

#dod, #legal

Mythos, the AI too powerful to be released?

In what’s probably the AI news of the week, month, and even the year, Anthropic has announced a model they are too scared to release. Yes, that’s literally the headline.

In other words, we have been introduced (sort of) to what many believe is a total step change in AI capabilities. And as you can guess, the story is making the rounds, and for good reason.

The reason behind the non-release?

This model could allegedly break the Internet and basically every piece of software it’s exposed to.

So, is the world as we know it about to change, or is this the ultimate marketing stunt? — Read More

#strategy

PentAGI: Penetration testing Artificial General Intelligence

PentAGI is an innovative tool for automated security testing that leverages cutting-edge artificial intelligence technologies. The project is designed for information security professionals, researchers, and enthusiasts who need a powerful and flexible solution for conducting penetration tests. — Read More

#cyber

Patterns for Reducing Friction in AI-Assisted Development

The practices that make human pair programming effective—onboarding, structured design discussion, shared standards—apply equally to working with AI coding assistants. I propose five patterns that bring this collaborative scaffolding to AI-assisted development, shifting the experience from correcting a tool to collaborating with a capable teammate.

PATTERNS
Knowledge Priming
Design-First Collaboration
Context Anchoring
Encoding Team Standards
Feedback Flywheel
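The excerpt only names the patterns, but the first one maps to the onboarding analogy in the opening paragraph: give the assistant the same material a new teammate would read before it touches a task. Here is a purely hypothetical sketch of Knowledge Priming; the file names and helper functions are illustrative, not from the article.

```python
# Hypothetical illustration of the "Knowledge Priming" pattern (not code from
# the article): load the repo's onboarding docs and anchor them in the prompt
# so the assistant reasons from team conventions instead of generic defaults.
from pathlib import Path

PRIMING_DOCS = ["ARCHITECTURE.md", "CODING_STANDARDS.md", "docs/adr"]  # illustrative paths

def load_priming_context(repo: Path) -> str:
    chunks = []
    for entry in PRIMING_DOCS:
        path = repo / entry
        files = sorted(path.rglob("*.md")) if path.is_dir() else [path]
        for f in files:
            if f.exists():
                chunks.append(f"## {f.relative_to(repo)}\n{f.read_text()}")
    return "\n\n".join(chunks)

def build_prompt(repo: Path, task: str) -> str:
    # Primed context goes first, the concrete task last.
    return f"{load_priming_context(repo)}\n\n# Task\n{task}"
```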

Read More

#devops

Claude Managed Agents: get to production 10x faster

Today, we’re launching Claude Managed Agents, a suite of composable APIs for building and deploying cloud-hosted agents at scale.

Until now, building agents meant spending development cycles on secure infrastructure, state management, permissioning, and reworking your agent loops for every model upgrade. Managed Agents pairs an agent harness tuned for performance with production infrastructure to go from prototype to launch in days rather than months.

Whether you’re building single-task runners or complex multi-agent pipelines, you can focus on the user experience, not the operational overhead. — Read More

#devops

Meta debuts the Muse Spark model in a ‘ground-up overhaul’ of its AI

Meta released an AI model on Wednesday called Muse Spark, which marks its “first step” toward an “overhaul of [its] AI efforts.”

Muse Spark is the inaugural model to come out of Meta Superintelligence Labs, which was created last year, reportedly because CEO Mark Zuckerberg was unhappy with how Meta's Llama models lagged behind OpenAI's ChatGPT and Anthropic's Claude. Meta recruited former Scale AI co-founder and CEO Alexandr Wang to lead Meta Superintelligence Labs and invested $14.3 billion in the data labeling company for a 49% stake.

Now, it’s time for Zuckerberg to see if his reconfigured AI team can woo users. — Read More

#big7

Cybersecurity in the Age of Instant Software

AI is rapidly changing how software is written, deployed, and used. Trends point to a future where AIs can write custom software quickly and easily: "instant software." Taken to an extreme, it might become easier for a user to have an AI write an application on demand—a spreadsheet, for example—and delete it when they're done using it than to buy one commercially. Future systems could include a mix: both traditional long-term software and ephemeral instant software that is constantly being written, deployed, modified, and deleted.

AI is changing cybersecurity as well. In particular, AI systems are getting better at finding and patching vulnerabilities in code. This has implications for both attackers and defenders, depending on the ways this and related technologies improve.

In this essay, I want to take an optimistic view of AI’s progress, and to speculate what AI-dominated cybersecurity in an age of instant software might look like. There are a number of unknowns that will factor into how the arms race between attacker and defender might play out. — Read More

#cyber

Spec-Driven Development Is Waterfall in Markdown

SpecKit has 77,000 GitHub stars. AWS built an entire IDE around spec-driven development. Tessl raised $125 million on the promise that specs, not code, should be the source of truth.

The pitch was clean: stop vibe coding, write a proper specification, let the agent execute against it. Engineers loved it. It felt like rigor. It felt like the adults had finally entered the room.

Then someone actually tested it on a real project. Ten times slower. More ceremony. Same bugs.

The industry built an entire ecosystem around one idea: if we give AI agents a detailed enough spec, they’ll produce working software. It’s the same bet the industry made with outsourcing, with offshoring, with every model that tries to replace understanding with documentation. — Read More

#devops

Project Glasswing: Securing critical software for the AI era

Today we’re announcing Project Glasswing, a new initiative that brings together Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks in an effort to secure the world’s most critical software.

We formed Project Glasswing because of capabilities we’ve observed in a new frontier model trained by Anthropic that we believe could reshape cybersecurity. Claude Mythos Preview is a general-purpose, unreleased frontier model that reveals a stark fact: AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities. — Read More

#cyber