As Deep as the Grave | Official Trailer (Val Kilmer AI Performance, 2026)

Read More
#videos

What If We Prompted AI for Outcomes Instead of Outputs?

I’ve been to a lot of meetups about AI in the last year. Across all of them there’s been a common refrain, repeated by experts and newly empowered noobs alike: “If you don’t know how to get what you need out of your AI tool, just ask it.” It’s one of the most powerful aspects of the AI revolution. You can’t ask a hammer how to build a cabinet. You can ask Claude how to build the web app you’ve imagined for the last 20 years.

In all of these cases, though, the prompt is always focused on creating a specific thing: an output. However, there’s a question worth sitting with, one we’ve started discussing internally lately. What would it look like to prompt AI for an outcome instead of an output? — Read More

#strategy

Apple UX Principle: How Simplicity Drives Apple’s 5–10% Conversion Rates

The Apple UX Principle is often misunderstood as a design style defined by minimalism and clean interfaces. In reality, what Apple Inc. has built is far more strategic. It is a system designed to influence how people think, feel, and ultimately decide.

This case study explores how Apple applies five core UX principles (usability, communication, functionality, aesthetics, and emotional connection) to create product experiences that consistently outperform industry benchmarks. More specifically, it examines how these principles contribute to Apple’s estimated 5–10% conversion rates, significantly higher than the typical ecommerce average of 2–3%.

The goal is not to replicate Apple’s design, but to understand the mechanisms behind its performance. — Read More

#strategy

Structured-Prompt-Driven Development (SPDD)

LLM programming assistants have demonstrated considerable value, but mostly with individual developers. The internal IT organization at Thoughtworks has been using them across its teams and has developed a method and workflow called Structured Prompt-Driven Development (SPDD). The article describes a simple example of this workflow, with details on GitHub. This workflow treats the prompts as a first-class artifact, kept with the code in version control and used to align development with business needs. We have found that developers need three key skills to be effective: alignment, abstraction-first, and iterative review. — Read More
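
The core idea, prompts living in version control next to the code they shape, can be sketched in a few lines. This is an illustrative assumption, not Thoughtworks' actual SPDD format: the `prompts/` directory, file name, and template fields are all hypothetical.

```python
from pathlib import Path
from string import Template

# Hypothetical layout: structured prompts stored beside the code and
# committed like any other artifact. The file name and template fields
# below are illustrative, not the SPDD article's real format.
PROMPT_DIR = Path("prompts")
PROMPT_DIR.mkdir(exist_ok=True)

(PROMPT_DIR / "add_invoice_endpoint.md").write_text(
    "## Goal\n$goal\n\n"
    "## Constraints\n$constraints\n\n"
    "## Review checklist\n$checklist\n"
)

def render_prompt(name: str, **fields: str) -> str:
    """Load a version-controlled prompt template and fill in task context."""
    template = Template((PROMPT_DIR / f"{name}.md").read_text())
    return template.substitute(fields)

prompt = render_prompt(
    "add_invoice_endpoint",
    goal="Expose an invoice lookup endpoint following existing controller patterns.",
    constraints="Reuse InvoiceRepository; no new dependencies.",
    checklist="Types align with the API spec; tests cover the not-found path.",
)
print(prompt.splitlines()[0])  # -> "## Goal"
```

Because the template is a tracked file, changes to the prompt are reviewed in the same pull request as the code it produced, which is what "first-class artifact" buys you.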

#devops

The zero-days are numbered

Since February, the Firefox team has been working around the clock using frontier AI models to find and fix latent security vulnerabilities in the browser. We wrote previously about our collaboration with Anthropic to scan Firefox with Opus 4.6, which led to fixes for 22 security-sensitive bugs in Firefox 148.

As part of our continued collaboration with Anthropic, we had the opportunity to apply an early version of Claude Mythos Preview to Firefox. This week’s release of Firefox 150 includes fixes for 271 vulnerabilities identified during this initial evaluation. — Read More

#cyber

AI evals are becoming the new compute bottleneck

AI evaluation has crossed a cost threshold that changes who can do it. The Holistic Agent Leaderboard (HAL) recently spent about $40,000 to run 21,730 agent rollouts across 9 models and 9 benchmarks. A single GAIA run on a frontier model can cost $2,829 before caching. Exgentic’s $22,000 sweep across agent configurations found a 33× cost spread on identical tasks, isolating scaffold choice as a first-order cost driver, and UK-AISI recently scaled agentic steps into the millions to study inference-time compute. In scientific ML, The Well costs about 960 H100-hours to evaluate one new architecture and 3,840 H100-hours for a full four-baseline sweep. While compression techniques have been proposed for static benchmarks, new agent benchmarks are noisy, scaffold-sensitive, and only partly compressible. Training-in-the-loop benchmarks are expensive by construction, and when you try to add reliability to these evals, repeated runs further multiply the cost. — Read More
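
The figures above lend themselves to back-of-envelope arithmetic. The numbers come from the article; the repetition count used to illustrate the reliability multiplier is an assumption.

```python
# Back-of-envelope math on the figures quoted above. Dollar amounts and
# H100-hours are from the article; the 5x repetition count is a
# hypothetical reliability budget, not a figure from any of the sources.
hal_total_usd = 40_000
hal_rollouts = 21_730
per_rollout = hal_total_usd / hal_rollouts      # roughly $1.84 per agent rollout

gaia_single_run = 2_829                         # one frontier-model GAIA run
repeats_for_reliability = 5                     # assumed repetition count
gaia_with_repeats = gaia_single_run * repeats_for_reliability

well_per_arch_h100_hours = 960
well_full_sweep = well_per_arch_h100_hours * 4  # matches the 3,840 quoted

print(f"${per_rollout:.2f}/rollout; "
      f"${gaia_with_repeats:,} for {repeats_for_reliability} GAIA repeats; "
      f"{well_full_sweep} H100-hours for a four-baseline sweep")
```

Even a modest reliability budget turns a single $2,829 benchmark run into a five-figure line item, which is the multiplication effect the article is pointing at.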

#strategy

Anthropic’s Shared Responsibility Security Model for AI Agents, Explained

Earlier this month Anthropic, the company behind Claude, published a proposal to NIST (the U.S. federal agency that governs technology standards) which, for the first time, outlines the key areas of agentic AI security and how they should be addressed and governed. Anthropic should be applauded for taking this initiative, since existing standards and frameworks are lacking, creating confusion among end-user organizations. Security practitioners should take heed, since NIST standards can later translate into federal regulations and even legislation.

…Anthropic’s framework divides AI agent security into four layers – Model, Harness, Tools, and Environment – with the model provider owning only the first. Anthropic’s own data shows human-in-the-loop oversight has already failed at production scale (93% of permission prompts approved without reading, a clarification rate of just 16.4% on complex tasks). And six NIST standards and federal frameworks structurally exclude the most likely agent failure mode: agents causing harm within their authorized permissions. — Read More

#cyber

Flow generation through natural language: An agentic modeling approach

If you’re building AI products on top of closed models, anyone with an API key can get similar capabilities. Lasting differentiation comes from proprietary data, the training recipe, the infrastructure, and the speed of iteration.

Shopify has something most companies don’t: a product surface where millions of merchant interactions directly signal whether the model’s output is any good. That feedback loop is the foundation, but only if you keep learning from it.

We fine-tuned a tool-calling agent to turn natural language into a Shopify Flow for Sidekick, our AI commerce assistant. It’s 2.2x faster, 68% cheaper, and outperforms closed models. — Read More

#devops

Google is testing AI chatbot search for YouTube

Google is bringing conversational AI search to YouTube, marking the company’s latest push to infuse its products with AI-powered discovery tools. The feature, dubbed “Ask YouTube,” started rolling out to YouTube Premium subscribers in the US today as an experimental test. It transforms the platform’s search bar into a chatbot-style interface that pulls results from longform videos, Shorts, and text summaries – essentially giving YouTube its own version of Google’s AI Mode for search. — Read More

#big7

Can agents replace the search stack?

How is search implemented where you work? Probably as a complex set of capabilities on top of retrieval. Our search APIs understand queries, call backend search systems, and finally rerank results.

But if we had an agent in the loop, would we need all that? Could we replace search backends with an agent? After all, an agent understands user requests, calls retrieval tools, and evaluates relevance on its own. We see ChatGPT do this all the time; why can’t our search bar?

In other words, if you give a basic BM25 backend to an agent, could it take the Search API’s job? — Read More

#devops