Meta-Harness: End-to-End Optimization of Model Harnesses

The performance of large language model (LLM) systems depends not only on model weights, but also on their harness: the code that determines what information to store, retrieve, and present to the model. Yet harnesses are still designed largely by hand, and existing text optimizers are poorly matched to this setting because they compress feedback too aggressively. We introduce Meta-Harness, an outer-loop system that searches over harness code for LLM applications. It uses an agentic proposer that accesses the source code, scores, and execution traces of all prior candidates through a filesystem. On online text classification, Meta-Harness improves over a state-of-the-art context management system by 7.7 points while using 4x fewer context tokens. On retrieval-augmented math reasoning, a single discovered harness improves accuracy on 200 IMO-level problems by 4.7 points on average across five held-out models. On agentic coding, discovered harnesses surpass the best hand-engineered baselines on TerminalBench-2. Together, these results show that richer access to prior experience can enable automated harness engineering. — Read More
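The paper's actual proposer is an LLM agent that browses a filesystem of prior harness code, scores, and traces. As a loose illustration of the outer loop only, here is a toy search where the proposer can inspect every prior candidate and score; the `window` parameter and the score surface are made up for the sketch:

```python
def score(harness):
    # Hypothetical benchmark surface: accuracy peaks at a context
    # window of 8, with a small token-cost penalty per slot.
    return -(harness["window"] - 8) ** 2 - 0.1 * harness["window"]

def propose(history):
    """Proposer with access to *all* prior candidates and scores
    (the paper's filesystem of traces, reduced to a list):
    greedily mutate the best harness seen so far."""
    best, _ = max(history, key=lambda item: item[1])
    candidates = [{"window": best["window"] + d}
                  for d in (-1, 1) if best["window"] + d >= 1]
    return max(candidates, key=score)

# Outer loop: evaluate each proposal and append it to the shared history.
history = [({"window": 1}, score({"window": 1}))]
for _ in range(10):
    cand = propose(history)
    history.append((cand, score(cand)))

best, best_score = max(history, key=lambda item: item[1])
```

The point of the sketch is the history: unlike optimizers that compress feedback into a single summary, the proposer here sees the full record of what was tried and how it scored.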

#devops

Building an AI-Powered Prompt Optimizer Using LLMs


Have you ever asked an AI a question and received a disappointing answer? Often it's not because the AI wasn't smart enough, but because your question wasn't precise. And you're not alone.

The quality of answers we get from Large Language Models (LLMs) depends heavily on how we ask our questions.

Today, we’re going to build something interesting: An AI system that automatically improves your questions before answering them.

Think of it as having a smart assistant who rephrases your questions to help you get better answers. — Read More
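The two-pass idea can be sketched in a few lines. Everything here is hypothetical scaffolding (the rewrite prompt, the `llm` callable, and the stub responses are made up); a real version would call an actual model API in both passes:

```python
REWRITE_PROMPT = (
    "Rewrite the user's question to be specific and unambiguous. "
    "Return only the improved question.\n\nQuestion: {question}"
)

def optimize_then_answer(question, llm):
    """Two-pass pipeline: first ask the model to improve the
    question, then answer the improved version."""
    improved = llm(REWRITE_PROMPT.format(question=question))
    answer = llm(improved)
    return improved, answer

# Stub LLM so the sketch runs without an API key.
def fake_llm(prompt):
    if prompt.startswith("Rewrite"):
        return "What are the main causes of the 2008 financial crisis?"
    return "Subprime lending, excess leverage, and lax regulation."

improved, answer = optimize_then_answer("why 2008 bad economy?", fake_llm)
```

The pipeline never shows the user's raw question to the answering pass, which is the whole trick: the model answers the question it helped write.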

#devops

How Agentic RAG Works

The main problem with standard RAG systems isn’t the retrieval or the generation. It’s that nothing sits in the middle deciding whether the retrieval was actually good enough before the generation happens.

Standard RAG is a pipeline where information flows in one direction, from query to retrieval to response, with no checkpoint and no second chance. This works fine for simple questions with obvious answers.

However, the moment a query gets ambiguous, or the answer is spread across multiple documents, or the first retrieval pulls back something that looks good but isn’t, RAG starts losing value.

Agentic RAG attempts to fix this problem. It is based on a single question: what if the system could pause and think before answering? — Read More
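That "pause and think" step amounts to inserting a grader between retrieval and generation. A minimal sketch, with a toy word-overlap retriever and grader standing in for real embedding search and an LLM judge (the function names and threshold are illustrative, not from the article):

```python
def retrieve(query, corpus):
    """Toy retriever: rank documents by word overlap with the query."""
    words = set(query.lower().split())
    return max(corpus, key=lambda doc: len(words & set(doc.lower().split())))

def grade(query, doc):
    """The checkpoint standard RAG lacks: does the retrieved
    document actually cover the query terms?"""
    words = set(query.lower().split())
    return len(words & set(doc.lower().split())) / len(words)

def agentic_rag(query, corpus, rewrite, threshold=0.5, max_tries=3):
    """Retrieve, judge, and if the result is weak, rewrite the
    query and try again instead of generating from bad context."""
    for _ in range(max_tries):
        doc = retrieve(query, corpus)
        if grade(query, doc) >= threshold:
            return doc            # good enough: generate from this
        query = rewrite(query)    # pause, rethink, retry
    return doc                    # give up after the budget runs out

corpus = ["how to publish a python library", "bird watching basics"]
result = agentic_rag("ship it", corpus, rewrite=lambda q: "publish a library")
```

The first retrieval for "ship it" matches nothing useful, so the loop rewrites the query and retries; a one-way pipeline would have generated from the bad hit.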

#devops

The App Store Won’t Survive the Age of Agents

When Steve Jobs launched the iPhone in 2007, there was no App Store. His plan was for developers to build web apps accessed through Safari. That lasted about a year. Developers demanded native access, and in 2008 Apple launched the App Store — bundling discovery, distribution, trust, and payment into a single controlled layer.

That bundle has generated hundreds of billions of dollars. But it was built for humans who browse, tap, and swipe. AI agents don’t do any of that. And this mismatch is about to reshape the platform economy. — Read More

#devops

Designing Agentic AI Systems

How do you build an agentic system that works? And how do you spot potential problems during development that can snowball into massive headaches for future you when they go into production?

To answer these questions, you need to break agentic systems into three parts: tools, reasoning, and action. Each layer comes with its own challenges. Mistakes in one layer can ripple through the others, causing failures in unexpected ways. Retrieval functions might pull irrelevant data. Poor reasoning can lead to incomplete or circular workflows. Actions might misfire in production.

An agentic system is only as strong as its weakest link, and this guide will show you how to design systems that avoid these pitfalls. The goal: build agents that are reliable, predictable, and resilient when it matters most. Read More:

Part 1 – Architecture
Part 2 – Modularity
Part 3 – Agent 2 Agent Interactions
Part 4 – Data & RAG
Part 5 – Vectorize MCP
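The three layers above (tools, reasoning, action) can be sketched as a loop. This is a generic tool-use pattern, not code from the guide; the hand-written `reason` policy stands in for what would normally be an LLM call, and the step budget is one concrete guard against the circular workflows mentioned above:

```python
# Tool layer: plain functions the agent may call (toy example).
TOOLS = {
    "lookup_price": lambda item: {"apple": 2, "bread": 3}.get(item),
}

def reason(goal, observations):
    """Reasoning layer (normally an LLM): decide the next action
    from the goal and what has been observed so far."""
    if "price" not in observations:
        return ("call", "lookup_price", goal["item"])
    return ("finish", f"{goal['item']} costs {observations['price']}")

def run_agent(goal, max_steps=5):
    """Action layer: execute decisions, feed results back, and cap
    the loop so a circular workflow fails fast instead of spinning."""
    observations = {}
    for _ in range(max_steps):
        decision = reason(goal, observations)
        if decision[0] == "finish":
            return decision[1]
        _, tool, arg = decision
        observations["price"] = TOOLS[tool](arg)
    raise RuntimeError("agent exceeded step budget")
```

A failure in any one layer surfaces in the others: a tool returning `None`, a policy that never emits "finish", or an action applied to stale observations all break the same loop.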

#devops

Agent Memory: Why Your AI Has Amnesia and How to Fix It

Today’s AI agents forget everything between conversations. Every interaction starts from zero, with no recall of who you are or what you’ve discussed before.

Agent memory isn’t about bigger context windows. It’s about a persistent, evolving state that works across sessions.

The field has converged on four memory types (working, procedural, semantic, episodic) that map directly to how human memory works.

Building agent memory at enterprise scale is fundamentally a database problem. You need vectors, graphs, relational data, and ACID transactions working together. — Read More
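The four memory types can be made concrete with a toy container. This is only the shape of the idea, with Python lists and dicts standing in for the vectors, graphs, and transactional stores the article says enterprise systems need:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Toy model of the four converged memory types."""
    working: list = field(default_factory=list)     # current-turn scratchpad
    procedural: dict = field(default_factory=dict)  # learned how-to skills
    semantic: dict = field(default_factory=dict)    # durable facts about the user
    episodic: list = field(default_factory=list)    # past sessions, time-ordered

    def end_session(self):
        """Persist the turn: distill working memory into an episode,
        then clear the scratchpad so the next session starts fresh."""
        if self.working:
            self.episodic.append(list(self.working))
            self.working.clear()

mem = AgentMemory()
mem.semantic["name"] = "Ada"
mem.working.append("user asked about Rust lifetimes")
mem.end_session()
```

The `end_session` step is the part today's agents skip: without it, everything in working memory evaporates and the next conversation starts from zero.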

#devops

Stop Writing Prompts. Start Programming LLMs.

I’ve written more prompts than I care to admit. 🙂

During my PhD at the University of Copenhagen, I spent embarrassing amounts of time tweaking system prompts, adjusting few-shot examples, and praying that my carefully crafted instructions would survive the next model update. Spoiler: they rarely did. Then recently I discovered DSPy, and I realized I’d been doing it all wrong.

… DSPy (Declarative Self-improving Python) from Stanford NLP flips the entire paradigm. Instead of writing brittle prompt strings, you write structured Python code. Instead of manually optimizing prompts, you let the framework compile them for you. — Read More
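To make the paradigm flip concrete without depending on DSPy itself, here is a tiny imitation of its declarative idea (this is not DSPy's API, just the shape of it): you state *what* a step does as a signature like `"question -> answer"`, and the framework, not you, renders the actual prompt string.

```python
def signature(spec):
    """Turn a declarative spec like 'question -> answer' into a
    prompt builder, so no brittle prompt string is hand-written."""
    inputs, output = spec.split("->")
    inputs = [f.strip() for f in inputs.split(",")]
    output = output.strip()

    def compile_prompt(**kwargs):
        lines = [f"{f}: {kwargs[f]}" for f in inputs]
        lines.append(f"{output}:")  # the model completes this field
        return "\n".join(lines)
    return compile_prompt

qa = signature("question -> answer")
prompt = qa(question="What is 2 + 2?")
```

Because the prompt text is generated from the signature, the framework is free to recompile it (add examples, reword instructions) without touching your code, which is the survival-across-model-updates property the hand-tuned strings lacked.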

#devops

OpenClaw is the WordPress moment for agents

“openclaw is the wordpress moment for agents. the shopifys and substacks are coming!”

this might sound like the typical "if you're in crypto, pivot to ai" take, but in reality there's a lot a real software developer will find useful in openclaw.

what really changes? when normal people realise they can use it usefully. — Read More

#devops

I tried Norton’s AI-powered Neo browser and it finally made sense out of my dozens of open tabs

Whether you like it or not, AI is finding its way into all of our devices and the apps we use every day. From chatbots to image generators, you can't blink without seeing AI somewhere now. However, I never expected to try and enjoy using an AI-powered browser as much as I have over the past week while testing Norton Neo.

After going hands-on with OpenAI's ChatGPT Atlas browser when it was first released last year, I have to admit the bar was quite low. Although both it and Neo are Chromium-based browsers, they do things quite differently, especially when compared to my go-to browser, Google Chrome.

While ChatGPT Atlas tries to turn the traditional web browser on its head, Neo follows in the footsteps of Opera Air and its more mindful approach to how you use the web. Instead of taking the actual browsing out of your hands like ChatGPT Atlas does with its agents, Neo focuses more on refining the browsing experience by making it calmer and smarter at the same time. — Read More

#devops

Meet the new Stitch, your vibe design partner

Here are 5 major upgrades to help you create, iterate and collaborate:

AI-Native Canvas
Smarter Design Agent
Voice
Instant Prototypes
Design Systems and DESIGN.md

Read More

#devops