How I Structure My Data Pipelines: The Silver Layer

… Dimensional modeling is more important than ever.

The methodology has decades of literature behind it. The patterns are documented, the edge cases are known, and there’s no need to invent solutions from scratch. Facts and dimensions are composable primitives that mix and match to answer questions nobody has thought of yet. Paired with an ERD, tests, and naming conventions, Silver becomes something people can navigate without asking questions.

Gold models are the primary consumers of Silver. Every metric view, every wide table, every consumption artifact in Gold starts by referencing Silver facts and dimensions. 
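The idea can be made concrete with a toy star schema. This is a minimal sketch, assuming hypothetical table names (`dim_customer`, `fct_order`) rather than the author's actual models: one dimension and one fact live in Silver, and a Gold metric view is just a join and an aggregate over them.

```python
# Toy star schema in sqlite3 (stdlib). Table and column names are illustrative
# assumptions, not taken from the article.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,   -- surrogate key
    customer_name TEXT,
    region TEXT
);
CREATE TABLE fct_order (
    order_key INTEGER PRIMARY KEY,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    order_date TEXT,
    amount REAL
);
INSERT INTO dim_customer VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'APAC');
INSERT INTO fct_order VALUES
    (10, 1, '2024-01-05', 120.0),
    (11, 1, '2024-01-09',  80.0),
    (12, 2, '2024-01-07', 200.0);
""")

# A "Gold" metric view: revenue by region, built purely by referencing
# the Silver fact and dimension -- no new source data is touched.
rows = con.execute("""
    SELECT d.region, SUM(f.amount) AS revenue
    FROM fct_order f
    JOIN dim_customer d USING (customer_key)
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()
print(rows)  # [('APAC', 200.0), ('EMEA', 200.0)]
```

The composability claim is visible here: the same two tables answer "revenue by region" today and, unchanged, could answer "orders per customer per month" tomorrow.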

Overview
The Bronze Layer
The Silver Layer

#data-science

China’s Military Uses Hawk and Wolf Behavior to Train AI Weapon Swarms

On January 23, China’s National University of Defense Technology demonstrated something that’s reshaping how autonomous weapons work: a single operator supervising over 200 drones simultaneously during urban combat exercises. The swarm operated with minimal human input, relying on what the People’s Liberation Army calls “effect-based control,” designed to function even when communication signals are jammed.

The technology didn’t emerge from traditional programming. It came from watching hawks hunt.

Engineers at Beihang University, a military-linked institution, observed how hawks select vulnerable prey and trained defensive drones to replicate that behaviour, according to The Wall Street Journal. In parallel tests, attack drones mimicked pigeons to evade threats. The result: in a five-versus-five combat simulation, the hawk-trained drones eliminated all opponents in 5.3 seconds, according to a patent filed in April 2024. — Read More

#china-ai

Apple’s AI Game is Misunderstood

Apple’s AI strategy has become a Rorschach test for the technology industry. Critics see a company falling dangerously behind. Needham analyst Laura Martin claims it is one to two years behind its competitors. But almost all of this commentary, whether bullish or bearish, focuses on the wrong question.

The standard narrative compares Apple’s AI capex to Microsoft’s, Apple’s Siri to Google’s Gemini, Apple’s foundation models to OpenAI’s GPT-4. By these metrics, Apple looks behind. But these comparisons assume Apple is trying to win the same race. The evidence suggests it isn’t. — Read More

#big7

Corollary Discharge Dysfunction to Inner Speech and its Relationship to Auditory Verbal Hallucinations in Patients with Schizophrenia Spectrum Disorders 

Auditory-verbal hallucinations (AVH)—the experience of hearing voices in the absence of auditory stimulation—are a cardinal psychotic feature of schizophrenia-spectrum disorders. It has long been suggested that some AVH may reflect the misperception of inner speech as external voices due to a failure of corollary-discharge-related mechanisms. We aimed to test this hypothesis with an electrophysiological marker of inner speech.

… This study provides empirical support for the theory that AVH are related to abnormalities in the normative suppressive mechanisms associated with inner speech. This phenomenon of “inner speaking-induced suppression” may have utility as a biomarker for schizophrenia-spectrum disorders generally, and may index a tendency for AVH specifically at more extreme levels of abnormality. — Read More

#human

The Personal AI Mentor Setup I Wish I Had at 20

Mentorship prompts for learning, career, money, and creativity.

…[A] simple setup that does one job really well:

It asks better questions than I do.
It turns messy goals into a plan.
It pushes me to take action.

Not a “magic AI.”

personal AI mentor setup — built the right way.

Read More

#chatbots

On This Day… 1776

On This Day… 1776 is Darren Aronofsky’s short-form series focusing on key moments from that revolutionary year. The fact-based short films use a “combination of traditional filmmaking tools and emerging AI capabilities,” SAG voice actors and AI visuals, to dramatize scenes from the year’s most pivotal moments. The series draws on the partnership between Aronofsky’s Primordial Soup and Google DeepMind, with each episode dropping on the 250th anniversary of the event it depicts. — Read More

#vfx

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic N-gram embedding for O(1) lookup. By formulating the Sparsity Allocation problem, we uncover a U-shaped scaling law that optimizes the trade-off between neural computation (MoE) and static memory (Engram). Guided by this law, we scale Engram to 27B parameters, achieving superior performance over a strictly iso-parameter and iso-FLOPs MoE baseline. Most notably, while the memory module is expected to aid knowledge retrieval (e.g., MMLU +3.4; CMMLU +4.0), we observe even larger gains in general reasoning (e.g., BBH +5.0; ARC-Challenge +3.7) and code/math domains (HumanEval +3.0; MATH +2.4). Mechanistic analyses reveal that Engram relieves the backbone’s early layers from static reconstruction, effectively deepening the network for complex reasoning. Furthermore, by delegating local dependencies to lookups, it frees up attention capacity for global context, substantially boosting long-context retrieval (e.g., Multi-Query NIAH: 84.2 to 97.0). Finally, Engram establishes infrastructure-aware efficiency: its deterministic addressing enables runtime prefetching from host memory, incurring negligible overhead. We envision conditional memory as an indispensable modeling primitive for next-generation sparse models. — Read More
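The core mechanism is easy to picture. The following is my own toy sketch of an N-gram-addressed memory table, not the paper's Engram implementation: the trailing n-gram of token ids is hashed to a fixed slot, so retrieval is a single hash plus one index, with cost independent of table size.

```python
# Toy conditional-memory lookup: hash the last N token ids to a slot in a
# static embedding table. Names, sizes, and the hash choice are illustrative
# assumptions, not from the paper.
import hashlib

D = 8                  # embedding width (toy value)
TABLE_SIZE = 1 << 16   # number of memory slots

def ngram_slot(token_ids, n=3, table_size=TABLE_SIZE):
    """Deterministically map the trailing n-gram to a memory slot."""
    key = ",".join(map(str, token_ids[-n:])).encode()
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % table_size

# A static table the model would learn; zeros here as a stand-in.
memory = [[0.0] * D for _ in range(TABLE_SIZE)]

tokens = [101, 2054, 2003]          # some token-id sequence
slot = ngram_slot(tokens)
retrieved = memory[slot]            # O(1): one hash + one index, no scan
print(slot, len(retrieved))
```

Because the address depends only on the token ids, not on any intermediate activation, the slot for the next lookup is known ahead of time — which is the property behind the abstract's claim that deterministic addressing enables prefetching from host memory.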

#performance

The 80% Problem in Agentic Coding

… Some time ago I wrote about “the 70% problem” – where AI coding took you to 70% completion, then left the final 30%, the last mile, to humans. That framing may now be evolving. The percentage may shift to 80% or higher for certain kinds of projects, but the nature of the problem has changed more dramatically than the numbers suggest.

Armin Ronacher’s poll of 5,000 developers complements this story: 44% now write less than 10% of their code manually. Another 26% are in the 10-50% range. We’ve crossed a threshold. But here’s what the triumphalist narrative misses: the problems didn’t disappear, they shifted. And some got worse. — Read More

#devops

Why We’ve Tried to Replace Data Analytics Developers Every Decade Since 1974

This article was inspired by Stephan Schwab’s excellent piece “Why We’ve Tried to Replace Developers Every Decade Since 1969,” which traces the recurring dream of eliminating software developers from COBOL through to AI. Reading it, I recognised the same pattern playing out in my own field of data warehousing, data analytics and business intelligence: a fifty-year cycle of tools promising to democratise data work, each delivering genuine value while leaving the fundamental need for specialists stubbornly intact.

Every decade brings new promises: this time, we’ll finally make building analytics platforms simple enough that we won’t need so many specialists. From SQL to OLAP to AI, the pattern repeats. Business leaders grow frustrated waiting months for a data warehouse that should take weeks, or weeks for a dashboard that should take days. Data teams feel overwhelmed by request backlogs they can never clear. Understanding why this cycle has persisted for fifty years reveals what both sides need to know about the nature of data analytics work. — Read More

#data-science

The private cloud returns for AI workloads

A North American manufacturer spent most of 2024 and early 2025 doing what many innovative enterprises did: aggressively standardizing on the public cloud by using data lakes, analytics, CI/CD, and even a good chunk of ERP integration. The board liked the narrative because it sounded like simplification, and simplification sounded like savings. Then generative AI arrived, not as a lab toy but as a mandate. “Put copilots everywhere,” leadership said. “Start with maintenance, then procurement, then the call center, then engineering change orders.”

… The most valuable AI use cases were those closest to people who build and fix things. Those people lived near manufacturing plants with strict network boundaries, latency constraints, and operational rhythms that don’t tolerate “the provider is investigating.” Within six months, the company began shifting its AI inference and retrieval workloads to a private cloud located near its factories, while keeping model training bursts in the public cloud when it made sense. It wasn’t a retreat. It was a rebalancing. — Read More

#training