Why Do Companies Focus on Data Structures and Algorithms in Tech Interviews?

Data Structures and Algorithms (DSA) is a skill you must learn if you want to work as a programmer, developer, or data scientist, particularly at large tech companies. Although it may not come up directly in everyday coding work, a solid understanding of DSA helps the software development process run smoothly. It helps a programmer adopt a reasoned strategy for understanding and solving a problem.

Most tech companies use DSA questions to evaluate a candidate’s skills. This blog discusses why DSA matters for your coding career, along with tips on how to prepare for interviews. Read More

#data-science, #devops

An Interview With the Guy Who Has All Your Data

It’s 10 pm. Do you know where your data is? Chad Engelgau does. He’s the CEO of Acxiom, a data broker. Your info is probably on one of his servers.

Chad Engelgau is the CEO of Acxiom, a data broker that operates one of the world’s biggest repositories of consumer information. The company claims to have granular details on more than 2.5 billion people across 62 countries. The chances that Acxiom knows a whole lot about you, reader, are good.

In many respects, data brokering is a shadowy enterprise. The industry mostly operates in quiet business deals the public never hears about, especially smaller firms that engage with data on particularly sensitive subjects. Compared to other parts of the tech industry, data brokers face little scrutiny from regulators, and in large part they evade attention from the media. Read More

#data-science

Unstructured Data Challenges for 2023 and their Solutions

Unstructured data is information that does not have a pre-defined structure. It’s one of the three core data types, along with structured and semi-structured formats.

Examples of unstructured data include call logs, chat transcripts, contracts, and sensor data; these datasets are not arranged according to a preset data model. To make unstructured data machine-readable, i.e., ready for analysis and interpretation, it must be standardized and structured into rows and columns. That is what makes managing unstructured data difficult. Read More
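
As a minimal sketch of what that standardization step can look like, the snippet below parses a hypothetical chat transcript into a structured table with pandas. The transcript format, column names, and regex are illustrative assumptions, not a prescribed schema.

```python
import re
import pandas as pd

# Hypothetical transcript format: "[2023-01-05 14:32] alice: the shipment is delayed"
LINE_PATTERN = re.compile(r"\[(?P<timestamp>[^\]]+)\]\s+(?P<speaker>\w+):\s+(?P<message>.*)")

raw_transcript = [
    "[2023-01-05 14:32] alice: the shipment is delayed",
    "[2023-01-05 14:33] bob: which order number?",
    "not a well-formed line",          # unstructured noise that is skipped
]

rows = []
for line in raw_transcript:
    match = LINE_PATTERN.match(line)
    if match:                          # keep only lines that fit the schema
        rows.append(match.groupdict())

# Structured, machine-readable table: one row per message, typed columns
df = pd.DataFrame(rows)
df["timestamp"] = pd.to_datetime(df["timestamp"])
print(df)
```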

#data-science

Why big data is not a priority anymore, and other key AI trends to watch

Artificial Intelligence models that generate entirely new content are creating a world of opportunities for entrepreneurs. And engineers are learning to do more with less.

Those were some takeaways from a panel discussion at the Intelligent Applications Summit hosted by Madrona Venture Group in Seattle this week.

“Big data is not a priority anymore, in my opinion,” said Stanford computer science professor Carlos Guestrin. “You can solve complex problems with little data.”

Engineers are more focused on fine-tuning off-the-shelf models, said Guestrin, co-founder of Seattle machine learning startup Turi, which was acquired by Apple in 2016. New “foundation” AI models like DALL-E and GPT-3 can generate images or text from initial prompts. Read More
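
To make the “fine-tune an off-the-shelf model” idea concrete, here is a minimal sketch using a pretrained torchvision ResNet as a stand-in for a foundation model: freeze the backbone, swap in a new head, and train only that head. The class count and dummy batch are placeholder assumptions, not anything from the article.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an off-the-shelf model pretrained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone so only the new head is trained
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for a hypothetical 5-class task
num_classes = 5
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch (replace with a real DataLoader)
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```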

#data-science, #strategy

The Evolution of The Data Engineer: A Look at The Past, Present & Future

There’s a buzz of excitement around data engineering right now, and for good reason. Since its inception, the data engineering field has shown no sign of slowing down, and new technologies and concepts have been appearing particularly fast lately. As we near the end of 2022, it is a good moment to take a step back and evaluate the current state of data engineering.

What might the data engineer role of today look like in the future? Will it even exist? In this blog post, I look at the past and the present of the data engineering role, examining emerging trends to offer some predictions about the future. Read More

#data-science

Meta AI’s shocking insight about Big Data and Deep Learning

Thanks to the amazing success of AI, we’ve seen more and more organizations implement Machine Learning in their pipelines. As access to and collection of data increase, we have seen massive datasets being used to train giant deep learning models that reach superhuman performance. This has led to a lot of hype around domains like Data Science and Big Data, fueled even more by the recent boom in Large Language Models.

Big Tech companies (and Deep Learning experts on Twitter/YouTube) have really fallen in love with the ‘add more data, increase model size, train for months’ approach that has become the status quo in Machine Learning these days. However, heretics from Meta AI published research that was funded by Satan, and it turns out this way of doing things is extremely inefficient. And completely unnecessary. In this post, I will be going over their paper, “Beyond neural scaling laws: beating power law scaling via data pruning,” where they share ‘evidence’ about how selecting samples intelligently can increase your model performance without ballooning your costs out of control. While this paper focuses on Computer Vision, the principles of their research will be interesting to you regardless of your specialization. Read More
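
As a rough illustration of data pruning (a simplified stand-in, not the paper’s actual metric, which is based on self-supervised prototypes), here is a sketch that keeps a fraction of a dataset according to a per-example difficulty score. The scoring function and toy values are assumptions for demonstration.

```python
import numpy as np

def prune_by_difficulty(scores: np.ndarray, keep_fraction: float, keep_hard: bool = True) -> np.ndarray:
    """Return indices of examples to keep, given a per-example difficulty score.

    `scores` is a hypothetical difficulty metric (e.g. per-example loss);
    the paper reports that with abundant data it pays to keep the hard
    examples, while with scarce data it pays to keep the easy ones.
    """
    n_keep = int(len(scores) * keep_fraction)
    order = np.argsort(scores)                 # easy (low score) -> hard (high score)
    kept = order[-n_keep:] if keep_hard else order[:n_keep]
    return np.sort(kept)

# Toy usage: prune a dataset of 10 examples down to 60%, keeping the hardest
difficulty = np.array([0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.5, 0.4, 0.6, 0.05])
keep_idx = prune_by_difficulty(difficulty, keep_fraction=0.6, keep_hard=True)
print(keep_idx)  # indices of the retained training examples
```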

#data-science, #deep-learning, #big7

State of Data Science 2022: Paving the Way for Innovation

Anaconda’s 2022 State of Data Science report is here! As with years prior, we conducted a survey to gather demographic information about our community, ascertain how that community works, and collect insights into big questions and trends that are top of mind within the community. As the impacts of COVID continue to linger and assimilate into our new normal, we decided to move away from covering COVID themes in our report and instead focus on more actionable issues within the data science, machine learning (ML), and artificial intelligence industries, like open-source security, the talent dilemma, ethics and bias, and more. Read More

Read the Report

#data-science

Why it’s time for “data-centric artificial intelligence”

Machine learning pioneer Andrew Ng argues that focusing on the quality of data fueling AI systems will help unlock its full power.

The last 10 years have brought tremendous growth in artificial intelligence. Consumer internet companies have gathered vast amounts of data, which has been used to train powerful machine learning programs. Machine learning algorithms are widely available for many commercial applications, and some are open source.

Now it’s time to focus on the data that fuels these systems, according to AI pioneer Andrew Ng, SM ’98, the founder of the Google Brain research lab, co-founder of Coursera, and former chief scientist at Baidu.

Ng advocates for “data-centric AI,” which he describes as “the discipline of systematically engineering the data needed to build a successful AI system.” Read More
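
One small example of what “systematically engineering the data” can mean in practice is hunting for mislabeled examples. The sketch below is a hypothetical illustration (not Ng’s own tooling): it flags training examples whose given label a cross-validated model finds unlikely, so they can be reviewed or relabeled.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic dataset with a few deliberately flipped labels
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
flipped = rng.choice(len(y), size=15, replace=False)
y_noisy = y.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]

# Out-of-fold predicted probabilities: each example is scored by a model
# that never saw it during training
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y_noisy,
                          cv=5, method="predict_proba")

# Flag examples whose given label the model finds very unlikely;
# these are candidates for relabeling or review
confidence_in_given_label = proba[np.arange(len(y_noisy)), y_noisy]
suspects = np.argsort(confidence_in_given_label)[:20]
print("examples to review:", suspects)
```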

#data-science, #mlops

How to transition into a career in ML/AI

Read More

#data-science, #videos

How to build a Data Analytics Pipeline on Google Cloud?

Read More

#data-science, #videos