I thought becoming a data engineer meant mastering tools. Instead, it meant learning how to see. I thought the hardest part would be learning the tools — Hadoop, Spark, SQL optimization, and distributed processing. Over time, I realized the real challenge wasn’t technical. It was learning how to think.
Learning to think like a data engineer — to see patterns in chaos, to connect systems to human behavior, to balance simplicity and scale — is a slow process of unlearning, observing, and reimagining. I didn’t get there through courses or certifications. I got there through people.
Four mentors, in four different moments of my life, unknowingly gave me lessons that shaped how I approach engineering, leadership, and even life. Each taught me something not about data, but about thinking systems.
What follows isn’t a tutorial. It’s a map of how four people — and their lessons — rewired how I think. — Read More
Tag Archives: Data Science
Data Modeling for the Agentic Era: Semantics, Speed, and Stewardship
In data analytics, we’re facing a paradox. AI agents can theoretically analyze anything, but without the right foundations, they’re as likely to hallucinate a metric as to calculate it correctly. They can write SQL in seconds, but will it answer the right business question? They promise autonomous insights, but at what cost to trust and accuracy?
These days, everyone is embedding AI chat in their product. But to what end? Does it actually help, or would users rather turn to tools like Claude Code when they need real work done? The real questions are: how can we model our data for agents to reliably consume, and how can we use agents to develop better data models?
After spending the last year exploring where LLMs have genuine leverage in analytics (see my writing on GenBI and Self-Serve BI), I’ve identified three essential pillars that make agentic data modeling actually work: semantics as the shared language both humans and AI need to understand metrics, speed through sub-second analytics that lets you verify numbers before they become decisions, and stewardship with guardrails that guide without constraining. The TL;DR? AI needs structure to understand, humans need speed to verify, and both need boundaries to stay productive. — Read More
The Complete AI Engineering Roadmap for Beginners
Hey there, future AI engineer!
Feeling overwhelmed by all the AI buzz and wondering where to start? Don’t worry. This roadmap will take you from “What’s AI?” to building real AI systems, one step at a time. Think of this as your GPS for the AI journey ahead!
Here’s your friendly guide to breaking into the world of AI Engineering. — Read More
AI-Ready Data: A Technical Assessment. The Fuel and the Friction.
Most organizations operate data ecosystems built over decades of system acquisitions, custom development, and integration projects. These systems were designed for transactional processing and business reporting, not for the real-time, high-quality, semantically rich data requirements of modern AI applications.
Research shows that 50% of organizations are classified as “Beginners” in data maturity, 18% are “Dauntless” with high AI aspirations but poor data foundations, 18% are “Conservatives” with strong foundations but limited AI adoption, and only 14% are “Front Runners” achieving both data maturity and AI scale. — Read More
Meta’s Data Scientist’s Framework for Navigating Product Strategy as Data Leaders
One question I often get is what makes Product Data Scientists special at Meta. My answer has always been: “You are by default a product leader, navigating product directions with data.” This is true across all levels, from new grads to directors. Data scientists at Meta don’t just analyze data; they transform business questions into data-driven product visions that help build better human connections.
The challenge? Product strategy development exists across a spectrum of conditions. Here I’ll explore how data scientists at Meta can drive product strategies across four distinct scenarios defined by data availability (low to high) and problem clarity (broad to concrete). — Read More
10 Years of Experience in 10 Minutes — A Data Analyst’s Problem-Solving Guide
Data analytics isn’t just about crunching numbers — it’s about solving real business problems with clarity and efficiency. Over the past decade, I’ve faced countless challenges, from messy datasets to indecisive stakeholders. This guide is my way of condensing 10 years of hard-earned experience into 10 minutes of actionable insights. Whether you’re just starting or refining your approach, these lessons will help you think and work like an experienced data analyst. — Read More
Work smarter, not harder: Using the 80/20 principle in data analysis.
Have you heard of the 80/20 rule, or the Pareto Principle? It says that roughly 80% of the effects come from 20% of the causes.
In most cases, a small percentage of efforts drive most of the results. Let’s apply this rule to data analysis, and work smarter, not harder!
Why is the 80/20 rule useful? It lets you focus on the few tasks that generate the most value for you and your organization. This saves time, increases efficiency, and makes you more useful at work. — Read More
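To make the rule concrete, here is a minimal Python sketch of a Pareto check: rank the causes by their effect and find the smallest group that accounts for 80% of the total. The function name `pareto_cutoff` and the ticket counts are hypothetical, made up purely for illustration, and real data will rarely split this cleanly.

```python
from typing import Dict, List, Tuple


def pareto_cutoff(effects: Dict[str, float], threshold: float = 0.8) -> Tuple[List[str], float]:
    """Return the top causes that together reach `threshold` of the total
    effect, plus the fraction of all causes that this group represents."""
    total = sum(effects.values())
    # Rank causes from largest to smallest effect.
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)

    top: List[str] = []
    running = 0.0
    for cause, value in ranked:
        top.append(cause)
        running += value
        # Stop as soon as the cumulative share reaches the threshold.
        if running / total >= threshold:
            break
    return top, len(top) / len(effects)


# Hypothetical support-ticket counts by root cause (illustrative numbers only).
tickets = {
    "login": 420, "billing": 380, "export": 60, "search": 50, "themes": 30,
    "api": 25, "email": 15, "print": 10, "fonts": 6, "misc": 4,
}

top_causes, share_of_causes = pareto_cutoff(tickets)
print(top_causes, share_of_causes)
```

In this made-up dataset, two of the ten causes cover 80% of the tickets, so those two are where the analysis time should go first.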
Why Do Companies Focus on Data Structures and Algorithms in Tech Interviews?
Data Structures and Algorithms (DSA) is a skill you must learn if you want to work as a programmer, developer, or data scientist, particularly at large tech companies. Although it may not relate directly to everyday coding, a solid understanding of DSA helps the software development process run smoothly. It helps a programmer adopt a reasoned strategy for understanding and solving a problem.
Most businesses use DSA to evaluate a candidate’s skills. The importance of DSA for your coding career is discussed in this blog, along with tips on how to get ready for interviews. Read More
An Interview With the Guy Who Has All Your Data
It’s 10 pm. Do you know where your data is? Chad Engelgau does. He’s the CEO of Acxiom, a data broker. Your info is probably on one of his servers.
Chad Engelgau is the CEO of Acxiom, a data broker that operates one of the world’s biggest repositories of consumer information. The company claims to have granular details on more than 2.5 billion people across 62 different countries. The chances that Acxiom knows a whole lot about you, reader, are good.
In many respects, data brokering is a shadowy enterprise. The industry mostly operates in quiet business deals the public never hears about, especially smaller firms that engage with data on particularly sensitive subjects. Compared to other parts of the tech industry, data brokers face little scrutiny from regulators, and in large part they evade attention from the media. Read More
Unstructured Data Challenges for 2023 and their Solutions
Unstructured data is information that does not have a pre-defined structure. It’s one of the three core data types, along with structured and semi-structured formats.
Examples of unstructured data include call logs, chat transcripts, contracts, and sensor data, as these datasets are not arranged according to a preset data model. Unstructured data must be standardized and structured into columns and rows to make it machine-readable, i.e., ready for analysis and interpretation. This makes managing unstructured data difficult. Read More
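As a rough illustration of what "structuring into columns and rows" can look like, the sketch below turns a chat transcript into rows with `time`, `speaker`, and `message` columns. The transcript format, the `LINE_PATTERN` regex, and the `transcript_to_rows` helper are all assumptions made up for this example; real transcripts are usually messier and need more robust parsing.

```python
import re
from typing import Dict, List

# Assumed line format: "[HH:MM] speaker: message" (hypothetical, for illustration).
LINE_PATTERN = re.compile(
    r"^\[(?P<time>\d{2}:\d{2})\]\s+(?P<speaker>\w+):\s+(?P<message>.+)$"
)


def transcript_to_rows(transcript: str) -> List[Dict[str, str]]:
    """Convert free-text transcript lines into row dictionaries.

    Lines that do not match the assumed format are skipped.
    """
    rows: List[Dict[str, str]] = []
    for line in transcript.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


sample = """[09:01] agent: Hello, how can I help?
[09:02] customer: My invoice is wrong.
(line that does not match the assumed format)
[09:03] agent: Let me check that for you."""

rows = transcript_to_rows(sample)
for row in rows:
    print(row["time"], row["speaker"], row["message"])
```

Silently skipping non-matching lines keeps the sketch short; in practice you would likely log or quarantine them so that nothing is lost unnoticed.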