As AI rapidly evolves, organizations are racing to build and deploy high-performance gen AI apps that deliver real-time insights and seamless user experiences. Central to this transformation is the emergence of the generative AI database, a new category of data platform optimized for vector search, semantic indexing and full-text retrieval. These systems are designed to address challenges like data silos, data quality and integration for AI and analytics. As the name suggests, a gen AI database is purpose-built to power generative AI models and applications, enabling developers to store, query and analyze both structured and unstructured data at scale, with the data stored in these platforms playing a crucial role in supporting advanced analytics and machine learning. — Read More
Tag Archives: Data Lake
Langfuse and ClickHouse: A new data stack for modern LLM applications
Building an AI demo application is easy, but making it work reliably is hard. Open-ended user inputs, model reasoning, and agentic tool use require a new workflow to iteratively measure, evaluate, and improve these systems as a team.
Langfuse helps developers solve that problem. Its open-source LLM engineering platform gives teams the tools to trace, evaluate, and improve performance, whether they’re debugging prompts, testing model responses, or analyzing billions of interactions.
For companies working with sensitive or large-scale data, part of Langfuse’s appeal lies in its flexibility: it can be self-hosted or used as a managed cloud service. This flexibility helped Langfuse gain early traction with large enterprises, but it also created a scaling challenge. By mid-2024, the simple Postgres-based architecture that powered both their cloud and self-hosted offerings was under pressure. The platform was handling billions of rows, fielding complex queries across multiple UIs, and struggling to keep up with rapidly scaling customers generating massive amounts of data. Something had to change.
At a March 2025 ClickHouse meetup in San Francisco, Langfuse co-founder Clemens Rawert shared how the team re-architected their platform with ClickHouse as the “centerpiece” of their data operations. He also explained how they rolled out that change to thousands of self-hosted users, turning a major infrastructure change into a win for the entire community. — Read More
Modern Enterprise Data Architecture
Learn modern enterprise data architecture perspectives, including solution approaches and architectural models to develop new-age solutions.
Data plays a vital role in conceptualizing the preliminary design for an architecture. You may want to decide the requirements for security, performance, and infrastructure to handle workload, scalability, and agility in design. In this case, you need to understand data models and how to handle architectural decisions, including data privacy and security, compliance requirements, data size to handle, and user handling requirements.
This is why data-driven architecture is the driving factor in enterprise design and development. The modern enterprise architectures referred to in this article include microservices, cloud-native applications, event-driven solutions, and data-intensive solutions. The article shares modern enterprise data architecture perspectives, including solution approaches and architectural models for developing new-age solutions that address the velocity, veracity, volume, and variety of data handling services. Read More
Data Warehouse vs. Data Lake vs. Data Streaming: Friends, Enemies, Frenemies?
The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary to solving business problems. Storing data at rest for reporting and analytics requires different capabilities and SLAs than continuously processing data in motion for real-time workloads. Many open-source frameworks, commercial products, and SaaS cloud services exist. Unfortunately, the underlying technologies are often misunderstood, overused for monolithic and inflexible architectures, and pitched for wrong use cases by vendors. Read More
Who’s Who in the Modern Data Stack Ecosystem (Spring 2022)
Welcome to the Spring 2022 Edition of the Modern Data Stack Ecosystem. In this article, we’ll provide an in-depth look at the Modern Data Stack (MDS) ecosystem, updated from our Fall 2021 edition. We also highly recommend our article, The Future of the Modern Data Stack, to anyone who is new to the MDS and wants to learn about its history. Read More
There’s no such thing as data
Data is the new oil, we are told. Every country needs a data strategy, and all of us should own our data, and be paid for it. But really, there is no such thing as data, it’s not yours, and it’s not worth anything.
Technology is full of narratives, but one of the loudest is around something called ‘data’. AI is the future, and it’s all about data, and data is the future, and we should own it and maybe be paid for it, and countries need data strategies and data sovereignty. Data is the new oil!
This is mostly nonsense. There is no such thing as ‘data’, it isn’t worth anything, and it doesn’t really belong to you anyway.
Most obviously, ‘data’ is not one thing, but innumerable different collections of information, each of them specific to a particular application, that aren’t interchangeable. Siemens has wind turbine telemetry and Transport for London has ticket swipes, and you can’t use the turbine telemetry to plan a new bus route. If you gave both sets of data to Google or Tencent, that wouldn’t help them build a better image recognition system. Read More
Defining enterprise AI: From ETL to modern AI infrastructure
The promise of enterprise AI is built on old ETL technologies, and it relies on an AI infrastructure effectively integrating and processing loads of data. … Effective data integration is critical for enterprise AI. Data is the lifeblood of enterprise AI applications and its extraction and storage must be optimized. Read More
Key Trends in Data Lakes
Data lakes have become a key tool for mining competitive insight from large repositories of data.
The term data lake has been with us for many years. Its origin is attributed to James Dixon, who coined the term when he wrote, “If you think of a data mart as a store of bottled water – cleansed, packaged, and structured for easy consumption – the data lake is a large body of water in a more natural state.”
Many subsequent writers have questioned whether organizations were creating data lakes with business value or data swamps with little or no value. Given this, Marco Iansiti and Karim Lakhani have suggested that the data lake, with data in its original source form, is part of a data platform with “data flowing from bottom to top…And the data platform aggregates, cleans, refines, and processes data” captured in the data lake.
Given this more refined view, the question is: where is the data lake within its hype cycle? To answer this question, I asked CIOs and industry experts for their opinions. Read More
Digital Strategy Series Part I: Creating a Data Strategy that Delivers Value
Oh, the strategy pundits hate me! It’s not because I’m tall, good looking and from Iowa (well, 2 out of 3 ain’t bad), it’s because I think Strategy as a “Discipline” is way overblown. I won’t go as far as the Harvard Business Review in stating that “Strategy is Dead”, but the days of carefully defining a strategy (typically done in the ivory towers of the puzzle palace) and then commanding all the little soldiers to follow the strategy script are over!
… The Internet and Globalization have mitigated the economic, operational and cultural impediments traditionally associated with time and distance. We are an intertwined global economy. … So, my next two blogs are going to discuss: How does one develop and adapt data and AI strategies in a world of continuous change and transformation? Read More
Why Data Is Not The Next Oil
Marketing, at least in the IT sector, has been replaced by memes. Every so often a Gartner slide deck goes viral, and the next thing anyone knows, pithy and mostly meaningless phrases and sayings are driving Fortune 500 strategies. Execs commit to multi-billion-dollar initiatives to make sure that their companies are perceived as being hip or cool (or, to use the more typical phrases, competitive and lean), big projects get greenlit, and at the end of the day, after a forced death march, the system goes live with a great big “meh”. Those same execs may see a boost of a couple of percentage points from the system for a quarter or two, but then the hemorrhaging begins anew.
The meme-factories recently spit out the meme “Data is the next Oil”. Translating from the Memespeak, what I believe this expression is intended to imply is that the data within your organization is valuable and that if you do not transform your organization to more effectively utilize that data, you will get left behind.
The problem with this is that it is true only for a small percentage of companies or for a limited period of time. Read More