Architecture and data models are deeply intertwined, especially for analytics. This is a two-part series looking at the various mainstream data architectures and components that will influence your data model (and vice versa).
Here, we’ll look at two classical approaches: the data warehouse and the data mart.
The next part of this series looks at contemporary architectures and components, including data lakehouses, open table formats, streaming, semantic layers, and much more.
Much more coming soon, including part 2 of this series (duh), dimensional modeling, contemporary analytical modeling (OBT, JBOT, etc.), data vault, and more.
Thanks,
Joe
Upstream data comes in various shapes, forms, and velocities, and it all comes together in analytics. Because there’s no single architecture for source data and no one-size-fits-all data model upstream, analytical data modeling needs to account for this variety.
As we explored in the last part on application modeling, it’s hard to understand data modeling without considering the architectures and components involved. I often find that people confuse modeling with architecture, and vice versa. A common thing I hear is “a data warehouse is a Kimball data model.” This statement shows a gross misunderstanding of both data warehousing and Kimball’s modeling approach, which you’ll learn about in the next chapter.
We’ll consider architecture from the perspective of its impact on data modeling. A goal of this section is to clarify some common misconceptions about data models and architecture.
In analytics, there are a few mainstream architectural flavors. We’ll look at the data warehouse, data marts, the data lakehouse, and streaming. We’ll also examine some key components of these architectures. After reading this, you’ll have a working knowledge of the various systems you’ll encounter and how to choose an appropriate data model for each.
Let’s start by looking at the origins of analytical architectures.
Origins of Analytical Architectures
Before we dive into the various architectures, it’s helpful to understand why these architectures and components exist in the first place. As you’ll see, a general theme is the continued separation between operational and analytical data.
Going back to the 1960s, operational systems processed transactions. Data was stored as files, which were very messy to deal with. Databases were invented to consolidate these data files into a central place, where they could be stored and queried. Applications and databases were created to run various parts of a business, often in isolation. The OLTP systems of the time were primitive, constrained, and disconnected from each other. A business might have separate systems for accounting, customer service, and warehouse operations. If you wanted a comprehensive view of your business, you had to spend an enormous amount of effort and toil pulling data from these various source systems.
Because data was not integrated in a central location, every analytical question was ad hoc and bespoke, requiring the same old painful process to be repeated.
In the 1970s, computers were almost entirely dedicated to transactional processing. At the same time, people began to recognize the power of the data stored in these computers. Early attempts at channeling this data for decision making became popularly known as Decision Support Systems (DSS). One of the field’s pioneers, Michael Scott Morton, described DSS as follows in his 1979 paper, “Decision Support Systems: Emerging Tools for Planning”:
“...there is a class of problem where a Decision Support Systems (D.S.S.) approach can pay off; situations where the concern is with a specific decision or decisions that must be made, not with the overall functioning of a particular part of the organization. A situation where the focus is on supporting the manager, or managers, not replacing them with an automated system; where the focus is on the system, including the organizational context in which decisions must be made, and not just the computer itself. Thus a decision support system is a different kind of a tool than a functionally based clerical-replacing computer system.”
DSS took various forms in the 1980s, culminating in the development of the data warehouse. You’ve probably heard of a data warehouse. But what is it? Let’s find out.
Classical Analytical Architecture Patterns
Two classical analytical architectures created in the 1990s form the backbone of today’s analytical architectures: the data warehouse and the data mart. Let’s look at them.