Levels of Data Modeling
An Introduction to Conceptual, Logical, and Physical Modeling, and much more
Ok, here’s where the fun part of data modeling starts - the levels of data modeling. In this chapter series, we’ll look at conceptual, logical, and physical data modeling. We’ll also examine the age-old complaint - “data modeling takes too long.”
The treatment of these levels will be similar to what you’ve seen before. I’m also making a massive departure from the traditional treatments of ERDs, conceptual modeling, UML, and so on. While great, there are decades’ worth of material on these topics, and merely rehashing them would be lazy. I’m not here to replay the data modeling greatest hits album. That material is still relevant, but the world’s moving in new directions, and we need to update our thinking and approaches to data modeling. I’ll cover things like applying these levels to unstructured data, using AI in your data modeling efforts, and much more.
Enjoy this short introduction. New sections will drop and be revised over the next couple of weeks.
As always, comment where you wish. If you find errors, please submit them here.
Thanks,
Joe
When I watch people model data, they do it with varying levels of intention and rigor. Some are highly methodical, deriving high-level semantics, vocabulary, and information flows and translating them into carefully refined data models. Others create their data model by writing a bunch of SQL code and plopping data into a database. “Good enough,” they might say, and perhaps it is. Regardless of how you model your data, you’re starting with at least a rough idea of the model.
Formally, the levels of data modeling follow a linear progression, from capturing high-level business ideas to the nuts and bolts of physical implementation in a data system. These are called conceptual, logical, and physical data models, respectively. Data modeling legend Steve Hoberman calls this “align, refine, design.” Whatever you call it, it’s all the same thing. You start at a high level and decide what to model, why it needs to be modeled, and so on. Even in the example above of cranking out SQL code and tables, the modeler has a rough idea of the tables to create, the purpose of those tables, and the data going into those tables.
Let’s start with an analogy most people can relate to - designing and building your dream house. I’m guessing you’ve at least seen a house under construction. When you design a house, you usually engage an architect who will help you realize your vision. My dream home is a mid-century modern home inspired by Frank Lloyd Wright, with plenty of open space and a sleek, minimal design in wood, concrete, and metal. You probably have something different in mind. Hold that image. That dream home in your mind is your high-level concept, a picture of what you want and how its parts fit and flow together. In data modeling parlance, we’ll call this the conceptual model.

Next, you need to figure out how to build it. Your architect will draw up detailed blueprints, translating the high-level design into something practical enough to implement. A general contractor will read these blueprints to understand what to build. This is the logical model. Finally, the general contractor oversees the construction crew that builds your home - the physical model. Hopefully, the high-level design, the blueprints, and the finished construction all match! If so, you’ve successfully designed and built your home.
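To make the analogy concrete with data, here’s a minimal sketch of the three levels for a hypothetical customers-and-orders domain (the entities, columns, and PostgreSQL target are my inventions for illustration, not a prescription). The conceptual and logical levels live in the comments; the physical level is the DDL itself.

```sql
-- Conceptual (align): the business cares about Customers and the
-- Orders they place. One customer can place many orders.
--
-- Logical (refine): Customer (customer_id PK, name, email) and
-- Order (order_id PK, customer_id FK, order_date, total_amount),
-- independent of any particular database engine.
--
-- Physical (design): the logical model realized as DDL for a
-- PostgreSQL-style database, with concrete types, constraints,
-- and an index chosen for how the data will actually be queried.

CREATE TABLE customers (
    customer_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name         TEXT NOT NULL,
    email        TEXT NOT NULL UNIQUE
);

CREATE TABLE orders (
    order_id     BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id  BIGINT NOT NULL REFERENCES customers (customer_id),
    order_date   DATE NOT NULL,
    total_amount NUMERIC(12, 2) NOT NULL
);

-- A physical-level decision: speed up "orders for this customer" lookups.
CREATE INDEX idx_orders_customer ON orders (customer_id);
```

Each level answers a different question: what the business cares about, how the pieces of data relate, and how it all gets stored and queried on a real system.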
The order of operations in data modeling matters. Back to our home analogy: if you skip the blueprint, go to a plot of land, and start installing plumbing, you’re getting ahead of yourself and building backwards. What’s the purpose of the pipes, and where will they go? Will they match the rest of the home? What does this home even look like? This is why home builders don’t just build a home willy-nilly. They start by planning at a high level, then iterate to finer and finer details. Once the plan is solid and agreed upon, the building begins.
With data modeling, the best approach is to start at a high level of understanding and work down to the implementation details. This approach works whether you’re modeling data for an application, analytics, or ML/AI. Sadly, I find that quite a few data models are the equivalent of laying the plumbing first, then patching on parts of the home as you go - countertops here, stairs there - and maybe pouring the foundation last. It’s easy to get caught up in the gory details of your creation and miss the big picture of how it all ties together. I often hear, “It’s just code and data, and we can refactor.” Yeah, that refactoring seldom happens in practice. What starts as a prototype ends up as production-grade technical debt. Resist this temptation at all costs.
In this chapter, we’ll look at the traditional approaches to data modeling as applied to databases. However, several decades of material already cover that ground, so it won’t be the chapter’s primary focus. I’m far more interested in extending these levels of data modeling to unstructured data and ML/AI. Unstructured data - text, images, audio, video, and so on - is far more common than structured data, and given the explosion of AI, I think this is where data modeling deserves more study and attention. Finally, we’ll address the elephant in the room - “data modeling takes too much time” - and how you can speed up your data modeling efforts.
With that, let’s get started with conceptual data modeling.