This is a draft excerpt from Chapter 4 (Building a Data Model) of my new data modeling book. I'm still noodling on this core idea, but some version of it will likely make it into the book. Let me know your thoughts.
Thanks,
Joe
Data modeling is a way to tell a story with data. A good data model organizes and structures data in a way that captures your business’s rules, vocabulary, relationships between “things,” and information and process workflows. Learning data modeling shouldn’t be hard, and a big focus of this chapter is showing how to think about data modeling in an easy-to-understand way.
However, when I see data modeling taught, it's often presented in an intimidating, jargon-heavy, academic way. This scares many people away from learning it (and some people wonder why data modeling has lost traction over the decades).
Today's world is very data-heavy, and understanding data modeling is a highly effective way to understand and drive your business. You might even have a lot of fun with it. The approach I provide in this chapter is meant to be as simple and jargon-free as possible. Data modeling should be accessible to everyone, regardless of role, whether you're a software developer, a data analyst, or an ML engineer. Ultimately, you're trying to map how your business works onto data. The better you get at thinking like a data modeler, the more success you'll have with data, no matter the use case.
The 5W's of journalism[1] provide a helpful framework for thinking about a data model. Imagine the job of a journalist. She has to give a comprehensive and clear account of a story, often under immense deadline pressure. The 5W's are a battle-tested way to deliver that account in a highly efficient, credible, engaging, and versatile way. You were probably taught them in school, so they should feel familiar. As a reminder, the 5W's are who, what, where, when, why, and (optionally) how.
Applied to data modeling, the 5W's become the following questions:
Who will this data model serve? These are the stakeholders and users of the data model.
Why does this data model need to be built? What is the purpose and objective of the data model?
What are the data model’s core entities, attributes, and relationships? This is where you learn to see the business through the lens of data.
When does this data model apply? Will it cover all historical data, predict the future, or mix the two? Is it meant to represent a historical artifact, live within an application, or provide predictions?
Where is the data located? Is the data even available, or does it need to be created?
How will we build this data model? Consider the implementation details (tools, techniques, updating the data model, data governance, etc.).
This ordering is intentional. I see people waste a ton of time on data modeling without knowing who the data model is for or why it's needed, which leads to wasted effort and frustration for the business. Decades ago, data modelers could get away with creating a grand, overarching enterprise data model that covered the entire scope of the business. That's not the reality we're in today. Things move at warp speed. Priorities shift. New features need to be delivered quickly. The data model must be created and maintained under time and money constraints. Data modeling must add business value and be rapidly iterated as the business landscape changes. And because the consequences of ignoring intentional data modeling can be severe, it's more important than ever that anyone who touches data is at least familiar with the basics of data modeling and the various approaches available to them, given their situation.
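To make the questions and their ordering concrete, here is a minimal Python sketch of a hypothetical "model brief" worksheet that records an answer to each W. The class and method names (ModelBrief, Entity, Relationship, open_questions, ready_for_what, undefined_entities) and the sales-operations scenario are illustrative assumptions, not part of any tool or a prescribed format. The brief lists unanswered questions in the order Who, Why, What, When, Where, How, and reports whether Who and Why are settled before anyone starts sketching entities.

```python
from dataclasses import dataclass, field

# Hypothetical worksheet for the 5W's (plus How); names are illustrative only.


@dataclass
class Entity:
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class Relationship:
    source: str       # name of one entity
    target: str       # name of the related entity
    description: str  # business vocabulary, e.g. "places"


@dataclass
class ModelBrief:
    who: list[str] = field(default_factory=list)  # stakeholders and users
    why: str = ""                                 # purpose and objective
    what_entities: list[Entity] = field(default_factory=list)
    what_relationships: list[Relationship] = field(default_factory=list)
    when: str = ""                                # historical, live, predictive, or a mix
    where: list[str] = field(default_factory=list)  # where the data lives (or must be created)
    how: str = ""                                 # tools, techniques, updates, governance

    def open_questions(self) -> list[str]:
        """Return the unanswered W's in the intended order.

        "What" counts as answered once at least one entity is declared.
        """
        checks = [
            ("Who", bool(self.who)),
            ("Why", bool(self.why.strip())),
            ("What", bool(self.what_entities)),
            ("When", bool(self.when.strip())),
            ("Where", bool(self.where)),
            ("How", bool(self.how.strip())),
        ]
        return [name for name, answered in checks if not answered]

    def ready_for_what(self) -> bool:
        """True once both Who and Why have answers."""
        return bool(self.who) and bool(self.why.strip())

    def undefined_entities(self) -> list[str]:
        """Entity names used in relationships but never declared as entities."""
        declared = {e.name for e in self.what_entities}
        used = {n for r in self.what_relationships for n in (r.source, r.target)}
        return sorted(used - declared)


# Hypothetical example: start with Who and Why, then sketch the What.
brief = ModelBrief(
    who=["Sales operations analysts"],
    why="Track order fulfillment times by region",
)
brief.what_entities = [
    Entity("Customer", ["customer_id", "region"]),
    Entity("Order", ["order_id", "customer_id", "ordered_at", "shipped_at"]),
]
brief.what_relationships = [Relationship("Customer", "Order", "places")]

print(brief.open_questions())  # ['When', 'Where', 'How']
```

A plain checklist like this is deliberately implementation-agnostic: it captures the answers before any choice of modeling approach, which is where the How question eventually comes in.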
Please note that the 5W's in this section are distinctly different from the 7W's of the BEAM data warehouse modeling approach. The 7W's (who, what, where, when, why, how, and how many) are an excellent way to gather requirements for a dimensional data model (you'll learn what that is in upcoming chapters). The 5W's I propose here are a comprehensive framework for data modeling, regardless of the specific implementation approach, such as BEAM. So have your 5W's and your 7W's! Thanks to
for bringing attention to this distinction.
Note: I'm sure some readers will ask whether I'm covering conceptual, logical, and physical modeling in this chapter. Yes.