In my last post, “My Definition of Data Modeling (for today),” I looked at what data modeling is. As a reminder, here’s the definition I proposed:
“A data model is a structured representation that organizes and standardizes data to enable and guide human and machine behavior, inform decision-making, and facilitate actions.”
I speak with many practitioners and engage in plenty of online discussions about data modeling. In these conversations, people often jump straight to implementation details. For example, when discussing analytical data modeling, I often hear, “Data modeling is Kimball.” Or if you talk with someone from the relational modeling camp, they’ll say something like, “Data modeling is about constraining data, using predicate logic.” To be honest, these explanations drive me a bit nuts. They’re not exactly wrong, but they miss the bigger picture of what’s possible with data modeling and why we do it at all, which I’ll discuss in upcoming articles.
Based on my experience, conversations, and observations, here are some things I think data modeling is NOT.
Perfect. To quote my friend Gordon Wong, “Data modeling is not perfect.” At best, a data model closely approximates the thin slice of reality you’re trying to represent, and even then it is inherently flawed. When you model data, you create a simplified representation of complex real-world phenomena, systems, and workflows. On top of this, information is always incomplete, because data can’t capture everything about a given point in time. Since data is incomplete, it will be inaccurate and imperfect, and your data model will reflect these limitations. Sometimes your model is a pretty good representation of reality; other times, it’s terrible. As Voltaire said, “Perfect is the enemy of good.” Stop striving for perfection and make sure your data model is good enough for your use case.
Only about physically storing data. Data modeling primarily focuses on defining the structure, relationships, and meaning of data rather than the specifics of how data is physically stored in data systems. Engineers often approach data modeling as an exercise in shoehorning data into a particular system or database. This is like cutting off one’s limbs to fit a bed¹. A better approach is to build a bed that fits a range of body sizes rather than forcing every body to conform to the bed. Sadly, data modeling is often reduced to the limb-cutting version: cramming data into a system without regard to how it connects to the organization. For instance, a data engineer might first approach a data model by considering how well it fits into a columnar OLAP database. This is backward. I view a strictly physical approach to data modeling as myopic and dangerous because it misses the bigger picture of how data weaves itself into the fabric of your organization. If a data model’s intent lives only in a database, with no connection back to the organization, then the data has a limited (or even negative) purpose.
A specific approach or technique. When I ask people what data modeling is, I often hear responses like “Kimball,” “Inmon,” “Entity-relationship,” “Relational,” “Data Vault,” “predicate logic and set theory,” and more. Often, these responses come with a religious tirade about why a specific technique is the one true way to model data. These are all certainly valid techniques and approaches to data modeling. But I wouldn’t say any of them single-handedly defines data modeling, any more than a jab or a left hook defines the sport of boxing. Data modeling involves many approaches and techniques. Know the various approaches and techniques, and use what works for your situation.
A one-time process. The world is never static; it’s messy, dynamic, and chaotic. Your data model will go stale. My business partner and co-author Matt Housley and I are fond of saying that data has entropy: data eventually drifts from the underlying concepts it was intended to capture. For example, I often see the definitions of “customer” or “revenue” drift as an organization grows and changes. Because a concept that meant one thing at one point in time can mean something different later, data modeling should be an iterative process, not a one-time task that ends once the initial model is created. Good data models are flexible and can adapt to changes in business requirements or technological advancements. Models must evolve with the organization’s reality and needs.
Only for massive enterprises. Data modeling is often seen as something reserved for massive enterprises, but it’s actually for organizations of all sizes and maturities. It’s best to embrace data modeling as early as possible. An old Chinese proverb says, “The best time to plant a tree was 20 years ago. The second best time is today.” Data modeling exists on a spectrum from easy to complex. Getting easy wins early keeps your data in great shape as your organization grows and highlights the value of data modeling. Ignore data modeling until you become a big enterprise, and you’ll get to untangle a gigantic (and preventable) mess, if that’s even possible. Even small projects benefit from the clarity and structure that a well-thought-out data model provides. The goal is not to create complex or overly technical models but to build models that accurately represent the data requirements and communicate them simply and clearly to all stakeholders.
Limited to technical stakeholders. Data modeling is not just for DBAs, software and data engineers, architects, and other technical stakeholders. All too often, technical stakeholders view data modeling as a navel-gazing exercise in storage and query optimization. Those concerns are important, but this view misses the big picture of who and what data is supposed to serve. Effective data modeling involves collaboration with non-technical stakeholders, including end users, management, and executives. Especially at a high level, a good data model captures the business rules, logic, vocabulary, and workflows of your organization. Non-technical stakeholders might also engage in business process modeling. You might ask, “Is data modeling the same as business process modeling?” They’re related but different, and data modeling is not a substitute for business process modeling. Data modeling focuses on the structure and relationships of data elements, whereas business process modeling focuses on business activities and workflows. The two certainly influence each other, especially in the direction of business processes guiding the data model. A data modeler should know how to understand business processes and translate them into a data model.
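To make the “not only about physically storing data” point more concrete, here’s a minimal Python sketch. The Customer and Order entities, their fields, and the to_rows/to_columns helpers are hypothetical illustrations, not a prescribed method. The logical model states what each entity means and how the entities relate; the row-oriented and column-oriented layouts are two interchangeable physical choices derived from it afterward.

```python
from dataclasses import asdict, dataclass, fields
from datetime import date

# Logical model (hypothetical example): entities, attributes, relationships,
# and plain-language meaning. Nothing here says how the data is stored.
@dataclass(frozen=True)
class Customer:
    """A person or company that buys from us (hypothetical definition)."""
    customer_id: int
    name: str

@dataclass(frozen=True)
class Order:
    """A single purchase made by exactly one Customer."""
    order_id: int
    customer_id: int  # relationship: each Order belongs to one Customer
    order_date: date
    amount: float

# Physical layouts (hypothetical helpers), chosen after the logical model exists.
def to_rows(records):
    """Row-oriented layout: one dict per record."""
    return [asdict(r) for r in records]

def to_columns(records, cls):
    """Column-oriented layout: one list per attribute, in field order."""
    return {f.name: [getattr(r, f.name) for r in records] for f in fields(cls)}

customers = [Customer(1, "Acme"), Customer(2, "Globex")]
orders = [
    Order(10, 1, date(2024, 1, 5), 250.0),
    Order(11, 2, date(2024, 1, 6), 99.5),
    Order(12, 1, date(2024, 2, 1), 40.0),
]

# The relationship rule is checked against the logical model, not the layout.
known_ids = {c.customer_id for c in customers}
assert all(o.customer_id in known_ids for o in orders)

rows = to_rows(orders)
cols = to_columns(orders, Order)
print(rows[0])
print(cols["amount"])
```

The design choice here is the order of operations: meaning and relationships come first, and the storage layout is a downstream decision you can change without redefining what an Order is.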
I could keep splitting hairs about what data modeling is and is not, but you get the idea. The point is that your data model won’t be perfect, no universal approach exists, and data modeling encompasses much more than cramming data into systems. Everyone operates in a different context and situation. The data modeling approach at a Fortune 10 company is very likely unsuitable for a 5-person startup, and vice versa. Understand your situation and do your best, given your constraints.
¹ This is the Bed of Procrustes: https://en.wikipedia.org/wiki/Procrustes
Great article, Joe, and I really liked this statement:
"Data modeling primarily focuses on defining the structure, relationships, and meaning of data rather than the specifics of how data is physically stored in data systems."
I would change the order to: "Meaning, Relationships and Structure" to reflect the flow, but that's a nit.
For sure. And also what it can’t solve…