What is a data model? I like to ask this question during my conference talks, and the answers are all over the place. I’ve never seen a group of people consistently give a single definition. Before I give my working definition, let’s look at a few ways some notable experts define it.
“A data model is a wayfinding tool for both business and IT professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment.” - Steve Hoberman, Data Modeling Made Simple, p. 13
“Data models are techniques for representing information, and are at the same time sufficiently structured and simplistic as to fit well into computer technology” - William Kent, Data and Reality, p. 119
“A data model tells a story, and the story is about how a group of people come together to use data to solve a business problem (or take advantage of a business opportunity). The data model becomes a record of the journey from the conception of the problem to its solution.” - Larry Burns, Data Model Storytelling, p. 16
“...every model represents some aspect of reality or an idea that is of interest. A model is a simplification. It is an interpretation of reality that abstracts the aspects relevant to solving the problem at hand and ignores extraneous detail.” - Eric Evans, Domain-Driven Design, p. 2
As you can see, a data model has many definitions and interpretations. If you ask ten people what they think a data model is, you’ll get at least ten answers. There are some common threads in these and other definitions you’ll find in other books and articles on data modeling. First, a data model represents reality abstractly, not as a complete mirror image. As Hoberman points out, these could be concepts, events, and relationships represented as a set of symbols and text. Second, a data model organizes and standardizes data precisely yet simplistically. Third, a data model improves communication, provides utility, promotes action, and solves problems.
I define a data model as follows. Note that this is a working definition that will very likely change as I write my data modeling book (in fact, it differs from the definition I gave in yesterday’s 5-Minute Friday). But the general points should remain the same.
A data model is a structured representation that organizes and standardizes data to enable and guide human and machine behavior, inform decision-making, and facilitate actions.
The emphasis on machines is a departure from other definitions of data modeling you might encounter. Historically, data modeling focused on making data understandable and valuable for humans. This definition recognizes that data is modeled for both humans and machines. In fact, most data is modeled for machines, not humans. Think of the computer systems you interact with daily - your computer, smartphone, and other smart devices. Since the dawn of computing, humans have modeled data so that computer systems and applications can perform automated tasks. While humans use data models (often of the analytical variety) to make decisions and take actions, the number of machine-oriented actions is far greater. And with the rise of machine learning and AI, machines will become increasingly autonomous. Data is growing to a scale where humans have increasing trouble fully reasoning about it without the assistance of machines, and these machines will require better hardware, different types of data processing, and AI. Our thinking about and approach to data modeling need to evolve past a human-centric worldview.
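To make the human-and-machine point a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the `Order` entity, its fields, the validation rules, and the `describe` helper are hypothetical and not taken from any particular system. The same structure is used twice, once by code that enforces it and once by a person reading a description of it.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class Order:
    """Hypothetical entity: one customer order."""
    order_id: int
    customer_email: str
    order_date: date
    total_cents: int

    def __post_init__(self):
        # Machine-facing use: reject records that break the model's rules.
        if self.total_cents < 0:
            raise ValueError("total_cents must be non-negative")
        if "@" not in self.customer_email:
            raise ValueError("customer_email must contain '@'")


def describe(model) -> str:
    """Human-facing use: list each attribute with its type name."""
    return ", ".join(
        f"{f.name}: {getattr(f.type, '__name__', f.type)}"
        for f in fields(model)
    )


if __name__ == "__main__":
    order = Order(1, "a@example.com", date(2024, 1, 5), 1999)
    print(order)
    print(describe(Order))
```

This is only one small, code-level way to express a model. The point is that a single structured representation can guide both automated behavior (validation) and human understanding (a readable description).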
Another major departure is that the definition is agnostic to particular approaches like modeling for operational or analytical uses. It subtly recognizes the continuously evolving nature of data models. The definition still preserves the conceptual, logical, and physical phases of data modeling (or, as Steve Hoberman calls it, “align, refine, define”). Those phases won’t go away. However, the definition widens the possibilities for data modeling. Historically, data modeling has often been viewed through the lens of particular use cases, like analytics, as if that’s the only place where data is modeled. This ignores the data models that exist upstream and downstream from analytics. The discussion of data modeling needs to expand beyond a myopic fixation on any particular approach or use case. Due to the continuously flowing nature of data, all modeling approaches are worthy of consideration, with varying degrees of utility for the situation at hand. I’ll discuss my thoughts on expanding data modeling in an upcoming discussion on Mixed Model Arts, which approaches data modeling across various use cases the same way you’d approach a mixed martial arts fight - you’d better know various techniques and when to apply them for the best outcomes. There is no one true way to fight, and there’s no one true approach to data modeling. There are many approaches. Pick what works for your situation.
Again, this definition is a work in progress and will likely evolve as I write my book and receive input from readers. Even Steve Hoberman, who I view as the OG of data modeling, regularly reviews and revises his definition of data modeling. I think that’s a recognition that data modeling isn’t a static thing, and as an industry, we need to constantly challenge our assumptions and evolve. That’s how we grow as an industry. But it’s also important to put a stake in the ground for now, and this is how I view data modeling today.
Thanks to everyone who’s joined Practical Data Modeling. It’s super cool to see the discussions so far, which I think are some of the best I’ve seen anywhere at the moment.
I’ll be writing more data modeling articles over there, along with early draft chapters of my new book, so stay tuned.
I’m wondering if this deserves a different term than “data model”? The risk of using “data model” is that it overloads an already well-worn term. Thoughts?
Getting more and more excited about the book 🫶 I really like the agnostic nature of the definition between operational and analytical use cases. I think this has been missing.
On "a structured representation that organizes ... data" I also see "a representation of structure" and how it connects to the idea in systems theory and systems thinking that the structure of a system largely determines its behavior (e.g., Conway's law).
For me, this makes the MMA idea especially powerful. There are different archetypes for structuring data and data systems with inherent tradeoffs. Instead of fighting about which modeling approach is best, I find it extremely useful to understand which archetypes exist, their tradeoffs and how they affect the behaviors of the system so that we can choose the one best suited for each particular use case.