This is an early draft of my book, Practical Data Modeling, with a working title. I’m opening up the curtain, so to speak, and showing you the writing process of my book. Hopefully, you’ll learn something and maybe want to write a book.
A note on early drafts and first chapters:
Early drafts are meant to be a “good enough” stab at a topic. They’re not final, and things will change. As the old saying goes, the real act of writing starts in the editing and revision process. I don’t expect my early drafts to be perfect. Feel free to beat the crap out of them.
For a writer, the first chapter is almost always the hardest to write. It’s the first chapter people see, and you’ve gotta nail it. In the first chapter, a writer will make a ton of mistakes that will only become clear as the book is written. So, this chapter, in particular, will likely see many revisions for many reasons: new ideas, great critiques and feedback, second-guessing, etc.
Thanks,
Joe Reis
What is data modeling? If you ask a group of people this question, you’ll get as many answers as there are people in the group. Let’s start by defining data and models and then clarifying what data modeling is and is not.
It Starts With Data
“We do not, it seems, have a very clear and commonly agreed upon set of notions about data - either what they are, how they should be fed and cared for, or their relation to the design of programming languages and operating systems.” - George Mealy, Another Look at Data (1967)
Mealy’s 1967 claim about data’s elusive nature still holds today. When I ask professionals with “data” in their title how they define data, they often get flustered; it’s like asking a fish to describe water. Because many of us work with data continually, we’re almost too close to it, and it’s sometimes difficult to understand what we’re working with. I notice this frustration percolating through the broader organization. I see “data-driven” companies struggle to “get value” from their data while ignoring the feeding and caring of data. Data sits around, misunderstood and underutilized.
Since this book is about data modeling, we need a working definition of data. Here are some definitions.
“…data is a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted formally.” - Wikipedia.
“…facts or information, especially when examined and used to find out things or to make decisions” - Oxford Dictionary.
“A re-interpretable representation of information in a formalised manner suitable for communication, interpretation or processing.” - DAMA UK.
To summarize, data is a collection of values that convey information, such as quantity, quality, or facts, that can be used to make decisions or to gain knowledge. Data is represented in a formalized manner for communication, interpretation, or processing.
Is this definition perfect? Probably not. Mealy's frustration continues. As far as I can tell, we’re still no closer to having “a very clear and commonly agreed upon set of notions about data.” But this is a “good enough” working definition of data for this book.
Before I move on, I know some of my friends will fixate on the word “knowledge.” Let’s make a short pit stop and discuss the popular data, information, knowledge, and wisdom (DIKW) hierarchy. Through this lens, data are the raw ingredients that create information, knowledge, and wisdom, in that order. DIKW is a neat and convenient construct. Reality is, unfortunately, a lot messier than a simple hierarchy can accommodate. I’ll try to limit the time spent being overly philosophical and pedantic, and I will have more to say about DIKW throughout this book.
Next, what is a model?