“We Don’t Have Time For Data Modeling”
1. The Tension of Deep Work and Delivery Work in Sprints
Here’s the final section on the levels of data modeling. A major complaint I hear is, “We don’t have enough time for data modeling.” I’ll break down common considerations and critiques of applying these levels in practice.
Let’s start with an extremely common one - how to do data modeling in Sprints.
More of these sections will trickle in over the next few weeks, as I’m getting many inquiries.
If you have any other criticisms or considerations of data modeling you’d like me to address, post them in the comments or message me. Assuming they’re a good fit for the book or an article, I’m happy to take them on.
Next week, I’ll start dropping sections of the analytical data modeling chapter(s).
Thanks,
Joe
Ellie makes data modeling as easy as sketching on a whiteboard, so even business stakeholders can contribute effortlessly. By skipping redraws, rework, and forgotten context and keeping all dependencies in sync, teams report saving up to 78% of modeling time.
Bridge reality with Data! Read more here.
Thanks to Ellie.ai for sponsoring this article.
“We don’t have time for data modeling.” - Lots of people I talk with
A major complaint against data modeling is that it takes too much time. Today’s businesses move extremely fast, and anything that slows down business is seen as an impediment. Unfortunately, data modeling is notorious for taking too long and not delivering results. This isn’t without good reason. In conversations with people opposed to data modeling as a practice, they bring up stories of glacially slow and laborious data modeling initiatives.
Infamous among these tales is the Big Design Up Front of a humongous Enterprise Data Model that took several years and an army of people to build. These enterprise data models struggled to adapt to a faster business climate. Sadly, by the time such a model was produced, it was already outdated and no longer accurately reflected the business. And the slow rollout of the model was only the down payment. Assuming the Enterprise Data Model launched at all, maintaining it was painful. Dedicated data modelers, architects, and DBAs were often responsible for keeping it current. Any addition or edit to the data model required significant time and ceremony; it wasn’t uncommon for a single new column to take several months to land in a database table. Given the time and cost involved, the model inevitably died quietly as the business moved on to other initiatives. Data modeling still carries this baggage today.
Another impediment is that even when there’s time to model data, the education and awareness of how to do it might be missing. This is especially true of newer technical professionals, whose boot camps or college courses include only minimal discussion of data modeling, if it comes up at all. I’ve seen many data teams where nobody understands the basics of data modeling, let alone the levels of data modeling. This education gap is a big reason I wrote this book.
Finally, some see these levels as ceremonial and unnecessary. Conceptual modeling can be laborious, particularly when it involves aligning with stakeholders on definitions and understanding their world. Stakeholders have their own work to do, and there’s often little incentive for them to sit down with data modelers. Surely one can surmise the stakeholders’ world without talking to them? As the lines between logical and physical modeling blur, the temptation is to throw data into the physical system and call it a day. Physical-first modeling is pervasive; it’s the default way I see data modeling done today.
Part of the problem is how we’ve been doing data modeling. It’s been too ceremonial and pedantic for far too long, slowing things down. Success in today’s world means delivering quality on time and under budget. I’ll suggest some things we can do to speed up the data modeling process. Even if you don’t intentionally model data, you’re implicitly going through these levels.
Should you make time for data modeling, and can you approach these levels in a way that enables fast iteration and delivery? I think so. Let’s go through some ways to model data faster.
The Tension of Deep Work and Delivery Work in Sprints
“There is never enough time to do it right, but there is always enough time to do it over.” - John W. Bergman
The levels of data modeling started in the era of top-down Waterfall. Back then, requirements were bestowed from above onto a team in charge of implementing them. Things were built to specification, and there was little room for iteration or flexibility.
Nowadays, many tech, data, and ML/AI teams operate under some form of Agile. The goal of Agile is rapid, iterative delivery of a product and its value. Incremental learning is encouraged. Moving fast is the name of the game. Much has been written about how Agile and data modeling can work together, and it does work when the willpower and support exist to enable proper agility. Sadly, I often find Agile in name only, where stakeholders, leaders, and teams pay lip service to the notion of continuous value delivery.
Another tension I notice is shipping at the expense of thinking. This manifests in how teams do their work. Often associated with Agile is Scrum (though the two are not the same), which buckets work into time intervals called Sprints, aiming for incremental delivery of value. Most data teams I see work in Sprints, usually lasting around two weeks. I’ve worked in Sprints myself. They’re effective for time-boxing certain types of work. But I’ve struggled with whether everything needs to fit into a Sprint, and a big question I’ve pondered for many years is whether data modeling fits into Sprints at all. Let’s look at the tension between deep work and delivery work.
Data modeling is a combination of deep work and delivery work. The deep work in data modeling is the thoughtful work of researching, designing, and interviewing people. Especially in the conceptual data modeling phase, discovering the patterns in your business requires more than surface-level effort. Conceptual modeling is done through “seeing” - talking to people, learning their workflows and vocabulary, and translating this into a data model. It involves a natural curiosity about the world. This type of deep work doesn’t fit neatly into time buckets.
On the other end of the spectrum is delivery work, where you write the SQL code to create or modify database tables, create or tweak an app feature, or train an ML model. In the case of data modeling, the deep work invested in designing the model has already been done. Delivery work incrementally improves this model to deliver value through data products and ad hoc requests. This type of work is appropriate for Sprints.
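To make that concrete, here’s a rough sketch of what a sprint-sized delivery task might look like. The table and column names are hypothetical and the syntax is PostgreSQL-flavored, but the shape is familiar: a small, reviewable change to a model whose design is already settled.

```sql
-- A sprint-sized delivery increment on an already-designed model.
-- Hypothetical table and column names; PostgreSQL-style syntax.

-- Add the attribute the business asked for this Sprint.
ALTER TABLE orders
    ADD COLUMN fulfillment_status VARCHAR(20) DEFAULT 'unknown';

-- Backfill existing rows from a source table already in the warehouse.
UPDATE orders o
SET    fulfillment_status = s.status
FROM   shipment_events s
WHERE  s.order_id = o.order_id;
```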
Data is a thinking person’s game. Yet the deep work in data is often treated like delivery work and shoehorned into Sprints. There are ways to handle this shoehorning. You can incorporate Kanban or Sprint Spikes, allowing flexibility for research and scoping. Logical modeling might be considered deep work, but I believe it’s more straightforward since the conceptual design has already been done. Logical data modeling is still a form of design, but you’re translating the conceptual data model (CDM) into an intermediate state before you physically implement it. Depending on the complexity and scope of the conceptual data model you’ve designed, logical and physical modeling are tactical work that can be broken into Sprints.
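To make the translation concrete, here’s a minimal sketch of how one conceptual statement might flow through the levels. The entities and names are hypothetical and the DDL is PostgreSQL-flavored; the point is that the deep design work - agreeing that a customer places orders - happens before any of this code is written.

```sql
-- Conceptual (agreed with stakeholders): "A Customer places Orders."
-- Logical: Customer(customer_id PK, name); Order(order_id PK, customer_id FK, order_date)
-- Physical (illustrative PostgreSQL-style DDL):

CREATE TABLE customer (
    customer_id BIGINT PRIMARY KEY,
    name        TEXT NOT NULL
);

CREATE TABLE orders (
    order_id    BIGINT PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customer (customer_id),
    order_date  DATE NOT NULL
);
```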
Making your Sprints amenable to a mix of deep and delivery work is possible. Data modeling can move quickly in the right environment and fail miserably in the wrong one. As I’ve written in many places, data is a silent killer. Data models tend to evolve more slowly than code, and the impacts of poor data models are sneaky. You won’t know you’re in danger until it’s too late. So, take the time to do things right. Move slowly to move faster and safer over the long haul.
Highly recommend Lawrence Corr's Agile Data Warehouse Design book in case anyone hasn't read it already. It's very much a Just Enough Design Upfront (JEDUF) approach that encourages iteration.
Well said, Joe. To add on this topic - I think it’s a misunderstanding that the whole enterprise / department / subject area needs to be fully modelled before starting implementation.
With data modelling patterns like Data Vault, Anchor, Focal, or any other form of Ensemble modelling, you are able to start implementing a small part of the model while the modelling is still ongoing. These parts can also be tested and used for reporting and analysis without having to redo the work when other parts are added. This comes down to the separation between identifiers, relationships, and context (Unified Decomposition).
When I discovered this way of modelling it really made a big difference in progress, perception and communication.
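As a rough illustration of the separation described in this comment, here’s a minimal Data Vault-style sketch with hypothetical names and PostgreSQL-flavored DDL: identifiers live in a hub, the relationship in a link, and descriptive context in a satellite, so each piece can be built and extended independently.

```sql
-- Identifiers: business keys only, one hub per core business concept.
CREATE TABLE hub_customer (
    customer_hk   CHAR(32) PRIMARY KEY,   -- hash of the business key
    customer_id   TEXT NOT NULL,          -- the business key itself
    load_date     TIMESTAMP NOT NULL,
    record_source TEXT NOT NULL
);

-- Relationships: which customer placed which order.
-- (A hub_order table would exist too; omitted here for brevity.)
CREATE TABLE link_customer_order (
    customer_order_hk CHAR(32) PRIMARY KEY,
    customer_hk       CHAR(32) NOT NULL REFERENCES hub_customer (customer_hk),
    order_hk          CHAR(32) NOT NULL,
    load_date         TIMESTAMP NOT NULL,
    record_source     TEXT NOT NULL
);

-- Context: descriptive attributes, versioned over time and extendable
-- without touching the keys or relationships above.
CREATE TABLE sat_customer_details (
    customer_hk CHAR(32) NOT NULL REFERENCES hub_customer (customer_hk),
    load_date   TIMESTAMP NOT NULL,
    name        TEXT,
    email       TEXT,
    PRIMARY KEY (customer_hk, load_date)
);
```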