Just-In-Time Data Modeling
Levels of Data Modeling Considerations - Ways to Speed Up Data Modeling
This is likely the final section of a short series on ways to speed up data modeling. Here, we talk about data modeling in a just-in-time manner. This differs quite a bit from how data modeling is traditionally taught and historically practiced, which is usually some form of Big Design Up Front or waterfall. Those approaches had their time and place, but things now move far too fast for them to be the first consideration.
Read on for some tips on data modeling in an extremely fast and iterative way that matches today’s cadence of business.
Coming up - analytical data modeling (everyone’s been asking about this), graphs, and more.
Thanks,
Joe
Ellie makes data modeling as easy as sketching on a whiteboard, so even business stakeholders can contribute effortlessly. By skipping redraws, rework, and forgotten context, and by keeping all dependencies in sync, teams report saving up to 78% of modeling time.
Bridge reality with Data!
Thanks to Ellie.ai for sponsoring this newsletter.
The world moves very fast, often far too fast for data modelers to keep up. Stakeholders want answers yesterday. Engineering teams are juggling feature requests, bug fixes, and AI experiments. In this environment, the old-school idea of “let’s spend months or years modeling everything first” doesn’t hold up.
Enter Just-In-Time Data Modeling (JITDM). Like the other suggestions in this section for speeding up data modeling, JITDM keeps you agile, responsive, and focused on doing just enough to solve today’s problem. Similar to just-in-time manufacturing, with JITDM you’re delivering only what the user needs, when they need it.
This contrasts with Big Design Up Front (BDUF), which we’ve discussed earlier. With JITDM, we’re almost taking the opposite approach, ignoring everything except what the user needs from their data model. At the risk of acronym overload, we’re adopting the philosophy of You Ain’t Gonna Need It (YAGNI). Think of packing for a trip: for a 3-day trip, I’m not taking 15 pairs of socks. I’m a one-pair-of-socks kind of guy. Two, max. Stick to the bare minimum of what you need to get the job done. Or, as John Giles says1, “There is a difference between ‘you ain’t gonna need it ever’ versus ‘you don’t need it yet.’” Unlike BDUF, with JITDM, YAGNI. Okay, maybe we don’t need so many acronyms either. I’ll stop.
What does JITDM look like in practice? You might get a request from your boss or product manager to improve the functionality of an app, answer a new question, or create or tweak an ML/AI model to perform an action. You’ll sketch the request, maybe just a quick drawing on a whiteboard, on paper, or in a modern data modeling tool. Identify the entities and attributes you need, and connect the dots in terms of how they relate to each other and to what might already exist. Next, build the working model in code, SQL, notebooks, or whatever you’re using. Finally, deploy the model for feedback (ideally in a development or test branch). Does the new model meet the user’s needs? If so, great; if not, iterate. If the model gets reused or becomes critical, you can push it to production, evolve it, document it, and fold it into your broader ecosystem.
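The sketch-build-deploy loop above can fit in a few lines. Here’s a minimal, hypothetical example (the `orders` table, its columns, and the `revenue_by_product` view are all invented for illustration; sqlite stands in for whatever warehouse you actually use):

```python
import sqlite3

# A just-in-time model for one ad hoc request ("revenue by product").
# The "model" is only as much structure as today's question requires.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, product_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10, 25.0), (2, 10, 30.0), (3, 20, 12.5);

    -- Deploy as a view first; promote to a managed table only if it sticks.
    CREATE VIEW revenue_by_product AS
    SELECT product_id, SUM(amount) AS total_revenue
    FROM orders
    GROUP BY product_id;
""")
rows = conn.execute(
    "SELECT product_id, total_revenue FROM revenue_by_product "
    "ORDER BY product_id"
).fetchall()
print(rows)  # [(10, 55.0), (20, 12.5)]
```

If the stakeholder confirms the numbers, the view can graduate into the production model; if not, it’s cheap to drop and redo.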
The JITDM approach works well for collaborative situations where the request is ad hoc (“I need an answer to X.”), schemas might evolve, or you’re testing out new functionality. The downsides are potential duplication and data redundancy, model drift, and technical debt. JITDM works when there’s a balance between moving fast in a lightweight way and applying just the right amount of discipline and rigor to make sure things work in production to meet the user's needs.
Here are some tips to be effective with JITDM:
Standardize your naming conventions. If you add or change fields, maintain consistency to prevent your team from being bogged down in unnecessary work.
If a model is used more than a few times, consider promoting it to a shared dataset or standard data model.
Write comments so you and others understand the intent of the data model.
Use version control and work in branches. Don’t work in the main branch for prototyping.
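To make the first three tips concrete, here’s a hedged sketch of promoting a repeatedly-requested ad hoc query into a shared model. The names (`stg_orders`, `fct_daily_orders`, the `stg_`/`fct_` prefixes) are one possible convention, invented for illustration; the point is consistency and a comment that records intent:

```python
import sqlite3

# Hypothetical promotion of a reused ad hoc query into a shared model.
# Naming follows a single convention (snake_case, stg_/fct_ prefixes).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_orders (order_id INTEGER, order_date TEXT, amount REAL);
    INSERT INTO stg_orders VALUES
        (1, '2024-05-01', 10.0), (2, '2024-05-01', 5.0), (3, '2024-05-02', 7.5);

    -- Intent: daily order totals. Promoted to a shared view after the same
    -- ad hoc question came up more than a few times.
    CREATE VIEW fct_daily_orders AS
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM stg_orders
    GROUP BY order_date;
""")
daily = conn.execute(
    "SELECT order_date, order_count, total_amount FROM fct_daily_orders "
    "ORDER BY order_date"
).fetchall()
print(daily)  # [('2024-05-01', 2, 15.0), ('2024-05-02', 1, 7.5)]
```

The fourth tip is about where this code lives rather than what it says: prototype it in a branch, and merge to main only once it has proven itself.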
Today’s world moves extremely fast. JITDM is not a rejection of traditional modeling, but rather a response to ever-increasing velocity. Balance speed with awareness. You’re not abandoning good practice. You’re adapting it to fit the tempo of modern data work. Not everything needs a heavyweight model. Sometimes, the best model is the one that gets built just in time to deliver value.
The Nimble Elephant, pg. 110
I see JITDM in the wild more often than I care to admit. Obviously, it is initially very efficient and works great until it doesn’t. The majority of data teams don’t have the proper data governance in place to avoid, as Joe put it, “duplication and data redundancy, model drift, and technical debt.”
One of the main questions to ask with JITDM is whether the long-term data model mess is worth the initial, and sometimes immediate, business value. In rare cases, the answer may be yes.
Great article. Agree 100% with JITDM!
At the risk of adding another acronym, I’m going to take a punt on JEDMO, or Just Enough Data Modeling. JEDMO is based on the idea that a data design is always based on some model, so let’s add a bit of data-model thinking to the mix. If the team has decided on an OBT (One Big Table) design, then one thing that might emerge as you examine the data is that certain columns in the OBT are dependent on each other (e.g. a product hierarchy), and you might be able to persuade the team to model these as dimensions to help OBT queries navigate the data.
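That OBT point can be sketched in a few lines. This is purely illustrative (the `obt_sales` table, the `product`/`category` hierarchy, and `dim_product` are all made-up names): the mutually dependent columns get carved out into a small dimension.

```python
import sqlite3

# Just enough modeling: the product -> category dependency hiding inside a
# one-big-table is pulled out into a tiny dimension, so queries can navigate
# the hierarchy without rescanning the OBT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE obt_sales (
        sale_id INTEGER, product TEXT, category TEXT, amount REAL
    );
    INSERT INTO obt_sales VALUES
        (1, 'widget', 'hardware', 9.0),
        (2, 'widget', 'hardware', 4.0),
        (3, 'ebook',  'media',    3.0);

    -- The dependent columns become a dimension of their own.
    CREATE TABLE dim_product AS
    SELECT DISTINCT product, category FROM obt_sales;
""")
dim = conn.execute(
    "SELECT product, category FROM dim_product ORDER BY product"
).fetchall()
print(dim)  # [('ebook', 'media'), ('widget', 'hardware')]
```

Nothing else in the OBT gets remodeled; that’s the “just enough” part.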
Another example is the team that wants to get straight into physical design and has no time for logical or conceptual design. As you start to design the physical tables, especially when much of the source data comes from SaaS products, the team starts to realise there are concepts that need clarifying. Drawing up the business aspects as a conceptual model, rather than just mapping SaaS tables, turns out to be useful for their design.
In both the examples above it is important to recognise that you do not have a ‘complete’ data model. You have just modeled enough of the concepts or physical relationships in order to make your work easier.
As the project ebbs and flows in terms of urgency of deliverables your data modeling effort can ‘ebb and flow’ to match the needs of the project. In essence this is my concept of JEDMO or Just Enough Data Modeling. It is a week by week thing. Maybe the data model is never complete and this is ok …