I know this is not a popular opinion, but I don't understand why no one else is talking about Data Modeling from the MVC perspective.
Software Engineers are supposed to understand the business so they can better understand how the user will interact with the controller and the view. Yet Software Engineers rarely seem to have a basic understanding of Data Modeling in their Software Model, and Data Professionals are left to clean up their mess, when we could be focused on actual Data Science and LLMs, with a much smaller Data Modeling burden for Data Quality.
Also strange that Software Engineers tend to be paid more than many Data Professionals, when sometimes it feels like I should be constantly failing their Peer Reviews.
Good points, especially about MVC (I wish more people here discussed SWE practices). When I speak with SWEs, there seems to be a massive knowledge gap with respect to data modeling. For example, even if a database is initially modeled in 3NF, there's a tendency to "just add fields" to the database as new features are introduced. This is exacerbated by ORMs, which make it dead easy to add new fields with little thought to normalization.
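To make that concrete, here's a hypothetical SQLAlchemy sketch (every name is invented for illustration, not taken from any real codebase):

```python
# Hypothetical sketch (invented names) of how "just add fields"
# erodes a model that started out in 3NF.
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    # Feature 1 ships: "just add a field" for the shipping address.
    shipping_street = Column(String)
    shipping_city = Column(String)
    # Feature 2 ships: the billing address gets duplicated the same way.
    billing_street = Column(String)
    billing_city = Column(String)
    # Every new address role repeats columns, and address facts now hang
    # off the customer row instead of their own key.

# The normalized alternative, which the ORM makes just as easy,
# but which nobody stops to ask for:
class Address(Base):
    __tablename__ = "address"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customer.id"), nullable=False)
    role = Column(String, nullable=False)  # "shipping", "billing", ...
    street = Column(String)
    city = Column(String)
```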
Well, ORMs are such a big issue, which is another thing we have to deal with, isn't it?
Yes, they help keep an application easier to maintain and avoid a ton of cost to rework the model, but then all that technical debt gets pushed onto Data Professionals and we look bad, while SWEs look fantastic, lol.
Hey Joe, can you elaborate on that a bit more? In my SWE days, we used ORMs to build out our models, and yes, we added content as needed, but the model changes were part of the release communication, and backward compatibility was a core value.
Is the issue you were seeing/hearing more with poor modeling techniques in the operational models, poor backward compatibility, poor communication with downstream teams, or all of the above?
Good question. From my perspective, it’s all of the above.
Those all feel addressable by bringing the data folks and SWE closer together.
Do you see that problem more in companies where the product itself is more tech forward, or in companies where data and IT are a means to an end for some physical product or service? I feel like it would be more prevalent in the latter.
That's partially what I like about introducing the product operating model into the ways of working. When done right, the technology teams work closer together on solving a business problem, rather than spitting out features that may or may not be used at a frenetic pace. @Shagility has done a great job of beating that drum.
Agreed, he's doing amazing work here, and yes, introducing a product operating model will help. I hope the divide between developers and data will be a relic by the end of the decade.
The trick about "good enough" is that you have to know what to focus on. Take Data Vault, for example. Many modelers spend lots of time figuring out multi-active or status-tracking satellites or something, when they haven't even picked the right hubs to begin with! Too much attention is paid to the technical fine points of the chosen method instead of just understanding the business first.
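To put it in concrete terms, here's a minimal hypothetical sketch (SQLAlchemy just for compactness; all names invented):

```python
# Minimal Data Vault sketch (hypothetical names). The hub carries nothing
# but the business key plus load metadata; descriptive attributes and
# their history live in the satellite.
from sqlalchemy import Column, DateTime, ForeignKey, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class HubCustomer(Base):
    __tablename__ = "hub_customer"
    hub_customer_hk = Column(String, primary_key=True)  # hash of the business key
    customer_bk = Column(String, nullable=False, unique=True)  # the business key itself
    load_dts = Column(DateTime, nullable=False)
    record_source = Column(String, nullable=False)

class SatCustomerDetails(Base):
    __tablename__ = "sat_customer_details"
    hub_customer_hk = Column(
        String, ForeignKey("hub_customer.hub_customer_hk"), primary_key=True
    )
    load_dts = Column(DateTime, primary_key=True)  # change history lives here
    name = Column(String)
    segment = Column(String)
    record_source = Column(String, nullable=False)

# If customer_bk turns out to be an application surrogate rather than a
# real business key, no amount of satellite cleverness fixes the model.
```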
Exactly. I mean, try telling an executive like Sol about the technical details of your solution. If it doesn’t address the business needs against the constraints you’re given (time, money, effort), it’s wasted motion.
Great article!
This stood out to me:
'They can still be considered "good" if they are fit for their purpose and satisfy the needs of the business.'
Fitness for purpose should really be more of the gold standard than achieving perfection, particularly with change being the only constant, as JT pointed out.
That said, it does require embracing the fact that fitness for purpose in an operational system context is often different than fitness for purpose in an enterprise analytics context, and rolling with it.
I would venture to ask: what does success look like within the scope of the outlined project? In other words, does the budget and timeline allow for perfection? Or are we teasing out an MVP1 to propel us closer to perfection via the early establishment of best practices?
All the above? Success is very context and situation dependent.
Even "good enough" isn't easy because the model needs to be extensible. We all have experienced modeling to some requirements and then the user "plays with it" for a week and comes back with 10 further asks. Some just require adding columns, but some might require significant redesign of the model into different tables. If you think through what's coming ahead of time, most changes are trivial but if you don't then you end up with bigger problems as the users ask for more stuff.
Sorry it became a rant :/...
Who determines what is good enough?
I have seen excellent data modeling architects get caught in a spiral when the "Multi-verse" is glimpsed and one realizes consequences are real but non-deterministic, when a space-time paradox causes convergences that eliminate the rules that were once immutable... until confused Spiderman starts showing up everywhere... and then everyone starts acting like they are the exception to the rule... chaos reigns... and sociopathic types thrive in said environments...
I have to wonder just how many hours and how much treasure were sunk chasing perfection, rather than asking data creators/modelers to be accountable for their "good enough" creations by offering well-documented, just-in-time context.
Perhaps instead of introducing new data movement tools, or catalog/schema functions, or miners/lineage, we as an industry could develop a real methodology that is uniquely DataOps-driven and not another retrofitted "Agile" solution...
Sure, new tools are great, but whatever the "goober" is that shuts down the LHC to prevent multi-verse, multi-dimensional cognitive dysphoria... it only exists in the comics...
Good Enough as a definition will require a principled understanding of when to uphold the rules of conceptual/logical/physical modeling and when to break them for expediency... that type of wisdom only comes with real-world experience... we can write about it... but only those who have seen it will understand it... and unfortunately, after 20 years of being sold promised technological solutions, business owners will at some point also have to accept "Good Enough"... and frankly, I don't think they care to define "Good Enough"... unless it is increasing revenues or reducing OpEx/CapEx... so whatever we do as an industry, we had better be ready to show the value...
🕸️
maybe we should use a Kung Fu analogy? lol
Good enough is better than perfect, because data is never perfect! It's always messy, there's always better ways to do things if you had another 10 hours, or another 10 people working on it. Does it meet business needs? Is it comprehensible to the folks who need to use it? Usability is more important than modeling completeness. See slides 15,16,17 here: https://zenodo.org/records/10759740 :)
Thanks Robert!
By "good model," I mean no shortcuts. If you're doing a Kimball-style dimensional model with a star schema, use surrogate keys so you're ready for slowly changing dimensions (SCDs) or a change in business systems.
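Roughly what I mean, as a hypothetical sketch (SQLAlchemy only for compactness; names invented, not prescriptive):

```python
# Hypothetical Kimball-style dimension sketch (invented names): a surrogate
# key plus SCD Type 2 bookkeeping, so history survives attribute changes or
# a swap of the source business system.
from sqlalchemy import Boolean, Column, Date, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class DimCustomer(Base):
    __tablename__ = "dim_customer"
    customer_sk = Column(Integer, primary_key=True, autoincrement=True)  # surrogate key
    customer_nk = Column(String, nullable=False)  # natural key from the source system
    name = Column(String)
    segment = Column(String)
    # SCD Type 2: one current row per natural key; superseded rows are kept.
    effective_date = Column(Date, nullable=False)
    end_date = Column(Date)  # NULL while the row is current
    is_current = Column(Boolean, nullable=False, default=True)

# Fact tables reference customer_sk, so a rekeying in the business system
# (or a tracked attribute change) never forces a rewrite of the facts.
```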
That said, I 100% agree that if one adopts an approach (like Kimball, Data Vault, etc.), the data person/team and stakeholders need to agree on the baseline of things that will be in place (surrogate keys), and on the tolerance for shortcuts (and the ensuing technical/data/organizational debt). For example, I've seen many Kimball-lite implementations that skipped things like surrogate keys. It was "good enough," but it also required the team to refactor later on.
I mostly agree, but what if you don't have a Kimball model in place, and executives are hammering you for answers TODAY?