Great article, Joe, and I really liked this statement:
"Data modeling primarily focuses on defining the structure, relationships, and meaning of data rather than the specifics of how data is physically stored in data systems."
I would change the order to: "Meaning, Relationships and Structure" to reflect the flow, but that's a nit.
Good, but subtle point!
For sure. And also what it can’t solve…
Great article, Joe, I love it so much!
It reminds me of 2 points about data modeling:
1) The "A specific approach or technique" section shows me a common mistake we can make: replacing my capacity to reflect the real world with an attempt to express the business as closely as possible within a specific paradigm. The paradigm helps a lot in getting to a solution, but the logic and philosophy behind it are more important for improving my capacity.
2) The "Perfect" & "one-time process" sections tell me I need to think beyond the structure of data to have the capacity to reflect the world, because it can change at any time. I think this is the reason I like the idea of a "syntactic" approach guided by metadata alignment to generate the structure of data. This idea is based on hyper-agility and focal data modeling.
Flipping around "data modeling is not perfect", I want to say "data modeling is a useful approximation". It's by definition an estimate and shouldn't be confused with reality.
Models in general are, right? It’s a projection of a slice of reality onto a 2D “flat surface” like a screen, with some set of symbols*
Asterisk on “symbols” because one of my favorite data modeling approaches (that’s pretty friendly to non-technical people) is to spell things out in paragraph form - for example:
Salespeople generate Leads which can turn into Consultations. Consultations can close, leading to a Customer Order, which has associated Line Items.
^ could also represent that as an ERD. Wonder if this is where LLMs come into play…
I'm thinking so too...
awesome, hopefully I captured our convo correctly...
Sure did!
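For anyone who wants to try the paragraph-to-ERD idea from the thread above, here is a minimal sketch, not anyone's actual tooling. It writes the Salespeople/Leads/Consultations sentences down as plain relationship triples and renders them as Mermaid `erDiagram` text. The `Relationship` class, the verb labels, and the one-to-many cardinality are all assumptions for illustration; the paragraph itself doesn't state cardinalities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str  # entity on the "one" side (assumed, not stated in the paragraph)
    verb: str    # label paraphrased from the sentence
    target: str  # entity on the "many" side (assumed)


# Relationships read off the paragraph, in the order they appear.
MODEL = [
    Relationship("Salesperson", "generates", "Lead"),
    Relationship("Lead", "turns_into", "Consultation"),
    Relationship("Consultation", "closes_into", "CustomerOrder"),
    Relationship("CustomerOrder", "has", "LineItem"),
]


def entities(model):
    """Return entity names in first-seen order."""
    seen = []
    for rel in model:
        for name in (rel.source, rel.target):
            if name not in seen:
                seen.append(name)
    return seen


def to_mermaid(model):
    """Render the triples as Mermaid erDiagram text, assuming one-to-many everywhere."""
    lines = ["erDiagram"]
    for rel in model:
        # "||--o{" is Mermaid notation for "exactly one" to "zero or more".
        lines.append(f"    {rel.source} ||--o{{ {rel.target} : {rel.verb}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mermaid(MODEL))
```

The point of keeping the triples as the source of truth is that the paragraph, the diagram, and any later physical schema are just different renderings of the same meaning.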
Excellent, excellent article! I think I will be quoting this one a lot in the future - with attribution of course ;)
Dope, enjoy!
It seems an underlying theme of this article is that data modeling is just a means to an end; it is a method to usefully model reality. The how is still important, but less important.
Very insightful article, Joe. I hope we will continue to see more on the topic of data modeling. It's true that your list of what data modeling is Not can grow further. On the other hand, however, the list of what data modeling is or could be may tell different truths and facts, which might still lead to confusion and differing views and opinions, as expected.
Similar to the word or concept 'architecture', data modeling is pervasive across many activities in the data space but always context-dependent. My 2c is to ask whether this topic also needs to be explored and discussed alongside other relevant topics, such as
# data modeling vs data architecture, and their relationships
# Ontological analysis of data modeling (DB-based, Big Data-oriented, DA-oriented, DE-related, integration-based, platform-based, ....) and its relations to different parts and aspects of data practice
# data model and architecture management (who has responsibility between DE or DG teams?) which is probably related to other topics (documentation, knowledge management, and organisational learning)
# .....
# Finally, what would a model-driven practice for (or an approach to) data modelling management in an organisation possibly be?
stay tuned...I was just writing about the relationship of architecture and data modeling...
Once again Joe...spot on! Loved your bringing the definition up a level from specific techniques or physical structures. As a first principle, it needs to reflect (and be reasonably understood by) the business. This is exactly what I am anticipating more of in the upcoming book! Thanks for sharing!
Thanks! These are early excerpts, so feedback is welcome, good or bad
Yep...definitely looking through with a critical eye, and on first impression I just find myself headbanging in violent agreement. Unfortunately, there's no hair left to see flailing. 4 kids will do that to ya. 😬
Nice article, I would love to hear more about evolvability in data models, seems a complex topic.
Great article. The “data model” being documented (CDM, LDM, PDM, other designs for specific needs) is driven by many things. Inmon / Kimball, as an example, are designed to be a “How” most of the time but can also be a “What”, adjusted for the physical data model for implementation based on the capabilities to be used, internal design and implementation guidelines per our team/Company, etc. Also, as you noted, there is much more that is documented, and meta content that can be extended in a data model: for example, documenting PII or other Company needs aligned with metadata orchestration and supporting “Data Governance” execution. All these data model designs and documentation need to be thought through and decided as part of the entire business-process-to-business-data ecosystem within your “Enterprise”. Do we approach documenting things from a generic perspective at times, or only in how the business speaks about them, a combination thereof, or one for business and one for IT, etc.? Lots to think about for the direction and standards/guidelines to be consistently adopted for the “Enterprise”. That is the exciting and challenging part!
Thanks Dave!
Looking forward to this community enumerating the use cases of data modelling: the problem statements that sufficient data modelling can solve.
Great insights in your post, Joe! I especially appreciated your emphasis on not shoehorning data into a particular system. Your exploration of the broader aspects of data modeling is refreshing. This reminds me a lot of the challenges we discussed in our article on extracting data from MongoDB to a SQL store:
https://dlthub.com/blog/mongo-etl
Solving the physical storage issue is hard enough, but as you keenly pointed out, the real challenge lies in accurately representing the data’s structure and meaning across different formats. In our case, we tackled this by creating a dlt source that efficiently converts MongoDB BSON to JSON, and then flattens it into SQL tables, thereby preserving the structure while making the data more usable. Looking forward to your future posts! 👏
Best,
Aman Gupta,
DLT Hub Team
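To make the "flatten nested documents into SQL tables" idea in the comment above concrete, here is a rough sketch in plain Python. It is not dlt's implementation or API; the `flatten` function, the `__` separator, and the `_row_id` / `_parent_row_id` column names are hypothetical, and it assumes the BSON document has already been converted to JSON-compatible dicts and lists. Nested objects become prefixed columns, and lists become child tables that point back at their parent row.

```python
def flatten(doc, table, parent_key=None, tables=None, sep="__"):
    """Flatten one nested document into rows grouped by table name.

    Nested dicts become prefixed columns on the same row; lists become
    child tables linked to the parent through `_parent_row_id`.
    """
    if tables is None:
        tables = {}
    rows = tables.setdefault(table, [])
    row_id = f"{table}:{len(rows)}"  # surrogate key, hypothetical naming
    row = {"_row_id": row_id}
    if parent_key is not None:
        row["_parent_row_id"] = parent_key
    rows.append(row)

    def walk(value, prefix):
        for key, item in value.items():
            column = f"{prefix}{key}"
            if isinstance(item, dict):
                # Nested object: keep it on this row, with a prefixed column name.
                walk(item, column + sep)
            elif isinstance(item, list):
                # List: one child-table row per element.
                child_table = f"{table}{sep}{column}"
                for element in item:
                    if not isinstance(element, dict):
                        element = {"value": element}
                    flatten(element, child_table, row_id, tables, sep)
            else:
                row[column] = item

    walk(doc, "")
    return tables


if __name__ == "__main__":
    order = {
        "_id": "o1",
        "customer": {"name": "Ada", "address": {"city": "Paris"}},
        "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}],
    }
    for name, rows in flatten(order, "orders").items():
        print(name, rows)
```

The structure survives the move to tables because every child row carries its parent's key, which ties back to the article's point that meaning and relationships matter more than the storage format.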
I came into data engineering via operations research. We go by "All models are wrong, but some are useful".
https://en.m.wikipedia.org/wiki/All_models_are_wrong
I really like your point about avoiding narrowing down modeling to just physical models. Once I held a meeting with 10+ experts on the definition of collateral types in banking. They all arrived with full confidence that there is one definition (theirs) and it will be a waste of time. We discovered together seven different valid variants (categorizations). I drew a lot on a whiteboard to achieve a common understanding. Was it modelling? :)