I’m wondering if this deserves a different term than “data model”? The risk of using “data model” is that it overloads an already well-worn term. Thoughts?
Maybe qualify the “type” of data model being done based on usage of the Zachman Framework and/or the “Who”, “What”, “When”, “Why” and then “How” we will document the CDM, LDM, and/or PDM data model(s).
Absolutely necessary. Stay tuned....
My thought, let's term it as "Data Synthesis"
I can't help but agree. A model is a representation of something. What are we representing with a "data model"? A business process, concept, behaviour, etc. Not the data itself.
Now you could say that a data model also represents how you implement the data types, relationships, etc. in the database. But to me, in that context a data model is no longer a representation (model), because it's outlining exactly what's implemented.
I think with data modeling, we structure the data to model something else. Data structures is already a meaningful term though.
So I don't know what to say, I think you're on the right track and you'll figure it out, good luck Joe! Looking forward to the book.
Getting more and more excited about the book 🫶 I really like the agnostic nature of the definition between operational and analytical use cases. I think this has been missing.
On "a structured representation that organizes ... data" I also see "a representation of structure" and how it connects to the idea in systems theory and systems thinking that the structure of a system largely determines its behavior (e.g., Conway's law).
For me, this makes the MMA idea especially powerful. There are different archetypes for structuring data and data systems with inherent tradeoffs. Instead of fighting about which modeling approach is best, I find it extremely useful to understand which archetypes exist, their tradeoffs and how they affect the behaviors of the system so that we can choose the one best suited for each particular use case.
Ahhh, I like your representation of structure idea...thanks. Will ponder this
My 2 cents: when people talk about data modeling, they immediately think "gold layer". For me, data modeling begins with the organization of directories/topics in the raw layer. And I see too little literature on this subject. I would love to read what you think about that.
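To make the raw-layer point concrete, here is a minimal sketch of what a directory and topic naming convention could look like when treated as a modeling decision. Everything in it is an assumption for illustration: the `raw/<source>/<entity>/ingest_date=...` layout, the `<domain>.<entity>.v<version>` topic pattern, and the function names are hypothetical, not something proposed in the post.

```python
from datetime import date
from pathlib import PurePosixPath


def raw_path(source: str, entity: str, ingest_date: date) -> PurePosixPath:
    """Build a raw-layer directory for one source system, entity and load date.

    Hypothetical convention: raw/<source>/<entity>/ingest_date=YYYY-MM-DD
    """
    partition = f"ingest_date={ingest_date.isoformat()}"
    return PurePosixPath("raw") / source / entity / partition


def raw_topic(domain: str, entity: str, version: int) -> str:
    """Build a topic name. Hypothetical convention: <domain>.<entity>.v<version>"""
    return f"{domain}.{entity}.v{version}"


example_path = raw_path("crm", "customers", date(2024, 5, 1))
example_topic = raw_topic("sales", "orders", 1)
```

Even a convention this small already answers modeling questions: which system owns the data, which business entity it describes, and how it is partitioned over time.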
I love everything about this definition, Joe...well done!
It encompasses both operational and analytical, accommodates both humans and machines, and allows the flexibility to choose the appropriate technique for each situation. I really think this definition is elegant in both its simplicity and its nuance.
Thanks Dan! Much appreciated and see you soon
Thank you for the well-thought-out definition; it adds value to the data modeling approach by considering both humans and machines. I'm currently working on a model for a business domain, and here is my layered approach.
Conceptual layer - We apply DDD techniques to understand business processes and identify data products at the conceptual level per domain.
Logical layer - We break down each conceptual data product further by identifying key entities, data elements with definitions, and relationships between the entities within the data product boundary. Design patterns such as dimensional modeling are applied here.
Physical layer - This is the implementation of the logical layer and focuses on efficient use of storage, compute, security and extensibility. This layer is key to experience and should also support agility.
In my view, all three layers serve humans by helping them understand and use data, while the physical layer also supports machine consumption. I would like to get your thoughts on my approach.
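The three layers described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Concept`, `Attribute` and `Entity` classes, the logical type vocabulary, the rule that the first attribute is the primary key, and the `customer`/`invoice` example are all hypothetical, and SQLite stands in for whatever platform the physical layer actually targets.

```python
import sqlite3
from dataclasses import dataclass, field


# Conceptual layer: a business concept and its agreed definition.
@dataclass(frozen=True)
class Concept:
    name: str
    definition: str


# Logical layer: data elements with definitions, plus relationships.
@dataclass(frozen=True)
class Attribute:
    name: str
    logical_type: str  # hypothetical vocabulary: identifier, text, money
    definition: str


@dataclass
class Entity:
    concept: Concept
    attributes: list[Attribute]
    # attribute name -> name of the referenced concept
    references: dict[str, str] = field(default_factory=dict)


# Physical layer: map logical types onto one engine's storage types.
SQLITE_TYPES = {"identifier": "INTEGER", "text": "TEXT", "money": "NUMERIC"}


def to_ddl(entity: Entity) -> str:
    """Render an entity as SQLite DDL (assumes the first attribute is the key)."""
    table = entity.concept.name.lower()
    parts = []
    for position, attr in enumerate(entity.attributes):
        column = f"{attr.name} {SQLITE_TYPES[attr.logical_type]}"
        if position == 0:
            column += " PRIMARY KEY"
        parts.append(column)
    for attr_name, target in entity.references.items():
        ref = target.lower()
        parts.append(f"FOREIGN KEY ({attr_name}) REFERENCES {ref}({ref}_id)")
    return f"CREATE TABLE {table} ({', '.join(parts)})"


customer = Entity(
    Concept("Customer", "A party that has purchased at least once."),
    [
        Attribute("customer_id", "identifier", "Surrogate key for the customer."),
        Attribute("name", "text", "Legal name of the customer."),
    ],
)
invoice = Entity(
    Concept("Invoice", "A request for payment issued to a customer."),
    [
        Attribute("invoice_id", "identifier", "Surrogate key for the invoice."),
        Attribute("customer_id", "identifier", "The customer being billed."),
        Attribute("amount", "money", "Total amount due."),
    ],
    references={"customer_id": "Customer"},
)

conn = sqlite3.connect(":memory:")
for entity in (customer, invoice):
    conn.execute(to_ddl(entity))
```

The definitions live in the conceptual and logical objects for humans to read, while only `to_ddl` knows about the storage engine, which mirrors the idea that the physical layer is the part aimed at machine consumption.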
I need examples of how a data model can:
1) enable and guide human and machine behavior;
2) inform decision-making;
3) facilitate actions.
I asked ChatGPT 3.5, but the results do not make sense to me.
I moved the question here.
What's your timeline for the book, Joe?
Hoping to have it done this summer. Should’ve been done by now, but I’m also in the middle of creating my new data engineering course:
New data engineering course?
Yes, with deeplearning.ai
Nice! Look forward to that.
Also, appreciate that you have included the machines in your definition of data model. Thanks for that! :)
MMA 😂 I love it! I didn’t have machines in my definition, but it makes perfect sense.
Do you see training for data modeling as mostly analytics-focused even though, as you said, it’s not constrained to it?
Application data models are different, but they’re still models. And data flows between models as applications --> warehouses --> ML models --> AI systems, these days.
As data engineers (speaking for myself) I’m loving learning backend engineering both to support apps and MLOps. The domain and tactical decisions differ, but the modeling practice seems mostly the same, no?
The higher-level approach of conceptual modeling shouldn’t change, at least with respect to the various stages where data is used (apps, analytics; ML is something I’m still pondering). The logical and physical modeling approaches will change depending on the use case.
I’d consider ML its own domain given all the intricacies in feature engineering: the ML pipelines that feed data into feature sets, the versioning of those feature sets, and the storage and versioning of the ML models themselves. If anything is data modeling for machines, I’d argue this domain is particularly focused on it!
For sure. I DO think there’s space for the ML model to tie back to a conceptual model that’s used across non-ML data models. For example, if you’re predicting customer churn, what’s a customer? That definition needs to tie back to the vocabulary used in the business. The implementation of ML is certainly different than other approaches, as you say.
Hey Joe, thanks for starting this conversation. As someone who comes from a non-data background, has spent only ~5 years in the data space, and has become extremely passionate about simplifying anything that can be simplified (most things can), I love it when folks come together to discuss and define terminology.
Also, incidentally, I just finished chapter 4 of my book which happens to be titled "Deconstructing Data Modeling" and I would love to share my definition here as well:
👉 "An analytical data model represents a set of rules that, once applied to one or more tables in a database, creates a modified view of an existing table or a new table altogether."
I mention "analytical" because I'm also covering "application data modeling" which precedes "analytical data modeling."
I thought it made more sense to strip the purpose from the definition, which is as follows:
👉 The purpose of an analytical data model is to transform data without modifying the source data. When a model is executed, the resulting table is easier for humans to read and analyze, as well as for machines to interpret and process.
Does this make sense to you? Do keep in mind that I'm writing for semi-technical folks, many of whom are totally new to the data space. Hence, I want to keep things as simple as I can.
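For readers new to the space, the definition and purpose quoted above can be illustrated with a small sketch. It assumes an in-memory SQLite database, a made-up `raw_orders` table, and a made-up `customer_revenue` view; none of these names come from the book being discussed. The "rules" here are a filter, a unit conversion and an aggregation, expressed as a view so the source table is never modified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source table (hypothetical): one row per order, amounts stored in cents.
conn.execute(
    "CREATE TABLE raw_orders "
    "(order_id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        (1, "acme", 1250, "paid"),
        (2, "acme", 800, "refunded"),
        (3, "globex", 4000, "paid"),
    ],
)

# The analytical model: a set of rules applied to the source table.
# It creates a new view and leaves raw_orders untouched.
conn.execute(
    """
    CREATE VIEW customer_revenue AS
    SELECT customer, SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    WHERE status = 'paid'
    GROUP BY customer
    """
)

rows = conn.execute(
    "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
).fetchall()
source_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

Querying `customer_revenue` gives one easy-to-read row per customer, while `raw_orders` still holds all three original rows, including the refunded one.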