Thanks all. I'm not sure if this will be the final version of this section. The challenges for me are:
1) This topic has been written about countless times. I feel the reader needs a cursory view of the traditional treatment of levels, plus references to arguably better resources if they want to pursue the topic in more depth. Entire books have been written on it. I also feel these levels were useful in an era that may no longer exist. The degree to which they are used varies a lot, and in some cases they are not used at all today.
2) The levels chapter will likely end up being 50-60 pages when it's done. That means keeping only what's practical and necessary to equip the reader to better model data.
Great comments here and lots to think about.
Interesting starting point, and it nicely follows the CDM-LDM-PDM approach as it has been preached by many (especially on the IT side of the spectrum). What I am missing is the true business approach. I think Dana would tell you that a customer orders products and needs to pay for that order. In her mind, that would sound like two different relationships to start with: Customer orders Products, and Customer pays for Order. In the same discussion with Dana, you can explain that there needs to be a separate Order-Product relationship to avoid repeating Customer-Order for every product (Order is the event/trigger here).
The other relationship, IMO, would be Customer-Order-Payment. These are all at the same grain and do justice to what the business is about: a Customer pays for the Order (Payment is the event/trigger).
If you start by separating the relationships, you may lose the business's understanding of the model once you go to the LDM and PDM.
But these are just my 2 cents; I will always try to stay true to modeling with the business for as long as I can.
Reminds me of how dimensional design would “push” the Customer “dimension” down to both the Payments and OrderLines “fact” tables. It would repeat the CustomerID, but we would use a composite ID (Customer, Date) in addition to the artificial OrderId. That makes the Customer-to-Payments and Customer-to-Products-Ordered relationships explicit. It’s why design is hard: there's more than one choice.
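A minimal sketch of that dimensional alternative, using Python's built-in sqlite3 module. Everything in it is hypothetical and for illustration only (the table names dim_customer, dim_date, fact_order_line, and fact_payment, the integer date_key, and amounts stored in cents); it is not the chapter's schema. What it shows is that customer_id is repeated on both fact tables, so ordered products and payments can each be rolled up to the customer without routing through the order.

```python
import sqlite3
from typing import Dict

# Hypothetical star schema: customer_id and date_key are repeated on both
# fact tables, next to the artificial order_id.
STAR_SCHEMA = """
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT NOT NULL);
CREATE TABLE fact_order_line (
    customer_id       INTEGER NOT NULL REFERENCES dim_customer(customer_id),
    date_key          INTEGER NOT NULL REFERENCES dim_date(date_key),
    order_id          INTEGER NOT NULL,
    product_id        INTEGER NOT NULL,
    quantity          INTEGER NOT NULL,
    line_amount_cents INTEGER NOT NULL
);
CREATE TABLE fact_payment (
    customer_id  INTEGER NOT NULL REFERENCES dim_customer(customer_id),
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    order_id     INTEGER NOT NULL,
    amount_cents INTEGER NOT NULL
);
"""

def build_star() -> sqlite3.Connection:
    """Create the hypothetical star schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    return conn

def balance_by_customer(conn: sqlite3.Connection) -> Dict[str, int]:
    """Amount ordered minus amount paid per customer, read straight off the two facts."""
    rows = conn.execute(
        """
        SELECT c.name,
               (SELECT COALESCE(SUM(f.line_amount_cents), 0)
                  FROM fact_order_line AS f WHERE f.customer_id = c.customer_id)
             - (SELECT COALESCE(SUM(p.amount_cents), 0)
                  FROM fact_payment AS p WHERE p.customer_id = c.customer_id)
        FROM dim_customer AS c
        """
    ).fetchall()
    return dict(rows)
```

Neither query touches an order table: the customer is reachable directly from each fact. Whether to repeat CustomerID this way, or derive it through the order, is the "more than one choice" trade-off described above.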
Ask 5 data modelers to create a model and get 7 different models back.....
“What I am missing is the true business approach.”
I’m confused here… What point are you trying to make?
In the end, the model ends up being the same thing IMO.
What I mean is that when I read your story, the modeler is asking Dana (the business) the questions, but as I read it, the modeler is already pushing toward an outcome.
I would think that if you really come from the business, you will indeed have four entities, with a relationship between Customer and Order plus one between Order and Product (after the question about multiple products in the order), plus a relationship between Customer, Order, and Payment, because these cannot be separated: the Payment is made by the Customer for a certain Order. In your conceptual model, the Payment is made for the Order, but who makes the Payment, and is that the same Customer who placed the Order?
So it's a slightly different outcome, but IMO a significant difference between a business approach and a truly business approach. I agree that in the PDM the model would be more or less the same, although in relational form it will have an added Customer-Payment relationship as well.
yeah, you've got a point about customer-payment. Will add that in.
As far as your comments about the modeler "pushing toward an outcome", that's an... interesting... interpretation. I respectfully disagree.
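A minimal relational sketch of the four-entity reading discussed above, again using Python's built-in sqlite3 module. The table and column names (customer_order, order_product, payment.customer_id, amounts in cents) are hypothetical. Payment carries two foreign keys, one to the Order it settles and one to the Customer who pays; the second is the added Customer-Payment relationship. Whether the payer must be the customer who placed the order is a question for Dana, so this sketch allows either and includes a query that surfaces payments made by someone else.

```python
import sqlite3
from typing import List, Tuple

ORDER_SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
-- Customer places Order
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
-- Order contains Product: one row per product on the order, so the
-- Customer-Order fact is not repeated for every product
CREATE TABLE order_product (
    order_id   INTEGER NOT NULL REFERENCES customer_order(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
-- Customer pays for Order: Payment relates to both
CREATE TABLE payment (
    payment_id   INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
    order_id     INTEGER NOT NULL REFERENCES customer_order(order_id),
    amount_cents INTEGER NOT NULL CHECK (amount_cents > 0)
);
"""

def connect_order_model() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(ORDER_SCHEMA)
    return conn

def payments_not_by_orderer(conn: sqlite3.Connection) -> List[Tuple[int]]:
    """Payment ids where the payer differs from the customer who placed the order."""
    return conn.execute(
        """
        SELECT p.payment_id
        FROM payment AS p
        JOIN customer_order AS o ON o.order_id = p.order_id
        WHERE p.customer_id <> o.customer_id
        ORDER BY p.payment_id
        """
    ).fetchall()
```

If the business says the payer is always the ordering customer, one option is a composite foreign key on payment (order_id, customer_id) referencing a UNIQUE (order_id, customer_id) constraint on customer_order, so the database rejects mismatched payers instead of merely reporting them.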
Love the overall build-up, but the CDM conversation reads like a frictionless-surface physics assignment to me 😂 When I talk to stakeholders, they aren’t talking about many-to-one relationships! In my experience, the less technical conversations are usually framed in terms of how they interact with the system or data: “I do this …” and “I use it to do that …”. It’s up to me to start putting conceptual structure around the entities they’re thinking about, getting them to frame the semantics so I can draw them up in a bubble chart to model from.
cool, got a note to revisit this example when I go through my edits. Thanks.
Good stuff Joe.
Enjoyed the persona example; it kind of reminded me of "The Phoenix Project" book! As I type this, I'm thinking that could be a good spinoff: The Phoenix Project: The Untold Data Story... :)
There is so much to balance in this chapter: what to include, what to exclude, when to stop. I think you covered it well. You could have done an entire chapter on the first question to Dana, "What is a customer?" LOL.
There is a skill and an art to teasing out data requirements. Traditionally, when business analysts held requirements sessions, having data modelers participate worked out well in my experience. It was also aided by UI wireframes, reporting needs, and business process discussions.
What matters here is that we are creating technical and business metadata that can be consumed by development tools, data catalogs, lineage ingestion, and data literacy efforts. This is the perfect time to capture definitions, acronyms, abbreviations, and ownership.
Putting on my data governance hat: Mike needs to document the standards (naming, class words, indexes, created-at columns, sorting, etc.) and guidelines. A standards doc is scalable; Mike isn't. I strongly suggest adding table and column comments to the database via DDL, so the metadata lives with the database and is consumable by tools and AI. As a developer, it's a big bonus when I run a show table and can see the descriptions! (Surprise and delight your consumers.) If you are creating a column, you must know its definition! It's 15 seconds of your time, and it's the gift that keeps on giving.
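A small sketch of keeping definitions with the database: a Python helper that turns a table's definitions into COMMENT ON statements for a migration script. It assumes PostgreSQL-style syntax (COMMENT ON TABLE ... IS '...' and COMMENT ON COLUMN table.column IS '...'); other engines differ (MySQL, for example, attaches a COMMENT clause to the table or column definition), so treat the output format as an assumption. The helper name comment_ddl and the sample definitions are hypothetical. It refuses blank definitions, in the spirit of "if you are creating a column, you must know its definition."

```python
from typing import Dict, List

def quote_literal(text: str) -> str:
    """Single-quote a string literal, doubling embedded quotes (standard SQL escaping)."""
    return "'" + text.replace("'", "''") + "'"

def comment_ddl(table: str, table_definition: str,
                column_definitions: Dict[str, str]) -> List[str]:
    """Emit PostgreSQL-style COMMENT ON statements for a table and its columns.

    Assumes `table` and the column names are plain, already-valid identifiers
    (they are not quoted here). Blank definitions are rejected.
    """
    if not table_definition.strip():
        raise ValueError(f"table {table} has no definition")
    statements = [f"COMMENT ON TABLE {table} IS {quote_literal(table_definition)};"]
    for column, definition in column_definitions.items():
        if not definition.strip():
            raise ValueError(f"column {table}.{column} has no definition")
        statements.append(
            f"COMMENT ON COLUMN {table}.{column} IS {quote_literal(definition)};"
        )
    return statements
```

For example, comment_ddl("customer", "A person or organization that places orders.", {"email_address": "Customer's primary email; the business key."}) yields one COMMENT ON TABLE and one COMMENT ON COLUMN statement, with the apostrophe in "Customer's" escaped as ''.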
As mentioned above, I don't envy you balancing content with practicality. A couple of thoughts: understanding the alternate key / business key is a good idea. Know it and enforce it so you don't create duplicates, rather than relying on a meaningless ID alone; for a customer, that could be email or phone. Also worth noting are reference table options (FK) or constraints for columns like Status or Category (or, with my ADD, as a stickler for class words: Status_Code or Status_Desc, Category_Cd or Category_Desc).
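A minimal sketch of both suggestions in SQLite via Python's sqlite3 module: a UNIQUE constraint on the business key (email here, compared case-insensitively with COLLATE NOCASE, which in SQLite folds only ASCII letters) alongside the meaningless customer_id, and a foreign key from status_code to a small reference table that uses the _code/_desc class-word convention. The table names, the sample codes ACT and INA, and the add_customer helper are hypothetical. SQLite only enforces foreign keys when PRAGMA foreign_keys is turned on for the connection.

```python
import sqlite3

KEYS_SCHEMA = """
-- Reference table for the Status column, using _code/_desc class words
CREATE TABLE customer_status (
    status_code TEXT PRIMARY KEY,
    status_desc TEXT NOT NULL
);
INSERT INTO customer_status VALUES ('ACT', 'Active'), ('INA', 'Inactive');

CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,                 -- meaningless surrogate ID
    email_address TEXT COLLATE NOCASE NOT NULL UNIQUE, -- alternate / business key
    status_code   TEXT NOT NULL REFERENCES customer_status(status_code)
);
"""

def connect_with_keys() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(KEYS_SCHEMA)
    return conn

def add_customer(conn: sqlite3.Connection, customer_id: int,
                 email: str, status_code: str) -> bool:
    """Insert a customer; return False if a key or reference constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (customer_id, email_address, status_code) "
                "VALUES (?, ?, ?)",
                (customer_id, email, status_code),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With only the surrogate key, "dana@example.com" and "Dana@Example.com" would happily become two customers; the UNIQUE business key is what catches the duplicate, and the reference table keeps Status to a known list of codes.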
Thanks Joe - Good Read.