From the 3NF example: "Category depends on the ProductID, and can be split into its own table."
I absolutely get that this needs to happen, but I don't understand how this is described by the 3NF rule ("no non-key attribute is transitively dependent on the primary key") - is Category transitively dependent on the ProductID? It seems like Category is functionally dependent on the Product ID, no? Is there a 3NF rule that helps to describe this more clearly?
I have a few questions on the 3NF and BCNF examples.
1. Is Supplier_Capabilities part of the 3NF decomposition of Suppliers? There are attributes here that were not in the original.
2. Similarly, in Supplier_Shipping_Capabilities there's a new attribute, ShippingCapabilityID that is new.
3. I'm wondering if having a range (2-3) for EstimatedDays does not contradict the 1NF indivisibility.
Do you intend to include the decomposition algorithms? Or, perhaps, elaborate more on how the decomposition was obtained?
Here's a tip for checking the decomposition: performing the union of the attributes of the decomposed relations should produce the set of attributes of the original relation.
I don't agree with transitive dependencies as a separate category - but knowing the theory I understand why one can argue that they are a separate subcategory. I think you should include the Armstrong's axioms, which transitivity is part of. These 3 axioms govern how we can infer other FDs, and they are a sound and complete set of rules. Based on these axioms we can compute the closure of a set of FDs.
I also want to caution on the language on the definition "Dependencies are *often* denoted as X -> Y, where X is a set of attributes that *might* determine the value of Y. ":
1. they are denoted (remove the "often" - that's the notation), and
It's true that transitive dependencies are also functional dependencies. I'm trying to balance what people need to know in the field against endless axiomatic examples (trust me, I've seen Armstrong and many others) that scare people from learning relational modeling.
In section https://practicaldatamodeling.substack.com/i/154215972/boyce-codd-normal-form-bcnf, I don't see why the Supplier Name and Supplier Phone tables are necessary for BCNF. I'd leave them with Supplier table. That sort of break out looks more like 5NF to me, with the binary type relations. A bridge too far, in my opinion. But I'm certainly not an expert. I just try to read (and re-read C. J. Date) :)
That is the challenge and I see as a core issue. The PDM is actually not what describes the data from a business perspective. NoSQL, Streaming are technical solutions to C-R-U-D the business data.
This will be a tough thing to change. Engineers either don’t acknowledge data modeling at all, or think they do it when they cram data into a stream. Definitely a huge motivator for this book.
Nice article, Joe. It might be worth mentioning that 6NF is also known as Graph Normal Form, as there are emerging graph-relational tech companies using this NF that readers might come across.
I think relational always seems to win out, and the RM + SQL will adapt in the future to absorb new ideas just as it has in the past (e.g. JSON and Property Graph Query in SQL 16 and 23). So I think SQL systems could evolve to incorporate graph features, instead of them being separate database systems as they are now, meaning this could be come much more prevalent. I have no idea just how common that might be though!
As per the explanation of transitive dependency the columns ‘ProductId’, ‘ProductName’, ‘Price’ are transitively dependent right in terms that if we know the Id and name we can determine the price?
Hi Joe. Great content on part 2. Remember that normalization is started within an LDM and not a PDM. So columns do not exist. Attributes do. As an example, I can design and document all the reference entities (“Code Tables) and show all the relationships in a 3NF view. At the physical level, I can create one “Code Table” table to capture and manage all the data within the many logical reference entities. This is not 3NF anymore. I believe this is what is continually skipped and jumping to the end result of a PDM / physical database implementation is a critical issue we are facing. “Transformation” of the LDM into a PDM is not always a one to one view. Components of LDM can be implemented in various normalization levels depending on various constraints and implementation decisions.
That said, like the relational model, CDM and LDM are dying arts as well. Most people start and end with PDM. Much of what I cover in this book acknowledges this reality, and hopefully educates people on the broader implications of only focusing on PDM. But many of the upcoming examples and chapters (NoSQL, Streaming) recognize that there are certain physical systems that exist for a reason. Sadly, part of this book is about meeting people where they are, which is mostly at the physical layer these days.
From the 3NF example: "Category depends on the ProductID, and can be split into its own table."
I absolutely get that this needs to happen, but I don't understand how this is described by the 3NF rule ("no non-key attribute is transitively dependent on the primary key") - is Category transitively dependent on the ProductID? It seems like Category is functionally dependent on the Product ID, no? Is there a 3NF rule that helps to describe this more clearly?
thanks, let me have a look. This chapter likely needs a lot of clarification in quite a few areas.
I have a few questions on the 3NF and BCNF examples.
1. Is Supplier_Capabilities part of the 3NF decomposition of Suppliers? There are attributes here that were not in the original.
2. Similarly, in Supplier_Shipping_Capabilities there's a new attribute, ShippingCapabilityID that is new.
3. I'm wondering if having a range (2-3) for EstimatedDays does not contradict the 1NF indivisibility.
Do you intend to include the decomposition algorithms? Or, perhaps, elaborate more on how the decomposition was obtained?
Here's a tip for checking the decomposition: performing the union of the attributes of the decomposed relations should produce the set of attributes of the original relation.
Good questions. Let me have a look this week. Thanks Ramona
I don't agree with transitive dependencies as a separate category - but knowing the theory I understand why one can argue that they are a separate subcategory. I think you should include the Armstrong's axioms, which transitivity is part of. These 3 axioms govern how we can infer other FDs, and they are a sound and complete set of rules. Based on these axioms we can compute the closure of a set of FDs.
I also want to caution on the language on the definition "Dependencies are *often* denoted as X -> Y, where X is a set of attributes that *might* determine the value of Y. ":
1. they are denoted (remove the "often" - that's the notation), and
2. it does determine (remove the "might").
More to come as I read.
It's true that transitive dependencies are also functional dependencies. I'm trying to balance what people need to know in the field against endless axiomatic examples (trust me, I've seen Armstrong and many others) that scare people from learning relational modeling.
This is the "if you're gonna fight combat or MMA, this is what you need to know", versus "if you want to get a phd in this topic."
Good point - providing the min set of info, I want to call it the candidate info as a reference to candidate key :)
No doubt. It makes you wonder, isn't it? When things are done by "ear" rather than following a systematic set of rules...
it's mostly by ear today. Let's make it better
Also, for https://practicaldatamodeling.substack.com/i/154215972/third-normal-form-nf, if the Suppliers table is in 3NF, it's not the best design perhaps? Not sure what the key is?, but should be on SupplierID for 3NF? Hope that helps, not trying to be too critical. :)
actually, the suppliers table should be split further. There's a dependency I overlooked. Thanks.
In section https://practicaldatamodeling.substack.com/i/154215972/boyce-codd-normal-form-bcnf, I don't see why the Supplier Name and Supplier Phone tables are necessary for BCNF. I'd leave them with Supplier table. That sort of break out looks more like 5NF to me, with the binary type relations. A bridge too far, in my opinion. But I'm certainly not an expert. I just try to read (and re-read C. J. Date) :)
yeah, agreed. They can be combined and be bcnf compliant. Thanks
and actually, shipping zones can also be further fixed, as it's not bcnf as is
"Normal Forms Are Hierarchical"
I wouldn't say it like this.
For me, a hierarchy includes some many-to-one relationships.
I would not say that any NF is a "parent" of another one in a hierarchy.
What I would say is that a NF is a predecessor of another one in the sequence / chain.
So: "Normal Forms are Sequential" (from "weaker" to "stronger") instead of "Hierarchical".
"Normal Forms are Sequential" (from "weaker" to "stronger") instead of "Hierarchical".
That's literally a hierarchy you just described...
Also, if you read the historical record of the relational model, the progression of 1NF to any other form is deliberately called a hierarchy.
That is the challenge and I see as a core issue. The PDM is actually not what describes the data from a business perspective. NoSQL, Streaming are technical solutions to C-R-U-D the business data.
This will be a tough thing to change. Engineers either don’t acknowledge data modeling at all, or think they do it when they cram data into a stream. Definitely a huge motivator for this book.
Nice article, Joe. It might be worth mentioning that 6NF is also known as Graph Normal Form, as there are emerging graph-relational tech companies using this NF that readers might come across.
How common do you think thjs will be in 5 years?
I think relational always seems to win out, and the RM + SQL will adapt in the future to absorb new ideas just as it has in the past (e.g. JSON and Property Graph Query in SQL 16 and 23). So I think SQL systems could evolve to incorporate graph features, instead of them being separate database systems as they are now, meaning this could be come much more prevalent. I have no idea just how common that might be though!
Good call
Hi Joe,
I think you linked the "SQL and Relational Theory" to one of your other articles rather than to the Date's book link. Great article. Thanks!
Thanks Marco. Fixed.
Hi Joe,
As per the explanation of transitive dependency the columns ‘ProductId’, ‘ProductName’, ‘Price’ are transitively dependent right in terms that if we know the Id and name we can determine the price?
Hi Joe. Great content on part 2. Remember that normalization is started within an LDM and not a PDM. So columns do not exist. Attributes do. As an example, I can design and document all the reference entities (“Code Tables) and show all the relationships in a 3NF view. At the physical level, I can create one “Code Table” table to capture and manage all the data within the many logical reference entities. This is not 3NF anymore. I believe this is what is continually skipped and jumping to the end result of a PDM / physical database implementation is a critical issue we are facing. “Transformation” of the LDM into a PDM is not always a one to one view. Components of LDM can be implemented in various normalization levels depending on various constraints and implementation decisions.
That said, like the relational model, CDM and LDM are dying arts as well. Most people start and end with PDM. Much of what I cover in this book acknowledges this reality, and hopefully educates people on the broader implications of only focusing on PDM. But many of the upcoming examples and chapters (NoSQL, Streaming) recognize that there are certain physical systems that exist for a reason. Sadly, part of this book is about meeting people where they are, which is mostly at the physical layer these days.
Thanks and great call-out. This will be covered in the CDM/LDM/PDM section, which should be up around next month.