"Bumped into" your writings recently, and was particularly intrigued by your reference in "Making Sense of 1NF, pt. II" to the archival piece by William Kent ("A Simple Guide to Five Normal Forms ..."). Alas, the discussion of 5NF again disappointed me. As it happened, I was finalising a new addition to my writings about this, which I've now posted ("5NF: The Missing use Case") at DataDemythed.com. In case you'd care to read it for a different perspective.
thank you! I've skimmed it a bit, and I'm going to read it more thoroughly and respond.
One question: did you read Hugh Darwen "An Introduction To Relational Database Theory"?
In the "Note to Teachers" he says: "In Chapter 7, on projection-join normalization, omitting details of normal forms that were defined in the early days but no longer seem useful, leaving just ONE, 5NE, and BCNE. 2NF and 3NF are subsumed by the simpler BCNE, 4NF by the simpler 5NF. 1NF, not being a projection-join normal form, is dealt with (sort of) in Chapter 8. Domain-key normal form (DKNF) serves little purpose in practice and is not mentioned at all."
This sounds very promising.
Correspondingly, the book has entire "Fifth Normal Form" section.
Didn't see Hugh's contribution. And I thought DKNF reflects a misunderstanding of the other normal forms. Particularly 5. :-) Be interested to see what you have to say about my thoughts in the context of Hugh's. Not sure that I agree with the bits about the other forms, either. Perhaps why I think that will become clearer when I finally get to writing the next 3 posts.
Oh, right, you do discuss the subject/course/teacher case.
> "The “textbook treatment"s, relying simply on data model diagrams, do not communicate this information in the depth required. They don’t define the components to allow us to understand the author’s assertion of 4NF and 5NF in each case. This is particularly a problem because readers are unlikely to assume the same definitions as the author when trying to understand the 4NF and 5NF assertions."
I completely agree with you. If you just show nouns (entities), connected with *unlabeled* crow's foot lines, you can't understand the data, and you can't talk about normal forms (because you don't know about dependencies).
Just as an Entity name can't be relied on without referring to it's more detailed definition, neither can a label on a relationship line. Diagrams can *start* our comprehension of the data and the model we have built up around that. But there's only so much we can put into a diagram. Particularly when it gets past the trivial (i.e., unlike my examples), which all practical models do. Ultimately we always have to refer to more detailed definitional information. Generally that's as narrative text, which is always fraught with nuance and vagaries. Few us are trained to read and write it with the precision of mathematical convention.
I thought the article fairly typical for its time. You can read that as not doing 4NF the "justice" I think it deserves.
One thing that caused a serious mental block I had to struggle to read past was on the second page, second column:
> "... does not contain any nontrivial multivalued dependencies."
Normalisation is based on mathematical foundations (Set Theory). I don't know how "nontrivial" can be mathematically defined. To be 4NF, there are no multivalued dependencies based on the data definitions. Simple as that.
I've read your text and I'm still confused by the entire discourse. It seems that I misunderstand some driving force behind this. I think that maybe I even understand why we may want to discuss "algebra" behind the relationship between key and non-key attributes.
But the pragmatics eludes me.
I'm going to ask a few questions if you don't mind.
"Something like this would simply be presented as the only possible transformation to take it to 5NF"
WHY? Why do we want to do this transformation? Why do we even consider that this transformation is allowed?
In my understanding, ERD is used to capture the actual business logic. Then we encode this business logic as physical tables.
It looks to me that people go the other way around: take physical tables, interpret them in terms of normal forms and then somehow "transform" them.
Particularly: and this is my main objection. In your text, the following new piece of business logic is introduced:
Skill{ Skill#, Instrument#, Musician#, … }
Why was it introduced ***without changing the table schema***? This does not make sense to me.
Stepping back from the tables, what does "skill" capture? For each musician, I want to record which instruments they play. I do that because I want to e.g. find people for my next performance: people who are on the market to play a certain instrument.
This is a new business requirement, not related at all to the original "took part in a performance" business case.
That's my main objection. I can take part in the concert, but play the instrument that I don't normally play. E.g. the percussionist was sick, but I only needed to play the triangle a few times, so I agreed to help. I get credited in the performance, I get my money for that. That does not mean that I have "a skill to play the triangle". Particularly, that does not mean that people looking for triangle-players should see me in their queries.
I think that I can provide a similar reasoning for other examples of 5NF, such as brand/model salesman from wikipedia, or subject/lecturer/semester as on some other pages that you get by randomly googling.
- Perhaps in part it was all relatively new, and people had yet to think it through in sufficient depth.
- Perhaps some of it was mimicry: a notable textbook at the time had presented it that way, and others simply copied/adapted it.
- Perhaps there was some laziness: "if I have a 4NF 3-part key of `{ a, b, c }` then the obvious transform to normalize it would be `{ a, b }, { a, c }, {b, c }` but that doesn't guarantee I can get back to the original so the original must be 5NF as well." This misses the alternatives, of which I tried to explore 3 possibilities (1 in the first example, 2 more in the second).
Things may have changed in the exploration of the topic since, but I've given up looking for it.
And it's something that has irked me for many years, so I'm finally getting it off my proverbial chest ...
"... ERD is used to capture the business logic ...": assuming by "business logic" you mean how a business transacts, I have to disagree. ERD captures the business **state** at any point in time ... actually, the succession of business states (i.e., history) over time when done properly. Data at rest has no "dynamics", it "just is." Code represents the business logic, controlling the transition of persisted data (i.e., updating the database) from "current" state to "next" state as business transactions occur.
Normalization comes from identifying the data needed to properly keep records for the business, and then analysing the relationships between the various pieces of data to establish data groupings and the relationships between them. It is only after this has been done that we map it to tables. Except after a system is up and running and we have to retrofit new requirements, which might be a problem depending on how well we normalised in the first place ...
Why "Skill"? Having identified the Instrument-Musician relationship as part of my imagined discussions to further develop the data definitions, I also imagined a "value judgement" in that relationship: a Musician will not be tasked to play a violin if they have no skill/training/experience/ability to do that. But I omitted that from my definitional documentation. And you, as reader, didn't think it through that way. Thank you for reinforcing one of my points (https://datademythed.com/posts/5nf_missing_use_case/#ReaderVAuthor)! Of course, I had it in mind to segue the 2NF discussion.
Not sure what you mean by changing, or not, the "table schema." I certainly changed the model schema. Tables (i.e., implementation details) aren't part of my thinking at this point.
Meant to note that in reading the original formulation, which came with *no* definitions, I also couldn't conceive how either the 4NF or 5NF could be defined for the example to make sense. I had the same problem there as you had here in not being able, as the reader, to "fill the gaps" the way the writer (presumably) intended.
"Bumped into" your writings recently, and was particularly intrigued by your reference in "Making Sense of 1NF, pt. II" to the archival piece by William Kent ("A Simple Guide to Five Normal Forms ..."). Alas, the discussion of 5NF again disappointed me. As it happened, I was finalising a new addition to my writings about this, which I've now posted ("5NF: The Missing use Case") at DataDemythed.com. In case you'd care to read it for a different perspective.
hey,
thank you! I've skimmed it a bit, and I'm going to read it more thoroughly and respond.
One question: did you read Hugh Darwen "An Introduction To Relational Database Theory"?
In the "Note to Teachers" he says: "In Chapter 7, on projection-join normalization, omitting details of normal forms that were defined in the early days but no longer seem useful, leaving just ONE, 5NE, and BCNE. 2NF and 3NF are subsumed by the simpler BCNE, 4NF by the simpler 5NF. 1NF, not being a projection-join normal form, is dealt with (sort of) in Chapter 8. Domain-key normal form (DKNF) serves little purpose in practice and is not mentioned at all."
This sounds very promising.
Correspondingly, the book has entire "Fifth Normal Form" section.
Didn't see Hugh's contribution. And I thought DKNF reflects a misunderstanding of the other normal forms. Particularly 5. :-) Be interested to see what you have to say about my thoughts in the context of Hugh's. Not sure that I agree with the bits about the other forms, either. Perhaps why I think that will become clearer when I finally get to writing the next 3 posts.
Thank you for your response.
Oh, right, you do discuss the subject/course/teacher case.
> "The “textbook treatment"s, relying simply on data model diagrams, do not communicate this information in the depth required. They don’t define the components to allow us to understand the author’s assertion of 4NF and 5NF in each case. This is particularly a problem because readers are unlikely to assume the same definitions as the author when trying to understand the 4NF and 5NF assertions."
I completely agree with you. If you just show nouns (entities), connected with *unlabeled* crow's foot lines, you can't understand the data, and you can't talk about normal forms (because you don't know about dependencies).
Just as an Entity name can't be relied on without referring to it's more detailed definition, neither can a label on a relationship line. Diagrams can *start* our comprehension of the data and the model we have built up around that. But there's only so much we can put into a diagram. Particularly when it gets past the trivial (i.e., unlike my examples), which all practical models do. Ultimately we always have to refer to more detailed definitional information. Generally that's as narrative text, which is always fraught with nuance and vagaries. Few us are trained to read and write it with the precision of mathematical convention.
Btw there is another enigmatic article from 1992: https://dl.acm.org/doi/pdf/10.1145/135250.134515 Margaret S. Wu "The Practical Need for Fourth Normal Form".
I thought the article fairly typical for its time. You can read that as not doing 4NF the "justice" I think it deserves.
One thing that caused a serious mental block I had to struggle to read past was on the second page, second column:
> "... does not contain any nontrivial multivalued dependencies."
Normalisation is based on mathematical foundations (Set Theory). I don't know how "nontrivial" can be mathematically defined. To be 4NF, there are no multivalued dependencies based on the data definitions. Simple as that.
I've read your text and I'm still confused by the entire discourse. It seems that I misunderstand some driving force behind this. I think that maybe I even understand why we may want to discuss "algebra" behind the relationship between key and non-key attributes.
But the pragmatics eludes me.
I'm going to ask a few questions if you don't mind.
"Something like this would simply be presented as the only possible transformation to take it to 5NF"
WHY? Why do we want to do this transformation? Why do we even consider that this transformation is allowed?
In my understanding, ERD is used to capture the actual business logic. Then we encode this business logic as physical tables.
It looks to me that people go the other way around: take physical tables, interpret them in terms of normal forms and then somehow "transform" them.
Particularly: and this is my main objection. In your text, the following new piece of business logic is introduced:
Skill{ Skill#, Instrument#, Musician#, … }
Why was it introduced ***without changing the table schema***? This does not make sense to me.
Stepping back from the tables, what does "skill" capture? For each musician, I want to record which instruments they play. I do that because I want to e.g. find people for my next performance: people who are on the market to play a certain instrument.
This is a new business requirement, not related at all to the original "took part in a performance" business case.
That's my main objection. I can take part in the concert, but play the instrument that I don't normally play. E.g. the percussionist was sick, but I only needed to play the triangle a few times, so I agreed to help. I get credited in the performance, I get my money for that. That does not mean that I have "a skill to play the triangle". Particularly, that does not mean that people looking for triangle-players should see me in their queries.
I think that I can provide a similar reasoning for other examples of 5NF, such as brand/model salesman from wikipedia, or subject/lecturer/semester as on some other pages that you get by randomly googling.
Thanks,
I'm happy for any questions you care to ask. Unfortunately, other matters have kept me from responding sooner. My apologies for that.
Why was the transformation from the first (https://datademythed.com/posts/5nf_missing_use_case/#d1) to the second diagrams (https://datademythed.com/posts/5nf_missing_use_case/#d2) in my article the only thing considered in the early days of textbook treatments? I can only guess.
- Perhaps in part it was all relatively new, and people had yet to think it through in sufficient depth.
- Perhaps some of it was mimicry: a notable textbook at the time had presented it that way, and others simply copied/adapted it.
- Perhaps there was some laziness: "if I have a 4NF 3-part key of `{ a, b, c }` then the obvious transform to normalize it would be `{ a, b }, { a, c }, {b, c }` but that doesn't guarantee I can get back to the original so the original must be 5NF as well." This misses the alternatives, of which I tried to explore 3 possibilities (1 in the first example, 2 more in the second).
Things may have changed in the exploration of the topic since, but I've given up looking for it.
And it's something that has irked me for many years, so I'm finally getting it off my proverbial chest ...
"... ERD is used to capture the business logic ...": assuming by "business logic" you mean how a business transacts, I have to disagree. ERD captures the business **state** at any point in time ... actually, the succession of business states (i.e., history) over time when done properly. Data at rest has no "dynamics", it "just is." Code represents the business logic, controlling the transition of persisted data (i.e., updating the database) from "current" state to "next" state as business transactions occur.
Normalization comes from identifying the data needed to properly keep records for the business, and then analysing the relationships between the various pieces of data to establish data groupings and the relationships between them. It is only after this has been done that we map it to tables. Except after a system is up and running and we have to retrofit new requirements, which might be a problem depending on how well we normalised in the first place ...
Why "Skill"? Having identified the Instrument-Musician relationship as part of my imagined discussions to further develop the data definitions, I also imagined a "value judgement" in that relationship: a Musician will not be tasked to play a violin if they have no skill/training/experience/ability to do that. But I omitted that from my definitional documentation. And you, as reader, didn't think it through that way. Thank you for reinforcing one of my points (https://datademythed.com/posts/5nf_missing_use_case/#ReaderVAuthor)! Of course, I had it in mind to segue the 2NF discussion.
Not sure what you mean by changing, or not, the "table schema." I certainly changed the model schema. Tables (i.e., implementation details) aren't part of my thinking at this point.
Meant to note that in reading the original formulation, which came with *no* definitions, I also couldn't conceive how either the 4NF or 5NF could be defined for the example to make sense. I had the same problem there as you had here in not being able, as the reader, to "fill the gaps" the way the writer (presumably) intended.
Hello, how are you doing?
I have a new text on that topic: https://minimalmodeling.substack.com/p/historically-4nf-explanations-are I'd be happy to hear your perspective on this.
Also, I found a great explanation of how hierarchical database context may be crucial for understanding this way of presenting: https://www.cargocultcode.com/normalization-is-not-a-process/