More on false links
In the previous post, Why Minimal Modeling has no 3-way links, we introduced the idea of false links.
Let’s take a look again at the graphical schema of our “gifts” example:
And the corresponding physical table structure:
CREATE TABLE gifts (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
from_user_id INTEGER NOT NULL,
to_user_id INTEGER NOT NULL,
item_id INTEGER NOT NULL
);
Q: How did we arrive from the links definition to this particular table structure? A: All three links have 1:N cardinality: a gift can be given to only one user, can be received by only one user, and can contain only one item. Usually such links are implemented as columns in the main anchor’s table (gifts
in our case).
Now, what if we’re looking at the physical table structure, and we don’t have a clear understanding of the underlying logical structure?
To define links in Minimal Modeling, we spell out the sentences that have a specific structure: anchor, verb, anchor. For our example, it would be:
User buys a Gift;
User receives a Gift;
Item is given as a Gift;
We introduced the idea of false links: sentences that make sense but do not correspond to actual links. For example, “User receives item”. This sentence makes a lot of sense! You can even write the SQL query corresponding to this sentence:
SELECT to_user_id, item_id
FROM gifts;
Modeling musical performances
Why bother? I keep digging into this topic because it seems that this approach would allow us to find practical examples of fourth and fifth normal forms (4NF/5NF), and maybe help prevent modeling mistakes.
Particularly, in the already mentioned text https://datademythed.com/posts/5nf_missing_use_case/ there is a mention of “Skill” which I think is somewhat unexplained. Let’s try to reinterpret the example from this post using Minimal Modeling.
We have the following anchors:
Concert;
Instrument;
Musician;
We need to define more accurately what “Concert” is, because:
We’ll also be using the word “performance” which is somewhat synonymous;
The word “concert” has another meaning, such as: “The Violin Concerto in D major, Op. 61”.
Our Concert anchor corresponds to musical events that happen at a certain place (e.g., “Muziekgebouw, Amsterdam”) at a certain time (e.g., “2023-07-01 20:00”). When the new concert is announced, we insert a row in the table “concerts
”.
Several musicians take part in the concert. (NB: this is an example of a sentence that is NOT a link.) Scratch that. We need to introduce another anchor, let’s call it Performance, following the original post.
Now we can define the links (they may sound awkward):
Performance is part of the Concert;
Performance is played using Instrument;
Performance is performed by the Musician.
To double-check, let’s write down the other direction of the same links:
Concert consists of many Performances;
Instrument is played during Performance;
Musician performs the Performance;
Here is the graphical schema:
It looks pretty much like the “gifts” example. As we mentioned, some of the natural sentences are actually false links:
Musician gives a concert is not a link;
Musician plays instrument is also not a link;
(It would be interesting to find the third sentence, the one that would mention concert and instrument.)
Let’s talk a bit about this data in business terms. What do we record here exactly? Basically, we want to pay people according to their contribution. Say, you get $100 for your part in the concert (you come on time, you play your instrument well, you get money). You can play two or more instruments during the concert. How much you’re going to get paid for that is another matter, let’s assume that it’s just $100 times the number of instruments you’ve played. If the CFO wants to know how much we owe to the musicians, they can run the query: “SELECT 100 * COUNT(*) FROM performances WHERE concert_id = ?
”.
One other reason why we want to record this is because we want to acknowledge everyone’s participation (that’s why we record all the instruments that each musician played). This data would be used to generate the text on the cover of the vinyl record, and submitted to the royalties tracking system.
Now we can re-read the section “Traditional Treatment of 5NF” of the original post and see the following snippet:
Something like this would simply be presented as the only possible transformation to take it to 5NF:
[...]
Appearance{ Appearance#, Concert#, Musician#, … }
Inclusion{ Inclusion#, Concert#, Instrument#, … }
Skill{ Skill#, Instrument#, Musician#, … }
Now we can see that those three are, in our terminology, false links. I’m especially worried about the “Skill” part. Where did it come from? Actually, I have a theory about that:
Skill is an idea from a different business process.
Deconstructing Skill
Let’s quote a bit more from the original post:
Looking at our example a bit further, for instance, it’s easy to conceive of a SkillRating in
[Skill]
. This could be used to help decide which[Musician]
instance(s), with corresponding[Instrument]
instance(s), will be invited to be part of a[Concert]
instance.> [...]
And it, perhaps, shouldn’t be surprising that the
[Instrument]-[Musician]
relationship is pulled out. The mastery of an instrument by a musician will (typically) start long before a performance opportunity is conceived and invitations to musicians are made. I like the way Normalization uses the definitions to seemingly anticipate aspects, and uses, of the data that we might not yet see.
This makes perfect sense in the natural language. But this is not at all what’s going on in our example so far.
Intuitively, it seems plausible that if a person played some instrument during a performance, then they have some skill with this instrument. Later on, we can use this information about skills to find musicians for our next concert, using this data for some sort of marketplace. But this is a different relation, a different link.
Let’s introduce the link that directly handles this business case:
Musician is available to play Instrument.
So, the musician registers on our music marketplace, and adds the instruments they’re willing to play for money. From the other side of the market, ensemble managers can sign up at our music marketplace, choose the instruments they are interested in, and browse the profiles of musicians.
This has nothing to do with who played what at the concerts. Even though supposedly, people who were found on that marketplace and enlisted for the performance, will also be added to our “musical performances” tables, like our intuition suggests.
There are two scenarios that help to come to terms with that. First, somebody can play the instrument during a concert, but that does not mean that they actually want to play that instrument (meaning that they don’t want to be found on the marketplace). It could be a one-off thing because you were covering for a sick colleague; it could be a friendly jam situation, or a joke.
Second, you can have an actual skill in the instrument, and be willing to play it (to be found on the marketplace), but you have no “job history” in the “musical performances” tables. Maybe it’s your first music job. You will have a record in the marketplace database, but not anywhere else (yet).
I’m more and more certain that this exact pattern:
three anchors linked to the fourth anchor as ⪫,
together with the false link,
that actually is a valid link in a different area of the problem domain,
is the key to this confusion. In a followup post we’re going to discuss another modeling use case: training an ML model to recognize Lego blocks in the photos.