Why Minimal Modeling has no 3-way links
In Minimal Modeling, links are defined to have two anchors (e.g., “User placed an Order”). However, people intuitively think that maybe introducing 3-way links would be natural, and such links would be useful for modeling.
So far, every example that I’ve seen could be modeled using only 2-way links. Moreover, this 2-way modeling modeling is quite natural: you don’t even need to stretch the business definition. Even more interesting, a couple of times we found that the hand-waving 3-way links definition actually mis-models the business domain.
Let’s look at the example. Suppose that we have a game where users could give other users some items as gifts (e.g., magical artifacts). At first glance this sentence seems like a very natural link: “User buys an Item as a gift to another User”, with three anchors.
Defining minimal model
Let’s model this more carefully. First, we have two anchors: User and Item.
Now let’s define a third anchor called Gift (it may not be the perfect name, but it’s good enough for now). Then define three links:
User buys a Gift;
User receives a Gift;
Item is given as a Gift;
Let’s write down the table schema, and then discuss why this model is correct.
CREATE TABLE gifts (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
from_user_id INTEGER NOT NULL,
to_user_id INTEGER NOT NULL,
item_id INTEGER NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
INDEX(from_user_id),
INDEX(to_user_id),
INDEX(item_id)
);
Attributes are not involved here, but we can sanity-check the model by providing a draft list of possible attributes:
User anchor has attributes like “name”, “password”, etc.
Item anchor has attributes like “name”, “price”, “damage dealt”, etc.
Gift anchor has no need for attributes, and maybe it never will. Its job is only to record the fact of a gift transaction.
For the visually inclined, here is the graphical schema in arrows.app:
Links are unique
One big reason why this is not a 3-way link is that links implicitly establish uniqueness constraints.
For example, looking at the link “User likes a Post” we can see that a certain user (say with ID=3) can like a certain post (say with ID=20345) only once. (This was never discussed in Minimal Modeling materials explicitly, but it just makes sense.)
Why? Because you want to be able both to establish a link AND break the link. If, hypothetically, a user could like the post more than once, then we need to explain how we could:
Like a post;
Like a post second time;
Remove the first like, leaving the second like intact.
How do we identify the first like to remove it? The obvious tuple (3, 20345) wouldn’t allow you to remove only one link, because it matches both hypothetical likes. You would have to provide some sort of extra ID, and that’s exactly why we would need to introduce the Like anchor.
This hypothetical case is not as strained as it seems. For example, on Goodreads.com user *can* actually read the same book multiple times! You can leave a different review each time, a different rating, and you can provide the “date started” and “date finished” attributes for each reading.
This could be modeled easily:
Book, User, Reading anchors;
The usual attributes for Book and User;
Reading has the following attributes: review, rating, date_started, date_finished;
Link #1: Reading involves Book;
Link #2: User registered a Reading;
Now back to our original problem. User A can give Item X to User B many times. This requires us to distinguish each act of gifting, and that requires us to introduce an ID, and that’s why we have a Gift anchor.
Is the Gift atomic?
When people try to explain why they feel that this should be a 3-way link, they make a common argument: "the entire Gift transaction should be atomic. They say that you must provide all three pieces of data simultaneously: 1) who sent the gift, 2) who received the gift, and 3) what was gifted. They say that the transaction would make no sense if one or more pieces were missing.
I think that this is the common pattern that appears in many scenarios. People are focused on the creation phase and they often disregard the rest of the data life cycle.
It makes perfect business sense to look at each of three links of this gift transaction. I’d say that in a big enough company there would be separate teams interested in each of the three links:
People whose job is to increase the number of users who send gifts;
People whose job is to find new users that could become gift recipients;
People whose job is to invent new items so that users would send more of them to each other;
They each would be analyzing different aspects of the gifting dataset, long after the data was inserted into the table.
Another example: suppose that we implement a system that allows people to apply for mortgage, a visa, or something like that. To do that, they need to fill in the application form that commonly has a lot of data, up to several dozen data fields. We need to store all that data in our database.
All that data would be submitted simultaneously, they won’t let you submit the data piece by piece. But that does not create mutual dependency of those attributes on each other. It’s just there for bureaucratic convenience. Later on, all this data would probably be analyzed from different points of view, for statistical or investigative reasons.
False links
Links in Minimal Modeling are defined using sentences in the natural language, but not every sentence defines a link. In this example we can see how natural language can introduce model confusion, unless you validate the model.
Example: “User sends an Item” and “User receives an Item” look very much like link sentences! Unfortunately, they aren’t links, even though each sentence makes perfect sense.
Moreover, if you begin with the table schema, as specified above:
CREATE TABLE gifts (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
from_user_id INTEGER NOT NULL,
to_user_id INTEGER NOT NULL,
item_id INTEGER NOT NULL
);
Then it even makes perfect sense to write SQL queries like this:
SELECT from_user_id, item_id
FROM gifts
WHERE from_user_id = 3
or
SELECT to_user_id, item_id
FROM gifts
WHERE to_user_id = 10
The first query is a list of items gifted by a certain user, and the second query is a list of items received by a certain user.
More on 4NF and 5NF
I have a vague but growing conviction that the use case discussed in this post is very much relevant to our common attempts to understand 4NF and 5NF.
If you’re interested, take a look at the post “5NF: The Missing Use Case” https://datademythed.com/posts/5nf_missing_use_case/ (and the entire blog for that matter). Here is a thread with Barry Johnson, the author of that blog: https://minimalmodeling.substack.com/p/making-sense-of-1nf-pt-ii/comment/17202070. If you have anything to add, I’d be happy to hear that.
The main question is: how is the distinction between 4NF and 5NF useful? Which common modeling mistakes (logical and physical) could be corrected by application of 4NF/5NF rules? Are there other ways to correct the same mistakes? Could we avoid such modeling mistakes by construction?
Next: why does our table schema look like that? How did we arrive at it? How is it relevant to the confusion around the normal forms?