Migrations: adding links

Aug 10, 2021

What is this substack about? Here is my list of favourite ideas from the first 25 issues.

This is the third post in the “Schema migrations” series. Previously we’ve discussed adding anchors and attributes, and today we can talk about links between the anchors.

Two anchors (or nouns, or entities) could be linked to each other in a certain way. Examples:

Tasks are assigned to Developers;
Items are included in Orders;
Books are written by Authors;
Users could be friends with each other.

Links are defined by a small sentence, with Subject, Object, and Verb[*]. Subject and Object refer to anchors, and Verb[*] allows to distinguish different kinds of relations between the same Subject and Object.

For example, you could have the following links between User and Tweets:

User likes Tweet;
User retweets Tweet;
User bookmarks Tweet.

Each of those links are independent from each other.

Links may have different cardinalities (commonly known as “one-to-many”, “many-to-many”, etc.).

In many cases it’s not clear which of the two anchors is the main one, and we shouldn’t even strive to find that out (later we’ll see that this is one of the problems with a typical OOP approach). We just use two sentences for each link: “User likes Tweet” and “Tweet is liked by User”. That’s why we have an asterisk after the “Verb”: “likes” is a proper verb, but we just don’t want to invent a new word for “is liked by”.

We won’t talk today about links between more than two anchors, nor about attributes attached to the links (“User gave N stars to this Movie”). Both would be topics for separate posts.

Now we can talk about structural dependency. We have the following set of statements:

Links are independent from each other. You can add new links without consideration of existing links.
Links depend on their two anchors: database schema must have both of the anchors defined before a link could be added.
Links do not depend on attributes of the anchors (except for the special ID attribute).

Physical implementation

In a typical relational database we have two ways of storing links. First, we can have a separate link table with two columns: (ID of subject anchor, ID of object anchor). Second, we can use a field in one of the tables that correspond to one of the anchors.

Separate link table is always used when the relationship is many-to-many. You can also use a separate link table for other cardinalities, such as one-to-many.

However, one-to-many relationships are more often modelled physically as the field in the table that corresponds to the anchor from the “many” side.

Example: consider the “Post has Comments” link. We have an anchor called Post, and an anchor called Comment. There are probably two tables: posts and comments. One Post can have many Comments, so the relationship is one-to-many. To store this link, we can add a field called “comments.post_id” (Comment anchor is on the “many” side).

There are clear parallels between possible physical representations of links and attributes (the latter is described in the previous post). If we use the “field in the main table or one of the side tables” approach, we create a physical dependency between our new link and the attributes that are physically stored in the table. Runtime behaviour of the system may change due to the changes in data layout, even if the link itself is not yet used by any of the code in the system.

Using a separate table for the link, even if the link is one-to-many, does not create a physical dependency. (But of course, using separate tables for one-to-many links may feel a bit awkward.)

Further directions

This post is shorter than usual, because links don’t have a lot of specific concerns that arise during schema migration. Most of the interesting stuff arises around attributes, and some of that insight could be directly transferred to links.

However, links have a lot of subtle logical aspects, not related to migrations directly, such as:

Data validation (see e.g.: “Modeling mutual friendship”);
Data dependency between attributes and links (e.g., what if people manager needs to have a certain job title to have subordinates);
Is there a “change” operation for links?
An uncertain status of “one-to-one” and “one-to-zero” relationships;
Three-anchor links;

Those are some topics for the future posts in this substack.

Returning to our schema migration series, the next post in the series is going to cover getting rid of attributes. This is where really interesting stuff starts to happen.

Minimal Modeling

Discussion about this post