Migrations: Getting rid of anchors is awesome
What is this substack about? Here are the highlights from the first 25 issues.
In the previous posts we discussed how to get rid of attributes and links. You can remove an anchor only when there are no more attributes of that anchor, and there are no links involving it.
Removing an anchor is an important symbolic and procedural step. Physically not much is going to happen when the anchor is removed: all the data has been removed when we removed the attributes. The only information that the anchor keeps by itself is a list of IDs (most often as some sort of an auto-increment primary key column). Most often this column would get deleted together with the last remaining table that stored the attributes of this anchor.
Only if you designed your physical tables as a table per attribute, then you would have a single-column table that stores only IDs (records the existence of real-world phenomena). If that is the case, then you will have to do a pretty straightforward operation, that is already familiar if you’ve read the previous posts:
Remove all mentions of the anchor (including the last remaining table) from the codebase;
Deploy and wait to see if anything breaks;
Rename the physical table to a temporary name (e.g. TO_DROP_Foobar) and see if anything breaks;
After some time, drop the physical table.
More important is the symbolic aspect of dropping anchors. Getting rid of an entire anchor reduces the vocabulary of nouns that you have to keep in mind when working on the system. It means that you get an extra headspace for new nouns, or to better understand the existing concepts. New people will have to learn less during onboarding.
This is also a moment when you have another chance to make sure that no dependencies have been left out. Maybe you’ll find a forgotten dangling attribute that actually belongs to this anchor, or a dangling link.
In a table-centric world one way to miss the link that needs to be removed is to have a 1:N link that is stored as a column of another anchor’s table.
For example, you have Posts that are classified by Topics, and the link is 1:N, that is each post is classified by only a single topic. Most common way to physically model this is to have a topic_id column of the posts table. When you decide to remove the topics entirely, it’s easy to miss this link if you don’t have a comprehensive catalog of anchors, links and attributes.
Anyway, having a dangling, anchor-less attribute or a link is not a big deal, we’ve all seen worse. You’ll just have to remove them later when they are discovered.
Another important activity that you should do during the final anchor clean up is updating and archiving the parts of documentation that mentions this anchor. There was a business reason why this anchor used to exist, and there were probably historical documents that explained the rationale behind it. Even if you do not maintain comprehensive documentation of your system, you may have at least something that could be found by searching for the name of the noun (and its removed attributes). This is the time when you have a chance to do a small clean up of whatever you have, and refresh your understanding of what is still there, documentation-wise.
Of course, this also applies to the code. If you did not properly remove all the code that used the anchor, its attributes and the links, but just disabled it: now is the time.
Sometimes the most efficient way to document some legacy code is to annihilate it.
In the next post we’re going to begin the final, “changing” part of our data migrations journey. The roadmap of the remaining issues:
Pt. VII: changing or migrating attributes;
Pt. VIII: changing or migrating links;
Pt. IX: changing or migrating anchors;
Pt. X: Technical migrations;
Pt. XI: Secondary data migrations.