Migrations: migrating attributes, pt. V
Welcome to Part V of the “Migrating attributes” series. We’ve reached the end of the 10-step procedure of attribute migration (reproduced below for convenience), and today we’re going to discuss steps 6-10.
1. Prepare the new physical storage (e.g. create a table column);
2. Implement a cross-checking tool that compares old and new storage;
3. Make application code that sets attribute values double-write to both old and new storage;
4. Implement a tool for bulk incremental sync of attribute values from old to new storage;
5. Make application code that reads attribute values read from both old and new storage, with the following substeps:
   a) compare the results and use values from the old storage;
   b) compare the results and use values from the new storage;
6. Make application code no longer read the old storage;
7. Prepare the old storage for stopping writing to it;
8. Clean up: a) the cross-checking tool; b) the sync tool;
9. Make application code stop writing to the old storage [this is the point of no return];
10. Get rid of the old storage.
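The whole procedure can be sketched as a storage-access layer whose read and write paths advance through the phases. This is a minimal illustrative sketch, not code from the series: the `Phase` constants, `AttributeAccess`, and the dict-backed storages are all made-up names standing in for your real backends.

```python
# Sketch of an attribute-access layer advancing through the migration phases.
# Plain dicts stand in for the real old and new storage backends.

class Phase:
    DOUBLE_WRITE_READ_OLD = 1   # steps 3-5a: write both, compare, trust old
    DOUBLE_WRITE_READ_NEW = 2   # step 5b: write both, compare, trust new
    DOUBLE_WRITE_READ_NEW_ONLY = 3  # step 6: write both, read only new
    NEW_ONLY = 4                # step 9+: old storage no longer written

class AttributeAccess:
    def __init__(self, old, new, phase):
        self.old, self.new, self.phase = old, new, phase

    def set(self, key, value):
        if self.phase < Phase.NEW_ONLY:
            self.old[key] = value   # double-write until the point of no return
        self.new[key] = value

    def get(self, key):
        if self.phase >= Phase.DOUBLE_WRITE_READ_NEW_ONLY:
            return self.new[key]    # step 6: old storage is not read at all
        old_val, new_val = self.old[key], self.new[key]
        if old_val != new_val:
            print(f"discrepancy for {key!r}: {old_val!r} vs {new_val!r}")
        return old_val if self.phase == Phase.DOUBLE_WRITE_READ_OLD else new_val

access = AttributeAccess({}, {}, Phase.DOUBLE_WRITE_READ_OLD)
access.set("user:1:full_name", "Ada Lovelace")
assert access.get("user:1:full_name") == "Ada Lovelace"
```

Each numbered step then corresponds to deploying a bump of the phase (plus the out-of-band tooling and schema work described below).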
Step 6. Make application code to no longer read the old storage
By this step we’ve been running our system long enough to confirm that the old and new storage are equivalent. What happens now is that we completely remove the code that reads from the old storage, test the change, and deploy.
The main consequence is that rolling back becomes harder from here, but that’s fine: we’re going to stop writing to the old storage (which will be the point of no return) really soon anyway.
If we used double-reading, then at this point our data access should also become faster.
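Concretely, the double-read helper collapses into a plain read from the new storage. Here is a hypothetical before/after sketch using the users/users_pii example from step 7 (the function names and dict-backed tables are illustrative):

```python
# Before step 6: double-read, compare, and log discrepancies (step 5b variant).
def get_full_name_before(user_id, users_table, users_pii_table):
    old_val = users_table[user_id]["full_name"]
    new_val = users_pii_table[user_id]["full_name"]
    if old_val != new_val:
        print(f"full_name mismatch for user {user_id}: {old_val!r} vs {new_val!r}")
    return new_val  # step 5b already trusts the new storage

# After step 6: the old storage is no longer read at all.
def get_full_name_after(user_id, users_pii_table):
    return users_pii_table[user_id]["full_name"]
```

The "after" version does one lookup instead of two, which is where the speedup from dropping double-reading comes from.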
Step 7. Prepare old storage for stopping writing to it
Before we can stop writing to the old storage, we sometimes need to make some adjustments.
For example, suppose we had a users table with a required full_name column, and we migrated this column to a new table called users_pii. We did all the steps discussed above, and now we want to stop mentioning users.full_name anywhere in our codebase. But if we simply remove this field from the INSERT statement, we’re going to get an SQL error, because there would be no value for a required column.
Thus, we need to mark users.full_name as nullable (or provide a default value, such as an empty string). So we need to execute something like:
ALTER TABLE users
MODIFY COLUMN full_name VARCHAR(255) NOT NULL DEFAULT '';
This step is not always needed: just don’t forget to verify and test it.
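The failure mode is easy to reproduce. Below is a small sketch using sqlite purely as a stand-in for whatever database you run (its ALTER TABLE syntax differs from the MySQL-style statement above, so the "adjusted" schema is created directly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Old schema: full_name is required and has no default.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)")
try:
    conn.execute("INSERT INTO users (id) VALUES (1)")  # full_name omitted
except sqlite3.IntegrityError as e:
    print("old schema rejects the insert:", e)

# Adjusted schema: a default value lets INSERTs omit the column entirely.
conn.execute(
    "CREATE TABLE users_adjusted (id INTEGER PRIMARY KEY, "
    "full_name TEXT NOT NULL DEFAULT '')"
)
conn.execute("INSERT INTO users_adjusted (id) VALUES (1)")  # now fine
print(conn.execute("SELECT full_name FROM users_adjusted").fetchone())  # ('',)
```

The same reasoning applies to ORMs and stored procedures that construct the INSERT for you: whatever produces the write must be able to simply not mention the column.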
Step 8. Clean up cross-checking tool and sync tool
We’re about to stop writing to the old storage. At this point the cross-checking tool becomes useless (it’s going to report more and more discrepancies, as expected).
The sync tool, on the other hand, becomes really dangerous. If somebody accidentally runs it after step 9 is implemented, it’s going to propagate stale or missing values from the old storage to the new storage, effectively clobbering data.
So before proceeding, we should remove both tools from the codebase.
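To see why the forgotten sync tool clobbers data, consider what happens once writes go only to the new storage. A toy sketch, with dicts standing in for the two storages and a deliberately naive sync function:

```python
# After step 9: the application writes only to the new storage.
old_storage = {"user:1": "Ada Lovelace"}   # stale: no longer updated
new_storage = {"user:1": "Ada Lovelace"}

new_storage["user:1"] = "Ada King-Noel"    # a post-step-9 attribute update

def sync_old_to_new(old, new):
    """The forgotten sync tool: blindly copies every value from old to new."""
    for key, value in old.items():
        new[key] = value

sync_old_to_new(old_storage, new_storage)  # accidental run

print(new_storage["user:1"])  # 'Ada Lovelace' -- the newer value is lost
```

Every update made after step 9 exists only in the new storage, so one stray run of the tool silently reverts all of them.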
Step 9. (Point of no return) Make application code stop writing to the old storage
We started double-writing to both old and new storage at step 3. We can now switch to writing only to the new storage. This is the point of no return: from here on, attribute changes will only be recorded in the new storage. But if you did all the previous steps carefully, it would take something really unusual to force a revert after this stage.
After this change has been deployed, we can monitor the old storage for some time to confirm that there is no unexpected update traffic. Such traffic may come from forgotten parts of the code base, or from some server stuck on a very old version of the code. This unique opportunity may be invaluable for uncovering issues with your system that are hard to find otherwise.
Step 10. Finally, get rid of the old storage
As the last step, you need to physically remove the old attribute storage: e.g., drop the column or the entire table. If the attribute used to be stored in a schema-less table, run a DELETE statement removing the attribute values. If the attribute was stored in a JSON-encoded field, run a small script that deletes the old key from that JSON.
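The two non-trivial variants can be sketched as follows. This again uses sqlite as a stand-in database, with illustrative table and column names; the plain-column case would just be an ALTER TABLE ... DROP COLUMN.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# A schema-less-style table: one row per (entity, attribute) pair,
# plus a JSON-encoded "extra" field on the users table.
conn.execute("CREATE TABLE attributes (user_id INTEGER, name TEXT, value TEXT)")
conn.execute("INSERT INTO attributes VALUES (1, 'full_name', 'Ada Lovelace')")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, extra TEXT)")
conn.execute(
    "INSERT INTO users VALUES (1, ?)",
    (json.dumps({"full_name": "Ada", "theme": "dark"}),),
)

# Variant 1: the attribute lived in a schema-less table -> DELETE its rows.
conn.execute("DELETE FROM attributes WHERE name = 'full_name'")

# Variant 2: the attribute lived inside a JSON-encoded field ->
# strip the old key row by row with a small script.
for user_id, extra in conn.execute("SELECT id, extra FROM users").fetchall():
    data = json.loads(extra)
    data.pop("full_name", None)
    conn.execute(
        "UPDATE users SET extra = ? WHERE id = ?", (json.dumps(data), user_id)
    )
```

For large tables you would of course batch the DELETE and the key-stripping script, just like the bulk sync tool from step 4.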
This is needed because you want to manage technical debt, don’t you? Also, this is yet another opportunity to find unaccounted data access patterns. If you drop the column that is supposed to be unused, and something breaks somewhere — you’ve just learned something new about your system.
You also save some disk space, which lets you store more data without additional spending. And for many typical storage schemas, reducing the amount of data stored may improve performance.
This is the longest sub-series of this substack, and we’re not actually done yet. For completeness, some time in the future we’re going to cover:
migrating anchors (this is going to be a weird one);
Also, we need to write a much more concise checklist-style post for all kinds of migrations. Stay subscribed!