Date's demon
I’d like to introduce an idea of “Date’s demon” (named in honor of Chris Date). Postulating the existence of this imaginary being, we can describe some of the long-term dynamics of database design, implementation and maintenance process. In a sense, Date’s demon is also your adversary in Game of Tables.
Here is the outline of this post:
First we introduce the idea of Date’s demon;
Then we talk about various things that Date’s demon is able to influence, such as the progress in hardware, database engine innovations, and the structure of your business;
We note that logical design is immune to Date’s demon;
We conclude with some suggestions regarding how you could prepare for the inevitable.
Date’s demon is similar to well-known Maxwell’s demon. Maxwell’s demon is an imaginary being that was part of a thought experiment aimed to clarify the second law of thermodynamics. The experiment is set up so that there are two chambers containing gas, separated by a door. Wikipedia: “As individual gas molecules (or atoms) approach the door, the demon quickly opens and closes the door to allow only fast-moving molecules to pass through in one direction, and only slow-moving molecules to pass through in the other. Because the kinetic temperature of a gas depends on the velocities of its constituent molecules, the demon's actions cause one chamber to warm up and the other to cool down.”
Date’s demon
Now let’s describe what Date’s demon does. It is somewhat more powerful and sophisticated than Maxwell’s because it can influence the general progress of technology, and the circumstances affecting the business of your company. It does this just to mess with you personally.
The end result of its efforts is that a few years down the line your original database design will often look so awkward that you will even have to explain yourself. “Times were different!”. Also, the people who redesign your database will look cooler than you, because their design would be so elegant and scalable (performance-wise and, more importantly, organization-wise). But don’t worry, just wait a few more years and it will be your turn to chuckle again, thanks to the relentless efforts of Date’s demon.
Suppose that you build and maintain a database for your business. You design the initial database schema, using all the skills, knowledge and experience available to you. You choose a database server implementation that is best suited for the nature of your business. You create a physical design from scratch, using all the best practices that you’ve learned while preparing for that project. You unveil your new system, and your customers start using it: maybe they buy something, maybe they post photos of their vacations, maybe they are looking for a job or a life partner.
Right after the system is live, Date’s demon starts doing its thing. It works over the span of years. Here are some things that it’s able to do.
Hardware progress
One of the greatest achievements of Date’s demon was the introduction of SSD (Solid State Drives). This has drastically changed the bandwidth/latency equation compared to the previous tech: rotating disk drives. A lot of the accumulated body of knowledge in database design implicitly relied on that equation. Some of the advice that used to be common sense back in the day may now actively hurt efficiency.
Some constraints (things you shouldn’t do) may no longer be relevant, but they may still linger in the collective consciousness. People will instinctively avoid doing some things that used to be bad back then, but are not so bad now.
Another thing that always changes and always stays the same is the increase of disk capacity. That’s one of the tools used by Date’s demon, but it’s not very sophisticated: it’s just too simple, too linear. You could always buy more disk space if you wanted and if your profits allowed. At the same time, you always had a limit on how much disk space you could buy: it was never infinite. Those two sentences do not contradict each other, and they are still true, no matter when you’re reading this. They were true when a typical disk was 64 megabytes, and also when 4 Tb disks became a commodity. They are also true even in elastic cloud environments.
Scalability changes
An interesting change happens when you migrate from raw hardware to elastic resources. If you have a computer: CPU, memory, and a hard disk drive connected via a bus, your system lives in a very rigid “box”. It does not give in, even a little bit, when you reach the sides of that box. You have an exact number of bytes available to you, and not a single byte more. You have a certain maximum throughput that you can achieve in ideal conditions, and not a single byte/sec more. You have a certain latency pattern that is also quite fixed: a response arrives no earlier than x microseconds after the request was submitted.
You’re guaranteed to have this box, and your system may implicitly rely on the characteristics of this box. It basically serves as guardrails. For example, even if you’re doing useless database accesses, that’s fine. At some point you will reach the limits of the box and it will begin to basically shed the load. Load shedding is one of the most important capabilities of a reliable system. When this happens, you can optimize or rearrange your access patterns, and you will have more space for your system. Also, you can buy a bigger box, but it will have the same strictness of limits, and the same load shedding behavior.
With the introduction of elastic (cloud) storage the behavior of this box changes drastically. It no longer sheds the load for you as strictly as it did before. This may even be a good thing, the point is that this changes the constraints of your systems in non-trivial ways. You can now do new things, but doing some old things may now hurt. (Making Date’s demon happy either way, of course.)
Database engine progress
Even within a class of databases (say, classic ACID relational databases), there is a lot of progress going on, with the help of Date’s demon. People find new ways to optimize common access patterns, and that changes the relative equation. Database design approaches that were discouraged before are now performance-neutral. Again, the long-lived folk advice becomes irrelevant or even actively hurts performance (or fails to unlock performance opportunities).
Also, the entire data layout format can change underneath you. As an example, there is a continuous reintroduction of column-based storage, even though row-based storage continues to be dominant. Some access patterns that your system has may be perfectly suited for column-based storage solution. But are there any good offerings available today? Or would you have to wait a few years until it again comes back in style? Of course, Date’s demon strives to ensure that you’re on the losing side of this equation as often as possible.
Traversing the design space
Introducing an entire new class of database engine is a very powerful move available to Date’s demon. It turns out that it can inspire some people to actively investigate what would happen if you disregard one of the traditional assumptions about how a database engine is supposed to work. As a result, some very interesting new options become available for you as a database designer. (Later, of course, some of them would also become antiquated for various reasons, so your excitement may not be very long lived.)
One practical example that I like is the introduction of Cassandra with its unique approach to storing data. The way Cassandra works (or used to work last time I checked, but maybe Date’s demon already managed to make me look uninformed, I won’t even bother to check) is that it never even reads the existing data before making any changes. So if you want to delete a row, it does not find this row and marks it as deleted, it just inserts something called “tombstone”. If you delete the same row several times, you will get several tombstones. If you have too many tombstones, the performance may degrade. This violates an expectation that always was fundamental for me personally: deleting something that does not exist is a null operation. Now you may want to be extra careful with that (but only if you’re in a Cassandra environment, because for a typical relational database, on the contrary, querying first would hurt performance).
Affecting your business
Sometimes Date’s demon influences your customers to change their preferences, to change how they use your system. You may have subtly designed your system under the assumption that action A happens more often than action B. Slowly but surely you may find out that the actual customers prefer action B. Thus, part of your system that handles action A becomes slightly over-optimized, while the other part is somewhat under-optimized. You wish it was the other way around. When you have an opportunity for refactoring, you may move the point of balance to a better position. Now you can easily imagine what happens next (Date’s demon attacks from another side).
Organizational scalability
Many database design decisions are driven by the need to remove organizational bottlenecks. You want separate teams to not be blocked by some other team when they want to evolve the database schema. One example of such an approach is all sorts of “schemaless” designs, such as JSON columns and EAV (Entity-Attribute-Value). One reason for choosing this is that the usual ways to change the database schema may be prohibitively expensive. As years go by, you may find that, for example, this flexibility is not really used. It may have been useful during the initial development phase. But now, when that part of the system has been stabilized, it turns out that the number of data elements is not even as big as was expected.
Also, as part of Date’s demon activities, you may find out that people have invented practical ways to do the normal schema migrations (what is known as online schema change). Or some of the equations change and you find out that you can now directly use anchor modeling on a physical level? So, maybe nowadays you would solve the organizational bottlenecks in a different way.
Logical design is immune to Date’s demon
Note that it’s not possible to avoid the effects of Date’s demon on the physical level. Logical design, however, is immune to it. Logical design only needs to follow the actual changes in the business domain: new pieces of data that you need to store, new relationships, new entities.
You can write down the logical design, for example using the Minimal Modeling approach, and keep updating it as your business changes. Game of Tables would still be happening at the physical level, but it’ll be less noisy because the logical design has been pinned down.
Conclusion
The most important advice that I have in mind is that maybe you should not sweat over the physical design so much anyway. Make sure that logical design correctly corresponds to the business reality. Choose the database and the table design strategy that you like, and implement it. (In the end, if you really need to speed up some query, you can always create a secondary representation that is designed for that specific area.)
No matter what you did at the physical level, there is a strong chance that you will be slightly embarrassed about this design five years later. That’s fine, it’s Date’s demon as usual.