Extending booleans to enums
In one of the previous posts, “A comprehensive guide to storing boolean values”, we discussed that boolean values often can and probably should be treated as two-element enum values. By adding a NULL option, we can bolt a third element onto this “enum”. However, what if we started with enums from the beginning?
Logical level
On a logical level, enum-typed attributes store some state that a noun (or an entity, or an anchor) is in. For example, a credit card payment may be in the following processing states:
unpaid;
processing;
successful;
failed;
charged back.
Enum-typed attributes are considered structural attributes in the framework discussed in the “Structural and pure attributes” post. This means that any change to the set of enum elements is almost always accompanied by a change in the code that works with this attribute.
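To make the “structural” point concrete, here is a minimal Python sketch. The names `ProcessingState` and `can_retry`, and the retry rule itself, are hypothetical and not taken from any earlier post; the only point is that the code branches on individual elements, so adding an element means touching the code.

```python
from enum import Enum


class ProcessingState(Enum):
    UNPAID = "unpaid"
    PROCESSING = "processing"
    SUCCESSFUL = "successful"
    FAILED = "failed"
    CHARGED_BACK = "charged_back"


def can_retry(state: ProcessingState) -> bool:
    """Hypothetical business rule: only unpaid or failed payments may be retried."""
    if state in (ProcessingState.UNPAID, ProcessingState.FAILED):
        return True
    if state in (
        ProcessingState.PROCESSING,
        ProcessingState.SUCCESSFUL,
        ProcessingState.CHARGED_BACK,
    ):
        return False
    # A newly added enum element ends up here until this function is updated.
    raise NotImplementedError(f"unhandled state: {state}")
```

If a sixth element were added to `ProcessingState`, `can_retry` would raise until somebody decided how the new state should behave, which is exactly the code change that tends to accompany a change in the set of elements.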
NB: things like “country” and “language” are not enums generally, because normally we should be able to add new languages and countries without changes in the code. However, in some situations there are specially-handled countries and languages. For example, English is often considered the safe default language for many business applications, and in many financial applications some countries require special handling (sometimes, though, all countries require special handling). This is a very interesting topic that we hope to be able to discuss some day.
There are some other attributes that look enum-typed, but arguably aren’t. For example, a book binding type could be:
hardcover;
paperback;
spiral-bound;
wire-bound;
etc.
This list of elements could even be defined in code as a constant list, but if the code behaviour does not depend on the type of book binding, then we do not consider it an enum. A textbook relational approach would be creating a tiny lookup table containing the list of known book bindings, and using values from that table. This design allows us to freely add new rows and even edit the existing ones. However, there are some interesting drawbacks to this approach (from the software development point of view), which we can discuss some other day.
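The lookup-table approach could look roughly like the sketch below, which uses Python's built-in `sqlite3` module and an in-memory database. The table and column names (`book_bindings`, `books`, `binding_id`) and the extra “leather-bound” row are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Tiny lookup table with the known book bindings.
conn.execute(
    "CREATE TABLE book_bindings ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL UNIQUE)"
)
# Books refer to a binding by id; no code branches on the binding itself.
conn.execute(
    "CREATE TABLE books ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " binding_id INTEGER NOT NULL REFERENCES book_bindings(id))"
)
conn.executemany(
    "INSERT INTO book_bindings (name) VALUES (?)",
    [("hardcover",), ("paperback",), ("spiral-bound",), ("wire-bound",)],
)

# Adding a new binding is a data change, not a code change.
conn.execute("INSERT INTO book_bindings (name) VALUES (?)", ("leather-bound",))


def binding_names(connection: sqlite3.Connection) -> list:
    """Return all known binding names in insertion order."""
    rows = connection.execute("SELECT name FROM book_bindings ORDER BY id")
    return [name for (name,) in rows]
```

Nothing in the application has to change when the fifth row is inserted, which is the difference from a proper enum.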
Anyway, back to proper enums. If we consider an enum-typed attribute as some kind of state, and this state could be changed, then this immediately brings us to the idea of a “state machine” (aka “finite-state machine”, aka FSM). FSM persistence is a huge topic, so let’s put it aside for now, and focus only on storing “static” state.
Enums and either/or data
We have discussed either/or data extensively since the beginning of this substack, back in January (see “Table of Contents” > “Data types” > “Structural data types” > “Either/or”). Each either/or data type is based on some kind of enum.
Let’s use the same “health declaration” example that we used to illustrate the “either/or” discussions (like in this post). There are three cases: a) has symptoms; b) has no symptoms; c) exempt from testing. For the “has symptoms” case we want to list the observed symptoms; and for the “exempt from testing” we want to record the exemption reason. Those extra pieces of data only make sense for those specific enum elements, so we call them “dependent pieces of data”. There is no extra data for “has no symptoms” at the moment, but we would like to design the storage in such a way that it is possible to extend either/or data as the requirements change. (This was discussed extensively in the “Concatenability of either/or data”).
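As a rough in-memory illustration of this example, here is a Python sketch with one class per enum element. The class and field names (`HasSymptoms`, `NoSymptoms`, `ExemptFromTesting`, `symptoms`, `reason`) and the `summarize` helper are hypothetical; the sketch only shows how each dependent piece of data lives with the element it belongs to.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class HasSymptoms:
    # Dependent data: only meaningful for the "has symptoms" case.
    symptoms: List[str] = field(default_factory=list)


@dataclass
class NoSymptoms:
    # No dependent data yet; fields can be added here later.
    pass


@dataclass
class ExemptFromTesting:
    # Dependent data: only meaningful for the "exempt from testing" case.
    reason: str


HealthDeclaration = Union[HasSymptoms, NoSymptoms, ExemptFromTesting]


def summarize(declaration: HealthDeclaration) -> str:
    """Render a declaration as a short human-readable string."""
    if isinstance(declaration, HasSymptoms):
        return "symptoms: " + ", ".join(declaration.symptoms)
    if isinstance(declaration, NoSymptoms):
        return "no symptoms"
    if isinstance(declaration, ExemptFromTesting):
        return "exempt: " + declaration.reason
    raise TypeError(f"not a health declaration: {declaration!r}")
```

Because `NoSymptoms` is a class of its own rather than a bare constant, it already has a place for future fields, which mirrors the extensibility requirement above.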
Taking all of this into account, we’re ready to make a leap forward and propose the following conjecture:
Each proper enum type may and will eventually produce an either/or data structure with dependent pieces of data.
Assuming this, let’s discuss the physical model of enum data types.
JSON representation
As usual, we try designing the JSON representation first, to remove the relational biases and assumptions. The simplest idea that comes to mind is to use a single key and a single value. For example, for the “credit card processing status” example:
{ id: 235711,
...,
processing_state: "failed" }
or
{ id: 235711,
...,
processing_state: "successful" }
However, as we just mentioned, we have a strong feeling that in some of those states there are going to be some additional pieces of data, specific only to that state. For example, for the “failed” state we may want to add the technical reason for the failure, and for the “charged_back” state we may want to add a dispute case reference number. In light of this we propose the following representation, which may look a bit weird initially:
{ id: 235711,
...,
processing_state: { failed: {} } }
and
{ id: 235711,
...,
processing_state: { successful: {} } }
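The empty objects here (“{}”) serve a double purpose. First, they evaluate to a true value in data processing code written in a language where an empty object is truthy (JavaScript, for example; in Python an empty dict is falsy, so there you would test for the key’s presence instead). Thus, we can write the following pseudocode: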
if (obj.processing_state.successful) {
...;
} elsif (obj.processing_state.failed) {
...;
} elsif (...) {
...;
};
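For comparison, here is how the same dispatch might look in Python. This is only a sketch: the function name `describe` and the returned strings are made up for illustration. Since an empty dict is falsy in Python, the check is on key presence rather than on truthiness.

```python
def describe(payment: dict) -> str:
    """Dispatch on the single key stored under processing_state."""
    state = payment["processing_state"]
    # `{}` is falsy in Python, so use `in` instead of `if state.get(...)`.
    if "successful" in state:
        return "payment went through"
    elif "failed" in state:
        return "payment failed"
    elif "charged_back" in state:
        return "payment was charged back"
    else:
        # Remaining states from the list: unpaid, processing.
        return "payment is pending"
```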
The second purpose of those empty objects is that they provide storage space for any number of extra pieces of data. For example, we can store the technical reason for the failure:
{ id: 235711,
...,
processing_state: { failed: { reason: "insufficient_funds" } } }
This way of organizing dependent pieces of data makes the result much more robust against bugs and mistakes, and also allows for future extensibility.
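One way to see the robustness argument is a small reader that refuses anything other than exactly one known state key. The sketch below is an assumption-laden illustration: `KNOWN_STATES` and `split_state` are hypothetical names, and the strictly quoted JSON string stands in for the abbreviated examples above.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical list of allowed state keys, taken from the states named above.
KNOWN_STATES = {"unpaid", "processing", "successful", "failed", "charged_back"}


def split_state(processing_state: Dict[str, Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    """Return (state name, dependent data) or raise if the shape is wrong."""
    if len(processing_state) != 1:
        raise ValueError("expected exactly one state key")
    # Unpack the single (key, value) pair.
    ((name, details),) = processing_state.items()
    if name not in KNOWN_STATES:
        raise ValueError(f"unknown state: {name}")
    return name, details


payment = json.loads(
    '{"id": 235711, "processing_state": {"failed": {"reason": "insufficient_funds"}}}'
)
name, details = split_state(payment["processing_state"])
```

Here `name` is the enum element and `details` holds its dependent data. A failure reason cannot accidentally be attached to a “successful” payment, because the reason lives inside the “failed” object and nowhere else.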
We discussed this topic extensively back in February, in the following articles:
Frankly, when I was writing those texts, I did not yet understand such a close link between enums and either/or.
Relational representation
First, let’s revisit three earlier posts about either/or data in the relational context:
Those posts explore a big chunk of the design space, but now we understand the missing part: encoding the underlying enum-typed part. In “A comprehensive guide to storing boolean values” we started discussing this, and in the next post we’re going to tie up the loose ends and discuss storing enum-typed attributes. For now I would invite you to read the four posts linked above.