What is this substack about? Here are the highlights from the first 25 issues.
One of the important structural data types is a pair. A pair consists of two values of pure type, tightly coupled. Common use cases for pairs are:
Monetary amounts (amount and currency);
Geographic coordinates (latitude and longitude);
Physical dimensions (width and height).
NB: people names and street addresses would be treated separately: it’s a huge topic with lots of interesting details. If you know any other use cases for pairs, please leave your comments.
Logical level
In the minimal modeling approach we insist that attributes must be independent from each other. But sometimes you have two pieces of data that are so tightly coupled that it’s not really useful to treat each of them separately.
For example, if you have a geographic coordinate of a certain object, you can’t really do anything with it if you know only its latitude (but see Jules Verne’s “In Search of the Castaways”). Same goes for monetary amounts: “1000” means nothing if we don’t know the currency.
We treat such cases as a single attribute with multi-valued type. For geographic coordinates it’s a pair of numbers; for monetary addresses it’s a number and a string.
Validation
Validating the correctness of monetary amount requires both elements of the pair. This is because different currencies have different restrictions on fractional values. For most currencies in the world the major unit (such as dollar or złoty) is divided in 100 minor units (such as cent, or grosz). For many important currencies this is different: yen has no minor units; Omani rial is divided in 1000 baisa; and Bitcoin is divided in 10.000.000 satoshis.
Thus, as a monetary amount, the pair (100.31, “JPY”) is not valid, while (100.31, “USD”) is. It’s one of the places where the dream of validating data with the means of the database server would most probably fail. Technically you can validate the fractional parts in the “amount” field, taking the “currency” field into account, but the validation rules would be long and need to be carefully maintained. You will probably resort to doing this kind of validation in the application code.
Note, however, that there is another slightly different use case: e.g., currently the price for “Deep Archive Access Tier, All Storage / Month” in Amazon S3 is $0.00099 per GB. This is not a monetary amount: you have to multiply it by the consumed storage size and round up to two decimal places to get the price. So, you must use different validation rules for this.
Extending pairs to tuples
Pairs are a special case of tuples. Pair is a 2-element tuple. Of course, you can consider longer tuples, such as 3-element tuples, but it gets even harder to find useful applications for this. Well, “physical dimensions” example could be extended to (width, height, depth), but that’s probably it. If you have any interesting ideas, please leave a comment.
JSON representation
Two simple representations come to mind first: array-based:
[ 200, “USD” ]
and object-based:
{ amount: 200, currency: “USD” }
Array-based representation is very compact but not very self-documented. It works reasonably well for the monetary amounts, but consider geo coordinates, for example [40.7128, 74.0060]. People will constantly be forgetting if it’s [lat, long] or [long, lat].
Object-based representation is unambiguous:
{ latitude: 40.7128, longitude: 74.0060 }
Physical representation
Let’s reiterate that on the logical level we have a single attribute (say, “price of the Item”). This attribute is pair-typed, and so it has two elements (numeric “amount” and string-typed “currency”). So, how do we represent this in physical tables?
As usual, we have three ways of storing an attribute: main table, side table and per-attribute table. Let’s consider the per-attribute table first. Each column in a typical database can hold only a single value, so it seems that we need to have the three-column table (and not the usual two-column table). For example:
CREATE TABLE item__price (
item_id INTEGER NOT NULL PRIMARY KEY,
amount DECIMAL(15,4) NOT NULL,
currency CHAR(3) NOT NULL
);
This schema ensures that if we set a value for the attribute then both its elements are required (they’re marked as NOT NULL).
For main and side tables the situation is similar, only the names of the fields are different:
CREATE TABLE items (
id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
-- ...other attributes...
price_amount DECIMAL(15,4) NOT NULL,
price_currency CHAR(3) NOT NULL
);
The difference is that the attribute name (“price”) was used in the table name of per-attribute table, and in the field names of the main table.
NB: We used here just one of the possible representations for the amount and currency, namely DECIMAL(15,4) and CHAR(3). There are many possible considerations here, and we’ll discuss how to represent money in a separate post.
This issue interrupts the (quite long and boring, frankly) “Migrations” series (see archive page). In the next issue we’ll either continue with the final migration steps, or we’ll begin discussing another big topic: schemaless representations. Stay subscribed!
> { amount: 200, currency: “USD” }
To interpret currencies correctly every time, one also need to include the date with the amount. Sometimes the number of valid decimals changes (can both increase and decrease); sometimes a major devaluation happens (where the ticker may or may not change)