There are three well known standard lists of codes: currencies, countries and languages. I’d like to talk a little bit about how they are treated in databases, and also show how they could be represented in Minimal Modeling.
There will be no horror stories and “falsehoods programmers believe”-type stuff in this post.
Basics
Countries are represented by two-letter strings such as “no” for Norway and “tr” for Turkey. The complete list is standardized in the ISO 3166-1 alpha-2 standard.
Most common languages are represented by two-letter strings such as “es” for Spanish or “ja” for Japanese. There are many languages, so some of them require three letters. Also, there is an aspect of regional varieties, the most common of those is maybe British English vs American English. This could be handled using the POSIX format for locales, such as “es_AR” and “es_ES” for, correspondingly, Argentinian dialect of Spanish and standard Spanish.
Currencies are represented by three-letter strings such as “EUR” for euro, or “MXN” for Mexican peso. A good practical list is probably Wikipedia: https://en.wikipedia.org/wiki/List_of_circulating_currencies
Short strings are good primary keys
First, you can actually use such short strings as primary and foreign keys directly in your database. If you need to store travel expenses you can create a table that looks like this:CREATE TABLE expenses (
expense_id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
amount DOUBLE(17, 4) NOT NULL,
currency CHAR(3) NOT NULL,
-- other attributes
);
It seems that some people got the impression that your IDs can only be integers. Thus they want to create a “currencies” table with an integer ID and the currency code as an attribute of that ID. This is not necessary. Currency codes, country codes, and language codes are unique and so can be used as primary and foreign keys directly.
Note that I even used the “CHAR(3)
” physical data type in the table definition, and not the usual “VARCHAR(N)
” type. As far as I understand, this data type has zero overhead, so it’s basically three bytes, which is similar to the normal INTEGER
that you may be conditioned to expect as a primary keys. So, if you’re worried about the performance — don’t be.
Actually, by all means, you may absolutely be worried about performance or could even full-scale nerd out about nanoseconds! I’m just trying to point out some possibilities here. You need to study the documentation to your database anyway.
Where to store the list of currencies/countries/languages?
We said that you can use codes as primary and foreign keys. But of which table is it a primary key? This also seems to be a source of some confusion.
Many textbooks and courses on database modeling introduce an idea of lookup tables. Lookup table is a small table that contains just a small number of rows; a lookup table changes rarely, or even never; there is a very small number of columns in the lookup table. The main purpose of the lookup table is to maintain a primary key that other tables would refer to.
If we decide to implement a lookup table for currencies, we can create something like this:
CREATE TABLE currencies (
code CHAR(3) NOT NULL PRIMARY KEY,
name VARCHAR(32) NOT NULL
);
Then we’ll manually add a few currencies we care about:
INSERT INTO currencies VALUES (
(‘EUR’, ‘Euro’),
(‘USD’, ‘US Dollar’),
(‘CHF’, ‘Swiss Franc’));
And we’re done. Now we can refer to this table from other tables. Later, as we need to support more currencies, we’ll insert more rows.
This is good in theory, but won’t help much in practice.
In every real business database we need to carefully manage the list of currencies, countries and languages that our company handles. Basically, adding any new language, new currency or new supported country will need a lot of software development activity, legal approvals, and other organizational alignment.
Even if you have this hypothetical “currencies” table and you manage to insert a new row, without preparation, you will probably get into all sorts of trouble.
So, let’s get back to the original question: where do we store the list of countries, currencies and languages?
The answer is probably “some function in your codebase”! Like, there is a function called “list_of_supported_currencies
”, and it returns something like this (in pseudocode):
{ EUR: { name: “Euro”,
minor_units: 2 },
USD: { name: “US Dollar”,
minor_units: 2 },
CHF: { name: “Swiss Franc”,
minor_units: 2 },
}
Other parts of your system use this as a source of truth. (In reality, most probably, there are several such functions, with subtle differences worth a lot of money in possible fines.)
We’re primarily interested in database modeling, so we won’t discuss software development here.
Minimal Modeling representation
From the point of view of Minimal Modeling, currencies, countries and languages are anchors. Using table notation from the book:
With such a table, we have a central place where you could begin learning the database.
We show that the list of countries is stored in the “geo.Countries
” table. The other two anchors are ephemeral, they are “stored” in the source code. To add a new currency, instead of the INSERT statement you have to use a text editor, version control system and deployment tool.
What attributes do those anchors have? Let’s write down a couple of sample attributes for each anchor:
Of course, different attributes could be stored in different databases. Even if the name of the country is stored in the source code, we could store a list of customer support phone numbers in a small database table.
Also, you can imagine some links, such as:
Which Languages are used in the Country;
Which Currencies are used in the Country;
Which Country is the User a citizen of?
What is the primary Language of the Movie?
etc.
Customizing the standard list of codes
Looking at different businesses, we’ll quickly find that the standard lists of codes are not enough to handle some aspects of the business. Here are a few examples:
You may want to consider Bitcoin or Ethereum as a currency in its own right.
Sometimes you have to treat some territories as separate entities that require special handling; even though your table is named “countries”, the dictionary definition of that word becomes somewhat stretched;
Some dating websites treat “C++” and “Klingon” as languages that users speak;
Thankfully, this is not a problem. As we said before, in every real business database we need to carefully manage the list of currencies, countries and languages that our company handles. It means that we can use standard codes from commonly-accepted sources, but we can also introduce our own codes as needed. Some of the international standards even explicitly allow for user-defined extensions: e.g. country codes starting with “x
”.
So, for example, you can choose “xg” as a internal code for Greenland, as opposed to Denmark. If you do international deliveries it probably makes sense to distinguish between those two.
Summary
Countries, languages and currencies virtually always require special treatment.
You can use short well-defined strings as IDs, and as primary and foreign keys.
Small look-up tables are introduced in educational materials, but they may not necessarily be as useful in production.
Countries, languages and currencies are anchors in terms of Minimal Modeling. You can define attributes and links for those anchors too.
Anchors and attributes could be defined in the source code as a sort of ephemeral “lookup table”. Links, too.
One additional important list is the states of the United States.
Firstly, a Foreign Key (FK) can also be based on an UNIQUE Key, aka Alternate Key (AK). I still used a "traditional" PRIMARY KEY (PK) on "lookup" tables for things like countries and currencies while using the character code-based AK as the basis for FKs. This ensured only valid values were used, but avoided a fetch on the "lookup" table to map the PK to the code. It also allowed "housekeeping" mechanisms to still generically refer to rows in those tables, like all the others, via a tuple { <tablename>, <PK#> }.
Secondly, while the ISO codes for country and currency may be point-in-time unique, **they are not guaranteed to be historically unique**. Unless the standards have changed since I last looked at them, each had provisions (para 6.4.1 in my copy of ISO 3166 for countries, para A.5.2.3 in my copy of ISO 4217 for currencies) against reuse within 5 years. Interestingly, the former says "... should not be used ...", the latter "... will reserve ...".
When Brazil's currency was volatile in the late 1980s (https://en.wikipedia.org/wiki/Brazilian_currency), they went from Cruzados (BRC) to New Cruzados (BRN) to Cruzeiros in just a few years, for which they suggested returning to BRC. Fortunately, the business manager with whom I worked was the Currency Code "gatekeeper" at the time, and knew of my concerns about code reuse. He promised it wouldn't happen on his watch, and his suggestion of BRE for Cruzeiros was accepted by the Brazilians.