A holy grail of data modeling is “making illegal states unrepresentable”. In programming languages theory and practice we want to represent complex data in such a way that it’s impossible to use the data incorrectly. Recently, modern programming languages such as Rust have popularized this approach.
I’d like to present the following challenge: build a relational data model that implements a real-world problem in a way that it’s impossible to store contradictory data.
First, a real-world problem:
Suppose that we want to build an app for tracking habits that we want to avoid, for example takeaway coffee consumption. Every day the user of this app needs to record each time they had takeaway coffee: time of the day, type of coffee, price. If the user managed to avoid coffee all day, they can confirm that by clicking the “coffee-free day” button.
We want to maintain awareness, so we require that the user communicates with the app every day. If the user did not use the app the entire day, we need to distinguish that.
Summarizing, here are three possibilities for each day:
user had takeaway coffee, or several: here is the list;
user had a coffee-free day;
user avoided our app.
Conditions of the challenge
You need to submit a relational database schema that implements the problem described above. This schema must prevent incorrect combinations of data in different tables (see below for examples). The most convenient way would probably be something like https://www.db-fiddle.com/.
The goal is to learn something about relational modeling. It’s possible that somebody, maybe you, would instantly present a solution and it will be obvious in retrospect. That would be the best outcome for me personally, because I have no idea how to solve it. Moreover, I conjecture that it’s impossible within the classic relational model framework. But I may be wrong.
You need to use a classic relational model, not violating the first normal form. You can use standard database constraints.
Using JSON-typed and similar column data types does not count, because it’s trivial to implement the solution in JSON (left as an exercise for the reader).
Using array-typed columns is dubious: it’s okay if arrays contain simple types such as numbers or strings, but not okay if they contain user-defined record types, or something like that. Basically, you can’t smuggle in JSON, because it would violate the first normal form requirement. If you think that arrays also violate 1NF you’re welcome not to use them too.
You can’t use database triggers. Implementing this with triggers is tedious but straightforward (left as an exercise for the reader).
Your solution should work in more or less any reasonable DBMS. It’s okay to submit solutions that rely on specific features of some DBMS, if those features are not JSON or triggers in disguise.
Acceptable (and welcome) solution is a proof that this is in fact impossible, and an explanation why.
It’s probably possible to somehow work around those requirements on some technicality, but this may or may not be an interesting solution. This challenge is not a trick question, I’m looking for algebraically sound responses here, that could be taught to people.
I’m eagerly waiting for what the community has to say on this.
P.S.: textbook schema is non-compliant
Here is an explanation of why the textbook schema is not compliant. This is trivially modelled like this:
CREATE TABLE coffee_free_days (
day DATE NOT NULL PRIMARY KEY
);
CREATE TABLE coffees (
id INTEGER NOT NULL PRIMARY KEY,
day DATE NOT NULL,
daytime VARCHAR(5) NOT NULL,
what VARCHAR(32) NOT NULL,
price DECIMAL(5, 2) NOT NULL
);
If there is a row in the “coffee_free_days
” table, that means that the user had a coffee-free day. In that case, there must be no rows in the “coffees
” table”.
If the user had some coffees, we insert rows into the “coffees
” table. In that case, there must be no row in the “coffee_free_days
” table.
In this textbook implementation it’s possible to create contradictory data:
insert a row into “
coffee_free_days
” with the value (“2025-02-01”
);insert a row into “
coffees
” with values (1234, “2025-02-01”, “13:38”, “Spanish latte”, 8.50
);
There is nothing in this schema that prevents this incorrect combination of rows.
The challenge is to prevent any possible contradiction, using features of the relational model as implemented by commonly available systems.
I’m interested to hear what you have to say on this.
A couple of thoughts, overlapping other comments somewhat:
1. CROSS-TABLE CONSTRAINTS NOT WIDELY SUPPORTED
You want an entry in either 'coffee_free_days' or 'coffees' for a given day, but not both. Enforcing that as part of the schema definition requires a CONSTRAINT across both tables. Alas, the only cross-table CONSTRAINT generally-supported is a FOREIGN KEY, which can't do what you're seeking.
2. TRYING TO IMPLEMENT A SUBTYPE
In this case, you have an entity of 'day', and you want 3 specializations of it:
- implicitly no coffee drunk: nothing recorded for the day.
- explicitly no coffee drunk: "coffee-free day" button clicked.
- one or more 'coffees' recorded for the day.
Relational model doesn't support the concept of subtype. We might get something close in this case with:
CREATE TABLE days(
day DATE NOT NULL PRIMARY KEY ) ;
CREATE TABLE coffees(
id INTEGER NOT NULL PRIMARY KEY,
day DATE NOT NULL REFERENCES days( day ),
time TIME NOT NULL,
what VARCHAR( 32 ) NOT NULL,
price DECIMAL( 5,2 ) NOT NULL ) ;
This changes a couple of semantics:
- 'days' records days when either "coffee-free day" was pressed *or* the first linked "coffees" entry for the day was entered, and
- a 'days' entry *without* linked 'coffees' must have been recorded as a "coffee-free day".
This would required the code to check for the following errors:
- "coffee-free day" is pressed when a "days" entry already exists linked to a "coffees" entry
- a coffee is entered but a "days" entry already exists *without* a linked "coffees" entry implying it was previously recorded as a "coffee-free day".
Actually, I have many more thoughts, but that would be a much longer conversation.
I think this would be implementable using a single table (status_updates) using PostgreSQL exclusion constraints - not sure if you would consider them "trigger-like" or not.
Hopefully this comes through with Substack's formatting; the hash is postgres xor (using btree_gist or gin?), the meaning should be "on a single day, you can only have rows which all have equal is_coffee_free flag (all true or all false)".
`ALTER TABLE status_updates ADD CONSTRAINT either_coffee_or_coffee_free EXCLUDE USING gist (date WITH =, is_coffee_free WITH #);`
You can augment this using a second partial unique index, `CREATE UNIQUE INDEX ON status_updates (date) WHERE (is_coffee_free);`, to ensure there can only ever be a single "coffee-free" check-in.
Haven't tried this specific exclusion constraint, but I've modelled similar problem shapes in a similar way, so it should work?
Edit: Coffee details could either be a nullable FK pointing to 'coffee_details', or a set of nullable columns enforced to be filled out using a similar type of a constraint (`is_coffee_free or (what is not null and price is not null)`)