A couple of thoughts, overlapping other comments somewhat:
1. CROSS-TABLE CONSTRAINTS NOT WIDELY SUPPORTED
You want an entry in either 'coffee_free_days' or 'coffees' for a given day, but not both. Enforcing that as part of the schema definition requires a CONSTRAINT across both tables. Alas, the only cross-table CONSTRAINT generally-supported is a FOREIGN KEY, which can't do what you're seeking.
2. TRYING TO IMPLEMENT A SUBTYPE
In this case, you have an entity of 'day', and you want 3 specializations of it:
- implicitly no coffee drunk: nothing recorded for the day.
- explicitly no coffee drunk: "coffee-free day" button clicked.
- one or more 'coffees' recorded for the day.
Relational model doesn't support the concept of subtype. We might get something close in this case with:
CREATE TABLE days(
day DATE NOT NULL PRIMARY KEY ) ;
CREATE TABLE coffees(
id INTEGER NOT NULL PRIMARY KEY,
day DATE NOT NULL REFERENCES days( day ),
time TIME NOT NULL,
what VARCHAR( 32 ) NOT NULL,
price DECIMAL( 5,2 ) NOT NULL ) ;
This changes a couple of semantics:
- 'days' records days when either "coffee-free day" was pressed *or* the first linked "coffees" entry for the day was entered, and
- a 'days' entry *without* linked 'coffees' must have been recorded as a "coffee-free day".
This would required the code to check for the following errors:
- "coffee-free day" is pressed when a "days" entry already exists linked to a "coffees" entry
- a coffee is entered but a "days" entry already exists *without* a linked "coffees" entry implying it was previously recorded as a "coffee-free day".
Actually, I have many more thoughts, but that would be a much longer conversation.
I think this would be implementable using a single table (status_updates) using PostgreSQL exclusion constraints - not sure if you would consider them "trigger-like" or not.
Hopefully this comes through with Substack's formatting; the hash is postgres xor (using btree_gist or gin?), the meaning should be "on a single day, you can only have rows which all have equal is_coffee_free flag (all true or all false)".
`ALTER TABLE status_updates ADD CONSTRAINT either_coffee_or_coffee_free EXCLUDE USING gist (date WITH =, is_coffee_free WITH #);`
You can augment this using a second partial unique index, `CREATE UNIQUE INDEX ON status_updates (date) WHERE (is_coffee_free);`, to ensure there can only ever be a single "coffee-free" check-in.
Haven't tried this specific exclusion constraint, but I've modelled similar problem shapes in a similar way, so it should work?
Edit: Coffee details could either be a nullable FK pointing to 'coffee_details', or a set of nullable columns enforced to be filled out using a similar type of a constraint (`is_coffee_free or (what is not null and price is not null)`)
> Coffee details could either be a nullable FK pointing to 'coffee_details', or a set of nullable columns enforced to be filled out using a similar type of a constraint (`is_coffee_free or (what is not null and price is not null)`
Your comment made me realize several things, I think that my challenge is slightly broken.
First, as I was writing the text, I've invented an unholy contraption of three columns, which I think are equivalent to your solution (but I need to confirm this). This contraption is pure I think, it should work, like, everywhere (but I'm not sure). I'll write up about that.
Second, I found a case where this contraption "breaks": you can have a "coffee day", but empty list of coffees. So I said "AHA" and proceeded with publication.
But reading your comment I realized that my statement that "it's trivial in JSON" is also incorrect, it also can have "coffee day but empty list of coffees" situation. So only triggers seem to be able to maintain the correctness.
Now I am even less sure if that's possible to implement, but I'll think about that more. (Update: my case would be trivially solved if we could literally violate 1NF, having a relation as the data value. That's what this "JSON" thing is really about. Nice.)
Also, thinking about your comment made me clearly realize why I think CHECK CONTRAINTS are weirdly underpowered: they could be made much more powerful basically with minimal changes in syntax and database functionality (Update: now I'm not sure). Even Postgres is weirdly limited with check constraints. I've written about this before: https://minimalmodeling.substack.com/p/persisting-eitheror-values-or-the
There were a few technical issues with the exclusion constraint, as neither bool nor xor were supported by btree_gist, but a few casts solved that. What I created doesn't really work with timezones, as we need to save local timestamps and not UTC (getting a date out of a timestamptz is not an "immutable" operation), but that's to be expected, I think.
I did see your attempt - it is definitely very close to standard SQL and I think it is actually portable(?). But aesthetically, I much prefer the "updates" framing (a little throwback to event sourcing?) - and, well, it's features like this, together with RLS, PostGIS, Postgraphile, and a few other nifty things, that made me choose Postgres :)
If I needed to do this in JSON (columns or elsewhere), this kind of mutually exclusive properties is doable in JSONSchema: in a row-per-day schema that you implied in the assignment, the JSON column would look like "{ coffeeFree: true }" or "{ coffees: [{}, {}] }" and be verified by a schema like the one here: https://www.jsonschemavalidator.net/s/lNrna3NI
(edit: Given that from OracleDB on, databases are starting to support JsonSchema, this would also be a viable approach.)
Thinking from a Kimball dimensional design perspective, perhaps we only record “just the facts” of coffee consumption in the Coffees table. Could have a “fact less fact” coverage table with the Days that People could have coffee. A BI tool could then determine days people did not consume coffee. Perhaps use a set function of Coffees vs Available?
I was too fast to answer. But I suspect store the “status” of each person for each day, then make the PK the Date, Person. So no duplicates enforced by the DB?
(Substack does not send comment notifications reliably, sad.)
You can not even bother with persons, it's enough to just have a day as a PK.
> store the “status” of each person for each day
The main problem of "making illegal states unrepresentable" is to find such a schema that it's impossible to have contradictory information.
For example, what if you have a buggy app? Standard relational schema assumes that application is doing correct changes to the database. In practice this is not a problem, of course, the exercise is primarily theoretical.
Relational model really does handle many scenarios for you, even if your application is buggy. But here I think the simplest case that it's not able to handle, and I'm trying to prove or disprove this.
A couple of thoughts, overlapping other comments somewhat:
1. CROSS-TABLE CONSTRAINTS NOT WIDELY SUPPORTED
You want an entry in either 'coffee_free_days' or 'coffees' for a given day, but not both. Enforcing that as part of the schema definition requires a CONSTRAINT across both tables. Alas, the only cross-table CONSTRAINT generally-supported is a FOREIGN KEY, which can't do what you're seeking.
2. TRYING TO IMPLEMENT A SUBTYPE
In this case, you have an entity of 'day', and you want 3 specializations of it:
- implicitly no coffee drunk: nothing recorded for the day.
- explicitly no coffee drunk: "coffee-free day" button clicked.
- one or more 'coffees' recorded for the day.
Relational model doesn't support the concept of subtype. We might get something close in this case with:
CREATE TABLE days(
day DATE NOT NULL PRIMARY KEY ) ;
CREATE TABLE coffees(
id INTEGER NOT NULL PRIMARY KEY,
day DATE NOT NULL REFERENCES days( day ),
time TIME NOT NULL,
what VARCHAR( 32 ) NOT NULL,
price DECIMAL( 5,2 ) NOT NULL ) ;
This changes a couple of semantics:
- 'days' records days when either "coffee-free day" was pressed *or* the first linked "coffees" entry for the day was entered, and
- a 'days' entry *without* linked 'coffees' must have been recorded as a "coffee-free day".
This would required the code to check for the following errors:
- "coffee-free day" is pressed when a "days" entry already exists linked to a "coffees" entry
- a coffee is entered but a "days" entry already exists *without* a linked "coffees" entry implying it was previously recorded as a "coffee-free day".
Actually, I have many more thoughts, but that would be a much longer conversation.
I think this would be implementable using a single table (status_updates) using PostgreSQL exclusion constraints - not sure if you would consider them "trigger-like" or not.
Hopefully this comes through with Substack's formatting; the hash is postgres xor (using btree_gist or gin?), the meaning should be "on a single day, you can only have rows which all have equal is_coffee_free flag (all true or all false)".
`ALTER TABLE status_updates ADD CONSTRAINT either_coffee_or_coffee_free EXCLUDE USING gist (date WITH =, is_coffee_free WITH #);`
You can augment this using a second partial unique index, `CREATE UNIQUE INDEX ON status_updates (date) WHERE (is_coffee_free);`, to ensure there can only ever be a single "coffee-free" check-in.
Haven't tried this specific exclusion constraint, but I've modelled similar problem shapes in a similar way, so it should work?
Edit: Coffee details could either be a nullable FK pointing to 'coffee_details', or a set of nullable columns enforced to be filled out using a similar type of a constraint (`is_coffee_free or (what is not null and price is not null)`)
> Coffee details could either be a nullable FK pointing to 'coffee_details', or a set of nullable columns enforced to be filled out using a similar type of a constraint (`is_coffee_free or (what is not null and price is not null)`
ahhh interesting! Let me think.
So basically you can have an FK in `coffees` that refers to `coffee_days.coffee_day` (nullable unique constraint). https://www.db-fiddle.com/f/eY7h6H8RmLtdKNz3oPpfFk/0 right?
> EXCLUDE USING gist
wow interesting, I'm going to read about that.
Your comment made me realize several things, I think that my challenge is slightly broken.
First, as I was writing the text, I've invented an unholy contraption of three columns, which I think are equivalent to your solution (but I need to confirm this). This contraption is pure I think, it should work, like, everywhere (but I'm not sure). I'll write up about that.
Update: here is the link: https://www.db-fiddle.com/f/eY7h6H8RmLtdKNz3oPpfFk/0.
Second, I found a case where this contraption "breaks": you can have a "coffee day", but empty list of coffees. So I said "AHA" and proceeded with publication.
But reading your comment I realized that my statement that "it's trivial in JSON" is also incorrect, it also can have "coffee day but empty list of coffees" situation. So only triggers seem to be able to maintain the correctness.
Now I am even less sure if that's possible to implement, but I'll think about that more. (Update: my case would be trivially solved if we could literally violate 1NF, having a relation as the data value. That's what this "JSON" thing is really about. Nice.)
Also, thinking about your comment made me clearly realize why I think CHECK CONTRAINTS are weirdly underpowered: they could be made much more powerful basically with minimal changes in syntax and database functionality (Update: now I'm not sure). Even Postgres is weirdly limited with check constraints. I've written about this before: https://minimalmodeling.substack.com/p/persisting-eitheror-values-or-the
Thank you!
I've spent more time than I would've liked on this but oh, well :)
This is what I'd envisioned: https://onecompiler.com/postgresql/438293d2y (or https://dbfiddle.uk/Srp9DGpa as a backup), as db-fiddle.com doesn't seem to support psql extensions and I didn't find the alternative that I'd used in the past.
There were a few technical issues with the exclusion constraint, as neither bool nor xor were supported by btree_gist, but a few casts solved that. What I created doesn't really work with timezones, as we need to save local timestamps and not UTC (getting a date out of a timestamptz is not an "immutable" operation), but that's to be expected, I think.
I did see your attempt - it is definitely very close to standard SQL and I think it is actually portable(?). But aesthetically, I much prefer the "updates" framing (a little throwback to event sourcing?) - and, well, it's features like this, together with RLS, PostGIS, Postgraphile, and a few other nifty things, that made me choose Postgres :)
If I needed to do this in JSON (columns or elsewhere), this kind of mutually exclusive properties is doable in JSONSchema: in a row-per-day schema that you implied in the assignment, the JSON column would look like "{ coffeeFree: true }" or "{ coffees: [{}, {}] }" and be verified by a schema like the one here: https://www.jsonschemavalidator.net/s/lNrna3NI
(edit: Given that from OracleDB on, databases are starting to support JsonSchema, this would also be a viable approach.)
Thanks a lot! I'm going to take some time to read up on EXCLUDE and try to wrap my head around this application.
Super interesting!
And a quick analytic view of user activity over the updates table: https://dbfiddle.uk/_z-uJgxf
Thinking from a Kimball dimensional design perspective, perhaps we only record “just the facts” of coffee consumption in the Coffees table. Could have a “fact less fact” coverage table with the Days that People could have coffee. A BI tool could then determine days people did not consume coffee. Perhaps use a set function of Coffees vs Available?
> A BI tool could then determine days people did not consume coffee
how to distinguish here between days without coffee and days without an app?
I was too fast to answer. But I suspect store the “status” of each person for each day, then make the PK the Date, Person. So no duplicates enforced by the DB?
Hey,
(Substack does not send comment notifications reliably, sad.)
You can not even bother with persons, it's enough to just have a day as a PK.
> store the “status” of each person for each day
The main problem of "making illegal states unrepresentable" is to find such a schema that it's impossible to have contradictory information.
For example, what if you have a buggy app? Standard relational schema assumes that application is doing correct changes to the database. In practice this is not a problem, of course, the exercise is primarily theoretical.
Relational model really does handle many scenarios for you, even if your application is buggy. But here I think the simplest case that it's not able to handle, and I'm trying to prove or disprove this.