Best ideas from the first 25 posts
There were 25 posts in this substack so far. I’m now re-reading all of them and I’d like to share a list of ideas I myself find most useful. Some of them changed the way I now think about some data modeling problems; some of them just feel elegant; some of them are quite counterintuitive or go against tradition.
In no particular order:
1/ Generalized left join is the first thing that comes to mind. A table is a trivial case of left join. Inner join is a special case of left join. “Many faces of left join” (Apr 2021).
2/ Designing comprehensive validity constraints seems to be a losing game. Begin from the list of data types, analyze all kinds of required validity constraints, eventually you see that preserving certain invariants becomes unexpectedly hard. E.g.: “Persisting either/or data” (Mar 2021).
3/ “First normal form is not concerned about values, only about types” (“Making sense of 1NF, pt. II”, June 2021);
4/ “Concatenability principle” (Jan 2021);
5/ Classification of data types to be modelled: “Structural and pure attributes” (Feb 2021);
6/ There are only three ways to store attributes: main table, side table, and single-attribute table (“How else to store either/or data”, Mar 2021); those three representations are closely tied by table views;
7/ Final design should be the same no matter what the order of implementing requirements was. (“Requirements change: migrating either/or data”, Mar 2021);
8/ Some table designs look like “data duplication”, but they are not (Ibid.);
9/ “Erasure in database modeling” (Mar 2021);
10/ Physical database modeling is a game (Mar 2021), and you can make the right move for the wrong reason;
11/ SQL turns out to be higher level than I expected: “Many faces of left join” (Apr 2021);
12/ Multi-part SQL queries should be used more widely (Ibid.);
13/ It seems that constant values in a table column is a bit of an anti-pattern (from the minimality point of view) (“UNION ALL, pt. II: polymorphic data”, May 2021);
14/ Table indexes are derived data (“Propagation delay”, May 2021);
15/ Studying JSON representation helps overcome biases of relational model;
16/ Representing either/or data as the following JSON with a single top-level key:
{“has_no_symptoms”: {}}
or
{“has_symptoms”: {“symptoms”: [“cold”, “coughing”]}}
is extensible, and encourages to write correct code. E.g.: “Database constraints, pt. II” (Jan 2021);
17/ If you run a substack, keeping the weekly cadence helps.
Thank you for your interest. God willing, there would be another 25 issues this year, expanding on some of those topics and investigating some other directions. See “Table of contents” for some future topics.