Book update
I continue working on the “Database Design” book, based on the Minimal Modeling approach. First, the book now has its own website:
“Database Design using Minimal Modeling”: a course
Second, during the past month we did a test run of the book material in the form of a Zoom course. ~20 people from the local data analytics community took part in three sessions so far (one more to go).
Here is an outline of what was discussed in each of the two-hour sessions (a lot of that material is going to be in the book). At the end of this post I also briefly discuss what I’ve learned from this.
[Day 1] Logical model, pt. I
1/ As an input, we have a free-form text listing the requirements. Based on that, we would write down three tables as output: anchors, attributes and links. Later on, we’ll be using them to create the structure of the database tables.
2/ Example: a personal library use-case discussion.
3/ Anchors and attributes. We use formalized sentences to describe and validate both anchors and attributes.
4/ We discuss logical data types for attributes. Later on they would be translated to physical data types for the chosen database.
5/ Logical modeling demonstration. We collaboratively build a logical model for a simple but complete clone of Google Drive.
6/ Homework: based on the templates provided, do the logical modeling of your own problem.
[Day 2] Primary and secondary data
7/ Difference between primary and secondary data.
8/ Types of secondary data: cached fields, pre-aggregated tables, flat tables, DWH, ML models, etc.
9/ Secondary data is not modelled like the primary data. However, we will learn how to create a useful catalog of secondary data that would help us to organize work.
10/ Secondary data is needed only for speed or convenience. It is always ad-hoc, in a sense.
Logical model, pt. II
11/ How do I know if my logical model is correct? Basically this is a question of “when do I stop?”. Logical model is incremental, and designed for evolution, so you can stop when you’ve covered the current project scope.
12/ What to do if my logical model turns out insufficient? The logical model is designed for evolution, so this is business as usual. If something needs to change, we update the model and migrate the database.
13/ How the logical model helps manage project risks.
Logical model: links in depth
14/ How to document links. We discuss the table of links.
15/ Discussion of cardinality, using formalized sentences. Cardinality is one of more challenging aspects for logical designs. At the same time, fixing wrong cardinality is hard in the running system, so we make sure to get it right from the first time
16/ Why do you need formalized sentences? As homework results show, writing down the sentences immediately catches many mistakes. Also, for non-obvious use cases it becomes clear where additional human-readable explanation is needed.
17/ Links as fragments of business processes.
Logical model diagrams
18/ Contrasting traditional table-based diagrams and logical model diagrams based on links
19/ Use case: franchise fees management. We demonstrate how to handle a substantial change of business structure. In the example, we change from single-tier franchise management and introduce regional management companies. We discuss how the logical model reflects this change, and what it would mean for the physical tables.
20/ Why having only diagrams is not enough? Another illustration on how formalized sentences contribute to understanding.
[Day 3] Designing tables
21/ How do I know if my physical schema is correct? This seems to be one of the questions that people are anxious about the most.
22/ Game of Tables: rules and possible strategies. Social aspects and technical aspects.
A classic strategy: table per anchor (aka 3NF).
23/ A complete step-by-step guide to table design using the classic strategy, straight from the textbook. This is where the promise of the course comes true.
24/ Example: a personal library use-case. We did a complete walk-through to write down all the necessary CREATE TABLE definitions for our small example.
25/ Table evolution: what to do as the system grows. We discuss CREATE TABLE and ALTER TABLE further.
26/ Why do real-world databases get so messy? We go back to the Game of Tables and why you can’t not play.
27/ Logical modeling of “events” table
28/ Logical modeling of historized attributes
[Day 4] Secondary representations
TBA
Extracting the logical model from existing database
TBA
What have I learned from the course so far?
It seems that the material generally works in the course setting. None of the people who participated have read the book draft, they learned everything from my explanations. We spent around 10 hours in total (live + several extra videos).
My hypothesis is that people’s main barrier lies in logical modeling, and not in table design. Table design is relatively straightforward, given that you decided on the strategy of Game of Tables. When you have a clear logical model, building tables becomes even somewhat boring.
The logical model, on the other hand, asks you hard questions that only you can have an answer to:
Can an employee have multiple managers?
What do we do when the employee needs to help out in a different branch for a week?
Can we have a different sales tax based on the type of an item?
Etc., etc.
You write down all the answers to such questions, and when the logical model finally makes sense, you can proceed with the table design.
I’m going to share some feedback from the listeners after the last session is over.
What’s next?
The book now has its own website: https://databasedesignbook.com/, and its own newsletter.
One unfortunate thing is that writing about the minimal modeling process requires a lot of tables. Substack does not handle tables; you can use screenshots but they are a bit awkward. I’m going to be posting a lot of material with tables, and that means that I must find some other CMS for such posts. Maybe I’ll have to use WordPress again.
This substack will continue with the usual assortment of topics and investigations.
At the https://databasedesignbook.com/ you’ll also find a nice table of contents for the future book. The current goal is to release the book in spring 2024.