The course: How to make sense of typical databases

Jan 18, 2022

TL;DR: I’m going to organize a Zoom course on how to make sense of typical databases, relational and not.

Dear all,

I spent 2021 writing a weekly newsletter on various topics of database modeling, more than 40 thousand words total. But why?

I started working with databases maybe in 1996. I remember the MS Access database that I tried to make sense of. After spending a day trying to write a report query against it, I brought the first results to my boss. He looked at it for a few seconds and said: “This is bullshit. This camera costs $5000. There is one in my office, and there are maybe a couple more in the entire country. Your report says 37 were sold this month, it’s impossible.”

Later I designed a number of databases and worked with many existing ones. At some point they all start to look eerily similar:

Too many tables;
No documentation;
Historical “growth rings” of changing approaches to data modeling;
Violations of relational normal forms and sometimes of common sense;
Interesting* and clever* approaches to storing data.

I am now ready to pass along the experience that I accumulated over all those years. How do you find your way around a database that you see for the first time? How do you make sense of the tables and columns that your stakeholders are interested in? How do you change the table schema with minimal risk? We paved the road with good intentions: why does it lead to hell?

Who is this course for?

Software developers, business analysts and data scientists, from junior to mid-level. If you recently joined a company with a huge database (or many databases), and you’re overwhelmed — this course is probably for you.

Prerequisites: you need to have some understanding of database tables and basic SQL. Knowledge of relational algebra is NOT required. It doesn’t matter which database server you use (MySQL, Oracle, MongoDB, cloud solutions, etc., etc.).

When and for how long will the course be running?

I want to run eight weekly Zoom sessions, two hours long each. The first hour is a lecture; the second hour is Q&A and free-form discussion. The second hour may turn out to be even more useful: we can discuss your specific cases, and you can get advice from me or from other people who have been in a similar situation.

This is a typical cohort-based course. No recordings, not self-paced, live only.

The course is projected to begin in April 2022, God willing, and will last for two months.

Course content

First, we’ll talk about stakeholders and their concerns. Then we’ll introduce a simple logical representation of data, and will use it as a foundation. Then we’ll discuss how logical representation turns into physical tables, and what drives the design of physical tables. Then we’ll introduce the idea of a data catalog, and will use it to discuss documenting; in particular, the difference between narrative and catalog. Finally, we’ll talk about extracting logical representation from the physical schema.

There will be two homework assignments. You will get a course cheatsheet at the end, to help you summarize what you’ve learned.

History of the course

Back in Fall 2020 I did a test run of this course. A dozen people from a certain tiny social network were the first listeners. Responses were positive, there was a lot of feedback, and the second-hour discussions were amazing.

Let’s get the party going.

Price

The price for the course is going to be $1000, and you can bring a plus one with you.

If you don’t get any value from the course, you’ll get your money back, no questions asked.

What next?

If you are interested, or if you have any questions or comments: hit Reply, and send me an email. I’m going to respond to your questions personally. You can also leave comments on Substack.

If you know someone who could be interested — forward this email to them.

P. S.: We'll return to our regularly scheduled newsletters next week. There are two topics pending from the previous year: the final post about database migrations, and the second post about flexible database schemas. Also, the “best ideas from the second half of 2021” post.

Minimal Modeling
