Path dependence in relational design

Jul 2, 2024

Path dependence may be an important concern in relational table design.

7 Comments

Jul 3, 2024

I'm not sure if I really understood the example; both the primary key and a (missing, but certainly will be required) created_at timestamp would be excellent candidates for sorting of comments; Using a integer seq_num field is actually the worst possible implementation I can think of, as it will most certainly require reading a previous record to determine the value on every insert, and exposes sorting to applicational bugs (concurrency issues when inserting multiple comments at once, ensure the value is not negative, etc).

Expand full comment

Reply (3)

Alexey Makhotkin

Jul 6, 2024

I was thinking about this a bit more (inspired by a comment on lobsters), and I have a better example: take Github. There are repositories, and there are issues. Issue numbers are sequential, per-repository. Existing issues should never be renumbered. I hope that this use case is more persuasive.

Expand full comment

Reply (1)

Jul 17, 2024

sorry, just saw your comment today. There is no use for a surrogate key on the specifics of your example, but you are in a good path - a key that allows to link to a specific issue within a repository is a good example of a surrogate key (and far better than the one you have now). cheers!

Expand full comment

Alexey Makhotkin

Jul 3, 2024

> Using a integer seq_num field is actually the worst possible implementation I can think of, as it will most certainly require reading a previous record to determine the value on every insert

I'm not sure, why? "Worst possible implementation" — worse than what? I mention this in text: "we do not discuss how to generate seq_num when inserting a new comment: this is very much database-dependent.", but it just seems to be very straightforward:

Create a field issues.seq_num_value with default value 0. To insert a comment: begin transaction; update issues set seq_num_value = seq_num_value + 1 where id = ?; insert comment, reading the value of seq_num from that table; commit transaction.

The update statement will establish locking, so simultaneous inserts would wait for each other, and the value would be nicely incrementing. This is very lightweight transaction, it even locks per issue, so concurrent comments for different issues won't interfere...

Expand full comment

Reply (1)

Jul 4, 2024

> worse than what?

Worse than using a numeric primary key or a timestamp, as I mentioned.

> we do not discuss how to generate seq_num

But on an implementation, that is the meat of it, isn't it? The rest are academic details that bear little to no relevance, so the way you generate seq_num is crucial to understand if you have a good design or not.

> this is very much database-dependent

well, that is pretty much debatable, specially considering you're using mysql syntax, and not more generic sql.

>To insert a comment: begin transaction; (...)

By now, you probably realize the code you describe isnt right, and it does not work as expected. Regardless, look at it - you suggest multiple operations to perform that could be solved with one, with both better read and write performance, and less disk space used. Tell me again why this is a good idea?

> The update statement will establish locking, so simultaneous inserts would wait for each other, and the value would be nicely incrementing.

Oh boy. This is actually pretty much database dependent, but in most databases no, it doesn't work that way. It depends on the assumed transaction isolation level; as a quick example, mysql uses REPEATABLE READ by default and postgresql uses READ COMMITTED by default, so your transaction would actually have a different behaviour on both; As I understand, on mysql, two concurrent transactions would read the same value (because they are non-blocking) and insert based on the increment of that value, leading to duplicate seq_num. That's why pushing the actual complex part "under the rug" is not a good practice, and why I think this is not a great example for surrogate keys; This abstract schema design is the easy part - the complex part is understanding the tradeoffs of your design and how they impact your application.

Expand full comment

Alexey Makhotkin

Jul 3, 2024

hey,

it's not about sorting! It's about having a stable number for each comment, so that you could say "In comment number 5 you said this bug was fixed", etc. And you want that "comment number 5" to still be valid even if comment number four was deleted.

Expand full comment

Reply (1)

Jul 3, 2024

Yah, but thats what primary keys are for :) I understand now what you're trying to explain (a use case of surrogate keys), but I still think the example is not good, as it still presents all the problems above; however, I can't suggest you a better one that isn't overly complex for the goal at hand :)

Expand full comment

Minimal Modeling

Path dependence in relational design