Modeling legal guardianship
I think I found a good example for dependent and pseudo-dependent data. There is something there that I can’t put my finger on, so I’m going to work through an example that seems to be relevant.
Let’s model parents and legal guardians. Different countries have different laws around it, but I hope that the basic model is very common. Our business requirements are:
Children have parents. Specifically, parents as recorded in child’s birth certificate.
Sometimes children need legal guardians, because parents are either dead, or lost custody of the child. (Non-children also need legal guardians sometimes, that simplifies thinking).
For simplicity, let’s assume that legal guardians are physical persons, such as some other relative, or members of a foster family.
There is a lot of additional complexity (I hate this word), such as the age of majority, but we’ll ignore this because it’s not relevant to the main topic.
The main topic will be explained in the “Table data” section.
Logical model
Of course, we have a Person anchor, both for children and adults.
Also, we have two links, one for parents and another one for legal guardians:
(This is written in the Minimal Modeling notation. The “=” sign means M:N cardinality. Sentences are formalized, they must include anchor names, and the words “several” or “once” to clarify cardinality.)
That seems to be the entire logical model!
Physical model
The physical model is very straightforward. We have a central “persons
” table, but we’re not interested in any of its fields, such as name, date of birth, etc. We’re only interested in ID, that could be just an integer.
CREATE TABLE persons (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
…
);
Both links have M:N cardinality, so we must use a separate two-column table for each.
CREATE TABLE parents (
parent_id INTEGER NOT NULL,
child_id INTEGER NOT NULL,
PRIMARY KEY (parent_id, child_id),
KEY (child_id)
);
CREATE TABLE legal_guardians (
guardian_id INTEGER NOT NULL,
dependent_id INTEGER NOT NULL,
PRIMARY KEY (guardian_id, dependent_id),
KEY (dependent_id)
);
No surprises here either.
Table data
The main query we would be interested in is: “Who are the legal guardians of that person?”
Let’s look at the most common case, a child (id=10) having parents (ID=2 and ID=3). Which rows do we insert into which tables?
Recording parent relationship is easy:
| Table: parents |
| parent_id | child_id |
| 2 | 10 |
| 3 | 10 |
Q: Do we need to insert anything into the “legal_guardians
” table?
Let’s look at the second most common case: suppose that parents died, and the grandparents (ID=5 and ID=6) were assigned as legal guardians. Here the parenting record did not change, but now we are very certain that we insert two rows into the “legal_guardians
” table:
| Table: legal_guardians |
| guardian_id | dependent_id |
| 5 | 10 |
| 6 | 10 |
Again, what does the main query look like? If legal guardians would always be assigned, the query would have been simply: “SELECT guardian_id FROM legal_guardians WHERE dependent_id = 10
”.
Otherwise, we’d have to write some sort of UNION query that combines both data sources.
At this point we can just go ahead and say that we must insert a copy of information about the parents into the “legal_guardians
” table. Then, if they die, we remove the records about them, and insert the records about the appointed legal guardians.
Counter-proposals
Here is where I struggle: I have a feeling that some people would argue against this, and insist that only appointed legal guardians should be recorded as such.
In other words, if we have a million children in the database, we’d expect to have around two million rows in the “parents” table. But if 1% of children have legal guardians, then we’d maybe expect to only have around 10000 rows in the “legal_guardians
” table.
There would be, of course, a separate set of tables that records a paper trail of such decisions. Like, we must store the record about a court session that appointed a legal guardian. There will be ~10000 such rows. That is a separate modeling question (but it gives us a hint on why we may have a different answer to the main question).
Here is another thing that in my opinion adds to the confusion. How do we answer the question “who is responsible for this child?”, if we ignore how it is stored in the database for a minute? We say something like “their parents, unless they are not there, and in that case — court-appointed legal guardians”.
This example shows that we think about parents and legal guardians as separate entities. Do we let this mental model affect the physical model of our tables? That’s the main question.
Why do we bother? I need to find an unambiguous way to teach this to pople new to data modeling. Lots of people enjoy providing the “it depends” answer to many questions, but I believe that we must go beyond that.
A different logical modeling approach: either/or/or
We could go back to the human-readable definition of the problem, and see that it can be modelled in a different way! Specifically, the part about legal guardians (the parents part is I think uncontroversial).
For that, let’s also remember that there are quite a lot of people who are simply responsible for themselves. So, a person could be:
Either responsible for themselves;
Or, having parents responsible for them (NB: this is not strictly tied to the age, because there is this “emancipation of minors” concept);
Or, having appointed legal guardians.
The direct logical model for that requires:
An attribute “who is responsible for this person?”, of type either/or/or;
Parenting link, as described above;
Legal guardians link, as described above, but tangled to the attribute value.
Here is the attribute:
Only if the value of this attribute is “legal_guardians”, do we consult the “legal_guardians
” table.
Conclusion
I wrote this post a few weeks ago and I’ve been thinking about this since then. I feel that I made a logical mistake in the first proposal, because now I clearly bias towards the “either/or/or” solution. I think that the problem in the first proposal is that it tries to deal with two different things (parents and legal guardians) in a somewhat polymorphic way. But I need to find a good explanation for that.
The more I think about all of that, the more it seems that explicit either/or/or approach is more useful for logical modeling than I thought. It may even “dissolve” when it’s implemented in the physical schema, but it still exists conceptually.
I’m going to write another post that directly illustrates either/or/or approach and causal dependencies between data attributes. The use case that I have in mind is tracking bad habits.
Maybe we can even go even further and say that polymorphism only exists in physical schema and does not exist in logical schema? (At least in Minimal Modeling.) If that’s true that would greatly simplify teaching.