Handling time with supersecond resolution
What is this substack about? Here are the highlights from the first 25 issues.
A commonly accepted wisdom in physical database modeling is that timestamps should be stored as UTC time. We completely agree with that. In this post though we’d like to discuss various interesting use-cases of time with the resolution of more than a second: minutes, hours, days, weeks, months, quarters and years.
Conferences, concerts, etc.
Events that happen in a physical space begin at a certain local time. At first glance, we could store the start time in a local timezone: it’s the most important information for people who are going to physically attend, even if they are currently in a different timezone and they need to plan their travel.
However, if there is a corresponding virtual event (live translation of a keynote speech) then we need to store the time in UTC (with the minute accuracy). If we know the visitor’s preferred timezone then we could automatically convert the time to their timezone (or just show both UTC and preferred time, to prevent all kinds of confusion).
For mixed-type events the most straightforward way to store the data would be both in UTC time (for the virtual part) and local time (for the physical part). This would be clear and unambiguous for the event organizers, but it may be considered a data duplication. To prevent that we could:
store the time in UTC;
store the timezone of a physical space where the event happens;
in the content management tool clearly show both the local time and the UTC time, so that the operator can double-check the time.
Showing the time in two (or more) time zones simultaneously could be a useful tool to make UX more reliable and prevent confusion in some scenarios.
Opening times of a restaurant, shop etc.
Suppose that we want to store the opening times of a shop. It is open from 10:00 to 18:00 every day except Sunday when it is closed. (Let’s not get into the question of how to model this: handling 24 hours opening time, seasonal changes, public holidays, overlapping midnight, etc., etc. We now focus on what “10:00” means.)
This time is a local time, and it seems that this is one of the cases where we must store it as a local time. The shop is going to open at 10:00 no matter if the daylight savings time is in force. So, the UTC time is going to change (and that’s the whole idea). Also, we don’t even have the date part here, only the time part!
I’ve seen hopelessly confused people trying to apply the “everything must be stored in UTC” advice to this use-case, and failing. It’s completely OK to store local time (in some cases)!
Laws and regulations often have a start date, e.g. “January 1st, 2020”. To activate the necessary branch of code, we would use the midnight of a start date, e.g. “2020-01-01 00:00:00”. This timestamp is actually in a local timezone that corresponds to the country where this regulation goes into force.
For example, if a German company hosts their e-commerce website in New York, the new code must be activated at “2019-12-31 18:00:00” (Eastern Time), because it is 6 hours away from the Central European Time.
At the same time, to explain to the user why, for example, the tax rate changed, we must tell them “Since 2020-01-01, tax rate in Germany changed”, and we must use the date in the German time zone (even though it’s still December 31 in New York).
If we need to store things like tax rates in a database table then we need to consider this aspect. (General handling of time in the context of e.g. contracts and obligations is a much more complicated topic that we won’t even mention here.)
Quarterly and monthly reporting
Suppose that we have a constant stream of purchases made by the website users. Purchases are stored in the appropriate table that has some kind of a “created_at” timestamp with a second-resolution (stored in UTC, of course!). Now we want to produce typical business reports: quarterly and monthly sales numbers. This is a seemingly trivial SQL operation:
SELECT SUM(price) FROM purchases WHERE created_at >= ‘2020-01-01 00:00:00’ AND created_at < ‘2020-04-01 00:00:00’
This query should show the total sales in 2020Q1, right? But in which timezone are the timestamps? When exactly does the quarter begin and end?
One possible solution is to agree on the reference timezone, so that people in the company offices around the world would have the same numbers when querying the same database. For example, this could be the time zone of a city where the company headquarters resides.
Speaking from experience, lack of a common reference point can cause long annoying query debugging sessions.
Representing quarters, months and weeks
Often we may want to maintain a pre-aggregates tables to quickly present variously aggregated data. The typical aggregation cadences are years, quarters, months and weeks. So for example we have a “quarterly_sales” table with the following two columns: “quarter” and “amount”.
The question is how to encode the quarter (e.g., 2019Q2) in the database table? Which data type to use?
How to encode the number of the week (e.g., week 35 of 2018)? The month (e.g. January 2016)?
One possible suggestion is to use an integer representation. Then 2019Q2 would be represented as the number 20192. Week 35 of 2018 would be represented as 201835. January 2016 — as 201601. Of course, if you have yearly pre-aggregated data, you can just use the year number.
This encoding is simple, efficient and human-readable.
After this more practical intermission we’ll return next week to our regularly scheduled “Data migrations” series.