Each data item (e.g. a DataFrame) is simply appended to a data file,
and an IndexEntry is appended to the corresponding index file. Each
IndexEntry contains the timestamp (i.e. the key) of the data item,
its offset into the data file, the length of the data entry, a CRC
of the data entry, and a CRC of the IndexEntry itself. It also
contains flags that indicate whether the corresponding data is
compressed and, if so, how.
The CRCs in the index entry give us an atomicity guarantee: if
they are not both present and correct, we treat the entry as if it
never existed.
In dictionary compression mode, the index file may be padded with
zeros (i.e. empty index entries). Thus empty index entries are
not considered to be corrupt, but we ignore such entries as they
do not point to any data.
Data and index files are append-only: existing contents are never
modified, and files are only ever removed as a whole.
Data and index files are sharded by SHARD_TIME, i.e. any one file
only contains data or index entries whose timestamps fall in the
same SHARD_TIME-wide interval (the same value of timestamp /
SHARD_TIME, rounded down). This allows old data and index files to
be cleaned up by just unlinking the files.
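
A minimal sketch of shard selection and cleanup follows. It assumes
that a shard covers one contiguous SHARD_TIME-wide interval of
timestamps. The SHARD_TIME value, the file naming scheme and the
helper names are hypothetical.

```python
import os

SHARD_TIME = 3600  # hypothetical shard width, in timestamp units


def shard_id(timestamp):
    """Timestamps in the same SHARD_TIME-wide interval share a shard."""
    return timestamp // SHARD_TIME


def shard_paths(directory, timestamp):
    """Hypothetical naming: <shard id>.data and <shard id>.index."""
    base = os.path.join(directory, "%012d" % shard_id(timestamp))
    return base + ".data", base + ".index"


def remove_expired(directory, cutoff_timestamp):
    """Unlink every shard file whose interval ends at or before the cutoff."""
    removed = []
    for name in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(name)
        if ext not in (".data", ".index") or not stem.isdigit():
            continue
        shard_end = (int(stem) + 1) * SHARD_TIME
        if shard_end <= cutoff_timestamp:
            os.unlink(os.path.join(directory, name))
            removed.append(name)
    return removed
```

Expiry never rewrites a file: a shard is either entirely older than
the cutoff and unlinked, or left untouched.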