An append-only log for storing arbitrary variable length items.
variable::Journal is an append-only log for storing arbitrary variable length data on disk. In addition to replay, stored items can be directly retrieved given their section number and offset within the section.
§Format
Data stored in Journal is persisted in one of many Blobs within a caller-provided partition. The particular Blob in which data is stored is identified by a section number (u64). Within a section, data is appended as an item with the following format:
+---+---+---+---+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 |    ...    | 8 | 9 |10 |11 |
+---+---+---+---+---+---+---+---+---+---+---+
|   Size (u32)  |   Data    |    C(u32)     |
+---+---+---+---+---+---+---+---+---+---+---+

C = CRC32(Size | Data)
To ensure data returned by Journal is correct, a checksum (CRC32) is stored at the end of each item. If the checksum of the read data does not match the stored checksum, an error is returned. This checksum is only verified when data is accessed and not at startup (which would require reading all data in Journal).
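The item layout and the read-time check can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: the big-endian byte order, the IEEE CRC32 variant, and the function names encode_item and decode_item are all assumptions made for the example.

```rust
/// Bitwise CRC32 (IEEE polynomial, reflected form). The variant is an
/// assumption; the text above only says "CRC32".
fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = u32::MAX;
    for &b in bytes {
        crc ^= u32::from(b);
        for _ in 0..8 {
            // Shift right, folding in the polynomial when the low bit is set.
            crc = if crc & 1 == 1 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    !crc
}

/// Encode `data` as Size (u32) | Data | C (u32). Big-endian is assumed.
fn encode_item(data: &[u8]) -> Vec<u8> {
    assert!(data.len() <= u32::MAX as usize, "item too large");
    let mut out = Vec::with_capacity(data.len() + 8);
    out.extend_from_slice(&(data.len() as u32).to_be_bytes());
    out.extend_from_slice(data);
    // C = CRC32(Size | Data): the checksum covers the prefix and the data.
    let checksum = crc32(&out);
    out.extend_from_slice(&checksum.to_be_bytes());
    out
}

/// Decode one item from the start of `buf`, verifying its checksum on access.
fn decode_item(buf: &[u8]) -> Result<&[u8], &'static str> {
    if buf.len() < 4 {
        return Err("truncated size");
    }
    let size = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let end = 4 + size; // end of Size | Data
    if buf.len() < end + 4 {
        return Err("truncated item");
    }
    let stored = u32::from_be_bytes([buf[end], buf[end + 1], buf[end + 2], buf[end + 3]]);
    if crc32(&buf[..end]) != stored {
        return Err("checksum mismatch");
    }
    Ok(&buf[4..end])
}
```

Flipping any byte of an encoded item makes decode_item return an error instead of the data, which mirrors the verify-on-access behavior described above.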
§Open Blobs
Journal uses 1 commonware-storage::Blob per section to store data. All Blobs in a given partition are kept open during the lifetime of Journal. If the caller wishes to bound the number of open Blobs, they can group data into fewer sections and/or prune unused sections.
§Offset Alignment
In practice, Journal users won't store u64::MAX bytes of data in a given section (the max Offset provided by Blob). To reduce the memory used to track offsets within Journal, offsets are therefore u32 (4 bytes) and aligned to 16 bytes. This means that the maximum size of any section is u32::MAX * 17 bytes, or roughly 73 GB: u32::MAX * 16 bytes for the largest aligned offset, plus up to u32::MAX bytes for the item stored at that last offset. If more data is written to a section past this maximum, an OffsetOverflow error is returned.
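The arithmetic behind the section size limit can be checked with a few helper functions. The names below (ALIGNMENT, byte_position, next_offset, max_section_bytes) are hypothetical and exist only for this sketch; returning None stands in for the OffsetOverflow error.

```rust
/// Alignment of item start positions within a section, per the text above.
const ALIGNMENT: u64 = 16;

/// Byte position within the blob for a stored u32 offset.
fn byte_position(offset: u32) -> u64 {
    u64::from(offset) * ALIGNMENT
}

/// Smallest aligned offset at or after byte position `pos`, or `None` when
/// it no longer fits in a u32 (the overflow case).
fn next_offset(pos: u64) -> Option<u32> {
    // Round up to the next multiple of ALIGNMENT without risking overflow.
    let aligned = pos / ALIGNMENT + u64::from(pos % ALIGNMENT != 0);
    if aligned > u64::from(u32::MAX) {
        None
    } else {
        Some(aligned as u32)
    }
}

/// Upper bound on section size: the last addressable position plus one
/// item of up to u32::MAX bytes.
fn max_section_bytes() -> u64 {
    byte_position(u32::MAX) + u64::from(u32::MAX)
}
```

For instance, an item that ends at byte 17 forces the next item to start at offset 2 (byte 32), and max_section_bytes() evaluates to 73,014,444,015.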
§Sync
Data written to Journal may not be immediately persisted to Storage. It is up to the caller to determine when to force pending data to be written to Storage using the sync method. When calling close, all pending data is automatically synced and any open blobs are dropped.
§Pruning
All data appended to Journal must be assigned to some section (u64). This assignment allows the caller to prune data from Journal by specifying a minimum section number. This could be used, for example, by some blockchain application to prune old blocks.
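One way to picture section assignment and pruning is an in-memory model that maps block heights to sections. This is only a sketch: the Sections type, its methods, and the items_per_section parameter are hypothetical and are not part of the Journal API. Grouping many heights into one section also keeps the number of sections (and so open Blobs) small.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for a journal: section number -> items.
struct Sections {
    items_per_section: u64,
    sections: BTreeMap<u64, Vec<Vec<u8>>>,
}

impl Sections {
    fn new(items_per_section: u64) -> Self {
        assert!(items_per_section > 0);
        Sections { items_per_section, sections: BTreeMap::new() }
    }

    /// Map a block height to a section so that many blocks share one section.
    fn section_for(&self, height: u64) -> u64 {
        height / self.items_per_section
    }

    /// Append a block to the section derived from its height.
    fn append(&mut self, height: u64, block: Vec<u8>) {
        let section = self.section_for(height);
        self.sections.entry(section).or_insert_with(Vec::new).push(block);
    }

    /// Drop every section below `min_section`; returns how many were removed.
    fn prune(&mut self, min_section: u64) -> usize {
        // split_off keeps keys >= min_section in the returned map.
        let kept = self.sections.split_off(&min_section);
        let removed = self.sections.len();
        self.sections = kept;
        removed
    }
}
```

With 100 blocks per section, pruning everything below the section that holds height 200 removes sections 0 and 1 in one call.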
§Replay
During application initialization, it is very common to replay data from Journal to recover some in-memory state. Journal is heavily optimized for this pattern and provides a replay method to produce a stream of all items in the Journal in order of their section and offset.
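The ordering guarantee and the typical use of replay (rebuilding an in-memory index) can be modeled without the crate. In this sketch the Sections alias, replay, and rebuild_index are hypothetical names, and items within a section are assumed to be stored in offset order; the real replay method produces an asynchronous stream rather than an iterator.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical in-memory model: section -> (offset, item) pairs in append order.
type Sections = BTreeMap<u64, Vec<(u32, Vec<u8>)>>;

/// Yield every item in order of section, then offset.
fn replay<'a>(sections: &'a Sections) -> impl Iterator<Item = (u64, u32, &'a [u8])> + 'a {
    // BTreeMap iterates keys in ascending order, which gives the section order.
    sections.iter().flat_map(|(&section, items)| {
        items
            .iter()
            .map(move |(offset, item)| (section, *offset, item.as_slice()))
    })
}

/// Rebuild an index from item contents to (section, offset) locations.
fn rebuild_index(sections: &Sections) -> HashMap<Vec<u8>, (u64, u32)> {
    let mut index = HashMap::new();
    for (section, offset, item) in replay(sections) {
        index.insert(item.to_vec(), (section, offset));
    }
    index
}
```

The recovered (section, offset) pairs are what a caller would later pass back to fetch an item directly.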
§Exact Reads
To allow for items to be fetched in a single disk operation, Journal allows callers to specify an exact parameter to the get method. This exact parameter must be cached by the caller (provided during replay) and usage of an incorrect exact value will result in undefined behavior.
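The saving from a cached size can be illustrated with a mock blob that counts read operations. Everything here is hypothetical: CountingBlob, get, and get_exact are illustration-only names, the exact value is assumed to be the item's data length, the size prefix is assumed big-endian, and checksum verification is left out to keep the sketch short.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a blob that counts read operations.
struct CountingBlob {
    bytes: Vec<u8>,
    reads: Cell<usize>,
}

impl CountingBlob {
    /// Read `len` bytes starting at `pos`, counting the operation.
    fn read_at(&self, pos: usize, len: usize) -> &[u8] {
        self.reads.set(self.reads.get() + 1);
        &self.bytes[pos..pos + len]
    }
}

/// Without a size hint: read the 4-byte size prefix, then the rest of the item.
fn get(blob: &CountingBlob, pos: usize) -> &[u8] {
    let prefix = blob.read_at(pos, 4);
    let size = u32::from_be_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]) as usize;
    let rest = blob.read_at(pos + 4, size + 4); // data + checksum
    &rest[..size]
}

/// With the size known up front: one read covers prefix, data, and checksum.
fn get_exact(blob: &CountingBlob, pos: usize, exact: usize) -> &[u8] {
    let item = blob.read_at(pos, 4 + exact + 4);
    &item[4..4 + exact]
}
```

In this model a wrong exact value would return the wrong bytes or panic on an out-of-range slice, which is why the value must come from a trusted source such as replay.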
§Compression
Journal supports optional compression using zstd. This can be enabled by setting the compression field in the Config struct to a valid zstd compression level. This setting can be changed between initializations of Journal; however, it must remain populated if any data was written with compression enabled.
§Example
use commonware_runtime::{Spawner, Runner, deterministic, buffer::PoolRef};
use commonware_storage::journal::variable::{Journal, Config};
use commonware_utils::NZUsize;

let executor = deterministic::Runner::default();
executor.start(|context| async move {
    // Create a journal
    let mut journal = Journal::init(context, Config{
        partition: "partition".to_string(),
        compression: None,
        codec_config: (),
        buffer_pool: PoolRef::new(NZUsize!(1024), NZUsize!(10)),
        write_buffer: NZUsize!(1024 * 1024),
    }).await.unwrap();

    // Append data to the journal
    journal.append(1, 128).await.unwrap();

    // Close the journal
    journal.close().await.unwrap();
});