An append-only log for storing arbitrary variable length items.

variable::Journal is an append-only log for storing arbitrary variable length data on disk. In addition to replay, stored items can be directly retrieved given their section number and offset within the section.
§Format
Data stored in Journal is persisted in one of many Blobs within a caller-provided partition. The particular Blob in which data is stored is identified by a section number (u64). Within a section, data is appended as an item with the following format:
+---+---+---+---+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 |    ...    | 8 | 9 |10 |11 |
+---+---+---+---+---+---+---+---+---+---+---+
|   Size (u32)  |   Data    |    C(u32)     |
+---+---+---+---+---+---+---+---+---+---+---+

C = CRC32(Data)
To ensure data returned by Journal is correct, a checksum (CRC32) is stored at the end of each item. If the checksum of the read data does not match the stored checksum, an error is returned. This checksum is only verified when data is accessed and not at startup (which would require reading all data in Journal).
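For illustration, the sketch below encodes a single item in this layout. It assumes a big-endian length prefix and the IEEE CRC32 from the crc32fast crate; the crate's actual byte order and checksum implementation are not specified here and may differ.

// Hypothetical encoder for the item layout above: Size (u32) | Data | C(u32).
// Byte order and checksum variant are assumptions for illustration only.
fn encode_item(data: &[u8]) -> Vec<u8> {
    let mut item = Vec::with_capacity(4 + data.len() + 4);
    item.extend_from_slice(&(data.len() as u32).to_be_bytes());    // Size (u32)
    item.extend_from_slice(data);                                  // Data
    item.extend_from_slice(&crc32fast::hash(data).to_be_bytes());  // C = CRC32(Data)
    item
}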
§Open Blobs
Journal uses 1 commonware-storage::Blob per section to store data. All Blobs in a given partition are kept open during the lifetime of Journal. If the caller wishes to bound the number of open Blobs, they can group data into fewer sections and/or prune unused sections.
§Offset Alignment
In practice, Journal users won't store u64::MAX bytes of data in a given section (the max Offset provided by Blob). To reduce the memory usage for tracking offsets within Journal, offsets are thus u32 (4 bytes) and aligned to 16 bytes. This means that the maximum size of any section is u32::MAX * 17 bytes (~70GB), since the item at the last offset can itself store up to u32::MAX bytes. If more data is written to a section past this max, an OffsetOverflow error is returned.
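A minimal sketch of the offset arithmetic implied above (illustrative only; the alignment constant and conversion are internal to the crate):

// Offsets are u32 values addressing 16-byte aligned positions within a section.
const ALIGNMENT: u64 = 16;

// Convert a stored u32 offset into a byte position within the section's Blob.
fn to_byte_position(offset: u32) -> u64 {
    offset as u64 * ALIGNMENT
}

// The highest addressable position plus a final item of up to u32::MAX bytes
// bounds a section at roughly u32::MAX * 17 bytes (~70GB).
fn max_section_size() -> u64 {
    to_byte_position(u32::MAX) + u32::MAX as u64
}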
§Sync
Data written to Journal may not be immediately persisted to Storage. It is up to the caller to determine when to force pending data to be written to Storage using the sync method. When calling close, all pending data is automatically synced and any open blobs are closed.
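For example, a caller might sync after a batch of appends (a sketch; it assumes sync takes the section number, since each section is backed by its own Blob):

// Append a batch of items to section 1, then force them to durable storage.
journal.append(1, "item-a".into()).await.unwrap();
journal.append(1, "item-b".into()).await.unwrap();
journal.sync(1).await.unwrap();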
§Pruning
All data appended to Journal must be assigned to some section (u64). This assignment allows the caller to prune data from Journal by specifying a minimum section number. This could be used, for example, by some blockchain application to prune old blocks.
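A sketch of that pattern, assuming a prune method that removes every section below the provided minimum:

// Blocks were appended with their height as the section number; after
// finalizing height 100, retain only the most recent 10 sections.
journal.prune(90).await.unwrap();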
§Replay
During application initialization, it is very common to replay data from Journal to recover some in-memory state. Journal is heavily optimized for this pattern and provides a replay method that iterates over multiple sections concurrently in a single stream.
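A sketch of a replay loop that rebuilds in-memory state follows; the concurrency argument, the optional prefix argument, and the shape of the yielded tuples are assumptions here and may not match the crate's exact signature:

use futures::{pin_mut, StreamExt};
use std::collections::BTreeMap;

// Rebuild an in-memory index of (section, offset) -> item length by replaying
// all sections concurrently in a single stream.
let mut index = BTreeMap::new();
let stream = journal.replay(4, None).await.unwrap();
pin_mut!(stream);
while let Some(result) = stream.next().await {
    let (section, offset, item) = result.unwrap();
    index.insert((section, offset), item.len());
}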
§Skip Reads
Some applications may only want to read the first n bytes of each item during replay. This can be done by providing a prefix parameter to the replay method. If prefix is provided, Journal will only return the first prefix bytes of each item and "skip ahead" to the next item (computing the offset of the next item using the stored size value).

Reading only the prefix bytes of an item makes it impossible to compute the checksum of an item. It is up to the caller to ensure these reads are safe.
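A sketch of a prefix-only replay, assuming the prefix is passed as an optional byte count to the same replay method:

// Read only the first 8 bytes of each item (e.g. a fixed-size key) and skip
// ahead to the next item on disk. Checksums are NOT verified for these reads.
let stream = journal.replay(4, Some(8)).await.unwrap();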
§Exact Reads
To allow for items to be fetched in a single disk operation, Journal allows callers to specify an exact parameter to the get method. This exact parameter must be cached by the caller (provided during replay) and usage of an incorrect exact value will result in undefined behavior.
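A sketch of an exact read using a length cached during replay; the get signature (in particular whether exact is optional) is an assumption:

// `index` is the hypothetical (section, offset) -> length cache built in the
// replay sketch above. An incorrect exact value is undefined behavior, so only
// lengths previously recorded from this Journal should be used.
let exact = index[&(1u64, 0u32)] as u32;
let item = journal.get(1, 0, Some(exact)).await.unwrap();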
§Example
use commonware_runtime::{Spawner, Runner, deterministic::Executor};
use commonware_storage::journal::variable::{Journal, Config};
use prometheus_client::registry::Registry;
use std::sync::{Arc, Mutex};

let (executor, context, _) = Executor::default();
executor.start(async move {
    // Create a journal
    let mut journal = Journal::init(context, Config{
        registry: Arc::new(Mutex::new(Registry::default())),
        partition: "partition".to_string()
    }).await.unwrap();

    // Append data to the journal
    journal.append(1, "data".into()).await.unwrap();

    // Close the journal
    journal.close().await.unwrap();
});