This module implements the API for customized Delta checkpoint writes, where the
caller drives the write themselves. The entry point is Snapshot::create_checkpoint_writer.
If you want an all-in-one API that handles writing the checkpoint, use
Snapshot::checkpoint instead.
§Checkpoint Types and Selection Logic
This API supports two checkpoint types, selected based on table features:
| Table Feature | Resulting Checkpoint Type | Description |
|---|---|---|
| No v2Checkpoints | Single-file Classic-named V1 | Follows V1 specification without CheckpointMetadata action |
| v2Checkpoints | Classic-named V2 (with or without sidecars) | Follows V2 specification with CheckpointMetadata action while maintaining backward compatibility via classic naming |
For more information on the V1/V2 specifications, see the following protocol section: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#checkpoint-specs
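The selection rule in the table above can be sketched as a small function. This is an illustration only: `CheckpointKind` and `select_checkpoint_kind` are hypothetical names invented for this sketch, not part of the kernel API, and the boolean parameter stands in for whatever check the kernel performs on the table's features.

```rust
/// Hypothetical stand-in for the two checkpoint types in the table above.
/// This is not the kernel's own `CheckpointSpec` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CheckpointKind {
    /// Single-file, classic-named V1 checkpoint.
    ClassicNamedV1,
    /// Classic-named V2 checkpoint (with or without sidecars).
    ClassicNamedV2,
}

impl CheckpointKind {
    /// Per the table, only V2 checkpoints carry a CheckpointMetadata action.
    fn includes_checkpoint_metadata(self) -> bool {
        matches!(self, CheckpointKind::ClassicNamedV2)
    }
}

/// Assumption: the caller has already determined whether the
/// v2Checkpoints table feature is enabled for the table.
fn select_checkpoint_kind(v2_checkpoints_enabled: bool) -> CheckpointKind {
    if v2_checkpoints_enabled {
        CheckpointKind::ClassicNamedV2
    } else {
        CheckpointKind::ClassicNamedV1
    }
}
```

In both cases the file keeps the classic name, so readers that only look for classic-named checkpoints can still locate it.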
§Architecture
- CheckpointWriter - Core component that manages the checkpoint creation workflow
- ActionReconciliationIterator - Iterator over the checkpoint data to be written
§Usage
The following steps outline the process of creating a checkpoint:
- Create a CheckpointWriter using Snapshot::create_checkpoint_writer
- Get the checkpoint path from CheckpointWriter::checkpoint_path
- Get the checkpoint data from CheckpointWriter::checkpoint_data
- Write the data to the path in object storage (engine-specific)
- Collect metadata (FileMeta) from the write operation
- Build a LastCheckpointHintStats from the exhausted iterator state
- Pass the LastCheckpointHintStats to CheckpointWriter::finalize
fn write_checkpoint_file(path: Url, data: ActionReconciliationIterator) -> DeltaResult<FileMeta> {
    todo!() /* engine-specific logic to write data to object storage */
}
let engine: &dyn Engine = todo!(); /* create engine instance */
// Create a snapshot for the table at the version you want to checkpoint
let url = delta_kernel::try_parse_uri("./tests/data/app-txn-no-checkpoint")?;
let snapshot = Snapshot::builder_for(url).build(engine)?;
// Create a checkpoint writer from the snapshot
let writer = snapshot.create_checkpoint_writer(engine)?;
// Get the checkpoint path and data
let checkpoint_path = writer.checkpoint_path()?;
let checkpoint_data = writer.checkpoint_data(engine)?;
// Get the iterator state before consuming the data
let state = checkpoint_data.state();
// Write the checkpoint data to the object store and collect metadata
// The write function consumes the iterator, dropping its Arc reference to the state.
let metadata: FileMeta = write_checkpoint_file(checkpoint_path, checkpoint_data)?;
/* IMPORTANT: All data must be written before finalizing the checkpoint */
// Build the [`LastCheckpointHintStats`] from the exhausted iterator state
let state = std::sync::Arc::into_inner(state)
    .ok_or(Error::internal_error("checkpoint state Arc still has other references"))?;
let last_checkpoint_stats =
    delta_kernel::checkpoint::LastCheckpointHintStats::from_reconciliation_state(
        state,
        metadata.size,
        0, /* num_sidecars */
    )?;
// Finalize the checkpoint by passing the stats
writer.finalize(engine, &last_checkpoint_stats)?;
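The example above relies on `std::sync::Arc::into_inner` succeeding only after the iterator, which holds the other reference to the state, has been dropped. A minimal standard-library sketch of that ownership pattern follows; `CountingState`, `CountingIter`, and both functions are hypothetical stand-ins written for this illustration and do not reflect the kernel's actual types.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the state shared between an iterator and its caller.
#[derive(Debug, Default)]
struct CountingState {
    rows_seen: AtomicU64,
}

/// Hypothetical iterator that records progress in shared state.
struct CountingIter<I> {
    inner: I,
    state: Arc<CountingState>,
}

impl<I: Iterator> Iterator for CountingIter<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.inner.next()?;
        self.state.rows_seen.fetch_add(1, Ordering::Relaxed);
        Some(item)
    }
}

/// Consumes the iterator by value, then unwraps the state.
/// `collect` takes ownership of `iter`, so its Arc clone is dropped
/// before `Arc::into_inner` runs and the unwrap succeeds.
fn consume_then_unwrap(rows: Vec<u32>) -> Option<u64> {
    let state = Arc::new(CountingState::default());
    let iter = CountingIter { inner: rows.into_iter(), state: Arc::clone(&state) };
    let _written: Vec<u32> = iter.collect();
    Arc::into_inner(state).map(|s| s.rows_seen.into_inner())
}

/// Unwrapping while the iterator is still alive yields `None`,
/// because the iterator still holds a second reference.
fn unwrap_fails_while_iterator_alive() -> bool {
    let state = Arc::new(CountingState::default());
    let iter = CountingIter { inner: 0..3u32, state: Arc::clone(&state) };
    let result = Arc::into_inner(state);
    drop(iter);
    result.is_none()
}
```

This is why the write function in the example takes the iterator by value: if it only borrowed the iterator, the caller would have to drop it explicitly before unwrapping the state.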
§Warning
Multi-part (V1) checkpoints are DEPRECATED and UNSAFE.
§Note
We currently do not plan to support UUID-named V2 checkpoints, since S3’s put-if-absent semantics remove the need for UUIDs to ensure uniqueness. Supporting only classic-named checkpoints avoids added complexity, such as coordinating naming decisions between kernel and engine, and handling coexistence with legacy V1 checkpoints. If a compelling use case arises in the future, we can revisit this decision.
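To make the naming trade-off concrete, the sketch below formats both file-name layouts. The layouts are taken from the Delta protocol's checkpoint section rather than from this module, and the helper names are hypothetical. A classic name depends only on the version, so two writers checkpointing the same version target the same path and a put-if-absent write lets only one of them succeed.

```rust
/// Hypothetical helper: classic checkpoint file name for a table version.
/// Assumes the Delta protocol layout of a 20-digit zero-padded version.
fn classic_checkpoint_file_name(version: u64) -> String {
    format!("{version:020}.checkpoint.parquet")
}

/// Hypothetical helper: UUID-named V2 checkpoint file name.
/// The UUID is passed in as a string; generating it is out of scope here.
fn uuid_checkpoint_file_name(version: u64, uuid: &str) -> String {
    format!("{version:020}.checkpoint.{uuid}.parquet")
}
```

With UUID naming, every writer produces a distinct path, so uniqueness would come from the name instead of from the object store.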
Structs§
- CheckpointWriter - Orchestrates the process of creating a checkpoint for a table.
- LastCheckpointHintStats - Information about a freshly-written checkpoint. Pass it to CheckpointWriter::finalize to produce the _last_checkpoint hint file.
Enums§
- CheckpointSpec - Specifies the checkpoint format and behavior.
- V2CheckpointConfig - Configuration for V2 checkpoints.
Constants§
- DEFAULT_FILE_ACTIONS_PER_SIDECAR_HINT - Default value for V2CheckpointConfig::WithSidecar::file_actions_per_sidecar_hint. It’s the suggested upper bound of file actions (add and remove) per sidecar file when the caller does not provide an explicit hint.
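As a rough illustration of what an upper bound per sidecar implies, the sketch below computes the smallest number of sidecar files that keeps every file at or under a given hint. `min_sidecar_count` is a hypothetical helper; the hint is only a suggestion, and the way the kernel actually splits file actions across sidecars may differ.

```rust
/// Hypothetical helper: the fewest sidecar files needed so that no file
/// holds more than `per_sidecar_hint` file actions.
/// Returns `None` for a zero hint, which has no meaningful answer.
fn min_sidecar_count(file_actions: u64, per_sidecar_hint: u64) -> Option<u64> {
    if per_sidecar_hint == 0 {
        return None;
    }
    // Ceiling division: a partially filled last sidecar still counts as one file.
    Some(file_actions.div_ceil(per_sidecar_hint))
}
```

For example, ten file actions with a hint of four need at least three sidecars under this rule.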