//! # Log Compaction
//!
//! This module provides an API for writing log compaction files, each of which aggregates a
//! contiguous range of commit JSON files into a single compacted file. Compaction improves
//! performance by reducing the number of individual log files that must be read during log
//! replay.
//!
//! ## Overview
//!
//! Log compaction creates files with the naming pattern
//! `{start_version}.{end_version}.compacted.json` that contain the reconciled actions from all
//! commit files in the specified version range. Only commit/compaction files that intersect with
//! [start_version, end_version] are processed. Note that `end_version` must be greater than
//! `start_version` (equal versions are not allowed). This is similar to checkpoints but operates on
//! a subset of versions rather than the entire table.
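//!
//! As a rough sketch of the naming scheme (illustrative only: the real path comes from
//! [`LogCompactionWriter::compaction_path`], and the 20-digit zero padding shown here is an
//! assumption carried over from Delta commit file names):
//!
//! ```
//! fn compacted_file_name(start_version: u64, end_version: u64) -> String {
//!     // Versions are zero-padded to 20 digits, like `00000000000000000010.json` commits.
//!     assert!(end_version > start_version, "end_version must exceed start_version");
//!     format!("{start_version:020}.{end_version:020}.compacted.json")
//! }
//!
//! assert_eq!(
//!     compacted_file_name(10, 20),
//!     "00000000000000000010.00000000000000000020.compacted.json"
//! );
//! ```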
//!
//! ## Usage
//!
//! The log compaction API follows a similar pattern to the checkpoint API:
//!
//! 1. Create a [`LogCompactionWriter`] using [`crate::Snapshot::log_compaction_writer`] to compact
//!    the log from a given start_version to end_version (inclusive)
//! 2. Get the compaction path from [`LogCompactionWriter::compaction_path`]
//! 3. Get the compaction data from [`LogCompactionWriter::compaction_data`]
//! 4. Write the data to the path in cloud storage (engine-specific)
//!
//! ## Example
//!
//! ```no_run
//! # use std::sync::Arc;
//! # use buoyant_kernel as delta_kernel;
//! # use delta_kernel::{ActionReconciliationIterator, LogCompactionWriter};
//! # use delta_kernel::{Engine, Snapshot, DeltaResult, Error, FileMeta};
//! # use url::Url;
//!
//! // Engine-specific function to write compaction data
//! fn write_compaction_file(path: &Url, data: ActionReconciliationIterator) -> DeltaResult<FileMeta> {
//!     // In a real implementation, this would write the data to cloud storage
//!     todo!("Write data batches to storage at path: {}", path)
//! }
//!
//! # fn example(engine: &dyn Engine) -> DeltaResult<()> {
//! // Create a snapshot for the table
//! let table_root = Url::parse("file:///path/to/table")?;
//! let snapshot = Snapshot::builder_for(table_root).build(engine)?;
//!
//! // Create a log compaction writer for versions 10-20
//! let mut writer = snapshot.log_compaction_writer(10, 20)?;
//!
//! let compaction_data = writer.compaction_data(engine)?;
//! let compaction_path = writer.compaction_path();
//!
//! // Write the compaction data to cloud storage
//! let _metadata: FileMeta = write_compaction_file(compaction_path, compaction_data)?;
//! # Ok(())
//! # }
//! ```
//!
//! ## When to Use Log Compaction
//!
//! Log compaction is beneficial when:
//! - The table has many small commit files that slow down log replay
//! - You want to reduce the number of log files without creating a full checkpoint
//! - You want to optimize version ranges that are read frequently
//!
//! The [`should_compact`] utility function can help determine when compaction is appropriate
//! based on version intervals.
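//!
//! The interval check can be sketched as follows. This is an illustrative stand-in, not the
//! actual signature of [`should_compact`]: it assumes compaction is triggered once every
//! `interval` commits.
//!
//! ```
//! /// Hypothetical interval rule: compact after every `interval` commits.
//! fn due_for_compaction(commit_version: u64, interval: u64) -> bool {
//!     interval > 0 && (commit_version + 1) % interval == 0
//! }
//!
//! // With an interval of 10, versions 9, 19, 29, ... trigger compaction.
//! assert!(due_for_compaction(9, 10));
//! assert!(!due_for_compaction(10, 10));
//! ```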
//!
//! See <https://github.com/delta-io/delta/blob/master/PROTOCOL.md#log-compaction-files>
//! for more details.
//!
//! ## Relationship to Checkpoints
//!
//! - **Checkpoints**: Aggregate the entire table state up to a specific version
//! - **Log Compaction**: Aggregates only a specific range of commit files
//! - Both use similar action reconciliation logic but serve different use cases
use std::sync::LazyLock;

use crate::actions::schemas::GetStructField;
use crate::actions::{
    Add, Metadata, Protocol, Remove, SetTransaction, ADD_NAME, METADATA_NAME, PROTOCOL_NAME,
    REMOVE_NAME, SET_TRANSACTION_NAME,
};
use crate::schema::{SchemaRef, StructType};

mod writer;

pub use writer::{should_compact, LogCompactionWriter};
/// Schema for extracting relevant actions from log files for compaction.
/// CommitInfo is excluded as it's not needed in compaction files.
static COMPACTION_ACTIONS_SCHEMA: LazyLock<SchemaRef> = LazyLock::new(|| {
    StructType::new([
        Option::<Add>::get_struct_field(ADD_NAME),
        Option::<Remove>::get_struct_field(REMOVE_NAME),
        Option::<Metadata>::get_struct_field(METADATA_NAME),
        Option::<Protocol>::get_struct_field(PROTOCOL_NAME),
        Option::<SetTransaction>::get_struct_field(SET_TRANSACTION_NAME),
    ])
    .into()
});