pub struct CompactionManager { /* private fields */ }
Manages Parquet file compaction for optimal storage and query performance.
Step 4 of the sustainable data strategy moved compaction from a
global-stream pass to a per-tenant pass. Each invocation iterates
the tenants discovered under <storage_dir>/<tenant>/... and
emits a snapshot.<tenant>.<from>-<to>.parquet per qualifying
chunk. Snapshot files are written atomically (tmp + rename) and
their constituent raw files are removed only after the rename
succeeds — so a mid-compaction crash leaves data intact.
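The write-then-delete ordering described above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `write_atomic`, `publish_snapshot`, and the `.tmp` suffix are hypothetical names chosen for the example, and the snapshot payload is treated as opaque bytes rather than real Parquet data.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `bytes` to `dest` atomically: write a sibling temp file, sync it,
/// then rename it over the destination.
fn write_atomic(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Hypothetical temp-file naming; the crate's real scheme may differ.
    let mut tmp = dest.as_os_str().to_owned();
    tmp.push(".tmp");
    let tmp = PathBuf::from(tmp);

    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // flush data to disk before the rename makes it visible
    fs::rename(&tmp, dest)
}

/// Publish a snapshot, then remove the raw files it replaces.
/// If the write or rename fails, the function returns early and the raw
/// files are left untouched.
fn publish_snapshot(snapshot: &Path, bytes: &[u8], raw_files: &[PathBuf]) -> io::Result<()> {
    write_atomic(snapshot, bytes)?;
    for raw in raw_files {
        fs::remove_file(raw)?;
    }
    Ok(())
}
```

Because the raw files are deleted only after the rename returns successfully, a crash at any earlier point leaves the original data readable; at worst a stray temp file remains.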
Implementations
impl CompactionManager
pub fn new(storage_dir: impl Into<PathBuf>, config: CompactionConfig) -> Self
Create a new compaction manager
pub fn should_compact(&self) -> bool
Check if compaction should run
pub fn compact(&self) -> Result<CompactionResult>
Perform compaction across every discovered tenant.
Iterates the tenants under <storage_dir>/<tenant>/..., calls
compact_tenant for each, and aggregates the results. This is Step 4
of the sustainable data strategy: compaction runs per tenant
rather than globally, keyed off the per-tenant directory tree
that Step 1 introduced.
Errors compacting one tenant are logged but don’t abort the pass — other tenants still get compacted. The aggregate result reflects what actually completed.
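The log-and-continue behaviour can be illustrated with a small sketch. Everything here is an assumption made for the example: `PassSummary` is a hypothetical stand-in for `CompactionResult` (whose fields are not shown on this page), and the closure stands in for `compact_tenant`.

```rust
/// Hypothetical stand-in for the crate's `CompactionResult`; field names are assumed.
#[derive(Debug, Default, PartialEq)]
struct PassSummary {
    tenants_compacted: usize,
    files_compacted: usize,
    failed_tenants: Vec<String>,
}

/// Run `compact_one` for every tenant. A failure is logged and recorded,
/// but the loop keeps going so the remaining tenants are still compacted.
fn compact_all<F>(tenants: &[&str], mut compact_one: F) -> PassSummary
where
    F: FnMut(&str) -> Result<usize, String>,
{
    let mut summary = PassSummary::default();
    for &tenant in tenants {
        match compact_one(tenant) {
            Ok(files) => {
                summary.tenants_compacted += 1;
                summary.files_compacted += files;
            }
            Err(err) => {
                // Log and continue instead of returning the error.
                eprintln!("compaction failed for tenant {tenant}: {err}");
                summary.failed_tenants.push(tenant.to_string());
            }
        }
    }
    summary
}
```

The summary counts only tenants whose compaction returned `Ok`, which mirrors the statement that the aggregate reflects what actually completed. Whether the real result type also records failed tenants is not stated here.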
pub fn compact_tenant(&self, tenant_id: &str) -> Result<CompactionResult>
Compact one tenant’s raw event files into a single snapshot file under that tenant’s partition.
Per-tenant pipeline:
- List <storage>/<tenant>/...*.parquet excluding existing snapshot.* files.
- Apply the configured strategy (size / time / full) to pick candidate raw files.
- If enough candidates: read events, sort by timestamp, atomically write snapshot.<tenant>.<from>-<to>.parquet via ParquetStorage::write_atomic_parquet.
- After the snapshot rename succeeds, delete the constituent raw files. Crash between snapshot and delete leaves both on disk; the dedupe in append_loaded_event keeps memory consistent on next load. A future commit can record the constituent file list in the snapshot’s metadata and let a cleanup pass finish the deletion.
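The selection and naming steps of the pipeline above might look roughly like the following. This is a sketch under stated assumptions: `raw_candidates`, `snapshot_name`, `plan_snapshot`, and the `min_files` threshold are hypothetical, timestamps are modelled as plain `i64` values, and the strategy check is reduced to a simple file count rather than the configured size / time / full strategies.

```rust
/// Keep raw `.parquet` files, skipping snapshots written by earlier passes.
fn raw_candidates<'a>(file_names: &[&'a str]) -> Vec<&'a str> {
    file_names
        .iter()
        .copied()
        .filter(|name| name.ends_with(".parquet") && !name.starts_with("snapshot."))
        .collect()
}

/// Build a snapshot file name in the `snapshot.<tenant>.<from>-<to>.parquet` shape.
/// Using integer timestamps for `from` and `to` is an assumption of this sketch.
fn snapshot_name(tenant: &str, from: i64, to: i64) -> String {
    format!("snapshot.{tenant}.{from}-{to}.parquet")
}

/// Decide whether to compact and, if so, return the snapshot name together
/// with the event timestamps in sorted order.
fn plan_snapshot(
    tenant: &str,
    file_names: &[&str],
    mut timestamps: Vec<i64>,
    min_files: usize, // hypothetical threshold standing in for the strategy
) -> Option<(String, Vec<i64>)> {
    let candidates = raw_candidates(file_names);
    if candidates.len() < min_files || timestamps.is_empty() {
        return None; // not enough raw files, or nothing to write
    }
    timestamps.sort_unstable();
    let name = snapshot_name(tenant, timestamps[0], timestamps[timestamps.len() - 1]);
    Some((name, timestamps))
}
```

Filtering out `snapshot.*` names before counting candidates keeps a previous pass's output from being fed back into the next one.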
pub fn stats(&self) -> CompactionStats
Get compaction statistics
pub fn config(&self) -> &CompactionConfig
Get configuration
pub fn compact_now(&self) -> Result<CompactionResult>
Trigger manual compaction