Crate sochdb_storage

SochDB Storage Layer

Log-Structured Column Store (LSCS) with a transaction-aware write-ahead log (WAL) for TOON-native data.

§Runtime Modes

This crate supports two runtime modes:

§Embedded Sync Mode (like SQLite)

For embedded deployments without an async runtime:

sochdb-storage = { version = "...", default-features = false, features = ["embedded-sync"] }

Benefits:

  • ~500KB smaller binary
  • No async runtime overhead
  • Simpler embedded integration

§Async Mode (default, for servers)

For server deployments with async I/O:

sochdb-storage = { version = "..." }  # async enabled by default

Benefits:

  • Better scalability for concurrent connections
  • Non-blocking I/O for server workloads
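Both modes are selected through Cargo features, so the choice is made at compile time. Below is a minimal sketch of how code can branch on such a feature. It assumes only the `embedded-sync` feature name from the manifest snippets above. The functions `runtime_mode` and `flush_strategy` are hypothetical and are not part of this crate.

```rust
/// Hypothetical helper: reports which runtime mode this build was compiled for.
/// `cfg!` expands to a compile-time boolean based on the enabled features.
pub fn runtime_mode() -> &'static str {
    if cfg!(feature = "embedded-sync") {
        "embedded-sync"
    } else {
        "async"
    }
}

/// Whole items can be compiled in or out with `#[cfg]`; exactly one of the
/// two definitions below exists in any given build.
#[cfg(not(feature = "embedded-sync"))]
pub fn flush_strategy() -> &'static str {
    // Hypothetical: in async mode a flush would be handed to the async runtime.
    "scheduled on async runtime"
}

#[cfg(feature = "embedded-sync")]
pub fn flush_strategy() -> &'static str {
    // Hypothetical: in embedded mode the caller's thread does the work directly.
    "ran inline on caller thread"
}
```

If the crate is built without `features = ["embedded-sync"]`, `runtime_mode()` returns `"async"`. Code that is gated off is never compiled, which is how a feature flag can remove a dependency such as an async runtime from the final binary.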

§Novel Components

  • LSCS (lscs): Log-Structured Column Store, a columnar variant of the LSM tree with schema-aware compression and column-aware compaction that reduce write amplification.

  • Transaction WAL (txn_wal): ACID-compliant Write-Ahead Log with transaction boundaries, commit/abort markers, and crash recovery.

  • StorageEngine Trait (storage_engine): Pluggable storage-backend abstraction; for columnar projections it enables an 80% reduction in I/O (Task 1).

  • Page Manager (page_manager): TOON file format with a magic-number header and O(1) page allocation (Task 8).

  • Columnar Compression (columnar_compression): Type-aware encoding (dictionary, run-length, and delta) that reduces storage by 2–4× (Task 9).
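The Transaction WAL relies on commit and abort markers to decide which writes survive a crash. Below is a minimal sketch of that recovery rule. It assumes a simplified in-memory log and redo-only recovery, meaning writes are applied only after their commit marker is durable. `WalRecord` and `recover` are hypothetical and do not reflect the on-disk `TxnWalEntry` format.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified WAL record shapes.
#[derive(Debug, Clone, PartialEq)]
pub enum WalRecord {
    Begin { txn: u64 },
    Put { txn: u64, key: String, value: String },
    Commit { txn: u64 },
    Abort { txn: u64 },
}

/// Replays a log after a crash. Writes are applied only for transactions
/// whose Commit marker reached the log. Aborted transactions, and ones cut
/// off by the crash (no marker at all), are discarded.
pub fn recover(log: &[WalRecord]) -> HashMap<String, String> {
    // Pass 1: collect the ids of committed transactions.
    let committed: HashSet<u64> = log
        .iter()
        .filter_map(|r| match r {
            WalRecord::Commit { txn } => Some(*txn),
            _ => None,
        })
        .collect();

    // Pass 2: apply committed writes in log order, so later writes win.
    let mut state = HashMap::new();
    for rec in log {
        if let WalRecord::Put { txn, key, value } = rec {
            if committed.contains(txn) {
                state.insert(key.clone(), value.clone());
            }
        }
    }
    state
}
```

A transaction that wrote `Begin` and `Put` but crashed before `Commit` leaves no trace in the recovered state. Its markers are exactly what distinguishes it from a committed one.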
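Run-length and delta encoding are two of the schemes named in the Columnar Compression bullet, and both can be sketched as plain functions. These helpers are hypothetical. They do not reproduce the signatures of the re-exported `RleEncoder` or `DeltaEncoder`.

```rust
/// Run-length encode: consecutive equal values become (value, run_length).
pub fn rle_encode<T: PartialEq + Copy>(values: &[T]) -> Vec<(T, usize)> {
    let mut runs: Vec<(T, usize)> = Vec::new();
    for &v in values {
        // Extend the current run if the value repeats.
        if let Some(last) = runs.last_mut() {
            if last.0 == v {
                last.1 += 1;
                continue;
            }
        }
        runs.push((v, 1));
    }
    runs
}

/// Expand (value, run_length) pairs back into the original column.
pub fn rle_decode<T: Copy>(runs: &[(T, usize)]) -> Vec<T> {
    runs.iter()
        .flat_map(|&(v, n)| std::iter::repeat(v).take(n))
        .collect()
}

/// Delta encode: keep the first value, then differences between neighbours.
/// Sorted or slowly changing columns (timestamps, ids) produce small deltas
/// that a later varint or bit-packing step can store compactly.
pub fn delta_encode(values: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(values.len());
    let mut prev = 0i64;
    for &v in values {
        out.push(v.wrapping_sub(prev));
        prev = v;
    }
    out
}

/// Running sum of the deltas restores the original values.
pub fn delta_decode(deltas: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(deltas.len());
    let mut acc = 0i64;
    for &d in deltas {
        acc = acc.wrapping_add(d);
        out.push(acc);
    }
    out
}
```

RLE suits low-cardinality columns with long runs, and delta suits monotone numeric columns. A type-aware encoder would choose between such schemes per column.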
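O(1) page allocation is commonly achieved with a free list of released pages. Below is a sketch under that assumption. `PageAllocator` is hypothetical. Reserving page 0 for the database header is also an assumption and is not documented behavior of `PageManager`.

```rust
/// Hypothetical page allocator: released page ids go onto a LIFO free list,
/// so both allocation and release are O(1). When the list is empty, the
/// file grows by one page.
pub struct PageAllocator {
    next_page: u64,      // first page id never handed out (end of file)
    free_list: Vec<u64>, // released page ids available for reuse
}

impl PageAllocator {
    /// Assumption: page 0 holds the database header, so allocation starts at 1.
    pub fn new() -> Self {
        PageAllocator { next_page: 1, free_list: Vec::new() }
    }

    /// Reuse a freed page if one exists, otherwise extend the file.
    pub fn allocate(&mut self) -> u64 {
        match self.free_list.pop() {
            Some(id) => id,
            None => {
                let id = self.next_page;
                self.next_page += 1;
                id
            }
        }
    }

    /// Return a page for reuse; never the header page or an unissued id.
    pub fn free(&mut self, id: u64) {
        debug_assert!(id != 0 && id < self.next_page, "freeing an unallocated page");
        self.free_list.push(id);
    }
}
```

Reusing the most recently freed page first keeps the file compact. A persistent allocator would also record the free list on disk so that it survives restarts.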

§Utility Components

  • Bloom Filters (bloom): Probabilistic membership checks for fast negative lookups
  • Block Checksums (block_checksum): Data integrity validation
  • Compression (compression): LZ4/Zstd compression
  • Sketches (sketches): Approximate algorithms (HyperLogLog, CountMin, DDSketch)
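A Bloom filter answers either "definitely absent" or "possibly present". Below is a minimal sketch that derives k bit positions by double hashing over `std`'s `DefaultHasher`. `SimpleBloom` is hypothetical and unrelated to the crate's `BloomFilter` or `BlockedBloomFilter` types.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter: bit i of key x is h1(x) + i * h2(x) mod num_bits.
pub struct SimpleBloom {
    bits: Vec<u64>,
    num_bits: u64, // must be > 0
    num_hashes: u32,
}

impl SimpleBloom {
    pub fn new(num_bits: u64, num_hashes: u32) -> Self {
        let words = ((num_bits + 63) / 64) as usize;
        SimpleBloom { bits: vec![0; words], num_bits, num_hashes }
    }

    fn hashes<T: Hash>(item: &T) -> (u64, u64) {
        let mut h = DefaultHasher::new();
        item.hash(&mut h);
        let h1 = h.finish();
        // Second hash: rehash h1 with a salt; force it odd so the stride is never 0.
        let mut h = DefaultHasher::new();
        (h1, 0x9e37_79b9_7f4a_7c15u64).hash(&mut h);
        (h1, h.finish() | 1)
    }

    pub fn insert<T: Hash>(&mut self, item: &T) {
        let (h1, h2) = Self::hashes(item);
        for i in 0..self.num_hashes as u64 {
            let bit = h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits;
            self.bits[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }

    /// `false` means definitely absent; `true` means possibly present.
    pub fn may_contain<T: Hash>(&self, item: &T) -> bool {
        let (h1, h2) = Self::hashes(item);
        (0..self.num_hashes as u64).all(|i| {
            let bit = h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits;
            (self.bits[(bit / 64) as usize] & (1u64 << (bit % 64))) != 0
        })
    }
}
```

A lookup that finds any of its k bits unset can skip reading the underlying block entirely. This is what makes negative lookups cheap.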
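The `block_checksum` module is described as using CRC32C. Below is a bitwise sketch of CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78), followed by a hypothetical framing that appends the checksum to a block. `seal_block`, `open_block`, and the little-endian 4-byte trailer are assumptions, not the crate's `ChecksummedBlock` format.

```rust
/// Bitwise CRC-32C (Castagnoli). Simple but slow; production code usually
/// uses a table-driven or SSE4.2 / ARMv8 hardware implementation.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones if the low bit is set, else zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Hypothetical framing: payload followed by its CRC32C (little-endian).
pub fn seal_block(payload: &[u8]) -> Vec<u8> {
    let mut block = payload.to_vec();
    block.extend_from_slice(&crc32c(payload).to_le_bytes());
    block
}

/// Returns the payload if the trailing checksum matches, otherwise `None`.
pub fn open_block(block: &[u8]) -> Option<&[u8]> {
    if block.len() < 4 {
        return None;
    }
    let (payload, tail) = block.split_at(block.len() - 4);
    let stored = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    if crc32c(payload) == stored {
        Some(payload)
    } else {
        None
    }
}
```

`crc32c(b"123456789")` yields the standard CRC-32C check value 0xE3069283. Flipping any single bit of a sealed block makes `open_block` return `None`.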

§Re-exports

pub use columnar_compression::ColumnEncoder;
pub use columnar_compression::DeltaEncoder;
pub use columnar_compression::DictionaryEncoder;
pub use columnar_compression::EncodingStats;
pub use columnar_compression::EncodingType;
pub use columnar_compression::RleEncoder;
pub use learned_index_integration::HybridIndex;
pub use learned_index_integration::IndexManager;
pub use learned_index_integration::IndexType;
pub use learned_index_integration::KeyStats;
pub use learned_index_integration::PointLookupExecutor;
pub use lscs::ColumnDef;
pub use lscs::ColumnGroup;
pub use lscs::ColumnType;
pub use lscs::ColumnarMemtable;
pub use lscs::Lscs;
pub use lscs::LscsConfig;
pub use lscs::LscsRecoveryStats;
pub use lscs::LscsStats;
pub use lscs::TableSchema;
pub use mvcc_snapshot::MvccStore;
pub use mvcc_snapshot::Snapshot as MvccSnapshot;
pub use mvcc_snapshot::Timestamp;
pub use mvcc_snapshot::TransactionManager; (deprecated)
pub use mvcc_snapshot::TxnId;
pub use mvcc_snapshot::TxnStatus;
pub use mvcc_snapshot::VersionChain;
pub use mvcc_snapshot::VersionInfo;
pub use page_manager::DEFAULT_PAGE_SIZE;
pub use page_manager::DbHeader;
pub use page_manager::FORMAT_VERSION;
pub use page_manager::FreePageHeader;
pub use page_manager::PageId;
pub use page_manager::PageManager;
pub use page_manager::PageManagerStats;
pub use page_manager::PageType;
pub use page_manager::SOCHDB_MAGIC;
pub use storage_engine::ColumnId;
pub use storage_engine::ColumnIterator;
pub use storage_engine::Row;
pub use storage_engine::RowId;
pub use storage_engine::StorageEngine;
pub use storage_engine::StorageEngineType;
pub use storage_engine::StorageStats;
pub use storage_engine::TxnHandle;
pub use storage_engine::open_storage_engine;
pub use transaction::DurabilityLevel;
pub use transaction::IsolationLevel;
pub use transaction::RecoveryStats as TxnRecoveryStats;
pub use transaction::TransactionCoordinator;
pub use transaction::TransactionHandle;
pub use txn_wal::CrashRecoveryStats;
pub use txn_wal::TxnWal;
pub use txn_wal::TxnWalBuffer;
pub use txn_wal::TxnWalEntry;
pub use txn_wal::TxnWalStats;
pub use wal_integration::GroupCommitBuffer;
pub use wal_integration::MvccTransactionManager;
pub use wal_integration::RecoveryStats;
pub use wal_integration::Transaction;
pub use wal_integration::TxnState;
pub use wal_integration::WalStorageManager;
pub use adaptive_learned_index::AdaptiveLearnedIndex;
pub use adaptive_learned_index::LearnedIndexStats;
pub use adaptive_learned_index::PiecewiseLinearModel;
pub use adaptive_memtable::AdaptiveMemtableConfig;
pub use adaptive_memtable::AdaptiveMemtableSizer;
pub use adaptive_memtable::AdaptiveMemtableStats;
pub use adaptive_memtable::DEFAULT_BASE_SIZE;
pub use adaptive_memtable::MAX_MEMTABLE_SIZE;
pub use adaptive_memtable::MIN_MEMTABLE_SIZE;
pub use batch_wal::BatchAccumulator;
pub use batch_wal::BatchedWalReader;
pub use batch_wal::BatchedWalStats;
pub use batch_wal::BatchedWalWriter;
pub use batch_wal::ConcurrentBatchedWal;
pub use batch_wal::DEFAULT_MAX_BATCH_BYTES;
pub use batch_wal::DEFAULT_MAX_BATCH_SIZE;
pub use clr_learned_index::ClrIndex;
pub use clr_learned_index::ClrLookupResult;
pub use clr_learned_index::ClrStats;
pub use clr_learned_index::IndexedSortedRun;
pub use key_buffer::ArenaKey;
pub use key_buffer::ArenaKeyHandle;
pub use key_buffer::BatchKeyGenerator;
pub use key_buffer::InternedTablePrefix;
pub use key_buffer::KeyArena;
pub use key_buffer::KeyBuffer;
pub use key_buffer::MAX_KEY_LENGTH;
pub use lockfree_memtable::HazardDomain;
pub use lockfree_memtable::INLINE_VALUE_SIZE;
pub use lockfree_memtable::LockFreeMemTable;
pub use lockfree_memtable::LockFreeVersion;
pub use lockfree_memtable::LockFreeVersionChain;
pub use lockfree_memtable::ValueStorage;
pub use packed_row::PackedColumnDef;
pub use packed_row::PackedColumnType;
pub use packed_row::PackedRow;
pub use packed_row::PackedRowBuilder;
pub use packed_row::PackedTableSchema;
pub use backend::LocalFsBackend;
pub use backend::ObjectMetadata;
pub use backend::StorageBackend;
pub use backup::BackupManager;
pub use backup::BackupMetadata;
pub use block_checksum::BlockChecksumConfig;
pub use block_checksum::BlockChecksumStats;
pub use block_checksum::BlockType as BlockChecksumType;
pub use block_checksum::BlockWriter;
pub use block_checksum::ChecksummedBlock;
pub use bloom::BlockedBloomFilter;
pub use bloom::BloomFilter;
pub use bloom::LevelAdaptiveFPR;
pub use bloom::UnifiedBloomFilter;
pub use compression::CompressionEngine;
pub use compression::CompressionStats;
pub use compression::StorageTier;
pub use manifest::FileMetadata;
pub use manifest::LsmState;
pub use manifest::Manifest;
pub use manifest::VersionEdit;
pub use memory::MemoryBudget;
pub use memory::MemoryTracker;
pub use memory::WriteBufferManager;
pub use memory::WriteBufferStats;
pub use mvcc_new::ColumnGroupRef;
pub use mvcc_new::ReadVersion;
pub use mvcc_new::Snapshot;
pub use mvcc_new::SnapshotGuard;
pub use mvcc_new::VersionGuard;
pub use mvcc_new::VersionSet;
pub use mvcc_new::VersionSetStats;
pub use mvcc_new::VersionSetStatsSnapshot;
pub use payload::CompressionType;
pub use payload::PayloadStats;
pub use payload::PayloadStore;
pub use sketches::AdaptiveSketch;
pub use sketches::CountMinSketch;
pub use sketches::DDSketch;
pub use sketches::ExponentialHistogram;
pub use sketches::HyperLogLog;
pub use two_level_index::BlockIndexEntry;
pub use two_level_index::BlockIndexReader;
pub use two_level_index::FencePointer;
pub use two_level_index::TemporalKey;
pub use two_level_index::TwoLevelIndex;
pub use validation::SSTableValidator;
pub use validation::validate_sstable_file;
pub use durable_storage::ArenaMvccMemTable;
pub use durable_storage::DurableStorage;
pub use durable_storage::EphemeralHandle;
pub use durable_storage::MvccMemTable;
pub use durable_storage::TransactionMode;
pub use mvcc_concurrent::ConcurrentMvcc;
pub use mvcc_concurrent::ConcurrentVersionChain;
pub use mvcc_concurrent::ConcurrentVersionEntry;
pub use mvcc_concurrent::HlcTimestamp;
pub use mvcc_concurrent::ReaderSlot;
pub use mvcc_concurrent::VersionStore;
pub use mvcc_concurrent::VersionStoreStats;
pub use mvcc_concurrent::WriterGuard;
pub use compaction_policy::CompactionConfig;
pub use compaction_policy::CompactionFile;
pub use compaction_policy::CompactionJob;
pub use compaction_policy::CompactionPicker;
pub use compaction_policy::CompactionPriority;
pub use compaction_policy::CompactionReason;
pub use compaction_policy::CompactionState;
pub use compaction_policy::CompactionStats;
pub use compaction_policy::CompactionStrategy;
pub use compaction_policy::LeveledCompactionPicker;
pub use compaction_policy::RetentionConfig;
pub use compaction_policy::UniversalCompactionPicker;
pub use compaction_policy::VersionPruner;
pub use concurrent_art::ConcurrentART;
pub use cow_btree::BTreeEntry;
pub use cow_btree::BTreeSnapshot;
pub use cow_btree::CowBTree;
pub use cow_btree::Node;
pub use cow_btree::SearchResult;
pub use epoch_mvcc::CommitResult;
pub use epoch_mvcc::EpochManager;
pub use epoch_mvcc::EpochMvccStore;
pub use epoch_mvcc::EpochSnapshot;
pub use epoch_mvcc::EpochTransaction;
pub use epoch_mvcc::EpochVersionChain;
pub use epoch_mvcc::GcStats;
pub use epoch_mvcc::StoreStats;
pub use epoch_mvcc::VersionEntry;
pub use lazy_namespace::LazyNamespaceConfig;
pub use lazy_namespace::LazyNamespaceTable;
pub use object_store_tier::ObjectStoreTier;
pub use object_store_tier::ObjectStoreTierConfig;
pub use object_store_tier::SegmentDescriptor;
pub use optimized_scan::EntrySource;
pub use optimized_scan::FileRange;
pub use optimized_scan::LevelFiles;
pub use optimized_scan::RangeScanner;
pub use optimized_scan::ScanConfig;
pub use optimized_scan::ScanStats;
pub use optimized_scan::TournamentTree;
pub use optimized_scan::VersionedEntry;
pub use page_cache::CacheStats;
pub use page_cache::CachedPage;
pub use page_cache::ClockProCache;
pub use page_cache::PageId as CachePageId;
pub use page_cache::PageState;
pub use row_format::Slot;
pub use row_format::SlotRow;
pub use row_format::SlotRowArena;
pub use row_format::SlotRowFlags;
pub use row_format::SlotRowHandle;
pub use sstable::BlockBuilder;
pub use sstable::BlockCache;
pub use sstable::BlockHandle;
pub use sstable::BlockIterator;
pub use sstable::BlockType;
pub use sstable::BloomFilterPolicy;
pub use sstable::FilterPolicy;
pub use sstable::FilterReader;
pub use sstable::Footer;
pub use sstable::Header;
pub use sstable::ReadOptions;
pub use sstable::RibbonFilterPolicy;
pub use sstable::SSTable;
pub use sstable::SSTableBuilder;
pub use sstable::SSTableBuilderOptions;
pub use sstable::SSTableBuilderResult;
pub use sstable::SSTableFormat;
pub use sstable::Section;
pub use sstable::SectionType;
pub use sstable::TableMetadata;
pub use sstable::XorFilterPolicy;
pub use tiered_memtable::HotEntry;
pub use tiered_memtable::SortedBatch;
pub use tiered_memtable::TieredMemTable;
pub use vectorized_scan::ColumnVector;
pub use vectorized_scan::ComparisonOp;
pub use vectorized_scan::DEFAULT_BATCH_SIZE;
pub use vectorized_scan::Int64Comparison;
pub use vectorized_scan::SimdVisibilityFilter;
pub use vectorized_scan::SoaBatch;
pub use vectorized_scan::SoaScanIterator;
pub use vectorized_scan::SoaScanStats;
pub use vectorized_scan::SoaSource;
pub use vectorized_scan::StreamingScanIterator;
pub use vectorized_scan::ValueHandle;
pub use vectorized_scan::VectorBatch;
pub use vectorized_scan::VectorPredicate;
pub use vectorized_scan::VectorizedScanConfig;
pub use vectorized_scan::VectorizedScanStats;
pub use vectorized_scan::VersionedSlice;
pub use version_set::FileMetadata as VersionFileMetadata;
pub use version_set::ImmutableMemTable;
pub use version_set::ImmutableMemTableRef;
pub use version_set::LevelMetadata;
pub use version_set::SuperVersion;
pub use version_set::SuperVersionHandle;
pub use version_set::VersionSet as CowVersionSet;
pub use wal_segment::CheckpointRecord;
pub use wal_segment::RecoveryIterator;
pub use wal_segment::SegmentConfig;
pub use wal_segment::SegmentHeader;
pub use wal_segment::SegmentMetadata;
pub use wal_segment::SegmentStats;
pub use wal_segment::WalEntry;
pub use wal_segment::WalSegmentManager;
pub use zero_copy_serde::FORMAT_VERSION as SERDE_FORMAT_VERSION;
pub use zero_copy_serde::FieldDescriptor;
pub use zero_copy_serde::HEADER_SIZE as SERDE_HEADER_SIZE;
pub use zero_copy_serde::MmapWalReader;
pub use zero_copy_serde::SerdeStats;
pub use zero_copy_serde::WalBatchReader;
pub use zero_copy_serde::WalBatchWriter;
pub use zero_copy_serde::WalEntryBuilder;
pub use zero_copy_serde::WalEntryHeader;
pub use zero_copy_serde::WalEntryReader;
pub use zero_copy_serde::WalEntryType;
pub use zero_copy_serde::ZERO_COPY_MAGIC;
pub use zero_copy_serde::ZeroCopyHeader;
pub use txn_arena::ArenaWriteSet;
pub use txn_arena::BytesRef;
pub use txn_arena::KeyFingerprint;
pub use txn_arena::TxnArena;
pub use txn_arena::TxnWriteBuffer;
pub use txn_arena::WriteOp;
pub use dirty_tracking::BatchedDirtyTracker;
pub use dirty_tracking::DirtyEvent;
pub use dirty_tracking::DirtyTrackingStats;
pub use dirty_tracking::TxnDirtyBuffer;
pub use index_policy::BalancedTableIndex;
pub use index_policy::IndexPolicy;
pub use index_policy::SortedRun;
pub use index_policy::TableIndexConfig;
pub use index_policy::TableIndexRegistry;
pub use queue_index::CompositeQueueKey;
pub use queue_index::QueueIndex;
pub use queue_index::QueueIndexConfig;
pub use queue_index::QueueIndexStats;
pub use queue_index::QueueTableRegistry;
pub use cdc::CdcConfig;
pub use cdc::CdcEmitter;
pub use cdc::CdcError;
pub use cdc::CdcEvent;
pub use cdc::CdcLog;
pub use cdc::CdcOperation;
pub use cdc::CdcSubscriber;
pub use database::ColumnDef as DbColumnDef;
pub use database::ColumnType as DbColumnType;
pub use database::ColumnarQueryResult;
pub use database::Database;
pub use database::DatabaseConfig;
pub use database::GroupCommitSettings;
pub use database::QueryBuilder;
pub use database::QueryResult;
pub use database::QueryRowIterator;
pub use database::RecoveryStats as DbRecoveryStats;
pub use database::Stats as DbStats;
pub use database::SyncMode;
pub use database::TableSchema as DbTableSchema;
pub use database::TxnHandle as KernelTxnHandle;
pub use database::VectorSearchResult;
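Several re-exports above are renamed with `as`, for example `page_cache::PageId as CachePageId` and `mvcc_snapshot::Snapshot as MvccSnapshot`. The rename is needed because two modules export the same name, and importing both unchanged into the crate root is a compile error (E0252). Below is a self-contained sketch of the pattern. The mock modules and their field types are hypothetical stand-ins for the real `page_manager` and `page_cache`.

```rust
// Two hypothetical modules that both define `PageId`.
pub mod page_manager {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct PageId(pub u64);
}

pub mod page_cache {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct PageId(pub u32);
}

// Re-exporting both as `PageId` would fail with E0252,
// so one of them is renamed at the crate root.
pub use page_cache::PageId as CachePageId;
pub use page_manager::PageId;

/// Callers can use both types side by side without qualified paths.
pub fn describe(p: PageId, c: CachePageId) -> String {
    format!("file page {} / cache page {}", p.0, c.0)
}
```

The aliases only change the names visible at the crate root. The types themselves stay distinct, so the compiler still rejects passing one where the other is expected.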

§Modules

actor
Actor-Based Connection Manager
adaptive_learned_index
Adaptive Learned Index with Bounded Error
adaptive_memtable
Adaptive Memtable Sizing with Memory Pressure Feedback
admission_control
Admission Control with Explicit Cost Model
backend
Storage backend abstraction
backup
Backup and restore functionality for SochDB databases
batch_wal
Batched WAL with Vectored I/O
block_checksum
Block-Level CRC32C Checksums
bloom
Bloom filter for fast negative lookups
cdc
WAL-Derived Change Data Capture (CDC) Engine
clr_learned_index
Compact Linear Regression (CLR) Learned Index (Task 3)
columnar_compression
Columnar Compression with Type-Aware Encoding (Task 9)
compaction_policy
Compaction Policy for Version and Tombstone Pruning
compression
Storage compression and optimization module
concurrent_art
Concurrent Adaptive Radix Tree (ART) for Lock-Free Memtable
correctness_testing
Correctness Testing Framework
cow_btree
Copy-on-Write B-Tree Index (Recommendation 5)
database
SochDB Database Kernel
deferred_index
Deferred Sorted Index (Recommendation 2: LSM-Style Batch Compaction)
dict_compression
Dictionary-Based Compression
direct_io
Direct I/O Support for Cache-Bypass Scenarios
dirty_tracking
Batched Dirty Tracking with MPSC Queue
durability_contract
Durability Contract Hardening
durable_storage
Durable Storage Layer
epoch_arena
Epoch-Partitioned Key Arena (Task 1)
epoch_mvcc
Epoch-Based MVCC (Recommendation 7)
ffi
generational_slab
Generational Slab Allocator (Task 5)
group_commit
Event-Driven Group Commit Buffer
hlc
Hybrid Logical Clock (HLC) for Monotonic Commit Timestamps
hybrid_store
Adaptive Hybrid Storage (AHS) - PAX Block Layout
index_policy
Per-Table Index Policy + Scan-Optimized Structure
io_isolation
I/O Isolation Policy
io_uring
io_uring Backend for Linux Async I/O
ipc
IPC Protocol with Multiplexing and Streaming
ipc_server
Unix Domain Socket IPC Server for SochDB
key_buffer
Cache-Line Aligned Key Buffer with Stack Allocation
lazy_namespace
Per-namespace lazy hydration and eviction, so resident memory tracks active tenants.
learned_index_integration
Learned Index Integration (Task 5)
lock
Advisory File Locking for Database Exclusivity
lockfree_epoch
Lock-Free Epoch Tracking via Slot Array (Task 3)
lockfree_memtable
Lock-Free MemTable with Hazard Pointer Protection
lscs
Log-Structured Column Store (LSCS)
manifest
LSM MANIFEST File Implementation (Gap #13 Fix)
memory
Memory pressure handling and resource limits
mvcc_concurrent
Concurrent MVCC for Multi-Reader Single-Writer Embedded Mode
mvcc_new
MVCC Version Management for LSCS
mvcc_snapshot
MVCC Snapshot Plumbing (Task 3)
namespace
Lightweight Namespace Routing + On-Disk Layout (Task 3)
object_store_tier
Object-storage-native cold tier for immutable index segments.
optimized_scan
Optimized Range Scan with O(log n + k) Asymptotics
packed_row
Packed Row Format for Unified Row Storage
page_cache
Application-Level Page Cache with Clock-Pro (Recommendation 8)
page_manager
Page-Based File Layout with Database Header (Task 8)
parallel_merge
Parallel K-Way Merge for Compaction
payload
Payload storage for variable-length data
polymorphic_value
Polymorphic Value Encoding with Adaptive Compression (Task 12)
prefetch
Memory Prefetching for Range Scans
queue_index
Queue-Optimized Index Policy
row_format
Slot-Based Columnar Row Storage (PAX/Hybrid Format) (Recommendation 1)
shard_coalesced
Shard-Coalesced Batch DashMap with Prefetch Pipelining (Task 6)
sketches
Probabilistic Data Structures for Streaming Analytics
ssi
Serializable Snapshot Isolation (SSI) Implementation
ssi_scaling
SSI Scaling Guardrails
sstable
Block-Oriented SSTable Format
storage_engine
StorageEngine Trait Abstraction (Task 1)
stratified_skiplist
Stratified SkipList with Deferred LSM Promotion (Task 2)
streaming_iterator
Streaming Iterator Architecture for Scans
supervisor
Supervised background workers.
tiered_memtable
Tiered SkipMap Elimination (Recommendation 3)
tournament_tree
Tournament Tree (Loser Tree) for K-Way Merge
transaction
Unified Transaction Coordinator
two_level_index
Two-Level Index for SSTables
txn_arena
Transaction-Scoped Arena with Zero-Copy Key/Value Plumbing
txn_wal
Transaction-Aware WAL for ACID Transactions
upgrade_contract
Upgrade Compatibility Contract
validation
SSTable Validation Layer
vectorized_scan
SIMD-Accelerated Vectorized Scan Engine (Recommendation 2)
version_set
SuperVersion Metadata + Copy-on-Write Version Set
version_store
Version Store for MVCC
wal_integration
WAL-Storage Integration (Task 2 & Task 4)
wal_segment
WAL Segmentation and Checkpoint Manager
zero_copy
Zero-Copy SSTable Iterator
zero_copy_safety
Zero-Copy Safety with Validation Layer (Task 5)
zero_copy_serde
Zero-Copy Serialization (Recommendation 6)
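Several modules above (`mvcc_snapshot`, `mvcc_new`, `epoch_mvcc`, `compaction_policy`) deal with version chains that are read at a snapshot timestamp and pruned once no reader needs them. Below is a sketch of the basic visibility and pruning rules. It assumes a single key, integer commit timestamps, and `None` as a tombstone. `Version` and `Chain` are hypothetical and are not the crate's `VersionChain`.

```rust
/// One committed version; visible to snapshots taken at or after `commit_ts`.
/// A `None` value models a delete (tombstone).
#[derive(Debug, Clone)]
pub struct Version {
    pub commit_ts: u64,
    pub value: Option<String>,
}

/// Hypothetical version chain for one key, stored oldest first.
#[derive(Default)]
pub struct Chain {
    versions: Vec<Version>,
}

impl Chain {
    /// Commits are assumed to arrive with strictly increasing timestamps.
    pub fn commit(&mut self, commit_ts: u64, value: Option<String>) {
        debug_assert!(self.versions.last().map_or(true, |v| v.commit_ts < commit_ts));
        self.versions.push(Version { commit_ts, value });
    }

    /// Snapshot read: the newest version with commit_ts <= snapshot_ts.
    /// A tombstone, or no visible version at all, reads as `None`.
    pub fn read(&self, snapshot_ts: u64) -> Option<&str> {
        self.versions
            .iter()
            .rev()
            .find(|v| v.commit_ts <= snapshot_ts)
            .and_then(|v| v.value.as_deref())
    }

    /// Drop versions older than the one visible at `oldest_active_ts`;
    /// no current or future snapshot can read them.
    pub fn prune(&mut self, oldest_active_ts: u64) {
        if let Some(i) = self.versions.iter().rposition(|v| v.commit_ts <= oldest_active_ts) {
            self.versions = self.versions.split_off(i);
        }
    }
}
```

Pruning keeps the newest version visible to the oldest active snapshot, because that snapshot may still read it. Everything older is unreachable.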
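The `hlc` module is described as producing monotonic commit timestamps with a hybrid logical clock. Below is a sketch of the standard HLC update rules. The wall-clock reading is passed in as `pt` so the behaviour is deterministic. `Hlc` and `HlcClock` are hypothetical names, not this module's API.

```rust
/// HLC timestamp: physical component `l` plus logical counter `c`.
/// Field order makes the derived `Ord` compare `l` first, then `c`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hlc {
    pub l: u64,
    pub c: u32,
}

pub struct HlcClock {
    last: Hlc,
}

impl HlcClock {
    pub fn new() -> Self {
        HlcClock { last: Hlc { l: 0, c: 0 } }
    }

    /// Timestamp a local event (e.g. a commit) given wall-clock reading `pt`.
    /// Even if `pt` stalls or goes backwards, the result is strictly greater
    /// than every timestamp issued before.
    pub fn now(&mut self, pt: u64) -> Hlc {
        let prev = self.last;
        let l = prev.l.max(pt);
        let c = if l == prev.l { prev.c + 1 } else { 0 };
        self.last = Hlc { l, c };
        self.last
    }

    /// Merge a timestamp received from another node, so that later local
    /// timestamps order after it.
    pub fn observe(&mut self, pt: u64, remote: Hlc) -> Hlc {
        let prev = self.last;
        let l = prev.l.max(remote.l).max(pt);
        let c = if l == prev.l && l == remote.l {
            prev.c.max(remote.c) + 1
        } else if l == prev.l {
            prev.c + 1
        } else if l == remote.l {
            remote.c + 1
        } else {
            0
        };
        self.last = Hlc { l, c };
        self.last
    }
}
```

The physical component stays close to wall-clock time, and the logical counter breaks ties. This is why an HLC can give monotonic commit timestamps without trusting the system clock to move forward.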
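`parallel_merge` and `tournament_tree` both revolve around merging k sorted runs, as compaction and range scans do. Below is a sketch of k-way merge that uses a binary min-heap from `std`. This is a swapped-in technique: the crate describes a loser (tournament) tree, which also costs O(n log k) but performs fewer comparisons per step. `kway_merge` is a hypothetical name.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge k sorted runs into one sorted vector in O(n log k).
/// Ties are broken by run index, so equal keys come out in run order.
pub fn kway_merge<T: Ord + Clone>(runs: &[Vec<T>]) -> Vec<T> {
    // Heap entries: (head value, run index, position in run); Reverse makes it a min-heap.
    let mut heap = BinaryHeap::new();
    for (r, run) in runs.iter().enumerate() {
        if let Some(first) = run.first() {
            heap.push(Reverse((first.clone(), r, 0usize)));
        }
    }
    let mut out = Vec::with_capacity(runs.iter().map(Vec::len).sum());
    while let Some(Reverse((val, r, i))) = heap.pop() {
        out.push(val);
        // Refill from the run that just supplied the minimum.
        if let Some(next) = runs[r].get(i + 1) {
            heap.push(Reverse((next.clone(), r, i + 1)));
        }
    }
    out
}
```

Only one head per run lives in the heap at a time, so memory overhead is O(k) regardless of run length. Listing the newest run first makes its entries win ties, which is the order an LSM merge typically wants when deduplicating keys.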