SochDB Storage Layer
Log-Structured Column Store (LSCS) with transaction-aware WAL for TOON-native data.
§Runtime Modes
This crate supports two runtime modes:
§Embedded Sync Mode (like SQLite)
For embedded deployments without an async runtime:
```toml
sochdb-storage = { version = "...", default-features = false, features = ["embedded-sync"] }
```
Benefits:
- ~500KB smaller binary
- No async runtime overhead
- Simpler embedded integration
§Async Mode (default, for servers)
For server deployments with async I/O:
```toml
sochdb-storage = { version = "..." } # async enabled by default
```
Benefits:
- Better scalability for concurrent connections
- Non-blocking I/O for server workloads
§Novel Components
- LSCS (lscs): Log-Structured Column Store, a columnar variant of an LSM tree with schema-aware compression and column-aware compaction for reduced write amplification.
- Transaction WAL (txn_wal): ACID-compliant write-ahead log with transaction boundaries, commit/abort markers, and crash recovery.
- StorageEngine Trait (storage_engine): Pluggable storage backend abstraction enabling an 80% I/O reduction for columnar projections (Task 1).
- Page Manager (page_manager): TOON file format with a magic header and O(1) page allocation (Task 8).
- Columnar Compression (columnar_compression): Type-aware encoding with dictionary, RLE, and delta compression for a 2-4× storage reduction (Task 9).
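The three encodings named for columnar_compression can be illustrated with a minimal sketch. The functions dict_encode, rle_encode, delta_encode and delta_decode are hypothetical and are not the crate's DictionaryEncoder, RleEncoder or DeltaEncoder APIs; they show only the logical transformation each encoding performs, without the bit-packing a real encoder would apply afterwards.

```rust
use std::collections::HashMap;

/// Dictionary encoding (hypothetical helper): each distinct string gets a small
/// integer code; the column is stored as codes plus one copy of each string.
pub fn dict_encode(values: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut dict: Vec<String> = Vec::new();
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut codes = Vec::with_capacity(values.len());
    for &v in values {
        let code = *index.entry(v).or_insert_with(|| {
            dict.push(v.to_string());
            (dict.len() - 1) as u32
        });
        codes.push(code);
    }
    (dict, codes)
}

/// Run-length encoding (hypothetical helper): consecutive repeats collapse
/// into (value, run_length) pairs.
pub fn rle_encode<T: PartialEq + Copy>(values: &[T]) -> Vec<(T, u32)> {
    let mut runs: Vec<(T, u32)> = Vec::new();
    for &v in values {
        if let Some((last, count)) = runs.last_mut() {
            if *last == v {
                *count += 1;
                continue;
            }
        }
        runs.push((v, 1));
    }
    runs
}

/// Delta encoding (hypothetical helper): store the first value, then the
/// difference from the previous value; sorted or slowly changing columns
/// (timestamps, row IDs) produce small numbers that pack well.
pub fn delta_encode(values: &[i64]) -> Vec<i64> {
    let mut prev = 0i64;
    values
        .iter()
        .map(|&v| {
            let d = v.wrapping_sub(prev);
            prev = v;
            d
        })
        .collect()
}

/// Inverse of delta_encode: a running sum restores the original values.
pub fn delta_decode(deltas: &[i64]) -> Vec<i64> {
    let mut acc = 0i64;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d);
            acc
        })
        .collect()
}
```

Which encoding wins depends on the column: dictionary suits low-cardinality strings, RLE suits long runs, and delta suits monotonic integers, which is why the encoding is chosen per type.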
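How commit/abort markers drive crash recovery in a transaction-aware WAL can be shown with a minimal sketch. It assumes a hypothetical in-memory WalRecord enum (not the crate's TxnWalEntry format): only writes from transactions whose Commit marker reached the log are replayed, while aborted transactions and those cut off by a crash are discarded.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical WAL record layout, for illustration only.
#[derive(Debug, Clone)]
pub enum WalRecord {
    Begin { txn: u64 },
    Put { txn: u64, key: String, value: String },
    Commit { txn: u64 },
    Abort { txn: u64 },
}

/// Rebuilds key/value state from a log that may end mid-transaction.
pub fn recover(log: &[WalRecord]) -> HashMap<String, String> {
    // Pass 1: collect IDs of transactions that have a Commit marker.
    let committed: HashSet<u64> = log
        .iter()
        .filter_map(|r| match r {
            WalRecord::Commit { txn } => Some(*txn),
            _ => None,
        })
        .collect();

    // Pass 2: redo committed writes in log order; writes from aborted or
    // unfinished transactions are skipped.
    let mut state = HashMap::new();
    for r in log {
        if let WalRecord::Put { txn, key, value } = r {
            if committed.contains(txn) {
                state.insert(key.clone(), value.clone());
            }
        }
    }
    state
}
```

The Begin and Abort records are not needed for this redo-only pass; a real recovery routine would also use them, for example to report how many transactions were rolled back.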
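One common way to get O(1) page allocation is to keep a free list of reclaimed page IDs and fall back to extending the file only when that list is empty. The sketch below assumes this design and a reserved header page 0. PageAllocator is hypothetical, lives only in memory, and does not model the crate's PageManager or its on-disk free-page format.

```rust
/// Hypothetical page allocator: free-list reuse first, then file extension.
pub struct PageAllocator {
    next_page: u64,      // first page ID never handed out (current end of file)
    free_list: Vec<u64>, // freed page IDs available for reuse (LIFO)
}

impl PageAllocator {
    /// `first_data_page` is the first ID after any reserved header pages.
    pub fn new(first_data_page: u64) -> Self {
        PageAllocator { next_page: first_data_page, free_list: Vec::new() }
    }

    /// O(1): pop a freed page if one exists, otherwise bump the end-of-file counter.
    pub fn allocate(&mut self) -> u64 {
        if let Some(id) = self.free_list.pop() {
            return id;
        }
        let id = self.next_page;
        self.next_page += 1;
        id
    }

    /// Amortized O(1): push the page onto the free list for later reuse.
    pub fn free(&mut self, id: u64) {
        self.free_list.push(id);
    }
}
```

Reusing the most recently freed page (LIFO) tends to keep recently touched pages warm in the cache.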
§Utility Components
- Bloom Filters (bloom): Probabilistic existence checks
- Block Checksums (block_checksum): Data integrity validation
- Compression (compression): LZ4/Zstd compression
- Sketches (sketches): Approximate algorithms (HyperLogLog, CountMin, DDSketch)
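A Bloom filter answers either "definitely absent" or "possibly present", which is what makes it useful for skipping SSTables on point lookups. This minimal sketch uses a hypothetical SimpleBloom type (not the crate's BloomFilter, BlockedBloomFilter or UnifiedBloomFilter) and derives k bit positions by double hashing from std's DefaultHasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical Bloom filter: `num_hashes` bit positions per item.
pub struct SimpleBloom {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

impl SimpleBloom {
    pub fn new(num_bits: usize, num_hashes: u32) -> Self {
        // Round up to whole 64-bit words and keep at least one word.
        let words = (num_bits.max(1) + 63) / 64;
        SimpleBloom { bits: vec![0; words], num_bits: (words * 64) as u64, num_hashes }
    }

    /// Two base hashes; the second is forced odd so it is never zero.
    fn base_hashes<T: Hash>(item: &T) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        item.hash(&mut h1);
        let a = h1.finish();
        let mut h2 = DefaultHasher::new();
        a.hash(&mut h2);
        (a, h2.finish() | 1)
    }

    /// i-th probe position: (a + i * b) mod m.
    fn bit(&self, a: u64, b: u64, i: u32) -> usize {
        (a.wrapping_add(u64::from(i).wrapping_mul(b)) % self.num_bits) as usize
    }

    pub fn insert<T: Hash>(&mut self, item: &T) {
        let (a, b) = Self::base_hashes(item);
        for i in 0..self.num_hashes {
            let pos = self.bit(a, b, i);
            self.bits[pos / 64] |= 1u64 << (pos % 64);
        }
    }

    /// false = definitely absent; true = possibly present (may be a false positive).
    pub fn may_contain<T: Hash>(&self, item: &T) -> bool {
        let (a, b) = Self::base_hashes(item);
        (0..self.num_hashes).all(|i| {
            let pos = self.bit(a, b, i);
            (self.bits[pos / 64] & (1u64 << (pos % 64))) != 0
        })
    }
}
```

Inserted items always test positive; the false-positive rate for absent items depends on bits per item and the number of hashes.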
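The block_checksum module is described as using CRC32C. Below is a minimal, table-free CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) with a hypothetical seal/verify pair. The pair assumes a 4-byte little-endian checksum trailer; the crate's actual block layout may differ.

```rust
/// Bitwise CRC-32C (Castagnoli). Slow but compact; production code would use
/// a lookup table or the SSE4.2 / ARMv8 CRC instructions.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, else zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Hypothetical: append the payload's CRC32C as a 4-byte LE trailer.
pub fn seal_block(payload: &[u8]) -> Vec<u8> {
    let mut out = payload.to_vec();
    out.extend_from_slice(&crc32c(payload).to_le_bytes());
    out
}

/// Hypothetical: return the payload if the trailer matches, None if corrupted.
pub fn verify_block(block: &[u8]) -> Option<&[u8]> {
    if block.len() < 4 {
        return None;
    }
    let (payload, tail) = block.split_at(block.len() - 4);
    let stored = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    (crc32c(payload) == stored).then_some(payload)
}
```

The standard check value crc32c(b"123456789") is 0xE3069283, which is a quick way to validate any CRC-32C implementation.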
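Of the sketches listed, Count-Min is the simplest to show. This hypothetical CountMin type (not the crate's CountMinSketch) keeps `depth` rows of `width` counters, each row with its own hash. An estimate is the minimum across rows, so hash collisions can cause overcounting but never undercounting.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical Count-Min sketch for frequency estimation.
pub struct CountMin {
    width: usize,
    rows: Vec<Vec<u64>>,
}

impl CountMin {
    pub fn new(width: usize, depth: usize) -> Self {
        let width = width.max(1);
        CountMin { width, rows: vec![vec![0; width]; depth.max(1)] }
    }

    /// Per-row bucket: the row number is hashed first to act as a seed.
    fn bucket<T: Hash>(&self, item: &T, row: usize) -> usize {
        let mut h = DefaultHasher::new();
        (row as u64).hash(&mut h);
        item.hash(&mut h);
        (h.finish() % self.width as u64) as usize
    }

    /// Increment one counter in every row.
    pub fn add<T: Hash>(&mut self, item: &T, count: u64) {
        for r in 0..self.rows.len() {
            let b = self.bucket(item, r);
            self.rows[r][b] += count;
        }
    }

    /// Minimum over rows: an upper bound on the true count.
    pub fn estimate<T: Hash>(&self, item: &T) -> u64 {
        (0..self.rows.len())
            .map(|r| self.rows[r][self.bucket(item, r)])
            .min()
            .unwrap_or(0)
    }
}
```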
§Re-exports
```rust
pub use columnar_compression::ColumnEncoder;
pub use columnar_compression::DeltaEncoder;
pub use columnar_compression::DictionaryEncoder;
pub use columnar_compression::EncodingStats;
pub use columnar_compression::EncodingType;
pub use columnar_compression::RleEncoder;
pub use learned_index_integration::HybridIndex;
pub use learned_index_integration::IndexManager;
pub use learned_index_integration::IndexType;
pub use learned_index_integration::KeyStats;
pub use learned_index_integration::PointLookupExecutor;
pub use lscs::ColumnDef;
pub use lscs::ColumnGroup;
pub use lscs::ColumnType;
pub use lscs::ColumnarMemtable;
pub use lscs::Lscs;
pub use lscs::LscsConfig;
pub use lscs::LscsRecoveryStats;
pub use lscs::LscsStats;
pub use lscs::TableSchema;
pub use mvcc_snapshot::MvccStore;
pub use mvcc_snapshot::Snapshot as MvccSnapshot;
pub use mvcc_snapshot::Timestamp;
pub use mvcc_snapshot::TransactionManager; // Deprecated
pub use mvcc_snapshot::TxnId;
pub use mvcc_snapshot::TxnStatus;
pub use mvcc_snapshot::VersionChain;
pub use mvcc_snapshot::VersionInfo;
pub use page_manager::DEFAULT_PAGE_SIZE;
pub use page_manager::DbHeader;
pub use page_manager::FORMAT_VERSION;
pub use page_manager::FreePageHeader;
pub use page_manager::PageId;
pub use page_manager::PageManager;
pub use page_manager::PageManagerStats;
pub use page_manager::PageType;
pub use page_manager::SOCHDB_MAGIC;
pub use storage_engine::ColumnId;
pub use storage_engine::ColumnIterator;
pub use storage_engine::Row;
pub use storage_engine::RowId;
pub use storage_engine::StorageEngine;
pub use storage_engine::StorageEngineType;
pub use storage_engine::StorageStats;
pub use storage_engine::TxnHandle;
pub use storage_engine::open_storage_engine;
pub use transaction::DurabilityLevel;
pub use transaction::IsolationLevel;
pub use transaction::RecoveryStats as TxnRecoveryStats;
pub use transaction::TransactionCoordinator;
pub use transaction::TransactionHandle;
pub use txn_wal::CrashRecoveryStats;
pub use txn_wal::TxnWal;
pub use txn_wal::TxnWalBuffer;
pub use txn_wal::TxnWalEntry;
pub use txn_wal::TxnWalStats;
pub use wal_integration::GroupCommitBuffer;
pub use wal_integration::MvccTransactionManager;
pub use wal_integration::RecoveryStats;
pub use wal_integration::Transaction;
pub use wal_integration::TxnState;
pub use wal_integration::WalStorageManager;
pub use adaptive_learned_index::AdaptiveLearnedIndex;
pub use adaptive_learned_index::LearnedIndexStats;
pub use adaptive_learned_index::PiecewiseLinearModel;
pub use adaptive_memtable::AdaptiveMemtableConfig;
pub use adaptive_memtable::AdaptiveMemtableSizer;
pub use adaptive_memtable::AdaptiveMemtableStats;
pub use adaptive_memtable::DEFAULT_BASE_SIZE;
pub use adaptive_memtable::MAX_MEMTABLE_SIZE;
pub use adaptive_memtable::MIN_MEMTABLE_SIZE;
pub use batch_wal::BatchAccumulator;
pub use batch_wal::BatchedWalReader;
pub use batch_wal::BatchedWalStats;
pub use batch_wal::BatchedWalWriter;
pub use batch_wal::ConcurrentBatchedWal;
pub use batch_wal::DEFAULT_MAX_BATCH_BYTES;
pub use batch_wal::DEFAULT_MAX_BATCH_SIZE;
pub use clr_learned_index::ClrIndex;
pub use clr_learned_index::ClrLookupResult;
pub use clr_learned_index::ClrStats;
pub use clr_learned_index::IndexedSortedRun;
pub use key_buffer::ArenaKey;
pub use key_buffer::ArenaKeyHandle;
pub use key_buffer::BatchKeyGenerator;
pub use key_buffer::InternedTablePrefix;
pub use key_buffer::KeyArena;
pub use key_buffer::KeyBuffer;
pub use key_buffer::MAX_KEY_LENGTH;
pub use lockfree_memtable::HazardDomain;
pub use lockfree_memtable::INLINE_VALUE_SIZE;
pub use lockfree_memtable::LockFreeMemTable;
pub use lockfree_memtable::LockFreeVersion;
pub use lockfree_memtable::LockFreeVersionChain;
pub use lockfree_memtable::ValueStorage;
pub use packed_row::PackedColumnDef;
pub use packed_row::PackedColumnType;
pub use packed_row::PackedRow;
pub use packed_row::PackedRowBuilder;
pub use packed_row::PackedTableSchema;
pub use backend::LocalFsBackend;
pub use backend::ObjectMetadata;
pub use backend::StorageBackend;
pub use backup::BackupManager;
pub use backup::BackupMetadata;
pub use block_checksum::BlockChecksumConfig;
pub use block_checksum::BlockChecksumStats;
pub use block_checksum::BlockType as BlockChecksumType;
pub use block_checksum::BlockWriter;
pub use block_checksum::ChecksummedBlock;
pub use bloom::BlockedBloomFilter;
pub use bloom::BloomFilter;
pub use bloom::LevelAdaptiveFPR;
pub use bloom::UnifiedBloomFilter;
pub use compression::CompressionEngine;
pub use compression::CompressionStats;
pub use compression::StorageTier;
pub use manifest::FileMetadata;
pub use manifest::LsmState;
pub use manifest::Manifest;
pub use manifest::VersionEdit;
pub use memory::MemoryBudget;
pub use memory::MemoryTracker;
pub use memory::WriteBufferManager;
pub use memory::WriteBufferStats;
pub use mvcc_new::ColumnGroupRef;
pub use mvcc_new::ReadVersion;
pub use mvcc_new::Snapshot;
pub use mvcc_new::SnapshotGuard;
pub use mvcc_new::VersionGuard;
pub use mvcc_new::VersionSet;
pub use mvcc_new::VersionSetStats;
pub use mvcc_new::VersionSetStatsSnapshot;
pub use payload::CompressionType;
pub use payload::PayloadStats;
pub use payload::PayloadStore;
pub use sketches::AdaptiveSketch;
pub use sketches::CountMinSketch;
pub use sketches::DDSketch;
pub use sketches::ExponentialHistogram;
pub use sketches::HyperLogLog;
pub use two_level_index::BlockIndexEntry;
pub use two_level_index::BlockIndexReader;
pub use two_level_index::FencePointer;
pub use two_level_index::TemporalKey;
pub use two_level_index::TwoLevelIndex;
pub use validation::SSTableValidator;
pub use validation::validate_sstable_file;
pub use durable_storage::ArenaMvccMemTable;
pub use durable_storage::DurableStorage;
pub use durable_storage::EphemeralHandle;
pub use durable_storage::MvccMemTable;
pub use durable_storage::TransactionMode;
pub use mvcc_concurrent::ConcurrentMvcc;
pub use mvcc_concurrent::ConcurrentVersionChain;
pub use mvcc_concurrent::ConcurrentVersionEntry;
pub use mvcc_concurrent::HlcTimestamp;
pub use mvcc_concurrent::ReaderSlot;
pub use mvcc_concurrent::VersionStore;
pub use mvcc_concurrent::VersionStoreStats;
pub use mvcc_concurrent::WriterGuard;
pub use compaction_policy::CompactionConfig;
pub use compaction_policy::CompactionFile;
pub use compaction_policy::CompactionJob;
pub use compaction_policy::CompactionPicker;
pub use compaction_policy::CompactionPriority;
pub use compaction_policy::CompactionReason;
pub use compaction_policy::CompactionState;
pub use compaction_policy::CompactionStats;
pub use compaction_policy::CompactionStrategy;
pub use compaction_policy::LeveledCompactionPicker;
pub use compaction_policy::RetentionConfig;
pub use compaction_policy::UniversalCompactionPicker;
pub use compaction_policy::VersionPruner;
pub use concurrent_art::ConcurrentART;
pub use cow_btree::BTreeEntry;
pub use cow_btree::BTreeSnapshot;
pub use cow_btree::CowBTree;
pub use cow_btree::Node;
pub use cow_btree::SearchResult;
pub use epoch_mvcc::CommitResult;
pub use epoch_mvcc::EpochManager;
pub use epoch_mvcc::EpochMvccStore;
pub use epoch_mvcc::EpochSnapshot;
pub use epoch_mvcc::EpochTransaction;
pub use epoch_mvcc::EpochVersionChain;
pub use epoch_mvcc::GcStats;
pub use epoch_mvcc::StoreStats;
pub use epoch_mvcc::VersionEntry;
pub use lazy_namespace::LazyNamespaceConfig;
pub use lazy_namespace::LazyNamespaceTable;
pub use object_store_tier::ObjectStoreTier;
pub use object_store_tier::ObjectStoreTierConfig;
pub use object_store_tier::SegmentDescriptor;
pub use optimized_scan::EntrySource;
pub use optimized_scan::FileRange;
pub use optimized_scan::LevelFiles;
pub use optimized_scan::RangeScanner;
pub use optimized_scan::ScanConfig;
pub use optimized_scan::ScanStats;
pub use optimized_scan::TournamentTree;
pub use optimized_scan::VersionedEntry;
pub use page_cache::CacheStats;
pub use page_cache::CachedPage;
pub use page_cache::ClockProCache;
pub use page_cache::PageId as CachePageId;
pub use page_cache::PageState;
pub use row_format::Slot;
pub use row_format::SlotRow;
pub use row_format::SlotRowArena;
pub use row_format::SlotRowFlags;
pub use row_format::SlotRowHandle;
pub use sstable::BlockBuilder;
pub use sstable::BlockCache;
pub use sstable::BlockHandle;
pub use sstable::BlockIterator;
pub use sstable::BlockType;
pub use sstable::BloomFilterPolicy;
pub use sstable::FilterPolicy;
pub use sstable::FilterReader;
pub use sstable::Header;
pub use sstable::ReadOptions;
pub use sstable::RibbonFilterPolicy;
pub use sstable::SSTable;
pub use sstable::SSTableBuilder;
pub use sstable::SSTableBuilderOptions;
pub use sstable::SSTableBuilderResult;
pub use sstable::SSTableFormat;
pub use sstable::Section;
pub use sstable::SectionType;
pub use sstable::TableMetadata;
pub use sstable::XorFilterPolicy;
pub use tiered_memtable::HotEntry;
pub use tiered_memtable::SortedBatch;
pub use tiered_memtable::TieredMemTable;
pub use vectorized_scan::ColumnVector;
pub use vectorized_scan::ComparisonOp;
pub use vectorized_scan::DEFAULT_BATCH_SIZE;
pub use vectorized_scan::Int64Comparison;
pub use vectorized_scan::SimdVisibilityFilter;
pub use vectorized_scan::SoaBatch;
pub use vectorized_scan::SoaScanIterator;
pub use vectorized_scan::SoaScanStats;
pub use vectorized_scan::SoaSource;
pub use vectorized_scan::StreamingScanIterator;
pub use vectorized_scan::ValueHandle;
pub use vectorized_scan::VectorBatch;
pub use vectorized_scan::VectorPredicate;
pub use vectorized_scan::VectorizedScanConfig;
pub use vectorized_scan::VectorizedScanStats;
pub use vectorized_scan::VersionedSlice;
pub use version_set::FileMetadata as VersionFileMetadata;
pub use version_set::ImmutableMemTable;
pub use version_set::ImmutableMemTableRef;
pub use version_set::LevelMetadata;
pub use version_set::SuperVersion;
pub use version_set::SuperVersionHandle;
pub use version_set::VersionSet as CowVersionSet;
pub use wal_segment::CheckpointRecord;
pub use wal_segment::RecoveryIterator;
pub use wal_segment::SegmentConfig;
pub use wal_segment::SegmentHeader;
pub use wal_segment::SegmentMetadata;
pub use wal_segment::SegmentStats;
pub use wal_segment::WalEntry;
pub use wal_segment::WalSegmentManager;
pub use zero_copy_serde::FORMAT_VERSION as SERDE_FORMAT_VERSION;
pub use zero_copy_serde::FieldDescriptor;
pub use zero_copy_serde::HEADER_SIZE as SERDE_HEADER_SIZE;
pub use zero_copy_serde::MmapWalReader;
pub use zero_copy_serde::SerdeStats;
pub use zero_copy_serde::WalBatchReader;
pub use zero_copy_serde::WalBatchWriter;
pub use zero_copy_serde::WalEntryBuilder;
pub use zero_copy_serde::WalEntryHeader;
pub use zero_copy_serde::WalEntryReader;
pub use zero_copy_serde::WalEntryType;
pub use zero_copy_serde::ZERO_COPY_MAGIC;
pub use zero_copy_serde::ZeroCopyHeader;
pub use txn_arena::ArenaWriteSet;
pub use txn_arena::BytesRef;
pub use txn_arena::KeyFingerprint;
pub use txn_arena::TxnArena;
pub use txn_arena::TxnWriteBuffer;
pub use txn_arena::WriteOp;
pub use dirty_tracking::BatchedDirtyTracker;
pub use dirty_tracking::DirtyEvent;
pub use dirty_tracking::DirtyTrackingStats;
pub use dirty_tracking::TxnDirtyBuffer;
pub use index_policy::BalancedTableIndex;
pub use index_policy::IndexPolicy;
pub use index_policy::SortedRun;
pub use index_policy::TableIndexConfig;
pub use index_policy::TableIndexRegistry;
pub use queue_index::CompositeQueueKey;
pub use queue_index::QueueIndex;
pub use queue_index::QueueIndexConfig;
pub use queue_index::QueueIndexStats;
pub use queue_index::QueueTableRegistry;
pub use cdc::CdcConfig;
pub use cdc::CdcEmitter;
pub use cdc::CdcError;
pub use cdc::CdcEvent;
pub use cdc::CdcLog;
pub use cdc::CdcOperation;
pub use cdc::CdcSubscriber;
pub use database::ColumnDef as DbColumnDef;
pub use database::ColumnType as DbColumnType;
pub use database::ColumnarQueryResult;
pub use database::Database;
pub use database::DatabaseConfig;
pub use database::GroupCommitSettings;
pub use database::QueryBuilder;
pub use database::QueryResult;
pub use database::QueryRowIterator;
pub use database::RecoveryStats as DbRecoveryStats;
pub use database::Stats as DbStats;
pub use database::SyncMode;
pub use database::TableSchema as DbTableSchema;
pub use database::TxnHandle as KernelTxnHandle;
pub use database::VectorSearchResult;
```
§Modules
- actor: Actor-Based Connection Manager
- adaptive_learned_index: Adaptive Learned Index with Bounded Error
- adaptive_memtable: Adaptive Memtable Sizing with Memory Pressure Feedback
- admission_control: Admission Control with Explicit Cost Model
- backend: Storage backend abstraction
- backup: Backup and restore functionality for SochDB databases
- batch_wal: Batched WAL with Vectored I/O
- block_checksum: Block-Level CRC32C Checksums
- bloom: Bloom filter for fast negative lookups
- cdc: WAL-Derived Change Data Capture (CDC) Engine
- clr_learned_index: Compact Linear Regression (CLR) Learned Index (Task 3)
- columnar_compression: Columnar Compression with Type-Aware Encoding (Task 9)
- compaction_policy: Compaction Policy for Version and Tombstone Pruning
- compression: Storage compression and optimization module
- concurrent_art: Concurrent Adaptive Radix Tree (ART) for Lock-Free Memtable
- correctness_testing: Correctness Testing Framework
- cow_btree: Copy-on-Write B-Tree Index (Recommendation 5)
- database: SochDB Database Kernel
- deferred_index: Deferred Sorted Index (Recommendation 2: LSM-Style Batch Compaction)
- dict_compression: Dictionary-Based Compression
- direct_io: Direct I/O Support for Cache-Bypass Scenarios
- dirty_tracking: Batched Dirty Tracking with MPSC Queue
- durability_contract: Durability Contract Hardening
- durable_storage: Durable Storage Layer
- epoch_arena: Epoch-Partitioned Key Arena (Task 1)
- epoch_mvcc: Epoch-Based MVCC (Recommendation 7)
- ffi
- generational_slab: Generational Slab Allocator (Task 5)
- group_commit: Event-Driven Group Commit Buffer
- hlc: Hybrid Logical Clock (HLC) for Monotonic Commit Timestamps
- hybrid_store: Adaptive Hybrid Storage (AHS) - PAX Block Layout
- index_policy: Per-Table Index Policy + Scan-Optimized Structure
- io_isolation: I/O Isolation Policy
- io_uring: io_uring Backend for Linux Async I/O
- ipc: IPC Protocol with Multiplexing and Streaming
- ipc_server: Unix Domain Socket IPC Server for SochDB
- key_buffer: Cache-Line Aligned Key Buffer with Stack Allocation
- lazy_namespace: Per-namespace lazy hydrate/evict; resident memory tracks active tenants.
- learned_index_integration: Learned Index Integration (Task 5)
- lock: Advisory File Locking for Database Exclusivity
- lockfree_epoch: Lock-Free Epoch Tracking via Slot Array (Task 3)
- lockfree_memtable: Lock-Free MemTable with Hazard Pointer Protection
- lscs: Log-Structured Column Store (LSCS)
- manifest: LSM MANIFEST File Implementation (Gap #13 Fix)
- memory: Memory pressure handling and resource limits
- mvcc_concurrent: Concurrent MVCC for Multi-Reader Single-Writer Embedded Mode
- mvcc_new: MVCC Version Management for LSCS
- mvcc_snapshot: MVCC Snapshot Plumbing (Task 3)
- namespace: Lightweight Namespace Routing + On-Disk Layout (Task 3)
- object_store_tier: Object-storage-native cold tier for immutable index segments.
- optimized_scan: Optimized Range Scan with O(log n + k) Asymptotics
- packed_row: Packed Row Format for Unified Row Storage
- page_cache: Application-Level Page Cache with Clock-Pro (Recommendation 8)
- page_manager: Page-Based File Layout with Database Header (Task 8)
- parallel_merge: Parallel K-Way Merge for Compaction
- payload: Payload storage for variable-length data
- polymorphic_value: Polymorphic Value Encoding with Adaptive Compression (Task 12)
- prefetch: Memory Prefetching for Range Scans
- queue_index: Queue-Optimized Index Policy
- row_format: Slot-Based Columnar Row Storage (PAX/Hybrid Format, Recommendation 1)
- shard_coalesced: Shard-Coalesced Batch DashMap with Prefetch Pipelining (Task 6)
- sketches: Probabilistic Data Structures for Streaming Analytics
- ssi: Serializable Snapshot Isolation (SSI) Implementation
- ssi_scaling: SSI Scaling Guardrails
- sstable: Block-Oriented SSTable Format
- storage_engine: StorageEngine Trait Abstraction (Task 1)
- stratified_skiplist: Stratified SkipList with Deferred LSM Promotion (Task 2)
- streaming_iterator: Streaming Iterator Architecture for Scans
- supervisor: Supervised background workers.
- tiered_memtable: Tiered SkipMap Elimination (Recommendation 3)
- tournament_tree: Tournament Tree (Loser Tree) for K-Way Merge
- transaction: Unified Transaction Coordinator
- two_level_index: Two-Level Index for SSTables
- txn_arena: Transaction-Scoped Arena with Zero-Copy Key/Value Plumbing
- txn_wal: Transaction-Aware WAL for ACID Transactions
- upgrade_contract: Upgrade Compatibility Contract
- validation: SSTable Validation Layer
- vectorized_scan: SIMD-Accelerated Vectorized Scan Engine (Recommendation 2)
- version_set: SuperVersion Metadata + Copy-on-Write Version Set
- version_store: Version Store for MVCC
- wal_integration: WAL-Storage Integration (Task 2 & Task 4)
- wal_segment: WAL Segmentation and Checkpoint Manager
- zero_copy: Zero-Copy SSTable Iterator
- zero_copy_safety: Zero-Copy Safety with Validation Layer (Task 5)
- zero_copy_serde: Zero-Copy Serialization (Recommendation 6)
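The hlc module provides monotonic commit timestamps. The following minimal sketch shows only the local-event rule of a hybrid logical clock. It uses hypothetical Hlc and HlcClock types (not the crate's HlcTimestamp) and passes in the wall-clock reading explicitly so the behaviour is deterministic. The merge rule for timestamps received from other nodes is omitted.

```rust
/// Hypothetical HLC timestamp; derived Ord compares `physical`, then `logical`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hlc {
    pub physical: u64, // wall-clock milliseconds
    pub logical: u32,  // tie-breaker when the wall clock does not advance
}

pub struct HlcClock {
    last: Hlc,
}

impl HlcClock {
    pub fn new() -> Self {
        HlcClock { last: Hlc { physical: 0, logical: 0 } }
    }

    /// Issue a timestamp strictly greater than every previous one.
    pub fn now(&mut self, wall_ms: u64) -> Hlc {
        let next = if wall_ms > self.last.physical {
            // Wall clock moved forward: adopt it and reset the counter.
            Hlc { physical: wall_ms, logical: 0 }
        } else {
            // Wall clock stalled or went backwards: keep physical, bump logical.
            Hlc { physical: self.last.physical, logical: self.last.logical + 1 }
        };
        self.last = next;
        next
    }
}
```

Because the logical counter absorbs clock stalls and small backward steps, commit timestamps stay strictly increasing even if the system clock is adjusted.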
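Both tournament_tree and parallel_merge are concerned with k-way merging of sorted runs. The sketch below substitutes a binary min-heap (std's BinaryHeap with Reverse) for a loser tree. Both take O(log k) per output element, but a loser tree typically needs fewer comparisons per step. kway_merge is a hypothetical helper, not a crate API.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge k individually sorted runs into one sorted Vec.
pub fn kway_merge(runs: &[Vec<i64>]) -> Vec<i64> {
    // Min-heap of (current head value, run index).
    let mut heap = BinaryHeap::new();
    // Next unread position in each run.
    let mut pos = vec![0usize; runs.len()];
    for (i, run) in runs.iter().enumerate() {
        if let Some(&v) = run.first() {
            heap.push(Reverse((v, i)));
        }
    }
    let mut out = Vec::with_capacity(runs.iter().map(|r| r.len()).sum());
    while let Some(Reverse((v, i))) = heap.pop() {
        out.push(v);
        pos[i] += 1;
        // Refill from the run that just supplied the minimum.
        if let Some(&next) = runs[i].get(pos[i]) {
            heap.push(Reverse((next, i)));
        }
    }
    out
}
```

Equal values are emitted in run-index order here. An LSM compaction merge would instead need to rank entries for the same key by recency and drop shadowed versions.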