Expand description
SuperVersion Metadata + Copy-on-Write Version Set
This module implements RocksDB-style SuperVersion metadata management for near lock-free reads. The key insight is that read paths only need a consistent snapshot of metadata (memtable, immutable memtables, SSTable levels), not exclusive access to the underlying data structures.
§Problem Analysis
Previous implementation suffered from lock contention:
- Foreground reads must traverse memtable (RwLock) → immutables → levels
- Each level switch may acquire additional locks
- Compaction/flush modifies metadata while readers hold references
- Lock-order complexity leads to deadlock potential
§Solution: SuperVersion + ArcSwap
┌─────────────────────────────────────────────────────────────────┐
│ VersionSet │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ current: ArcSwap<SuperVersion> ◄─── Atomic swap (O(1)) ││
│ └─────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ SuperVersion ││
│ │ ┌─────────────┐ ┌──────────────┐ ┌───────────────────┐ ││
│ │ │ memtable │ │ immutables │ │ level_files │ ││
│ │ │ Arc<...> │ │ Arc<Vec<..>> │ │ Arc<Vec<Level>> │ ││
│ │ └─────────────┘ └──────────────┘ └───────────────────┘ ││
│ └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘§Read Path (Lock-Free)
- Load current SuperVersion via atomic:
let sv = version_set.get();// O(1) - Search memtable → immutables → levels using sv references
- Release sv when done (Arc drop)
No locks acquired. Readers are completely decoupled from writers.
§Write Path (Copy-on-Write)
- Clone current SuperVersion’s inner Arcs
- Modify the new copy (e.g., add new immutable memtable)
- Atomically swap new SuperVersion into place
- Old SuperVersion kept alive by existing readers (Arc)
§Complexity Analysis
| Operation | Old (with locks) | New (SuperVersion) |
|---|---|---|
| Read acquire | O(1) but contended | O(1) atomic load |
| Read release | O(1) unlock | O(1) Arc decrement |
| Metadata update | O(1) + lock wait | O(changed_metadata) clone |
| Concurrent reads | Serialized on RwLock | Truly parallel |
For 8 threads, expected speedup: ~6-8x on read-heavy workloads.
Structs§
- Bloom
Filter Handle - Handle to a bloom filter (may be memory-mapped or in-memory)
- File
Metadata - Metadata for an SSTable file
- Immutable
MemTable Ref - Reference to an immutable memtable (sealed, pending flush)
- Level
Metadata - Metadata for a single level in the LSM tree
- Super
Version - SuperVersion: A consistent snapshot of all storage metadata
- Super
Version Handle - RAII handle for SuperVersion access
- Version
Set - VersionSet manages the current SuperVersion and provides atomic updates
Traits§
- Immutable
MemTable - Trait for immutable memtable operations