Expand description
§Cardano LSM - Log-Structured Merge Tree for Blockchain Indexing
A pure Rust implementation of an LSM tree optimized for blockchain indexing workloads, particularly UTxO-based systems like Cardano. This crate provides fast snapshots, rollback capabilities, and efficient range queries without requiring a Haskell runtime.
§Quick Start
use cardano_lsm::{LsmTree, LsmConfig, Key, Value};
// Open or create an LSM tree
let config = LsmConfig::default();
let mut tree = LsmTree::open(db_path, config)?;
// Insert key-value pairs
let key = Key::from(b"utxo_123");
let value = Value::from(b"transaction_data");
tree.insert(&key, &value)?;
// Retrieve values
if let Some(v) = tree.get(&key)? {
println!("Found: {:?}", v);
}
// Delete keys
tree.delete(&key)?;§Common Use Cases
§Blockchain Indexing with Snapshots
The primary use case is maintaining blockchain state with the ability to quickly roll back during chain reorganizations:
// Take a snapshot before processing a new block
let snapshot = tree.snapshot();
// Process block transactions
tree.insert(&Key::from(b"utxo_new"), &Value::from(b"data"))?;
tree.delete(&Key::from(b"utxo_spent"))?;
// If block is invalid or reorg occurs, rollback
tree.rollback(snapshot)?;
// Tree is now back to the snapshot state§Range Queries
Efficiently scan ranges of keys, useful for querying all UTxOs for an address:
let start = Key::from(b"addr_123_");
let end = Key::from(b"addr_124_");
for (key, value) in tree.range(&start, &end) {
// Process each key-value pair in range
println!("Key: {:?}, Value: {:?}", key, value);
}§Batch Operations
For high-throughput scenarios, use batch operations to amortize I/O costs:
let batch = vec![
(Key::from(b"key1"), Value::from(b"value1")),
(Key::from(b"key2"), Value::from(b"value2")),
];
tree.insert_batch(batch)?;
// Deletions are done separately
tree.delete(&Key::from(b"key3"))?;§Configuration
Customize the LSM tree behavior with LsmConfig:
use cardano_lsm::LsmConfig;
let config = LsmConfig {
memtable_size: 16 * 1024 * 1024, // 16 MB memtable
bloom_filter_bits_per_key: 12, // Bits per key for bloom filter
..Default::default()
};§Persistent Snapshots
Save and restore snapshots for backup or testing:
// Save current state to disk
tree.save_snapshot("block_12345", "after block 12345")?;
drop(tree); // Close the tree
// Later, restore from the saved snapshot
let tree = LsmTree::open_snapshot(db_path, "block_12345")?;§Performance Characteristics
- Snapshot creation: < 10ms (reference counting, no data copy)
- Rollback: < 1s (typically used for short-term reorgs)
- Write throughput: Optimized for blockchain workloads with batch operations
- Read latency: Bloom filters provide fast negative lookups
- Compaction: LazyLevelling strategy balances write amp and space amp
§Thread Safety
LsmTree uses internal locking and can be safely shared across threads:
// Wrap in Arc to share across threads (reads are thread-safe)
let tree = Arc::new(tree);
let tree_a = tree.clone();
let tree_b = tree.clone();
let ha = thread::spawn(move || tree_a.get(&Key::from(b"key")));
let hb = thread::spawn(move || tree_b.get(&Key::from(b"key")));
ha.join().unwrap()?;
hb.join().unwrap()?;§Testing
This implementation includes 10,000+ property-based conformance tests validated against the Haskell reference implementation with a 100% pass rate.
§Important Notes
- No Write-Ahead Log (WAL): Writes are lost on crash until a snapshot is saved. This design choice trades crash recovery for simplicity and performance.
- Session locking: Only one process can access a database directory at a time.
- Snapshots are reference-counted: Immutable view of data at a point in time.
Structs§
- Hash
- Incremental
Merkle Tree - Key
- LsmConfig
- LsmSnapshot
- LsmStats
- LsmTree
- Merkle
Diff - Merkle
Leaf - Merkle
Proof - Merkle
Root - Merkle
Snapshot - Monoidal
LsmTree - LSM tree with monoidal value support for efficient aggregation
- Monoidal
Snapshot - Immutable snapshot of a
MonoidalLsmTree - Persistent
Snapshot - A persistent snapshot on disk
- Range
Iter - Snapshot
Metadata - Metadata for a persistent snapshot (CBOR format matching Haskell)
- Snapshot
Run - Information about a single SSTable run in a snapshot
- Value
Enums§
- Compaction
Strategy - Compaction strategy for the LSM tree
- Direction
- Error
Traits§
- Monoidal
- Values that can be combined with an associative operation (monoid)