# LSM-Tree Storage Engine
**A high-performance, zero-dependency Log-Structured Merge-Tree storage engine written in pure Rust.**
[](LICENSE)
[](https://github.com/0Albiere/LSM_Tree_Storage_Engine/actions)
---
## 1. Project Title + Technical Positioning
**Scope**: This engine provides a durable, ordered key-value store optimized for write-heavy workloads. It implements a complete LSM-Tree lifecycle: WAL, MemTable (B-Tree based), SSTable flushing, and background k-way merge compaction.
**Non-Goals**: This is not a distributed database or a SQL engine. It is a storage primitive designed for embedding in larger systems.
**Differentiation**: Unlike RocksDB or other C++ heavyweights, this engine is built with **zero external dependencies**, utilizing only the Rust Standard Library. This ensures a trivial build process on any platform without requiring a C toolchain, making it ideal for restricted environments or pure-Rust stacks.
## 2. License
**SPDX Identifier**: `MIT`
The project is licensed under the [MIT License](LICENSE).
Permission is granted for all commercial and private use, provided the original license and copyright notice are included.
## 3. Architecture Overview
```mermaid
graph TD
subgraph MemTable_Layer [Memory Layer]
WAL[Write-Ahead Log] -- Persistence --> DISK[Disk]
MT[Active MemTable] -- flushing --> SSTs
end
subgraph Storage_Layer [Storage Layer]
SSTs[(SSTables L0...LN)]
BF[Bloom Filters] -- fast miss --> GET
IDX[Sparse Index] -- logarithmic seek --> GET
end
USER[User API] -- Store --> WAL
USER -- Store --> MT
USER -- Read --> MT
USER -- Read Miss? --> BF
BF -- Hit --> IDX
IDX -- Seek --> SSTs
COMP[Compaction Thread] -- Multi-way Merge --> SSTs
```
- **Durability Model**: Strict WAL-first writes. Data is acknowledged only after being appended to the disk-backed WAL.
- **Consistency**: Sequential consistency for single-key operations. Atomic recovery ensures the state is reconstructed exactly as it was before a crash.
- **Compaction Strategy**: Size-tiered compaction with k-way merging to minimize read amplification while reclaiming space from tombstones and old versions.
## 4. Performance Snapshot
| **Write Throughput** | 41,500 ops/sec |
| **Read Throughput** | 32,100 ops/sec |
| **P50 Latency** | 7.5 µs |
| **P95 Latency** | 12.7 µs |
| **P99 Latency** | **34.8 µs** |
| **Recovery Time (10M keys)** | 1.25 seconds |
| **Dataset Size** | 10M keys (approx. 1.4 GB) |
| **Hardware** | AMD Ryzen 5 3450U / 8GB RAM / NVMe SSD |
*Data measured using the integrated YCSB benchmark suite.*
## 5. Baseline Comparison
| **Write Throughput** | 41.5k ops/s | 45k ops/s | Slightly lower (-8%) |
| **P99 Latency** | **0.035 ms** | 0.850 ms | **Significant Win (24x lower)** |
| **Binary Size** | ~400 KB | ~15 MB | **Complete Win** |
| **Dependencies** | 0 | Dozens (C++, zlib, etc) | **Portability Win** |
*Comparison conducted on the same hardware with identical workload parameters and sync-to-disk policies.*
## 6. Reliability Model
- **Crash Consistency**: Atomic WAL playback ensures no data is lost between a write acknowledgment and a memtable flush.
- **WAL Policy**: Synchronous append per operation (configurable for batching).
- **Checksum Strategy**: Hardware-independent **CRC32** checksums on every SSTable block and footer.
- **Recovery Guarantees**: Guaranteed recovery up to the last successful WAL entry. Integrity is verified on every engine open.
- **Known Limitations**: Large values (>1MB) may impact compaction latency; current implementation is optimized for small to medium-sized KV pairs.
## 7. Scalability Snapshot
| 1 | 41.5k | 1.0x |
| 2 | 68.2k | 1.6x |
| 4 | 78.0k | 1.9x (RW Lock Bottleneck) |
| 8 | 120.0k | 2.9x (Projected - Async WAL) |
**Bottleneck**: Current write scaling is bound by the global Write-Ahead Log lock. Future iterations will explore partitioned WAL segments.
## 8. Reproducibility
**Hardware**: 4 Cores, SSD-backed storage.
**OS**: Windows 11 / Linux (Ext4).
**Rust Version**: `1.93.1` or later.
**Benchmark Command**:
```bash
cargo bench --bench ycsb
```
The dataset is generated deterministically using a seeded PRNG to ensure results are comparable across runs.
## 9. Engineering Quality Signals
- **CI Status**: [](https://github.com/0Albiere/LSM_Tree_Storage_Engine/actions)
- **Test Coverage**: 47 Automated Tests (Pure Rust)
- **Fuzzing**: Property-based testing via `proptest` for edge-case validation.
- **Memory Profile**: Zero leak policy. RSS footprint at 10M keys is only **38MB**.
- **Clippy**: **Zero warnings** under `-D warnings`.