nano-wal

A simple, lightweight Write-Ahead Log (WAL) implementation in Rust with per-key segment sets, designed for append-only operations with configurable retention and random access for memory-constrained systems.
Features
- Per-key segment sets: Each key gets its own set of segment files for optimal organization
- Entry references: Get position references for written entries enabling random access
- Random access reads: Read specific entries directly using their references with signature verification
- Size-based rotation: Automatic segment file rotation based on configurable size limits
- Meaningful filenames: Segment files include key names and sequence numbers (e.g., topic-partition-0001.log)
- Dual signatures: NANO-LOG file headers and NANO-REC entry signatures for data integrity
- Configurable retention: Automatic cleanup of old files based on time-based retention policies
- Memory-efficient: Zero RAM overhead with optional random access for memory-constrained systems
Installation
Add this to your Cargo.toml:
[dependencies]
nano-wal = "0.4.0"
Quick Start
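use nano_wal::{Wal, WalOptions};
use bytes::Bytes;
use std::time::Duration;
// Note: the path, key, and payload literals below are placeholder values
// Create a new WAL with default options
let mut wal = Wal::new("./my_wal", WalOptions::default())?;
// Append an entry and get its reference
let content = Bytes::from("some data");
let entry_ref = wal.append_entry("my_key", None, content, false)?;
// Append an entry with optional header
let header = Some(Bytes::from("metadata"));
let content_with_header = Bytes::from("data with header");
let header_ref = wal.append_entry("my_key", header, content_with_header, false)?;
// Log an entry with durability (forced sync to disk)
let durable_content = Bytes::from("important data");
let durable_ref = wal.log_entry("my_key", None, durable_content)?;
// Random access: read specific entry using its reference
let retrieved_content = wal.read_entry_at(entry_ref)?;
// Sequential access: retrieve all records for a key
let records: Vec<_> = wal.enumerate_records("my_key")?.collect();
// Enumerate all keys
let keys: Vec<String> = wal.enumerate_keys()?.collect();
// Compact the WAL (remove expired segments)
wal.compact()?;
// Clean shutdown
wal.shutdown()?;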
Configuration
Customize WAL behavior with WalOptions:
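use nano_wal::{Wal, WalOptions};
use std::time::Duration;
// Note: the path and option values below are placeholders
let options = WalOptions {
    entry_retention: Duration::from_secs(60 * 60 * 24), // 1 day
    segments_per_retention_period: 24,
};
let mut wal = Wal::new("./my_wal", options)?;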
Configuration Options
- entry_retention: Duration for which entries are retained before being eligible for compaction (default: 1 week)
- segments_per_retention_period: Number of segments per retention period for time-based expiration (default: 10)
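To make the relationship between the two options concrete, here is a minimal sketch. It assumes each segment covers an equal share of the retention period, which this README does not state explicitly. `SketchOptions` and `segment_duration` are hypothetical names used only for illustration and are not part of the crate.

```rust
use std::time::Duration;

/// Hypothetical stand-in mirroring the two documented options.
#[derive(Debug, Clone, Copy)]
struct SketchOptions {
    entry_retention: Duration,
    segments_per_retention_period: u32,
}

impl Default for SketchOptions {
    fn default() -> Self {
        Self {
            // Documented defaults: 1 week of retention, 10 segments.
            entry_retention: Duration::from_secs(7 * 24 * 60 * 60),
            segments_per_retention_period: 10,
        }
    }
}

/// Assumed relationship: every segment spans an equal slice of the
/// retention period. A zero segment count is treated as one.
fn segment_duration(opts: &SketchOptions) -> Duration {
    opts.entry_retention / opts.segments_per_retention_period.max(1)
}

fn main() {
    let opts = SketchOptions::default();
    println!("each segment would span {:?}", segment_duration(&opts));
}
```

Under that assumption, more segments per period means smaller files and finer-grained expiration, at the cost of more files on disk.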
API Reference
Core Methods
- new(filepath: &str, options: WalOptions) - Create a new WAL instance
- append_entry<K>(key: K, header: Option<Bytes>, content: Bytes, durable: bool) -> EntryRef - Append an entry with optional header to the WAL, returns reference
- log_entry<K>(key: K, header: Option<Bytes>, content: Bytes) -> EntryRef - Append an entry with optional header and durability enabled, returns reference
- read_entry_at(entry_ref: EntryRef) -> Bytes - Read specific entry using its reference (random access)
- enumerate_records<K>(key: K) - Get all records for a specific key (sequential access)
- enumerate_keys() -> Vec<String> - Get all unique keys in the WAL
- compact() - Remove expired segment files based on retention policy
- shutdown() - Clean shutdown and remove all WAL files
Key Types
Keys must implement Hash + AsRef<[u8]> + Display for append operations. Common types like String, &str, and custom types that implement Display work seamlessly.
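As an illustration of what satisfying these bounds looks like, the sketch below defines a custom key type and a generic function with the same `Hash + AsRef<[u8]> + Display` constraint. `TopicPartition` and `describe_key` are hypothetical names, and `DefaultHasher` is used only for demonstration; the hash function the crate actually uses is not specified here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fmt::{self, Display};
use std::hash::{Hash, Hasher};

/// Hypothetical custom key: a topic name joined with a partition number.
#[derive(Hash)]
struct TopicPartition {
    label: String, // e.g. "orders-3"
}

impl TopicPartition {
    fn new(topic: &str, partition: u32) -> Self {
        Self { label: format!("{}-{}", topic, partition) }
    }
}

impl AsRef<[u8]> for TopicPartition {
    fn as_ref(&self) -> &[u8] {
        self.label.as_bytes()
    }
}

impl Display for TopicPartition {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str(&self.label)
    }
}

/// Accepts any key meeting the documented bounds and returns its display
/// name together with a hash (DefaultHasher is an illustrative choice).
fn describe_key<K: Hash + AsRef<[u8]> + Display>(key: K) -> (String, u64) {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (key.to_string(), hasher.finish())
}

fn main() {
    let (name, hash) = describe_key("hits");
    println!("{} -> {}", name, hash);
    let (name, hash) = describe_key(TopicPartition::new("orders", 3));
    println!("{} -> {}", name, hash);
}
```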
Entry References
EntryRef is a lightweight reference containing:
- key_hash: u64 - Hash of the key, identifying the segment set this entry belongs to
- sequence_number: u64 - The sequence number of the segment file
- offset: u64 - The byte offset within the segment file (after the header)
Entry references enable efficient random access while maintaining zero RAM overhead for the main WAL operations.
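The idea of keeping only small references in memory and seeking to the data on demand can be sketched as follows. This is not the crate's implementation: `SketchEntryRef`, `append`, and `read_at` are hypothetical, the segment is an in-memory `Cursor` rather than a file, and the framing (an 8-byte little-endian length prefix) is a simplification of the real on-disk format.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom, Write};

/// Hypothetical mirror of the documented `EntryRef` fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SketchEntryRef {
    key_hash: u64,
    sequence_number: u64,
    offset: u64,
}

/// Appends a length-prefixed payload at the end of the segment and
/// returns a reference recording where the entry starts.
fn append<W: Write + Seek>(
    segment: &mut W,
    key_hash: u64,
    sequence_number: u64,
    payload: &[u8],
) -> std::io::Result<SketchEntryRef> {
    let offset = segment.seek(SeekFrom::End(0))?;
    segment.write_all(&(payload.len() as u64).to_le_bytes())?;
    segment.write_all(payload)?;
    Ok(SketchEntryRef { key_hash, sequence_number, offset })
}

/// Seeks straight to the referenced offset and reads one entry back.
fn read_at<R: Read + Seek>(segment: &mut R, entry: SketchEntryRef) -> std::io::Result<Vec<u8>> {
    segment.seek(SeekFrom::Start(entry.offset))?;
    let mut len = [0u8; 8];
    segment.read_exact(&mut len)?;
    let mut payload = vec![0u8; u64::from_le_bytes(len) as usize];
    segment.read_exact(&mut payload)?;
    Ok(payload)
}

fn main() -> std::io::Result<()> {
    let mut segment = Cursor::new(Vec::new());
    let first = append(&mut segment, 42, 1, b"first")?;
    let second = append(&mut segment, 42, 1, b"second")?;
    // Only the small Copy references need to stay in memory;
    // entries can be fetched in any order.
    println!("{:?}", read_at(&mut segment, second)?);
    println!("{:?}", read_at(&mut segment, first)?);
    Ok(())
}
```

Because the reference is three `u64` fields, an in-memory index of references stays small regardless of how large the entries themselves are.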
File Format
The WAL stores data in binary format with per-key segment sets:
- Each segment is named {key}-{key_hash}-{sequence}.log (e.g., hits-12345-0001.log)
- File header: [NANO-LOG:8][sequence:8][expiration:8][key_length:8][key:N]
- Entry format: [NANORC:6][header_length:2][header:H][content_length:8][content:M]
- Segments rotate based on time expiration (when current segment expires)
- Headers are optional and limited to 64KB maximum size
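The entry layout above can be sketched as a pair of encode/decode functions. This is an illustration under stated assumptions, not the crate's code: the byte order of the length fields is not documented here, so little-endian is assumed, and `encode_entry` and `decode_entry` are hypothetical helpers.

```rust
/// Entry signature as listed in the entry format above.
const ENTRY_SIGNATURE: &[u8; 6] = b"NANORC";

/// Serializes [NANORC:6][header_length:2][header:H][content_length:8][content:M].
/// Little-endian length fields are an assumption.
fn encode_entry(header: Option<&[u8]>, content: &[u8]) -> Result<Vec<u8>, String> {
    let header = header.unwrap_or(&[]);
    // A 2-byte length field caps the header at 65,535 bytes (the 64KB limit).
    if header.len() > u16::MAX as usize {
        return Err("header exceeds the 2-byte length field".to_string());
    }
    let mut buf = Vec::with_capacity(6 + 2 + header.len() + 8 + content.len());
    buf.extend_from_slice(ENTRY_SIGNATURE);
    buf.extend_from_slice(&(header.len() as u16).to_le_bytes());
    buf.extend_from_slice(header);
    buf.extend_from_slice(&(content.len() as u64).to_le_bytes());
    buf.extend_from_slice(content);
    Ok(buf)
}

/// Verifies the signature, then returns (header, content).
fn decode_entry(buf: &[u8]) -> Result<(Vec<u8>, Vec<u8>), String> {
    if buf.len() < 8 || buf[..6] != ENTRY_SIGNATURE[..] {
        return Err("bad entry signature".to_string());
    }
    let header_len = u16::from_le_bytes([buf[6], buf[7]]) as usize;
    let header_end = 8 + header_len;
    let content_start = header_end + 8;
    if buf.len() < content_start {
        return Err("truncated entry".to_string());
    }
    let header = buf[8..header_end].to_vec();
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&buf[header_end..content_start]);
    let content_len = u64::from_le_bytes(len_bytes) as usize;
    let content = content_start
        .checked_add(content_len)
        .and_then(|end| buf.get(content_start..end))
        .ok_or("truncated content")?
        .to_vec();
    Ok((header, content))
}

fn main() {
    let bytes = encode_entry(Some(&b"v=1"[..]), b"payload").unwrap();
    let (header, content) = decode_entry(&bytes).unwrap();
    println!("{} bytes encoded, header={:?}, content={:?}", bytes.len(), header, content);
}
```

Checking the signature before trusting the length fields is what lets a reader reject a reference that points at the wrong offset instead of returning arbitrary bytes.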
Use Cases
- Topic/Partition Systems: Each key represents a topic-partition pair with isolated segment files
- Event Sourcing: Store events per entity with dedicated segment sets for optimal performance
- Database WAL: Write-ahead logging with per-table or per-operation-type isolation
- Message Queues: Persistent message storage with topic-based segment organization
- Audit Logs: Tamper-evident logging with dual signature verification (file + entry level)
- Memory-Constrained Systems: Support RAM-based structures with disk-backed random access per key
Performance Characteristics
- Write throughput: Optimized for sequential writes per key with minimal overhead
- Read performance: Direct file access per key, no cross-key index lookups required
- Storage efficiency: Time-based segment rotation and automatic file-based compaction
- Memory usage: Zero RAM overhead for entry storage, minimal active segment tracking
- Random access: Direct entry retrieval with dual signature verification for data integrity
- Header support: Optional metadata headers up to 64KB per entry for enhanced functionality
Thread Safety
While the WAL struct itself is not Sync, it can be safely used in single-threaded contexts or wrapped in appropriate synchronization primitives (Arc<Mutex<Wal>>) for multi-threaded scenarios. Entry references (EntryRef) are Copy and can be safely shared between threads. The per-key segment design makes it ideal for partitioned workloads.
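The `Arc<Mutex<...>>` wrapping mentioned above might look like the sketch below. To keep it self-contained, `SketchWal` is a hypothetical in-memory stand-in rather than the real `Wal` type, and its `append_entry` has a simplified signature.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a WAL handle that is used from several threads.
#[derive(Default)]
struct SketchWal {
    entries: Vec<(String, Vec<u8>)>,
}

impl SketchWal {
    /// Simplified append: stores the entry and returns its index.
    fn append_entry(&mut self, key: &str, content: &[u8]) -> usize {
        self.entries.push((key.to_string(), content.to_vec()));
        self.entries.len() - 1
    }
}

/// Spawns `workers` threads that each append `per_worker` entries under
/// their own key, then returns the total number of stored entries.
fn append_from_threads(workers: usize, per_worker: usize) -> usize {
    let wal = Arc::new(Mutex::new(SketchWal::default()));
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            let wal = Arc::clone(&wal);
            thread::spawn(move || {
                let key = format!("partition-{}", w);
                for i in 0..per_worker {
                    // Hold the lock only for the duration of one append.
                    let mut guard = wal.lock().expect("mutex poisoned");
                    guard.append_entry(&key, format!("event-{}", i).as_bytes());
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    let total = wal.lock().expect("mutex poisoned").entries.len();
    total
}

fn main() {
    println!("stored {} entries", append_from_threads(4, 25));
}
```

A single mutex serializes all writers. For partitioned workloads, one lock per key (or one WAL instance per worker) is a possible alternative that avoids that contention.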
Examples
Basic Usage
use nano_wal::{Wal, WalOptions};
use bytes::Bytes;
Event Sourcing with Random Access
use nano_wal::{Wal, WalOptions};
use bytes::Bytes;
use serde_json::json;
use std::collections::HashMap;
// Memory-efficient approach: store only references in RAM
Headers for Metadata
use nano_wal::{Wal, WalOptions};
use bytes::Bytes;
use serde_json::json;
Examples
The repository includes several comprehensive examples demonstrating real-world usage patterns:
1. Event Sourcing with CQRS (examples/event_sourcing_cqrs.rs)
A complete implementation of event sourcing with Command Query Responsibility Segregation pattern:
Features demonstrated:
- Domain event storage with metadata headers
- Aggregate reconstruction from events
- Event metadata tracking (correlation IDs, causation chains)
- Separate command and query models
- User and order aggregate examples
2. Distributed Messaging System (examples/distributed_messaging.rs)
A message broker implementation with routing and acknowledgments:
Features demonstrated:
- Topic partitioning with configurable partition counts
- Message routing using routing keys and hash-based partitioning
- Priority message handling with expiration
- Consumer acknowledgments with retry logic
- Dead letter queue handling
- Message replay capabilities
- Performance monitoring and statistics
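Hash-based partitioning, as listed above, generally maps a routing key to one of N partitions so that the same key always lands in the same per-key segment set. The sketch below is not taken from the example file: `fnv1a_64`, `partition_for`, and `wal_key` are hypothetical helpers, and FNV-1a is chosen only because it is simple and stable across runs.

```rust
/// 64-bit FNV-1a hash: deterministic across runs and platforms.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Maps a routing key to a partition index in `0..partition_count`.
fn partition_for(routing_key: &str, partition_count: u32) -> u32 {
    assert!(partition_count > 0, "partition_count must be non-zero");
    (fnv1a_64(routing_key.as_bytes()) % u64::from(partition_count)) as u32
}

/// Builds a "topic-partition" style key of the kind used for WAL keys.
fn wal_key(topic: &str, routing_key: &str, partition_count: u32) -> String {
    format!("{}-{}", topic, partition_for(routing_key, partition_count))
}

fn main() {
    for routing_key in ["user-1", "user-2", "user-3"].iter() {
        println!("{} -> {}", routing_key, wal_key("orders", routing_key, 4));
    }
}
```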
3. Real-time Analytics Pipeline (examples/realtime_analytics.rs)
A high-throughput analytics system for real-time metrics:
Features demonstrated:
- High-frequency event ingestion (page views, purchases, errors)
- Real-time metrics calculation and aggregation
- Event deduplication using headers
- Multi-stream processing with different retention policies
- Performance metrics collection
- Time-window aggregations
- Metrics snapshots for historical analysis
Running Examples
All examples require additional dependencies that are included in dev-dependencies:
[dev-dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
uuid = { version = "1.0", features = ["v4"] }
Each example creates temporary directories and cleans up after execution, making them safe to run multiple times.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.