Features
- High Performance: Optimized for concurrent writes and reads
- Topic-based Organization: Separate read/write streams per topic
- Configurable Consistency: Choose between strict and relaxed consistency models
- Memory-mapped I/O: Efficient file operations using memory mapping
- Persistent Read Offsets: Read positions survive process restarts
- Coordination-free Deletion: Atomic file cleanup without blocking operations
- Comprehensive Benchmarking: Built-in performance testing suite
Quick Start
Add Walrus to your `Cargo.toml` (the dependency name below assumes the crate is published as `walrus`):

```toml
[dependencies]
walrus = "0.1.0"
```
Basic Usage

```rust
use walrus::Walrus;

// Create a new WAL instance with default settings
let wal = Walrus::new()?;

// Write data to a topic (the topic name is arbitrary)
let data = b"Hello, Walrus!";
wal.append_for_topic("greetings", data)?;

// Read data from the topic
if let Some(entry) = wal.read_next("greetings")? {
    // Process the returned `Entry` (see Data Types below)
}
```
Advanced Configuration

```rust
use walrus::{Walrus, ReadConsistency, FsyncSchedule};

// Configure with custom consistency and fsync behavior
// (the persist_every and millisecond values here are illustrative)
let wal = Walrus::with_consistency_and_schedule(
    ReadConsistency::AtLeastOnce { persist_every: 100 },
    FsyncSchedule::Milliseconds(500),
)?;

// Write and read operations work the same way
wal.append_for_topic("events", b"payload")?;
```
Configuration Options
Read Consistency Modes
Walrus supports two consistency models:
ReadConsistency::StrictlyAtOnce
- Behavior: Read offsets are persisted after every read operation
- Guarantees: No message will be read more than once, even after crashes
- Performance: Higher I/O overhead due to frequent persistence
- Use Case: Critical systems where duplicate processing must be avoided
```rust
let wal = Walrus::with_consistency(ReadConsistency::StrictlyAtOnce)?;
```
ReadConsistency::AtLeastOnce { persist_every: u32 }
- Behavior: Read offsets are persisted every N read operations
- Guarantees: Messages may be re-read after crashes (at-least-once delivery)
- Performance: Better throughput with configurable persistence frequency
- Use Case: High-throughput systems that can handle duplicate processing
```rust
// Persist the read offset every 100 reads (the value is illustrative)
let wal = Walrus::with_consistency(ReadConsistency::AtLeastOnce { persist_every: 100 })?;
```
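The at-least-once trade-off can be made concrete with a small self-contained sketch (the `OffsetTracker` type below is illustrative bookkeeping, not part of the Walrus API): with `persist_every: N`, only every N-th read offset survives a crash, so at most `N - 1` entries can be delivered again on restart.

```rust
use std::collections::HashMap;

/// Illustrative tracker: persists a topic's read offset every `persist_every` reads.
struct OffsetTracker {
    persist_every: u32,
    reads_since_persist: HashMap<String, u32>,
    persisted: HashMap<String, u64>, // what would survive a crash
    current: HashMap<String, u64>,   // in-memory read position
}

impl OffsetTracker {
    fn new(persist_every: u32) -> Self {
        Self {
            persist_every,
            reads_since_persist: HashMap::new(),
            persisted: HashMap::new(),
            current: HashMap::new(),
        }
    }

    /// Advance the in-memory offset; persist only every N-th read.
    fn record_read(&mut self, topic: &str) {
        let pos = self.current.entry(topic.to_string()).or_insert(0);
        *pos += 1;
        let n = self.reads_since_persist.entry(topic.to_string()).or_insert(0);
        *n += 1;
        if *n >= self.persist_every {
            self.persisted.insert(topic.to_string(), *pos);
            *n = 0;
        }
    }

    /// After a crash, reading resumes from the last persisted offset,
    /// so this many entries would be re-read.
    fn replayed_after_crash(&self, topic: &str) -> u64 {
        self.current.get(topic).copied().unwrap_or(0)
            - self.persisted.get(topic).copied().unwrap_or(0)
    }
}

fn main() {
    let mut t = OffsetTracker::new(50);
    for _ in 0..120 {
        t.record_read("topic_0");
    }
    // 120 reads with persist_every = 50: offset 100 was persisted,
    // so 20 entries would be re-read after a crash.
    assert_eq!(t.replayed_after_crash("topic_0"), 20);
}
```

Larger `persist_every` values amortize the index write over more reads at the cost of a longer possible replay window.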
Fsync Scheduling
Control when data is flushed to disk:
FsyncSchedule::Milliseconds(u64)
- Behavior: Background thread flushes data every N milliseconds
- Default: 1000ms (1 second)
- Range: Minimum 1ms, recommended 100-5000ms
```rust
// Flush once per second (the documented default); the consistency
// mode shown here is illustrative
let wal = Walrus::with_consistency_and_schedule(
    ReadConsistency::StrictlyAtOnce,
    FsyncSchedule::Milliseconds(1000),
)?;
```
Environment Variables
WALRUS_QUIET: Set to any value to suppress debug output during operations.

```shell
# Suppress debug messages
export WALRUS_QUIET=1
```
File Structure and Storage
Walrus organizes data in the following structure:
wal_files/
├── wal_1234567890.log # Log files (10MB blocks, 100 blocks per file)
├── wal_1234567891.log
├── read_offset_idx_index.db # Persistent read offset index
└── read_offset_idx_index.db.tmp # Temporary file for atomic updates
Storage Configuration
- Block Size: 10MB per block (configurable via `DEFAULT_BLOCK_SIZE`)
- Blocks Per File: 100 blocks per file (1GB total per file)
- Max File Size: 1GB per log file
- Index Persistence: Read offsets stored in separate index files
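Given these defaults, the mapping from a global byte offset to a log file and a block within it can be sketched as follows. The `locate` helper is hypothetical, written only to show the layout arithmetic; Walrus's internal addressing may differ.

```rust
// Documented defaults: 10MB blocks, 100 blocks per file => ~1GB per file
const DEFAULT_BLOCK_SIZE: u64 = 10 * 1024 * 1024;
const BLOCKS_PER_FILE: u64 = 100;
const MAX_FILE_SIZE: u64 = DEFAULT_BLOCK_SIZE * BLOCKS_PER_FILE;

/// Hypothetical helper: map a global byte offset to
/// (file index, block index within that file).
fn locate(offset: u64) -> (u64, u64) {
    (offset / MAX_FILE_SIZE, (offset % MAX_FILE_SIZE) / DEFAULT_BLOCK_SIZE)
}

fn main() {
    assert_eq!(locate(0), (0, 0));
    assert_eq!(locate(DEFAULT_BLOCK_SIZE), (0, 1)); // second block, first file
    assert_eq!(locate(MAX_FILE_SIZE), (1, 0));      // first block, second file
}
```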
API Reference
Core Methods
`Walrus::new() -> std::io::Result<Self>`

Creates a new WAL instance with default settings (`StrictlyAtOnce` consistency).

`Walrus::with_consistency(mode: ReadConsistency) -> std::io::Result<Self>`

Creates a WAL with a custom consistency mode and the default fsync schedule (1000ms).

`Walrus::with_consistency_and_schedule(mode: ReadConsistency, schedule: FsyncSchedule) -> std::io::Result<Self>`

Creates a WAL with full configuration control.

`append_for_topic(&self, topic: &str, data: &[u8]) -> std::io::Result<()>`

Appends data to the specified topic. Topics are created automatically on first write.

`read_next(&self, topic: &str) -> std::io::Result<Option<Entry>>`

Reads the next entry from the topic. Returns `None` if no more data is available.
Data Types

`Entry`

The record type returned by `read_next`, carrying the bytes previously appended to the topic.
Benchmarks
Walrus includes a comprehensive benchmarking suite to measure performance across different scenarios.
Available Benchmarks
1. Write Benchmark (`multithreaded_benchmark_writes`)
- Duration: 2 minutes
- Threads: 10 concurrent writers
- Data Size: Random entries between 500B and 1KB
- Topics: One topic per thread (`topic_0` through `topic_9`)
- Configuration: `AtLeastOnce { persist_every: 50 }`
- Output: `benchmark_throughput.csv`
2. Read Benchmark (`multithreaded_benchmark_reads`)
- Phases:
  - Write Phase: 1 minute (populate data)
  - Read Phase: 2 minutes (consume data)
- Threads: 10 concurrent readers/writers
- Data Size: Random entries between 500B and 1KB
- Configuration: `AtLeastOnce { persist_every: 5000 }`
- Output: `read_benchmark_throughput.csv`
3. Scaling Benchmark (`scaling_benchmark`)
- Thread Counts: 1 to 10 threads (tested sequentially)
- Duration: 30 seconds per thread count
- Data Size: Random entries between 500B and 1KB
- Configuration: `AtLeastOnce { persist_every: 50 }`
- Output: `scaling_results.csv` and `scaling_results_live.csv`
Running Benchmarks
Using Make (Recommended)
# Run individual benchmarks
# Show results
# Live monitoring (run in separate terminal)
# Cleanup
Using Cargo Directly

Assuming each benchmark is compiled as its own binary target, the benchmarks can be run with:

```shell
# Write benchmark
cargo run --release --bin multithreaded_benchmark_writes

# Read benchmark
cargo run --release --bin multithreaded_benchmark_reads

# Scaling benchmark
cargo run --release --bin scaling_benchmark
```
Benchmark Data Generation
All benchmarks use the following data generation strategy:
```rust
// Random entry size between 500B and 1KB
// (rng is a rand::Rng; the fill byte below is illustrative)
let size = rng.gen_range(500..=1024);
let data = vec![b'x'; size];
```
This creates realistic variable-sized entries with predictable content for verification.
Visualization Scripts
The `scripts/` directory contains Python visualization tools:

- `visualize_throughput.py` - Write benchmark graphs
- `show_reads_graph.py` - Read benchmark graphs
- `show_scaling_graph_writes.py` - Scaling results
- `live_scaling_plot.py` - Live scaling monitoring

Requirements: `pandas`, `matplotlib`
Architecture
Key Design Principles
- Coordination-free Operations: Writers don't block readers, minimal locking
- Memory-mapped I/O: Efficient file operations with OS-level optimizations
- Topic Isolation: Each topic maintains independent read/write positions
- Persistent State: Read offsets survive process restarts
- Background Maintenance: Async fsync and cleanup operations
Read Offset Persistence
Important: Read offsets are decoupled from write offsets. This means:
- Each topic maintains its own read position independently
- Read positions are persisted to disk based on consistency configuration
- After restart, readers continue from their last persisted position
- Write operations don't affect existing read positions
This design enables multiple readers per topic and supports replay scenarios.
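A minimal in-memory model makes this decoupling visible. The `TopicLog` type below is illustrative only (Walrus persists its cursors to an index file and uses memory-mapped storage; this sketch keeps everything in a `Vec` to show the behavior):

```rust
use std::collections::HashMap;

/// Illustrative model: one append-only log per topic, with any number
/// of independently positioned reader cursors over it.
struct TopicLog {
    entries: Vec<Vec<u8>>,           // write side: append-only
    cursors: HashMap<String, usize>, // read side: independent positions
}

impl TopicLog {
    fn new() -> Self {
        Self { entries: Vec::new(), cursors: HashMap::new() }
    }

    fn append(&mut self, data: &[u8]) {
        // Appending never touches any reader's cursor.
        self.entries.push(data.to_vec());
    }

    fn read_next(&mut self, reader: &str) -> Option<&[u8]> {
        let pos = self.cursors.entry(reader.to_string()).or_insert(0);
        let entry = self.entries.get(*pos)?;
        *pos += 1;
        Some(entry.as_slice())
    }

    /// Replay: rewind one reader without affecting the others.
    fn rewind(&mut self, reader: &str) {
        self.cursors.insert(reader.to_string(), 0);
    }
}

fn main() {
    let mut log = TopicLog::new();
    log.append(b"a");
    log.append(b"b");
    assert_eq!(log.read_next("fast"), Some(&b"a"[..]));
    assert_eq!(log.read_next("fast"), Some(&b"b"[..]));
    // A second reader starts from the beginning, unaffected by the first.
    assert_eq!(log.read_next("slow"), Some(&b"a"[..]));
    // New writes don't move anyone's read position.
    log.append(b"c");
    assert_eq!(log.read_next("fast"), Some(&b"c"[..]));
    // Rewinding one cursor enables replay for that reader alone.
    log.rewind("fast");
    assert_eq!(log.read_next("fast"), Some(&b"a"[..]));
}
```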
Performance Tuning
For Maximum Throughput

```rust
// Relaxed consistency plus infrequent fsync (values are illustrative)
let wal = Walrus::with_consistency_and_schedule(
    ReadConsistency::AtLeastOnce { persist_every: 5000 },
    FsyncSchedule::Milliseconds(5000),
)?;
```

For Maximum Durability

```rust
// Persist every read offset and fsync as often as the schedule allows
let wal = Walrus::with_consistency_and_schedule(
    ReadConsistency::StrictlyAtOnce,
    FsyncSchedule::Milliseconds(1),
)?;
```

For Balanced Performance

```rust
// Moderate batching with the default 1-second fsync (values are illustrative)
let wal = Walrus::with_consistency_and_schedule(
    ReadConsistency::AtLeastOnce { persist_every: 100 },
    FsyncSchedule::Milliseconds(1000),
)?;
```
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Development Setup
Running Tests

Assuming the standard Cargo layout, the test suites can be run with:

```shell
# Unit tests
cargo test --lib

# Integration and end-to-end tests
cargo test --tests

# All tests
cargo test
```
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
Version 0.1.0
- Initial release
- Core WAL functionality
- Topic-based organization
- Configurable consistency modes
- Comprehensive benchmark suite
- Memory-mapped I/O implementation
- Persistent read offset tracking