SMR-Swap: Lock-Free Single-Writer Multiple-Reader Swap Container
A high-performance, lock-free Rust library for safely sharing mutable data between a single writer and multiple readers using epoch-based memory reclamation.
Features
- Lock-Free: No mutexes or locks required for reads or writes
- High Performance: Optimized for both read and write operations
- Single-Writer Multiple-Reader Pattern: Type-safe enforcement via `Swapper<T>` and `SwapReader<T>`
- Memory Safe: Uses epoch-based reclamation (via `swmr-epoch`) to prevent use-after-free
- Zero-Copy Reads: Readers get direct references to the current value
- Concurrent: Safe to use across multiple threads with `Send + Sync` bounds
Quick Start
Installation
Add to your Cargo.toml:
[dependencies]
smr-swap = "0.3"
Basic Usage
The example below creates a container, registers a reader in a separate thread, and publishes an update from the writer.
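A minimal sketch of that flow; the crate-root constructor `smr_swap::new` and an `&mut self` receiver for `update` are assumptions based on the API overview later in this README:

```rust
use std::thread;

fn main() {
    // Assumed constructor: returns (Swapper<T>, SwapReader<T>).
    let (mut swapper, reader) = smr_swap::new(0u64);

    // Reader thread: register once, then read through a guard.
    let handle = thread::spawn(move || {
        let local_epoch = reader.register_reader();
        let guard = reader.read(&local_epoch);
        println!("reader saw: {}", *guard);
    });

    // Single writer: publish a new value atomically.
    swapper.update(42);
    handle.join().unwrap();
}
```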
Using with Arc (for shared ownership)
While SMR-Swap works with any type `T`, you can wrap values in `Arc<T>` for shared ownership, as sketched below.
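A sketch under the same assumptions as the Quick Start example; the `Arc`-specific `swap` method is described under "Arc-Specific Writer Operations" below:

```rust
use std::sync::Arc;

fn main() {
    // The container stores an Arc, so the previous value can outlive the swap.
    let (mut swapper, reader) = smr_swap::new(Arc::new(String::from("v1")));

    let writer_epoch = swapper.register_reader();
    // swap() returns the old Arc; the caller now owns that handle.
    let old = swapper.swap(&writer_epoch, Arc::new(String::from("v2")));
    assert_eq!(*old, "v1");

    // Readers observe the new value after the swap.
    let local_epoch = reader.register_reader();
    assert_eq!(**reader.read(&local_epoch), "v2");
}
```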
Note: SMR-Swap itself does not require Arc. Use Arc only if you need shared ownership of the inner value.
API Overview
Creating a Container
let (swapper, reader) = smr_swap::new(initial_value); // crate-root constructor path assumed
Returns a tuple of:
- `Swapper<T>`: The writer (not `Clone`; enforces a single writer)
- `SwapReader<T>`: The reader (can be cloned for multiple readers)
Registering Readers
Before reading, each thread must register itself to obtain a LocalEpoch:
// In reader thread
let local_epoch = reader.register_reader();
// In writer thread
let writer_epoch = swapper.register_reader();
The LocalEpoch is !Sync and must be stored per-thread (typically in thread-local storage).
Writer Operations (Swapper)
update(new_value: T)
Atomically replaces the current value.
swapper.update(new_value);
read<'a>(&self, local_epoch: &'a LocalEpoch) -> SwapGuard<'a, T>
Gets a read-only reference to the current value (via SwapGuard).
let guard = swapper.read(&local_epoch);
println!("{}", *guard);
read_with_guard<F, R>(&self, local_epoch: &LocalEpoch, f: F) -> R where F: FnOnce(&SwapGuard<T>) -> R
Executes a closure with a guard, allowing multiple operations on the same pinned version without re-pinning.
let len = swapper.read_with_guard(&local_epoch, |guard| guard.len());
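For example, several values can be derived from one pinned snapshot without re-pinning between reads (a sketch; the `Swapper` and `LocalEpoch` re-export paths are assumptions):

```rust
use smr_swap::{LocalEpoch, Swapper};

// Both reads below observe the same pinned version of the Vec.
fn first_and_last(swapper: &Swapper<Vec<u8>>, epoch: &LocalEpoch) -> (Option<u8>, Option<u8>) {
    swapper.read_with_guard(epoch, |guard| (guard.first().copied(), guard.last().copied()))
}
```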
map<F, U>(&self, local_epoch: &LocalEpoch, f: F) -> U where F: FnOnce(&T) -> U
Applies a closure to the current value and returns the result.
let len = swapper.map(&local_epoch, |value| value.len());
update_and_fetch<'a, F>(&mut self, local_epoch: &'a LocalEpoch, f: F) -> SwapGuard<'a, T> where F: FnOnce(&T) -> T
Atomically updates the value using the provided closure and returns a guard to the new value.
let guard = swapper.update_and_fetch(&local_epoch, |current| current + 1); // numeric T assumed for illustration
register_reader() -> LocalEpoch
Registers the current thread as a reader and returns a LocalEpoch for use in read operations.
let local_epoch = swapper.register_reader();
Arc-Specific Writer Operations (Swapper<Arc<T>>)
The following methods are only available when T is wrapped in an Arc:
swap(&mut self, local_epoch: &LocalEpoch, new_value: Arc<T>) -> Arc<T>
Atomically replaces the current Arc-wrapped value and returns the old Arc.
use std::sync::Arc;
let (mut swapper, _reader) = smr_swap::new(Arc::new(42));
let writer_epoch = swapper.register_reader();
let old = swapper.swap(&writer_epoch, Arc::new(100));
println!("{}", old); // 42
update_and_fetch_arc<F>(&mut self, local_epoch: &LocalEpoch, f: F) -> Arc<T> where F: FnOnce(&Arc<T>) -> Arc<T>
Updates the value using a closure that receives the current Arc and returns a new Arc.
use std::sync::Arc;
let (mut swapper, _reader) = smr_swap::new(Arc::new(vec![1, 2, 3]));
let writer_epoch = swapper.register_reader();
let new_arc = swapper.update_and_fetch_arc(&writer_epoch, |current| {
    let mut items = (**current).clone();
    items.push(4);
    Arc::new(items)
});
println!("{:?}", new_arc); // [1, 2, 3, 4]
Reader Operations (SwapReader)
read<'a>(&self, local_epoch: &'a LocalEpoch) -> SwapGuard<'a, T>
Gets a read-only reference to the current value (via SwapGuard).
let guard = reader.read(&local_epoch);
println!("{}", *guard);
read_with_guard<'a, F, R>(&self, local_epoch: &'a LocalEpoch, f: F) -> R where F: FnOnce(&SwapGuard<'a, T>) -> R
Executes a closure with a guard, allowing multiple operations on the same pinned version without re-pinning.
let len = reader.read_with_guard(&local_epoch, |guard| guard.len());
map<'a, F, U>(&self, local_epoch: &'a LocalEpoch, f: F) -> U where F: FnOnce(&T) -> U
Applies a closure to the current value and returns the result.
let len = reader.map(&local_epoch, |value| value.len());
filter<'a, F>(&self, local_epoch: &'a LocalEpoch, f: F) -> Option<SwapGuard<'a, T>> where F: FnOnce(&T) -> bool
Returns a guard to the current value if the closure returns true.
if let Some(guard) = reader.filter(&local_epoch, |value| !value.is_empty()) { /* use the pinned value */ }
register_reader() -> LocalEpoch
Registers the current thread as a reader and returns a LocalEpoch for use in read operations.
let local_epoch = reader.register_reader();
Performance Characteristics
Comprehensive benchmark results comparing SMR-Swap against arc-swap on modern hardware.
Benchmark Summary Table
| Scenario | SMR-Swap | ArcSwap | Improvement | Notes |
|---|---|---|---|---|
| Single-Thread Read | 0.90 ns | 9.24 ns | 99% faster | Pure read performance |
| Single-Thread Write | 112.78 ns | 127.28 ns | 11% faster | Improved epoch management |
| Multi-Thread Read (2) | 0.95 ns | 9.26 ns | 99% faster | No contention |
| Multi-Thread Read (4) | 0.90 ns | 9.64 ns | 99% faster | Consistent scaling |
| Multi-Thread Read (8) | 0.98 ns | 9.80 ns | 99% faster | Excellent scaling |
| Mixed R/W (2 readers) | 111.44 ns | 453.11 ns | 75% faster | 1 writer + 2 readers |
| Mixed R/W (4 readers) | 112.35 ns | 452.34 ns | 75% faster | 1 writer + 4 readers |
| Mixed R/W (8 readers) | 113.08 ns | 533.86 ns | 79% faster | 1 writer + 8 readers |
| Batch Read | 1.63 ns | 10.10 ns | 84% faster | Optimized batch reads |
| Read with Held Guard | 112.68 ns | 526.53 ns | 79% faster | Reader holds guard during write |
| Read Under Memory Pressure | 703 ns | 764.69 ns | 8% faster | Aggressive GC collection |
Detailed Performance Analysis
Single-Thread Read
smr-swap: 0.90 ns █
arc-swap: 9.24 ns ██████████
Winner: SMR-Swap (99% faster)
- Extremely fast read path with minimal overhead
- Direct pointer access without atomic operations
- Near-nanosecond latency
Single-Thread Write
smr-swap: 112.78 ns ████████████
arc-swap: 127.28 ns █████████████
Winner: SMR-Swap (11% faster)
- Improved epoch management efficiency
- Both show excellent write performance
Multi-Thread Read Performance (Scaling)
Readers: 2 4 8
smr-swap: 0.95 ns 0.90 ns 0.98 ns
arc-swap: 9.26 ns 9.64 ns 9.80 ns
Analysis:
- SMR-Swap maintains near-constant sub-nanosecond time regardless of thread count
- 99% faster than arc-swap across all thread counts
- Excellent scaling characteristics with virtually no contention
Mixed Read-Write (Most Realistic Scenario)
Readers: 2 4 8
smr-swap: 111 ns 112 ns 113 ns
arc-swap: 453 ns 452 ns 534 ns
Winner: SMR-Swap (75-79% faster)
- Consistent performance under load (111-113 ns across all thread counts)
- Minimal impact from concurrent writers
- ArcSwap shows increased latency with more readers (up to 534 ns with 8 readers)
- Aggressive GC ensures stable performance even with frequent writes
Read Under Memory Pressure
smr-swap: 703 ns ████
arc-swap: 764.69 ns █████
Winner: SMR-Swap (8% faster)
- Aggressive garbage collection in `update()` prevents garbage accumulation
- Epoch-based reclamation is triggered immediately after each write
- Consistent performance even under memory pressure
- Trade-off: slightly higher write latency for predictable read performance
Read Latency with Held Guard
smr-swap: 113.31 ns ████
arc-swap: 490.02 ns ███████████████
Winner: SMR-Swap (77% faster)
- Minimal overhead when readers hold guards
- Critical for applications requiring long-lived read access
Performance Recommendations
Use SMR-Swap when:
- Read performance is critical (up to 99% faster reads)
- Multiple readers need to hold guards for extended periods (79% faster)
- Mixed read-write patterns are common (75-79% faster)
- Consistent low-latency reads are required under all conditions
- You need predictable performance even under memory pressure
- Sub-nanosecond read latency is required
- You can tolerate slightly higher write latency for better read performance
Use ArcSwap when:
- You need the absolute simplest implementation
- You need a more established, battle-tested solution
- You prefer lower write latency over read optimization
- You have very simple read patterns with minimal guard holding
Design
Type System Guarantees
- `Swapper<T>`: Not `Clone` (enforced via single ownership)
  - Guarantees a single writer via the type system
  - Can be shared across threads if wrapped in `Arc` (but this breaks the single-writer guarantee)
- `SwapReader<T>`: `Clone`
  - Multiple readers can be created and shared
  - Each reader independently sees the latest value
- `LocalEpoch`: `!Sync` (enforced by the type system)
  - Must be stored per-thread (typically in thread-local storage)
  - Ensures each thread has its own epoch-tracking state
  - Prevents accidental sharing across threads
API Design: Explicit LocalEpoch Management
The new API design requires explicit LocalEpoch registration:
// Reader thread setup
let local_epoch = reader.register_reader();

// All read operations require the LocalEpoch
let guard = reader.read(&local_epoch);
let result = reader.map(&local_epoch, |value| value.len());
Benefits:
- Explicit control: Users understand when epoch tracking is active
- Type safety: Compiler prevents misuse of LocalEpoch across threads
- Performance: Avoids hidden thread-local lookups on every read
- Flexibility: Users can cache LocalEpoch for repeated reads
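A sketch of caching the `LocalEpoch` in thread-local storage (the re-exported type paths are assumptions):

```rust
use std::cell::RefCell;
use smr_swap::{LocalEpoch, SwapReader};

thread_local! {
    // One LocalEpoch per thread, created lazily on first use.
    static EPOCH: RefCell<Option<LocalEpoch>> = RefCell::new(None);
}

fn cached_len(reader: &SwapReader<Vec<u8>>) -> usize {
    EPOCH.with(|cell| {
        let mut slot = cell.borrow_mut();
        // Register this thread only once; reuse the cached epoch afterwards.
        let epoch = slot.get_or_insert_with(|| reader.register_reader());
        reader.map(epoch, |v| v.len())
    })
}
```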
Memory Management
swmr-epoch Implementation
SMR-Swap uses a custom swmr-epoch library for memory reclamation, optimized for single-writer multiple-reader scenarios:
Core Architecture:
- Global Epoch Counter: Atomic counter advanced by writer during garbage collection
- Reader Slots: Each reader maintains a `ReaderSlot` with an `AtomicUsize` tracking its active epoch
- Shared State: `SharedState` holds the global epoch and a `Mutex<Vec<Weak<ReaderSlot>>>` for reader tracking
- Garbage Bins: The writer maintains a `VecDeque<(usize, Vec<RetiredObject>)>` grouping garbage by epoch
Key Mechanisms:
- Pin Operation (`LocalEpoch::pin()`):
  - Increments the thread-local `pin_count` counter
  - On first pin (count = 0), loads the current global epoch and stores it in the `ReaderSlot`
  - Returns a `PinGuard` that keeps the thread pinned
  - Supports reentrancy: multiple nested pins via `pin_count` tracking
  - When the `PinGuard` is dropped, decrements `pin_count`; if it reaches zero, marks the thread as `INACTIVE_EPOCH`
- Garbage Collection (`GcHandle::collect()`), sketched in code after this list:
  - Step 1: Advance the global epoch via `fetch_add(1, Ordering::Acquire)`
  - Step 2: Scan all active readers (via `Weak` references) to find the minimum active epoch
  - Step 3: Calculate the safe reclamation point:
    - If there are no active readers: reclaim all garbage
    - Otherwise: reclaim garbage from epochs older than `min_active_epoch - 1`
  - Step 4: Pop garbage from the front of the `VecDeque` until reaching the safe point
  - Step 5: Clean up dead `Weak` references in the readers list
- Automatic Reclamation:
  - Configurable threshold (default: 64 items)
  - After each `retire()`, if total garbage exceeds the threshold, `collect()` is automatically triggered
  - Can be disabled by passing `None` to `new_with_threshold()`
- Memory Efficiency:
  - Uses `VecDeque` for O(1) front removal of reclaimed garbage
  - `Weak` references prevent reader slots from being kept alive indefinitely
  - Automatic cleanup of dead readers during collection cycles
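The collection steps above can be summarized with a small, self-contained model. This is not the `swmr-epoch` code itself; the types below only mirror the description in this section.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};

const INACTIVE_EPOCH: usize = usize::MAX;

struct ReaderSlot {
    // Holds the global epoch observed at pin time, or INACTIVE_EPOCH when unpinned.
    active_epoch: AtomicUsize,
}

// Garbage grouped by the epoch in which it was retired, oldest at the front.
type GarbageBins = VecDeque<(usize, Vec<Box<dyn Send>>)>;

fn collect(global_epoch: &AtomicUsize, readers: &[&ReaderSlot], bins: &mut GarbageBins) {
    // Step 1: advance the global epoch.
    global_epoch.fetch_add(1, Ordering::Acquire);

    // Step 2: find the minimum epoch any reader is still pinned at.
    let min_active = readers
        .iter()
        .map(|r| r.active_epoch.load(Ordering::Acquire))
        .min()
        .unwrap_or(INACTIVE_EPOCH);

    // Steps 3-4: pop bins that no pinned reader can still observe.
    loop {
        let front_epoch = match bins.front() {
            Some((epoch, _)) => *epoch,
            None => break,
        };
        let reclaimable =
            min_active == INACTIVE_EPOCH || front_epoch < min_active.saturating_sub(1);
        if !reclaimable {
            break;
        }
        bins.pop_front(); // dropping the bin runs the retired objects' destructors
    }
    // Step 5 (pruning dead Weak reader references) is omitted in this model.
}
```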
Performance Characteristics:
- Single-thread read: 99% faster than arc-swap (minimal atomic operations)
- Single-thread write: 11% faster than arc-swap (direct ownership, no Mutex overhead)
- Multi-thread read: 99% faster than arc-swap (efficient epoch tracking)
- Automatic reclamation prevents unbounded garbage accumulation
Optimization Suggestions:
- For read-heavy scenarios, use `read_with_guard()` to reuse a guard without re-pinning
- Cache the `LocalEpoch` in thread-local storage to avoid repeated `register_reader()` calls
- Adjust the reclamation threshold via `new_with_threshold()` based on workload characteristics
Thread Safety
Both Swapper<T> and SwapReader<T> implement Send + Sync when T: 'static, allowing safe sharing across threads. The LocalEpoch is !Sync to prevent accidental cross-thread usage.
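For example, a `SwapReader` can be moved into a worker thread, while the `LocalEpoch` is created on the thread that uses it (a sketch; type paths assumed as in the earlier examples):

```rust
use std::thread;
use smr_swap::SwapReader;

fn spawn_reader(reader: SwapReader<String>) -> thread::JoinHandle<usize> {
    thread::spawn(move || {
        // The LocalEpoch is !Sync, so it is created inside the worker thread.
        let epoch = reader.register_reader();
        reader.map(&epoch, |s| s.len())
    })
}
```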
Limitations
- No `no_std` support: Requires `std` for thread synchronization
- Single writer only: The type system enforces this via `Swapper` not being `Clone`
- Epoch-based reclamation: Write latency depends on epoch advancement (typically microseconds)
- Explicit `LocalEpoch` management: Users must call `register_reader()` and pass the `LocalEpoch` to read operations
Comparison with Alternatives
vs. arc-swap
- Advantages: Better read performance, especially with held guards
- Disadvantages: Slightly higher write latency due to epoch management
vs. RwLock<T>
- Advantages: Lock-free, no contention, better for read-heavy workloads
- Disadvantages: Only supports single writer
vs. Mutex<T>
- Advantages: Lock-free, no blocking, better performance
- Disadvantages: Single writer only
Safety
All unsafe code is carefully documented and justified:
- Pointer dereferencing is guarded by epoch pins
- Memory is only accessed while guards are held
- Deferred destruction ensures no use-after-free
Testing
Run tests with `cargo test`.
Run benchmarks with `cargo bench`.
License
Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Contributing
Contributions are welcome! Please ensure all tests pass and benchmarks are stable before submitting.
Benchmark Details
Test Scenarios
Benchmarks cover typical workloads for single-writer multiple-reader systems:
- Single-Thread Read: Continuous reads from a single thread, tests pure read performance
- Single-Thread Write: Continuous writes from a single thread, tests write overhead
- Multi-Thread Read (2/4/8 threads): Concurrent read scalability testing
- Mixed Read-Write: 1 writer thread + N reader threads, most realistic scenario
- Batch Read: Multiple reads within a single pin, tests the `read_with_guard()` optimization
- Read with Held Guard: Write latency while readers hold guards
- Memory Pressure: Frequent writes causing garbage accumulation, tests GC overhead
Key Findings
Read Performance:
- Via the `EpochPtr` and `PinGuard` mechanism, SMR-Swap is 99% faster than arc-swap on reads
- Single-thread read achieves 0.90 ns, approaching hardware limits
- Multi-thread reads maintain consistent sub-nanosecond latency with no contention
Write Performance:
- Single-thread write is 11% faster than arc-swap (108.94 ns vs 130.87 ns)
- Benefits from `VecDeque` garbage management and aggressive GC collection
- Mixed-workload write latency is stable (111-113 ns) with immediate GC
- Aggressive GC in `update()` ensures predictable performance
Scalability:
- Performance remains stable as reader count increases, with no contention
- Multi-thread reads maintain 0.90-0.98 ns across 2/4/8 threads
- Mixed read-write scenarios show SMR-Swap is 75-79% faster than arc-swap
- Performance improves with aggressive GC strategy
Guard Holding:
- When readers hold guards, SMR-Swap write latency is much lower than arc-swap (112.68 ns vs 526.53 ns)
- 79% faster than arc-swap in this critical scenario
- Essential for applications requiring long-lived read access
Memory Pressure:
- Improved: SMR-Swap is now 8% faster than arc-swap under memory pressure (703 ns vs 764.69 ns)
- Aggressive garbage collection in `update()` prevents garbage accumulation
- Epoch-based reclamation is triggered immediately after each write
- Trade-off: slightly higher write latency for predictable read performance under all conditions
Use Cases
SMR-Swap is particularly well-suited for scenarios where read performance is critical and writes are relatively infrequent:
Ideal Scenarios
- Configuration Hot Updates: A single configuration manager, multiple services reading config (a sketch follows this list)
  - Advantage: Config read latency < 1 ns, no lock contention
  - Suitable for: Microservice architectures with dynamic config distribution
- Cache Management: A single cache-update thread, multiple query threads
  - Advantage: Cache queries are extremely fast (0.90 ns) with excellent scalability
  - Suitable for: High-concurrency query scenarios
- Routing Tables: A single routing-table manager, multiple forwarding threads
  - Advantage: Route lookups have no contention and support long-lived references
  - Suitable for: Network packet forwarding, load balancing
- Feature Flags: A single administrator, multiple checking threads
  - Advantage: Feature checks are extremely fast and non-blocking
  - Suitable for: A/B testing, canary deployments
- Performance-Critical Read Paths: Systems requiring minimal read latency
  - Advantage: Sub-nanosecond read latency, 99% faster than arc-swap
  - Suitable for: High-frequency trading, real-time data processing
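A sketch of the configuration hot-update pattern, under the same API assumptions as the Quick Start example (the `Config` type here is illustrative):

```rust
use std::sync::Arc;
use std::thread;

struct Config {
    endpoint: String,
    timeout_ms: u64,
}

fn main() {
    let initial = Arc::new(Config { endpoint: "db-1".into(), timeout_ms: 500 });
    let (mut swapper, reader) = smr_swap::new(initial);

    // Service threads: register once, then read the current config on the hot path.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let reader = reader.clone();
            thread::spawn(move || {
                let epoch = reader.register_reader();
                for _ in 0..1_000 {
                    let cfg = reader.read(&epoch);
                    let _ = (cfg.endpoint.as_str(), cfg.timeout_ms);
                }
            })
        })
        .collect();

    // Configuration manager: publish a new config; in-flight readers keep the old snapshot.
    let writer_epoch = swapper.register_reader();
    let _previous = swapper.swap(
        &writer_epoch,
        Arc::new(Config { endpoint: "db-2".into(), timeout_ms: 250 }),
    );

    for w in workers {
        w.join().unwrap();
    }
}
```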
Less Suitable Scenarios
- Frequent Writes: If write frequency approaches read frequency, GC overhead increases
  - Recommendation: Use `new_with_threshold(None)` to disable auto-reclamation and control collection manually
- Memory-Constrained Environments: Garbage accumulation may cause GC pauses
  - Recommendation: Adjust `new_with_threshold()` to a smaller value, or use arc-swap
Performance Optimization Tips
Choose optimization strategies based on workload characteristics:
- Read-Heavy (Recommended):
  - Use the default configuration (threshold 64)
  - Cache the `LocalEpoch` in thread-local storage
  - Use `read_with_guard()` for batch reads
- Balanced Read-Write:
  - Adjust the threshold: `new_with_threshold(Some(128))` or higher
  - Call `gc.collect()` periodically to control GC timing
- Memory-Constrained:
  - Lower the threshold: `new_with_threshold(Some(32))`
  - Or disable auto-reclamation with `new_with_threshold(None)` and trigger `collect()` manually
Implementation Details
LocalEpoch and Pin Mechanism
- Each reader obtains a `LocalEpoch` via `register_reader()` (once per thread)
- `LocalEpoch` contains:
  - `Arc<ReaderSlot>`: Shared slot tracking this reader's active epoch
  - `Arc<SharedState>`: Reference to the global state (epoch counter and reader list)
  - `Cell<usize>`: Thread-local `pin_count` for reentrancy tracking
- When `read()` is called with a `LocalEpoch`, it calls `local_epoch.pin()`:
  - If `pin_count == 0`: loads the current global epoch and stores it in the `ReaderSlot`
  - Increments `pin_count` and returns a `PinGuard`
  - Supports reentrancy: multiple nested pins increment the counter
- When the `PinGuard` is dropped:
  - Decrements `pin_count`
  - If `pin_count` reaches zero: marks the thread as `INACTIVE_EPOCH` (`usize::MAX`)
Atomic Operations
- Uses `EpochPtr<T>` (from `swmr-epoch`) for atomic pointer management
- `EpochPtr::load(&guard)` safely dereferences the pointer with a lifetime bound to the guard
- `EpochPtr::store(new_value, &mut gc)` atomically swaps the pointer and retires the old value
- Uses `Ordering::Acquire` for loads and `Ordering::Release` for stores to ensure memory ordering
Guard Mechanism
- `SwapGuard<'a, T>` holds a `PinGuard<'a>` to maintain the epoch pin state
- Provides transparent access to the value via the `Deref` trait
- Lifetime `'a` is tied to the `PinGuard`, enforced by Rust's borrow checker
- Ensures the value cannot be accessed after the guard is dropped
- `PinGuard` supports `Clone` for nested pinning (increments `pin_count`)
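For example, the pinned value's own methods can be called directly through the guard (paths assumed as in the earlier examples):

```rust
use smr_swap::{LocalEpoch, SwapReader};

fn total(reader: &SwapReader<Vec<u64>>, epoch: &LocalEpoch) -> u64 {
    // Deref lets Vec's methods be called on the guard while the epoch pin is held.
    let guard = reader.read(epoch);
    guard.iter().sum()
}
```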
Garbage Collection Pipeline
- Retire Phase: When the writer calls `store()`, the old value is wrapped in a `RetiredObject` and added to the garbage bin
- Accumulation: Garbage is grouped by epoch in a `VecDeque<(usize, Vec<RetiredObject>)>`
- Automatic Trigger: After each `retire()`, if total garbage exceeds the threshold, `collect()` is automatically invoked
- Collection Phase:
  - Advance the global epoch
  - Scan all active readers to find the minimum active epoch
  - Calculate the safe reclamation point (`min_active_epoch - 1`)
  - Pop garbage from the front of the deque until reaching the safe point
  - Dropped `RetiredObject`s automatically invoke their destructors
- Cleanup: Dead reader slots (tracked via `Weak` references) are cleaned up during collection
Value Lifecycle
- The writer calls `update()` or `swap()` to replace the current value
- The old value is immediately wrapped in a `RetiredObject` and stored in the garbage bin for the current epoch
- The writer can optionally call `gc.collect()` to trigger garbage collection
- When all readers have left the epoch, the garbage is safely reclaimed and destructors are invoked
- This ensures no use-after-free while minimizing synchronization overhead