# genswap

A high-performance Rust library providing generation-tracked caching around `ArcSwap` for read-heavy workloads. Achieves 5-1500x faster reads than `ArcSwap::load_full()` by eliminating atomic refcount operations on cache hits.
## Overview

In read-heavy scenarios where shared data is updated infrequently, repeatedly calling `ArcSwap::load_full()` incurs unnecessary overhead from atomic refcount operations. This library solves that by:

- Tracking updates with a generation counter (a cheap atomic `u64`)
- Caching `Arc` references locally in each reader
- Only reloading when the generation changes

The result: cache hits perform only a single atomic load and comparison, avoiding all refcount operations.
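To make the mechanism concrete, here is a minimal sketch of the pattern (illustrative only, not this crate's actual source; the real types are `GenSwap<T>` and `CachedReader<T>`, covered below):

```rust
use arc_swap::ArcSwap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Producer side: the data plus a generation counter.
struct Source<T> {
    data: ArcSwap<T>,
    generation: AtomicU64,
}

impl<T> Source<T> {
    fn new(value: T) -> Self {
        Self {
            data: ArcSwap::from_pointee(value),
            generation: AtomicU64::new(0),
        }
    }

    fn update(&self, value: T) {
        self.data.store(Arc::new(value));
        self.generation.fetch_add(1, Ordering::Release);
    }
}

// Consumer side: a locally cached Arc plus the generation it was loaded at.
struct Reader<T> {
    source: Arc<Source<T>>,
    cached: Arc<T>,
    cached_gen: u64,
}

impl<T> Reader<T> {
    fn new(source: Arc<Source<T>>) -> Self {
        // Load the generation first: worst case the data is newer than the
        // generation we record, which just triggers a harmless reload later.
        let cached_gen = source.generation.load(Ordering::Acquire);
        let cached = source.data.load_full();
        Self { source, cached, cached_gen }
    }

    fn get(&mut self) -> &T {
        let current = self.source.generation.load(Ordering::Acquire);
        if current != self.cached_gen {
            // Slow path (rare): reload through ArcSwap, paying refcount ops once.
            self.cached = self.source.data.load_full();
            self.cached_gen = current;
        }
        // Hot path: the check above was a single atomic load; the cached
        // Arc's refcount is untouched.
        &*self.cached
    }
}
```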
## Performance

See BENCHMARK_RESULTS.md for detailed analysis.

**Single-threaded performance:**

- **5.8x faster** than `ArcSwap::load_full()` on cache hits
- 0.72 ns per read vs 4.2 ns for `load_full()`

**Multi-threaded performance (8 threads):**

- **1548x faster** than `ArcSwap::load_full()`
- 0.40 ns per read vs 619 ns for `load_full()`
- Perfect scaling with thread count (no cache line contention)

**Cache hit rate impact:**

- 90% hit rate → 1.8x faster
- 99% hit rate → 3.6x faster
- 99.9% hit rate → 5.2x faster
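These numbers follow from simple amortization: the cheaper the hit path and the rarer the miss, the closer the average read cost gets to the hit cost. A rough sketch (the 0.72 ns hit cost comes from the benchmarks above; the miss cost here is an illustrative assumption, not a measured figure):

```rust
/// Average cost per read given a cache-hit rate.
fn amortized_ns(hit_rate: f64, hit_ns: f64, miss_ns: f64) -> f64 {
    hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns
}

fn main() {
    let (hit_ns, miss_ns) = (0.72, 25.0); // miss cost assumed for illustration
    for rate in [0.90, 0.99, 0.999] {
        println!(
            "{:>5}% hits → {:.2} ns/read",
            rate * 100.0,
            amortized_ns(rate, hit_ns, miss_ns)
        );
    }
}
```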
## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
genswap = "0.1.0"
```
## Usage

### Basic Example

```rust
use genswap::GenSwap;
use std::sync::Arc;
use std::thread;

// Create shared config
let config = Arc::new(GenSwap::new(String::from("initial")));

// Spawn reader threads
let handles: Vec<_> = (0..4).map(|_| {
    let swap = Arc::clone(&config);
    thread::spawn(move || {
        let mut reader = swap.reader();
        println!("read: {}", reader.get()); // cache hit: one atomic load
    })
}).collect();

// Update from another thread (infrequent)
config.update(String::from("updated"));

for h in handles {
    h.join().unwrap();
}
```
### Real-World Example: Configuration Management
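A minimal sketch of hot-reloadable application configuration (the `AppConfig` struct and its fields are illustrative, not part of the library):

```rust
use genswap::GenSwap;
use std::sync::Arc;
use std::thread;

#[derive(Debug)]
struct AppConfig {
    max_connections: usize,
    rate_limit_per_sec: u32,
}

fn main() {
    let config = Arc::new(GenSwap::new(AppConfig {
        max_connections: 100,
        rate_limit_per_sec: 1_000,
    }));

    // Each worker thread holds its own CachedReader.
    let worker_cfg = Arc::clone(&config);
    let worker = thread::spawn(move || {
        let mut reader = worker_cfg.reader();
        let cfg = reader.get(); // cache hit on the fast path
        println!("serving with max_connections = {}", cfg.max_connections);
    });

    // Reload path: publish a new config; readers see it on their next get().
    config.update(AppConfig {
        max_connections: 200,
        rate_limit_per_sec: 2_000,
    });

    worker.join().unwrap();
}
```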
## API Overview

### `GenSwap<T>`

The producer-side type that holds the data and generation counter.

```rust
// Create
let swap = GenSwap::new(value);
let swap = GenSwap::new_from_arc(arc);

// Update
swap.update(new_value);   // increments generation
swap.update_arc(new_arc);

// Read-copy-update
swap.rcu(|current| /* derive new value */); // atomic update based on current value

// Direct access (for one-off reads)
let arc = swap.load_full();
let guard = swap.load();

// Create cached readers
let mut reader = swap.reader();

// Query
let gen = swap.generation();
```
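As an example of `rcu`, here is a sketch that derives the replacement from the current value; the closure shape (current value in, new value out, mirroring arc-swap's `rcu`) is an assumption about the API rather than something the docs above confirm:

```rust
use genswap::GenSwap;

// Hypothetical usage: atomically bump a counter held in the GenSwap.
// Assumes the closure receives the current value and returns its replacement.
let swap = GenSwap::new(0u64);
swap.rcu(|current| *current + 1);
assert_eq!(*swap.load_full(), 1);
```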
### `CachedReader<T>`

The consumer-side handle that caches an `Arc<T>` locally.

```rust
let mut reader = swap.reader();

// Primary method - fast path on cache hit
let data: &T = reader.get();

// Check if stale without reloading
if reader.is_stale() { /* ... */ }

// Get an owned Arc
let arc = reader.get_clone();

// Force a reload even if the generation matches
reader.force_refresh();

// Query
let gen = reader.cached_generation();
let cached_ref = reader.cached(); // without staleness check
```
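For instance, `is_stale()` lets you notice a pending change (say, to log a config reload) before the next `get()` refreshes the cache; a brief sketch:

```rust
use genswap::GenSwap;
use std::sync::Arc;

let source = Arc::new(GenSwap::new(String::from("v1")));
let mut reader = source.reader();
assert_eq!(reader.get().as_str(), "v1");

source.update(String::from("v2"));
assert!(reader.is_stale());              // update published, cache not yet refreshed
assert_eq!(reader.get().as_str(), "v2"); // get() reloads on the generation change
assert!(!reader.is_stale());
```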
## When to Use

**Ideal for:**

- ✅ Shared configuration data (feature flags, rate limits, routing tables)
- ✅ Read-heavy workloads (>95% reads vs writes)
- ✅ Multi-threaded scenarios with frequent reads
- ✅ When `Arc` refcount contention is a bottleneck
- ✅ Data that changes infrequently but is read constantly

**Not ideal for:**

- ❌ Write-heavy workloads (use `ArcSwap` directly)
- ❌ Single-threaded applications with few reads
- ❌ Data that changes on every access
- ❌ When you need strong consistency guarantees (readers may lag by one update)
## Memory Ordering

The implementation uses carefully chosen memory orderings for correctness:

```rust
// Producer: Release ordering ensures the data store is visible
// before the generation bump
self.data.store(Arc::new(new_data));
self.generation.fetch_add(1, Ordering::Release);

// Consumer: Acquire ordering pairs with the producer's Release
let current_gen = source.generation.load(Ordering::Acquire);
if current_gen != self.cached_generation {
    // reload the cached Arc
}
```

This guarantees that when a reader observes a new generation, the corresponding data update is also visible.
## Thread Safety

- `GenSwap<T>` is `Send + Sync` where `T: Send + Sync`
- `CachedReader<T>` is `Send` but NOT `Sync`
- Each thread should have its own `CachedReader` instance
- Multiple readers can exist for the same `GenSwap`
- Readers own an `Arc<GenSwap<T>>`, so there are no lifetime constraints
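Because `CachedReader<T>` is `Send`, a reader created on one thread can be handed off to another, as in this sketch:

```rust
use genswap::GenSwap;
use std::sync::Arc;

let swap = Arc::new(GenSwap::new(0u32));
let mut reader = swap.reader();   // created on this thread...
std::thread::spawn(move || {
    let _ = reader.get();         // ...used on another: CachedReader is Send
})
.join()
.unwrap();
```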
## How It Works

### Without GenSwap (`ArcSwap::load_full`)

```text
Thread 1: load_full() → atomic_fetch_add(refcount) → use data → atomic_fetch_sub(refcount)
Thread 2: load_full() → atomic_fetch_add(refcount) → use data → atomic_fetch_sub(refcount)
...
```

**Problem:** Every read requires two atomic operations (increment + decrement) on a shared refcount, causing cache line contention.

### With GenSwap

```text
Setup (once per thread):
  reader = swap.reader() → stores Arc<T> + generation

Hot path (millions of times):
  reader.get() → load(generation) → compare → return cached Arc
                        ↑
        Only this atomic load, no refcount operations!

Update (rare):
  swap.update(new_data) → store(data) → fetch_add(generation)
```

**Solution:** Cached readers only check a generation counter (a single atomic load). The cached Arc's refcount doesn't change, avoiding contention.
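To see the difference informally, here is a rough timing sketch (not a substitute for the criterion benchmarks; wall-clock loops like this are noisy and the compiler may optimize them aggressively):

```rust
use genswap::GenSwap;
use std::sync::Arc;
use std::time::Instant;

let source = Arc::new(GenSwap::new(1u64));
let mut reader = source.reader();
const N: u64 = 10_000_000;

let start = Instant::now();
let mut acc = 0u64;
for _ in 0..N {
    acc = acc.wrapping_add(*reader.get()); // cached path: one atomic load
}
let cached = start.elapsed();

let start = Instant::now();
for _ in 0..N {
    acc = acc.wrapping_add(*source.load_full()); // refcount bump + drop every call
}
let full = start.elapsed();

println!("cached: {cached:?}, load_full: {full:?} (acc = {acc})");
```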
## Benchmarks

Run benchmarks with:

```bash
cargo bench
```

See BENCHMARK_RESULTS.md for detailed analysis.
## Comparison with Alternatives

| Approach | Single-thread | Multi-thread (8 cores) | Notes |
|---|---|---|---|
| `Arc::clone()` | 3.88 ns | High contention | Direct clone, high overhead |
| `ArcSwap::load()` | 3.03 ns | Moderate contention | Returns `Guard`, no refcount |
| `ArcSwap::load_full()` | 4.20 ns | 619 ns | Returns `Arc`, refcount ops |
| `CachedReader::get()` | 0.72 ns | 0.40 ns | Best performance |
## Examples

See the examples/ directory:

```bash
# Basic usage example
cargo run --example basic
```
## Testing

```bash
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run a specific test
cargo test <test_name>
```
## Limitations

- **Generation counter wrapping:** The `u64` generation counter will wrap after 2^64 updates. In practice this is not a concern (it would take millions of years at 1M updates/sec).
- **Memory overhead:** Each `CachedReader` stores an `Arc<T>` and a `u64`, adding ~16-24 bytes per reader.
- **Eventual consistency:** Readers may observe stale data for a brief period between updates. They are guaranteed to see the update on their next `get()` call.
- **Not `Clone`:** `CachedReader` is intentionally not `Clone`, to prevent cache aliasing. Each reader should be independent.
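Since `CachedReader` is not `Clone`, mint a fresh reader per thread or task from the shared `GenSwap` instead:

```rust
use genswap::GenSwap;
use std::sync::Arc;

let swap = Arc::new(GenSwap::new(0u8));
// Don't try to clone a reader; create one per consumer instead:
let mut r1 = swap.reader();
let mut r2 = swap.reader(); // independent cache, no aliasing between readers
```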
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
## Acknowledgments

Built on top of the excellent `arc-swap` crate by Michal 'vorner' Vaner.
Inspired by the need for high-performance shared configuration in read-heavy systems.
## See Also

- `arc-swap` - atomic `Arc` swap operations
- `left-right` - an alternative read-optimized concurrency primitive
- `evmap` - an eventually consistent, lock-free map