# cache-mod v0.7.0 — Sharded internals: up to 16-way concurrent throughput
**Concurrency milestone.** The single `Mutex<Inner>` from 0.6.x is replaced by a sharded structure in four of the five cache types: `LruCache`, `LfuCache`, `TtlCache`, and `TinyLfuCache`. The `Cache` trait, the constructors, and the method signatures are unchanged, but internal lock contention is now bounded by per-shard traffic rather than total cache traffic. With 16 shards and a balanced key distribution, a 16-way concurrent workload can approach 16× the wall-clock throughput it saw in 0.6.0 (an upper bound, subject to all the usual lock-microbenchmark caveats).
`SizedCache` deliberately stays unsharded — splitting `max_weight` across shards produces a per-shard ceiling too tight for the typical "few large values" workload. See the "Why SizedCache is unsharded" section below.
## What is cache-mod?
High-performance in-process caching for Rust with five eviction policies all behind one trait. Async-safe (`&self` everywhere, `Send + Sync` cache instances), lock-minimized internals, typed key-value API. No dependency on any external store.
## What changed under the hood
### The `Sharded<T>` helper
A new internal-only `src/sharding.rs` module wraps `Box<[Mutex<T>]>` with a shard-routing helper. Each of the four sharded cache types now stores `Sharded<Inner<K, V>>` instead of `Mutex<Inner<K, V>>`. The `Inner` types themselves are unchanged from 0.6.0: same arena layouts, same BTreeMap priority index, same Count-Min Sketch, same lazy expiry. Only the outer wrapping changed.
Routing is conceptually `(hash(key) as usize) & shard_mask`, where the hash comes from `DefaultHasher` and `shard_mask` is the shard count minus one. The shard count is always a power of two, so the modulus reduces to a single bitwise AND.
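The routing rule can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual `src/sharding.rs`: the method names `new`, `shard_index`, and `lock_for`, the field names, and the poison handling are all assumptions made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical stand-in for the crate's internal `Sharded<T>` helper.
pub struct Sharded<T> {
    shards: Box<[Mutex<T>]>,
    /// `shards.len() - 1`; a valid mask because the length is a power of two.
    shard_mask: usize,
}

impl<T> Sharded<T> {
    /// Builds `count` shards, each initialised by calling `init`.
    pub fn new(count: usize, mut init: impl FnMut() -> T) -> Self {
        assert!(count.is_power_of_two(), "shard count must be a power of two");
        let shards: Box<[Mutex<T>]> = (0..count).map(|_| Mutex::new(init())).collect();
        Self { shards, shard_mask: count - 1 }
    }

    /// Index of the shard that owns `key`: hash, then mask.
    pub fn shard_index<K: Hash + ?Sized>(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) & self.shard_mask
    }

    /// Locks only the shard that owns `key`; other shards stay available.
    pub fn lock_for<K: Hash + ?Sized>(&self, key: &K) -> MutexGuard<'_, T> {
        // Poison handling is simplified for the sketch.
        self.shards[self.shard_index(key)]
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner())
    }
}
```

Two threads touching keys that route to different shards never contend on the same mutex, which is the whole effect the release is after.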
### Shard-count heuristic
The trade-off between strict global ordering (no sharding) and concurrency (many shards) gets resolved by capacity:
- **Capacity below 32 entries**: 1 shard. Strict global eviction order — identical behaviour to 0.6.0.
- **Capacity 32 to 256**: 2 to 16 shards, scaled so each shard holds at least 16 entries.
- **Capacity above 256**: 16 shards (the configured maximum).
The motivation is twofold. First, the per-shard overhead (separate arena, separate `HashMap`) only pays for itself once each shard holds a meaningful number of entries. Second, the existing test suite uses capacities 2, 3, 4, and 8, all in the single-shard regime, and continues to pass without modification, because the shard-count heuristic preserves the strict-ordering contract those tests depend on.
`SizedCache` is exempt from this heuristic: it always runs as a single unsharded instance, whatever its `max_weight`.
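The thresholds above can be expressed as a small function. This is a sketch under assumptions: the constant names and `shard_count` are hypothetical, and rounding down to the nearest power of two for in-between capacities is one plausible reading of "scaled so each shard holds at least 16 entries", not a confirmed detail of the crate.

```rust
/// Hypothetical constants mirroring the thresholds in the release notes.
const MAX_SHARDS: usize = 16;
const MIN_ENTRIES_PER_SHARD: usize = 16;

/// Picks a power-of-two shard count for an entry-count capacity.
pub fn shard_count(capacity: usize) -> usize {
    // How many shards could each hold at least MIN_ENTRIES_PER_SHARD entries.
    let affordable = capacity / MIN_ENTRIES_PER_SHARD;
    if affordable < 2 {
        return 1; // capacity below 32: single shard, strict global order
    }
    // Round down to a power of two so routing can use a bit mask (assumption).
    let rounded = 1usize << (usize::BITS - 1 - affordable.leading_zeros());
    rounded.min(MAX_SHARDS)
}
```

Under these assumptions a capacity of 100 yields 4 shards of 25 entries each, and anything from 256 upward saturates at 16.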
### Per-shard state, per-cache specifics
- **`LruCache`** — each shard owns its own arena (`Vec<Option<Node>>`), free-list, head/tail indices, and `HashMap<K, usize>`. Eviction is per-shard LRU.
- **`LfuCache`** — each shard owns its own `HashMap<K, Entry>` and `BTreeMap<(count, age), K>` priority index. Eviction is per-shard lowest-count.
- **`TtlCache`** — each shard owns its own `HashMap<K, Entry>` with per-shard capacity ceiling. Lazy expiry remains exact within the operating shard. Overflow eviction picks the soonest-expiring entry within the shard.
- **`TinyLfuCache`** — each shard owns its own arena **and its own Count-Min Sketch**. A global sketch would force every access through a shared structure, defeating the point of sharding. Per-shard sketches still capture the local frequency signal accurately, which is exactly what the per-shard admission decision needs.
- **`SizedCache`** — **unsharded.** Reasoning below.
### Why `SizedCache` is unsharded
Sharding splits state evenly across shards. For entry-count caches that's natural: `capacity / num_shards = per_shard_entries`, and any single entry fits in any shard. For a weight-bound cache it's not natural: `max_weight / num_shards = per_shard_weight_ceiling`, and a single value can plausibly be larger than the per-shard ceiling even though the unsharded cache would have accepted it.
Example: `SizedCache::new(100, weigh)` with 4 shards gives a per-shard ceiling of 25. A value of weight 30 would be silently rejected, even though the unsharded cache, with its full budget of 100, would have accepted it.
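The arithmetic in that example, written out as a check. `fits_in_shard` is a hypothetical helper for illustration only; `SizedCache` has no such function, precisely because it is not sharded.

```rust
/// Hypothetical admission check for a weight budget split evenly across shards.
fn fits_in_shard(max_weight: usize, num_shards: usize, value_weight: usize) -> bool {
    let per_shard_ceiling = max_weight / num_shards;
    value_weight <= per_shard_ceiling
}
```

With `max_weight = 100`, a value of weight 30 fits when `num_shards` is 1 and fails when it is 4, since the ceiling drops to 25.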
The fix isn't trivial: splitting the lookup `HashMap` across shards while keeping a single global weight budget would mean every insert has to coordinate weight bookkeeping across shards, which puts us back where we started on contention. A future release can revisit this with a smarter routing scheme; for 0.7.0 the safer call is to keep `SizedCache`'s 0.6.0 single-`Mutex` implementation intact. Existing semantics, existing tests, no surprises.
### Eviction is approximate (and that's the point)
With more than one shard, "evict the least-recently-used entry" becomes "evict the least-recently-used entry **in the affected shard**". The same applies to LFU's lowest-counter selection, TTL's soonest-expiring selection, and TinyLFU's admission filter — they all operate on per-shard state.
This is the standard sharding trade-off, also found in libraries such as DashMap, moka, and caffeine. Real-world hit-rate impact is expected to be minimal for well-distributed keys: with `DefaultHasher` and 16 shards, the per-shard working set is statistically very close to `total_working_set / 16`, and per-shard eviction approximates global eviction closely enough that hit rates should change by less than a percentage point for typical workloads.
Tiny caches (< 32 entries) sidestep this entirely by using a single shard.
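A toy model makes the difference between global and per-shard LRU concrete. Everything here is illustrative: `ToyShard` is not one of the crate's types, and routing by key parity stands in for hash-and-mask routing so the trace is easy to follow by hand.

```rust
use std::collections::VecDeque;

/// Toy LRU shard: the front of the queue is the least recently used key.
struct ToyShard {
    capacity: usize,
    order: VecDeque<u32>,
}

impl ToyShard {
    /// Inserts or refreshes `key`; returns the key evicted to make room, if any.
    fn touch(&mut self, key: u32) -> Option<u32> {
        if let Some(pos) = self.order.iter().position(|k| *k == key) {
            // Already present: move it to the most-recently-used end.
            let _ = self.order.remove(pos);
            self.order.push_back(key);
            return None;
        }
        let evicted = if self.order.len() == self.capacity {
            self.order.pop_front()
        } else {
            None
        };
        self.order.push_back(key);
        evicted
    }
}

/// Two shards of capacity 2 (total capacity 4), routed by key parity.
/// Returns every key evicted while replaying `keys` in order.
fn evicted_by_sharded_insert(keys: &[u32]) -> Vec<u32> {
    let mut shards = [
        ToyShard { capacity: 2, order: VecDeque::new() },
        ToyShard { capacity: 2, order: VecDeque::new() },
    ];
    keys.iter()
        .filter_map(|&k| shards[(k & 1) as usize].touch(k))
        .collect()
}
```

Replaying the keys `1, 2, 4, 3, 6` fills both shards, and the final insert of `6` lands in the even shard and evicts `2`. A single global LRU of capacity 4 would have evicted `1`, the oldest key overall.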
## Breaking changes
**Source-compatible: yes.** Every public symbol — the `Cache` trait, `CacheError`, the five cache types, their constructors, and every method — has identical signatures to 0.6.0. The 47 integration tests, 9 property tests, and 18 doctests all pass unchanged.
**Behaviourally identical: no, on caches above the sharding threshold.** A 0.6.0 `LruCache` with capacity 1000 evicted the globally-least-recently-used entry. The 0.7.0 version evicts the locally-least-recently-used entry within the shard that the new insert was routed to. Both observe LRU-style eviction; only the precision of "least" changed from global to per-shard. The type-level documentation surfaces this explicitly.
If a downstream consumer depends on strict global LRU at large capacities, this is technically a behavioural break. SemVer allows pre-1.0 minor versions to do this, but it's worth flagging clearly. The crate is still pre-1.0, so pin exact versions.
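Pinning can be done with Cargo's exact-version requirement syntax. A minimal manifest fragment, assuming you want to stay on this release until you have reviewed later ones:

```toml
[dependencies]
# "=" requests exactly this version instead of any compatible 0.7.x.
cache-mod = "=0.7.0"
```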
## Verification
Local run on Windows x86_64, Rust stable:
```bash
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo clippy --all-targets --no-default-features -- -D warnings
cargo test --all-features
cargo doc --no-deps --all-features
cargo build --benches --release
```
All green. Test totals are identical to 0.6.0 (no test surface changed):
- Integration tests: **47 passed**.
- Property tests (proptest): **9 passed**.
- Doctests: **18 passed**.
Concurrency numbers are workload-dependent and intentionally not reproduced in the release notes; run `cargo bench` locally with a multi-threaded harness to measure the improvement on your own hardware. The scaling argument is straightforward: a single-mutex cache serializes N concurrent writers through one critical section; a 16-shard cache spreads them across 16 independent critical sections, so each lock sees roughly N/16 of the writers, modulo uneven key distribution across shards.
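The scaling argument can be explored with a standard-library stand-in before touching the crate's own benchmarks. This sketch times writers against plain mutex-guarded `HashMap`s rather than `cache-mod` types; `shard_of` and `run_writers` are hypothetical names, and the timings it produces are only a rough indication, not a benchmark result.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant};

/// Picks the lock that guards `key`; with one lock this is always index 0.
fn shard_of(key: u64, num_locks: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() as usize) & (num_locks - 1)
}

/// Runs `threads` writers over `num_locks` mutex-guarded maps and returns
/// the elapsed wall-clock time plus the total number of stored keys.
fn run_writers(num_locks: usize, threads: u64, ops_per_thread: u64) -> (Duration, usize) {
    assert!(num_locks.is_power_of_two());
    let maps: Vec<Mutex<HashMap<u64, u64>>> =
        (0..num_locks).map(|_| Mutex::new(HashMap::new())).collect();

    let start = Instant::now();
    thread::scope(|scope| {
        for t in 0..threads {
            let maps = &maps;
            scope.spawn(move || {
                for i in 0..ops_per_thread {
                    // Each thread writes its own disjoint key range.
                    let key = t * ops_per_thread + i;
                    let mut guard = maps[shard_of(key, num_locks)].lock().unwrap();
                    guard.insert(key, i);
                }
            });
        }
    });
    let elapsed = start.elapsed();

    let stored: usize = maps.iter().map(|m| m.lock().unwrap().len()).sum();
    (elapsed, stored)
}
```

Comparing `run_writers(1, 16, 100_000)` with `run_writers(16, 16, 100_000)` on a multi-core machine gives a feel for the contention difference; a proper measurement should still go through a benchmark harness with warm-up and repeated runs.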
REPS lint surface declared in `src/lib.rs` is honored: every `deny(...)` clippy / rustc lint from 0.2.0 still holds. No `unsafe` is introduced in this release.
## What's next
The crate is now feature-complete (0.5.0), algorithm-optimised (0.6.0), and concurrency-optimised (0.7.0). The remaining road to 1.0:
- `0.9.0` is the hardening + audit pass. Property test surface expanded (more invariants, more cases), benchmark baselines locked in, dependency audit, REPS compliance review. No new features.
- `0.9.x` resolves any audit findings.
- `1.0.0` is the API freeze. SemVer-strict from that point on.
## Installation
```toml
[dependencies]
cache-mod = "0.7"
```
MSRV: Rust 1.75. Edition 2021. Default features: `std`.

A minimal multi-threaded example:
```rust
use std::sync::Arc;
use std::thread;

use cache_mod::{Cache, LruCache};

fn main() {
    let cache: Arc<LruCache<u32, u32>> =
        Arc::new(LruCache::new(10_000).expect("capacity > 0"));

    // 8 threads, each hammering the cache. In 0.6.0 they'd serialize on one
    // mutex. In 0.7.0 they spread across up to 16 shards.
    let handles: Vec<_> = (0..8u32)
        .map(|t| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                for i in 0..1000u32 {
                    // Keys wrap at 5000, so threads overlap on part of the key space.
                    let k = (t * 1000 + i) % 5000;
                    cache.insert(k, k * 10);
                    let _ = cache.get(&k);
                }
            })
        })
        .collect();

    // Wait for every worker to finish.
    for h in handles {
        let _ = h.join();
    }
}
```
## Documentation
- [README](https://github.com/jamesgober/cache-mod/blob/main/README.md)
- [API reference (full)](https://github.com/jamesgober/cache-mod/blob/main/docs/API.md)
- [CHANGELOG](https://github.com/jamesgober/cache-mod/blob/main/CHANGELOG.md)
- [REPS standards](https://github.com/jamesgober/cache-mod/blob/main/REPS.md)
- [docs.rs/cache-mod/0.7.0](https://docs.rs/cache-mod/0.7.0)
---
**Full diff:** [`v0.6.0...v0.7.0`](https://github.com/jamesgober/cache-mod/compare/v0.6.0...v0.7.0).
**Changelog:** [`CHANGELOG.md`](https://github.com/jamesgober/cache-mod/blob/main/CHANGELOG.md#070---2026-05-20).