ax-cache

A concurrent in-memory cache for Rust with admission-controlled eviction.



What it is

  • Sharded hashbrown map with one parking_lot::RwLock per shard (sketched below).
  • S3-FIFO eviction (small-queue / main-queue split with frequency-based promotion).
  • TinyLFU admission (Count-Min Sketch) keeps one-hit-wonders from displacing established hot keys.
  • AES-NI accelerated hashing via axhash when the CPU supports it; portable fallback otherwise.
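
The sharding idea in the first bullet, as a rough sketch: std's HashMap and RwLock stand in for hashbrown and parking_lot, and ShardedMap / shard_for are illustrative names, not the crate's internals.

use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash};
use std::sync::RwLock;

// Each shard owns its own map behind its own lock, so threads
// touching different shards never contend.
struct ShardedMap<K, V> {
    shards: Vec<RwLock<HashMap<K, V>>>,
    hasher: RandomState,
}

impl<K: Hash + Eq, V> ShardedMap<K, V> {
    fn shard_for(&self, key: &K) -> &RwLock<HashMap<K, V>> {
        let h = self.hasher.hash_one(key);
        // With a power-of-two shard count, masking replaces modulo.
        &self.shards[(h as usize) & (self.shards.len() - 1)]
    }
}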

Install

[dependencies]
ax-cache = "0.1"

No feature flags. AES detection is automatic at runtime.
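
For reference, runtime detection typically looks like the sketch below; the real dispatch lives inside axhash, and aes_available is just an illustrative name.

// Hypothetical illustration of runtime CPU-feature detection.
#[cfg(target_arch = "x86_64")]
fn aes_available() -> bool {
    std::arch::is_x86_feature_detected!("aes")
}

#[cfg(target_arch = "aarch64")]
fn aes_available() -> bool {
    std::arch::is_aarch64_feature_detected!("aes")
}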


API at a glance

use ax_cache::{Cache, InsertOutcome};
use std::time::Duration;

// Construction
let cache: Cache<String, u64> = Cache::new(10_000);            // auto-shard
let cache: Cache<String, u64> = Cache::with_shards(10_000, 16); // explicit

// Reads and removal accept any borrowed form (e.g. `&str` for `String` keys, no allocation).
let value: Option<u64>    = cache.get("alpha");
let exists: bool          = cache.contains_key("alpha");
let removed: Option<u64>  = cache.remove("alpha");

// Writes — return InsertOutcome (Inserted / Updated / Rejected).
let r: InsertOutcome = cache.insert("alpha".to_string(), 1);
let r: InsertOutcome = cache.insert_with_ttl("session".to_string(), 1, Duration::from_secs(60));
assert!(r.is_present()); // == Inserted | Updated

// Bulk
cache.clear();
let n: usize = cache.len();
let empty: bool = cache.is_empty();

// Observability
let m = cache.metrics(); // hits, misses, insertions, evictions, rejections

InsertOutcome makes write semantics explicit:

Variant     Meaning                                               is_present()
Inserted    Key was new and admitted into the cache.              true
Updated     Key already existed; value (and TTL) was replaced.    true
Rejected    Cache was full; admission filter declined this key.   false
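
For example, handling each outcome explicitly:

use ax_cache::{Cache, InsertOutcome};

let cache: Cache<String, u64> = Cache::new(10_000);

match cache.insert("beta".to_string(), 2) {
    InsertOutcome::Inserted => println!("new entry admitted"),
    InsertOutcome::Updated => println!("existing entry replaced"),
    InsertOutcome::Rejected => println!("declined by the admission filter"),
}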

Examples

Basic

use ax_cache::Cache;

let cache: Cache<String, Vec<u8>> = Cache::new(10_000);

cache.insert("user:1001".to_string(), vec![1, 2, 3]);

if let Some(data) = cache.get("user:1001") {
    println!("found: {} bytes", data.len());
}

TTL

use ax_cache::Cache;
use std::time::Duration;

let cache: Cache<String, String> = Cache::new(50_000);
cache.insert_with_ttl(
    "session:abc".to_string(),
    "token".to_string(),
    Duration::from_secs(30),
);

assert!(cache.get("session:abc").is_some());

Expired entries are reclaimed lazily on access. To reclaim them proactively in the background, opt into the maintenance thread:

use ax_cache::{Cache, MaintenanceConfig};
use std::time::Duration;

let cache: Cache<u64, Vec<u8>> = Cache::new(100_000);
cache.enable_maintenance(MaintenanceConfig {
    sweep_interval: Duration::from_millis(250),
    max_sweep_per_shard: 128,
});

The maintenance thread stops automatically when the cache is dropped.

Shared across threads

use ax_cache::Cache;
use std::sync::Arc;
use std::thread;

let cache = Arc::new(Cache::<u64, u64>::new(100_000));

let handles: Vec<_> = (0..8u64).map(|t| {
    let c = Arc::clone(&cache);
    thread::spawn(move || {
        for i in 0..10_000u64 {
            c.insert(t * 100_000 + i, i);
        }
    })
}).collect();

for h in handles { h.join().unwrap(); }

Metrics

use ax_cache::Cache;

let cache: Cache<u64, u64> = Cache::new(10_000);
// ... use cache ...

let m = cache.metrics();
let total = m.hits + m.misses;
if total > 0 {
    println!("hit ratio: {:.2}%", m.hits as f64 * 100.0 / total as f64);
}
println!("inserted={} evicted={} rejected={}",
    m.insertions, m.evictions, m.rejections);

Performance

The numbers below come from running the benchmarks in this repository on an Apple Silicon laptop. They are reproducible — run cargo bench to verify on your machine. Performance varies with workload, hardware, and concurrency level; treat these as ballpark figures, not guarantees.

Single-thread microbench (uncontended)

cargo bench --bench single_thread (Cache<u64, u64>):

Operation       Latency   Throughput
get_hit         ~8 ns     ~120 Mops/s
get_miss        ~14 ns    ~70 Mops/s
insert_update   ~12 ns    ~85 Mops/s
insert_new      ~70 ns    ~14 Mops/s

insert_new is dominated by allocation, hashing, queue admission, and (on overflow) eviction work; insert_update only mutates an existing slot.

Contention sweep (Zipfian, write-heavy)

cargo bench --bench contention with cap=1M, universe=10M, zipf α=0.99, 5% writes:

Threads   Throughput   P50       P99       Hit ratio
1         ~9 Mops/s    ~42 ns    ~250 ns   ~0.83
4         ~30 Mops/s   ~83 ns    ~290 ns   ~0.84
8         ~35 Mops/s   ~125 ns   ~420 ns   ~0.84

30-minute soak (realistic workload)

8 threads, Cache<u64, Arc<[u8]>> with 256-byte payloads, cap=1M, universe=10M, zipf α=0.99, 5% writes:

Metric          Value
Throughput      11.4 Mops/s
Hit ratio       0.84
P50             500 ns
P99             29 µs
P999            101 µs
Peak RSS        ~150 MB
Final entries   1,000,000

The gap between the contention bench (u64 values, in-memory tight loop) and the soak test (Arc<[u8]> payloads, longer run) reflects DRAM bandwidth and OS scheduler effects on long-running workloads. P99+ spikes in the soak test are dominated by OS scheduling, not cache code.

Hit ratio vs Zipfian skew

α (skew)         Hit ratio
0.99 (typical)   ~0.84
1.2 (heavy)      ~0.96

Hit ratio scales with workload skew: the more concentrated the access pattern on a few hot keys, the more effectively S3-FIFO + TinyLFU preserves them.
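
A hypothetical harness for reproducing this sweep (assumes rand 0.8 and rand_distr 0.4 as dev-dependencies; measured_hit_ratio is not part of the crate):

use ax_cache::Cache;
use rand::prelude::*;
use rand_distr::Zipf;

fn measured_hit_ratio(alpha: f64) -> f64 {
    let cache: Cache<u64, u64> = Cache::new(10_000);
    let zipf = Zipf::new(1_000_000, alpha).unwrap();
    let mut rng = rand::thread_rng();
    let (mut hits, mut lookups) = (0u64, 0u64);
    for _ in 0..1_000_000 {
        // Zipf samples are floats in 1..=n; key 1 is the hottest.
        let key = zipf.sample(&mut rng) as u64;
        lookups += 1;
        if cache.get(&key).is_some() {
            hits += 1;
        } else {
            cache.insert(key, key);
        }
    }
    hits as f64 / lookups as f64
}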

Scan resistance

cargo bench --bench scan_resistance (cap=10k, 1k hot keys + 100k scan keys): hot-key survival rate is ~90–100% depending on admission pressure. Cold scan keys are dropped from the small queue within a handful of evictions; hot keys live in the main queue.
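
A rough way to observe this with the public API; parameters mirror the bench (cap=10k, 1k hot keys, 100k scan keys), and exact survival depends on admission pressure.

use ax_cache::Cache;

let cache: Cache<u64, u64> = Cache::new(10_000);

// Establish 1k hot keys with repeated accesses.
for _ in 0..16 {
    for k in 0..1_000u64 {
        cache.insert(k, k);
        cache.get(&k);
    }
}

// One-shot scan over 100k cold keys.
for k in 1_000..101_000u64 {
    cache.insert(k, k);
}

// Most hot keys should still be resident.
let survivors = (0..1_000u64).filter(|k| cache.contains_key(k)).count();
println!("hot-key survival: {survivors}/1000");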


When to use ax-cache

Good fit:

  • Server-side hot-data caches with concentrated access patterns (Zipfian, power-law).
  • Workloads that need bounded memory with predictable eviction.
  • Scenarios where occasional one-hit-wonder reads should not pollute the cache (admission filtering helps).

Less good fit:

  • Caches where every key matters equally and rejection is unacceptable (consider a plain RwLock<HashMap> or dashmap; sketched below).
  • Persistent / off-heap caches (this is in-memory only).
  • Workloads dominated by very large values where memory bandwidth is the bottleneck rather than hash table operations.
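
For the first point, the plain-lock alternative is only a few lines; a sketch, unbounded by design, so it trades eviction for never rejecting:

use std::collections::HashMap;
use std::sync::RwLock;

// Never rejects and never evicts; growth is your responsibility.
let map: RwLock<HashMap<String, u64>> = RwLock::new(HashMap::new());

map.write().unwrap().insert("alpha".to_string(), 1);
let v = map.read().unwrap().get("alpha").copied();
assert_eq!(v, Some(1));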

Limitations / honest caveats

  • capacity is a soft target. The cache holds approximately capacity entries in steady state; it may briefly sit a few entries above capacity between inserts.
  • No async API. All operations are synchronous and lock-based.
  • Values are cloned on get. If V is expensive to clone, wrap it in Arc<T> (see the sketch after this list).
  • Tail latency at high concurrency is OS-bound on the slowest percentiles. P99 spikes in long-running workloads correlate with scheduler events (preemption, page faults), not with cache internals.
  • No serialization / persistence. Drop the cache, lose the data.
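
A minimal sketch of the Arc advice above:

use ax_cache::Cache;
use std::sync::Arc;

// Cloning an Arc on get bumps a refcount; the payload is not copied.
let cache: Cache<String, Arc<Vec<u8>>> = Cache::new(10_000);
cache.insert("blob".to_string(), Arc::new(vec![0u8; 1 << 20]));

if let Some(blob) = cache.get("blob") {
    // `blob` shares the same 1 MiB buffer as the cached entry.
    assert_eq!(blob.len(), 1 << 20);
}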

License

MIT — see LICENSE.