fast-cache
fast-cache is an embedded-first, in-memory key-value database. The default
crate build exposes a Rust API for direct in-process use. The optional
server feature builds fast-cache-server, a Redis-compatible TCP server with
WAL and snapshot persistence.
The default build uses conservative safe memory paths. Reviewed lower-overhead
paths are available only when the unsafe feature is enabled.
Embedded Use
Use [storage::EmbeddedStore] when several threads should share one cache
handle. Keys and values are byte vectors. TTL arguments are milliseconds; pass
None for persistent values.
use fast_cache::storage::EmbeddedStore;
let cache = new;
cache.set;
assert_eq!;
cache.delete;
assert!;
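The TTL convention above (milliseconds, None for no expiry) can be modeled with a toy single-threaded map. This is a sketch under stated assumptions: ToyTtlStore is hypothetical and is not fast-cache's EmbeddedStore. It also expires entries lazily on read, which may differ from how fast-cache schedules expiry.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical toy model, not the fast-cache API: each entry keeps an
/// optional expiry deadline derived from a TTL given in milliseconds.
pub struct ToyTtlStore {
    entries: HashMap<Vec<u8>, (Vec<u8>, Option<Instant>)>,
}

impl ToyTtlStore {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// ttl_ms = None keeps the value until it is deleted.
    pub fn set(&mut self, key: Vec<u8>, value: Vec<u8>, ttl_ms: Option<u64>) {
        let deadline = ttl_ms.map(|ms| Instant::now() + Duration::from_millis(ms));
        self.entries.insert(key, (value, deadline));
    }

    /// Returns the value if present and unexpired; expired entries are removed lazily here.
    pub fn get(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        let expired = match self.entries.get(key) {
            None => return None,
            Some((_, Some(deadline))) => Instant::now() >= *deadline,
            Some((_, None)) => false,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(value, _)| value.clone())
    }

    pub fn delete(&mut self, key: &[u8]) -> bool {
        self.entries.remove(key).is_some()
    }
}
```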
Batch APIs group routing work and return results in key order:
use fast_cache::storage::EmbeddedStore;
let cache = new;
cache.batch_set;
let values = cache.batch_get;
assert_eq!;
assert_eq!;
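One way to "group routing work and return results in key order" is to bucket key indices by shard, visit each shard once, and scatter each result back into its original slot. The sketch below assumes a simple hash-modulo router (shard_of) and plain HashMap shards; both are hypothetical stand-ins, not fast-cache internals.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical router: hash the key and pick one of shard_count shards.
fn shard_of(key: &[u8], shard_count: usize) -> usize {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    (h.finish() as usize) % shard_count
}

/// Batch get sketch: group key indices by shard, visit each shard once,
/// then write each result into the slot of its original key.
pub fn batch_get(shards: &[HashMap<Vec<u8>, Vec<u8>>], keys: &[&[u8]]) -> Vec<Option<Vec<u8>>> {
    let mut by_shard: Vec<Vec<usize>> = vec![Vec::new(); shards.len()];
    for (i, key) in keys.iter().enumerate() {
        by_shard[shard_of(key, shards.len())].push(i);
    }
    let mut out = vec![None; keys.len()];
    for (shard_idx, indices) in by_shard.iter().enumerate() {
        let shard = &shards[shard_idx]; // one shard visit per group
        for &i in indices {
            out[i] = shard.get(keys[i]).cloned();
        }
    }
    out
}
```

Grouping first means each shard's data is touched in one pass, while the index bookkeeping preserves the caller's key order.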
Use [storage::LocalEmbeddedStore] when each worker owns its shards and calls
the cache through an exclusive &mut handle. This is the lowest-overhead Rust
API and is enabled by the default sharded feature.
This ownership model is intentionally different from shared reference-counted
caches such as DashMap. Shared caches let every worker clone one map handle
and access any key through internal synchronization. fast-cache instead
routes each key to a shard and gives the owning worker exclusive access to that
shard's cache slab. Use EmbeddedStore to build and route the cache, then
split it with into_local_stores when workers are pinned.
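The owner-local model can be illustrated with std scoped threads: shards are moved into per-worker groups, and each worker mutates only its own shards through a plain &mut, so no lock is involved. This is a hypothetical sketch; the round-robin assignment in split_shards is an assumption, and into_local_stores may assign shards differently.

```rust
use std::collections::HashMap;
use std::thread;

type Shard = HashMap<Vec<u8>, Vec<u8>>;

/// Hypothetical split: worker w owns shards w, w + workers, w + 2 * workers, ...
fn split_shards(shards: Vec<Shard>, workers: usize) -> Vec<Vec<(usize, Shard)>> {
    let mut owned: Vec<Vec<(usize, Shard)>> = (0..workers).map(|_| Vec::new()).collect();
    for (idx, shard) in shards.into_iter().enumerate() {
        owned[idx % workers].push((idx, shard));
    }
    owned
}

/// Each worker thread gets exclusive &mut access to its own shard group.
fn run_workers(owned: &mut [Vec<(usize, Shard)>]) {
    thread::scope(|s| {
        for (worker, shards) in owned.iter_mut().enumerate() {
            s.spawn(move || {
                for (idx, shard) in shards.iter_mut() {
                    let key = format!("w{worker}-s{idx}").into_bytes();
                    shard.insert(key, b"owned".to_vec());
                }
            });
        }
    });
}
```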
Use [storage::SharedEmbeddedStore] when every worker needs to clone one
handle and reach every key. It is cache-padded and lock-striped like DashMap,
while still sharing the same embedded shard implementation. EmbeddedStore,
SharedEmbeddedStore, and LocalEmbeddedStore all require power-of-two shard
counts so routing can use shift-based striping.
use ;
let shared = with_route_mode;
let mut stores = from_embedded.into_stores;
let mut local = stores.pop.expect;
local.set;
assert_eq!;
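Power-of-two shard counts allow a router to pick a shard with a single shift instead of a division. The sketch assumes the router uses the top bits of a 64-bit key hash; ShiftRouter is hypothetical, and fast-cache's actual striping may select different bits.

```rust
/// Hypothetical shift-based router. With 2^bits shards, the top `bits`
/// bits of a 64-bit hash select the shard.
pub struct ShiftRouter {
    shift: u32,
}

impl ShiftRouter {
    pub fn new(shard_count: usize) -> Result<Self, String> {
        // is_power_of_two() is false for 0, so zero shards are rejected too.
        if !shard_count.is_power_of_two() {
            return Err(format!("shard count {shard_count} is not a power of two"));
        }
        let bits = shard_count.trailing_zeros();
        Ok(Self { shift: 64 - bits })
    }

    pub fn shard_for(&self, hash: u64) -> usize {
        // With one shard the shift is 64, which checked_shr rejects; route to shard 0.
        hash.checked_shr(self.shift).unwrap_or(0) as usize
    }
}
```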
Session APIs keep related KV-cache chunks on the same route and can pack values into contiguous buffers:
use ;
let cache = new;
let mut write = with_capacity;
write.push_owned_record;
write.push_owned_record;
cache.batch_set_session_packed_no_ttl;
let keys = vec!;
let batch = cache.batch_get_session_packed;
assert!;
assert_eq!;
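The two session ideas, routing by session rather than by chunk and packing values contiguously, can be sketched with std types. ToyPackedBatch and session_shard are hypothetical and do not mirror the layout of fast-cache's PackedBatch or PackedSessionWrite; the FNV-1a hash is an arbitrary choice for illustration.

```rust
/// Hypothetical packed batch: all values share one contiguous buffer, and
/// a range table records where each value starts and ends.
#[derive(Default)]
pub struct ToyPackedBatch {
    buf: Vec<u8>,
    ranges: Vec<(usize, usize)>,
}

impl ToyPackedBatch {
    pub fn with_capacity(bytes: usize, items: usize) -> Self {
        Self { buf: Vec::with_capacity(bytes), ranges: Vec::with_capacity(items) }
    }

    pub fn push(&mut self, value: &[u8]) {
        let start = self.buf.len();
        self.buf.extend_from_slice(value);
        self.ranges.push((start, self.buf.len()));
    }

    pub fn get(&self, i: usize) -> Option<&[u8]> {
        self.ranges.get(i).map(|&(s, e)| &self.buf[s..e])
    }

    pub fn len(&self) -> usize {
        self.ranges.len()
    }
}

/// Hypothetical session routing: hash only the session id (FNV-1a here),
/// so every chunk of one session maps to the same shard.
pub fn session_shard(session_id: &[u8], shard_count: usize) -> usize {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in session_id {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    (h % shard_count as u64) as usize
}
```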
Embedded Benchmark Highlights
The embedded API is optimized for thread-per-core deployments. The lowest
overhead path is [storage::LocalEmbeddedStore], where each worker owns its
shards and calls the cache through an exclusive &mut handle. For applications
that need a cloneable shared handle, [storage::SharedEmbeddedStore] provides
lock-striped access with the same embedded shard implementation.
Current Linux release benchmarks use pinned workers, 100k keys for small
values, 10s measured runs, and latency sampling disabled for max-throughput
rows. The fast-cache direct column below (fc-embed) measures direct owner-local
embedded stores with no TTL and no eviction:
| Value size | Mix | fast-cache direct (ops/s) | DashMap (ops/s) | Moka (ops/s) |
|---|---|---|---|---|
| 64B | GET | 422.82M | 54.84M | 2.65M |
| 64B | SET | 114.85M | 35.40M | 1.51M |
| 64B | 80/20 | 253.88M | 45.23M | 4.52M |
| 4KiB | 80/20 | 21.03M | 6.90M | 3.77M |
Capacity-bounded rows exercise a different path. With LRU enabled, 64B values,
a read-only mix, 16 workers, and 25% resident capacity, fc-embed reaches
473.7M ops/s. On large, write-heavy LRU workloads Moka can be faster, because
value materialization and eviction bookkeeping dominate the cost rather than
the small-value shard hot path. For example, a 64KiB write-only LRU row
measured fast-cache at about 19.1 GB/s and Moka at about 33.3 GB/s.
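The table reports operations per second, while the LRU comparison reports GB/s. Converting between them needs only the value size. This sketch assumes decimal gigabytes (10^9 bytes) and that each operation moves exactly one full value; the benchmark report is not confirmed to use either convention.

```rust
/// Converts ops/s to decimal GB/s for a fixed value size (assumes 1 GB = 10^9 bytes
/// and one full value moved per operation).
pub fn ops_to_gb_per_s(ops_per_s: f64, value_bytes: f64) -> f64 {
    ops_per_s * value_bytes / 1e9
}

/// Inverse conversion under the same assumptions.
pub fn gb_per_s_to_ops(gb_per_s: f64, value_bytes: f64) -> f64 {
    gb_per_s * 1e9 / value_bytes
}
```

Under those assumptions, the 4KiB 80/20 fast-cache row corresponds to roughly 86 GB/s of value traffic, and 19.1 GB/s at 64KiB corresponds to roughly 291k writes per second.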
Treat these as workload-specific reference points, not universal constants.
The full embedded matrix, LRU/TTL rows, CSV artifact paths, and reproduction
commands live in
benchmarks/FAST_CACHE_EMBEDDED_RELEASE.md.
API Map
The most commonly used Rust APIs live in [storage]:
- [storage::EmbeddedStore]: shared, sharded store for byte-string keys.
- [storage::SharedEmbeddedStore]: cloneable, lock-striped embedded store for cross-worker shared handles.
- [storage::LocalEmbeddedStore]: worker-local store for thread-per-core workers.
- [storage::EmbeddedRouteMode], [storage::EmbeddedKeyRoute], and [storage::PreparedPointKey]: routing and precomputed lookup helpers.
- [storage::PackedBatch] and [storage::PackedSessionWrite]: contiguous batch read/write payloads.
- [storage::TierStatsSnapshot], [storage::ShardStatsSnapshot], and [storage::WalStatsSnapshot]: runtime statistics.
Core key/value methods include set, set_value_bytes, batch_set, get,
get_view, batch_get, batch_get_view, batch_get_packed, delete,
exists, ttl_seconds, pttl_millis, expire, persist, len,
key_snapshot, stored_bytes, stats_snapshot, and
process_maintenance.
Session-oriented methods include batch_set_session_owned_no_ttl,
batch_set_session_packed_no_ttl, batch_get_session,
batch_get_session_view, batch_get_session_packed,
prepare_point_key, and their routed or prehashed variants.
Redis object helpers are exposed on [storage::EmbeddedStore] for hashes,
lists, sets, and sorted sets. They use Redis-style wrong-type behavior through
[storage::RedisObjectResult]. The public method families are hset/hget
and related hash methods, lpush/rpush/lrange and related list methods,
sadd/srem/smembers and related set methods, and
zadd/zrange/zscore and related sorted-set methods.
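Redis-style wrong-type behavior means a hash command on a key that already holds a different type fails instead of overwriting it. The toy keyspace below illustrates that rule; Value, ToyResult, and Keyspace are hypothetical names and do not mirror the variants of fast-cache's RedisObjectResult. The hset return value follows Redis HSET, which counts newly added fields.

```rust
use std::collections::HashMap;

/// Hypothetical stored object types.
pub enum Value {
    Str(Vec<u8>),
    Hash(HashMap<Vec<u8>, Vec<u8>>),
}

#[derive(Debug, PartialEq)]
pub enum ToyResult<T> {
    Ok(T),
    WrongType,
}

pub struct Keyspace {
    map: HashMap<Vec<u8>, Value>,
}

impl Keyspace {
    pub fn new() -> Self {
        Self { map: HashMap::new() }
    }

    pub fn set(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), Value::Str(value.to_vec()));
    }

    /// Creates the hash on first use; returns 1 for a new field, 0 for an overwrite,
    /// or WrongType if the key holds another type.
    pub fn hset(&mut self, key: &[u8], field: &[u8], value: &[u8]) -> ToyResult<usize> {
        let entry = self.map.entry(key.to_vec()).or_insert_with(|| Value::Hash(HashMap::new()));
        match entry {
            Value::Hash(h) => ToyResult::Ok(usize::from(h.insert(field.to_vec(), value.to_vec()).is_none())),
            _ => ToyResult::WrongType,
        }
    }

    /// A missing key reads as an empty result; a non-hash key is WrongType.
    pub fn hget(&self, key: &[u8], field: &[u8]) -> ToyResult<Option<Vec<u8>>> {
        match self.map.get(key) {
            None => ToyResult::Ok(None),
            Some(Value::Hash(h)) => ToyResult::Ok(h.get(field).cloned()),
            Some(_) => ToyResult::WrongType,
        }
    }
}
```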
Other modules:
- [config]: FastCacheConfig, EvictionPolicy, tier sizing, persistence configuration, and TOML load/store helpers.
- [protocol]: RESP and native fast protocol codecs.
- [persistence]: snapshot loading/writing and WAL runtime support.
- [cuda]: GPU-facing configuration and transfer descriptors.
- [server]: TCP listener and connection handling, available with the server feature.
Server Use
Install and run the optional server binary:
From a checkout:
The server listens on 127.0.0.1:6380 by default and accepts RESP clients.
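RESP clients send each request as an array of bulk strings: *<argc>\r\n, then $<len>\r\n<bytes>\r\n for every argument. The encoder below follows that standard RESP framing; encode_resp_command is an illustrative helper, not part of fast-cache.

```rust
/// Encodes a command as a RESP array of bulk strings, the request framing
/// Redis-compatible clients send.
pub fn encode_resp_command(args: &[&[u8]]) -> Vec<u8> {
    let mut out = format!("*{}\r\n", args.len()).into_bytes();
    for arg in args {
        out.extend_from_slice(format!("${}\r\n", arg.len()).as_bytes());
        out.extend_from_slice(arg);
        out.extend_from_slice(b"\r\n");
    }
    out
}
```

Writing these bytes to a std::net::TcpStream connected to 127.0.0.1:6380 would issue the command to a running server.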
The current server command catalog implements the redesigned string-key hot
path for GET and SET. Additional RESP commands should be added through the
per-command module pattern in src/commands/README.md so parser, storage, and
direct-server behavior stay local to the command.
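A per-command module keeps argument parsing and execution for one command in one place. The trait below is a hypothetical shape for that pattern, using a HashMap in place of fast-cache storage; the real conventions are defined in src/commands/README.md and may differ.

```rust
use std::collections::HashMap;

/// Hypothetical self-contained command: parsing and execution live together.
pub trait Command: Sized {
    const NAME: &'static str;
    fn parse(args: &[&[u8]]) -> Result<Self, String>;
    /// Takes &mut so write commands such as SET could mutate the store.
    fn execute(self, store: &mut HashMap<Vec<u8>, Vec<u8>>) -> Option<Vec<u8>>;
}

pub struct Get {
    key: Vec<u8>,
}

impl Command for Get {
    const NAME: &'static str = "GET";

    fn parse(args: &[&[u8]]) -> Result<Self, String> {
        match args {
            [key] => Ok(Get { key: key.to_vec() }),
            _ => Err(format!("wrong number of arguments for {}", Self::NAME)),
        }
    }

    fn execute(self, store: &mut HashMap<Vec<u8>, Vec<u8>>) -> Option<Vec<u8>> {
        store.get(&self.key).cloned()
    }
}
```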
Configuration
Load configuration from TOML with [config::FastCacheConfig]:
use std::path::Path;
use fast_cache::config::FastCacheConfig;
The repository includes fast-cache.toml.example with the supported fields.
WAL TCP Export
The server can stream live WAL frames in addition to writing segments to disk. Two modes are supported:
- connect: fast-cache connects to one downstream collector.
- listen: fast-cache binds a subscription port and fans out live frames to authenticated subscribers.
[]
= true
= "listen"
= "127.0.0.1:7630"
= "replace-me"
= 16384
= 64
= false
The stream uses the same framed WAL bytes as disk segments: FCW2 magic,
flags, payload length, payload, and CRC. This is a live export path, not a
catch-up or replay API; disk WAL segments and snapshots remain the recovery
source. With backpressure_on_full = false, a slow TCP exporter can drop live
export frames while disk WAL append continues. Set it to true only when the
export consumer is allowed to backpressure writes. Auth tokens are plaintext
inside the TCP stream, so use localhost, a private network, or a TLS tunnel
across trust boundaries.
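The frame fields listed above (FCW2 magic, flags, payload length, payload, CRC) can be sketched as an encoder and a validating decoder. Field widths, little-endian byte order, CRC-32 (IEEE) as the checksum, and CRC coverage of the payload only are all assumptions for illustration; consult the fast-cache source for the real layout.

```rust
/// Hypothetical layout: "FCW2" | flags: u8 | len: u32 LE | payload | crc32: u32 LE over payload.
const MAGIC: &[u8; 4] = b"FCW2";

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320); slow but dependency-free.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

pub fn encode_frame(flags: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + 1 + 4 + payload.len() + 4);
    out.extend_from_slice(MAGIC);
    out.push(flags);
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(&crc32(payload).to_le_bytes());
    out
}

/// Returns (flags, payload), or an error for a bad magic, wrong length, or CRC mismatch.
pub fn decode_frame(frame: &[u8]) -> Result<(u8, &[u8]), &'static str> {
    if frame.len() < 13 || !frame.starts_with(MAGIC) {
        return Err("bad magic or short frame");
    }
    let flags = frame[4];
    let len = u32::from_le_bytes([frame[5], frame[6], frame[7], frame[8]]) as usize;
    if frame.len() != 9 + len + 4 {
        return Err("length mismatch");
    }
    let payload = &frame[9..9 + len];
    let t = 9 + len;
    let stored = u32::from_le_bytes([frame[t], frame[t + 1], frame[t + 2], frame[t + 3]]);
    if crc32(payload) != stored {
        return Err("crc mismatch");
    }
    Ok((flags, payload))
}
```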
Native Replication
Native replication is separate from WAL and Redis PSYNC. It ships storage-level mutation batches for async read replicas and service subscribers:
[]
= true
= "primary"
= "127.0.0.1:7631"
= "replace-me"
compression = "none"
= 3
send_policy = "batch"
= 64
= 262144
= 250
= 67108864
= 1048576
send_policy = "immediate" flushes every write as a one-record mutation
batch. send_policy = "batch" flushes by record count, byte size, or delay.
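The two send policies can be modeled as a small batcher: Immediate flushes every record alone, while Batch flushes once a record-count, byte-size, or age limit is reached. SendPolicy, Batcher, and their fields are hypothetical names; the limits in the usage are arbitrary, not fast-cache defaults.

```rust
use std::time::{Duration, Instant};

/// Hypothetical policy model.
pub enum SendPolicy {
    Immediate,
    Batch { max_records: usize, max_bytes: usize, max_delay: Duration },
}

pub struct Batcher {
    policy: SendPolicy,
    pending: Vec<Vec<u8>>,
    pending_bytes: usize,
    opened_at: Option<Instant>,
}

impl Batcher {
    pub fn new(policy: SendPolicy) -> Self {
        Self { policy, pending: Vec::new(), pending_bytes: 0, opened_at: None }
    }

    /// Adds one mutation record; returns a batch when the policy says to flush.
    pub fn push(&mut self, record: Vec<u8>, now: Instant) -> Option<Vec<Vec<u8>>> {
        self.opened_at.get_or_insert(now);
        self.pending_bytes += record.len();
        self.pending.push(record);
        if self.should_flush(now) { Some(self.take()) } else { None }
    }

    /// Timer hook so a quiet lane still flushes once max_delay has passed.
    pub fn poll(&mut self, now: Instant) -> Option<Vec<Vec<u8>>> {
        if !self.pending.is_empty() && self.should_flush(now) { Some(self.take()) } else { None }
    }

    fn should_flush(&self, now: Instant) -> bool {
        match &self.policy {
            SendPolicy::Immediate => true,
            SendPolicy::Batch { max_records, max_bytes, max_delay } => {
                self.pending.len() >= *max_records
                    || self.pending_bytes >= *max_bytes
                    || self.opened_at.map_or(false, |t| now.duration_since(t) >= *max_delay)
            }
        }
    }

    fn take(&mut self) -> Vec<Vec<u8>> {
        self.pending_bytes = 0;
        self.opened_at = None;
        std::mem::take(&mut self.pending)
    }
}
```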
Replication defaults to compression = "none" because realistic write-sync
payloads usually compress poorly enough that zstd costs more CPU than it saves
on the hot path. Use compression = "zstd" only when bandwidth is the limiting
resource and benchmark data for the payload shape justifies it.
Primary export runs through shard-local queues and batchers. This keeps write
sync aligned with fast-cache's owned-shard architecture: a saturated replication
lane can backpressure its shard without centralizing all writes behind one
global replication queue. Subscribers still receive one FCRP frame stream, and
the per-shard sequence watermarks make replay idempotent.
Replicas track per-shard sequence watermarks so they can apply mutations
idempotently and catch up from backlog or snapshot-plus-delta.
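Per-shard watermarks make re-delivery harmless: a replica records the highest applied sequence for each shard and skips anything at or below it. The sketch assumes sequences start at 1 and increase per shard; Replica is hypothetical, and gap detection and snapshot loading are left out.

```rust
use std::collections::HashMap;

/// Hypothetical replica applier keyed by (shard, seq).
pub struct Replica {
    watermarks: Vec<u64>,
    data: HashMap<Vec<u8>, Vec<u8>>,
}

impl Replica {
    pub fn new(shards: usize) -> Self {
        Self { watermarks: vec![0; shards], data: HashMap::new() }
    }

    /// Returns true if applied, false if the mutation was already covered by the watermark.
    pub fn apply(&mut self, shard: usize, seq: u64, key: &[u8], value: &[u8]) -> bool {
        if seq <= self.watermarks[shard] {
            return false;
        }
        self.data.insert(key.to_vec(), value.to_vec());
        self.watermarks[shard] = seq;
        true
    }
}
```

Because each shard advances independently, overlapping backlog or snapshot-plus-delta ranges can be replayed without double-applying.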
Feature Flags
- embedded: default embedded Rust database API.
- sharded: default sharded storage and owner-local embedded API.
- server: builds the Redis-compatible fast-cache-server binary.
- monoio: enables the Linux-only server runtime selected with FAST_CACHE_USE_MONOIO=1. The server still uses bytes-handoff for connection read buffering, using its monoio adapter on Linux. With FAST_CACHE_DIRECT_SHARD_PORTS=1, the server also binds one listener per shard, starting at FAST_CACHE_DIRECT_SHARD_BASE_PORT or the fanout port + 1, so direct clients can route while fanout RESP/FCNP stays available. Monoio writer experiments are selected with FAST_CACHE_MONOIO_SAFE_WRITER=inline|split|writev; Tokio remains the portable default runtime.
- telemetry: integrates with fast-telemetry.
- cuda: exposes GPU-facing configuration and transfer descriptors.
- fast-point-map: enables the experimental point-map storage path.
- unsafe: opts into reviewed unsafe hot paths for lower overhead.
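The direct-shard listener rule (start at FAST_CACHE_DIRECT_SHARD_BASE_PORT, else at the fanout port + 1) can be sketched as a port plan. Assigning consecutive ports to consecutive shards is an assumption, and direct_shard_ports and base_port_from_env are hypothetical helpers.

```rust
/// Hypothetical port plan: one listener per shard on consecutive ports.
/// Returns None if any port would overflow u16.
pub fn direct_shard_ports(fanout_port: u16, base_port: Option<u16>, shards: u16) -> Option<Vec<u16>> {
    let start = match base_port {
        Some(p) => p,
        None => fanout_port.checked_add(1)?,
    };
    (0..shards).map(|i| start.checked_add(i)).collect()
}

/// Reads the base port from the environment; unset or unparsable values yield None.
pub fn base_port_from_env() -> Option<u16> {
    std::env::var("FAST_CACHE_DIRECT_SHARD_BASE_PORT").ok()?.parse().ok()
}
```

For example, with a fanout port of 6380, no base port, and four shards, this plan yields ports 6381 through 6384.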
Safety
The unsafe feature keeps the same public API while enabling reviewed hot
paths for server I/O, protocol codecs, flat-map indexing, and owner-local read
views. See SAFETY.md for the unsafe inventory, invariants, and safe
fallbacks.
License
Apache-2.0. See the repository LICENSE file.