
Murrdb: A RocksDB-based NVMe/S3 cache for AI inference workloads. A faster Redis replacement, optimized for batch low-latency zero-copy reads and writes.
This README.md is 99%[^1] human-written.
[^1]: Used only for grammar and syntax checking.
What is Murr?

Murr is a caching layer for ML/AI data serving that sits between your batch data pipelines and inference apps:
- Tiered storage: hot data lives in memory, cold data stays on disk with S3-based replication. It's 2026, RAM is expensive - keep only the hot stuff there.
- Batch-in, batch-out: native batch reads and writes over columnar storage, with no per-row overhead. Dumping 1GB Parquet/Arrow files into the ingestion API is a perfectly valid use case.
  ```shell
  # yes this works for batch writes
  curl --data-binary @0000.parquet -H "Content-Type: application/vnd.apache.parquet" \
    -XPUT http://localhost:8080/api/v1/table/yolo/write
  ```
- Zero-copy wire protocol: no conversion needed when building `np.ndarray`, `pd.DataFrame` or `pt.Tensor` from API responses. Sure, Redis is fast, but parsing its replies is not (especially in Python!).
- Stateless: Murr is not a database - all state is persisted on S3. When a Redis node gets evicted, you're cooked. Murr just self-bootstraps from object storage.
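A minimal NumPy-only sketch of what the zero-copy bullet buys a Python client. It does not use any murrdb API: `parse_text_values` stands in for a client that receives feature values as text and has to convert them one by one, while `view_raw_float32` wraps a raw little-endian float32 buffer (the kind of column buffer a columnar format such as Arrow IPC carries) as an `np.ndarray` without copying. Both helper names and the 3x2 payload are hypothetical.

```python
import numpy as np


def parse_text_values(values: list[bytes]) -> np.ndarray:
    # Text-style decoding: one Python-level float() call per value, then a copy into a new array
    return np.array([float(v) for v in values], dtype=np.float32)


def view_raw_float32(buf: bytes) -> np.ndarray:
    # Zero-copy decoding: the ndarray is a view over buf's memory, with no per-value work.
    # Because bytes objects are immutable, the resulting view is read-only.
    return np.frombuffer(buf, dtype="<f4")


# Hypothetical response: 3 keys x 2 float32 columns, row-major, little-endian
values = np.array([[0.95, 1.0], [0.72, 2.0], [0.68, 3.0]], dtype="<f4")
raw = values.tobytes()
text = [repr(float(v)).encode() for v in values.ravel()]

arr = view_raw_float32(raw).reshape(3, 2)
print(arr.flags.writeable)  # False: arr borrows raw's memory instead of owning a copy
print(np.array_equal(parse_text_values(text).reshape(3, 2), arr))  # True
```

Wrapping the buffer costs the same no matter how large the payload is, while the text path does Python-level work for every value, which adds up quickly on batch reads.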
Murr shines when:
- your data is heavy and tabular: that giant Parquet dump on S3 your AI inference or ML prep job produces? Perfect fit.
- reads are batched: pulling 100 columns across 1000 documents your agent wants to analyze? Great!
- you care about costs: sure, Redis with 1TB of RAM will work fine, but disk/S3 offloading is operationally simpler and way cheaper.
Short quickstart (see full example):
```shell
uv pip install murrdb
```
and then
```python
# embedded local instance

# fetch columns for a batch of document keys

# Output:
#    score category
# 0   0.95       ml
# 1   0.72    infra
# 2   0.68      ops
```
Why Murr?
TLDR: latency, simplicity, cost -- pick two. Murrdb tries to nail all three: fastest, cheapest, and easiest to operate. A bold claim, I know.

For the typical use case of reading N datapoints across M documents (an agent reading document attributes, an ML ranker fetching feature values), on top of being the fastest, Murrdb:
- vs Redis: is persistent (S3 is the new filesystem) and can offload cold data to local NVMe.
- vs embedded RocksDB: no need to build data sync between producer jobs and inference nodes yourself. Murrdb was designed to be distributed from the start.
- vs DynamoDB: roughly 10x cheaper, since you only pay for CPU/RAM, not per query.
Not being a general-purpose database, it tries to be friendly to the everyday pain points of ML/AI engineers:
- First-class Python support: `pip install murrdb`, then map to/from NumPy/Pandas/Polars/PyTorch arrays with zero copy.
- Sparse columns: when a column has no data, it takes up zero bytes, unlike the packed feature-blob approach, where null columns still take up space.
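To make the sparse-column point concrete, here is a toy size comparison in plain Python. It assumes a hypothetical row of 10 float32 features where only 2 are set: a packed blob must reserve 4 bytes for every slot (missing values become NaN placeholders), while a sparse encoding stores only the (column id, value) pairs that are present. This illustrates the idea only; it is not murr's actual on-disk format, and `NUM_FEATURES`, `pack_blob` and `pack_sparse` are hypothetical names.

```python
import math
import struct

NUM_FEATURES = 10  # hypothetical schema width


def pack_blob(row: dict[int, float]) -> bytes:
    # Dense packed blob: every slot occupies 4 bytes, missing ones are stored as NaN
    values = [row.get(i, math.nan) for i in range(NUM_FEATURES)]
    return struct.pack(f"<{NUM_FEATURES}f", *values)


def pack_sparse(row: dict[int, float]) -> bytes:
    # Sparse encoding: one (uint16 column id, float32 value) pair per present column only
    return b"".join(struct.pack("<Hf", col, val) for col, val in sorted(row.items()))


row = {3: 0.95, 7: 0.5}
print(len(pack_blob(row)))    # 40
print(len(pack_sparse(row)))  # 12
print(len(pack_sparse({})))   # 0: no present columns, zero bytes
```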
Why NOT Murr?
Murr is not a general-purpose database:
- OLTP workloads: if you have relations, transactions, and per-row reads/writes, go with Postgres.
- Analytics: aggregating over entire tables to produce reports? Pick Clickhouse, BigQuery, or Snowflake.
- General-purpose caching: need to cache user session data for a web app? Use Redis.
- Feature store: yes, it kinda looks like one — but Murrdb doesn't govern how you compute and store your data. Murr is an online serving layer, and can be a part of both internal feature stores and open-source ones like Feast, Hopsworks, and Databricks Feature Store.
> [!WARNING]
> Murr is still in its early days and may not be stable enough for your use case yet. But it's improving quickly.
Quickstart
```python
# define table schema

# write a batch of documents

# fetch specific columns for a few keys

# Output:
#    score category
# 0   0.95       ml
# 1   0.72    infra
# 2   0.68      ops
```
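Against a running server, a batch write can also be issued from Python with only the standard library. This is a minimal sketch, assuming the HTTP endpoint, `yolo` table and Parquet `Content-Type` from the curl example at the top of this README. `build_parquet_write` is a hypothetical helper, not part of the murrdb package, and response handling is left out because the response format is not documented here.

```python
import urllib.request


def build_parquet_write(body: bytes, table: str,
                        base_url: str = "http://localhost:8080") -> urllib.request.Request:
    # Mirrors the curl example: PUT the raw Parquet bytes to the table's write endpoint
    return urllib.request.Request(
        url=f"{base_url}/api/v1/table/{table}/write",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/vnd.apache.parquet"},
    )


# Usage against a running server (the file is sent byte-for-byte, with no newline stripping):
# with open("0000.parquet", "rb") as f:
#     req = build_parquet_write(f.read(), "yolo")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```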
Benchmarks
Full benchmark suite with reproduction steps: murrdb/murr-benchmark.
We benchmark a typical ML Ranking use case: 100M rows, 10 float32 columns, 1000 random key lookups per iteration. The suite includes two complementary harnesses:
- Rust (Criterion): measures raw service latency as time-to-last-byte. Reads `select_rows` random keys per iteration and consumes the raw response bytes without decoding. This isolates the storage/network layer and shows the theoretical ceiling of each backend.
- Python (pyperf): measures end-to-end latency as experienced by a Python ML client. Performs the same random-key reads but includes full protocol decoding and conversion into a `pd.DataFrame`. This captures the real cost a user pays: protocol parsing, byte deserialization, and DataFrame construction.
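A quick back-of-envelope check of the workload size helps when reading the tables that follow: the raw feature payload of one read is 40,000 bytes (about 39 KiB), so the Net TX/read column roughly shows how much framing and metadata each protocol adds on top of it, and the raw float32 values alone come to about 3.7 GiB before keys and storage overhead. The constants are taken from the benchmark description above.

```python
ROWS = 100_000_000     # rows in the benchmark table
COLUMNS = 10           # float32 columns per row
BYTES_PER_VALUE = 4    # size of one float32
KEYS_PER_READ = 1_000  # random key lookups per iteration

payload_per_read = KEYS_PER_READ * COLUMNS * BYTES_PER_VALUE
raw_dataset = ROWS * COLUMNS * BYTES_PER_VALUE

print(payload_per_read, payload_per_read / 1024)  # 40000 39.0625
print(round(raw_dataset / 2**30, 2))              # 3.73
```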
Backends and data layouts tested:
- murr (native, Arrow IPC): row-wise storage on top of RocksDB SSTables, with zero-copy reads and projection pushdown. Two modes: `mmap` (PlainTable, in-memory) and `block` (BlockTable, NVMe-backed).
- Redis / Valkey / Dragonfly, blob: all features packed into a single blob per key, fetched with `MGET`. Compact and cache-friendly, but always reads all columns.
- Redis / Valkey / Dragonfly, HSET: Feast-style hash-per-row, where each feature is a separate HSET field. Flexible, but per-field overhead adds up.
- PostgreSQL blob: BYTEA column with packed features.
- PostgreSQL col-per-feature: explicit typed columns, one per feature.
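For the blob layouts, the reply size can be estimated directly from the RESP2 wire format. A sketch, assuming every key exists, each blob is the 10 raw float32 values (40 bytes), and all 1,000 keys are fetched with a single `MGET`: the reply comes to about 45.9 KiB, in line with the 46 KiB Net TX/read reported for Redis, Valkey and Dragonfly in the blob table. It also shows where client-side parsing cost comes from: the client has to walk 1,000 length-prefixed frames. `resp_bulk_array_size` is a hypothetical helper.

```python
def resp_bulk_array_size(items: list[int]) -> int:
    # RESP2 array reply: "*<count>\r\n" header, then "$<len>\r\n<payload>\r\n" per bulk string
    size = len(f"*{len(items)}\r\n")
    for n in items:
        size += len(f"${n}\r\n") + n + 2  # length header + payload + trailing CRLF
    return size


# Assumed blob layout: 40 raw bytes per key, 1000 keys in one MGET reply
mget_reply = resp_bulk_array_size([40] * 1000)
print(mget_reply, round(mget_reply / 1024, 1))  # 47007 45.9
```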
Rust time-to-last-byte
All backends run on the same machine; container-backed ones use Docker via testcontainers. Memory is the container TOTAL (RSS+SHR) delta around the load phase. Net TX is server-to-client bytes per read. Disk-mode variants are cgroup-capped at 2 GiB of RAM to force disk reads.
Blob layouts
| Engine | Layout | Memory | Disk | Ingestion | p50 latency | Net TX/read |
|---|---|---|---|---|---|---|
| murr 0.2.0 mmap | native | 7.5 GiB | 5.9 GiB | 948K rows/s | 268 µs | 42 KiB |
| Dragonfly 1.31 | blob | 7.3 GiB | — | 4.01M rows/s | 296 µs | 46 KiB |
| Valkey 8.1 | blob | 8.9 GiB | — | 1.58M rows/s | 657 µs | 46 KiB |
| Redis 8.6.3 | blob | 9.6 GiB | — | 1.43M rows/s | 815 µs | 46 KiB |
| pgsql 18.4 | blob | 24.0 GiB | 12.8 GiB | 400K rows/s | 5.69 ms | 62 KiB |
Hash / col-per-feature layouts
| Engine | Layout | Memory | Disk | Ingestion | p50 latency | Net TX/read |
|---|---|---|---|---|---|---|
| murr 0.2.0 mmap | native | 7.5 GiB | 5.9 GiB | 948K rows/s | 268 µs | 42 KiB |
| Dragonfly 1.31 | hash | 20.1 GiB | — | 650K rows/s | 2.82 ms | 213 KiB |
| Valkey 8.1 | hash | 19.4 GiB | — | 378K rows/s | 3.20 ms | 210 KiB |
| Redis 8.6.3 | hash | 20.1 GiB | — | 398K rows/s | 3.25 ms | 210 KiB |
| pgsql 18.4 | col | 23.4 GiB | 12.7 GiB | 384K rows/s | 6.54 ms | 86 KiB |
Disk mode (2 GiB RAM cap)
| Engine | Layout | Memory | Disk | Ingestion | p50 latency | Net TX/read |
|---|---|---|---|---|---|---|
| murr 0.2.0 block | native | 1.7 GiB | 5.8 GiB | 1.00M rows/s | 6.33 ms | 42 KiB |
| pgsql 18.4 | blob | 2.0 GiB | 12.8 GiB | 329K rows/s | 189 ms | 62 KiB |
| pgsql 18.4 | col | 2.0 GiB | 12.7 GiB | 327K rows/s | 217 ms | 86 KiB |
Python end-to-end
Measures full round-trip latency including protocol decoding and pd.DataFrame conversion. Ingestion throughput includes Python-side serialization and batch writes.
| Engine | Layout | Ingestion | Read latency |
|---|---|---|---|
| murr 0.1.8 | columnar | 2.34M rows/s | 1.38 ms |
| Redis 8.6.1 | blob | 136K rows/s | 2.42 ms |
| Redis 8.6.1 | HSET | 61K rows/s | 9.39 ms |
| RocksDB | blob | 622K rows/s | 4.90 ms |
| PostgreSQL 17 | blob | 356K rows/s | 10.8 ms |
| PostgreSQL 17 | col-per-feature | 143K rows/s | 10.6 ms |
In the Rust time-to-last-byte runs, murr is ~3x faster than Redis on packed-blob reads (268 µs vs 815 µs p50) and ~12x faster than the Feast-style HSET layout (3.25 ms), while using ~3x less RAM than the HSET equivalent (7.5 GiB vs 20.1 GiB). Dragonfly's packed-blob mode is close on latency (296 µs), but its clients still pay the protocol-parsing cost.
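The headline ratios can be re-derived from the Rust tables above. The same arithmetic on the Python end-to-end table, which used earlier versions (murr 0.1.8, Redis 8.6.1), gives smaller gaps once client-side decoding and DataFrame construction are included. All numbers are copied from the tables; nothing is re-measured.

```python
# Rust time-to-last-byte: p50 latencies in µs, memory in GiB
murr_p50, redis_blob_p50, redis_hash_p50 = 268, 815, 3_250
murr_mem, redis_hash_mem = 7.5, 20.1

print(round(redis_blob_p50 / murr_p50, 1))  # 3.0
print(round(redis_hash_p50 / murr_p50, 1))  # 12.1
print(round(redis_hash_mem / murr_mem, 1))  # 2.7

# Python end-to-end: read latencies in ms
py_murr, py_redis_blob, py_redis_hset = 1.38, 2.42, 9.39

print(round(py_redis_blob / py_murr, 1))    # 1.8
print(round(py_redis_hset / py_murr, 1))    # 6.8
```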
Roadmap
No ETAs, but at least you can see where things stand:
- HTTP API
- Arrow Flight gRPC API
- API for data ingestion
- Storage Directory interface (which is heavily inspired by Apache Lucene)
- Segment read/writes (again, inspired by Apache Lucene)
- Python embedded murrdb, so we can make a cool demo
- Benchmarking harness: Redis support, Feast and feature-blob styles
- Win at your own benchmark (this was surprisingly hard btw)
- Support for `utf8`, `bool`, signed/unsigned `int8/16/32/64`, `float32` and `float64` datatypes
- Python remote API client (sync + async)
- Docker image
- Support most popular Arrow numerical types (signed/unsigned int 8/16/32/64, float 16, date-time)
- Array datatypes (e.g. Arrow `list`), so you can store embeddings
- Sparse columns
- Add RocksDB and Postgres to the benchmark harness
- Apache Iceberg and the very popular `parquet dump on S3` data catalog support
Development
License
Apache 2.0