



<p align="center">
<a href="#what-is-murr">What is Murr?</a> · <a href="#why-murr">Why Murr?</a> · <a href="#why-not-murr">Why NOT Murr?</a> · <a href="#quickstart">Quickstart</a> · <a href="#benchmarks">Benchmarks</a> · <a href="#roadmap">Roadmap</a>
</p>

**Murrdb**: A RocksDB-based NVMe/S3 cache for AI inference workloads. A faster Redis replacement, optimized for batch low-latency zero-copy reads and writes.

> This `README.md` is 99% human-written.

## What is Murr?

Murr is a caching layer for ML/AI data serving that sits between your batch data pipelines and inference apps:
- **Tiered storage**: hot data lives in memory, cold data stays on disk with S3-based replication. It's 2026, RAM is expensive - keep only the hot stuff there.
- **Batch-in, batch-out**: native batch reads and writes over columnar storage, with no per-row overhead. Dumping 1GB Parquet/Arrow files into the ingestion API is a perfectly valid use case.
```shell
# yes this works for batch writes
# --data-binary sends the file byte-for-byte (plain -d strips newlines from file input)
curl --data-binary @0000.parquet -H "Content-Type: application/vnd.apache.parquet" \
  -XPUT http://localhost:8080/api/v1/table/yolo/write
```
- **Zero-copy wire protocol**: no conversion needed when building `np.ndarray`, `pd.DataFrame` or `pt.Tensor` from API responses. Sure, Redis is fast, but parsing its replies is not (especially in Python!).
```python
result = db.read("docs", keys=["doc_1", "doc_3", "doc_5"], columns=["score", "category"])
print(result.to_pandas())
```
- **Stateless**: Murr is not a database - all state is persisted on S3. When a Redis node gets evicted, you're cooked. Murr just self-bootstraps from block storage.

Murr shines when:
* **your data is heavy and tabular**: that giant Parquet dump on S3 your AI inference or ML prep job produces? Perfect fit.
* **reads are batched**: pulling 100 columns across 1000 documents your agent wants to analyze? Great!
* **you care about costs**: sure, Redis with 1TB of RAM will work fine, but disk/S3 offloading is operationally simpler and way cheaper.

Short quickstart (see [full example](#quickstart)):
```shell
uv pip install murrdb
```
and then
```python
from murr.sync import Murr
db = Murr.start_local(cache_dir="/tmp/murr") # embedded local instance
# fetch columns for a batch of document keys
result = db.read("docs", keys=["doc_1", "doc_3", "doc_5"], columns=["score", "category"])
print(result.to_pandas())
# Output:
# score category
# 0 0.95 ml
# 1 0.72 infra
# 2 0.68 ops
```
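To make the zero-copy point from the feature list concrete, here is a small standard-library sketch. It assumes a hypothetical payload of float32 values packed back to back (not Murr's actual wire format) and contrasts viewing those bytes in place with parsing a text-style reply value by value:

```python
import struct

# Hypothetical payload: five float32 scores packed back to back in native
# byte order. This layout is an assumption for illustration only.
scores = [0.95, 0.87, 0.72, 0.91, 0.68]
payload = struct.pack("5f", *scores)

# Zero-copy: reinterpret the same bytes as float32 without copying or parsing.
view = memoryview(payload).cast("f")

# Text-protocol stand-in: every value has to be parsed from its string form.
text_reply = [repr(s).encode() for s in scores]
parsed = [float(raw) for raw in text_reply]

print(len(view), view.nbytes)  # 5 20
print(view.obj is payload)     # True
print(parsed == scores)        # True
```

With NumPy installed, `np.frombuffer(payload, dtype=np.float32)` gives the same kind of view over the buffer.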
## Why Murr?
TLDR: latency, simplicity, cost -- pick two. Murrdb tries to nail all three: fastest, cheapest, and easiest to operate. A bold claim, I know.

For the typical use case of `read N datapoints across M documents` (an agent reading document attributes, an ML ranker fetching feature values), on top of being the fastest, Murrdb:
- vs **[Redis](https://redis.io/)**: is persistent (S3 is the new filesystem) and can offload cold data to local NVMe.
- vs embedded **[RocksDB](https://rocksdb.org/)**: no need to build data sync between producer jobs and inference nodes yourself. Murrdb was designed to be distributed from the start.
- vs **[DynamoDB](https://aws.amazon.com/dynamodb/)**: roughly 10x cheaper, since you only pay for CPU/RAM, not per query.

Not being a general-purpose database, it tries to be friendly to the everyday pain points of ML/AI engineers:
* **First-class Python support**: `pip install murrdb`, then map to/from NumPy/pandas/Polars/PyTorch arrays with zero copy.
* **Sparse columns**: when a column has no data, it takes up zero bytes, unlike the packed feature blob approach, where null columns aren't actually null and still occupy space.
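A rough sketch of why nulls are not free in a packed feature blob. The feature names and the NaN sentinel are hypothetical, and the sparse side only counts value bytes; it ignores whatever bookkeeping a real store needs and is not a description of Murr's on-disk format:

```python
import math
import struct

FEATURES = ["score", "ctr", "price", "age_days"]  # hypothetical feature names

def pack_blob(row: dict) -> bytes:
    """Pack every feature as float32; missing values become a NaN sentinel."""
    values = [row.get(name, math.nan) for name in FEATURES]
    return struct.pack(f"<{len(FEATURES)}f", *values)

def sparse_value_bytes(row: dict) -> int:
    """Bytes of float32 data if only the present values are stored."""
    return 4 * sum(1 for name in FEATURES if name in row)

row = {"score": 0.95}  # three of the four features are missing

print(len(pack_blob(row)))      # 16
print(sparse_value_bytes(row))  # 4
```

The blob pays for all four slots even though three of them carry no information.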
## Why NOT Murr?
Murr is not a general-purpose database:
* **OLTP workloads**: if you have relations, transactions, and per-row reads/writes, go with [Postgres](https://www.postgresql.org/).
* **Analytics**: aggregating over entire tables to produce reports? Pick [ClickHouse](https://clickhouse.com/), [BigQuery](https://cloud.google.com/bigquery), or [Snowflake](https://www.snowflake.com/).
* **General-purpose caching**: need to cache user session data for a web app? Use [Redis](https://redis.io/).
* **Feature store**: yes, it kinda looks like one - but Murrdb doesn't govern how you compute and store your data. Murr is an online serving layer, and can be a part of both internal feature stores and open-source ones like [Feast](https://feast.dev/), [Hopsworks](https://www.hopsworks.ai/), and [Databricks Feature Store](https://docs.databricks.com/en/machine-learning/feature-store/index.html).
> [!WARNING]
> Murr is still in its early days and may not be stable enough for your use case yet. But it's improving quickly.
## Quickstart
```python
import pandas as pd
import pyarrow as pa

from murr import TableSchema, ColumnSchema, DType
from murr.sync import Murr

db = Murr.start_local(cache_dir="/tmp/murr")  # embedded local instance

# define table schema
schema = TableSchema(
    key="doc_id",  # the key column
    columns={
        "doc_id": ColumnSchema(dtype=DType.UTF8, nullable=False),
        "score": ColumnSchema(dtype=DType.FLOAT32),
        "category": ColumnSchema(dtype=DType.UTF8),
    },
)
db.create_table("docs", schema)

# write a batch of documents
df = pd.DataFrame.from_dict({
    "doc_id": ["doc_1", "doc_2", "doc_3", "doc_4", "doc_5"],
    "score": [0.95, 0.87, 0.72, 0.91, 0.68],
    "category": ["ml", "search", "infra", "ml", "ops"],
})
db.write("docs", pa.Table.from_pandas(df))

# fetch specific columns for a few keys
result = db.read("docs", keys=["doc_1", "doc_3", "doc_5"], columns=["score", "category"])
print(result.to_pandas())
# Output:
# score category
# 0 0.95 ml
# 1 0.72 infra
# 2 0.68 ops
```
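Against a standalone server, the HTTP ingestion endpoint from the curl example can also be called with nothing but the Python standard library. This is a minimal sketch: the URL path and content type are taken from that curl command, while `build_write_request` is a hypothetical helper and not part of the `murr` package. The request is only built here; sending is left commented out since it needs a running server.

```python
import urllib.request

def build_write_request(parquet_bytes: bytes, table: str,
                        base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build the PUT request for one Parquet batch; nothing is sent here."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/table/{table}/write",
        data=parquet_bytes,
        method="PUT",
        headers={"Content-Type": "application/vnd.apache.parquet"},
    )

# Placeholder bytes, not a real Parquet file.
req = build_write_request(b"PAR1", table="yolo")
print(req.get_method(), req.full_url)

# To actually send a file to a running server:
# with open("0000.parquet", "rb") as f:
#     req = build_write_request(f.read(), table="yolo")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```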
## Benchmarks
Full benchmark suite with reproduction steps: [murrdb/murr-benchmark](https://github.com/murrdb/murr-benchmark).
We benchmark a typical `ML Ranking` use case: 100M rows, 10 `float32` columns, 1000 random key lookups per iteration. The suite includes two complementary harnesses:
* **Rust (Criterion)**: measures raw service throughput as time-to-last-byte. Reads `select_rows` random keys per iteration and consumes raw response bytes without decoding. This isolates the storage/network layer and shows the theoretical ceiling of each backend.
* **Python (pyperf)**: measures end-to-end latency as experienced by a Python ML client. Performs the same random-key reads but includes full protocol decoding and conversion into a `pd.DataFrame`. This captures the real cost a user pays: protocol parsing, byte deserialization, and DataFrame construction.

Backends and data layouts tested:
* **murr** (native, Arrow IPC): row-wise storage on top of RocksDB SSTables, with zero-copy reads and projection pushdown. Two modes: `mmap` (PlainTable, in-memory) and `block` (BlockTable, NVMe-backed).
* **Redis / Valkey / Dragonfly, blob**: all features packed into a single `MGET` blob. Compact and cache-friendly, but always reads all columns.
* **Redis / Valkey / Dragonfly, HSET**: [Feast](https://feast.dev/)-style hash-per-row: each feature is a separate HSET field. Flexible, but per-field overhead adds up.
* **PostgreSQL blob**: BYTEA column with packed features.
* **PostgreSQL col-per-feature**: explicit typed columns, one per feature.
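The two harness styles differ mainly in where the clock stops. A rough sketch of that difference, with an in-memory dict of packed float32 blobs standing in for a backend (an assumption for illustration; the real suite uses Criterion and pyperf against live services, and the row count here is scaled down):

```python
import random
import struct
import time

N_ROWS, N_COLS, N_KEYS = 10_000, 10, 1_000  # scaled down from the 100M-row setup
ROW_FMT = f"<{N_COLS}f"

# Stand-in backend: one packed float32 blob per key.
store = {f"doc_{i}": struct.pack(ROW_FMT, *([float(i)] * N_COLS)) for i in range(N_ROWS)}

def fetch_raw(keys):
    """Time-to-last-byte style: collect the response bytes, do not decode."""
    return [store[k] for k in keys]

def fetch_decoded(keys):
    """End-to-end style: also decode every blob into Python floats."""
    return [struct.unpack(ROW_FMT, blob) for blob in fetch_raw(keys)]

def time_once(fn, keys):
    """Return (elapsed seconds, result) for a single call."""
    start = time.perf_counter()
    result = fn(keys)
    return time.perf_counter() - start, result

rng = random.Random(42)
keys = [f"doc_{rng.randrange(N_ROWS)}" for _ in range(N_KEYS)]

raw_s, raw = time_once(fetch_raw, keys)
dec_s, decoded = time_once(fetch_decoded, keys)
print(f"raw:     {raw_s * 1e6:.0f} us for {len(raw)} rows")
print(f"decoded: {dec_s * 1e6:.0f} us for {len(decoded)} rows x {len(decoded[0])} cols")
```

A single timed call is noisy; the real harnesses repeat the measurement many times and report statistics.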
### Rust time-to-last-byte
All backends run on the same machine; container-backed ones use Docker via `testcontainers`. Memory is the container `TOTAL` (RSS+SHR) delta around the load phase. Net TX is server-to-client bytes per read. `disk` variants are cgroup-capped at 2 GiB RAM to force disk reads.
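As a back-of-the-envelope check on the Net TX column, the 46 KiB reported for the Redis-family blob rows can be roughly reproduced by encoding a reply by hand. The sketch assumes each blob is 10 raw float32 values with no header and that the reply is a plain RESP array of bulk strings; the benchmark's actual encoding may differ.

```python
import struct

N_KEYS, N_COLS = 1000, 10

def resp_bulk(data: bytes) -> bytes:
    """Encode one RESP bulk string: $<len> CRLF <data> CRLF."""
    return b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n"

def resp_array(items: list) -> bytes:
    """Encode a RESP array of bulk strings, the reply shape of MGET."""
    header = b"*" + str(len(items)).encode() + b"\r\n"
    return header + b"".join(resp_bulk(item) for item in items)

# 10 float32 values = 40 bytes per row (assumed blob layout).
blob = struct.pack(f"<{N_COLS}f", *[float(i) for i in range(N_COLS)])
reply = resp_array([blob] * N_KEYS)

print(len(blob))                    # 40
print(len(reply))                   # 47007
print(round(len(reply) / 1024, 1))  # 45.9
```

So roughly 7 bytes of framing per 40-byte row, which lands close to the reported figure.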
#### Blob layouts
| Backend | Layout | Memory | Disk | Ingestion | Latency | Net TX |
| --- | --- | --- | --- | --- | --- | --- |
| murr 0.2.0 mmap | native | 7.5 GiB | 5.9 GiB | 948K rows/s | 268 µs | 42 KiB |
| Dragonfly 1.31 | blob | 7.3 GiB | - | 4.01M rows/s | 296 µs | 46 KiB |
| Valkey 8.1 | blob | 8.9 GiB | - | 1.58M rows/s | 657 µs | 46 KiB |
| Redis 8.6.3 | blob | 9.6 GiB | - | 1.43M rows/s | 815 µs | 46 KiB |
| pgsql 18.4 | blob | 24.0 GiB | 12.8 GiB | 400K rows/s | 5.69 ms | 62 KiB |
#### Hash / col-per-feature layouts
| Backend | Layout | Memory | Disk | Ingestion | Latency | Net TX |
| --- | --- | --- | --- | --- | --- | --- |
| murr 0.2.0 mmap | native | 7.5 GiB | 5.9 GiB | 948K rows/s | 268 µs | 42 KiB |
| Dragonfly 1.31 | hash | 20.1 GiB | - | 650K rows/s | 2.82 ms | 213 KiB |
| Valkey 8.1 | hash | 19.4 GiB | - | 378K rows/s | 3.20 ms | 210 KiB |
| Redis 8.6.3 | hash | 20.1 GiB | - | 398K rows/s | 3.25 ms | 210 KiB |
| pgsql 18.4 | col | 23.4 GiB | 12.7 GiB | 384K rows/s | 6.54 ms | 86 KiB |
#### Disk mode (2 GiB RAM cap)
| Backend | Layout | Memory | Disk | Ingestion | Latency | Net TX |
| --- | --- | --- | --- | --- | --- | --- |
| murr 0.2.0 block | native | 1.7 GiB | 5.8 GiB | 1.00M rows/s | 6.33 ms | 42 KiB |
| pgsql 18.4 | blob | 2.0 GiB | 12.8 GiB | 329K rows/s | 189 ms | 62 KiB |
| pgsql 18.4 | col | 2.0 GiB | 12.7 GiB | 327K rows/s | 217 ms | 86 KiB |
### Python end-to-end
Measures full round-trip latency including protocol decoding and `pd.DataFrame` conversion. Ingestion throughput includes Python-side serialization and batch writes.

| Backend | Layout | Ingestion | Latency |
| --- | --- | --- | --- |
| murr 0.1.8 | columnar | 2.34M rows/s | 1.38 ms |
| Redis 8.6.1 | blob | 136K rows/s | 2.42 ms |
| Redis 8.6.1 | HSET | 61K rows/s | 9.39 ms |
| RocksDB | blob | 622K rows/s | 4.90 ms |
| PostgreSQL 17 | blob | 356K rows/s | 10.8 ms |
| PostgreSQL 17 | col-per-feature | 143K rows/s | 10.6 ms |

In the Rust harness, Murr is ~3x faster than Redis on packed-blob reads and ~12x faster on the Feast-style HSET layout, while using ~3x less RAM than the HSET equivalent. Dragonfly's packed-blob mode is close on latency, but still pays the protocol-parsing cost on the client.
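The summary ratios line up with the Rust time-to-last-byte rows for `murr 0.2.0 mmap` and `Redis 8.6.3` (an inference from the numbers, not something the benchmark notes state). The arithmetic, with the figures copied from those rows:

```python
# Figures copied from the Rust time-to-last-byte tables.
murr_latency_us = 268
redis_blob_latency_us = 815
redis_hash_latency_us = 3250  # 3.25 ms
murr_memory_gib = 7.5
redis_hash_memory_gib = 20.1

blob_speedup = redis_blob_latency_us / murr_latency_us
hash_speedup = redis_hash_latency_us / murr_latency_us
memory_ratio = redis_hash_memory_gib / murr_memory_gib

print(f"{blob_speedup:.1f}x")  # 3.0x
print(f"{hash_speedup:.1f}x")  # 12.1x
print(f"{memory_ratio:.1f}x")  # 2.7x
```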
## Roadmap
No ETAs, but at least you can see where things stand:
- [x] HTTP API
- [x] Arrow Flight gRPC API
- [x] API for data ingestion
- [x] Storage Directory interface (which is heavily inspired by [Apache Lucene](https://lucene.apache.org/))
- [x] Segment read/writes (again, inspired by [Apache Lucene](https://lucene.apache.org/))
- [x] Python embedded murrdb, so we can make a cool demo
- [x] Benchmarking harness: Redis support, Feast and feature-blob styles
- [x] Win at your own benchmark (this was surprisingly hard btw)
- [x] Support for `utf8`, `bool`, signed/unsigned `int8/16/32/64`, `float32` and `float64` datatypes
- [x] Python remote API client (sync + async)
- [x] Docker image
- [ ] Support most popular Arrow numerical types (signed/unsigned int 8/16/32/64, float 16, date-time)
- [ ] Array datatypes (e.g. Arrow `list`), so you can store embeddings
- [ ] Sparse columns
- [x] Add RocksDB and Postgres to the benchmark harness
- [ ] [Apache Iceberg](https://iceberg.apache.org/) and the very popular `parquet dump on S3` data catalog support
## Development
```bash
cargo build # Build the project
cargo test # Run all tests
cargo check # Fast syntax/type check
cargo clippy # Linting
cargo fmt # Format code
cargo bench --bench <name> # Run a benchmark (multi_segment_index_bench, row_vs_col_bench)
```
## License
Apache 2.0