murr 0.2.1

Columnar in-memory cache for AI/ML inference workloads
![logo](doc/img/logo.png)

[![CI Status](https://github.com/shuttie/murr/workflows/CI/badge.svg)](https://github.com/shuttie/murr/actions)
[![License: Apache 2](https://img.shields.io/badge/License-Apache2-green.svg)](https://opensource.org/licenses/Apache-2.0)
![Last commit](https://img.shields.io/github/last-commit/shuttie/murr)
![Last release](https://img.shields.io/github/release/shuttie/murr)
![Rust](https://shields.io/badge/-Rust-3776AB?style=flat&logo=rust)

<p align="center">
<a href="#what-is-murr">🐱 What is Murr?</a> &middot; <a href="#why-murr">🚀 Why Murr?</a> &middot; <a href="#why-not-murr">🚫 Why NOT Murr?</a> &middot; <a href="#quickstart">⚡ Quickstart</a> &middot; <a href="#benchmarks">📊 Benchmarks</a> &middot; <a href="#roadmap">🗺 Roadmap</a>
</p>

**Murrdb**: A RocksDB-based NVMe/S3 cache for AI inference workloads. A faster Redis replacement, optimized for batch low-latency zero-copy reads and writes.

> This `README.md` is 99%[^1] human written.

[^1]: Used only for grammar and syntax checking.


## What is Murr?

![system diagram](doc/img/overview.png)

Murr is a caching layer for ML/AI data serving that sits between your batch data pipelines and inference apps:

- **Tiered storage**: hot data lives in memory, cold data stays on disk with S3-based replication. It's 2026, RAM is expensive - keep only the hot stuff there.
- **Batch-in, batch-out**: native batch reads and writes over columnar storage, with no per-row overhead. Dumping 1GB Parquet/Arrow files into the ingestion API is a perfectly valid use case.
```shell
# yes, this works for batch writes
# (--data-binary sends the file bytes untouched; plain -d strips newlines)
curl --data-binary @0000.parquet -H "Content-Type: application/vnd.apache.parquet" \
  -XPUT http://localhost:8080/api/v1/table/yolo/write
```
- **Zero-copy wire protocol**: no conversion needed when building `np.ndarray`, `pd.DataFrame` or `torch.Tensor` from API responses. Sure, Redis is fast, but parsing its replies is not (especially in Python!).
```python
result = db.read("docs", keys=["doc_1", "doc_3", "doc_5"], columns=["score", "category"])
print(result.to_pandas())  # look mom, zero copy!
```
- **Stateless**: Murr is not a database - all state is persisted on S3. When a Redis node gets evicted, you're cooked. Murr just self-bootstraps from object storage.
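
The batch write from the `curl` example above can also be issued from Python. A minimal sketch using only the standard library: the endpoint path and content type are copied from that example, while the helper name `build_parquet_write_request` and the placeholder payload are hypothetical. It only builds the request, so it runs without a server.

```python
import urllib.request


def build_parquet_write_request(base_url: str, table: str, payload: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT request mirroring the curl example."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/table/{table}/write",
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/vnd.apache.parquet"},
    )


# Placeholder bytes; in practice this would be the raw contents of a Parquet
# file, e.g. open("0000.parquet", "rb").read().
req = build_parquet_write_request("http://localhost:8080", "yolo", b"PAR1...")
print(req.get_method(), req.full_url)

# To actually send it against a running instance:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

The payload is passed as raw bytes, which is the Python counterpart of sending the file unmodified.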

Murr shines when:
* **your data is heavy and tabular**: that giant Parquet dump on S3 your AI inference or ML prep job produces? Perfect fit.
* **reads are batched**: pulling 100 columns across 1000 documents your agent wants to analyze? Great!
* **you care about costs**: sure, Redis with 1TB of RAM will work fine, but disk/S3 offloading is operationally simpler and way cheaper.

Short quickstart (see [full example](#quickstart)):
```shell
uv pip install murrdb
```
and then
```python
from murr.sync import Murr

db = Murr.start_local(cache_dir="/tmp/murr")  # embedded local instance

# fetch columns for a batch of document keys
result = db.read("docs", keys=["doc_1", "doc_3", "doc_5"], columns=["score", "category"])
print(result.to_pandas())

# Output:
#    score category
# 0   0.95       ml
# 1   0.72    infra
# 2   0.68      ops
```

## Why Murr?

TLDR: latency, simplicity, cost -- pick two. Murrdb tries to nail all three: fastest, cheapest, and easiest to operate. A bold claim, I know.

![comparison with competitors](doc/img/compare.png)

For the typical use case of `read N datapoints across M documents` (an agent reading document attributes, an ML ranker fetching feature values), on top of being the fastest, Murrdb:
- vs **[Redis](https://redis.io/)**: is persistent (S3 is the new filesystem) and can offload cold data to local NVMe.
- vs embedded **[RocksDB](https://rocksdb.org/)**: no need to build data sync between producer jobs and inference nodes yourself. Murrdb was designed to be distributed from the start.
- vs **[DynamoDB](https://aws.amazon.com/dynamodb/)**: roughly 10x cheaper, since you only pay for CPU/RAM, not per query.

Not being a general-purpose database, it tries to be friendly to the everyday pain points of ML/AI engineers:
* **First-class Python support**: `pip install murrdb`, then map to/from NumPy/Pandas/Polars/PyTorch arrays with zero copy.
* **Sparse columns**: when a column has no data, it takes up zero bytes. Unlike the packed feature blob approach, where null columns aren't actually null.
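
To make the packed-blob comparison concrete, here is a rough standard-library sketch. It is not Murr's storage format: the row count, the number of populated features (`POPULATED`) and both helper functions are assumptions chosen for illustration. It only shows that a fixed-width blob reserves a slot for every feature, while a column-wise layout can skip a column that never has data.

```python
import struct

NUM_ROWS = 1000
NUM_FEATURES = 10
POPULATED = 5  # assumption: only the first 5 features ever carry values

# Each row has 5 real float32 values and 5 features that are always missing.
rows = [[float(i)] * POPULATED + [None] * (NUM_FEATURES - POPULATED)
        for i in range(NUM_ROWS)]


def packed_blob_bytes(rows):
    """Fixed-width blob per row: a missing value still occupies a 4-byte slot."""
    fmt = f"<{NUM_FEATURES}f"
    total = 0
    for row in rows:
        filled = [float("nan") if v is None else v for v in row]
        total += len(struct.pack(fmt, *filled))
    return total


def column_bytes(rows):
    """Column-wise layout: a column with no values at all is not stored."""
    total = 0
    for column in zip(*rows):
        if all(v is None for v in column):
            continue  # an empty column costs zero bytes
        total += 4 * len(column)  # one float32 slot per row otherwise
    return total


print(packed_blob_bytes(rows))  # 40000
print(column_bytes(rows))       # 20000
```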

## Why NOT Murr?

Murr is not a general-purpose database:
* **OLTP workloads**: if you have relations, transactions, and per-row reads/writes, go with [Postgres](https://www.postgresql.org/).
* **Analytics**: aggregating over entire tables to produce reports? Pick [ClickHouse](https://clickhouse.com/), [BigQuery](https://cloud.google.com/bigquery), or [Snowflake](https://www.snowflake.com/).
* **General-purpose caching**: need to cache user session data for a web app? Use [Redis](https://redis.io/).
* **Feature store**: yes, it kinda looks like one, but Murrdb doesn't govern how you compute and store your data. Murr is an online serving layer, and can be a part of both internal feature stores and open-source ones like [Feast](https://feast.dev/), [Hopsworks](https://www.hopsworks.ai/), and [Databricks Feature Store](https://docs.databricks.com/en/machine-learning/feature-store/index.html).

> [!WARNING]
> Murr is still in its early days and may not be stable enough for your use case yet. But it's improving quickly.

## Quickstart

```python
import pandas as pd
import pyarrow as pa
from murr import TableSchema, ColumnSchema, DType
from murr.sync import Murr

db = Murr.start_local(cache_dir="/tmp/murr")

# define table schema
schema = TableSchema(
    key="doc_id",  # name of the column used as the lookup key
    columns={
        "doc_id": ColumnSchema(dtype=DType.UTF8, nullable=False),
        "score": ColumnSchema(dtype=DType.FLOAT32),
        "category": ColumnSchema(dtype=DType.UTF8),
    },
)
db.create_table("docs", schema)

# write a batch of documents
df = pd.DataFrame.from_dict({
    "doc_id":   ["doc_1", "doc_2", "doc_3", "doc_4", "doc_5"],
    "score":    [0.95, 0.87, 0.72, 0.91, 0.68],
    "category": ["ml", "search", "infra", "ml", "ops"],
})
db.write("docs", pa.Table.from_pandas(df))

# fetch specific columns for a few keys
result = db.read("docs", keys=["doc_1", "doc_3", "doc_5"], columns=["score", "category"])
print(result.to_pandas())

# Output:
#    score category
# 0   0.95       ml
# 1   0.72    infra
# 2   0.68      ops

```

## Benchmarks

Full benchmark suite with reproduction steps: [murrdb/murr-benchmark](https://github.com/murrdb/murr-benchmark).

We benchmark a typical `ML Ranking` use case: 100M rows, 10 `float32` columns, 1000 random key lookups per iteration. The suite includes two complementary harnesses:

* **Rust (Criterion)**: measures raw service throughput as time-to-last-byte. Reads `select_rows` random keys per iteration and consumes raw response bytes without decoding. This isolates the storage/network layer and shows the theoretical ceiling of each backend.
* **Python (pyperf)**: measures end-to-end latency as experienced by a Python ML client. Performs the same random-key reads but includes full protocol decoding and conversion into a `pd.DataFrame`. This captures the real cost a user pays: protocol parsing, byte deserialization, and DataFrame construction.
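
To illustrate why client-side decoding shows up in the Python numbers, here is a standard-library sketch contrasting per-row blob unpacking with viewing one contiguous column buffer. It is neither Murr's wire format nor the benchmark code; the two-feature row layout, buffer contents and variable names are made up for the example.

```python
import struct
from array import array

NUM_ROWS = 1000

# Row-oriented reply: one packed blob per key, each decoded separately.
row_blobs = [struct.pack("<2f", float(i), float(i) * 2) for i in range(NUM_ROWS)]
scores_from_rows = [struct.unpack("<2f", blob)[0] for blob in row_blobs]

# Column-oriented reply: one contiguous float32 buffer for the whole column.
column_buffer = array("f", (float(i) for i in range(NUM_ROWS))).tobytes()

# cast() reinterprets the existing bytes as float32 without copying them.
scores_view = memoryview(column_buffer).cast("f")

print(scores_from_rows[:3])              # [0.0, 1.0, 2.0]
print(scores_view[:3].tolist())          # [0.0, 1.0, 2.0]
print(scores_view.obj is column_buffer)  # True: same underlying bytes
```

The row path runs one `struct.unpack` call per key, while the column path is a single view over the received bytes. With NumPy, `np.frombuffer(column_buffer, dtype=np.float32)` gives the same kind of view.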

Backends and data layouts tested:
* **murr** (native, Arrow IPC): row-wise storage on top of RocksDB SSTables, with zero-copy reads and projection pushdown. Two modes: `mmap` (PlainTable, in-memory) and `block` (BlockTable, NVMe-backed).
* **Redis / Valkey / Dragonfly, blob**: all features packed into a single `MGET` blob. Compact and cache-friendly, but always reads all columns.
* **Redis / Valkey / Dragonfly, HSET**: [Feast](https://feast.dev/)-style hash-per-row: each feature is a separate HSET field. Flexible, but per-field overhead adds up.
* **PostgreSQL blob**: BYTEA column with packed features.
* **PostgreSQL col-per-feature**: explicit typed columns, one per feature.
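
The two Redis layouts differ mainly in how many commands and reply values a single batch read involves. A small sketch of the command shapes, built as plain Python lists so it needs no Redis client: `MGET` and `HMGET` are real Redis commands, but using `HMGET` for the hash layout, the key names and the field names are assumptions, not taken from the benchmark code.

```python
def blob_read_commands(keys):
    """Blob layout: a single MGET returns one packed blob per key."""
    return [["MGET", *keys]]


def hash_read_commands(keys, fields):
    """Hash layout: one HMGET per row, naming every requested feature field."""
    return [["HMGET", key, *fields] for key in keys]


keys = [f"doc_{i}" for i in range(1000)]   # hypothetical key names
fields = [f"f{i}" for i in range(10)]      # hypothetical feature names

blob_cmds = blob_read_commands(keys)
hash_cmds = hash_read_commands(keys, fields)

# Number of commands sent for one batch read.
print(len(blob_cmds), len(hash_cmds))  # 1 1000

# Number of values the client has to parse out of the replies.
blob_values = len(keys)
hash_values = len(keys) * len(fields)
print(blob_values, hash_values)        # 1000 10000
```

Ten times as many reply values to frame and parse is consistent with the larger Net TX/read reported for the hash layout (about 210 KiB versus 46 KiB for blob).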

### Rust time-to-last-byte

All backends run on the same machine; container-backed ones use Docker via `testcontainers`. Memory is the container `TOTAL` (RSS+SHR) delta around the load phase. Net TX is server-to-client bytes per read. `disk` variants are cgroup-capped at 2 GiB RAM to force disk reads.
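
As a sanity check for the tables that follow, the workload parameters give a lower bound on payload sizes. The sketch below is plain arithmetic over the numbers stated in this section (100M rows, 10 `float32` columns, 1000 keys per read); it ignores keys, framing and indexes, so measured values will be higher.

```python
ROWS = 100_000_000
COLUMNS = 10
BYTES_PER_FLOAT32 = 4
KEYS_PER_READ = 1000

row_bytes = COLUMNS * BYTES_PER_FLOAT32               # raw values per row
read_payload_kib = KEYS_PER_READ * row_bytes / 1024   # raw values per read
dataset_gib = ROWS * row_bytes / 1024**3              # raw values in the table

print(row_bytes)                               # 40
print(f"{read_payload_kib:.1f} KiB per read")  # 39.1 KiB per read
print(f"{dataset_gib:.2f} GiB of raw values")  # 3.73 GiB of raw values
```

A Net TX/read of 42 KiB therefore means roughly 3 KiB of overhead on top of the raw values.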

#### Blob layouts

| Engine | Layout | Memory | Disk | Ingestion | p50 latency | Net TX/read |
|--------|--------|-------:|-----:|----------:|------------:|------------:|
| murr 0.2.0 mmap | native | 7.5 GiB | 5.9 GiB | 948K rows/s | 268 µs | 42 KiB |
| Dragonfly 1.31 | blob | 7.3 GiB | n/a | 4.01M rows/s | 296 µs | 46 KiB |
| Valkey 8.1 | blob | 8.9 GiB | n/a | 1.58M rows/s | 657 µs | 46 KiB |
| Redis 8.6.3 | blob | 9.6 GiB | n/a | 1.43M rows/s | 815 µs | 46 KiB |
| pgsql 18.4 | blob | 24.0 GiB | 12.8 GiB | 400K rows/s | 5.69 ms | 62 KiB |

#### Hash / col-per-feature layouts

| Engine | Layout | Memory | Disk | Ingestion | p50 latency | Net TX/read |
|--------|--------|-------:|-----:|----------:|------------:|------------:|
| murr 0.2.0 mmap | native | 7.5 GiB | 5.9 GiB | 948K rows/s | 268 µs | 42 KiB |
| Dragonfly 1.31 | hash | 20.1 GiB | n/a | 650K rows/s | 2.82 ms | 213 KiB |
| Valkey 8.1 | hash | 19.4 GiB | n/a | 378K rows/s | 3.20 ms | 210 KiB |
| Redis 8.6.3 | hash | 20.1 GiB | n/a | 398K rows/s | 3.25 ms | 210 KiB |
| pgsql 18.4 | col | 23.4 GiB | 12.7 GiB | 384K rows/s | 6.54 ms | 86 KiB |

#### Disk mode (2 GiB RAM cap)

| Engine | Layout | Memory | Disk | Ingestion | p50 latency | Net TX/read |
|--------|--------|-------:|-----:|----------:|------------:|------------:|
| murr 0.2.0 block | native | 1.7 GiB | 5.8 GiB | 1.00M rows/s | 6.33 ms | 42 KiB |
| pgsql 18.4 | blob | 2.0 GiB | 12.8 GiB | 329K rows/s | 189 ms | 62 KiB |
| pgsql 18.4 | col | 2.0 GiB | 12.7 GiB | 327K rows/s | 217 ms | 86 KiB |

### Python end-to-end

Measures full round-trip latency including protocol decoding and `pd.DataFrame` conversion. Ingestion throughput includes Python-side serialization and batch writes.

| Engine | Layout | Ingestion | Read latency |
|--------|--------|----------:|-------------:|
| murr 0.1.8 | columnar | 2.34M rows/s | 1.38 ms |
| Redis 8.6.1 | blob | 136K rows/s | 2.42 ms |
| Redis 8.6.1 | HSET | 61K rows/s | 9.39 ms |
| RocksDB | blob | 622K rows/s | 4.90 ms |
| PostgreSQL 17 | blob | 356K rows/s | 10.8 ms |
| PostgreSQL 17 | col-per-feature | 143K rows/s | 10.6 ms |

Going by the Rust time-to-last-byte tables, Murr is ~3x faster than Redis on packed-blob reads and ~12x faster on the Feast-style HSET layout, while using ~3x less RAM than the HSET equivalent. Dragonfly's packed-blob mode is close on latency, but the client still pays the protocol-parsing cost.
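
The summary ratios can be recomputed from the Rust time-to-last-byte tables. A short sketch with the p50 latencies and memory figures copied from those tables; the dictionary layout and the `speedup` helper are just for illustration.

```python
# p50 latency in microseconds, from the Rust time-to-last-byte tables.
P50_US = {
    "murr mmap": 268,
    "Dragonfly blob": 296,
    "Redis blob": 815,
    "Redis hash": 3250,
}
# Memory in GiB, from the same tables.
MEM_GIB = {"murr mmap": 7.5, "Redis hash": 20.1}


def speedup(baseline, candidate="murr mmap"):
    """How many times lower the candidate's p50 latency is than the baseline's."""
    return P50_US[baseline] / P50_US[candidate]


print(f"{speedup('Redis blob'):.1f}x")      # 3.0x
print(f"{speedup('Redis hash'):.1f}x")      # 12.1x
print(f"{speedup('Dragonfly blob'):.2f}x")  # 1.10x
print(f"{MEM_GIB['Redis hash'] / MEM_GIB['murr mmap']:.1f}x")  # 2.7x
```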

## Roadmap

No ETAs, but at least you can see where things stand:
- [x] HTTP API
- [x] Arrow Flight gRPC API
- [x] API for data ingestion
- [x] Storage Directory interface (which is heavily inspired by [Apache Lucene](https://lucene.apache.org/))
- [x] Segment read/writes (again, inspired by [Apache Lucene](https://lucene.apache.org/))
- [x] Python embedded murrdb, so we can make a cool demo
- [x] Benchmarking harness: Redis support, Feast and feature-blob styles
- [x] Win at your own benchmark (this was surprisingly hard btw)
- [x] Support for `utf8`, `bool`, signed/unsigned `int8/16/32/64`, `float32` and `float64` datatypes
- [x] Python remote API client (sync + async)
- [x] Docker image
- [ ] Support most popular Arrow numerical types (signed/unsigned int 8/16/32/64, float 16, date-time)
- [ ] Array datatypes (e.g. Arrow `list`), so you can store embeddings
- [ ] Sparse columns
- [x] Add RocksDB and Postgres to the benchmark harness
- [ ] [Apache Iceberg](https://iceberg.apache.org/) and the very popular `parquet dump on S3` data catalog support


## Development

```bash
cargo build                  # Build the project
cargo test                   # Run all tests
cargo check                  # Fast syntax/type check
cargo clippy                 # Linting
cargo fmt                    # Format code
cargo bench --bench <name>   # Run a benchmark (multi_segment_index_bench, row_vs_col_bench)
```

## License

Apache 2.0