
# Murrdb

A columnar in-memory cache for AI inference workloads -- a faster Redis/RocksDB replacement, optimized for low-latency, zero-copy batch reads and writes.
This `README.md` is 99%[^1] human written.

[^1]: AI was used only for grammar and syntax checking.
## What is Murr?

Murr is a caching layer for ML/AI data serving that sits between your batch data pipelines and inference apps:
- Tiered storage: hot data lives in memory, cold data stays on disk with S3-based replication. It's 2026, RAM is expensive -- keep only the hot stuff there.
- Batch-in, batch-out: native batch reads and writes over columnar storage, with no per-row overhead. Dumping 1GB Parquet/Arrow files into the ingestion API is a perfectly valid use case.
```shell
# yes, this works for batch writes
curl -d @0000.parquet -H "Content-Type: application/vnd.apache.parquet" \
  -XPUT http://localhost:8080/api/v1/table/yolo/write
```
- Zero-copy wire protocol: no conversion needed when building `np.ndarray`, `pd.DataFrame` or `pt.Tensor` from API responses. Sure, Redis is fast, but parsing its replies is not (especially in Python!).

```python
# look mom, zero copy!
```
- Stateless: Murr is not a database -- all state is persisted on S3. When a Redis node gets restarted, you're cooked. Murr just self-bootstraps from object storage.
Murr shines when:
- your data is heavy and tabular: that giant Parquet dump on S3 your AI inference or ML prep job produces? Perfect fit.
- reads are batched: pulling 100 columns across 1000 documents your agent wants to analyze? Great!
- you care about costs: sure, Redis with 1TB of RAM will work fine, but disk/S3 offloading is operationally simpler and way cheaper.
Short quickstart (see full example):
```shell
uv pip install murrdb
```
and then
```python
# embedded local instance

# fetch columns for a batch of document keys

# Output:
#    score category
# 0   0.95      ml
# 1   0.72   infra
# 2   0.68     ops
```
## Why Murr?
TLDR: latency, simplicity, cost -- pick two. Murrdb tries to nail all three: fastest, cheapest, and easiest to operate. A bold claim, I know.

For the typical use case of reading N datapoints across M documents (an agent reading document attributes, an ML ranker fetching feature values), on top of being the fastest, Murrdb:
- vs Redis: is persistent (S3 is the new filesystem) and can offload cold data to local NVMe.
- vs embedded RocksDB: no need to build data sync between producer jobs and inference nodes yourself. Murrdb was designed to be distributed from the start.
- vs DynamoDB: roughly 10x cheaper, since you only pay for CPU/RAM, not per query.
Not being a general-purpose database, it tries to be friendly to the everyday pain points of ML/AI engineers:
- First-class Python support: `pip install murrdb`, then map to/from NumPy/Pandas/Polars/PyTorch arrays with zero copy.
- Sparse columns: when a column has no data, it takes up zero bytes -- unlike the packed feature blob approach, where null columns aren't actually null.
## Why NOT Murr?
Murr is not a general-purpose database:
- OLTP workloads: if you have relations, transactions, and per-row reads/writes, go with Postgres.
- Analytics: aggregating over entire tables to produce reports? Pick Clickhouse, BigQuery, or Snowflake.
- General-purpose caching: need to cache user session data for a web app? Use Redis.
- Feature store: yes, it kinda looks like one — but Murrdb doesn't govern how you compute and store your data. Murr is an online serving layer, and can be a part of both internal feature stores and open-source ones like Feast, Hopsworks, and Databricks Feature Store.
> [!WARNING]
> Murr is still in its early days and may not be stable enough for your use case yet. But it's improving quickly.
## Quickstart
```python
# define table schema

# write a batch of documents

# fetch specific columns for a few keys

# Output:
#    score category
# 0   0.95      ml
# 1   0.72   infra
# 2   0.68     ops
```
## Benchmarks
We benchmark a typical ML Ranking use case: an ML scoring model running across N=1000 documents, each with M=10 float32 feature values. Key distribution is random, on a small 10M row dataset.
- murrdb: modeled as a simple table with a `utf8` key and 10 non-nullable `float32` columns. We measure both Flight gRPC and HTTP protocols.
- Redis with feature-blob approach: all 10 per-document features packed into a 40-byte blob. Essentially a key-value lookup via `MGET`, all 1000 keys at once. Efficient, but good luck adding a new column.
- Redis with Feast-style approach: each document is a Redis hash (`HSET`) whose fields are feature names and values are feature values. Each feature can be read/written separately, but you need pipelining to get anywhere near `MGET` performance.
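For illustration, the 40-byte blob in the `MGET` setup is just ten little-endian f32 values packed back to back (the exact layout here is assumed, not taken from the benchmark code):

```python
import struct

# 10 float32 feature values for one document
features = [0.95, 0.72, 0.68, 0.11, 0.40, 0.33, 0.87, 0.05, 0.62, 0.29]

# pack into a 40-byte little-endian blob: SET doc:123 <blob>, read via MGET
blob = struct.pack("<10f", *features)

# the reader must know the schema out of band to unpack it again
unpacked = struct.unpack("<10f", blob)
```

This is exactly why adding a new column hurts: every producer and consumer has to agree on the new offset layout at the same time.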
| Approach | Latency (mean)[^2] | 95% CI | Throughput |
|---|---|---|---|
| Murr (HTTP + Arrow IPC) | 104 µs | [103–104 µs] | 9.63 Mkeys/s |
| Murr (Flight gRPC) | 105 µs | [104–105 µs] | 9.53 Mkeys/s |
| Redis MGET (feature blobs) | 263 µs | [262–264 µs] | 3.80 Mkeys/s |
| Redis Feast (HSET per row) | 3.80 ms | [3.76–3.89 ms] | 263 Kkeys/s |
Murr is ~2.5x faster than the best Redis layout (MGET with packed blobs) and ~36x faster than Feast-style hash-per-row storage.
[^2]: We measure last-byte latency and don't include protocol parsing overhead yet.
## Roadmap
No ETAs, but at least you can see where things stand:
- HTTP API
- Arrow Flight gRPC API
- API for data ingestion
- Storage Directory interface (which is heavily inspired by Apache Lucene)
- Segment read/writes (again, inspired by Apache Lucene)
- Python embedded murrdb, so we can make a cool demo
- Benchmarking harness: Redis support, Feast and feature-blob styles
- Win at your own benchmark (this was surprisingly hard btw)
- Support for `utf8` and `float32` datatypes
- Python remote API client (sync + async)
- Docker image
- Support most popular Arrow numerical types (signed/unsigned int 8/16/32/64, float 16/64, date-time)
- Array datatypes (e.g. Arrow `list`), so you can store embeddings
- Sparse columns
- Add RocksDB and Postgres to the benchmark harness
- Apache Iceberg and the very popular `parquet dump on S3` data catalog support
## Architecture

### Storage Engine
The storage subsystem is a custom columnar format heavily inspired by Apache Lucene's immutable segment model:
- Segments (`.seg` files) are the atomic unit of write -- one batch of data becomes one immutable segment. No in-place modifications, which simplifies concurrency and maps naturally to object storage.
- Directory abstraction keeps logical data organization separate from physical storage (local filesystem for now, S3 later).
- Memory-mapped reads via `memmap2` -- the OS takes care of page caching; segment data is accessed as zero-copy byte slices.
- Last-write-wins key resolution: newer segments shadow older ones for the same key, so you get incremental updates without rewriting old data.
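The last-write-wins rule can be sketched as a newest-first scan over segments (segments modeled as plain dicts here purely for illustration):

```python
def lookup(segments: list[dict], key: str):
    """Segments are ordered oldest -> newest; the newest match wins."""
    for seg in reversed(segments):
        if key in seg:
            return seg[key]
    return None  # key not present in any segment

segments = [
    {"doc1": 0.10, "doc2": 0.50},  # older segment
    {"doc1": 0.95},                # newer segment shadows doc1
]
```

The payoff: updating one key means writing one small new segment, not rewriting the old ones.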
```
[MURR magic (4B)][version u32 LE]
[column payloads, 4-byte aligned]
[footer entries: name_len|name|offset|size per column]
[footer_size u32 LE]
```
The footer-at-the-end layout follows the same pattern as Lucene's compound file format.
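As a sketch of how this layout works, here is a toy writer/reader pair. The u32 little-endian widths for `name_len`, `offset`, and `size` are assumptions for illustration -- the actual field widths aren't pinned down above:

```python
import struct

MAGIC = b"MURR"

def write_segment(columns: dict[str, bytes]) -> bytes:
    buf = bytearray(MAGIC + struct.pack("<I", 1))  # magic + version
    footer = bytearray()
    for name, payload in columns.items():
        while len(buf) % 4:            # keep payloads 4-byte aligned
            buf += b"\x00"
        offset = len(buf)
        buf += payload
        encoded = name.encode()
        footer += struct.pack("<I", len(encoded)) + encoded
        footer += struct.pack("<II", offset, len(payload))
    buf += footer
    buf += struct.pack("<I", len(footer))  # footer_size is the last 4 bytes
    return bytes(buf)

def read_segment(data: bytes) -> dict[str, bytes]:
    assert data[:4] == MAGIC
    (footer_size,) = struct.unpack("<I", data[-4:])
    pos, end = len(data) - 4 - footer_size, len(data) - 4
    columns = {}
    while pos < end:
        (name_len,) = struct.unpack_from("<I", data, pos); pos += 4
        name = data[pos:pos + name_len].decode(); pos += name_len
        offset, size = struct.unpack_from("<II", data, pos); pos += 8
        columns[name] = data[offset:offset + size]
    return columns
```

Because the footer sits at the end, a reader only needs the last 4 bytes to locate the column directory -- the same trick Lucene (and Parquet, for that matter) uses.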
Each column type has its own binary encoding tuned for scatter-gather reads. We tried using Arrow for the in-memory representation early on, and it turned out surprisingly slow compared to a hand-rolled implementation:
| Type | Status | Description |
|---|---|---|
| `float32` | Implemented | 16-byte header, 8-byte aligned f32 payload, optional null bitmap |
| `utf8` | Implemented | 20-byte header, i32 value offsets, concatenated strings, optional null bitmap |
| `int16`, `int32`, `int64`, `uint16`, `uint32`, `uint64`, `float64`, `bool` | Planned | |
Null bitmaps are u64-word bit arrays (bit set = valid). Non-nullable columns skip bitmap checks entirely.
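For illustration, a validity check against such a u64-word bitmap looks like this:

```python
def is_valid(bitmap: list[int], row: int) -> bool:
    # word index = row / 64, bit index = row % 64; bit set = value present
    return (bitmap[row >> 6] >> (row & 63)) & 1 == 1

# mark rows 0 and 65 as valid in a 2-word (128-row) bitmap
bitmap = [0, 0]
for row in (0, 65):
    bitmap[row >> 6] |= 1 << (row & 63)
```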
### HTTP API

Served by the Axum HTTP layer.
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Health check |
| GET | `/openapi.json` | OpenAPI spec |
| GET | `/api/v1/table` | List all tables with schemas |
| GET | `/api/v1/table/{name}/schema` | Get table schema |
| PUT | `/api/v1/table/{name}` | Create a table |
| POST | `/api/v1/table/{name}/fetch` | Read data (JSON or Arrow IPC response) |
| PUT | `/api/v1/table/{name}/write` | Write data (JSON, Parquet or Arrow IPC request) |
Fetch responses respect the `Accept` header (`application/json` or `application/vnd.apache.arrow.stream`). Write requests use `Content-Type` for the same formats.
### Arrow Flight API

A read-only Arrow Flight endpoint for native Arrow integration without the HTTP overhead. Source: `src/api/flight/`.
| RPC | Description |
|---|---|
| `do_get` | Fetch rows by keys and columns (JSON-encoded `FetchTicket`) |
| `get_flight_info` | Get table schema and metadata |
| `get_schema` | Get schema in Arrow IPC format |
| `list_flights` | List all available tables |
Ticket format for do_get:
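A sketch of what the JSON-encoded ticket could look like -- the `keys`/`columns` field names here are assumptions, so check the source for the real `FetchTicket` schema:

```python
import json

# hypothetical field names, for illustration only
ticket = json.dumps({"keys": ["doc1", "doc2"],
                     "columns": ["score", "category"]})

# Against a running Flight server (not executed here):
# import pyarrow.flight as flight
# client = flight.FlightClient("grpc://localhost:8815")
# reader = client.do_get(flight.Ticket(ticket.encode()))
```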
## Development
## License
Apache 2.0