ProllyTree
A probabilistic B-tree with Merkle properties — a content-addressed, Git-versioned key-value store with branching, three-way merge, cryptographic proofs, optional SQL, and an optional vector / text-search index. Written in Rust with first-class Python bindings.
A prolly tree's shape is a deterministic function of its contents, so two replicas holding the same key-value set converge to the same root hash regardless of insertion order. That property is what makes the rest — Git-style versioning, efficient diff/sync between replicas, and verifiable subtree sharing across history — fall out for free.
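The convergence property can be illustrated with a minimal sketch. This is not ProllyTree's chunker. It assumes a hypothetical boundary rule: a node ends after any item whose hash is divisible by `BOUNDARY_MOD`. It uses FNV-1a as a non-cryptographic stand-in for a real content hash, and it rebuilds the tree from scratch rather than updating it incrementally. Entries are hashed in key order and boundaries depend only on content, so two maps filled in opposite insertion orders produce the same root.

```rust
use std::collections::BTreeMap;

// FNV-1a (64-bit): tiny and deterministic, NOT cryptographic; a stand-in for SHA-256 etc.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

// Hypothetical boundary rule: about 1 in BOUNDARY_MOD items closes a node.
const BOUNDARY_MOD: u64 = 4;

// Hash one key-value pair; the length prefix keeps ("ab","c") distinct from ("a","bc").
fn leaf_hash(key: &[u8], value: &[u8]) -> u64 {
    let mut buf = (key.len() as u64).to_le_bytes().to_vec();
    buf.extend_from_slice(key);
    buf.extend_from_slice(value);
    fnv1a(&buf)
}

// A node's hash covers its children's hashes, in order.
fn hash_node(children: &[u64]) -> u64 {
    let bytes: Vec<u8> = children.iter().flat_map(|h| h.to_le_bytes()).collect();
    fnv1a(&bytes)
}

// Cut one level into nodes wherever an item's hash satisfies the boundary rule.
fn build_level(items: &[u64]) -> Vec<u64> {
    let mut parents = Vec::new();
    let mut start = 0;
    for (i, item) in items.iter().enumerate() {
        if item % BOUNDARY_MOD == 0 || i + 1 == items.len() {
            parents.push(hash_node(&items[start..=i]));
            start = i + 1;
        }
    }
    parents
}

// The root hash is a pure function of the sorted contents.
fn root_hash(map: &BTreeMap<Vec<u8>, Vec<u8>>) -> u64 {
    let mut level: Vec<u64> = map.iter().map(|(k, v)| leaf_hash(k, v)).collect();
    if level.is_empty() {
        return hash_node(&[]);
    }
    while level.len() > 1 {
        let next = build_level(&level);
        // Guarantee progress: if no items were grouped, close the whole level as one node.
        level = if next.len() < level.len() { next } else { vec![hash_node(&level)] };
    }
    level[0]
}

fn main() {
    let mut forward = BTreeMap::new();
    let mut backward = BTreeMap::new();
    for i in 0..100u32 {
        forward.insert(i.to_be_bytes().to_vec(), b"v".to_vec());
    }
    for i in (0..100u32).rev() {
        backward.insert(i.to_be_bytes().to_vec(), b"v".to_vec());
    }
    assert_eq!(root_hash(&forward), root_hash(&backward));
    println!("root: {:016x}", root_hash(&forward));
}
```

A classic B-tree gives no such guarantee, because its node splits depend on the order in which keys arrive.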
Features
| Capability | What it gives you |
|---|---|
| Versioned KV store | Git-backed branch / commit / diff / three-way merge on raw key-value state |
| Namespaced KV store | Many isolated prolly trees in one Git repo, atomic across namespaces |
| Text / vector search | Optional versioned ANN index inside any namespace; bundled MiniLM, hash, and callable embedders |
| Multi-chunk indexing | Split docs into chunks at index time, dedup on search by document |
| Cascade mode | One primary write auto-mirrors into every registered text index |
| Large-value externalization | Values above a threshold land in content-addressed blobs; gc_blobs() reclaims them |
| Cryptographic proofs | Merkle inclusion / absence proofs on every value |
| Multiple storage backends | In-memory, File, RocksDB, Git-backed |
| SQL interface | Query the tree as relational tables via GlueSQL |
| Python bindings | Full surface via PyO3 — versioning, namespaces, text search, SQL |
| git-prolly CLI | Git-style command surface over the versioning + SQL layers |
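The large-value externalization row can be pictured with a minimal sketch. It is built on stated assumptions and is not the library's implementation. `Store`, `Stored`, and `gc_unreferenced` are hypothetical names, and FNV-1a stands in for a real content hash. Garbage collection here only consults the current primary map. A versioned store would also have to keep blobs that older commits still reference.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

// FNV-1a (64-bit): NOT cryptographic; stands in for a real content address.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

// What the primary map holds per key in this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Stored {
    Inline(Vec<u8>),
    BlobRef(u64), // content address of an externalized value
}

struct Store {
    threshold: usize,
    primary: BTreeMap<String, Stored>,
    blobs: HashMap<u64, Vec<u8>>,
}

impl Store {
    fn new(threshold: usize) -> Self {
        Store { threshold, primary: BTreeMap::new(), blobs: HashMap::new() }
    }

    // Values above the threshold go to the blob map; identical values share one blob.
    fn put(&mut self, key: &str, value: Vec<u8>) {
        let stored = if value.len() > self.threshold {
            let addr = fnv1a(&value);
            self.blobs.insert(addr, value);
            Stored::BlobRef(addr)
        } else {
            Stored::Inline(value)
        };
        self.primary.insert(key.to_string(), stored);
    }

    // Reads follow the reference transparently.
    fn get(&self, key: &str) -> Option<&[u8]> {
        match self.primary.get(key)? {
            Stored::Inline(v) => Some(v.as_slice()),
            Stored::BlobRef(addr) => self.blobs.get(addr).map(|v| v.as_slice()),
        }
    }

    // Mark-and-sweep: keep blobs still referenced by the primary map, drop the rest.
    fn gc_unreferenced(&mut self) -> usize {
        let live: HashSet<u64> = self
            .primary
            .values()
            .filter_map(|s| match s {
                Stored::BlobRef(addr) => Some(*addr),
                Stored::Inline(_) => None,
            })
            .collect();
        let before = self.blobs.len();
        self.blobs.retain(|addr, _| live.contains(addr));
        before - self.blobs.len()
    }
}

fn main() {
    let mut store = Store::new(1024);
    store.put("note", b"short value".to_vec());
    store.put("image", vec![0u8; 4096]);
    println!("blobs before overwrite: {}", store.blobs.len());
    store.put("image", b"thumbnail only".to_vec());
    println!("reclaimed {} blob(s)", store.gc_unreferenced());
}
```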
Quick start
Rust
```toml
[dependencies]
prollytree = { version = "0.4.0", features = ["git", "sql"] }
# Add `proximity` for the text-search surface, `proximity_text` for bundled MiniLM.
```
Python
Examples
Verifiable key-value store
The raw ProllyTree ships a Merkle inclusion proof for every key — useful when data crosses trust boundaries.
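```rust
use prollytree::tree::{ProllyTree, Tree};
use prollytree::storage::InMemoryNodeStorage;

let mut tree = ProllyTree::new(InMemoryNodeStorage::<32>::new(), Default::default());
tree.insert(b"user:alice".to_vec(), b"Alice Johnson".to_vec());

// Generate an inclusion proof, then check it against the expected value
let proof = tree.generate_proof(b"user:alice");
assert!(tree.verify(proof, b"user:alice", Some(b"Alice Johnson")));
```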
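For intuition about what an inclusion proof check does, here is a generic binary Merkle proof. It is an assumption-laden sketch, not ProllyTree's proof format: real prolly nodes have variable fanout, and `leaf`, `build_levels`, `prove`, and `verify` are hypothetical names. The proof is the list of sibling hashes on the path from a leaf to the root. The verifier recomputes the root from the claimed key and value and compares it with the trusted root.

```rust
// FNV-1a (64-bit): NOT cryptographic; a real proof needs a collision-resistant hash.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

// Leaf hash over a length-prefixed key followed by the value.
fn leaf(key: &[u8], value: &[u8]) -> u64 {
    let mut buf = (key.len() as u64).to_le_bytes().to_vec();
    buf.extend_from_slice(key);
    buf.extend_from_slice(value);
    fnv1a(&buf)
}

fn parent(left: u64, right: u64) -> u64 {
    let mut buf = left.to_le_bytes().to_vec();
    buf.extend_from_slice(&right.to_le_bytes());
    fnv1a(&buf)
}

// Every level of the tree, leaves first; an unpaired last node is hashed with itself.
fn build_levels(leaves: Vec<u64>) -> Vec<Vec<u64>> {
    assert!(!leaves.is_empty(), "need at least one leaf");
    let mut levels = vec![leaves];
    while levels.last().unwrap().len() > 1 {
        let next: Vec<u64> = levels
            .last()
            .unwrap()
            .chunks(2)
            .map(|pair| parent(pair[0], *pair.get(1).unwrap_or(&pair[0])))
            .collect();
        levels.push(next);
    }
    levels
}

// Proof = (sibling hash, sibling_is_left) for each level below the root.
fn prove(levels: &[Vec<u64>], mut index: usize) -> Vec<(u64, bool)> {
    let mut path = Vec::new();
    for level in &levels[..levels.len() - 1] {
        let sibling_idx = index ^ 1;
        let sibling = *level.get(sibling_idx).unwrap_or(&level[index]);
        path.push((sibling, sibling_idx < index));
        index /= 2;
    }
    path
}

// Recompute the root from the claimed key/value and the sibling path.
fn verify(root: u64, key: &[u8], value: &[u8], path: &[(u64, bool)]) -> bool {
    let mut h = leaf(key, value);
    for &(sibling, sibling_is_left) in path {
        h = if sibling_is_left { parent(sibling, h) } else { parent(h, sibling) };
    }
    h == root
}

fn main() {
    let entries = [("user:alice", "Alice"), ("user:bob", "Bob"), ("user:carol", "Carol")];
    let leaves = entries.iter().map(|(k, v)| leaf(k.as_bytes(), v.as_bytes())).collect();
    let levels = build_levels(leaves);
    let root = levels.last().unwrap()[0];
    let proof = prove(&levels, 2);
    println!("genuine value verifies: {}", verify(root, b"user:carol", b"Carol", &proof));
    println!("tampered value verifies: {}", verify(root, b"user:carol", b"Mallory", &proof));
}
```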
Git-backed versioning
The git feature stores tree nodes as Git objects, so commits, branches, and merges work natively on key-value state.
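```rust
use prollytree::git::versioned_store::StoreFactory;

let mut store = StoreFactory::git::<32, _>("./data")?;
store.insert(b"config/theme".to_vec(), b"light".to_vec())?;
store.commit("Initial config")?;
store.create_branch("experiment")?;
store.insert(b"config/theme".to_vec(), b"dark".to_vec())?;
store.commit("Try dark theme")?;
// → diff, merge, history available; see the user guide
```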
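Merging two branches of key-value state comes down, per key, to a three-way decision against the common ancestor. The sketch below applies that rule to plain maps. `three_way_merge`, `Conflict`, and `kv` are hypothetical names. This is not ProllyTree's merge implementation, which operates on trees and supports merge resolvers (see the examples). If only one side changed a key, that side wins. If both sides made the same change, it is kept. If both changed it differently, a conflict is reported.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Kv = BTreeMap<String, String>;

#[derive(Debug, PartialEq)]
struct Conflict {
    key: String,
    ours: Option<String>,   // None means "deleted on our side"
    theirs: Option<String>, // None means "deleted on their side"
}

fn three_way_merge(base: &Kv, ours: &Kv, theirs: &Kv) -> (Kv, Vec<Conflict>) {
    let keys: BTreeSet<&String> = base.keys().chain(ours.keys()).chain(theirs.keys()).collect();
    let mut merged = Kv::new();
    let mut conflicts = Vec::new();
    for key in keys {
        let (b, o, t) = (base.get(key), ours.get(key), theirs.get(key));
        let result = if o == t {
            o // both sides agree (including both deleting)
        } else if o == b {
            t // only theirs changed
        } else if t == b {
            o // only ours changed
        } else {
            conflicts.push(Conflict { key: key.clone(), ours: o.cloned(), theirs: t.cloned() });
            o // keep ours provisionally; a resolver would decide
        };
        if let Some(v) = result {
            merged.insert(key.clone(), v.clone());
        }
    }
    (merged, conflicts)
}

// Helper for building maps from string pairs.
fn kv(pairs: &[(&str, &str)]) -> Kv {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

fn main() {
    let base = kv(&[("theme", "light"), ("lang", "en")]);
    let ours = kv(&[("theme", "dark"), ("lang", "en")]);
    let theirs = kv(&[("theme", "light"), ("lang", "fr"), ("tz", "UTC")]);
    let (merged, conflicts) = three_way_merge(&base, &ours, &theirs);
    println!("merged: {:?}, conflicts: {:?}", merged, conflicts);
}
```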
Multiple namespaces in one store
NamespacedKvStore holds many independent prolly trees in one Git repo. Each namespace owns its own key space and (optionally) its own search indexes; one commit covers them all.
=
# b"dark" — main is unchanged
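One way to picture "one commit covers them all" is a commit that records every namespace's root hash. The sketch below assumes that model and does not describe how NamespacedKvStore encodes commits in Git. `Commit`, `commit`, and `namespace_root` are hypothetical. Each namespace is collapsed to a single hash of its sorted contents, and FNV-1a stands in for a real hash. A write to one namespace changes that namespace's root and the commit id. Untouched namespaces keep identical roots, so they can share storage across commits.

```rust
use std::collections::BTreeMap;

// FNV-1a (64-bit): NOT cryptographic; stand-in for a real hash.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

type Namespace = BTreeMap<String, String>;

// Hypothetical: one hash over a namespace's sorted, length-prefixed entries.
fn namespace_root(ns: &Namespace) -> u64 {
    let mut buf = Vec::new();
    for (k, v) in ns {
        for part in [k, v] {
            buf.extend_from_slice(&(part.len() as u64).to_le_bytes());
            buf.extend_from_slice(part.as_bytes());
        }
    }
    fnv1a(&buf)
}

// Hypothetical commit record: the parent id plus the root of every namespace.
#[derive(Debug, Clone)]
struct Commit {
    parent: Option<u64>,
    roots: BTreeMap<String, u64>,
}

impl Commit {
    fn id(&self) -> u64 {
        let mut buf = Vec::new();
        // Tag the parent so "no parent" cannot collide with a real id.
        match self.parent {
            Some(p) => {
                buf.push(1);
                buf.extend_from_slice(&p.to_le_bytes());
            }
            None => buf.push(0),
        }
        for (name, root) in &self.roots {
            buf.extend_from_slice(&(name.len() as u64).to_le_bytes());
            buf.extend_from_slice(name.as_bytes());
            buf.extend_from_slice(&root.to_le_bytes());
        }
        fnv1a(&buf)
    }
}

// A single commit snapshots every namespace at once.
fn commit(parent: Option<&Commit>, spaces: &BTreeMap<String, Namespace>) -> Commit {
    Commit {
        parent: parent.map(Commit::id),
        roots: spaces.iter().map(|(name, ns)| (name.clone(), namespace_root(ns))).collect(),
    }
}

fn main() {
    let mut spaces: BTreeMap<String, Namespace> = BTreeMap::new();
    spaces.entry("agent_a".into()).or_default().insert("theme".into(), "light".into());
    spaces.entry("agent_b".into()).or_default().insert("goal".into(), "summarize".into());
    let c1 = commit(None, &spaces);
    spaces.get_mut("agent_a").unwrap().insert("theme".into(), "dark".into());
    let c2 = commit(Some(&c1), &spaces);
    println!("c1 = {:016x}, c2 = {:016x}", c1.id(), c2.id());
}
```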
Optional text / vector search
A namespace can host one or more text indexes that ride on the same storage as the primary tree. Every search hit is just an id; resolve back to the original bytes via the primary tree.
=
# primary writes auto-index
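A toy store can illustrate three ideas from this section: id-only hits, the cascade from primary writes into the index, and per-document dedup for multi-chunk indexing. This is a hedged sketch. `CascadeStore` is hypothetical, and a whitespace-token posting list stands in for the real embedding-based ANN index, so it only finds lexical overlap. Each document is split into fixed-size word chunks, and each chunk is scored separately. Search keeps the best chunk per document and returns ids, which resolve through the primary map.

```rust
use std::collections::{BTreeMap, HashMap};

// Toy cascade store: a primary map plus one lexical "text index" kept in sync on every write.
struct CascadeStore {
    primary: BTreeMap<String, String>,
    postings: HashMap<String, Vec<(String, usize)>>, // token -> (doc id, chunk index)
    chunk_words: usize,
}

impl CascadeStore {
    fn new(chunk_words: usize) -> Self {
        assert!(chunk_words > 0, "chunks need at least one word");
        CascadeStore { primary: BTreeMap::new(), postings: HashMap::new(), chunk_words }
    }

    fn tokens(text: &str) -> Vec<String> {
        text.split_whitespace().map(str::to_lowercase).collect()
    }

    // One primary write; the index update cascades from it.
    fn put(&mut self, id: &str, text: &str) {
        // Drop stale postings for this id before re-indexing.
        for list in self.postings.values_mut() {
            list.retain(|(doc, _)| doc != id);
        }
        let words = Self::tokens(text);
        for (chunk_idx, chunk) in words.chunks(self.chunk_words).enumerate() {
            for tok in chunk {
                self.postings.entry(tok.clone()).or_default().push((id.to_string(), chunk_idx));
            }
        }
        self.primary.insert(id.to_string(), text.to_string());
    }

    // Score chunks by token hits, keep the best chunk per document, return ids only.
    fn search(&self, query: &str, k: usize) -> Vec<String> {
        let mut chunk_scores: HashMap<(String, usize), usize> = HashMap::new();
        for tok in Self::tokens(query) {
            if let Some(list) = self.postings.get(&tok) {
                for hit in list {
                    *chunk_scores.entry(hit.clone()).or_insert(0) += 1;
                }
            }
        }
        let mut best_per_doc: HashMap<String, usize> = HashMap::new();
        for ((doc, _chunk), score) in chunk_scores {
            let best = best_per_doc.entry(doc).or_insert(0);
            *best = (*best).max(score);
        }
        let mut ranked: Vec<(String, usize)> = best_per_doc.into_iter().collect();
        // Highest score first; ties broken by id for deterministic output.
        ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
        ranked.into_iter().take(k).map(|(doc, _)| doc).collect()
    }

    // Hits are just ids; the original text lives in the primary map.
    fn resolve(&self, id: &str) -> Option<&str> {
        self.primary.get(id).map(|s| s.as_str())
    }
}

fn main() {
    let mut store = CascadeStore::new(4);
    store.put("doc1", "prolly trees give git style versioning for key value data");
    store.put("doc3", "merkle proofs make key value data verifiable");
    for id in store.search("key value data", 5) {
        println!("{}: {:?}", id, store.resolve(&id));
    }
}
```

With four-word chunks, "doc1" matches in two chunks but is returned once, which is the dedup-by-document behavior the feature table describes.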
See examples/ (Rust) and python/examples/ for the full set: namespaces, text search, cascade, merge resolvers, SQL, blob GC.
Good fits
The combination of content-addressed Merkle structure + Git-style versioning + optional semantic search makes ProllyTree a natural fit for a few non-trivial use cases:
- Auditable application state. Anywhere you'd otherwise reach for "an event log + a current-state snapshot" — config systems, feature-flag rollout state, policy rules — gets a real Git history with diff, blame, rollback, and proofs for free.
- Distributed / multi-replica data. Two peers that hold the same key-value pairs converge to the same root hash, and subtree sharing makes diff and sync O(changes), not O(corpus).
- AI agent memory. Per-agent namespaces give isolated key spaces in one store; commits make every memory mutation auditable; branches isolate speculative reasoning; the optional text index gives semantic recall without a separate vector database. The text-search guide walks through this pattern in detail.
- Versioned analytical datasets. SQL over a Git-tracked KV store: `git checkout` a historical commit and run the same query against the data as it existed then. See the SQL guide.
- Content-addressed indexes. Anywhere a Merkle tree already makes sense (verifiable logs, proof systems, gossip-friendly indexes), ProllyTree adds the data-structure ergonomics of a B-tree on top.
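The O(changes) claim rests on skipping any subtree whose hash is equal on both sides. The sketch below shows only that skipping rule, under a simplification. Both trees are plain balanced binary trees over the same number of leaves. Real prolly trees use content-defined boundaries, so inserts and deletes also leave most nodes untouched. `build` and `diff` are hypothetical names.

```rust
// FNV-1a (64-bit): NOT cryptographic; stand-in for a real hash.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

enum Node {
    Leaf { index: usize, hash: u64 },
    Inner { hash: u64, left: Box<Node>, right: Box<Node> },
}

impl Node {
    fn hash(&self) -> u64 {
        match self {
            Node::Leaf { hash, .. } | Node::Inner { hash, .. } => *hash,
        }
    }
}

// Balanced hashed tree over a non-empty slice of leaf hashes.
fn build(leaves: &[u64], offset: usize) -> Node {
    if leaves.len() == 1 {
        return Node::Leaf { index: offset, hash: leaves[0] };
    }
    let mid = leaves.len() / 2;
    let left = build(&leaves[..mid], offset);
    let right = build(&leaves[mid..], offset + mid);
    let mut buf = left.hash().to_le_bytes().to_vec();
    buf.extend_from_slice(&right.hash().to_le_bytes());
    Node::Inner { hash: fnv1a(&buf), left: Box::new(left), right: Box::new(right) }
}

// Descend only where hashes differ; equal subtrees are skipped wholesale.
fn diff(a: &Node, b: &Node, changed: &mut Vec<usize>, visited: &mut usize) {
    *visited += 1;
    if a.hash() == b.hash() {
        return;
    }
    match (a, b) {
        (Node::Leaf { index, .. }, Node::Leaf { .. }) => changed.push(*index),
        (Node::Inner { left: al, right: ar, .. }, Node::Inner { left: bl, right: br, .. }) => {
            diff(al, bl, changed, visited);
            diff(ar, br, changed, visited);
        }
        _ => unreachable!("both trees are built over the same number of leaves"),
    }
}

fn main() {
    let old: Vec<u64> = (0..1024u64).map(|i| fnv1a(&i.to_le_bytes())).collect();
    let mut new = old.clone();
    new[700] = fnv1a(b"changed value");
    let (a, b) = (build(&old, 0), build(&new, 0));
    let mut changed = Vec::new();
    let mut visited = 0;
    diff(&a, &b, &mut changed, &mut visited);
    println!("changed leaves: {:?}, nodes visited: {}", changed, visited);
}
```

With 1,024 leaves and one modified value, `diff` visits 21 nodes (the root plus two per level below it) out of 2,047.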
Embedders (when you use the text-search feature)
| Embedder | Pulls in | Use it for |
|---|---|---|
| HashEmbedder | nothing extra | Tests, demos, exact-match recall |
| MiniLmEmbedder | Candle (pure Rust) + ~90 MB weights | Real semantic search, offline-friendly |
| CallableEmbedder | your callable | OpenAI, Cohere, sentence-transformers, your own model |
Embedder identity (id + version) is persisted with the index. Reopening with a mismatched embedder surfaces a clear error — no silent mixing of vectors from different models.
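A toy feature-hashing embedder and an identity comparison make the check concrete. Both are hypothetical sketches. `ToyHashEmbedder` is not the library's HashEmbedder, whose hashing scheme may differ, and `EmbedderIdentity` adds a dimension field as an assumption. Tokens are lowercased, hashed into buckets, and the vector is L2-normalized. Texts with the same tokens therefore get identical vectors. This suits tests and exact-match recall but not semantic search.

```rust
// FNV-1a (64-bit): NOT cryptographic; used only to pick a bucket.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

// Identity persisted alongside an index (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct EmbedderIdentity {
    id: String,
    version: u32,
    dim: usize,
}

// Toy feature-hashing embedder: deterministic, no model weights.
struct ToyHashEmbedder {
    dim: usize,
}

impl ToyHashEmbedder {
    const ID: &'static str = "toy-hash";
    const VERSION: u32 = 1;

    fn embed(&self, text: &str) -> Vec<f32> {
        let mut v = vec![0.0f32; self.dim];
        for tok in text.split_whitespace() {
            let bucket = (fnv1a(tok.to_lowercase().as_bytes()) % self.dim as u64) as usize;
            v[bucket] += 1.0;
        }
        // L2-normalize so a dot product equals cosine similarity.
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in &mut v {
                *x /= norm;
            }
        }
        v
    }

    fn identity(&self) -> EmbedderIdentity {
        EmbedderIdentity { id: Self::ID.to_string(), version: Self::VERSION, dim: self.dim }
    }
}

// Refuse to mix vectors from different embedders when an index is reopened.
fn check_identity(stored: &EmbedderIdentity, current: &EmbedderIdentity) -> Result<(), String> {
    if stored == current {
        Ok(())
    } else {
        Err(format!("index built with {:?} but opened with {:?}", stored, current))
    }
}

// Inputs are unit-normalized, so the dot product is the cosine.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let embedder = ToyHashEmbedder { dim: 64 };
    let stored = embedder.identity();
    let a = embedder.embed("Dark theme enabled");
    let b = embedder.embed("dark THEME enabled");
    println!("cosine: {}", cosine(&a, &b));
    let other = EmbedderIdentity { id: "some-other-model".into(), version: 2, dim: 128 };
    println!("{:?}", check_identity(&stored, &other));
}
```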
Feature flags
| Feature | Description | Default |
|---|---|---|
| git | Git-backed versioned storage with branching, merging, history | Yes |
| sql | SQL query interface via GlueSQL | Yes |
| proximity | Vector index + text-search infrastructure (ML-free) | No |
| proximity_text | Bundled Candle + all-MiniLM-L6-v2 embedder | No |
| rocksdb_storage | RocksDB persistent storage backend | No |
| python | Python bindings via PyO3 | No |
| tracing | Observability via the tracing crate | No |
The Python wheels on PyPI ship with git, sql, rocksdb_storage, proximity, and proximity_text enabled. Rust users opt in:
```toml
[dependencies.prollytree]
version = "0.4.0"
features = ["git", "sql", "proximity", "proximity_text"]
```
Documentation
- User Guide — mkdocs site (architecture, CLI, Python API, examples, theory)
- Text Search Guide — design, embedder identity, cascade, merge, externalization
- Browser Demo — interactive single-page deck of the text-search workflow
- Rust API Reference — auto-generated from source
- Python Quickstart — Python-specific intro
- Runnable Examples — verifiable KV, versioning, namespaces, text search, SQL, multi-agent worktrees
CLI
See the user guide for the full CLI walkthrough.
Contributing
Contributions welcome — see CONTRIBUTING.md.
License
Licensed under the Apache License 2.0. See LICENSE.