prollytree 0.3.4

A prolly (probabilistic) tree for efficient storage, retrieval, and modification of ordered data.
Documentation
# FAQ

## General

### What is a prolly tree, in one sentence?

A B-tree whose node boundaries are chosen by a content-defined hash predicate, which makes the tree shape a function of the key set alone — so the root hash becomes a stable fingerprint. See [Prolly Trees](theory/prolly_tree.md) for the full story.

### How is this different from a Merkle tree?

A plain Merkle tree is content-addressed, but two Merkle trees with the same data can have different shapes (and root hashes). A prolly tree is **history-independent**: identical KV sets yield identical trees, byte for byte.

### How is this different from a B-tree?

A classical B-tree rebalances on *count*, so shape depends on insertion order. A prolly tree rebalances on a content-defined predicate — same content means same shape regardless of order. See [Probabilistic Balancing](theory/rolling_hash.md).

### How is this different from Git?

Git versions files; it merges with text-level diffs. `git-prolly` versions structured key-value data and merges at the KV level using a three-way algorithm over Merkle subtrees. You can use Git on top of ProllyTree (the default `git` backend is a real Git repo) for remotes, tags, etc.

### When should I *not* use ProllyTree?

- When a single SQLite file would do.
- When you need a full relational database (joins across millions of rows, transactions spanning many tables with isolation guarantees). GlueSQL is embedded but not a replacement for Postgres.
- When you need strong multi-writer concurrency on a single branch. ProllyTree concurrency is per-branch; concurrent writers to the *same* branch should serialise.

## Usage

### How big can a store get before performance degrades?

RocksDB-backed stores have been tested to the low millions of keys without tuning. File-backed stores should be kept to tens of thousands of keys — one file per node doesn't scale. See [Storage Backends](storage.md) for guidance.

### Are commits cheap?

Yes. A commit records the current root hash plus the usual Git commit metadata. Tree nodes that didn't change aren't re-serialised — the new root hash just references the same child hashes as the previous commit.

### Do I need to use Git?

Not necessarily. The tree works on top of `InMemoryNodeStorage`, `FileNodeStorage`, or `RocksDBNodeStorage` without Git. But if you want commits, branches, and merges, the `VersionedKvStore` needs the `git` feature.

### Can I use a custom hash function?

The tree is parameterised by digest length (commonly `32` for SHA-256) but the hash function itself is currently fixed. If you have a specific need, open an issue.

### Can two processes write to the same Git-backed store?

Use `StoreFactory::git_threadsafe` (or the worktree manager) for multi-threaded access. For multi-*process* writers on the same branch, serialise via an external lock — Git's own locking is per-repository and not sufficient for arbitrary concurrent writers.

## Merging

### What happens when two branches change the same key?

The merge engine detects the conflict and delegates to a **conflict resolver**. Built-in options: `IgnoreAll` (keep destination), `TakeSource`, `TakeDestination`. You can also plug in a custom resolver. See [Theory → Versioning & Merge](theory/versioning.md).

### Can I probe for conflicts without merging?

Yes:

```python
ok, conflicts = store.try_merge("feature")
```

`try_merge` walks the diff and reports conflicts without mutating state.

### Why did my merge succeed silently when I expected a conflict?

You're almost certainly using `ConflictResolution.IgnoreAll` (the default in some flows). That's not technically a silent merge — it's the documented "keep destination" behaviour — but it can be surprising. Use `try_merge` first if you want to know before committing.

## Python bindings

### Does `pip install prollytree` include SQL and Git support?

Yes. The published wheel is built with the `python` + `sql` features and the `git` feature is on by default.

### Where are the Python docs?

Here, in the [Python API](api/python.md) section, and with examples in [Examples → Python bindings](examples/python.md). The old Sphinx-based docs on Read the Docs may still be reachable but are being replaced by this site.

### Can I use ProllyTree as a LangGraph / LangMem backend?

Yes — ProllyTree is designed with AI agent memory in mind. See the [LangMem example](examples/python.md#langmem-integration-for-ai-agent-memory) and the [Memoir project](https://github.com/zhangfengcdt/memoir), which builds a full semantic memory system on top of ProllyTree.

## SQL

### Which SQL engine does ProllyTree use?

[GlueSQL](https://github.com/gluesql/gluesql). It's embedded (no external process) and speaks a useful subset of SQL. See [SQL Interface](sql.md) for the supported features and known limitations.

### Can I run SQL on a historical commit?

Yes, read-only:

```bash
git-prolly sql -b v1.0 "SELECT COUNT(*) FROM users"
```

Write queries are rejected when `-b` is set to keep historical commits immutable.

### Why `SELECT * FROM users` returns `I64(1)` instead of `1`?

GlueSQL's default table output includes the wire type. Use `-o json` or `-o csv` for a cleaner shape.

## Storage backends

### Which backend should I use in production?

Most likely **RocksDB**. It's the only backend tuned for write-heavy, large-dataset workloads. `File` is fine for small datasets and debugging; `InMemory` is for tests. The **Git backend** is production-capable *as long as you commit* — experimental raw Git object storage (without commits) is explicitly unsafe because of `git gc`. See [Storage Backends](storage.md).

## Contributing / support

### Where do I report bugs?

[github.com/zhangfengcdt/prollytree/issues](https://github.com/zhangfengcdt/prollytree/issues).

### Where do I see the Rust API?

[docs.rs/prollytree](https://docs.rs/prollytree) — auto-generated from the source. The [Rust API](api/rust.md) page here is a pointer.

### Is there a Discord / Slack?

Not currently. Open an issue or a discussion on GitHub.