SparrowDB is an embedded graph database. It links directly into your process — Rust, Python, Node.js, or Ruby — and gives you a real Cypher query interface backed by a WAL-durable store on disk. No server. No JVM. No cloud subscription. No daemon to babysit.
If your data is fundamentally connected (recommendations, social graphs, dependency trees, fraud rings, knowledge graphs) and you want to query it with multi-hop traversals instead of JOIN chains, SparrowDB is built for that.
Quick Start
use sparrowdb::GraphDb;
That's it. The database is a directory on disk. Ship it.
Why SparrowDB
The graph database landscape has a gap.
Neo4j is powerful, but it requires a running server, a JVM, and a license the moment you need production features. DGraph is horizontally scalable, but you don't need horizontal scale — you need to ship your app. Every existing option assumes you want to operate a database cluster, not embed a graph engine.
SparrowDB fills the same role SQLite fills for relational data: zero infrastructure, full capability, open source, MIT licensed.
| Question | Answer |
|---|---|
| Does it need a server? | No. It's a library. |
| Does it need a cloud account? | No. It's a directory on local disk. |
| Can it survive kill -9? | Yes. WAL + crash recovery. |
| Can multiple threads read at once? | Yes. SWMR — readers never block writers. |
| Does the Python binding release the GIL? | Yes. Every call into the engine releases it. |
| Can I use it from an AI assistant? | Yes. Built-in MCP server. |
When to Use SparrowDB
SparrowDB is the right choice when:
- Your data has structure that's hard to flatten. Social follows, product recommendations, dependency graphs, org charts, bill-of-materials, knowledge graphs — these are terrible in SQL and natural in graphs.
- You're building an application, not operating a database. You want to cargo add sparrowdb and ship, not provision instances.
- You need multi-hop queries. MATCH (a)-[:FOLLOWS*1..3]->(b) is one query. In SQL it's recursive CTEs all the way down.
- You're embedding into a CLI, desktop app, agent, or edge service. SparrowDB opens in milliseconds and has no runtime overhead when idle.
SparrowDB is not the right choice when you need distributed writes across many nodes, or when your graph has billions of edges and you need horizontal sharding. Use Neo4j Aura or DGraph for that.
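To make the multi-hop comparison concrete, here is a minimal sketch of what a FOLLOWS*1..3 traversal looks like as a recursive CTE, run through Python's standard sqlite3 module. The follows table, its column names, and the sample users are hypothetical, and nothing in it calls SparrowDB.

```python
import sqlite3

# Hypothetical edge table standing in for (a)-[:FOLLOWS]->(b).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("dave", "erin")],
)

# Rough SQL counterpart of:
#   MATCH (a {name: 'alice'})-[:FOLLOWS*1..3]->(b) RETURN DISTINCT b
REACHABLE_SQL = """
WITH RECURSIVE reach(node, depth) AS (
    SELECT dst, 1 FROM follows WHERE src = :start
    UNION
    SELECT f.dst, r.depth + 1
    FROM follows AS f JOIN reach AS r ON f.src = r.node
    WHERE r.depth < 3
)
SELECT DISTINCT node FROM reach ORDER BY node
"""

reachable = [row[0] for row in conn.execute(REACHABLE_SQL, {"start": "alice"})]
print(reachable)  # ['bob', 'carol', 'dave'] (erin is four hops away)
```

The depth column is what bounds the recursion. In the Cypher pattern that bound is the 1..3 inside the relationship.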
Features
Cypher Support
| Feature | Status |
|---|---|
| CREATE, MATCH, SET, DELETE | ✅ |
| WHERE with =, <>, <, <=, >, >= | ✅ |
| WHERE n.prop CONTAINS str / STARTS WITH str | ✅ |
| WHERE n.prop IS NULL / IS NOT NULL | ✅ |
| 1-hop and multi-hop edges (a)-[:R]->()-[:R]->(c) | ✅ |
| Undirected edges (a)-[:R]-(b) | ✅ |
| Variable-length paths [:R*1..N] | ✅ |
| RETURN DISTINCT, ORDER BY, LIMIT, SKIP | ✅ |
| COUNT(*), COUNT(expr), COUNT(DISTINCT expr) | ✅ |
| SUM, AVG, MIN, MAX | ✅ |
| collect(): aggregate into list | ✅ |
| coalesce(expr1, expr2, …): first non-null | ✅ |
| WITH … WHERE pipeline (filter mid-query) | ✅ |
| WITH … MATCH pipeline (chain traversals) | ✅ |
| WITH … UNWIND pipeline | ✅ |
| UNWIND list AS var MATCH (n {id: var}) | ✅ |
| OPTIONAL MATCH | ✅ |
| UNION / UNION ALL | ✅ |
| MERGE: upsert node with ON CREATE SET / ON MATCH SET | ✅ |
| MATCH (a),(b) MERGE (a)-[:R]->(b): idempotent edge | ✅ |
| CREATE (a)-[:REL]->(b): directed edge | ✅ |
| CASE WHEN … THEN … ELSE … END | ✅ |
| EXISTS { (n)-[:REL]->(:Label) } | ✅ |
| EXISTS in WITH … WHERE | ✅ |
| shortestPath((a)-[:R*]->(b)) | ✅ |
| ANY / ALL / NONE / SINGLE list predicates | ✅ |
| id(n), labels(n), type(r) | ✅ |
| size(), range(), toInteger(), toString() | ✅ |
| toUpper(), toLower(), trim(), replace(), substring() | ✅ |
| abs(), ceil(), floor(), sqrt(), sign() | ✅ |
| Parameters $param | ✅ |
| CALL db.index.fulltext.queryNodes: scored full-text search | ✅ |
| CALL db.schema() | ✅ |
| Multi-label nodes (n:A:B) | ⚠️ Planned |
| Subqueries CALL { … } | ⚠️ Partial |
Engine & Storage
- WAL durability: write-ahead log with crash recovery; survives hard kills
- SWMR concurrency: single-writer, multiple-reader; readers never block writers
- Factorized execution: multi-hop traversals avoid materializing O(N²) intermediate rows
- B-tree property index: equality lookups in O(log n), not full label scans
- Inverted text index: CONTAINS / STARTS WITH routed through an index
- Full-text search: relevance-scored queryNodes without Elasticsearch
- External merge sort: ORDER BY on large results spills to disk; no unbounded heap
- At-rest encryption: optional XChaCha20-Poly1305 per WAL entry; wrong key errors immediately, never silently decrypts garbage
- execute_batch(): multiple writes in one fsync for bulk-load throughput
- execute_with_timeout(): cancel runaway traversals without killing the process
- export_dot(): export any graph to Graphviz DOT for visualization
- APOC CSV import: migrate existing Neo4j graphs in one command
- MVCC write-write conflict detection: two writers on the same node: the second is aborted
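The durability and batching bullets above share one idea: append records to a log, fsync, and on restart replay only the records that arrived intact. The sketch below illustrates that idea in plain Python. The record layout (length, CRC32, payload) and the function names are invented for illustration; this is not SparrowDB's on-disk format or API.

```python
import os
import struct
import tempfile
import zlib

def wal_append(path, payloads):
    """Append records as [length][crc32][payload] and fsync once for the batch."""
    with open(path, "ab") as f:
        for payload in payloads:
            f.write(struct.pack("<II", len(payload), zlib.crc32(payload)))
            f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # one fsync covers every record in the batch

def wal_replay(path):
    """Return every intact record; stop at the first torn or corrupt one."""
    with open(path, "rb") as f:
        data = f.read()
    records, offset = [], 0
    while offset + 8 <= len(data):
        length, crc = struct.unpack_from("<II", data, offset)
        payload = data[offset + 8 : offset + 8 + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # incomplete tail left by a crash mid-write
        records.append(payload)
        offset += 8 + length
    return records

wal_path = os.path.join(tempfile.mkdtemp(), "demo.wal")
wal_append(wal_path, [b"CREATE (n:Person {name: 'Ada'})", b"CREATE (n:Person {name: 'Lin'})"])

# Simulate kill -9 halfway through a later write: a header plus a truncated payload.
with open(wal_path, "ab") as f:
    f.write(struct.pack("<II", 100, 0) + b"partial")

recovered = wal_replay(wal_path)
print(len(recovered))  # 2: the torn third record is ignored
```

Batching pays off because fsync dominates the cost of a small write. Grouping N records under one fsync amortizes it, at the price of the whole batch becoming durable together.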
Language Bindings
| Language | Mechanism | Status |
|---|---|---|
| Rust | Native GraphDb API | ✅ Stable |
| Python | PyO3 (releases the GIL, context manager) | ✅ Stable |
| Node.js | napi-rs (SparrowDB class) | ✅ Stable |
| Ruby | Magnus extension | ✅ Stable |
All bindings open the same on-disk format. A graph written from Python can be read by Node.js.
Install
Rust
[dependencies]
sparrowdb = "0.1"
Python
# Once published to PyPI:
# Build from source:
&&
Node.js
Ruby
&& &&
CLI
MCP Server (Claude Desktop integration)
Language Examples
Rust
use sparrowdb::GraphDb;
use std::time::Duration;
Python
# Context manager — database closes cleanly on exit; execute() releases the GIL
# Traverse: what's related to Widget?
=
# [{'r.name': 'Doohickey', 'r.price': 4.99}, {'r.name': 'Gadget', 'r.price': 24.99}]
# UNWIND + MATCH: bulk lookup by ID
=
# [{'n.name': 'Widget', 'n.price': 9.99}, {'n.name': 'Doohickey', 'n.price': 4.99}]
# Thread-safe: GIL is released inside execute(), checkpoint(), and optimize()
return
=
Node.js / TypeScript
import SparrowDB from 'sparrowdb';
const db = new SparrowDB('/path/to/my.db');
db.execute("CREATE (n:Article {id: 'a1', title: 'Graph Databases 101', tags: 'graphs,rust'})");
db.execute("CREATE (n:Article {id: 'a2', title: 'Cypher Query Language', tags: 'cypher,graphs'})");
db.execute("CREATE (n:Article {id: 'a3', title: 'Embedded Rust', tags: 'rust,embedded'})");
db.execute("MATCH (a:Article {id:'a1'}),(b:Article {id:'a2'}) CREATE (a)-[:RELATED]->(b)");
// Find related articles within 2 hops
const related = db.execute(
"MATCH (a:Article {id:'a1'})-[:RELATED*1..2]->(r) RETURN DISTINCT r.title"
);
console.log(related); // [['Cypher Query Language']]
// Full-text search (after indexing)
db.execute("CALL db.index.fulltext.createNodeIndex('articles', ['Article'], ['title', 'tags'])");
const results = db.execute(
"CALL db.index.fulltext.queryNodes('articles', 'rust') " +
"YIELD node, score RETURN node.title, score ORDER BY score DESC"
);
console.log(results);
db.close();
Ruby
db = SparrowDB::GraphDb.new()
db.execute()
db.execute()
db.execute()
db.execute()
# Who does tokio depend on transitively?
rows = db.execute(
)
puts rows.inspect # [["serde"]]
db.close
Real-World Use Cases
Recommendation Engine
// "Users who liked X also liked Y"
MATCH (u:User {id: $user_id})-[:LIKED]->(item:Item)
WITH collect(item) AS liked_items
MATCH (other:User)-[:LIKED]->(item) WHERE item IN liked_items
WITH liked_items, other, COUNT(item) AS overlap ORDER BY overlap DESC LIMIT 20
MATCH (other)-[:LIKED]->(candidate:Item)
WHERE NOT candidate IN liked_items
RETURN candidate.name, COUNT(other) AS score ORDER BY score DESC LIMIT 10
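The query runs in two stages: rank other users by overlap with the target user's likes, then score the items those users liked that the target has not seen. A plain-Python sketch of the same logic follows, over a hypothetical in-memory likes mapping. It mirrors the reasoning only and does not use SparrowDB, and unlike the query it explicitly skips the target user when ranking neighbors.

```python
from collections import Counter

# Hypothetical LIKED edges: user -> set of item names.
likes = {
    "u1": {"widget", "gadget"},
    "u2": {"widget", "gadget", "doohickey"},
    "u3": {"widget", "sprocket"},
    "u4": {"gizmo"},
}

def recommend(user_id, max_neighbors=20, max_items=10):
    liked_items = likes[user_id]
    # Stage 1: rank other users by how many liked items they share.
    overlap = Counter()
    for other, items in likes.items():
        shared = len(items & liked_items)
        if other != user_id and shared:
            overlap[other] = shared
    neighbors = [user for user, _ in overlap.most_common(max_neighbors)]
    # Stage 2: score unseen items by how many neighbors liked them.
    scores = Counter()
    for other in neighbors:
        for candidate in likes[other] - liked_items:
            scores[candidate] += 1
    return scores.most_common(max_items)

print(recommend("u1"))
```

For u1, the neighbors are u2 and u3, so doohickey and sprocket each score 1. User u4 shares nothing and contributes nothing.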
Fraud Detection
// Find accounts that share two or more devices with flagged accounts
MATCH (flagged:Account {status:'fraudulent'})-[:USED]->(device:Device)
MATCH (device)<-[:USED]-(suspect:Account)
WHERE suspect.status <> 'fraudulent'
WITH suspect, COUNT(DISTINCT device) AS shared_devices
WHERE shared_devices >= 2
RETURN suspect.id, suspect.email, shared_devices
ORDER BY shared_devices DESC
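As a cross-check on what the pattern computes, here is the same device-sharing logic as a short Python sketch over hypothetical (account, device) pairs. It counts each shared device once per suspect. The data and names are made up, and SparrowDB is not involved.

```python
from collections import defaultdict

# Hypothetical USED edges as (account, device) pairs, plus account statuses.
used = [
    ("acct-1", "dev-a"), ("acct-1", "dev-b"),
    ("acct-2", "dev-a"), ("acct-2", "dev-b"),
    ("acct-3", "dev-a"),
    ("acct-4", "dev-c"),
]
status = {"acct-1": "fraudulent", "acct-2": "active", "acct-3": "active", "acct-4": "active"}

def suspects(min_shared=2):
    # Devices touched by any flagged account.
    flagged_devices = {dev for acct, dev in used if status[acct] == "fraudulent"}
    # For every other account, collect the flagged devices it also used.
    shared = defaultdict(set)
    for acct, dev in used:
        if status[acct] != "fraudulent" and dev in flagged_devices:
            shared[acct].add(dev)
    ranked = sorted(shared.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(acct, len(devs)) for acct, devs in ranked if len(devs) >= min_shared]

print(suspects())  # [('acct-2', 2)]
```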
Dependency Graph (software, supply chain)
// What breaks if we remove this package?
MATCH (pkg:Package {name: $package_name})<-[:DEPENDS_ON*1..10]-(dependent)
RETURN DISTINCT dependent.name, dependent.version
ORDER BY dependent.name
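The reverse variable-length pattern amounts to a depth-bounded breadth-first search over inverted DEPENDS_ON edges. The sketch below shows that search in plain Python over a hypothetical dependency map. It illustrates the semantics only, not how SparrowDB executes the query.

```python
from collections import deque

# Hypothetical DEPENDS_ON edges: package -> packages it depends on.
depends_on = {
    "app": ["web", "cli"],
    "web": ["http"],
    "cli": ["args"],
    "http": ["tls"],
    "tls": [],
    "args": [],
}

def dependents(package, max_depth=10):
    """Everything that reaches `package` through 1..max_depth DEPENDS_ON edges."""
    reverse = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(pkg)
    seen, queue = set(), deque([(package, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the hop limit
        for parent in reverse.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return sorted(seen)

print(dependents("tls"))  # ['app', 'http', 'web']
```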
Knowledge Graph
// How are these two concepts connected?
MATCH (a:Concept {name: 'machine learning'}), (b:Concept {name: 'linear algebra'})
MATCH path = shortestPath((a)-[:RELATED_TO|REQUIRES|FOUNDATION_OF*]->(b))
RETURN [n IN nodes(path) | n.name] AS connection_chain
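For unweighted edges, a shortest path can be found with breadth-first search: the first time the target is dequeued, the recorded predecessors form a minimum-hop chain. The sketch below shows this on a hypothetical concept graph with edge types collapsed into one adjacency map. Whether SparrowDB's shortestPath is implemented this way is not stated in this document.

```python
from collections import deque

# Hypothetical directed edges between concepts (edge types collapsed for brevity).
edges = {
    "machine learning": ["statistics", "optimization"],
    "statistics": ["probability"],
    "optimization": ["linear algebra"],
    "probability": ["linear algebra"],
    "linear algebra": [],
}

def shortest_chain(start, goal):
    """Breadth-first search; returns a minimum-hop list of names, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            while node is not None:
                chain.append(node)
                node = previous[node]
            return chain[::-1]
        for neighbor in edges.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # no directed path exists

print(shortest_chain("machine learning", "linear algebra"))
# ['machine learning', 'optimization', 'linear algebra']
```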
Org Chart Reporting
// Full reporting chain from an IC to the top
MATCH (emp:Employee {name: $name})-[:REPORTS_TO*]->(mgr:Employee)
RETURN emp.name, [m IN collect(mgr) | m.name + ' (' + m.title + ')'] AS chain
Advanced Features
Encrypted Database
Protect data at rest. The key must be exactly 32 bytes. Wrong key = immediate error, never silently decrypted garbage.
use sparrowdb::GraphDb;
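Because the key must be exactly 32 bytes, it helps to generate it from the operating system's CSPRNG and to validate its length before it gets anywhere near the database. A minimal sketch using only Python's standard library follows. The helper names are hypothetical, and how the key is then handed to SparrowDB is not shown here.

```python
import secrets

KEY_LEN = 32  # the key length required above

def new_key():
    """Generate a random 32-byte key from the OS CSPRNG."""
    return secrets.token_bytes(KEY_LEN)

def check_key(key):
    """Reject keys of the wrong type or length before they reach the database."""
    if not isinstance(key, (bytes, bytearray)) or len(key) != KEY_LEN:
        raise ValueError(f"encryption key must be exactly {KEY_LEN} bytes")
    return bytes(key)

key = check_key(new_key())
print(len(key.hex()))  # 64 hex characters encode 32 bytes
```

A common slip is passing a 32-character password or hex string. A 32-character hex string decodes to only 16 bytes, and a password is not uniformly random key material.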
Graph Visualization
use sparrowdb::GraphDb;
|
Full-Text Search
use sparrowdb::GraphDb;
Per-Query Timeout
use sparrowdb::GraphDb;
use std::time::Duration;
Bulk Load (single fsync)
use sparrowdb::GraphDb;
Neo4j Migration
# Export from Neo4j using APOC:
# CALL apoc.export.csv.all("export", {})
# Produces nodes.csv + relationships.csv
Performance Characteristics
SparrowDB is designed for in-process, latency-sensitive workloads — not distributed analytics:
| Technique | What it buys you |
|---|---|
| Factorized execution | Friend-of-friend at 1M nodes stays fast — no O(N²) intermediate rows |
| B-tree property index | Equality lookups: O(log n), not a full label scan |
| Inverted text index | CONTAINS / STARTS WITH without scanning every node |
| External merge sort | ORDER BY on results larger than RAM — sorted runs spill to disk |
| execute_batch() | Orders-of-magnitude faster bulk loads: one fsync for N writes |
| SWMR concurrency | Concurrent readers at zero extra cost; readers never block writers |
| Zero-copy open | Opens in < 1ms — suitable for serverless and short-lived processes |
| GIL-released Python | Python threads can issue parallel reads without contention |
SparrowDB does not yet have published LDBC SNB benchmarks (that's on the roadmap). What we can say: for graphs that fit on a single machine — up to tens of millions of nodes and edges — it performs well. Pull requests with reproducible benchmarks are welcome.
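The external merge sort row in the table is the least obvious technique, so here is a minimal sketch of it in Python: sort fixed-size runs in memory, spill each run to a temporary file, then stream a k-way merge over the runs. The function name and run format are hypothetical; this shows the general technique, not SparrowDB's implementation.

```python
import heapq
import os
import tempfile

def external_sort(values, run_size):
    """Yield ints in sorted order while buffering at most run_size of them at a time."""
    run_paths, run = [], []

    def flush():
        # Sort the in-memory run and spill it to its own temporary file.
        run.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in run)
        run_paths.append(path)
        run.clear()

    for value in values:
        run.append(value)
        if len(run) >= run_size:
            flush()
    if run:
        flush()

    files = [open(p) for p in run_paths]
    try:
        # heapq.merge streams the sorted runs, holding one pending item per run.
        yield from heapq.merge(*[(int(line) for line in f) for f in files])
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)

print(list(external_sort([5, 3, 9, 1, 7, 2, 8], run_size=3)))  # [1, 2, 3, 5, 7, 8, 9]
```

Memory use is bounded by run_size during the first pass and by the number of runs during the merge, which is why ORDER BY can handle results larger than RAM.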
Architecture
+------------------------------------------------------------------------+
| Language Bindings |
| Rust - Python (PyO3) - Node.js (napi-rs) - Ruby (Magnus) |
| CLI (sparrowdb) - MCP Server (sparrowdb-mcp) |
+------------------------------------------------------------------------+
| Cypher Frontend (sparrowdb-cypher) |
| Lexer -> AST -> Binder (name resolution, type checking) |
+------------------------------------------------------------------------+
| Factorized Execution Engine (sparrowdb-execution) |
| Physical plan - iterator model - aggregation |
| External merge sort - EXISTS evaluation - deadline checks |
+------------------------------------------------------------------------+
| Catalog (sparrowdb-catalog) |
| Label registry - B-tree property index - Inverted text index |
+------------------------------------------------------------------------+
| Storage (sparrowdb-storage) |
| Write-Ahead Log - CSR adjacency store - Delta log |
| XChaCha20-Poly1305 encryption (optional) - Crash recovery - SWMR |
+------------------------------------------------------------------------+
Crate layout:
| Crate | Role |
|---|---|
| sparrowdb | Public API: GraphDb, QueryResult, Value |
| sparrowdb-common | Shared types and error definitions |
| sparrowdb-storage | WAL, CSR store, encryption, crash recovery |
| sparrowdb-catalog | Label/property schema, B-tree index, text index |
| sparrowdb-cypher | Lexer, parser, AST, binder |
| sparrowdb-execution | Physical query executor, sort, aggregation |
| sparrowdb-cli | sparrowdb command-line binary |
| sparrowdb-mcp | JSON-RPC 2.0 MCP server binary |
| sparrowdb-python | PyO3 extension module |
| sparrowdb-node | napi-rs Node.js addon |
| sparrowdb-ruby | Magnus Ruby extension |
MCP Server — AI Assistant Integration
sparrowdb-mcp speaks JSON-RPC 2.0 over stdio. It plugs into Claude Desktop and any MCP-compatible AI client, letting the assistant query and write to your graph database using natural tool calls.
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location):
Available tools:
| Tool | Description |
|---|---|
| execute_cypher | Execute any Cypher statement; returns result rows |
| create_entity | Create a node with a label and properties |
| add_property | Set a property on nodes matching a filter |
| checkpoint | Flush WAL and compact |
| info | Database metadata |
Full setup: docs/mcp-setup.md
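For orientation, this is roughly what an MCP client sends over stdio when it invokes one of the tools above. The sketch builds a JSON-RPC 2.0 tools/call request in Python. The envelope follows the MCP specification, but the "query" argument name is an assumption (the real input schema comes from the server's tools/list response), and the initialize handshake that precedes any tool call is omitted.

```python
import json

def tools_call(request_id, tool, arguments):
    """Build one JSON-RPC 2.0 request line for an MCP tools/call."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message) + "\n"  # stdio transport: one JSON message per line

# "query" is an assumed argument name; check the tool's input schema via tools/list.
line = tools_call(1, "execute_cypher", {"query": "MATCH (n) RETURN n LIMIT 5"})
print(line, end="")
```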
CLI Reference
# Execute a query — results as JSON
# Flush WAL and compact
# Database metadata
# Export graph as DOT
|
# Import Neo4j APOC CSV export
# NDJSON line-oriented server mode
# stdin: {"id":"q1","cypher":"MATCH (n) RETURN n LIMIT 5"}
# stdout: {"id":"q1","columns":["n"],"rows":[...],"error":null}
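The NDJSON server mode exchanges one JSON object per line, so a client needs only a line encoder and a line decoder. The Python sketch below uses the id, cypher, columns, rows, and error fields shown above. It assumes each row is an array aligned with columns, and the sample response is hand-written rather than real server output.

```python
import json

def encode_request(request_id, cypher):
    """One request per line, using the id and cypher fields shown above."""
    return json.dumps({"id": request_id, "cypher": cypher}) + "\n"

def decode_response(line):
    """Parse one response line; raise if the server reported an error."""
    response = json.loads(line)
    if response.get("error") is not None:
        raise RuntimeError(f"query {response['id']} failed: {response['error']}")
    columns = response["columns"]
    # Assumption: each row is an array whose positions match `columns`.
    return [dict(zip(columns, row)) for row in response["rows"]]

request = encode_request("q1", "MATCH (n) RETURN n LIMIT 5")
# A hand-written response in the documented shape; the row values are made up.
sample = '{"id":"q1","columns":["n"],"rows":[[{"name":"Ada"}]],"error":null}'
print(decode_response(sample))  # [{'n': {'name': 'Ada'}}]
```

Matching responses to requests by id lets a client pipeline several queries without waiting for each reply.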
Comparison
| | SparrowDB | Neo4j | DGraph | SQLite + JSON |
|---|---|---|---|---|
| Deployment | Embedded (in-process) | Server required | Server required | Embedded |
| Query language | Cypher | Cypher | GraphQL+DQL | SQL |
| Primary language | Rust | JVM | Go | C |
| Python binding | PyO3 native (releases GIL) | Bolt driver | gRPC client (pydgraph) | Adapter |
| Node.js binding | napi-rs native | Bolt driver | gRPC client (dgraph-js) | Adapter |
| Ruby binding | Magnus native | Bolt driver | None | Adapter |
| At-rest encryption | XChaCha20 built-in | Enterprise only | No | No |
| WAL crash recovery | Yes | Yes | Yes | Yes |
| Full-text search | Built-in | Built-in | Built-in | FTS5 extension |
| MCP server | Built-in | No | No | No |
| License | MIT | GPL / Commercial | Apache 2 | Public domain |
| Runtime dependencies | Zero | JVM + server | Server process | Zero |
TL;DR: Among the options compared above, SparrowDB is the only one that combines embedded deployment, Cypher, and zero infrastructure.
Project Status
SparrowDB is pre-1.0. We are building in public.
We ship fast. The API is stable enough to build on, but we're still adding features and the on-disk format may change before 1.0. Pin your version.
What's done:
- Core Cypher subset (see table above)
- WAL durability + crash recovery
- At-rest encryption
- Factorized multi-hop engine
- B-tree + full-text indexes
- External merge sort
- Per-query timeouts
- Bulk batch writes
- Python / Node.js / Ruby bindings
- MCP server
- CLI tools
- Neo4j APOC import
What's next (ordered by priority):
- WAL CRC32C integrity checksums (SPA-253)
- HTTP/SSE transport layer (SPA-231)
- Multi-label nodes (n:A:B) (SPA-200)
- LDBC SNB benchmarks (SPA-111)
- Publish to PyPI / npm
Follow along: github.com/ryaker/SparrowDB
Documentation
| Guide | Description |
|---|---|
| docs/quickstart.md | Build your first graph from zero |
| docs/cypher-reference.md | Full Cypher support with examples |
| docs/bindings.md | Rust, Python, Node.js, Ruby API details |
| docs/mcp-setup.md | MCP server and Claude Desktop config |
| docs/use-cases.md | Real-world usage patterns |
| DEVELOPMENT.md | Contributor workflow and architecture |
Contributing
Open an issue before submitting a large PR so we can discuss the design first.
The workspace is structured so each crate has one job. Adding a Cypher feature typically means touching sparrowdb-cypher (parser + AST) and sparrowdb-execution (executor), with an integration test in crates/sparrowdb/tests/. See DEVELOPMENT.md.
License
MIT — see LICENSE.