SQLiteGraph

Embedded graph database with dual backend architecture, graph algorithms, Cypher-inspired queries, and HNSW vector search.

Positioning:

Single-binary embedded database (no server).
Persistent storage with atomic batch commits.
Graph algorithms + HNSW vector search in one engine.
SQLite: stable, mature, and easy to inspect with standard tooling.
Native V3: graph-oriented storage with cache, KV, pub/sub, and traversal features.

See the benchmarks below for workload-specific behavior.

What's New in the 3.2 Line

The current 3.2 release line builds on the 3.0 backend/model transition with concrete HNSW and traversal improvements:

HnswIndex::batch_insert_vectors() for lower-overhead bulk vector ingestion.
Transactional topology persistence for HNSW metadata and layers.
HnswIndexStats runtime counters for inserts, searches, vector-cache hits, and vector-cache misses.
Streaming traversal iterators for BFS, DFS, topological sort, and connected components, so callers can avoid materializing full Vecs when they only need incremental results.
parking_lot-based lock cleanup across the HNSW path and related hot locks.
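
To illustrate what a streaming traversal buys over a materialized one, here is a minimal sketch of a lazy BFS iterator written against the standard library only. The `Adjacency` alias, `BfsIter`, and `sample_graph` are hypothetical names for this example and are not the crate's API; the crate's own iterators may be shaped differently.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical adjacency-list graph used only for this sketch.
pub type Adjacency = HashMap<u64, Vec<u64>>;

/// Lazy breadth-first iterator: does one step of work per `next()` call.
pub struct BfsIter<'a> {
    graph: &'a Adjacency,
    queue: VecDeque<u64>,
    seen: HashSet<u64>,
}

impl<'a> BfsIter<'a> {
    pub fn new(graph: &'a Adjacency, start: u64) -> Self {
        let mut seen = HashSet::new();
        seen.insert(start);
        let mut queue = VecDeque::new();
        queue.push_back(start);
        BfsIter { graph, queue, seen }
    }
}

impl<'a> Iterator for BfsIter<'a> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        let node = self.queue.pop_front()?;
        if let Some(neighbors) = self.graph.get(&node) {
            for &n in neighbors {
                // `insert` returns true only the first time a node is seen.
                if self.seen.insert(n) {
                    self.queue.push_back(n);
                }
            }
        }
        Some(node)
    }
}

/// Small sample graph: 1 -> {2, 3}, 2 -> {4}, 3 -> {4}.
pub fn sample_graph() -> Adjacency {
    let mut g = Adjacency::new();
    g.insert(1, vec![2, 3]);
    g.insert(2, vec![4]);
    g.insert(3, vec![4]);
    g
}
```

A caller that only needs the first few nodes can write `BfsIter::new(&g, 1).take(2)` and stop there; the rest of the graph is never visited and no full result `Vec` is allocated.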

See CHANGELOG for full history.

Backends

Feature	SQLite	Native V3
Status	Stable	Stable
Storage	`.db` file	`.graph` file
Capacity model	Storage-limited	Storage-limited
Graph algorithms	35+	35+
HNSW vectors	Yes	Yes
Pub/Sub	Yes	Yes
LRU Cache	No	Yes
Parallel BFS	No	Yes

Benchmarks

See Architecture for system design details and Benchmarking for methodology.

Representative clean samples from 2026-06-07 (AMD Ryzen 7 7800X3D, tmpfs, Rust 1.95.0):

Benchmark	SQLite	Native V3
Criterion `bfs_traversal/small_random_1k_5k`	`2.3680 ms`	`3.3191 ms`
Criterion `bfs_traversal/medium_random_10k_50k`	`26.510 ms`	`56.240 ms`
Release microbenchmark point lookup	`3965 ns`	`146 ns`

Use the release microbenchmark for fast sanity checks and the Criterion suites for workload comparisons.
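
The samples above are easier to compare as ratios. The sketch below only redoes the arithmetic on the three rows of the table; the function and constant names are made up for this example. On these samples SQLite is roughly 1.4x and 2.1x faster on the two BFS workloads, while Native V3 is roughly 27x faster on point lookup.

```rust
/// Ratio of two timings expressed in the same unit.
pub fn ratio(numerator: f64, denominator: f64) -> f64 {
    numerator / denominator
}

/// (label, SQLite timing, Native V3 timing), copied from the table above.
/// The BFS rows are in milliseconds; the point-lookup row is in nanoseconds.
pub const SAMPLES: [(&str, f64, f64); 3] = [
    ("bfs_traversal/small_random_1k_5k", 2.3680, 3.3191),
    ("bfs_traversal/medium_random_10k_50k", 26.510, 56.240),
    ("point lookup", 3965.0, 146.0),
];

/// Returns (label, V3 time / SQLite time) for each row.
/// A value above 1.0 means SQLite was faster in that sample.
pub fn v3_over_sqlite() -> Vec<(&'static str, f64)> {
    SAMPLES
        .iter()
        .map(|&(label, sqlite, v3)| (label, ratio(v3, sqlite)))
        .collect()
}
```

These are single representative samples, so the ratios describe this machine and date rather than a general guarantee.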

Run benchmarks yourself:

git clone https://github.com/oldnordic/sqlitegraph.git
cd sqlitegraph
./scripts/run-curated-benchmarks.sh

cd sqlitegraph-core
cargo run --release --example test_performance_comparison --features native-v3
cargo bench --features native-v3 --bench backend_comparison
cargo bench --features native-v3 --bench sqlite_v3_curated

See examples/ for quick performance checks and docs/BENCHMARKING.md for the full workflow.

Quick Start

[dependencies]
# SQLite backend (default)
sqlitegraph = "3.2"

# OR Native V3 backend (graph-oriented storage)
sqlitegraph = { version = "3.2", features = ["native-v3"] }

# The example below also uses serde_json for the node payload
serde_json = "1"

use sqlitegraph::backend::{GraphBackend, NodeSpec};
use sqlitegraph::backend::sqlite::SqliteGraphBackend;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let backend = SqliteGraphBackend::in_memory()?;

    let node_id = backend.insert_node(NodeSpec {
        kind: "User".to_string(),
        name: "Alice".to_string(),
        file_path: None,
        data: serde_json::json!({"age": 30}),
    })?;

    println!("Created node: {}", node_id);
    Ok(())
}

TypedDiGraph (In-Memory)

A lightweight in-memory directed graph with typed node and edge weights, independent of the GraphBackend persistence layer. Useful for build-system DAGs, dependency graphs, and analysis passes that don't need disk storage.

use sqlitegraph::typed_digraph::{TypedDiGraph, NodeIndex, Direction};
use sqlitegraph::typed_digraph::algo::{toposort, tarjan_scc, Dfs};

let mut g = TypedDiGraph::<&str, i32>::new();
let a = g.add_node("compile");
let b = g.add_node("link");
let c = g.add_node("run");
g.add_edge(a, b, 1);
g.add_edge(b, c, 2);

// Topological order
let order = toposort(&g).expect("acyclic");
assert_eq!(order, vec![a, b, c]);

// DFS traversal
let mut dfs = Dfs::new(&g, a);
assert_eq!(dfs.by_ref().collect::<Vec<_>>(), vec![a, b, c]);

Available in the current 3.x line.
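
For readers unfamiliar with topological sorting, the following is a minimal sketch of Kahn's algorithm over plain integer node ids. It is not the implementation behind `toposort` in this crate, which may use a different algorithm; `kahn_toposort` is a hypothetical helper that only shows the kind of ordering and cycle detection involved.

```rust
use std::collections::VecDeque;

/// Kahn's algorithm over nodes `0..n` with directed `edges` (from, to).
/// Returns `None` when the graph contains a cycle.
/// Panics if an edge references a node id outside `0..n`.
pub fn kahn_toposort(n: usize, edges: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut indegree = vec![0usize; n];
    let mut out: Vec<Vec<usize>> = vec![Vec::new(); n];
    for &(from, to) in edges {
        out[from].push(to);
        indegree[to] += 1;
    }

    // Start with every node that has no incoming edge.
    let mut ready: VecDeque<usize> = (0..n).filter(|&v| indegree[v] == 0).collect();
    let mut order = Vec::with_capacity(n);

    while let Some(v) = ready.pop_front() {
        order.push(v);
        for &next in &out[v] {
            indegree[next] -= 1;
            if indegree[next] == 0 {
                ready.push_back(next);
            }
        }
    }

    // Nodes left unprocessed are part of, or downstream of, a cycle.
    if order.len() == n { Some(order) } else { None }
}
```

With the compile, link, run chain above mapped to ids 0, 1, 2, `kahn_toposort(3, &[(0, 1), (1, 2)])` returns `Some(vec![0, 1, 2])`, and a two-node cycle returns `None`.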

CLI

cargo install sqlitegraph-cli

# Query
sqlitegraph --db graph.db query "MATCH (n:User) RETURN n.name"

# Algorithms
sqlitegraph --db graph.db bfs --start 1 --depth 3
sqlitegraph --db graph.db algo pagerank --iterations 100

Copy-Paste CLI Demo

rm -f /tmp/sqlitegraph-demo.db

sqlitegraph --db /tmp/sqlitegraph-demo.db --write insert --kind User --name Alice --data '{"age":30}'
sqlitegraph --db /tmp/sqlitegraph-demo.db --write insert --kind User --name Bob --data '{"age":31}'
sqlitegraph --db /tmp/sqlitegraph-demo.db --write query 'CREATE (1)-[:KNOWS]->(2)'

sqlitegraph --db /tmp/sqlitegraph-demo.db query 'MATCH (a:User)-[:KNOWS]->(b:User) RETURN a.name, b.name'
sqlitegraph --db /tmp/sqlitegraph-demo.db algo scc

Hybrid Runtime Demo

This crate includes a runnable demo that combines ordinary SQLite rows, Native V3 graph metadata, SQLite-backed HNSW vectors, and V3 pub/sub:

cargo run -p sqlitegraph --example hybrid_sqlite_v3_hnsw_pubsub --features native-v3

Safety Invariants

Orphan edges are detected by verifying every edge endpoint references a stored entity before any reasoning or subgraph extraction runs.
Duplicate edges (identical (from,to,type) tuples) are tallied so traversal/pipeline counts stay deterministic and regressions surface quickly.
Invalid label/property references (metadata rows pointing at missing entities) are reported by the safety-check helpers.
Integrity sweeps perform a deep table walk (entities/edges/labels/properties), verifying sorted IDs, valid JSON payloads, and metadata references before committing to pipelines or migrations.
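
The first two invariants can be sketched with standard collections. This is an illustration under assumptions, not the crate's safety-check code: `EdgeRow`, `orphan_edges`, `duplicate_edge_count`, and `edge` are hypothetical names, and the real checks run against the stored tables rather than in-memory slices.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical edge row for this sketch; not the crate's schema type.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct EdgeRow {
    pub from: i64,
    pub to: i64,
    pub edge_type: String,
}

/// Edges whose `from` or `to` does not reference a stored entity.
pub fn orphan_edges<'a>(entities: &HashSet<i64>, edges: &'a [EdgeRow]) -> Vec<&'a EdgeRow> {
    edges
        .iter()
        .filter(|e| !entities.contains(&e.from) || !entities.contains(&e.to))
        .collect()
}

/// Number of surplus copies across all identical (from, to, type) tuples.
pub fn duplicate_edge_count(edges: &[EdgeRow]) -> usize {
    let mut counts: HashMap<&EdgeRow, usize> = HashMap::new();
    for e in edges {
        *counts.entry(e).or_insert(0) += 1;
    }
    // Every tuple contributes (occurrences - 1) duplicates.
    counts.values().map(|&c| c - 1).sum()
}

/// Convenience constructor for building sample rows.
pub fn edge(from: i64, to: i64, edge_type: &str) -> EdgeRow {
    EdgeRow { from, to, edge_type: edge_type.to_string() }
}
```

Given entities {1, 2} and edges 1->2, 1->2, 2->9 (all `KNOWS`), the sketch reports one orphan edge (2->9) and one duplicate.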

DSL Constraints

Supported clauses are limited to deterministic pattern, k-hop, filter type=…, and score steps. Clause order matters, and only one filter clause is allowed.
Combination syntax (CALLS*2, CALLS->USES) must not introduce conflicting filters or unknown tokens. Ambiguous or unsupported input causes a parser error, which is surfaced to the CLI and tests.
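
As a rough illustration of strict parsing for the combination syntax, the sketch below accepts only inputs shaped like `CALLS*2` or `CALLS->USES` and rejects everything else. The grammar is an assumption inferred from those two examples (upper-case edge types, `*N` as a hop count, `->` as chaining); `Step`, `parse_path`, and `parse_step` are hypothetical and do not reflect the crate's actual parser.

```rust
/// One step of a hypothetical edge-type path: an edge type and a hop count.
#[derive(Debug, PartialEq, Eq)]
pub struct Step {
    pub edge_type: String,
    pub hops: u32,
}

/// Parses inputs shaped like `CALLS*2` or `CALLS->USES` into steps.
/// The first malformed part aborts parsing with an error message.
pub fn parse_path(input: &str) -> Result<Vec<Step>, String> {
    input.split("->").map(|part| parse_step(part.trim())).collect()
}

fn parse_step(part: &str) -> Result<Step, String> {
    // Split off an optional `*N` hop count; default to a single hop.
    let (name, hops) = match part.split_once('*') {
        Some((name, count)) => {
            let hops = count
                .parse::<u32>()
                .map_err(|_| format!("invalid hop count in `{}`", part))?;
            (name, hops)
        }
        None => (part, 1),
    };

    // Assumed token rule: non-empty, upper-case ASCII letters or underscores.
    let valid = !name.is_empty()
        && name.chars().all(|c| c.is_ascii_uppercase() || c == '_');
    if !valid || hops == 0 {
        return Err(format!("unknown token `{}`", part));
    }
    Ok(Step { edge_type: name.to_string(), hops })
}
```

Failing on the first unknown token, rather than guessing, is what keeps the accepted language small and the results deterministic.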

Performance & Instrumentation

Performance thresholds in sqlitegraph_bench.json gate releases. Benchmarks produce HTML reports under target/criterion. To run a single suite, pass its name to cargo bench, for example cargo bench --bench bench_insert. For comparison work, prefer release-mode examples for quick sanity checks and Criterion suites for workload data.

Runtime instrumentation is exposed through the core APIs used by benchmarks and integration tests: prepare/execute counts, transaction begins/commits/rollbacks, and cache hits/misses can be captured while reproducing workloads.
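
Counters of this kind are commonly kept as atomics so they can be bumped from hot paths without taking a lock. The following is a minimal sketch of that pattern for cache hits and misses; `Counters` and its methods are hypothetical and are not the instrumentation types exposed by the crate.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counter set; field names are illustrative only.
#[derive(Default)]
pub struct Counters {
    pub cache_hits: AtomicU64,
    pub cache_misses: AtomicU64,
}

impl Counters {
    /// Records one cache lookup as either a hit or a miss.
    pub fn record_lookup(&self, hit: bool) {
        let counter = if hit { &self.cache_hits } else { &self.cache_misses };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// Hit ratio in [0.0, 1.0]; `None` before any lookup is recorded.
    pub fn hit_ratio(&self) -> Option<f64> {
        let hits = self.cache_hits.load(Ordering::Relaxed);
        let total = hits + self.cache_misses.load(Ordering::Relaxed);
        if total == 0 {
            None
        } else {
            Some(hits as f64 / total as f64)
        }
    }
}
```

Relaxed ordering is enough here because the counters are only read for reporting and do not guard other data.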

Schema Compatibility Matrix

Version	Description
1	Base tables (`graph_entities`, `graph_edges`, `graph_labels`, `graph_properties`) plus indexes and `graph_meta`.
2	Adds `graph_meta_history` rows so each migration application is recorded; exposed via `run_pending_migrations` / CLI `migrate`.
Future	The CLI refuses to open DBs whose version exceeds the compiled `SCHEMA_VERSION`.

Upgrade workflow:

Inspect the database with sqlitegraph --db <path> status.
Review pending migrations through the library migration helpers.
Apply migrations atomically through the library helper; history entries are appended automatically.
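
The version gate and the pending-migration check described above amount to a three-way comparison. The sketch below shows that logic in isolation. `OpenDecision` and `check_schema` are hypothetical, and the value 2 for `SCHEMA_VERSION` is an assumption taken from the matrix, not a statement about what the current build compiles in.

```rust
/// Highest schema version this build understands.
/// The value 2 mirrors the matrix above and is an assumption of this sketch.
pub const SCHEMA_VERSION: u32 = 2;

#[derive(Debug, PartialEq, Eq)]
pub enum OpenDecision {
    /// Database is current; nothing to do.
    UpToDate,
    /// Database is older; these versions still need to be applied, in order.
    NeedsMigration(Vec<u32>),
    /// Database was written by a newer build; refuse to open it.
    TooNew { found: u32, supported: u32 },
}

/// Compares the version stored in the database with the compiled one.
pub fn check_schema(db_version: u32) -> OpenDecision {
    if db_version > SCHEMA_VERSION {
        OpenDecision::TooNew { found: db_version, supported: SCHEMA_VERSION }
    } else if db_version == SCHEMA_VERSION {
        OpenDecision::UpToDate
    } else {
        OpenDecision::NeedsMigration((db_version + 1..=SCHEMA_VERSION).collect())
    }
}
```

Under these assumptions a version 1 database reports one pending migration (to version 2), while a version 3 database is refused.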

Ecosystem

Tools built on SQLiteGraph:

Tool	Purpose	Repository	crates.io
Magellan	Code graph indexing, symbol navigation	github.com/oldnordic/magellan	crates.io/crates/magellan
llmgrep	Semantic code search	github.com/oldnordic/llmgrep	crates.io/crates/llmgrep
Mirage	CFG analysis, path enumeration	github.com/oldnordic/mirage	crates.io/crates/mirage-analyzer
splice	Precision code editing	github.com/oldnordic/splice	crates.io/crates/splice

Documentation

Architecture - System design
Manual - API guide
Query Language - Cypher-inspired query reference
Changelog - Version history
SnapshotId Migration Guide - v2.1.2 API changes

License

GPL-3.0-only

sqlitegraph 3.2.5