SQLiteGraph

Embedded Graph Database with Native V2 Backend and Graph Algorithms Library

What's New in v1.4.2

Developer Documentation - Comprehensive internal architecture guides

5 new developer guides (80+ pages of technical content)
- HNSW Vector Store Internals - layers, search/insert algorithms
- KV Store Architecture - MVCC, TTL, WAL integration
- Native V2 Binary Format - file structure, clusters, recovery
- Pub/Sub Implementation - event types, filtering, WAL integration
- Query API Enhancements - prefix scan, pattern matching, subscriptions
Zero compiler warnings (v1.4.1 cleanup)
Updated docs/INDEX.md with component architecture section

v1.4.1 - Code Quality

Reduced compiler warnings from 8 to 0
Added #[cfg(test)] to test-only modules
Removed unused imports

v1.4.0 - Pub/Sub Query Enhancements

Pattern-based subscriptions (kind_patterns, name_patterns)
KV prefix scanning for efficient queries
Node query by kind and name pattern

v1.3.0 - Graph Algorithms Library

35 algorithms across 13 categories
- Core Graph Theory: WCC, SCC, transitive closure, transitive reduction, topological sort
- Reachability: forward/backward reachability, can-reach, unreachable-nodes
- CFG Analysis: dominators, post-dominators, control dependence, dominance frontiers, natural loops
- Path Analysis: path enumeration with constraints, critical path, cycle basis
- Program Analysis: backward/forward slicing, SCC collapse
- Distributed Systems: min cut, min vertex cut, graph partitioning
- Observability: happens-before analysis, impact radius
- ML/Inference: subgraph isomorphism, graph rewriting, structural similarity
- Graph Diff: structural delta, refactor validation
- Security: taint propagation, sink analysis, source/sink discovery
CLI commands for all 35 algorithms with progress tracking
Foundation for compiler optimization, security analysis, and program understanding

Full ACID Transactions - Complete transaction correctness

Atomicity with full rollback support
Consistency validation at runtime
Isolation via MVCC snapshots
Durability with WAL recovery

Developer Documentation - Comprehensive guides for contributors

Architecture - System design and data flow
Testing Guide - Test patterns and utilities
Debugging Guide - Profiling and troubleshooting
Contributing - Development workflow

Test Coverage: 530+ tests passing (59 pubsub + 42 WAL + 53 MVCC + 180 algorithms + 134 HNSW + 65 others)

SQLiteGraph is an embedded graph database in Rust featuring a dual backend architecture. It provides SQLite and Native V2 storage options with graph algorithms, HNSW vector search, and MVCC snapshots.

See CHANGELOG.md for version history.

SQLiteGraph provides two backend options:

SQLite Backend: SQLite storage with ACID transactions
Native V2 Backend: Clustered adjacency storage with WAL

Features

Native V2 Architecture

Clustered Adjacency Storage: Stores edges in clusters for locality
Write-Ahead Logging (WAL): Transaction logging with crash recovery
Snapshot System: Export/import with lifecycle management
Cross-Platform Atomic Operations: Concurrent access across platforms
Storage Format: Binary format with 70%+ size reduction vs legacy V1
Pub/Sub Events: In-process event notification for graph changes (Native V2 only)

Dual Backend Architecture

SQLite Backend: Traditional SQLite with full ACID transactions
Native V2 Backend: Clustered adjacency for traversal-heavy workloads
Unified API: Single API works with both backends
Runtime Selection: Switch backends via configuration

Core Graph Operations

Entity/Node Management: Insert, update, retrieve, delete
Edge Management: Create and manage typed relationships
JSON Data Storage: Arbitrary JSON metadata on entities and edges
Bulk Operations: Batch insert for higher throughput

Traversal & Querying

Neighbor Queries: Get incoming/outgoing connections
Pattern Matching: Graph pattern queries
Traversal Algorithms: BFS, shortest path, connected components

Graph Algorithms Library

35 algorithms across 13 categories: Comprehensive collection for CFG analysis, program slicing, security
- Core Graph Theory: WCC, SCC, transitive closure, transitive reduction, topological sort
- Reachability: forward/backward reachability, can-reach, unreachable-nodes
- CFG Analysis: dominators, post-dominators, control dependence, dominance frontiers, natural loops
- Path Analysis: path enumeration with constraints, critical path, cycle basis
- Program Analysis: backward/forward slicing, SCC collapse
- Distributed Systems: min cut, min vertex cut, graph partitioning
- Observability: happens-before analysis, impact radius
- ML/Inference: subgraph isomorphism, graph rewriting, structural similarity
- Graph Diff: structural delta, refactor validation
- Security: taint propagation, sink analysis, source/sink discovery

Performance & Reliability

MVCC Snapshots: Read isolation with snapshot views
Parallel WAL Recovery: 2-3x speedup for large WAL files (500+ transactions)
Automated Benchmarks: Criterion-based regression detection
Safety Tools: Orphan edge detection and integrity checks

Vector Search (HNSW)

HNSW Algorithm: Hierarchical Navigable Small World for ANN search
Supported Metrics: Cosine, Euclidean, Dot Product, Manhattan
OpenAI Compatible: Support for 1536-dimensional embeddings
Flexible Dimensions: Any size from 1-4096

Developer Tools (Phase 9)

Introspection API: GraphIntrospection for statistics and debugging
Progress Tracking: ProgressCallback with ConsoleProgress
CLI Debug Commands: debug-stats, debug-dump, debug-trace
Algorithm CLI Commands: pagerank, betweenness, louvain with progress bars

Performance Benchmarks

Benchmark Methodology:

Hardware: Linux x86_64 (kernel 6.18+)
Sizes: 100-500 nodes (V2 backend has 8MB node region limit, ~2048 nodes max)
Cache state: Warm (after warmup iterations)
Measurements: Criterion-based statistical analysis (95% confidence interval)

Native V2 vs SQLite Backend (Phase 24, 2026-01-21):

Operation	Size	Native V2	SQLite	Ratio
Node Insert	100	1.14 ms	3.63 ms	3.2x faster
Node Insert	500	4.91 ms	10.57 ms	2.2x faster
Edge Insert (star)	100	3.85 ms	7.18 ms	1.9x faster
BFS Traversal (star)	100	4.68 ms	7.28 ms	1.6x faster
BFS Traversal (chain)	100	15.38 ms	7.24 ms	2.1x slower
BFS Traversal (chain)	500	266.50 ms	24.98 ms	10.7x slower
1-Hop Query	100	3.87 ms	6.93 ms	1.8x faster

Key Findings:

Native V2 excels at insert operations (1.3-3.2x faster)
Star-pattern traversals favor Native V2 (clustered adjacency locality)
Chain traversals show regression (V2 cluster lookup overhead vs SQLite indexed adjacency)
Workload pattern matters: choose backend based on your graph shape and access patterns

Connection Pooling:

Warm checkout: 205 ns (pooled) vs 16.4 µs (direct) = 79.8x faster
First checkout overhead: ~5 ms (pool initialization)

HNSW Vector Search:

Insertion: 3-5 ms for 100 vectors (64-256 dimensions)
Search: Sub-millisecond typical latency
Accuracy: 95%+ recall on standard datasets

Storage Efficiency:

Native V2 format: 70%+ size reduction vs legacy V1 format

Caveats:

Numbers are for single-node embedded use (not distributed)
Performance varies based on graph topology, hardware, and configuration
V2 backend currently constrained to ~2048 nodes (8MB reserved region)
In-memory benchmarks show 1000-10000x headroom for future optimization

Quick Start

Add to your Cargo.toml:

[dependencies]
sqlitegraph = "1.4.2"

SQLite Backend (Default)

use sqlitegraph::{SqliteGraph, GraphEntity, GraphEdge};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let graph = SqliteGraph::open_in_memory()?;

    let user_entity = GraphEntity {
        id: 0,
        kind: "User".to_string(),
        name: "Alice".to_string(),
        file_path: None,
        data: serde_json::json!({"age": 30}),
    };

    let user_id = graph.insert_entity(&user_entity)?;
    println!("Created entity: {}", user_id);

    Ok(())
}

Native V2 Backend

[dependencies]
sqlitegraph = { version = "1.4.2", features = ["native-v2"] }

use sqlitegraph::{GraphConfig, open_graph, NodeSpec, EdgeSpec};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = GraphConfig::native();
    let temp_dir = tempfile::tempdir()?;
    let db_path = temp_dir.path().join("graph.db");

    let graph = open_graph(&db_path, &cfg)?;

    let node_spec = NodeSpec {
        kind: "User".to_string(),
        name: "Alice".to_string(),
        file_path: None,
        data: serde_json::json!({"age": 30}),
    };
    let user_id = graph.insert_node(node_spec)?;

    println!("Created node: {}", user_id);
    Ok(())
}

Pub/Sub Events (Native V2)

[dependencies]
sqlitegraph = { version = "1.4.2", features = ["native-v2"] }

use sqlitegraph::{GraphConfig, open_graph};
use sqlitegraph::backend::SubscriptionFilter;

let cfg = GraphConfig::native();
let graph = open_graph("graph.db", &cfg)?;

// Subscribe to all node change events
let filter = SubscriptionFilter::all();
let (subscriber_id, rx) = graph.subscribe(filter)?;

// In a separate task or thread, receive events
while let Ok(event) = rx.recv() {
    println!("Event: {:?}", event);
    // Events contain only IDs - read actual data from graph using snapshot_id
}

// Unsubscribe when done
graph.unsubscribe(subscriber_id)?;

Backend Selection Guide

Use Case	Recommended Backend	Why
Write-Heavy Workloads	Native V2 Backend	1.3-3.2x faster insert operations
Star-Pattern Graphs	Native V2 Backend	Clustered adjacency benefits local queries
Chain-Depth Traversals	SQLite Backend	V2 has 2-10x chain traversal regression
Enterprise Applications	SQLite Backend	ACID transactions, tooling ecosystem
Existing SQLite Integration	SQLite Backend	Direct compatibility
Vector Search Workloads	Native V2 Backend	HNSW integration
Development/Testing	Either Backend	Unified API, both support in-memory
Small Graphs (<2K nodes)	Either Backend	V2 has node region limit, SQLite scales better

Feature Flags

# Default - SQLite backend only
sqlitegraph = "1.4.2"

# Native V2 backend (with pub/sub support)
sqlitegraph = { version = "1.4.2", features = ["native-v2"] }

# Development features - I/O tracing
sqlitegraph = { version = "1.4.2", features = ["trace_v2_io"] }

CLI Tool

# Basic status
sqlitegraph --command status --database memory

# List entities
sqlitegraph --command list --database mygraph.db

# Export/import
sqlitegraph --command dump-graph --output backup.json --database mygraph.db
sqlitegraph --command load-graph --input backup.json --database mygraph.db

# HNSW vector search
sqlitegraph --backend sqlite --db mygraph.db hnsw-create --dimension 768 --distance-metric cosine
sqlitegraph --backend sqlite --db mygraph.db hnsw-insert --index-name vectors --input vectors.json
sqlitegraph --backend sqlite --db mygraph.db hnsw-search --index-name vectors --input query.json --k 10

# Algorithm commands (with progress bars)
sqlitegraph --backend sqlite --db mygraph.db pagerank --progress
sqlitegraph --backend sqlite --db mygraph.db betweenness --progress
sqlitegraph --backend sqlite --db mygraph.db louvain --progress

Graph Algorithms

use sqlitegraph::algo;

// PageRank - importance ranking
let scores = algo::pagerank(&graph, 0.85, 50)?;

// Betweenness Centrality - node importance via shortest paths
let centrality = algo::betweenness_centrality(&graph)?;

// Label Propagation - fast community detection
let communities = algo::label_propagation(&graph)?;

// Louvain - modularity-based clustering
let partition = algo::louvain_communities(&graph, 0.01)?;

// With progress tracking
use sqlitegraph::progress::ConsoleProgress;
let scores = algo::pagerank_with_progress(&graph, 0.85, 50, ConsoleProgress::new())?;

Testing

Test Coverage (v1.4.2):

59 pubsub tests passing (event emission, filtering, multiple subscribers)
42 WAL tests passing (recovery, corruption, checkpoints)
53 concurrent MVCC tests passing (snapshots, stress testing)
180+ algorithm tests passing (35 algorithms across 13 categories)
134 HNSW tests passing
65 MVCC lifecycle tests passing

# Run all tests
cargo test --workspace

# With Native V2 backend
cargo test --workspace --features native-v2

# Run benchmarks
cargo bench

# Documentation tests
cargo test --doc

Grounded Tool Scripts

Keep every change truth-based by running the Magellan stack before touching files:

scripts/watch-magellan.sh — starts magellan watch --root sqlitegraph/src with .codemcp/codegraph.db scoped to the Rust sources.
scripts/toolchain-ready.sh [symbol] — runs magellan status + llmgrep search (defaults to ToolRegistry) so you can verify tool readiness and capture execution IDs before editing.

Run these before any reading/editing steps so the CLI and LLM focus on deterministic spans instead of guessing through rg.

Documentation

User Documentation

Operator Manual - Comprehensive usage guide (14 sections)
API Docs - Quick API reference
CHANGELOG - Version history

Developer Documentation

Documentation Index - Navigation for all docs
Architecture - System architecture and design
Testing Guide - Testing patterns and utilities
Debugging Guide - Debugging and profiling
Contributing - Contribution guidelines

Development Guides

Architecture

Design Principles

300 LOC Module Limit: Maintainable boundaries
TDD Methodology: Test-driven development
Performance Benchmarks: Criterion-based regression gates

Module Organization

Core graph operations with dual backend support
Graph algorithms (centrality, community detection)
HNSW vector search with persistence
MVCC snapshots for read isolation
Introspection and debugging tools

Compiler Warnings

SQLiteGraph compiles with zero warnings as of v1.4.1:

All test modules properly gated with #[cfg(test)]
Unused imports cleaned up
Clean compilation output for better developer experience

Grounded Development Workflow

SQLiteGraph uses a grounded tool workflow to prevent guessing and ensure code changes are truth-based:

Magellan - Code graph indexing and symbol discovery

magellan watch --root sqlitegraph/src --db .codemcp/codegraph.db --debounce-ms 500

llmgrep - Semantic code search with span references

llmgrep search --db .codemcp/codegraph.db --query "symbolName" --output json

Splice / llm-transform - Span-safe code editing

splice edit --span-id <id> --execution-id <exec_id> ...

This workflow ensures every code change is grounded in actual code graph data rather than assumptions.

Built With

SQLiteGraph was developed using the following grounded development tools:

Tool	Description
Magellan (crates.io)	Code graph navigation and symbol analysis
Splice (crates.io)	Safe code editing with span-based operations
llmgrep (crates.io)	Semantic code search powered by embeddings

License

GPL-3.0-or-later - see LICENSE for details.

Contributing

Contributions welcome. Please:

Read the Contributing Guide
Read the Architecture for system understanding
Run tests to verify setup
Follow TDD methodology
Keep modules under 300 LOC
Add tests for new features

sqlitegraph 1.4.2