SQLiteGraph
Embedded Graph Database with Native V2 Backend and Graph Algorithms Library
What's New in v1.4.2
Developer Documentation - Comprehensive internal architecture guides
- 5 new developer guides (80+ pages of technical content)
- HNSW Vector Store Internals - layers, search/insert algorithms
- KV Store Architecture - MVCC, TTL, WAL integration
- Native V2 Binary Format - file structure, clusters, recovery
- Pub/Sub Implementation - event types, filtering, WAL integration
- Query API Enhancements - prefix scan, pattern matching, subscriptions
- Zero compiler warnings (v1.4.1 cleanup)
- Updated docs/INDEX.md with component architecture section
v1.4.1 - Code Quality
- Reduced compiler warnings from 8 to 0
- Added #[cfg(test)] to test-only modules
- Removed unused imports
v1.4.0 - Pub/Sub Query Enhancements
- Pattern-based subscriptions (kind_patterns, name_patterns)
- KV prefix scanning for efficient queries
- Node query by kind and name pattern
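Prefix scanning over an ordered key space can be sketched as follows. This is a minimal illustration over a standard-library `BTreeMap`, not SQLiteGraph's KV store; the function name `prefix_scan` and the string key/value types are assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Collects every entry whose key starts with `prefix`, in key order.
/// `BTreeMap::range` seeks to the first key >= prefix, so keys that sort
/// before the prefix are never visited.
fn prefix_scan<'a>(
    store: &'a BTreeMap<String, String>,
    prefix: &str,
) -> Vec<(&'a str, &'a str)> {
    store
        .range(prefix.to_string()..)
        // Stop at the first key that no longer shares the prefix.
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(key, value)| (key.as_str(), value.as_str()))
        .collect()
}
```

Because matching keys are contiguous in sorted order, the scan touches only the matching entries plus one terminating key.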
v1.3.0 - Graph Algorithms Library
- 35 algorithms across 13 categories
- Core Graph Theory: WCC, SCC, transitive closure, transitive reduction, topological sort
- Reachability: forward/backward reachability, can-reach, unreachable-nodes
- CFG Analysis: dominators, post-dominators, control dependence, dominance frontiers, natural loops
- Path Analysis: path enumeration with constraints, critical path, cycle basis
- Program Analysis: backward/forward slicing, SCC collapse
- Distributed Systems: min cut, min vertex cut, graph partitioning
- Observability: happens-before analysis, impact radius
- ML/Inference: subgraph isomorphism, graph rewriting, structural similarity
- Graph Diff: structural delta, refactor validation
- Security: taint propagation, sink analysis, source/sink discovery
- CLI commands for all 35 algorithms with progress tracking
- Foundation for compiler optimization, security analysis, and program understanding
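To make the reachability category concrete, here is a small self-contained sketch of forward reachability and a can-reach check. It uses a plain adjacency map rather than the SQLiteGraph API; `forward_reachable` and `can_reach` are hypothetical names chosen for the example.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns every node reachable from `start` by following edges forward
/// (the start node itself is included).
fn forward_reachable(adj: &HashMap<u32, Vec<u32>>, start: u32) -> HashSet<u32> {
    let mut seen = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    while let Some(node) = queue.pop_front() {
        if let Some(neighbors) = adj.get(&node) {
            for &next in neighbors {
                // `insert` returns true only the first time a node is seen.
                if seen.insert(next) {
                    queue.push_back(next);
                }
            }
        }
    }
    seen
}

/// True if `to` can be reached from `from`.
fn can_reach(adj: &HashMap<u32, Vec<u32>>, from: u32, to: u32) -> bool {
    forward_reachable(adj, from).contains(&to)
}
```

Backward reachability is the same traversal run over the reversed edge set.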
Full ACID Transactions - Complete transaction correctness
- Atomicity with full rollback support
- Consistency validation at runtime
- Isolation via MVCC snapshots
- Durability with WAL recovery
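Isolation via MVCC snapshots can be illustrated with a toy multi-version store. This is a sketch of the general technique, assuming a single writer and monotonically increasing commit ids; `MvccStore` and its methods are hypothetical and do not mirror SQLiteGraph's internals.

```rust
use std::collections::HashMap;

/// Toy multi-version store: each key keeps (commit_id, value) pairs in commit order.
#[derive(Default)]
struct MvccStore {
    versions: HashMap<String, Vec<(u64, String)>>,
    last_commit: u64,
}

impl MvccStore {
    /// Commits a new version of `key` and returns its commit id.
    fn write(&mut self, key: &str, value: &str) -> u64 {
        self.last_commit += 1;
        let id = self.last_commit;
        self.versions
            .entry(key.to_string())
            .or_default()
            .push((id, value.to_string()));
        id
    }

    /// A snapshot is the latest commit id visible when it was taken.
    fn snapshot(&self) -> u64 {
        self.last_commit
    }

    /// Reads the newest version of `key` committed at or before `snapshot`.
    fn read_at(&self, key: &str, snapshot: u64) -> Option<&str> {
        self.versions
            .get(key)?
            .iter()
            .rev()
            .find(|(commit, _)| *commit <= snapshot)
            .map(|(_, value)| value.as_str())
    }
}
```

A reader holding an older snapshot keeps seeing the value that was current when the snapshot was taken, even after later writes commit.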
Developer Documentation - Comprehensive guides for contributors
- Architecture - System design and data flow
- Testing Guide - Test patterns and utilities
- Debugging Guide - Profiling and troubleshooting
- Contributing - Development workflow
Test Coverage: 530+ tests passing (59 pubsub + 42 WAL + 53 MVCC + 180 algorithms + 134 HNSW + 65 others)
SQLiteGraph is an embedded graph database in Rust featuring a dual backend architecture. It provides SQLite and Native V2 storage options with graph algorithms, HNSW vector search, and MVCC snapshots.
See CHANGELOG.md for version history.
SQLiteGraph provides two backend options:
- SQLite Backend: SQLite storage with ACID transactions
- Native V2 Backend: Clustered adjacency storage with WAL
Features
Native V2 Architecture
- Clustered Adjacency Storage: Stores edges in clusters for locality
- Write-Ahead Logging (WAL): Transaction logging with crash recovery
- Snapshot System: Export/import with lifecycle management
- Cross-Platform Atomic Operations: Concurrent access across platforms
- Storage Format: Binary format with 70%+ size reduction vs legacy V1
- Pub/Sub Events: In-process event notification for graph changes (Native V2 only)
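The idea behind write-ahead logging with crash recovery can be sketched in a few lines. This is a simplified in-memory model, not the Native V2 WAL format; the `WalRecord` variants and the `recover` function are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Records a toy write-ahead log might contain.
enum WalRecord {
    Begin(u64),
    Put { txn: u64, key: String, value: String },
    Commit(u64),
}

/// Rebuilds state from the log, applying only writes whose transaction committed.
fn recover(log: &[WalRecord]) -> HashMap<String, String> {
    // Writes are buffered per transaction until a Commit record is seen.
    let mut pending: HashMap<u64, Vec<(String, String)>> = HashMap::new();
    let mut state = HashMap::new();
    for record in log {
        match record {
            WalRecord::Begin(txn) => {
                pending.insert(*txn, Vec::new());
            }
            WalRecord::Put { txn, key, value } => {
                if let Some(writes) = pending.get_mut(txn) {
                    writes.push((key.clone(), value.clone()));
                }
            }
            WalRecord::Commit(txn) => {
                for (key, value) in pending.remove(txn).unwrap_or_default() {
                    state.insert(key, value);
                }
            }
        }
    }
    // Anything still in `pending` belongs to a transaction that never
    // committed (for example, a crash mid-write) and is discarded.
    state
}
```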
Dual Backend Architecture
- SQLite Backend: Traditional SQLite with full ACID transactions
- Native V2 Backend: Clustered adjacency for traversal-heavy workloads
- Unified API: Single API works with both backends
- Runtime Selection: Switch backends via configuration
Core Graph Operations
- Entity/Node Management: Insert, update, retrieve, delete
- Edge Management: Create and manage typed relationships
- JSON Data Storage: Arbitrary JSON metadata on entities and edges
- Bulk Operations: Batch insert for higher throughput
Traversal & Querying
- Neighbor Queries: Get incoming/outgoing connections
- Pattern Matching: Graph pattern queries
- Traversal Algorithms: BFS, shortest path, connected components
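As a reference point for the traversal features above, the following sketch finds a shortest path with breadth-first search on an unweighted graph. It operates on a plain adjacency map; `shortest_path` is a hypothetical helper, not a function from the crate.

```rust
use std::collections::{HashMap, VecDeque};

/// Returns one shortest path (fewest edges) from `start` to `goal`, if any.
fn shortest_path(adj: &HashMap<u32, Vec<u32>>, start: u32, goal: u32) -> Option<Vec<u32>> {
    // parent[n] is the node from which n was first discovered.
    let mut parent: HashMap<u32, u32> = HashMap::new();
    parent.insert(start, start);
    let mut queue = VecDeque::from([start]);
    while let Some(node) = queue.pop_front() {
        if node == goal {
            // Walk parent links back to the start, then reverse.
            let mut path = vec![goal];
            let mut current = goal;
            while current != start {
                current = parent[&current];
                path.push(current);
            }
            path.reverse();
            return Some(path);
        }
        if let Some(neighbors) = adj.get(&node) {
            for &next in neighbors {
                if !parent.contains_key(&next) {
                    parent.insert(next, node);
                    queue.push_back(next);
                }
            }
        }
    }
    None
}
```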
Graph Algorithms Library
- 35 algorithms across 13 categories: Comprehensive collection for CFG analysis, program slicing, security
- Core Graph Theory: WCC, SCC, transitive closure, transitive reduction, topological sort
- Reachability: forward/backward reachability, can-reach, unreachable-nodes
- CFG Analysis: dominators, post-dominators, control dependence, dominance frontiers, natural loops
- Path Analysis: path enumeration with constraints, critical path, cycle basis
- Program Analysis: backward/forward slicing, SCC collapse
- Distributed Systems: min cut, min vertex cut, graph partitioning
- Observability: happens-before analysis, impact radius
- ML/Inference: subgraph isomorphism, graph rewriting, structural similarity
- Graph Diff: structural delta, refactor validation
- Security: taint propagation, sink analysis, source/sink discovery
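Topological sort, listed under Core Graph Theory, is commonly implemented with Kahn's algorithm. The sketch below is a generic version over an edge list; whether the library uses Kahn's algorithm or a DFS-based variant is not stated here, and `topological_sort` is a name chosen for the example.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Kahn's algorithm. Returns None if the graph contains a cycle.
fn topological_sort(edges: &[(u32, u32)]) -> Option<Vec<u32>> {
    let mut indegree: BTreeMap<u32, usize> = BTreeMap::new();
    let mut out: BTreeMap<u32, Vec<u32>> = BTreeMap::new();
    for &(from, to) in edges {
        indegree.entry(from).or_insert(0);
        *indegree.entry(to).or_insert(0) += 1;
        out.entry(from).or_default().push(to);
    }
    // Start with nodes that have no incoming edges (ascending id for determinism).
    let mut ready: VecDeque<u32> = indegree
        .iter()
        .filter(|(_, deg)| **deg == 0)
        .map(|(node, _)| *node)
        .collect();
    let mut order = Vec::new();
    while let Some(node) = ready.pop_front() {
        order.push(node);
        for &next in out.get(&node).into_iter().flatten() {
            let deg = indegree.get_mut(&next).expect("every endpoint was registered");
            *deg -= 1;
            if *deg == 0 {
                ready.push_back(next);
            }
        }
    }
    // If a cycle exists, its nodes never reach in-degree zero.
    (order.len() == indegree.len()).then_some(order)
}
```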
Performance & Reliability
- MVCC Snapshots: Read isolation with snapshot views
- Parallel WAL Recovery: 2-3x speedup for large WAL files (500+ transactions)
- Automated Benchmarks: Criterion-based regression detection
- Safety Tools: Orphan edge detection and integrity checks
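Orphan edge detection amounts to checking that both endpoints of every edge refer to an existing node. A minimal sketch of that check, assuming nodes are identified by `u64` ids and using the hypothetical name `find_orphan_edges`:

```rust
use std::collections::HashSet;

/// Returns the edges whose source or target is not a known node id.
fn find_orphan_edges(nodes: &HashSet<u64>, edges: &[(u64, u64)]) -> Vec<(u64, u64)> {
    edges
        .iter()
        .copied()
        .filter(|(from, to)| !nodes.contains(from) || !nodes.contains(to))
        .collect()
}
```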
Vector Search (HNSW)
- HNSW Algorithm: Hierarchical Navigable Small World for ANN search
- Supported Metrics: Cosine, Euclidean, Dot Product, Manhattan
- OpenAI Compatible: Support for 1536-dimensional embeddings
- Flexible Dimensions: Any size from 1-4096
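For reference, the four supported metrics can be written out directly. These are the standard textbook definitions over equal-length `f32` slices, not the crate's implementation; the function names are chosen for the example.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// Manhattan (L1) distance.
fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Cosine similarity in [-1, 1]; yields NaN if either vector is all zeros.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}
```

Note that dot product and cosine are similarities (larger means closer), while Euclidean and Manhattan are distances (smaller means closer).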
Developer Tools (Phase 9)
- Introspection API: GraphIntrospection for statistics and debugging
- Progress Tracking: ProgressCallback with ConsoleProgress
- CLI Debug Commands: debug-stats, debug-dump, debug-trace
- Algorithm CLI Commands: pagerank, betweenness, louvain with progress bars
Performance Benchmarks
Benchmark Methodology:
- Hardware: Linux x86_64 (kernel 6.18+)
- Sizes: 100-500 nodes (V2 backend has 8MB node region limit, ~2048 nodes max)
- Cache state: Warm (after warmup iterations)
- Measurements: Criterion-based statistical analysis (95% confidence interval)
Native V2 vs SQLite Backend (Phase 24, 2026-01-21):
| Operation | Size | Native V2 | SQLite | Ratio |
|---|---|---|---|---|
| Node Insert | 100 | 1.14 ms | 3.63 ms | 3.2x faster |
| Node Insert | 500 | 4.91 ms | 10.57 ms | 2.2x faster |
| Edge Insert (star) | 100 | 3.85 ms | 7.18 ms | 1.9x faster |
| BFS Traversal (star) | 100 | 4.68 ms | 7.28 ms | 1.6x faster |
| BFS Traversal (chain) | 100 | 15.38 ms | 7.24 ms | 2.1x slower |
| BFS Traversal (chain) | 500 | 266.50 ms | 24.98 ms | 10.7x slower |
| 1-Hop Query | 100 | 3.87 ms | 6.93 ms | 1.8x faster |
Key Findings:
- Native V2 excels at insert operations (1.3-3.2x faster)
- Star-pattern traversals favor Native V2 (clustered adjacency locality)
- Chain traversals are slower on Native V2 (2.1x at 100 nodes, 10.7x at 500 nodes), attributed to V2 cluster lookup overhead vs SQLite indexed adjacency
- Workload pattern matters: choose backend based on your graph shape and access patterns
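The Ratio column in the table above follows from dividing the slower timing by the faster one. A small sketch of that arithmetic, using the hypothetical helper `ratio_label`:

```rust
/// Formats a Ratio-column entry from two timings in milliseconds,
/// expressed from the point of view of the Native V2 backend.
fn ratio_label(native_v2_ms: f64, sqlite_ms: f64) -> String {
    if native_v2_ms <= sqlite_ms {
        format!("{:.1}x faster", sqlite_ms / native_v2_ms)
    } else {
        format!("{:.1}x slower", native_v2_ms / sqlite_ms)
    }
}
```

For example, the Node Insert (100) row gives 3.63 / 1.14 ≈ 3.2, and the BFS Traversal (chain, 500) row gives 266.50 / 24.98 ≈ 10.7.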
Connection Pooling:
- Warm checkout: 205 ns (pooled) vs 16.4 µs (direct) = 79.8x faster
- First checkout overhead: ~5 ms (pool initialization)
HNSW Vector Search:
- Insertion: 3-5 ms for 100 vectors (64-256 dimensions)
- Search: Sub-millisecond typical latency
- Accuracy: 95%+ recall on standard datasets
Storage Efficiency:
- Native V2 format: 70%+ size reduction vs legacy V1 format
Caveats:
- Numbers are for single-node embedded use (not distributed)
- Performance varies based on graph topology, hardware, and configuration
- V2 backend currently constrained to ~2048 nodes (8MB reserved region)
- In-memory benchmarks show 1000-10000x headroom for future optimization
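The relationship between the 8MB region and the ~2048-node limit can be checked with simple arithmetic. The sketch below assumes "8MB" means 8 MiB and that the region is divided evenly among nodes; it gives an average per-node budget and is not a statement about the actual on-disk layout.

```rust
/// Size of the reserved node region, assuming "8MB" means 8 MiB.
const NODE_REGION_BYTES: u64 = 8 * 1024 * 1024;

/// Approximate node limit quoted in the caveats.
const MAX_NODES: u64 = 2048;

/// Average number of bytes available per node under the assumptions above.
const fn bytes_per_node() -> u64 {
    NODE_REGION_BYTES / MAX_NODES
}
```

Under these assumptions each node has a budget of 4096 bytes.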
Quick Start
Add to your Cargo.toml:
[dependencies]
sqlitegraph = "1.4.2"
SQLite Backend (Default)
use ;
Native V2 Backend
[dependencies]
sqlitegraph = { version = "1.4.2", features = ["native-v2"] }
use ;
Pub/Sub Events (Native V2)
[dependencies]
sqlitegraph = { version = "1.4.2", features = ["native-v2"] }
use ;
use SubscriptionFilter;
let cfg = native;
let graph = open_graph?;
// Subscribe to all node change events
let filter = all;
let = graph.subscribe?;
// In a separate task or thread, receive events
while let Ok = rx.recv
// Unsubscribe when done
graph.unsubscribe?;
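The subscribe / receive / unsubscribe flow can be modeled with standard-library channels. The sketch below is a stand-alone illustration of in-process pub/sub with filtering; `Publisher`, `Filter`, and `Event` are hypothetical types and do not reflect the crate's actual signatures.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

#[derive(Clone, Debug, PartialEq)]
enum Event {
    NodeChanged(u64),
    EdgeChanged(u64),
}

/// Hypothetical filter: which event kinds a subscriber wants.
#[derive(Clone, Copy)]
enum Filter {
    All,
    NodesOnly,
}

impl Filter {
    fn matches(self, event: &Event) -> bool {
        match self {
            Filter::All => true,
            Filter::NodesOnly => matches!(event, Event::NodeChanged(_)),
        }
    }
}

#[derive(Default)]
struct Publisher {
    next_id: u64,
    subscribers: HashMap<u64, (Filter, Sender<Event>)>,
}

impl Publisher {
    /// Registers a subscriber and returns its id plus the receiving end.
    fn subscribe(&mut self, filter: Filter) -> (u64, Receiver<Event>) {
        let (tx, rx) = channel();
        let id = self.next_id;
        self.next_id += 1;
        self.subscribers.insert(id, (filter, tx));
        (id, rx)
    }

    /// Drops the subscriber's sender; returns false if the id was unknown.
    fn unsubscribe(&mut self, id: u64) -> bool {
        self.subscribers.remove(&id).is_some()
    }

    /// Delivers `event` to every subscriber whose filter accepts it.
    fn publish(&self, event: Event) {
        for (filter, tx) in self.subscribers.values() {
            if filter.matches(&event) {
                // A send error only means the receiver was dropped; ignore it.
                let _ = tx.send(event.clone());
            }
        }
    }
}
```

Once a subscriber is removed its sender is dropped, so a blocking `rx.recv()` loop on the other side ends with an error instead of waiting forever.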
Backend Selection Guide
| Use Case | Recommended Backend | Why |
|---|---|---|
| Write-Heavy Workloads | Native V2 Backend | 1.3-3.2x faster insert operations |
| Star-Pattern Graphs | Native V2 Backend | Clustered adjacency benefits local queries |
| Chain-Depth Traversals | SQLite Backend | V2 is 2.1-10.7x slower on chain traversals in the benchmarks |
| Enterprise Applications | SQLite Backend | ACID transactions, tooling ecosystem |
| Existing SQLite Integration | SQLite Backend | Direct compatibility |
| Vector Search Workloads | Native V2 Backend | HNSW integration |
| Development/Testing | Either Backend | Unified API, both support in-memory |
| Small Graphs (<2K nodes) | Either Backend | V2 has node region limit, SQLite scales better |
Feature Flags
# Default - SQLite backend only
sqlitegraph = "1.4.2"
# Native V2 backend (with pub/sub support)
sqlitegraph = { version = "1.4.2", features = ["native-v2"] }
# Development features - I/O tracing
sqlitegraph = { version = "1.4.2", features = ["trace_v2_io"] }
CLI Tool
# Basic status
# List entities
# Export/import
# HNSW vector search
# Algorithm commands (with progress bars)
Graph Algorithms
use algo;
// PageRank - importance ranking
let scores = pagerank?;
// Betweenness Centrality - node importance via shortest paths
let centrality = betweenness_centrality?;
// Label Propagation - fast community detection
let communities = label_propagation?;
// Louvain - modularity-based clustering
let partition = louvain_communities?;
// With progress tracking
use ConsoleProgress;
let scores = pagerank_with_progress?;
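For readers unfamiliar with PageRank, the computation can be sketched as a power iteration. This is a generic textbook version over an adjacency list, not the crate's `pagerank`; the name `pagerank_sketch`, the dense `Vec` representation, and the dangling-node handling are assumptions made for the example.

```rust
/// Power-iteration PageRank over nodes 0..n.
/// `out_edges[i]` lists the nodes that node i links to.
fn pagerank_sketch(out_edges: &[Vec<usize>], damping: f64, iterations: usize) -> Vec<f64> {
    let n = out_edges.len();
    if n == 0 {
        return Vec::new();
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iterations {
        // Every node receives the teleport share up front.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        for (node, targets) in out_edges.iter().enumerate() {
            if targets.is_empty() {
                // Dangling node: spread its rank evenly over all nodes.
                let share = damping * rank[node] / n as f64;
                next.iter_mut().for_each(|r| *r += share);
            } else {
                let share = damping * rank[node] / targets.len() as f64;
                for &target in targets {
                    next[target] += share;
                }
            }
        }
        rank = next;
    }
    rank
}
```

A damping factor of 0.85 is the conventional choice; the scores sum to 1 and nodes with more incoming links from well-ranked nodes score higher.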
Testing
Test Coverage (v1.4.2):
- 59 pubsub tests passing (event emission, filtering, multiple subscribers)
- 42 WAL tests passing (recovery, corruption, checkpoints)
- 53 concurrent MVCC tests passing (snapshots, stress testing)
- 180+ algorithm tests passing (35 algorithms across 13 categories)
- 134 HNSW tests passing
- 65 MVCC lifecycle tests passing
# Run all tests
cargo test
# With Native V2 backend
cargo test --features native-v2
# Run benchmarks
cargo bench
# Documentation tests
cargo test --doc
Grounded Tool Scripts
Keep every change truth-based by running the Magellan stack before touching files:
- scripts/watch-magellan.sh - starts magellan watch --root sqlitegraph/src with .codemcp/codegraph.db scoped to the Rust sources.
- scripts/toolchain-ready.sh [symbol] - runs magellan status + llmgrep search (defaults to ToolRegistry) so you can verify tool readiness and capture execution IDs before editing.
Run these before any reading/editing steps so the CLI and LLM focus on deterministic spans instead of guessing through rg.
Documentation
User Documentation
- Operator Manual - Comprehensive usage guide (14 sections)
- API Docs - Quick API reference
- CHANGELOG - Version history
Developer Documentation
- Documentation Index - Navigation for all docs
- Architecture - System architecture and design
- Testing Guide - Testing patterns and utilities
- Debugging Guide - Debugging and profiling
- Contributing - Contribution guidelines
Development Guides
Architecture
Design Principles
- 300 LOC Module Limit: Maintainable boundaries
- TDD Methodology: Test-driven development
- Performance Benchmarks: Criterion-based regression gates
Module Organization
- Core graph operations with dual backend support
- Graph algorithms (centrality, community detection)
- HNSW vector search with persistence
- MVCC snapshots for read isolation
- Introspection and debugging tools
Compiler Warnings
SQLiteGraph compiles with zero warnings as of v1.4.1:
- All test modules properly gated with #[cfg(test)]
- Unused imports cleaned up
- Clean compilation output for better developer experience
Grounded Development Workflow
SQLiteGraph uses a grounded tool workflow to prevent guessing and ensure code changes are truth-based:
- Magellan - Code graph indexing and symbol discovery
- llmgrep - Semantic code search with span references
- Splice / llm-transform - Span-safe code editing
This workflow ensures every code change is grounded in actual code graph data rather than assumptions.
Built With
SQLiteGraph was developed using the following grounded development tools:
| Tool | Description |
|---|---|
| Magellan (crates.io) | Code graph navigation and symbol analysis |
| Splice (crates.io) | Safe code editing with span-based operations |
| llmgrep (crates.io) | Semantic code search powered by embeddings |
License
GPL-3.0-or-later - see LICENSE for details.
Contributing
Contributions welcome. Please:
- Read the Contributing Guide
- Read the Architecture for system understanding
- Run tests to verify setup
- Follow TDD methodology
- Keep modules under 300 LOC
- Add tests for new features