SQLiteGraph
Embedded Graph Database with Native V2 Backend
What's New in v1.2.7
Pub/Sub Event System - In-process event notification for graph changes
- Four event types: NodeChanged, EdgeChanged, KVChanged, SnapshotCommitted
- ID-only design for decoupled event schemas
- Channel-based delivery with filtering by event type and entity IDs
- Native V2 backend only
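The sketch below shows the delivery model described above. It carries ID-only events over std::sync::mpsc channels, and each subscriber's filter on event type and entity ID is checked before sending. The names EventKind, Event, Filter and EventBus are hypothetical and are not SQLiteGraph's API. Treat it as an illustration under those assumptions, not the crate's implementation.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical ID-only event kinds mirroring the four listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EventKind { NodeChanged, EdgeChanged, KVChanged, SnapshotCommitted }

/// An event carries only its kind and an entity ID; subscribers fetch data themselves.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Event { pub kind: EventKind, pub entity_id: u64 }

/// Hypothetical filter: an empty set means "match everything" for that dimension.
#[derive(Default)]
pub struct Filter { pub kinds: HashSet<EventKind>, pub ids: HashSet<u64> }

impl Filter {
    fn matches(&self, e: &Event) -> bool {
        (self.kinds.is_empty() || self.kinds.contains(&e.kind))
            && (self.ids.is_empty() || self.ids.contains(&e.entity_id))
    }
}

/// Fan-out bus: every subscriber owns a channel plus a filter.
#[derive(Default)]
pub struct EventBus { subs: Vec<(Filter, Sender<Event>)> }

impl EventBus {
    pub fn subscribe(&mut self, filter: Filter) -> Receiver<Event> {
        let (tx, rx) = channel();
        self.subs.push((filter, tx));
        rx
    }

    /// Send to matching subscribers; a subscriber is dropped when a send to it fails
    /// (its receiver has been dropped).
    pub fn publish(&mut self, e: Event) {
        self.subs.retain(|(f, tx)| !f.matches(&e) || tx.send(e).is_ok());
    }
}
```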
Full ACID Transactions - Complete transaction correctness
- Atomicity with full rollback support
- Consistency validation at runtime
- Isolation via MVCC snapshots
- Durability with WAL recovery
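Atomicity with rollback means a transaction's writes become visible all at once or not at all. The minimal sketch below assumes writes are staged in a per-transaction buffer and applied only on commit. Store and Tx are hypothetical in-memory types. SQLiteGraph itself relies on its WAL and MVCC machinery, which this does not model.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory store used only to illustrate atomic commit/rollback.
#[derive(Default)]
pub struct Store { data: HashMap<String, String> }

/// A transaction stages writes (Some = put, None = delete) without touching the store.
pub struct Tx<'a> { store: &'a mut Store, pending: Vec<(String, Option<String>)> }

impl Store {
    pub fn begin(&mut self) -> Tx<'_> { Tx { store: self, pending: Vec::new() } }
    pub fn get(&self, k: &str) -> Option<&String> { self.data.get(k) }
}

impl<'a> Tx<'a> {
    pub fn put(&mut self, k: &str, v: &str) { self.pending.push((k.to_string(), Some(v.to_string()))); }
    pub fn delete(&mut self, k: &str) { self.pending.push((k.to_string(), None)); }

    /// Apply every staged write in order; nothing was visible before this call.
    pub fn commit(self) {
        let Tx { store, pending } = self;
        for (k, v) in pending {
            match v {
                Some(v) => { store.data.insert(k, v); }
                None => { store.data.remove(&k); }
            }
        }
    }

    /// Discard staged writes. Dropping a Tx without committing has the same effect.
    pub fn rollback(self) {}
}
```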
Developer Documentation - Comprehensive guides for contributors
- Architecture - System design and data flow
- Testing Guide - Test patterns and utilities
- Debugging Guide - Profiling and troubleshooting
- Contributing - Development workflow
Test Coverage: 380+ tests passing (59 pubsub + 42 WAL + 53 concurrent MVCC + 27 algorithm + 134 HNSW + 65 MVCC lifecycle)
SQLiteGraph is an embedded graph database in Rust featuring a dual backend architecture. It provides SQLite and Native V2 storage options with graph algorithms, HNSW vector search, and MVCC snapshots.
See CHANGELOG.md for version history.
SQLiteGraph provides two backend options:
- SQLite Backend: SQLite storage with ACID transactions
- Native V2 Backend: Clustered adjacency storage with WAL
Features
Native V2 Architecture
- Clustered Adjacency Storage: Stores edges in clusters for locality
- Write-Ahead Logging (WAL): Transaction logging with crash recovery
- Snapshot System: Export/import with lifecycle management
- Cross-Platform Atomic Operations: Concurrent access across platforms
- Storage Format: Binary format with 70%+ size reduction vs legacy V1
- Pub/Sub Events: In-process event notification for graph changes (Native V2 only)
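WAL crash recovery generally works by replaying the log and applying only transactions that reached a commit record. The sketch below assumes a simplified record format: the WalRecord enum is hypothetical and unrelated to the Native V2 on-disk layout. It shows that rule in two passes over the log.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical WAL records; real on-disk formats carry more (LSNs, checksums, pages).
#[derive(Debug, Clone, PartialEq)]
pub enum WalRecord {
    Begin(u64),
    Put { tx: u64, key: u32, value: i64 },
    Commit(u64),
}

/// Replay a log after a crash: writes from transactions with a Commit record are
/// applied in log order; writes from incomplete transactions are discarded.
pub fn recover(log: &[WalRecord]) -> BTreeMap<u32, i64> {
    // Pass 1: find which transactions committed.
    let committed: HashSet<u64> = log
        .iter()
        .filter_map(|r| match r {
            WalRecord::Commit(tx) => Some(*tx),
            _ => None,
        })
        .collect();

    // Pass 2: redo only the committed writes.
    let mut state = BTreeMap::new();
    for r in log {
        if let WalRecord::Put { tx, key, value } = r {
            if committed.contains(tx) {
                state.insert(*key, *value);
            }
        }
    }
    state
}
```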
Dual Backend Architecture
- SQLite Backend: Traditional SQLite with full ACID transactions
- Native V2 Backend: Clustered adjacency for traversal-heavy workloads
- Unified API: Single API works with both backends
- Runtime Selection: Switch backends via configuration
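A common way to get one API over two storage engines with runtime selection is a trait object chosen from a configuration value. The sketch below assumes this pattern. Backend, EdgeTableBackend, AdjacencyBackend, BackendKind and open are all hypothetical stand-ins and are not SQLiteGraph's types or functions.

```rust
use std::collections::HashMap;

/// Hypothetical unified interface both backends implement.
pub trait Backend {
    fn add_edge(&mut self, from: u64, to: u64);
    fn neighbors(&self, node: u64) -> Vec<u64>;
}

/// Stand-in for a table of (from, to) rows, scanned on lookup.
#[derive(Default)]
pub struct EdgeTableBackend { rows: Vec<(u64, u64)> }

impl Backend for EdgeTableBackend {
    fn add_edge(&mut self, from: u64, to: u64) { self.rows.push((from, to)); }
    fn neighbors(&self, node: u64) -> Vec<u64> {
        self.rows.iter().filter(|(f, _)| *f == node).map(|(_, t)| *t).collect()
    }
}

/// Stand-in for clustered adjacency: each node's out-edges stored together.
#[derive(Default)]
pub struct AdjacencyBackend { adj: HashMap<u64, Vec<u64>> }

impl Backend for AdjacencyBackend {
    fn add_edge(&mut self, from: u64, to: u64) { self.adj.entry(from).or_default().push(to); }
    fn neighbors(&self, node: u64) -> Vec<u64> { self.adj.get(&node).cloned().unwrap_or_default() }
}

/// Hypothetical configuration value selecting the backend at runtime.
pub enum BackendKind { Table, Adjacency }

pub fn open(kind: BackendKind) -> Box<dyn Backend> {
    match kind {
        BackendKind::Table => Box::new(EdgeTableBackend::default()),
        BackendKind::Adjacency => Box::new(AdjacencyBackend::default()),
    }
}
```

Callers only see Box<dyn Backend>, so switching engines is a one-value configuration change.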
Core Graph Operations
- Entity/Node Management: Insert, update, retrieve, delete
- Edge Management: Create and manage typed relationships
- JSON Data Storage: Arbitrary JSON metadata on entities and edges
- Bulk Operations: Batch insert for higher throughput
Traversal & Querying
- Neighbor Queries: Get incoming/outgoing connections
- Pattern Matching: Graph pattern queries
- Traversal Algorithms: BFS, shortest path, connected components
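BFS and unweighted shortest paths are closely linked: the first time BFS discovers a node, it has reached it along a path with the fewest edges. The standalone sketch below runs over a plain adjacency map and is not SQLiteGraph's traversal API. The function name bfs_distances is hypothetical.

```rust
use std::collections::{HashMap, VecDeque};

/// Breadth-first search from `start`, returning the hop distance to every
/// reachable node. Each node is enqueued once, so the cost is O(|V| + |E|).
pub fn bfs_distances(adj: &HashMap<u32, Vec<u32>>, start: u32) -> HashMap<u32, usize> {
    let mut dist = HashMap::new();
    let mut queue = VecDeque::new();
    dist.insert(start, 0);
    queue.push_back(start);
    while let Some(node) = queue.pop_front() {
        let d = dist[&node];
        // Nodes without an entry in `adj` simply have no outgoing edges.
        for &next in adj.get(&node).into_iter().flatten() {
            if !dist.contains_key(&next) {
                dist.insert(next, d + 1);
                queue.push_back(next);
            }
        }
    }
    dist
}
```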
Graph Algorithms (Phase 8)
- PageRank: Importance ranking (O(|E|) per iteration)
- Betweenness Centrality: Node importance via shortest paths (O(|V||E|))
- Label Propagation: Fast community detection (O(|E|))
- Louvain Method: Modularity-based clustering (O(|E| log |V|))
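PageRank's O(|E|) cost is per iteration, because each power-iteration step pushes rank along every edge once. The textbook version is sketched below under stated assumptions. It is not necessarily how SQLiteGraph implements it, and the name power_iteration_pagerank is hypothetical. Dangling nodes (no out-edges) here spread their rank uniformly, which keeps the ranks summing to 1.

```rust
/// Power-iteration PageRank over out-adjacency lists `out[u] = [v, ...]`.
pub fn power_iteration_pagerank(out: &[Vec<usize>], damping: f64, iterations: usize) -> Vec<f64> {
    let n = out.len();
    if n == 0 {
        return Vec::new();
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iterations {
        // Teleport term shared by every node.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        let mut dangling = 0.0;
        for (u, targets) in out.iter().enumerate() {
            if targets.is_empty() {
                dangling += rank[u];
            } else {
                // Split u's rank evenly across its out-edges: one visit per edge.
                let share = damping * rank[u] / targets.len() as f64;
                for &v in targets {
                    next[v] += share;
                }
            }
        }
        // Redistribute rank held by dangling nodes uniformly.
        for r in next.iter_mut() {
            *r += damping * dangling / n as f64;
        }
        rank = next;
    }
    rank
}
```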
Performance & Reliability
- MVCC Snapshots: Read isolation with snapshot views
- Parallel WAL Recovery: 2-3x speedup for large WAL files (500+ transactions)
- Automated Benchmarks: Criterion-based regression detection
- Safety Tools: Orphan edge detection and integrity checks
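Snapshot read isolation with MVCC usually means each write creates a new version tagged with a commit timestamp. A snapshot then reads the newest version at or before its own timestamp. The minimal sketch below assumes a single logical clock. MvccStore and its methods are hypothetical and do not reflect SQLiteGraph's snapshot API.

```rust
use std::collections::HashMap;

/// Hypothetical multi-version store: each key keeps (commit_ts, value) versions
/// in ascending commit order.
#[derive(Default)]
pub struct MvccStore {
    versions: HashMap<u64, Vec<(u64, String)>>,
    clock: u64,
}

impl MvccStore {
    /// Commit a new version and return its commit timestamp.
    pub fn commit(&mut self, key: u64, value: &str) -> u64 {
        self.clock += 1;
        self.versions.entry(key).or_default().push((self.clock, value.to_string()));
        self.clock
    }

    /// A snapshot is just the timestamp of the latest commit.
    pub fn snapshot(&self) -> u64 {
        self.clock
    }

    /// Read `key` as of snapshot `ts`: newest version with commit_ts <= ts.
    /// Later commits stay invisible to older snapshots.
    pub fn read_at(&self, key: u64, ts: u64) -> Option<&str> {
        self.versions
            .get(&key)?
            .iter()
            .rev()
            .find(|(commit_ts, _)| *commit_ts <= ts)
            .map(|(_, v)| v.as_str())
    }
}
```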
Vector Search (HNSW)
- HNSW Algorithm: Hierarchical Navigable Small World for ANN search
- Supported Metrics: Cosine, Euclidean, Dot Product, Manhattan
- OpenAI Compatible: Support for 1536-dimensional embeddings
- Flexible Dimensions: Any dimension from 1 to 4096
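For reference, the four listed metrics are sketched below as plain functions over equal-length f32 slices. This is a standalone illustration, not SQLiteGraph's SIMD implementation, and the function names are hypothetical. Euclidean, Manhattan and cosine are returned as distances (smaller means closer), while the dot product is a similarity (larger means closer).

```rust
/// All functions assume `a` and `b` have the same length (zip stops at the shorter).
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

pub fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

pub fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Cosine distance = 1 - cos(theta); returns 1.0 if either vector is all zeros.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let norms = dot(a, a).sqrt() * dot(b, b).sqrt();
    if norms == 0.0 { 1.0 } else { 1.0 - dot(a, b) / norms }
}
```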
Developer Tools (Phase 9)
- Introspection API: GraphIntrospection for statistics and debugging
- Progress Tracking: ProgressCallback with ConsoleProgress
- CLI Debug Commands: debug-stats, debug-dump, debug-trace
- Algorithm CLI Commands: pagerank, betweenness, louvain with progress bars
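Progress reporting for long-running algorithms is typically a callback that the algorithm invokes after each unit of work. The sketch below illustrates that shape with a hypothetical ProgressSink trait. It is deliberately not named ProgressCallback, because the real trait's signature is not reproduced here. The Recorder type and run_iterations function are also hypothetical.

```rust
/// Hypothetical progress-reporting trait.
pub trait ProgressSink {
    fn on_progress(&mut self, done: usize, total: usize);
}

/// Collects every report; a console implementation would draw a bar instead.
#[derive(Default)]
pub struct Recorder {
    pub reports: Vec<(usize, usize)>,
}

impl ProgressSink for Recorder {
    fn on_progress(&mut self, done: usize, total: usize) {
        self.reports.push((done, total));
    }
}

/// An iterative job that reports after every step.
pub fn run_iterations<P: ProgressSink>(total: usize, sink: &mut P) -> usize {
    let mut work = 0;
    for i in 1..=total {
        work += i; // stand-in for one algorithm iteration
        sink.on_progress(i, total);
    }
    work
}
```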
Performance Benchmarks
Benchmark Methodology:
- Hardware: Linux x86_64 (kernel 6.18+)
- Sizes: 100-500 nodes (V2 backend has 8MB node region limit, ~2048 nodes max)
- Cache state: Warm (after warmup iterations)
- Measurements: Criterion-based statistical analysis (95% confidence interval)
Native V2 vs SQLite Backend (Phase 24, 2026-01-21):
| Operation | Size | Native V2 | SQLite | Ratio |
|---|---|---|---|---|
| Node Insert | 100 | 1.14 ms | 3.63 ms | 3.2x faster |
| Node Insert | 500 | 4.91 ms | 10.57 ms | 2.2x faster |
| Edge Insert (star) | 100 | 3.85 ms | 7.18 ms | 1.9x faster |
| BFS Traversal (star) | 100 | 4.68 ms | 7.28 ms | 1.6x faster |
| BFS Traversal (chain) | 100 | 15.38 ms | 7.24 ms | 2.1x slower |
| BFS Traversal (chain) | 500 | 266.50 ms | 24.98 ms | 10.7x slower |
| 1-Hop Query | 100 | 3.87 ms | 6.93 ms | 1.8x faster |
Key Findings:
- Native V2 excels at insert operations (1.3-3.2x faster)
- Star-pattern traversals favor Native V2 (clustered adjacency locality)
- Chain traversals regress on Native V2 (2.1x slower at 100 nodes, 10.7x slower at 500 nodes), attributed to V2 cluster lookup overhead versus SQLite's indexed adjacency
- Workload pattern matters: choose backend based on your graph shape and access patterns
Connection Pooling:
- Warm checkout: 205 ns (pooled) vs 16.4 µs (direct) = 79.8x faster
- First checkout overhead: ~5 ms (pool initialization)
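The pooled versus direct gap (205 ns vs 16.4 µs) comes from reuse. A warm checkout hands back an already-open connection, while a direct checkout pays the full setup cost every time. The minimal object pool below is a hypothetical illustration of that idea and is not SQLiteGraph's pool.

```rust
use std::sync::Mutex;

/// Minimal object pool: `checkout` reuses an idle item when one exists and only
/// calls the (expensive) factory when the pool is empty.
pub struct Pool<T> {
    idle: Mutex<Vec<T>>,
    factory: fn() -> T,
}

impl<T> Pool<T> {
    pub fn new(factory: fn() -> T) -> Self {
        Pool { idle: Mutex::new(Vec::new()), factory }
    }

    pub fn checkout(&self) -> T {
        // The lock guard is a temporary dropped at the end of this statement,
        // so the factory never runs while the lock is held.
        let reused = self.idle.lock().unwrap().pop();
        reused.unwrap_or_else(self.factory)
    }

    /// Return an item so the next checkout can reuse it.
    pub fn checkin(&self, item: T) {
        self.idle.lock().unwrap().push(item);
    }

    pub fn idle_count(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}
```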
HNSW Vector Search:
- Insertion: 3-5 ms for 100 vectors (64-256 dimensions)
- Search: Sub-millisecond typical latency
- Accuracy: 95%+ recall on standard datasets
Storage Efficiency:
- Native V2 format: 70%+ size reduction vs legacy V1 format
Caveats:
- Numbers are for single-node embedded use (not distributed)
- Performance varies based on graph topology, hardware, and configuration
- V2 backend currently constrained to ~2048 nodes (8MB reserved region)
- In-memory benchmarks show 1000-10000x headroom for future optimization
Quick Start
Add to your Cargo.toml:
[dependencies]
sqlitegraph = "1.2.7"
SQLite Backend (Default)
use ;
Native V2 Backend
[dependencies]
sqlitegraph = { version = "1.2.7", features = ["native-v2"] }
use ;
Pub/Sub Events (Native V2)
[dependencies]
sqlitegraph = { version = "1.2.7", features = ["native-v2"] }
use ;
use SubscriptionFilter;
let cfg = native;
let graph = open_graph?;
// Subscribe to all node change events
let filter = all;
let = graph.subscribe?;
// In a separate task or thread, receive events
while let Ok = rx.recv
// Unsubscribe when done
graph.unsubscribe?;
Backend Selection Guide
| Use Case | Recommended Backend | Why |
|---|---|---|
| Write-Heavy Workloads | Native V2 Backend | 1.3-3.2x faster insert operations |
| Star-Pattern Graphs | Native V2 Backend | Clustered adjacency benefits local queries |
| Chain-Depth Traversals | SQLite Backend | V2 has 2-10x chain traversal regression |
| Enterprise Applications | SQLite Backend | ACID transactions, tooling ecosystem |
| Existing SQLite Integration | SQLite Backend | Direct compatibility |
| Vector Search Workloads | Native V2 Backend | HNSW integration |
| Development/Testing | Either Backend | Unified API, both support in-memory |
| Small Graphs (<2K nodes) | Either Backend | Both fit; V2 is capped at ~2048 nodes by its node region limit, so larger graphs need SQLite |
Feature Flags
# Default - SQLite backend only
sqlitegraph = "1.2.7"
# Native V2 backend (with pub/sub support)
sqlitegraph = { version = "1.2.7", features = ["native-v2"] }
# Development features - I/O tracing
sqlitegraph = { version = "1.2.7", features = ["trace_v2_io"] }
CLI Tool
# Basic status
# List entities
# Export/import
# HNSW vector search
# Algorithm commands (with progress bars)
Graph Algorithms
use algo;
// PageRank - importance ranking
let scores = pagerank?;
// Betweenness Centrality - node importance via shortest paths
let centrality = betweenness_centrality?;
// Label Propagation - fast community detection
let communities = label_propagation?;
// Louvain - modularity-based clustering
let partition = louvain_communities?;
// With progress tracking
use ConsoleProgress;
let scores = pagerank_with_progress?;
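Label propagation computes the following. Every node starts in its own community, then repeatedly adopts the most frequent label among its neighbors until labels stop changing. The standalone sketch below runs over undirected adjacency lists. It is not the crate's label_propagation function, and the name propagate_labels is hypothetical. It uses in-place updates and breaks ties toward the smallest label so results are deterministic. Each pass costs O(|V| + |E|).

```rust
use std::collections::HashMap;

/// Label propagation with in-place (asynchronous) updates.
pub fn propagate_labels(adj: &[Vec<usize>], max_passes: usize) -> Vec<usize> {
    // Each node starts with its own index as its label.
    let mut labels: Vec<usize> = (0..adj.len()).collect();
    for _ in 0..max_passes {
        let mut changed = false;
        for node in 0..adj.len() {
            // Count neighbor labels.
            let mut counts: HashMap<usize, usize> = HashMap::new();
            for &nb in &adj[node] {
                *counts.entry(labels[nb]).or_insert(0) += 1;
            }
            // Highest count wins; ties go to the smallest label.
            if let Some((&best, _)) = counts
                .iter()
                .max_by(|a, b| a.1.cmp(b.1).then(b.0.cmp(a.0)))
            {
                if best != labels[node] {
                    labels[node] = best;
                    changed = true;
                }
            }
        }
        if !changed {
            break; // converged
        }
    }
    labels
}
```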
Testing
Test Coverage (v1.2.7):
- 59 pubsub tests passing (event emission, filtering, multiple subscribers)
- 42 WAL tests passing (recovery, corruption, checkpoints)
- 53 concurrent MVCC tests passing (snapshots, stress testing)
- 27 algorithm tests passing (PageRank, Betweenness, Louvain, Label Propagation)
- 134 HNSW tests passing
- 65 MVCC lifecycle tests passing
# Run all tests
# With Native V2 backend
# Run benchmarks
# Documentation tests
Grounded Tool Scripts
Keep every change truth-based by running the Magellan stack before touching files:
- scripts/watch-magellan.sh: starts magellan watch --root sqlitegraph/src with .codemcp/codegraph.db, scoped to the Rust sources.
- scripts/toolchain-ready.sh [symbol]: runs magellan status + llmgrep search (the symbol defaults to ToolRegistry) so you can verify tool readiness and capture execution IDs before editing.
Run these before any reading or editing so the CLI and LLM work from deterministic spans instead of guessing with rg.
Documentation
User Documentation
- Operator Manual - Comprehensive usage guide (14 sections)
- API Docs - Quick API reference
- CHANGELOG - Version history
Developer Documentation
- Documentation Index - Navigation for all docs
- Architecture - System architecture and design
- Testing Guide - Testing patterns and utilities
- Debugging Guide - Debugging and profiling
- Contributing - Contribution guidelines
Development Guides
Architecture
Design Principles
- 300 LOC Module Limit: Maintainable boundaries
- TDD Methodology: Test-driven development
- Performance Benchmarks: Criterion-based regression gates
Module Organization
- Core graph operations with dual backend support
- Graph algorithms (centrality, community detection)
- HNSW vector search with persistence
- MVCC snapshots for read isolation
- Introspection and debugging tools
Compiler Warnings
As of v1.2.7, SQLiteGraph builds with 73 intentional compiler warnings:
| Category | Count | Description |
|---|---|---|
| SIMD unsafe blocks | 18 | Under the Rust 2024 edition, the unsafe_op_in_unsafe_fn lint warns when SIMD intrinsics (AVX2) are called inside an unsafe fn without an explicit unsafe block. The intrinsics are low-overhead and necessary for performance. |
| Dead code (API completeness) | ~55 | Intentionally unused methods/fields preserved for: public API stability, future features, test-only functionality, and serialized format compatibility. |
These warnings are documented and acceptable: they reflect intentional design choices, not technical debt. The codebase compiles without errors under cargo check --lib, and all tests pass.
Grounded Development Workflow
SQLiteGraph uses a grounded tool workflow to prevent guessing and ensure code changes are truth-based:
- Magellan - Code graph indexing and symbol discovery
- llmgrep - Semantic code search with span references
- Splice / llm-transform - Span-safe code editing
This workflow ensures every code change is grounded in actual code graph data rather than assumptions.
Built With
SQLiteGraph was developed using the following grounded development tools:
| Tool | Description |
|---|---|
| Magellan (crates.io) | Code graph navigation and symbol analysis |
| Splice (crates.io) | Safe code editing with span-based operations |
| llmgrep (crates.io) | Semantic code search powered by embeddings |
License
GPL-3.0-or-later - see LICENSE for details.
Contributing
Contributions welcome. Please:
- Read the Contributing Guide
- Read the Architecture for system understanding
- Run tests to verify setup
- Follow TDD methodology
- Keep modules under 300 LOC
- Add tests for new features