sqlitegraph 1.2.7

Embedded graph database with full ACID transactions, HNSW vector search, and dual backend support
# SQLiteGraph

[![crates.io](https://img.shields.io/crates/v/sqlitegraph.svg)](https://crates.io/crates/sqlitegraph)
[![Documentation](https://docs.rs/sqlitegraph/badge.svg)](https://docs.rs/sqlitegraph)

**Embedded Graph Database with Native V2 Backend**

## What's New in v1.2.7

**Pub/Sub Event System** - In-process event notification for graph changes
- Four event types: `NodeChanged`, `EdgeChanged`, `KVChanged`, `SnapshotCommitted`
- ID-only design for decoupled event schemas
- Channel-based delivery with filtering by event type and entity IDs
- Native V2 backend only
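
The ID-only, filtered delivery model can be sketched with standard-library channels. This is a minimal illustration under stated assumptions, not sqlitegraph's implementation. `Event`, `Filter`, and `EventBus` are hypothetical names. The real crate exposes its own `SubscriptionFilter` and `subscribe` API.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical event type: carries only IDs, never node/edge payloads.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    NodeChanged { node_id: i64, snapshot_id: u64 },
    EdgeChanged { edge_id: i64, snapshot_id: u64 },
    KvChanged { snapshot_id: u64 },
    SnapshotCommitted { snapshot_id: u64 },
}

/// Hypothetical filter: which event kinds to receive, optionally limited to entity IDs.
#[derive(Default)]
pub struct Filter {
    pub nodes: bool,
    pub edges: bool,
    pub kv: bool,
    pub snapshots: bool,
    /// If non-empty, only node/edge events for these IDs pass.
    pub entity_ids: HashSet<i64>,
}

impl Filter {
    fn matches(&self, e: &Event) -> bool {
        let id_ok = |id: &i64| self.entity_ids.is_empty() || self.entity_ids.contains(id);
        match e {
            Event::NodeChanged { node_id, .. } => self.nodes && id_ok(node_id),
            Event::EdgeChanged { edge_id, .. } => self.edges && id_ok(edge_id),
            Event::KvChanged { .. } => self.kv,
            Event::SnapshotCommitted { .. } => self.snapshots,
        }
    }
}

/// Hypothetical in-process bus: one channel per subscriber.
#[derive(Default)]
pub struct EventBus {
    subscribers: Vec<(Filter, Sender<Event>)>,
}

impl EventBus {
    pub fn subscribe(&mut self, filter: Filter) -> Receiver<Event> {
        let (tx, rx) = channel();
        self.subscribers.push((filter, tx));
        rx
    }

    /// Send to every subscriber whose filter matches; a subscriber whose
    /// receiver has been dropped is removed when a send to it fails.
    pub fn publish(&mut self, event: Event) {
        self.subscribers
            .retain(|(filter, tx)| !filter.matches(&event) || tx.send(event.clone()).is_ok());
    }
}
```

A subscriber that receives `NodeChanged { node_id, snapshot_id }` would then read the node itself from the graph. This keeps the event schema independent of the node payload.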

**Full ACID Transactions** - Complete transaction correctness
- Atomicity with full rollback support
- Consistency validation at runtime
- Isolation via MVCC snapshots
- Durability with WAL recovery
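
Here is a toy multi-version store that makes the atomicity and isolation bullets concrete. It assumes a simple design where every commit gets a new version number and a snapshot is just the latest committed version. `MvccStore` and `Txn` are hypothetical. sqlitegraph's actual MVCC and WAL internals are not shown.

```rust
use std::collections::HashMap;

/// Toy multi-version store: each key keeps every committed version.
#[derive(Default)]
pub struct MvccStore {
    // key -> [(commit_version, value, or None as a delete tombstone)]
    versions: HashMap<u64, Vec<(u64, Option<String>)>>,
    last_commit: u64,
}

/// A pending transaction: writes are buffered until commit (atomicity).
#[derive(Default)]
pub struct Txn {
    writes: Vec<(u64, Option<String>)>,
}

impl Txn {
    pub fn put(&mut self, key: u64, value: &str) {
        self.writes.push((key, Some(value.to_string())));
    }
    pub fn delete(&mut self, key: u64) {
        self.writes.push((key, None));
    }
}

impl MvccStore {
    /// A snapshot is the last committed version at the time it is taken.
    pub fn snapshot(&self) -> u64 {
        self.last_commit
    }

    /// Apply all buffered writes under one new version number.
    pub fn commit(&mut self, txn: Txn) -> u64 {
        self.last_commit += 1;
        let version = self.last_commit;
        for (key, value) in txn.writes {
            self.versions.entry(key).or_default().push((version, value));
        }
        version
    }

    /// Read the newest value committed at or before `snapshot`.
    pub fn get(&self, key: u64, snapshot: u64) -> Option<&str> {
        self.versions
            .get(&key)?
            .iter()
            .rev()
            .find(|(version, _)| *version <= snapshot)
            .and_then(|(_, value)| value.as_deref())
    }
}
```

Rollback falls out of buffering: a `Txn` dropped without `commit` leaves no trace. A reader holding an older snapshot keeps seeing the value that was committed when the snapshot was taken.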

**Developer Documentation** - Comprehensive guides for contributors
- [Architecture](docs/ARCHITECTURE.md) - System design and data flow
- [Testing Guide](docs/TESTING.md) - Test patterns and utilities
- [Debugging Guide](docs/DEBUGGING.md) - Profiling and troubleshooting
- [Contributing](docs/CONTRIBUTING.md) - Development workflow

**Test Coverage**: 380+ tests passing (59 pubsub + 42 WAL + 53 MVCC + 27 algorithms + 134 HNSW + 65 others)

---

SQLiteGraph is an embedded graph database in Rust featuring a dual backend architecture. It provides SQLite and Native V2 storage options with graph algorithms, HNSW vector search, and MVCC snapshots.

See [CHANGELOG.md](CHANGELOG.md) for version history.

SQLiteGraph provides two backend options:
- **SQLite Backend**: SQLite storage with ACID transactions
- **Native V2 Backend**: Clustered adjacency storage with WAL

## Features

### Native V2 Architecture
- **Clustered Adjacency Storage**: Stores edges in clusters for locality
- **Write-Ahead Logging (WAL)**: Transaction logging with crash recovery
- **Snapshot System**: Export/import with lifecycle management
- **Cross-Platform Atomic Operations**: Concurrent access across platforms
- **Storage Format**: Binary format with 70%+ size reduction vs legacy V1
- **Pub/Sub Events**: In-process event notification for graph changes (Native V2 only)
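
A CSR-style layout is one common way to get the locality that clustered adjacency aims for. In it, each node's outgoing edges sit in one contiguous run. The sketch below assumes that layout purely for illustration. It is not the V2 on-disk format, and `ClusteredAdjacency` is a hypothetical name.

```rust
/// Edges for each source node stored contiguously ("clustered"), CSR-style.
pub struct ClusteredAdjacency {
    offsets: Vec<usize>, // offsets[n]..offsets[n + 1] is node n's cluster
    targets: Vec<u32>,
}

impl ClusteredAdjacency {
    /// Build from an unsorted (src, dst) edge list over nodes 0..node_count.
    pub fn build(node_count: usize, edges: &[(u32, u32)]) -> Self {
        // Count the out-degree of each node (shifted by one slot).
        let mut offsets = vec![0usize; node_count + 1];
        for &(src, _) in edges {
            offsets[src as usize + 1] += 1;
        }
        // A prefix sum turns degrees into cluster start offsets.
        for i in 0..node_count {
            offsets[i + 1] += offsets[i];
        }
        // Scatter each target into its source's cluster, keeping input order.
        let mut cursor = offsets.clone();
        let mut targets = vec![0u32; edges.len()];
        for &(src, dst) in edges {
            targets[cursor[src as usize]] = dst;
            cursor[src as usize] += 1;
        }
        Self { offsets, targets }
    }

    /// All outgoing neighbors of `node` as one contiguous slice.
    pub fn neighbors(&self, node: u32) -> &[u32] {
        let n = node as usize;
        &self.targets[self.offsets[n]..self.offsets[n + 1]]
    }
}
```

Reading a node's neighbors is a single contiguous slice rather than a set of scattered lookups.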

### Dual Backend Architecture
- **SQLite Backend**: Traditional SQLite with full ACID transactions
- **Native V2 Backend**: Clustered adjacency for traversal-heavy workloads
- **Unified API**: Single API works with both backends
- **Runtime Selection**: Switch backends via configuration

### Core Graph Operations
- **Entity/Node Management**: Insert, update, retrieve, delete
- **Edge Management**: Create and manage typed relationships
- **JSON Data Storage**: Arbitrary JSON metadata on entities and edges
- **Bulk Operations**: Batch insert for higher throughput

### Traversal & Querying
- **Neighbor Queries**: Get incoming/outgoing connections
- **Pattern Matching**: Graph pattern queries
- **Traversal Algorithms**: BFS, shortest path, connected components
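
On an unweighted graph, shortest path is a breadth-first search that records each node's parent. Here is a minimal sketch over an in-memory adjacency list. `bfs_shortest_path` is a hypothetical helper, not the crate's traversal API.

```rust
use std::collections::VecDeque;

/// Unweighted shortest path via BFS; returns the node sequence from `start` to `goal`.
pub fn bfs_shortest_path(adj: &[Vec<usize>], start: usize, goal: usize) -> Option<Vec<usize>> {
    let mut parent: Vec<Option<usize>> = vec![None; adj.len()];
    let mut visited = vec![false; adj.len()];
    let mut queue = VecDeque::new();
    visited[start] = true;
    queue.push_back(start);

    while let Some(node) = queue.pop_front() {
        if node == goal {
            // Walk parent links back to the start, then reverse.
            let mut path = vec![goal];
            let mut cur = goal;
            while let Some(p) = parent[cur] {
                path.push(p);
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        for &next in &adj[node] {
            if !visited[next] {
                visited[next] = true;
                parent[next] = Some(node);
                queue.push_back(next);
            }
        }
    }
    None // goal not reachable from start
}
```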

### Graph Algorithms (Phase 8)
- **PageRank**: Importance ranking (O(|E|) per iteration)
- **Betweenness Centrality**: Node importance via shortest paths (O(|V||E|))
- **Label Propagation**: Fast community detection (O(|E|) per iteration)
- **Louvain Method**: Modularity-based clustering (O(|E| log |V|))
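
Each power-iteration step of PageRank pushes rank along every edge once, so one step costs O(|V| + |E|). Here is a minimal sketch. It assumes an in-memory adjacency list and spreads rank from nodes with no out-edges uniformly. This `pagerank` is a standalone illustration, not `sqlitegraph::algo::pagerank`.

```rust
/// PageRank by power iteration over an out-adjacency list.
pub fn pagerank(adj: &[Vec<usize>], damping: f64, iterations: usize) -> Vec<f64> {
    let n = adj.len();
    if n == 0 {
        return Vec::new();
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iterations {
        // Teleport term: every node gets (1 - d) / n.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        let mut dangling = 0.0;
        // One pass over all edges per iteration.
        for (u, outs) in adj.iter().enumerate() {
            if outs.is_empty() {
                dangling += rank[u]; // no out-edges: redistributed uniformly below
            } else {
                let share = damping * rank[u] / outs.len() as f64;
                for &v in outs {
                    next[v] += share;
                }
            }
        }
        for r in next.iter_mut() {
            *r += damping * dangling / n as f64;
        }
        rank = next;
    }
    rank
}
```

Because dangling rank is redistributed, the scores always sum to 1.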

### Performance & Reliability
- **MVCC Snapshots**: Read isolation with snapshot views
- **Parallel WAL Recovery**: 2-3x speedup for large WAL files (500+ transactions)
- **Automated Benchmarks**: Criterion-based regression detection
- **Safety Tools**: Orphan edge detection and integrity checks
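
Crash recovery from a write-ahead log typically replays only the transactions whose commit record reached disk. The sketch below assumes a simple redo-only log with hypothetical `WalRecord` entries. It runs sequentially, so it shows the idea behind recovery but not the parallel variant.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical WAL record format (not sqlitegraph's on-disk layout).
#[derive(Debug, Clone)]
pub enum WalRecord {
    Write { txn: u64, key: u64, value: i64 },
    Commit { txn: u64 },
}

/// Redo-only recovery: apply writes from transactions that have a Commit record.
pub fn recover(log: &[WalRecord]) -> HashMap<u64, i64> {
    // Pass 1: collect committed transaction IDs.
    let committed: HashSet<u64> = log
        .iter()
        .filter_map(|r| match r {
            WalRecord::Commit { txn } => Some(*txn),
            _ => None,
        })
        .collect();

    // Pass 2: replay committed writes in log order; uncommitted writes are dropped.
    let mut state = HashMap::new();
    for record in log {
        if let WalRecord::Write { txn, key, value } = record {
            if committed.contains(txn) {
                state.insert(*key, *value);
            }
        }
    }
    state
}
```

Writes from a transaction that crashed before committing are ignored, so a torn tail of the log cannot leave partial updates behind.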

### Vector Search (HNSW)
- **HNSW Algorithm**: Hierarchical Navigable Small World for ANN search
- **Supported Metrics**: Cosine, Euclidean, Dot Product, Manhattan
- **OpenAI Compatible**: Support for 1536-dimensional embeddings
- **Flexible Dimensions**: Any dimension from 1 to 4096
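
For reference, the four metrics can be written over `f32` slices as shown below. These are the standard definitions, used here for illustration. The sketch does not show how sqlitegraph's HNSW index turns similarities such as the dot product into distances.

```rust
/// Dot product (a similarity: larger means closer).
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance.
pub fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// Manhattan (L1) distance.
pub fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Cosine distance = 1 - cosine similarity (undefined for zero vectors).
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    1.0 - dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}
```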

### Developer Tools (Phase 9)
- **Introspection API**: `GraphIntrospection` for statistics and debugging
- **Progress Tracking**: `ProgressCallback` with `ConsoleProgress`
- **CLI Debug Commands**: `debug-stats`, `debug-dump`, `debug-trace`
- **Algorithm CLI Commands**: `pagerank`, `betweenness`, `louvain` with progress bars

## Performance Benchmarks

**Benchmark Methodology:**
- Hardware: Linux x86_64 (kernel 6.18+)
- Sizes: 100-500 nodes (the V2 backend has an 8MB node region limit, ~2048 nodes max)
- Cache state: Warm (after warmup iterations)
- Measurements: Criterion-based statistical analysis (95% confidence interval)

**Native V2 vs SQLite Backend (Phase 24, 2026-01-21):**

| Operation | Size | Native V2 | SQLite | Ratio |
|-----------|------|-----------|--------|-------|
| Node Insert | 100 | 1.14 ms | 3.63 ms | 3.2x faster |
| Node Insert | 500 | 4.91 ms | 10.57 ms | 2.2x faster |
| Edge Insert (star) | 100 | 3.85 ms | 7.18 ms | 1.9x faster |
| BFS Traversal (star) | 100 | 4.68 ms | 7.28 ms | 1.6x faster |
| BFS Traversal (chain) | 100 | 15.38 ms | 7.24 ms | 2.1x **slower** |
| BFS Traversal (chain) | 500 | 266.50 ms | 24.98 ms | 10.7x **slower** |
| 1-Hop Query | 100 | 3.87 ms | 6.93 ms | 1.8x faster |

**Key Findings:**
- Native V2 excels at insert operations (1.3-3.2x faster)
- Star-pattern traversals favor Native V2 (clustered adjacency locality)
- Chain traversals show regression (V2 cluster lookup overhead vs SQLite indexed adjacency)
- Workload pattern matters: choose backend based on your graph shape and access patterns
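
The Ratio column in the table divides the slower time by the faster one and rounds to one decimal place. A small hypothetical helper for checking the table makes that explicit:

```rust
/// Express Native V2 time relative to SQLite time as "Nx faster" or "Nx slower".
pub fn ratio_label(native_ms: f64, sqlite_ms: f64) -> String {
    if native_ms <= sqlite_ms {
        format!("{:.1}x faster", sqlite_ms / native_ms)
    } else {
        format!("{:.1}x slower", native_ms / sqlite_ms)
    }
}
```

For example, `ratio_label(266.50, 24.98)` yields the 10.7x chain-traversal regression at 500 nodes.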

**Connection Pooling:**
- Warm checkout: 205 ns (pooled) vs 16.4 µs (direct) = **79.8x faster**
- First checkout overhead: ~5 ms (pool initialization)

**HNSW Vector Search:**
- Insertion: 3-5 ms for 100 vectors (64-256 dimensions)
- Search: Sub-millisecond typical latency
- Accuracy: 95%+ recall on standard datasets

**Storage Efficiency:**
- Native V2 format: 70%+ size reduction vs legacy V1 format

**Caveats:**
- Numbers are for single-node embedded use (not distributed)
- Performance varies based on graph topology, hardware, and configuration
- V2 backend currently constrained to ~2048 nodes (8MB reserved region)
- In-memory benchmarks show 1000-10000x headroom for future optimization

## Quick Start

Add to your `Cargo.toml`:

```toml
[dependencies]
sqlitegraph = "1.2.7"
serde_json = "1"  # used for the JSON `data` field in the examples
```

### SQLite Backend (Default)

```rust
use sqlitegraph::{SqliteGraph, GraphEntity, GraphEdge};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let graph = SqliteGraph::open_in_memory()?;

    let user_entity = GraphEntity {
        id: 0,
        kind: "User".to_string(),
        name: "Alice".to_string(),
        file_path: None,
        data: serde_json::json!({"age": 30}),
    };

    let user_id = graph.insert_entity(&user_entity)?;
    println!("Created entity: {}", user_id);

    Ok(())
}
```

### Native V2 Backend

```toml
[dependencies]
sqlitegraph = { version = "1.2.7", features = ["native-v2"] }
serde_json = "1"
tempfile = "3"  # only needed for the temporary directory in this example
```

```rust
use sqlitegraph::{GraphConfig, open_graph, NodeSpec};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = GraphConfig::native();
    let temp_dir = tempfile::tempdir()?;
    let db_path = temp_dir.path().join("graph.db");

    let graph = open_graph(&db_path, &cfg)?;

    let node_spec = NodeSpec {
        kind: "User".to_string(),
        name: "Alice".to_string(),
        file_path: None,
        data: serde_json::json!({"age": 30}),
    };
    let user_id = graph.insert_node(node_spec)?;

    println!("Created node: {}", user_id);
    Ok(())
}
```

### Pub/Sub Events (Native V2)

```toml
[dependencies]
sqlitegraph = { version = "1.2.7", features = ["native-v2"] }
```

```rust
use sqlitegraph::{GraphConfig, open_graph};
use sqlitegraph::backend::SubscriptionFilter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = GraphConfig::native();
    let graph = open_graph("graph.db", &cfg)?;

    // Subscribe to all event types (no event-type or entity-ID filtering)
    let filter = SubscriptionFilter::all();
    let (subscriber_id, rx) = graph.subscribe(filter)?;

    // Receive events on a separate thread; recv() blocks until an event
    // arrives and returns Err once the channel is closed
    std::thread::spawn(move || {
        while let Ok(event) = rx.recv() {
            println!("Event: {:?}", event);
            // Events contain only IDs: read the actual data from the graph using snapshot_id
        }
    });

    // ... write to the graph here ...

    // Unsubscribe when done
    graph.unsubscribe(subscriber_id)?;
    Ok(())
}
```

## Backend Selection Guide

| Use Case | Recommended Backend | Why |
|----------|-------------------|-----|
| **Write-Heavy Workloads** | Native V2 Backend | 1.3-3.2x faster insert operations |
| **Star-Pattern Graphs** | Native V2 Backend | Clustered adjacency benefits local queries |
| **Chain-Depth Traversals** | SQLite Backend | V2 has 2-10x chain traversal regression |
| **Enterprise Applications** | SQLite Backend | ACID transactions, tooling ecosystem |
| **Existing SQLite Integration** | SQLite Backend | Direct compatibility |
| **Vector Search Workloads** | Native V2 Backend | HNSW integration |
| **Development/Testing** | Either Backend | Unified API, both support in-memory |
| **Small Graphs (<2K nodes)** | Either Backend | Both fit; above ~2K nodes V2 hits its node region limit, so use SQLite |
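
The main rules in the table can be condensed into a decision function. This is only a sketch. `Shape`, `Backend`, and `recommend` are hypothetical names, and the 2048-node threshold is the approximate V2 limit quoted in the benchmarks. Defaulting mixed, read-heavy workloads to SQLite is an assumption, not guidance from the table.

```rust
/// Hypothetical workload shape; illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Shape {
    Star,
    Chain,
    Mixed,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Backend {
    Sqlite,
    NativeV2,
}

/// Approximate V2 node region limit (~8MB reserved).
const V2_NODE_LIMIT: usize = 2048;

pub fn recommend(node_count: usize, shape: Shape, write_heavy: bool) -> Backend {
    if node_count > V2_NODE_LIMIT {
        return Backend::Sqlite; // beyond the V2 node region limit
    }
    match shape {
        Shape::Chain => Backend::Sqlite,                  // V2 chain traversals measured 2-10x slower
        Shape::Star => Backend::NativeV2,                 // clustered adjacency favors local queries
        Shape::Mixed if write_heavy => Backend::NativeV2, // faster inserts
        Shape::Mixed => Backend::Sqlite,                  // assumption: default to SQLite
    }
}
```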

### Feature Flags

```toml
# Default - SQLite backend only
sqlitegraph = "1.2.7"

# Native V2 backend (with pub/sub support)
sqlitegraph = { version = "1.2.7", features = ["native-v2"] }

# Development features - I/O tracing
sqlitegraph = { version = "1.2.7", features = ["trace_v2_io"] }
```

## CLI Tool

```bash
# Basic status
sqlitegraph --command status --database memory

# List entities
sqlitegraph --command list --database mygraph.db

# Export/import
sqlitegraph --command dump-graph --output backup.json --database mygraph.db
sqlitegraph --command load-graph --input backup.json --database mygraph.db

# HNSW vector search
sqlitegraph --backend sqlite --db mygraph.db hnsw-create --dimension 768 --distance-metric cosine
sqlitegraph --backend sqlite --db mygraph.db hnsw-insert --index-name vectors --input vectors.json
sqlitegraph --backend sqlite --db mygraph.db hnsw-search --index-name vectors --input query.json --k 10

# Algorithm commands (with progress bars)
sqlitegraph --backend sqlite --db mygraph.db pagerank --progress
sqlitegraph --backend sqlite --db mygraph.db betweenness --progress
sqlitegraph --backend sqlite --db mygraph.db louvain --progress
```

## Graph Algorithms

```rust
use sqlitegraph::algo;
use sqlitegraph::progress::ConsoleProgress;

// Assumes `graph` is an open graph handle (see Quick Start)

// PageRank - importance ranking (0.85 = damping factor)
let scores = algo::pagerank(&graph, 0.85, 50)?;

// Betweenness Centrality - node importance via shortest paths
let centrality = algo::betweenness_centrality(&graph)?;

// Label Propagation - fast community detection
let communities = algo::label_propagation(&graph)?;

// Louvain - modularity-based clustering
let partition = algo::louvain_communities(&graph, 0.01)?;

// With progress tracking
let scores = algo::pagerank_with_progress(&graph, 0.85, 50, ConsoleProgress::new())?;
```
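
Label propagation is simple enough to sketch in full. This version assumes an undirected in-memory adjacency list and updates labels in place, in node order. It breaks ties toward the smaller label so results are deterministic. It illustrates the technique and is not the code behind `algo::label_propagation`.

```rust
use std::collections::HashMap;

/// Label propagation: each node adopts the most frequent label among its neighbors.
pub fn label_propagation(adj: &[Vec<usize>], max_iters: usize) -> Vec<usize> {
    // Every node starts in its own community.
    let mut labels: Vec<usize> = (0..adj.len()).collect();
    for _ in 0..max_iters {
        let mut changed = false;
        for node in 0..adj.len() {
            if adj[node].is_empty() {
                continue; // isolated nodes keep their own label
            }
            let mut counts: HashMap<usize, usize> = HashMap::new();
            for &nb in &adj[node] {
                *counts.entry(labels[nb]).or_insert(0) += 1;
            }
            // Highest count wins; ties go to the smaller label.
            let best = counts
                .into_iter()
                .max_by(|(la, ca), (lb, cb)| ca.cmp(cb).then(lb.cmp(la)))
                .map(|(label, _)| label)
                .unwrap();
            if best != labels[node] {
                labels[node] = best;
                changed = true;
            }
        }
        if !changed {
            break; // converged: no label moved in a full pass
        }
    }
    labels
}
```

Each pass visits every edge once, which matches the O(|E|) cost per iteration of label propagation.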

## Testing

**Test Coverage (v1.2.7):**
- 59 pubsub tests passing (event emission, filtering, multiple subscribers)
- 42 WAL tests passing (recovery, corruption, checkpoints)
- 53 concurrent MVCC tests passing (snapshots, stress testing)
- 27 algorithm tests passing (PageRank, Betweenness, Louvain, Label Propagation)
- 134 HNSW tests passing
- 65 MVCC lifecycle tests passing

```bash
# Run all tests
cargo test --workspace

# With Native V2 backend
cargo test --workspace --features native-v2

# Run benchmarks
cargo bench

# Documentation tests
cargo test --doc
```

## Grounded Tool Scripts

Keep every change truth-based by running the Magellan stack before touching files:

- `scripts/watch-magellan.sh` — starts `magellan watch --root sqlitegraph/src` with `.codemcp/codegraph.db` scoped to the Rust sources.
- `scripts/toolchain-ready.sh [symbol]` — runs `magellan status` + `llmgrep search` (defaults to `ToolRegistry`) so you can verify tool readiness and capture execution IDs before editing.

Run these before reading or editing anything, so that the CLI and LLM work from deterministic spans instead of guessing with `rg`.

## Documentation

### User Documentation
- **[Operator Manual](MANUAL.md)** - Comprehensive usage guide (14 sections)
- **[API Docs](API.md)** - Quick API reference
- **[CHANGELOG](CHANGELOG.md)** - Version history

### Developer Documentation
- **[Documentation Index](docs/INDEX.md)** - Navigation for all docs
- **[Architecture](docs/ARCHITECTURE.md)** - System architecture and design
- **[Testing Guide](docs/TESTING.md)** - Testing patterns and utilities
- **[Debugging Guide](docs/DEBUGGING.md)** - Debugging and profiling
- **[Contributing](docs/CONTRIBUTING.md)** - Contribution guidelines

### Development Guides
- **[Adding a Graph Algorithm](docs/DEVELOPMENT_GUIDES/adding-a-graph-algorithm.md)**
- **[Adding a Distance Metric](docs/DEVELOPMENT_GUIDES/adding-a-distance-metric.md)**
- **[Adding a CLI Command](docs/DEVELOPMENT_GUIDES/adding-a-cli-command.md)**

## Architecture

### Design Principles
- **300 LOC Module Limit**: Maintainable boundaries
- **TDD Methodology**: Test-driven development
- **Performance Benchmarks**: Criterion-based regression gates

### Module Organization
- Core graph operations with dual backend support
- Graph algorithms (centrality, community detection)
- HNSW vector search with persistence
- MVCC snapshots for read isolation
- Introspection and debugging tools

## Compiler Warnings

As of v1.2.7, SQLiteGraph builds with **73 intentional compiler warnings**:

| Category | Count | Description |
|----------|-------|-------------|
| SIMD unsafe blocks | 18 | Under the Rust 2024 edition, calling SIMD intrinsics (AVX2) inside an `unsafe fn` without an explicit `unsafe` block triggers the `unsafe_op_in_unsafe_fn` warning. The intrinsics themselves are necessary for performance. |
| Dead code (API completeness) | ~55 | Intentionally unused methods/fields preserved for: public API stability, future features, test-only functionality, and serialized format compatibility. |

**These warnings are documented and acceptable**: they represent intentional design choices, not technical debt. The library passes `cargo check --lib` without errors, and all tests pass.

### Grounded Development Workflow

SQLiteGraph uses a **grounded tool workflow** to prevent guessing and ensure code changes are truth-based:

1. **Magellan** - Code graph indexing and symbol discovery
   ```bash
   magellan watch --root sqlitegraph/src --db .codemcp/codegraph.db --debounce-ms 500
   ```

2. **llmgrep** - Semantic code search with span references
   ```bash
   llmgrep search --db .codemcp/codegraph.db --query "symbolName" --output json
   ```

3. **Splice / llm-transform** - Span-safe code editing
   ```bash
   splice edit --span-id <id> --execution-id <exec_id> ...
   ```

This workflow ensures every code change is grounded in actual code graph data rather than assumptions.

## Built With

SQLiteGraph was developed using the following grounded development tools:

| Tool | Description |
|------|-------------|
| **[Magellan](https://github.com/oldnordic/magellan)** ([crates.io](https://crates.io/crates/magellan)) | Code graph navigation and symbol analysis |
| **[Splice](https://github.com/oldnordic/splice)** ([crates.io](https://crates.io/crates/splice)) | Safe code editing with span-based operations |
| **[llmgrep](https://github.com/oldnordic/llmgrep)** ([crates.io](https://crates.io/crates/llmgrep)) | Semantic code search powered by embeddings |

## License

GPL-3.0-or-later - see [LICENSE](LICENSE) for details.

## Contributing

Contributions welcome. Please:
1. Read the [Contributing Guide](docs/CONTRIBUTING.md)
2. Read the [Architecture](docs/ARCHITECTURE.md) for system understanding
3. Run tests to verify setup
4. Follow TDD methodology
5. Keep modules under 300 LOC
6. Add tests for new features