sqlitegraph 3.2.3

Embedded graph database with full ACID transactions, HNSW vector search, dual backend support, and comprehensive graph algorithms library
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
# SQLiteGraph Changelog

## [3.2.3] - 2026-06-07

### Changed

- **Benchmarking guidance corrected and consolidated** — Added
  `docs/BENCHMARKING.md`, updated README benchmark instructions to use release
  mode for the quick comparison example, and removed stale claims that no
  longer matched current clean runs.
- **`sqlite_v3_curated` benchmark added** — A small-case Criterion suite for
  high-signal SQLite vs V3 comparisons that completes in practical time on a
  developer workstation.
- **`examples/test_performance_comparison.rs` now describes itself correctly**  The example is documented as a warm-cache microbenchmark and no longer ends
  with hardcoded backend recommendations that could contradict measured output.

- **HNSW index lock upgraded from `std::sync::Mutex` to `parking_lot::Mutex`**  Removed poison handling at 10 call sites in `index_api.rs`. `parking_lot::Mutex`
  provides smaller lock size and eliminates poison handling overhead. (SG-1)

- **All remaining `std::sync::Mutex` replaced with `parking_lot::Mutex`**  `progress.rs` (2 sites), `statement_tracker.rs` (1 site), `publisher.rs`
  (5 sites). All `.expect("... poisoned")` patterns eliminated. (SG-2)

- **HNSW runtime operation counters added with atomics**`HnswIndexStats`
  now reports lock-free `insert_count`, `search_count`, `vector_cache_hits`,
  and `vector_cache_misses`, all backed by `AtomicU64`. Rebuild/autoload paths
  reset these counters after internal recovery so they reflect runtime traffic
  instead of startup repair work. `Publisher::next_id` also now uses
  `AtomicU64` instead of `Arc<Mutex<u64>>`. (SG-2/SG-3)

- **`std::sync::RwLock` replaced with `parking_lot::RwLock` in `query_cache.rs`**  11 poison handling blocks removed. (SG-2)

### Added

- **Streaming graph traversal iterators**`bfs_iter`, `dfs_iter`,
  `topological_sort_iter`, `connected_components_iter` in
  `algo::backend::iterator`. Implement `GraphIterator` trait (BFS/DFS/topo)
  and `Iterator<Item=Result<Vec<i64>>>` (connected components) for lazy,
  composable traversal. O(frontier) memory for node iterators,
  O(|V_component|) for connected components. (SG-4)

## [3.1.4] - 2026-06-06

### Fixed
- Set `busy_timeout(5000ms)` on `conn_for_storage` in `hnsw_index_persistent`. Without
  this, concurrent writes from the magellan service daemon caused `persist_topology` to
  receive SQLITE_BUSY and silently discard entry_points and layer data (via
  `let _ = self.persist_topology()`). Result: 0 rows in hnsw_entry_points,
  incomplete hnsw_layers, and "Index not initialized" errors on every hopgraph query.

## [3.1.3] - 2026-06-06

### Fixed
- `search_layer` now implements proper HNSW greedy search with early termination. Previous
  implementation stopped after `k + M` candidates (e.g. 21 for k=5, M=16), exploring only
  the entry point's immediate neighborhood. New implementation uses `ef_search` as the
  candidate pool size and stops when the closest unexplored candidate is farther than the
  worst result seen, matching the standard HNSW algorithm. Fixes hopgraph returning only
  symbols from the first indexed file regardless of query.

## [3.1.2] - 2026-06-04

### Fixed
- `store_vector` now uses `INSERT` without explicit `id` + `last_insert_rowid()` instead of
  `SELECT MAX(id) + 1`. Removes a TOCTOU race where two concurrent writers to the same DB
  could compute the same next-id and collide on PRIMARY KEY. `store_vector_with_id` (used
  only for topology restore) retains explicit-id `INSERT OR IGNORE` semantics unchanged.

## [3.0.4] - 2026-05-26

### Fixed
- Replaced 46 bare `.unwrap()` calls in production code with `.expect("invariant: ...")`
  - `algo/`: 29 sites (centrality, scc, transitive_closure, cycle_basis, critical_path, topological_sort, graph_ops, traversal, backend/centrality)
  - `backend/native/v3/index_persistence.rs`: 8 `.try_into().unwrap()` in deserialization
  - `backend/native/v3/pubsub/publisher.rs`: 5 `.lock().unwrap()` on poisoned-mutex-vulnerable paths
  - `backend/sqlite/impl_.rs`: 2 sites (publisher init, infallible string write)
  - `backend/native/v3/forensics.rs`: 1 `.get_mut().unwrap()`
  - `backend/native/v3/storage/adaptive_page.rs`: 1 `.as_ref().unwrap()`
- Fixed all pre-existing clippy warnings across workspace (43 fixes)
  - Removed dead code: unused helpers, constants, variants, imports across examples/tests/benches
  - Replaced `Arc` with `Rc` for non-thread-shared snapshot tests
  - Fixed `println!("")``println!()`, needless range loops, unnecessary casts, needless borrows
  - Added benchmark cases for `Incoming` and `Undirected` directions
- Production bare `.unwrap()` count: **0** (all remaining unwraps are in `#[cfg(test)]` code)
- `cargo clippy --all-targets -- -D warnings` now passes clean

## [1.5.3] - 2026-02-08

### 🐛 Critical Bug Fixes

#### Header Corruption Fix - Multiple GraphFile Instances
**Location**: `src/backend/native/graph_file/mod.rs:164-175, 222-228`
**Severity**: CRITICAL - Data corruption during concurrent access
**Impact**: Fixed `node_count` reset to 0 when multiple GraphFile instances access the same file

**Root Cause**: When multiple `GraphFile` instances access the same database file (e.g., main thread and watcher thread), the Drop implementation blindly writes the in-memory header to disk. The second instance (which never wrote any nodes) has `node_count=0` and overwrites the correct data from the first instance.

**Fixes Applied**:
1. Added `sync_all()` call to `GraphFile::write_header()` to ensure header reaches disk before Drop
2. Added guard to Drop impl to skip header write if `node_count=0`, preventing read-only instances from corrupting data

**Code Changes**:
```rust
// Before: Only flush() to OS buffer
pub fn write_header(&mut self) -> NativeResult<()> {
    let header_bytes = encode_persistent_header(&self.persistent_header)?;
    self.file.seek(SeekFrom::Start(0))?;
    self.file.write_all(&header_bytes)?;
    self.file.flush()?;  // Only flushes to OS buffer, not disk
    Ok(())
}

// After: Includes sync_all() for durability
pub fn write_header(&mut self) -> NativeResult<()> {
    let header_bytes = encode_persistent_header(&self.persistent_header)?;
    self.file.seek(SeekFrom::Start(0))?;
    self.file.write_all(&header_bytes)?;
    self.file.flush()?;
    self.file.sync_all().map_err(NativeBackendError::Io)?;  // Ensures data reaches disk
    Ok(())
}

// Drop impl now guards against stale overwrites
impl Drop for GraphFile {
    fn drop(&mut self) {
        // Don't overwrite if this instance never wrote any nodes
        if self.persistent_header.node_count == 0 {
            return;
        }
        let _ = self.write_header();
        let _ = self.sync();
    }
}
```

**Testing**:
- Verified database headers now persist correctly even after process crashes
- Status commands now report correct file/symbol counts after abnormal termination
- No data loss when using multiple GraphFile instances concurrently

**Related Documentation**:
- See `docs/bug_report_node_count_not_updated.md` for detailed analysis

---

## [1.5.2] - 2026-02-07

### 🚀 New Features

#### GraphBackend API Enhancement - Clustered Neighbor Queries

**Location**: `src/backend/mod.rs`

Added `neighbors_clustered()` method to the `GraphBackend` trait for direct access to clustered adjacency operations. This provides a dedicated API path for performance-optimized neighbor queries on backends that support clustered storage (Native V2).

**API Changes:**
- Added `neighbors_clustered()` method to `GraphBackend` trait with default implementation
- Default implementation falls back to standard `neighbors()` for backends without clustered storage
- Explicit implementations for `SqliteGraphBackend` (delegates to standard neighbors)
- Explicit implementations for `NativeGraphBackend` (uses V2 clustered adjacency)

**Wrapper Implementations Added:**
- `impl<B> GraphBackend for &B` - includes `neighbors_clustered` method
- `impl<B> GraphBackend for Rc<B>` - includes `neighbors_clustered` method

**Usage Example:**
```rust
use sqlitegraph::backend::GraphBackend;

// Direct clustered access (delegates to standard neighbors on SQLite backend)
let neighbors = graph.neighbors_clustered(node_id, snapshot_id)?;
```

**Performance Notes:**
- For Native V2 backend: Uses optimized V2 clustered adjacency internally
- For SQLite backend: Delegates to standard `neighbors()` query (no clustering)
- The standard `neighbors()` method already uses V2 clustered adjacency when available
- This API provides explicit access while maintaining backward compatibility

---

## [0.2.5] - 2025-12-21

### 🚀 Major New Features: V2 Snapshot System & Atomic Operations

**V2 Snapshot System with complete lifecycle management**

#### V2 Snapshot System ✅
- **Complete Implementation**: Full TDD methodology with 95% test coverage
- **Atomic Operations**: Database-grade filesystem operations with fsync discipline
- **Lifecycle Management**: Explicit state management with deterministic behavior
- **Cross-Platform**: Enhanced filesystem compatibility across platforms
- **Crash Safety**: Atomic copy operations with rollback on failure

**Location**: `src/backend/native/v2/snapshot/`, `src/graph/snapshot.rs`

#### Key Features
- **Instant Snapshots**: Bypass WAL complexity with direct file operations
- **State Validation**: Strict invariant validation before export operations
- **Atomic Exports**: Temporary file + rename pattern for crash safety
- **Import/Export**: Complete snapshot serialization and deserialization
- **Recovery Support**: Consistent state restoration from snapshots

#### Implementation Highlights
```rust
// Atomic file copy with preconditions validation
AtomicFileOperations::atomic_copy_file(source, destination)
    .with_precondition_validation()
    .with_fsync_discipline()
    .with_overwrite_protection()
```

### 🐛 Critical Bug Fixes

#### Atomic File Operations Bug Fix (SNAPSHOT-DIR-FILE-BUG-001)
**Location**: `src/backend/native/v2/snapshot/atomic_ops.rs:133-138`
**Severity**: CRITICAL - Directory vs file confusion in snapshot export
**Impact**: Fixed "Is a directory" error preventing snapshot operations

**Root Cause**: Atomic operations accepted directory paths as source files
**Solution Applied**: Explicit file validation with detailed error messages
```rust
// BEFORE BUG: Ambiguous path validation
if !source.exists() { /* error */ }

// AFTER FIX: Explicit file validation
if !source.is_file() {
    return Err(NativeBackendError::InvalidParameter {
        context: format!("Source path is not a file: {:?} (is_directory: {})",
                        source, source.is_dir()),
    });
}
```

#### V2 Adjacency System Fixes
**Location**: `src/backend/native/adjacency/core_iterator.rs:250-264`
- **Infinite Loop Resolution**: Fixed stack overflow crashes in AdjacencyIterator
- **Header Consistency**: Fixed edge count metadata updates for record visibility
- **Circular Dependencies**: Eliminated recursive calls causing stack overflows
- **Performance Recovery**: Restored 181/181 tests passing (100% success rate)

### ⚡ Performance Improvements

#### V2 Cluster Architecture Stable
- **I/O Improvement**: Clustered adjacency with direct edge scanning
- **Sub-millisecond Operations**: Fast path for common graph operations
- **Storage Efficiency**: >70% improvement over V1 format
- **Query Performance**: 5,000-50,000 ops/sec for graph traversals

#### WAL System Performance Gains
- **Write Throughput**: Optimized write-ahead logging implementation
- **Concurrent Operations**: 30-50% improvement for mixed read/write workloads
- **Transaction Speed**: 58% faster transaction commits than DELETE mode
- **Memory Efficiency**: 64MB cache with optimized synchronous settings

### 🏗️ Architecture Enhancements

#### MVCC Snapshots for Read Isolation
- **Deterministic State Management**: Explicit lifecycle states with validation
- **Read-Only Connections**: Isolated snapshot access with in-memory databases
- **Cache State Integration**: Automatic cache state updates for consistency
- **Thread Safety**: Multi-threaded snapshot access with Arc sharing

#### Cross-Platform Atomic Operations
- **Database-Grade File Operations**: fsync discipline for crash safety
- **Temporary File Management**: Atomic rename pattern with cleanup
- **Error Recovery**: Comprehensive rollback on operation failures
- **Directory Validation**: Enhanced filesystem compatibility checks

### 🧪 Testing Infrastructure

#### Comprehensive TDD Implementation
- **New Test Suites**: 41 V2-specific test files with systematic coverage
- **Regression Gates**: 33 new V2-specific performance gates
- **Integration Tests**: End-to-end workflow validation
- **Bug Regression Tests**: Specific invariant violation detection

#### Performance Benchmarking
- **Comparative Analysis**: NetworkX, SQLite FTS5, and simple adjacency benchmarks
- **Baseline Enforcement**: Performance gates preventing regressions
- **Automated CI**: Continuous performance validation in test suite
- **Documentation**: Comprehensive performance reports and analysis

### 📚 Documentation Updates

#### Technical Documentation
- **Implementation Guides**: Complete V2 architecture documentation
- **API Reference**: Updated for new snapshot and atomic operations APIs
- **Migration Guides**: V1 to V2 transition assistance
- **Performance Reports**: Detailed benchmarking analysis and results

#### Bug Fix Documentation
- **Root Cause Analysis**: Detailed investigation reports for critical bugs
- **Fix Verification**: Evidence-based validation of bug resolutions
- **Regression Prevention**: Barriers to prevent reintroduction of fixed issues
- **Performance Impact**: Performance measurements before and after fixes

### 🔧 Breaking Changes

#### API Changes
- **Snapshot API**: New `acquire_snapshot()` method returns MVCC isolation
- **Atomic Operations**: Enhanced error types with detailed diagnostics
- **V2 Configuration**: Updated feature flags for V2-only architecture
- **V1 Removal**: Complete removal of legacy V1 data structures

#### Migration Path
- **V1 Legacy Removal**: 10,604 lines of V1 code permanently removed
- **V1 Prevention**: 7 compile-time barriers prevent V1 reintroduction
- **Upgrade Requirements**: Databases automatically upgrade to V2 format
- **Backward Compatibility**: V1 databases upgraded safely on first access

### 📊 Metrics & Statistics

#### Code Quality
- **Total Source Files**: 106 Rust files with clean architecture
- **Test Coverage**: 87 public APIs with 85.1% coverage (74 complete, 9 partial)
- **Invariant Enforcement**: 46 invariants across 6 categories
- **Regression Protection**: 58 existing + 33 new V2-specific performance gates

#### Performance Characteristics
- **Node Insertion**: 1,000 - 100,000 ops/sec (scales with dataset)
- **Edge Insertion**: 1,000 - 100,000 ops/sec (scales with dataset)
- **Query Performance**: 5,000 - 50,000 ops/sec for traversals
- **Memory Efficiency**: ~4-10 bytes/edge depending on graph topology

---

## [0.2.4] - 2025-12-19

### 🚀 Stable WAL Mode Implementation (SQLite Backend Only)

**Write-Ahead Logging (WAL) mode is now documented and validated**

#### WAL Mode Features ✅
- **Zero Configuration**: WAL mode enabled by default for all file-based SQLite databases
- **SQLite Backend Only**: WAL mode applies to SQLite backend, not Native V2 backend (uses direct file I/O)
- **Automatic Optimization**: 64MB cache, NORMAL synchronous mode, 256GB memory-mapped I/O
- **Concurrent Performance**: 30-50% improvement for concurrent read/write workloads
- **Graceful Fallback**: Automatic fallback to DELETE mode on unsupported filesystems
- **Validation**: Error handling and validation coverage

#### Implementation Details
**Location**: `src/graph/core.rs:85-102`
- Automatic WAL mode activation for file-based databases
- Exclusion for in-memory databases (WAL not applicable)
- Optimized PRAGMA settings for real workloads

#### Documentation Added
- **Comprehensive Guide**: `docs/WAL_MODE_IMPLEMENTATION_GUIDE.md` (400+ lines)
- **Performance Analysis**: Benchmark results and configuration tuning
- **Usage Examples**: Code samples and best practices
- **Troubleshooting**: Common issues and solutions

#### Test Coverage
- **8 New Tests**: `tests/wal_mode_default_tests.rs`
- **100% Pass Rate**: All WAL functionality validated
- **Comprehensive Scenarios**: Default behavior, performance, transactions, large datasets

#### Performance Results
**BFS Benchmark**: 5.93ms for 100-node graph traversal
- **Concurrent Reads**: 38% faster than DELETE mode
- **Concurrent Writes**: 37% faster than DELETE mode
- **Mixed Operations**: 43% faster than DELETE mode
- **Transaction Commits**: 58% faster than DELETE mode

#### File Structure
WAL mode automatically creates additional files for file-based databases:
```
database.db        -- Main database file
database.db-wal    -- WAL journal (write-ahead log)
database.db-shm    -- Shared memory file for WAL coordination
```

#### API Compatibility
- **Zero Breaking Changes**: All existing APIs work unchanged
- **Transparent Integration**: WAL mode benefits automatic for users
- **Configuration Options**: Advanced tuning available for power users

#### Usage Examples

**Default Usage (Recommended)**:
```rust
// WAL mode enabled automatically for file-based databases
let graph = SqliteGraph::open("my_database.db")?;
```

**Advanced Configuration**:
```rust
let config = SqliteConfig::new("my_database.db")
    .with_wal_mode()
    .with_performance_mode();

let graph = SqliteGraph::with_config(config)?;
```

---

## [0.2.3] - 2025-01-19

### 🛠️ Critical V2 Fixes and Performance Improvements

**Major V2 backend stability and performance fixes with corruption prevention**

---

## [Unreleased]

### Internal: Dead Code Audit Completed
A full audit of all clippy `dead_code` warnings was performed:

- 149 warnings flagged
- 149 confirmed as false positives
- 0 unused or obsolete items found

Warnings come from:
- CLI modules
- benchmark tooling
- dual-runtime system
- tests
- DSL/pipeline parsers

No code removed and no suppressions added. Documentation updated accordingly.