SQLiteGraph

Deterministic, embedded graph database with SQLite and high-performance Native V2 backends.

Current Status: Active Development

SQLiteGraph provides two backend options for different use cases:

SQLite Backend: Mature, ACID-compliant, WAL mode enabled by default for concurrent performance (file-based databases)
Native V2 Backend: High-performance, clustered adjacency with complete WAL implementation for transactional durability (direct file I/O)

Features

Dual Backend Architecture

SQLite Backend: Traditional SQLite storage with full ACID transactions and WAL mode for concurrency
Native V2 Backend: Custom binary format with clustered adjacency for maximum performance
Backend Abstraction: Unified API works with either backend
Easy Migration: Switch backends with configuration changes

WAL Mode (Both Backends)

SQLite Backend: WAL mode enabled by default for all file-based databases
Native V2 Backend: Complete WAL implementation with cluster-affinity logging
Automatic Enablement: WAL enabled by default (SQLite) or available via config (Native V2)
Concurrent Performance: 30-50% improvement for concurrent read/write workloads
Crash Recovery: Full transaction recovery from WAL logs
ACID Compliance: Full transaction support with rollback capabilities
Automatic File Management: WAL and SHM files created and managed automatically
Network Filesystem Support: Graceful fallback to DELETE mode when WAL is unsupported

Core Graph Operations

Entity Management: Insert, update, retrieve, delete graph entities
Edge Management: Create and manage relationships between entities
JSON Data Storage: Arbitrary JSON metadata with entities and edges
Deterministic Operations: Consistent ordering and behavior

Traversal & Querying

Neighbor Queries: Get incoming/outgoing connections
Pattern Matching: Advanced graph pattern queries with fast-path caching
Traversal Algorithms: BFS, shortest path, connected components
Query Cache: Cached K-hop and shortest path queries
Reasoning Pipelines: Multi-step analysis with filtering and scoring

HNSW Vector Search

Approximate Nearest Neighbor: O(log N) search complexity
High Performance: In-memory vector index with 95%+ recall
Multiple Distance Metrics: Cosine, Euclidean, Dot Product, Manhattan
SIMD Optimized: AVX2/AVX-512 support for distance calculations
Dynamic Updates: Insert and delete vectors without full rebuilds
Configuration: Flexible HNSW parameters for accuracy/speed tradeoffs

Bulk Operations & Snapshots

Bulk Insert: High-performance batch entity and edge insertion
Snapshot Export: Atomic graph snapshot creation with 70%+ storage efficiency
Snapshot Import: Fast graph restoration from snapshots
Cross-Platform: Platform-independent binary snapshot format

MVCC & Transactions

MVCC Snapshots: Read isolation with snapshot consistency
Transaction Support: Full ACID transactions (SQLite) or WAL transactions (Native V2)
Rollback: Complete transaction rollback capabilities

Performance & Safety

Benchmark Gates: Automated performance regression prevention
Safety Tools: Orphan edge detection, integrity validation
Memory Management: Configurable caching and buffer management
Error Handling: Comprehensive error reporting and recovery

Quick Start

Add to your Cargo.toml:

[dependencies]
sqlitegraph = "0.2.1"

SQLite Backend (Default)

use sqlitegraph::{SqliteGraph, GraphEntity, GraphEdge};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create in-memory SQLite graph
    let graph = SqliteGraph::open_in_memory()?;

    // Create entities with proper structure
    let user_entity = GraphEntity {
        id: 0, // Assigned by database
        kind: "User".to_string(),
        name: "Alice".to_string(),
        file_path: None,
        data: serde_json::json!({"age": 30}),
    };

    let project_entity = GraphEntity {
        id: 0,
        kind: "Project".to_string(),
        name: "SQLiteGraph".to_string(),
        file_path: None,
        data: serde_json::json!({"status": "active"}),
    };

    // Insert entities
    let user_id = graph.insert_entity(&user_entity)?;
    let project_id = graph.insert_entity(&project_entity)?;

    // Create relationship
    let works_on_edge = GraphEdge {
        id: 0,
        from_id: user_id,
        to_id: project_id,
        edge_type: "works_on".to_string(),
        data: serde_json::json!({"role": "developer"}),
    };

    let edge_id = graph.insert_edge(&works_on_edge)?;

    println!("Created graph: {} entities, {} edges", 2, 1);
    println!("Edge ID: {}", edge_id);

    Ok(())
}

Native V2 Backend (High Performance)

Enable the Native V2 backend in your Cargo.toml:

[dependencies]
sqlitegraph = { version = "0.2.1", features = ["native-v2"] }

use sqlitegraph::{GraphConfig, open_graph, NodeSpec, EdgeSpec};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Use Native V2 backend
    let cfg = GraphConfig::native();
    let temp_dir = tempfile::tempdir()?;
    let db_path = temp_dir.path().join("graph.db");

    let graph = open_graph(&db_path, &cfg)?;

    // Insert nodes
    let node_spec = NodeSpec {
        kind: "User".to_string(),
        name: "Alice".to_string(),
        file_path: None,
        data: serde_json::json!({"age": 30}),
    };
    let user_id = graph.insert_node(node_spec)?;

    // Insert edges
    let edge_spec = EdgeSpec {
        from: user_id,
        to: user_id, // self-loop for demo
        edge_type: "self_ref".to_string(),
        data: serde_json::json!({"type": "demo"}),
    };
    let edge_id = graph.insert_edge(edge_spec)?;

    println!("Native V2: Node {}, Edge {}", user_id, edge_id);
    Ok(())
}

Testing

# Run all tests
cargo test

# Test specific backend
cargo test --features native-v2

# Run benchmarks
cargo bench

# Run working examples
cargo run --example basic_functionality_test
cargo run --example native_v2_test --features native-v2

Current Capabilities

What Works Today

Core Operations:

Entity CRUD operations with JSON metadata
Edge creation and management
In-memory and persistent storage
Both backends fully functional

Performance:

Native V2: 50K-100K operations/second (benchmarked)
SQLite: Standard SQLite performance with optimizations
Deterministic behavior across platforms

Data Integrity:

ACID transactions (SQLite backend)
Corruption prevention in V2 backend
Comprehensive safety checks
Benchmark regression gates

Current Limitations

Scope:

Focused on embedded use cases (not distributed)
Single-machine graph processing
No built-in clustering or replication

API Surface:

Concentrated on graph operations, limited advanced analytics
No built-in machine learning or advanced analytics
Limited visualization capabilities

Performance Characteristics:

Native V2 optimized for read-heavy workloads
Write performance varies by workload pattern
Large graphs (>1M edges) may need tuning

Documentation

Manual - Detailed operator guide
API Documentation - Complete API reference
Examples - Working code examples
CHANGELOG - Version history and changes

License

GPL-3.0-only - see LICENSE for details.

Development Notes

V2 Architecture Status

V2 Native Backend Status

All V1 legacy code removed
Clustered adjacency storage implemented
Corruption prevention active
Comprehensive test coverage
Experimental high-performance features

Performance Benchmarks

Current performance characteristics (Native V2):

Node insertion: ~50K ops/sec
Edge insertion: ~100K ops/sec
Traversal: Varies by graph structure
Memory usage: Optimized with configurable buffers

Known Limitations

Compilation Warnings: ~50 warnings (non-critical, mostly unused code paths)
Single Machine: No built-in distributed capabilities
Memory Usage: Large graphs may require buffer tuning
Documentation: API evolving as new features added

sqlitegraph 0.2.10