sqlite-knowledge-graph 0.11.1

A Rust library for building and querying knowledge graphs using SQLite as the backend, with graph algorithms, vector search, and RAG support

SQLite Knowledge Graph

A Rust library for building and querying knowledge graphs using SQLite as the backend, with graph algorithms and RAG support.

Features

Core Features

  • Entity Management: Create, read, update, and delete typed entities with JSON properties
  • Relation Storage: Define weighted relations between entities with graph traversal support
  • Vector Search: Store embeddings and perform semantic search using cosine similarity
  • Transaction Support: Batch operations with ACID guarantees
  • SQLite Native: Full SQLite compatibility with bundling for portability
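For readers unfamiliar with cosine-similarity search, the idea behind the Vector Search feature can be sketched with the standard library alone. This is an illustrative brute-force version, not the crate's implementation; the names `cosine_similarity` and `top_k` are hypothetical.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns None if the lengths differ, the input is empty, or a norm is zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Brute-force top-k: score every stored vector, sort descending, truncate.
/// `stored` pairs a hypothetical entity id with its embedding.
fn top_k(query: &[f32], stored: &[(i64, Vec<f32>)], k: usize) -> Vec<(i64, f32)> {
    let mut scored: Vec<(i64, f32)> = stored
        .iter()
        .filter_map(|(id, v)| cosine_similarity(query, v).map(|s| (*id, s)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Vectors whose dimension does not match the query are skipped rather than treated as errors; a real store would more likely reject them at insert time.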

Graph Algorithms ✅

  • Path-finding: BFS, DFS, Shortest Path algorithms
  • Centrality: PageRank algorithm for importance ranking
  • Community Detection: Louvain algorithm for graph clustering
  • Connectivity: Connected components (weak and strong)
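As a rough illustration of depth-limited BFS, the traversal listed above can be sketched over an in-memory adjacency map. The function name `bfs` and the adjacency-map representation are assumptions made for this example; the crate itself reads relations from SQLite.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first traversal from `start`, visiting nodes at most `max_depth` hops away.
/// Returns (node_id, depth) pairs in visit order, excluding the start node.
fn bfs(adj: &HashMap<i64, Vec<i64>>, start: i64, max_depth: u32) -> Vec<(i64, u32)> {
    let mut visited = HashSet::from([start]);
    let mut queue = VecDeque::from([(start, 0u32)]);
    let mut order = Vec::new();
    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // do not expand beyond the depth limit
        }
        // Nodes without outgoing edges simply have no entry in the map.
        for &next in adj.get(&node).into_iter().flatten() {
            if visited.insert(next) {
                order.push((next, depth + 1));
                queue.push_back((next, depth + 1));
            }
        }
    }
    order
}
```

Replacing the queue with a stack (popping from the back) turns the same loop into a depth-first traversal.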

RAG Integration ✅

  • Two-Stage Retrieval: TurboQuant ANN (Stage 1) → exact cosine rerank (Stage 2) — MemRL
  • Graph Expansion: BFS-based candidate expansion via graph neighbours — RAPO
  • Context Sizing: Pool-prioritised BFS context collection per result — Memex(RL)
  • Quality Filtering: Configurable score thresholds — SuperLocalMemory
  • Pluggable Embedder: Embedder trait with SubprocessEmbedder (Python line protocol) built-in
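The two-stage retrieval and quality-filtering steps listed above can be sketched as follows. This is a simplified illustration, not the crate's engine: stage 1 uses a crude 1-bit sign quantization with Hamming distance as a stand-in for TurboQuant, and the names `sign_code`, `cosine`, and `two_stage_search` are hypothetical.

```rust
/// Stage-1 stand-in: 1-bit sign quantization (NOT TurboQuant).
/// Only the first 64 dimensions contribute to the code.
fn sign_code(v: &[f32]) -> u64 {
    v.iter()
        .take(64)
        .enumerate()
        .fold(0u64, |code, (i, &x)| if x > 0.0 { code | (1u64 << i) } else { code })
}

/// Exact cosine similarity; returns 0.0 when either norm is zero.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn two_stage_search(
    query: &[f32],
    stored: &[(i64, Vec<f32>)],
    top_k_candidates: usize,
    min_score: f32,
    k: usize,
) -> Vec<(i64, f32)> {
    // Stage 1: cheap approximate ranking by Hamming distance between sign codes.
    let q = sign_code(query);
    let mut coarse: Vec<(u32, usize)> = stored
        .iter()
        .enumerate()
        .map(|(idx, (_, v))| ((q ^ sign_code(v)).count_ones(), idx))
        .collect();
    coarse.sort();
    coarse.truncate(top_k_candidates);

    // Stage 2: exact cosine rerank of the survivors, then the score threshold.
    let mut exact: Vec<(i64, f32)> = coarse
        .into_iter()
        .map(|(_, idx)| (stored[idx].0, cosine(query, &stored[idx].1)))
        .filter(|&(_, s)| s >= min_score)
        .collect();
    exact.sort_by(|a, b| b.1.total_cmp(&a.1));
    exact.truncate(k);
    exact
}
```

The point of the split is cost: the approximate stage touches every vector but does almost no arithmetic, while the exact stage runs only on `top_k_candidates` items.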

SQLite Extension ✅

  • Loadable Extension: Use as a SQLite extension (.dylib on macOS, .so on Linux)
  • SQL Functions: Graph algorithms exposed as SQL functions
    • kg_version() - Extension version
    • kg_stats() - Graph statistics
    • kg_pagerank(damping, max_iterations, tolerance) - PageRank algorithm
    • kg_louvain() - Community detection
    • kg_bfs(start_id, max_depth) - BFS traversal
    • kg_shortest_path(from_id, to_id, max_depth) - Shortest path
    • kg_connected_components() - Connected components
  • CLI Tool: Command-line interface for common operations

Installation

Note: This crate is not yet published to crates.io. Use git dependency or local path for now.

Add this to your Cargo.toml:

[dependencies]
sqlite-knowledge-graph = { git = "https://github.com/hiyenwong/sqlite-knowledge-graph" }

Or for local development:

[dependencies]
sqlite-knowledge-graph = { path = "../sqlite-knowledge-graph" }

Semantic Search Dependencies

Semantic search requires vector embeddings generated by sentence-transformers. Install with:

pip install sentence-transformers

Default model: all-MiniLM-L6-v2 (384 dimensions, fast and accurate).

To generate embeddings for your knowledge graph:

sqlite-kg embed --model all-MiniLM-L6-v2 --db knowledge.db

Building SQLite Extension

cd sqlite-knowledge-graph
cargo build --release

# Extension will be at:
# target/release/libsqlite_knowledge_graph.dylib (macOS)
# target/release/libsqlite_knowledge_graph.so (Linux)

Quick Start

use sqlite_knowledge_graph::{KnowledgeGraph, Entity, Relation, PageRankConfig};

// Open or create a knowledge graph
let kg = KnowledgeGraph::open("knowledge.db")?;

// Create an entity with properties
let mut entity = Entity::new("paper", "Deep Learning Advances");
entity.set_property("author", serde_json::json!("Alice"));
entity.set_property("year", serde_json::json!(2024));
let paper_id = kg.insert_entity(&entity)?;

// Create a second entity and a relation between the two
let other_id = kg.insert_entity(&Entity::new("paper", "Neural Network Survey"))?;
let relation = Relation::new(paper_id, other_id, "cites", 0.8)?;
kg.insert_relation(&relation)?;

// Neighbors within 2 hops
let neighbors = kg.get_neighbors(paper_id, 2)?;

// Shortest path between entities (at most 5 hops)
let path = kg.kg_shortest_path(paper_id, other_id, 5)?;

// PageRank centrality
let pagerank = kg.kg_pagerank(None)?;

// Louvain community detection
let communities = kg.kg_louvain()?;

// Connected components
let components = kg.kg_connected_components()?;

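// Vector search for similar entities
let embedding: Vec<f32> = vec![0.1, 0.2, 0.3 /* ... */];
kg.insert_vector(paper_id, embedding.clone())?;
// Here the stored embedding doubles as the query; normally this comes from your embedder
let results = kg.search_vectors(embedding, 10)?;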

API Overview

KnowledgeGraph

The main entry point for the library.

impl KnowledgeGraph {
    // Connection
    pub fn open<P: AsRef<Path>>(path: P) -> Result<Self>
    pub fn open_in_memory() -> Result<Self>

    // Entity operations
    pub fn insert_entity(&self, entity: &Entity) -> Result<i64>
    pub fn get_entity(&self, id: i64) -> Result<Entity>
    pub fn list_entities(&self, entity_type: Option<&str>, limit: Option<i64>) -> Result<Vec<Entity>>
    pub fn update_entity(&self, entity: &Entity) -> Result<()>
    pub fn delete_entity(&self, id: i64) -> Result<()>

    // Relation operations
    pub fn insert_relation(&self, relation: &Relation) -> Result<i64>
    pub fn get_neighbors(&self, entity_id: i64, depth: u32) -> Result<Vec<Neighbor>>

    // Graph traversal
    pub fn kg_bfs_traversal(&self, start_id: i64, direction: Direction, max_depth: u32) -> Result<Vec<TraversalNode>>
    pub fn kg_dfs_traversal(&self, start_id: i64, direction: Direction, max_depth: u32) -> Result<Vec<TraversalNode>>
    pub fn kg_shortest_path(&self, from_id: i64, to_id: i64, max_depth: u32) -> Result<Option<TraversalPath>>
    pub fn kg_graph_stats(&self) -> Result<GraphStats>

    // Graph algorithms
    pub fn kg_pagerank(&self, config: Option<PageRankConfig>) -> Result<Vec<(i64, f64)>>
    pub fn kg_louvain(&self) -> Result<CommunityResult>
    pub fn kg_connected_components(&self) -> Result<Vec<Vec<i64>>>
    pub fn kg_analyze(&self) -> Result<GraphAnalysis>

    // Vector operations
    pub fn insert_vector(&self, entity_id: i64, vector: Vec<f32>) -> Result<()>
    pub fn search_vectors(&self, query: Vec<f32>, k: usize) -> Result<Vec<SearchResult>>

    // Legacy RAG helpers (simple, no graph expansion)
    pub fn kg_semantic_search(&self, query_embedding: Vec<f32>, k: usize) -> Result<Vec<SearchResultWithEntity>>
    pub fn kg_get_context(&self, entity_id: i64, depth: u32) -> Result<GraphContext>
    pub fn kg_hybrid_search(&self, query_text: &str, query_embedding: Vec<f32>, k: usize) -> Result<Vec<HybridSearchResult>>
}

// Paper-driven two-stage RAG engine (recommended)
impl RagEngine {
    pub fn new(config: RagConfig) -> Self
    pub fn search(&self, conn: &Connection, embedder: &dyn Embedder, query: &str, k: usize) -> Result<Vec<RagResult>>
}
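The Option in the return type of kg_shortest_path suggests that "no path within max_depth" is a normal outcome rather than an error. A minimal sketch of that behaviour, assuming an unweighted BFS over an in-memory adjacency map (the function `shortest_path` and its depth semantics are illustrative, not the crate's code):

```rust
use std::collections::{HashMap, VecDeque};

/// Unweighted shortest path via BFS with parent tracking.
/// Returns None when `to` is unreachable within `max_depth` hops.
fn shortest_path(
    adj: &HashMap<i64, Vec<i64>>,
    from: i64,
    to: i64,
    max_depth: u32,
) -> Option<Vec<i64>> {
    let mut parent: HashMap<i64, i64> = HashMap::from([(from, from)]);
    let mut queue = VecDeque::from([(from, 0u32)]);
    while let Some((node, depth)) = queue.pop_front() {
        if node == to {
            // Walk parents back to the start, then reverse.
            let mut path = vec![to];
            let mut cur = to;
            while cur != from {
                cur = parent[&cur];
                path.push(cur);
            }
            path.reverse();
            return Some(path);
        }
        if depth == max_depth {
            continue; // depth limit reached; do not expand this node
        }
        for &next in adj.get(&node).into_iter().flatten() {
            if !parent.contains_key(&next) {
                parent.insert(next, node);
                queue.push_back((next, depth + 1));
            }
        }
    }
    None
}
```

Because BFS visits nodes in order of hop count, the first time the target is dequeued its parent chain is a shortest path.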

Graph Algorithms

PageRank

use sqlite_knowledge_graph::PageRankConfig;

let config = PageRankConfig {
    damping: 0.85,      // Default: 0.85
    max_iterations: 100, // Default: 100
    tolerance: 1e-6,    // Default: 1e-6
};

let rankings = kg.kg_pagerank(Some(config))?;
for (entity_id, score) in rankings.iter().take(10) {
    println!("Entity {}: score = {:.4}", entity_id, score);
}
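To show what the three configuration fields control, here is a minimal power-iteration PageRank using only the standard library. It is a sketch under assumptions (nodes numbered 0..n, unweighted directed edges, dangling rank spread uniformly), not the crate's implementation, and the function name `pagerank` is hypothetical.

```rust
/// Power-iteration PageRank over nodes 0..n with directed edges (src, dst).
fn pagerank(
    n: usize,
    edges: &[(usize, usize)],
    damping: f64,
    max_iterations: usize,
    tolerance: f64,
) -> Vec<f64> {
    if n == 0 {
        return Vec::new();
    }
    let mut out_degree = vec![0usize; n];
    for &(src, _) in edges {
        out_degree[src] += 1;
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..max_iterations {
        // Rank held by dangling nodes (no out-edges) is spread uniformly.
        let dangling: f64 = (0..n)
            .filter(|&i| out_degree[i] == 0)
            .map(|i| rank[i])
            .sum();
        let base = (1.0 - damping) / n as f64 + damping * dangling / n as f64;
        let mut next = vec![base; n];
        for &(src, dst) in edges {
            next[dst] += damping * rank[src] / out_degree[src] as f64;
        }
        // Stop once the L1 change between iterations drops below the tolerance.
        let delta: f64 = next.iter().zip(&rank).map(|(a, b)| (a - b).abs()).sum();
        rank = next;
        if delta < tolerance {
            break;
        }
    }
    rank
}
```

A higher `damping` gives link structure more influence relative to the uniform jump, while `tolerance` and `max_iterations` bound how long the iteration runs.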

Paper-Driven RAG Engine

use sqlite_knowledge_graph::{RagEngine, RagConfig, embedder::SubprocessEmbedder};

// Spin up a Python embedding subprocess (see Installation above)
let embedder = SubprocessEmbedder::new("python3", &["embed_server.py"])?;

let engine = RagEngine::new(RagConfig {
    top_k_candidates: 50,      // Stage-1 ANN breadth (MemRL)
    top_k_rerank: 20,          // Stage-2 exact rerank (MemRL)
    enable_graph_expansion: true, // RAPO graph expansion
    max_context_entities: 5,   // Memex(RL) context limit
    min_combined_score: 0.3,   // SuperLocalMemory quality gate
    ..RagConfig::default()
});

let results = engine.search(kg.connection(), &embedder, "transformer architecture", 5)?;
for r in results {
    println!("{} (v={:.3} g={:.3} c={:.3})",
        r.entity.name, r.vector_score, r.graph_score, r.combined_score);
    println!("  context: {:?}", r.context_entities.iter().map(|e| &e.name).collect::<Vec<_>>());
}

Louvain Community Detection

let result = kg.kg_louvain()?;
println!("Found {} communities", result.num_communities);
println!("Modularity: {:.4}", result.modularity);

for (entity_id, community_id) in result.memberships {
    println!("Entity {} -> Community {}", entity_id, community_id);
}
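The modularity value reported above measures how much denser the communities are than a random graph with the same degrees would predict. As a sketch of the standard definition for an undirected, unweighted graph (the function `modularity` is illustrative, and whether the crate weights edges is not stated here):

```rust
use std::collections::HashMap;

/// Modularity Q of a partition of an undirected, unweighted graph:
/// Q = sum over communities c of [ L_c / m - (d_c / 2m)^2 ],
/// where m = edge count, L_c = edges inside c, d_c = total degree of nodes in c.
/// Panics if an edge endpoint is missing from `community`.
fn modularity(edges: &[(i64, i64)], community: &HashMap<i64, u32>) -> f64 {
    let m = edges.len() as f64;
    if m == 0.0 {
        return 0.0;
    }
    let mut internal: HashMap<u32, f64> = HashMap::new();
    let mut degree: HashMap<u32, f64> = HashMap::new();
    for &(a, b) in edges {
        let (ca, cb) = (community[&a], community[&b]);
        *degree.entry(ca).or_insert(0.0) += 1.0;
        *degree.entry(cb).or_insert(0.0) += 1.0;
        if ca == cb {
            *internal.entry(ca).or_insert(0.0) += 1.0;
        }
    }
    degree
        .iter()
        .map(|(c, d)| {
            let l = internal.get(c).copied().unwrap_or(0.0);
            l / m - (d / (2.0 * m)).powi(2)
        })
        .sum()
}
```

Putting every node in a single community always yields Q = 0, so a positive score indicates structure beyond chance.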

Connected Components

let components = kg.kg_connected_components()?;
println!("Found {} components", components.len());
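// Avoid indexing components[0]: the list may be empty, and ordering by size is not guaranteed here
let largest = components.iter().map(|c| c.len()).max().unwrap_or(0);
println!("Largest component: {} entities", largest);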
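For intuition, weakly connected components can be computed with a small union-find structure. The sketch below is an assumption-laden illustration (the names `find` and `connected_components`, the in-memory edge list, and the largest-first ordering are all choices made for this example), not the crate's implementation.

```rust
use std::collections::HashMap;

/// Find the representative of `x`, compressing the path along the way.
fn find(parent: &mut HashMap<i64, i64>, x: i64) -> i64 {
    let p = *parent.entry(x).or_insert(x);
    if p == x {
        return x;
    }
    let root = find(parent, p);
    parent.insert(x, root);
    root
}

/// Weakly connected components: edge direction is ignored.
/// Components are returned largest first; ids inside each component are sorted.
/// `nodes` should list every entity id, including isolated ones.
fn connected_components(nodes: &[i64], edges: &[(i64, i64)]) -> Vec<Vec<i64>> {
    let mut parent = HashMap::new();
    for &(a, b) in edges {
        let (ra, rb) = (find(&mut parent, a), find(&mut parent, b));
        if ra != rb {
            parent.insert(ra, rb); // union the two sets
        }
    }
    let mut groups: HashMap<i64, Vec<i64>> = HashMap::new();
    for &n in nodes {
        let root = find(&mut parent, n);
        groups.entry(root).or_default().push(n);
    }
    let mut components: Vec<Vec<i64>> = groups.into_values().collect();
    for c in &mut components {
        c.sort();
    }
    components.sort_by(|a, b| b.len().cmp(&a.len()).then(a.cmp(b)));
    components
}
```

Strongly connected components need a direction-aware algorithm such as Tarjan's or Kosaraju's; union-find only captures the weak variant.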

CLI Tool

# Show statistics
sqlite-kg stats --db knowledge.db

# Search entities
sqlite-kg search --query "neural network" --top-k 10 --db knowledge.db

# Get entity context
sqlite-kg context --id 123 --depth 2 --db knowledge.db

# Migrate data
sqlite-kg migrate --source knowledge.db --target kg.db

SQLite Extension Usage

-- Load extension
SELECT load_extension('./libsqlite_knowledge_graph', 'sqlite3_sqlite_knowledge_graph_init');

-- Get version
SELECT kg_version();
-- Returns: the extension version string (e.g. "0.7.0")

-- Get stats
SELECT kg_stats();
-- Returns: JSON with graph statistics

-- PageRank (optional parameters: damping, max_iterations, tolerance)
SELECT kg_pagerank();
SELECT kg_pagerank(0.85);           -- with custom damping
SELECT kg_pagerank(0.85, 100);      -- with custom damping and iterations
SELECT kg_pagerank(0.85, 100, 1e-6); -- full parameters
-- Returns: JSON with algorithm info and note to use Rust API for full results

-- Louvain community detection
SELECT kg_louvain();
-- Returns: JSON with algorithm info

-- BFS traversal (required: start_id, optional: max_depth)
SELECT kg_bfs(1);
SELECT kg_bfs(1, 3);
-- Returns: JSON with algorithm parameters

-- Shortest path (required: from_id, to_id, optional: max_depth)
SELECT kg_shortest_path(1, 5);
SELECT kg_shortest_path(1, 5, 10);
-- Returns: JSON with path parameters

-- Connected components
SELECT kg_connected_components();
-- Returns: JSON with algorithm info

-- Graph search example
WITH neural_papers AS (
    SELECT id, name FROM kg_entities 
    WHERE entity_type = 'paper' 
    AND name LIKE '%neural network%'
)
SELECT e.name, r.rel_type
FROM neural_papers np
JOIN kg_relations r ON r.source_id = np.id
JOIN kg_entities e ON r.target_id = e.id
WHERE e.entity_type = 'skill'
LIMIT 10;

Database Schema

kg_entities

CREATE TABLE kg_entities (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entity_type TEXT NOT NULL,
    name TEXT NOT NULL,
    properties TEXT,  -- JSON
    created_at INTEGER,
    updated_at INTEGER
);

CREATE INDEX idx_entities_type ON kg_entities(entity_type);
CREATE INDEX idx_entities_name ON kg_entities(name);

kg_relations

CREATE TABLE kg_relations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_id INTEGER NOT NULL,
    target_id INTEGER NOT NULL,
    rel_type TEXT NOT NULL,
    weight REAL DEFAULT 1.0,
    properties TEXT,  -- JSON
    created_at INTEGER,
    FOREIGN KEY (source_id) REFERENCES kg_entities(id) ON DELETE CASCADE,
    FOREIGN KEY (target_id) REFERENCES kg_entities(id) ON DELETE CASCADE
);

CREATE INDEX idx_relations_source ON kg_relations(source_id);
CREATE INDEX idx_relations_target ON kg_relations(target_id);
CREATE INDEX idx_relations_type ON kg_relations(rel_type);

kg_vectors

CREATE TABLE kg_vectors (
    entity_id INTEGER NOT NULL PRIMARY KEY,
    vector BLOB NOT NULL,
    dimension INTEGER NOT NULL,
    created_at INTEGER,
    FOREIGN KEY (entity_id) REFERENCES kg_entities(id) ON DELETE CASCADE
);
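The schema stores each embedding as a BLOB alongside its dimension. The exact byte layout is not documented here; a common choice, assumed for this sketch, is 4 little-endian bytes per f32 component. The helper names `vector_to_blob` and `blob_to_vector` are hypothetical.

```rust
/// Encode an f32 vector as little-endian bytes (4 bytes per component).
/// The little-endian layout is an assumption, not a documented format.
fn vector_to_blob(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

/// Decode a blob back into f32 values.
/// Returns None if the blob length is inconsistent with `dimension`.
fn blob_to_vector(blob: &[u8], dimension: usize) -> Option<Vec<f32>> {
    if blob.len() != dimension * 4 {
        return None;
    }
    Some(
        blob.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Storing `dimension` next to the blob makes it cheap to detect truncated or mismatched vectors before any similarity computation runs.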

kg_hyperedges

CREATE TABLE kg_hyperedges (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    hyperedge_type TEXT NOT NULL,
    entity_ids TEXT NOT NULL,  -- JSON array of entity IDs
    weight REAL DEFAULT 1.0,
    arity INTEGER NOT NULL,    -- Number of entities in hyperedge
    properties TEXT,  -- JSON
    created_at INTEGER,
    updated_at INTEGER
);

CREATE INDEX idx_hyperedges_type ON kg_hyperedges(hyperedge_type);
CREATE INDEX idx_hyperedges_arity ON kg_hyperedges(arity);

kg_hyperedge_entities

CREATE TABLE kg_hyperedge_entities (
    hyperedge_id INTEGER NOT NULL,
    entity_id INTEGER NOT NULL,
    position INTEGER NOT NULL,  -- Position in hyperedge
    PRIMARY KEY (hyperedge_id, entity_id),
    FOREIGN KEY (hyperedge_id) REFERENCES kg_hyperedges(id) ON DELETE CASCADE,
    FOREIGN KEY (entity_id) REFERENCES kg_entities(id) ON DELETE CASCADE
);

CREATE INDEX idx_hyperedge_entities_entity ON kg_hyperedge_entities(entity_id);
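The two hyperedge tables are redundant by design: kg_hyperedges holds the member list as JSON plus its arity, while kg_hyperedge_entities holds one row per member for indexed lookup. A sketch of how one in-memory hyperedge might map onto both, assuming 0-based positions (the struct and method names are hypothetical):

```rust
use std::collections::HashSet;

/// In-memory view of one hyperedge (subset of the kg_hyperedges columns).
struct Hyperedge {
    id: i64,
    entity_ids: Vec<i64>,
}

impl Hyperedge {
    /// Value for the `arity` column: number of participating entities.
    fn arity(&self) -> usize {
        self.entity_ids.len()
    }

    /// Value for the `entity_ids` column: a JSON array of ids.
    fn entity_ids_json(&self) -> String {
        let ids: Vec<String> = self.entity_ids.iter().map(|i| i.to_string()).collect();
        format!("[{}]", ids.join(","))
    }

    /// Rows for kg_hyperedge_entities: (hyperedge_id, entity_id, position).
    /// Returns None if an entity repeats, because (hyperedge_id, entity_id)
    /// is the primary key. Positions are 0-based here (an assumption).
    fn membership_rows(&self) -> Option<Vec<(i64, i64, usize)>> {
        let mut seen = HashSet::new();
        let mut rows = Vec::with_capacity(self.arity());
        for (position, &entity_id) in self.entity_ids.iter().enumerate() {
            if !seen.insert(entity_id) {
                return None;
            }
            rows.push((self.id, entity_id, position));
        }
        Some(rows)
    }
}
```

The index on entity_id then answers "which hyperedges contain this entity" without parsing any JSON.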

Async API

Requires the async feature (opt-in, zero overhead when not enabled):

[dependencies]
sqlite-knowledge-graph = { git = "...", features = ["async"] }
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }

All blocking SQLite operations are dispatched to tokio::task::spawn_blocking, keeping the async executor thread free. The async API mirrors the sync API but takes owned values (required for 'static closures).

use sqlite_knowledge_graph::{AsyncKnowledgeGraph, Entity, KnowledgeGraph, Relation};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let kg = std::sync::Arc::new(AsyncKnowledgeGraph::open_in_memory_sync()?);

    // Concurrent inserts
    let handles: Vec<_> = (0..10).map(|i| {
        let kg = std::sync::Arc::clone(&kg);
        tokio::spawn(async move {
            kg.insert_entity(Entity::new("paper", format!("Paper {i}"))).await
        })
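    }).collect();

    // Wait for the spawned inserts; the first `?` surfaces task panics, the second insert errors
    for handle in handles {
        handle.await??;
    }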

    // CRUD
    let entity = Entity::new("paper", "Async Paper");
    let id = kg.insert_entity(entity).await?;
    let retrieved = kg.get_entity(id).await?;

    // Graph algorithms (CPU-bound, runs off executor)
    let scores = kg.kg_pagerank(None).await?;
    let communities = kg.kg_louvain().await?;

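    // Vector search (placeholder query; use an embedding with the stored dimension)
    let query_embedding: Vec<f32> = vec![0.1, 0.2, 0.3];
    let results = kg.kg_semantic_search(query_embedding, 10).await?;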

    // Convert existing sync instance
    let sync_kg = KnowledgeGraph::open("my.db")?;
    let async_kg = sync_kg.into_async();

    Ok(())
}

Async embedding generation (non-blocking Python subprocess):

use sqlite_knowledge_graph::AsyncEmbeddingGenerator;

let gen = AsyncEmbeddingGenerator::new();
let embeddings = gen.generate_embeddings(vec!["hello world".into()]).await?;

Note: AsyncKnowledgeGraph serialises all operations through a single Mutex. For read-heavy concurrent workloads, open multiple instances on the same WAL-mode file.
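The pattern described in this section, blocking work moved off the caller's thread, owned arguments, and a single lock serialising access, can be illustrated with the standard library. This sketch uses std::thread in place of tokio::task::spawn_blocking and a plain Vec in place of a SQLite connection; `BlockingStore` and `SharedStore` are hypothetical names.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Stand-in for a blocking store (a plain Vec, not a real SQLite connection).
#[derive(Default)]
struct BlockingStore {
    names: Vec<String>,
}

impl BlockingStore {
    /// Appends a name and returns its 1-based id.
    fn insert(&mut self, name: String) -> usize {
        self.names.push(name);
        self.names.len()
    }
}

/// Cloneable handle; every operation takes the same lock, so calls are serialised.
#[derive(Clone, Default)]
struct SharedStore {
    inner: Arc<Mutex<BlockingStore>>,
}

impl SharedStore {
    /// Takes an owned String because the spawned closure must be 'static.
    fn insert(&self, name: String) -> JoinHandle<usize> {
        let inner = Arc::clone(&self.inner);
        thread::spawn(move || {
            let mut guard = inner.lock().expect("lock poisoned");
            guard.insert(name)
        })
    }

    fn len(&self) -> usize {
        self.inner.lock().expect("lock poisoned").names.len()
    }
}
```

Ten concurrent inserts through one `SharedStore` still execute one at a time, which is why the note above suggests separate instances for read-heavy workloads.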

Performance

Benchmarks on a knowledge graph with 2,619 entities and 1.48M relations:

| Operation | Time |
|---|---|
| Entity insert | < 1ms |
| Relation insert | < 1ms |
| BFS (depth 3) | ~50ms |
| PageRank | ~200ms |
| Louvain | ~500ms |
| Vector search (k=10) | ~10ms |

Implementation Status

| Feature | Status |
|---|---|
| Entity/Relation CRUD | ✅ Complete |
| Graph Traversal (BFS/DFS) | ✅ Complete |
| Shortest Path | ✅ Complete |
| PageRank | ✅ Complete |
| Louvain Community Detection | ✅ Complete |
| Connected Components | ✅ Complete |
| Vector Storage | ✅ Complete |
| Semantic Search | ✅ Complete |
| RAG Integration | ✅ Complete |
| SQLite Extension | ✅ Complete |
| CLI Tool | ✅ Complete |
| GitHub Actions CI | ✅ Complete |
| More Extension Functions | ✅ Complete (v0.7.0) |
| Vector Indexing (TurboQuant) | ✅ Complete (v0.8.0) |
| Higher-order Relations (Hyperedge) | ✅ Complete (v0.10.0) |
| Paper-driven RAG Engine | ✅ Complete (v0.10.1) |
| Graph Visualization Export (D3/DOT) | ✅ Complete |
| Async API (tokio) | ✅ Complete (v0.11.0) |

Testing

# Run all tests
cargo test

# Run with verbose output
cargo test -- --nocapture

# Run specific test
cargo test test_pagerank

Current test coverage: 122 unit tests + 14 integration tests passing (includes 11 async tests)

Projects Using This Library

  • OpenClaw Knowledge Base: 2,497 papers, 122 skills, 1.48M relations
  • Research Paper Analysis: Graph-based paper discovery

License

MIT License

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

Acknowledgments

Built with:

Changelog

See CHANGELOG.md for version history.
