sqlite-knowledge-graph 0.8.0

A Rust library for building and querying knowledge graphs using SQLite as the backend, with graph algorithms, vector search, and RAG support
Documentation

SQLite Knowledge Graph

A Rust library for building and querying knowledge graphs using SQLite as the backend, with graph algorithms and RAG support.

Features

Core Features

  • Entity Management: Create, read, update, and delete typed entities with JSON properties
  • Relation Storage: Define weighted relations between entities with graph traversal support
  • Vector Search: Store embeddings and perform semantic search using cosine similarity
  • Transaction Support: Batch operations with ACID guarantees
  • SQLite Native: Full SQLite compatibility with bundling for portability

Graph Algorithms ✅

  • Path-finding: BFS, DFS, Shortest Path algorithms
  • Centrality: PageRank algorithm for importance ranking
  • Community Detection: Louvain algorithm for graph clustering
  • Connectivity: Connected components (weak and strong)

RAG Integration ✅

  • Semantic Search: Vector similarity search
  • Context Retrieval: Multi-hop context extraction
  • Hybrid Search: Combine keyword and semantic search

SQLite Extension ✅

  • Loadable Extension: Use as SQLite extension (.dylib/.so)
  • SQL Functions: Graph algorithms exposed as SQL functions
    • kg_version() - Extension version
    • kg_stats() - Graph statistics
    • kg_pagerank(damping, max_iterations, tolerance) - PageRank algorithm
    • kg_louvain() - Community detection
    • kg_bfs(start_id, max_depth) - BFS traversal
    • kg_shortest_path(from_id, to_id, max_depth) - Shortest path
    • kg_connected_components() - Connected components
  • CLI Tool: Command-line interface for common operations

Installation

Note: This crate is not yet published to crates.io. Use git dependency or local path for now.

Add this to your Cargo.toml:

[dependencies]
sqlite-knowledge-graph = { git = "https://github.com/hiyenwong/sqlite-knowledge-graph" }

Or for local development:

[dependencies]
sqlite-knowledge-graph = { path = "../sqlite-knowledge-graph" }

Building SQLite Extension

cd sqlite-knowledge-graph
cargo build --release

# Extension will be at:
# target/release/libsqlite_knowledge_graph.dylib (macOS)
# target/release/libsqlite_knowledge_graph.so (Linux)

Quick Start

use sqlite_knowledge_graph::{KnowledgeGraph, Entity, Relation, PageRankConfig};

// Open or create a knowledge graph
let kg = KnowledgeGraph::open("knowledge.db")?;

// Create an entity with properties
let mut entity = Entity::new("paper", "Deep Learning Advances");
entity.set_property("author", serde_json::json!("Alice"));
entity.set_property("year", serde_json::json!(2024));
let paper_id = kg.insert_entity(&entity)?;

// Create a relation
let relation = Relation::new(paper_id, other_id, "cites", 0.8)?;
kg.insert_relation(&relation)?;

// Graph traversal (BFS/DFS)
let neighbors = kg.get_neighbors(paper_id, 2)?;

// Shortest path between entities
let path = kg.kg_shortest_path(from_id, to_id, 5)?;

// PageRank centrality
let pagerank = kg.kg_pagerank(None)?;

// Louvain community detection
let communities = kg.kg_louvain()?;

// Connected components
let components = kg.kg_connected_components()?;

// Vector search for similar entities
let embedding = vec![0.1, 0.2, 0.3, ...];
kg.insert_vector(paper_id, embedding)?;
let results = kg.search_vectors(query_embedding, 10)?;

API Overview

KnowledgeGraph

The main entry point for the library.

impl KnowledgeGraph {
    // Connection
    pub fn open<P: AsRef<Path>>(path: P) -> Result<Self>
    pub fn open_in_memory() -> Result<Self>

    // Entity operations
    pub fn insert_entity(&self, entity: &Entity) -> Result<i64>
    pub fn get_entity(&self, id: i64) -> Result<Entity>
    pub fn list_entities(&self, entity_type: Option<&str>, limit: Option<i64>) -> Result<Vec<Entity>>
    pub fn update_entity(&self, entity: &Entity) -> Result<()>
    pub fn delete_entity(&self, id: i64) -> Result<()>

    // Relation operations
    pub fn insert_relation(&self, relation: &Relation) -> Result<i64>
    pub fn get_neighbors(&self, entity_id: i64, depth: u32) -> Result<Vec<Neighbor>>

    // Graph traversal
    pub fn kg_bfs_traversal(&self, start_id: i64, direction: Direction, max_depth: u32) -> Result<Vec<TraversalNode>>
    pub fn kg_dfs_traversal(&self, start_id: i64, direction: Direction, max_depth: u32) -> Result<Vec<TraversalNode>>
    pub fn kg_shortest_path(&self, from_id: i64, to_id: i64, max_depth: u32) -> Result<Option<TraversalPath>>
    pub fn kg_graph_stats(&self) -> Result<GraphStats>

    // Graph algorithms
    pub fn kg_pagerank(&self, config: Option<PageRankConfig>) -> Result<Vec<(i64, f64)>>
    pub fn kg_louvain(&self) -> Result<CommunityResult>
    pub fn kg_connected_components(&self) -> Result<Vec<Vec<i64>>>
    pub fn kg_analyze(&self) -> Result<GraphAnalysis>

    // Vector operations
    pub fn insert_vector(&self, entity_id: i64, vector: Vec<f32>) -> Result<()>
    pub fn search_vectors(&self, query: Vec<f32>, k: usize) -> Result<Vec<SearchResult>>

    // RAG functions
    pub fn kg_semantic_search(&self, query_embedding: Vec<f32>, k: usize) -> Result<Vec<SearchResult>>
    pub fn kg_get_context(&self, entity_id: i64, depth: u32) -> Result<EntityContext>
    pub fn kg_hybrid_search(&self, query_text: &str, query_embedding: Vec<f32>, k: usize) -> Result<Vec<HybridSearchResult>>
}

Graph Algorithms

PageRank

use sqlite_knowledge_graph::PageRankConfig;

let config = PageRankConfig {
    damping: 0.85,      // Default: 0.85
    max_iterations: 100, // Default: 100
    tolerance: 1e-6,    // Default: 1e-6
};

let rankings = kg.kg_pagerank(Some(config))?;
for (entity_id, score) in rankings.iter().take(10) {
    println!("Entity {}: score = {:.4}", entity_id, score);
}

Louvain Community Detection

let result = kg.kg_louvain()?;
println!("Found {} communities", result.num_communities);
println!("Modularity: {:.4}", result.modularity);

for (entity_id, community_id) in result.memberships {
    println!("Entity {} -> Community {}", entity_id, community_id);
}

Connected Components

let components = kg.kg_connected_components()?;
println!("Found {} components", components.len());
println!("Largest component: {} entities", components[0].len());

CLI Tool

# Show statistics
sqlite-kg stats --db knowledge.db

# Search entities
sqlite-kg search --query "neural network" --top-k 10 --db knowledge.db

# Get entity context
sqlite-kg context --id 123 --depth 2 --db knowledge.db

# Migrate data
sqlite-kg migrate --source knowledge.db --target kg.db

SQLite Extension Usage

-- Load extension
SELECT load_extension('./libsqlite_knowledge_graph', 'sqlite3_sqlite_knowledge_graph_init');

-- Get version
SELECT kg_version();
-- Returns: "0.7.0"

-- Get stats
SELECT kg_stats();
-- Returns: JSON with graph statistics

-- PageRank (optional parameters: damping, max_iterations, tolerance)
SELECT kg_pagerank();
SELECT kg_pagerank(0.85);           -- with custom damping
SELECT kg_pagerank(0.85, 100);      -- with custom damping and iterations
SELECT kg_pagerank(0.85, 100, 1e-6); -- full parameters
-- Returns: JSON with algorithm info and note to use Rust API for full results

-- Louvain community detection
SELECT kg_louvain();
-- Returns: JSON with algorithm info

-- BFS traversal (required: start_id, optional: max_depth)
SELECT kg_bfs(1);
SELECT kg_bfs(1, 3);
-- Returns: JSON with algorithm parameters

-- Shortest path (required: from_id, to_id, optional: max_depth)
SELECT kg_shortest_path(1, 5);
SELECT kg_shortest_path(1, 5, 10);
-- Returns: JSON with path parameters

-- Connected components
SELECT kg_connected_components();
-- Returns: JSON with algorithm info

-- Graph search example
WITH neural_papers AS (
    SELECT id, name FROM kg_entities 
    WHERE entity_type = 'paper' 
    AND name LIKE '%neural network%'
)
SELECT e.name, r.rel_type
FROM neural_papers np
JOIN kg_relations r ON r.source_id = np.id
JOIN kg_entities e ON r.target_id = e.id
WHERE e.entity_type = 'skill'
LIMIT 10;

Database Schema

kg_entities

CREATE TABLE kg_entities (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entity_type TEXT NOT NULL,
    name TEXT NOT NULL,
    properties TEXT,  -- JSON
    created_at INTEGER,
    updated_at INTEGER
);

CREATE INDEX idx_entities_type ON kg_entities(entity_type);
CREATE INDEX idx_entities_name ON kg_entities(name);

kg_relations

CREATE TABLE kg_relations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_id INTEGER NOT NULL,
    target_id INTEGER NOT NULL,
    rel_type TEXT NOT NULL,
    weight REAL DEFAULT 1.0,
    properties TEXT,  -- JSON
    created_at INTEGER,
    FOREIGN KEY (source_id) REFERENCES kg_entities(id) ON DELETE CASCADE,
    FOREIGN KEY (target_id) REFERENCES kg_entities(id) ON DELETE CASCADE
);

CREATE INDEX idx_relations_source ON kg_relations(source_id);
CREATE INDEX idx_relations_target ON kg_relations(target_id);
CREATE INDEX idx_relations_type ON kg_relations(rel_type);

kg_vectors

CREATE TABLE kg_vectors (
    entity_id INTEGER NOT NULL PRIMARY KEY,
    vector BLOB NOT NULL,
    dimension INTEGER NOT NULL,
    created_at INTEGER,
    FOREIGN KEY (entity_id) REFERENCES kg_entities(id) ON DELETE CASCADE
);

Performance

Benchmarks on a knowledge graph with 2,619 entities and 1.48M relations:

Operation Time
Entity insert < 1ms
Relation insert < 1ms
BFS (depth 3) ~50ms
PageRank ~200ms
Louvain ~500ms
Vector search (k=10) ~10ms

Implementation Status

Feature Status
Entity/Relation CRUD ✅ Complete
Graph Traversal (BFS/DFS) ✅ Complete
Shortest Path ✅ Complete
PageRank ✅ Complete
Louvain Community Detection ✅ Complete
Connected Components ✅ Complete
Vector Storage ✅ Complete
Semantic Search ✅ Complete
RAG Integration ✅ Complete
SQLite Extension ✅ Complete
CLI Tool ✅ Complete
GitHub Actions CI ✅ Complete
More Extension Functions ✅ Complete (v0.7.0)
Vector Indexing (TurboQuant) Complete (v0.8.0)
Higher-order Relations ⏳ Planned
Graph Visualization Export ⏳ Planned
Async API ⏳ Planned

Testing

# Run all tests
cargo test

# Run with verbose output
cargo test -- --nocapture

# Run specific test
cargo test test_pagerank

Current test coverage: 38 tests passing

Projects Using This Library

  • OpenClaw Knowledge Base: 2,497 papers, 122 skills, 1.48M relations
  • Research Paper Analysis: Graph-based paper discovery

License

MIT License

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

Acknowledgments

Built with:

Changelog

See CHANGELOG.md for version history.