sqlite-knowledge-graph 0.8.0

A Rust library for building and querying knowledge graphs using SQLite as the backend, with graph algorithms, vector search, and RAG support
Documentation
# SQLite Knowledge Graph

A Rust library for building and querying knowledge graphs using SQLite as the backend, with graph algorithms and RAG support.

## Features

### Core Features
- **Entity Management**: Create, read, update, and delete typed entities with JSON properties
- **Relation Storage**: Define weighted relations between entities with graph traversal support
- **Vector Search**: Store embeddings and perform semantic search using cosine similarity
- **Transaction Support**: Batch operations with ACID guarantees
- **SQLite Native**: Full SQLite compatibility with bundling for portability

### Graph Algorithms ✅
- **Path-finding**: BFS, DFS, Shortest Path algorithms
- **Centrality**: PageRank algorithm for importance ranking
- **Community Detection**: Louvain algorithm for graph clustering
- **Connectivity**: Connected components (weak and strong)

### RAG Integration ✅
- **Semantic Search**: Vector similarity search
- **Context Retrieval**: Multi-hop context extraction
- **Hybrid Search**: Combine keyword and semantic search

### SQLite Extension ✅
- **Loadable Extension**: Use as SQLite extension (.dylib/.so)
- **SQL Functions**: Graph algorithms exposed as SQL functions
  - `kg_version()` - Extension version
  - `kg_stats()` - Graph statistics
  - `kg_pagerank(damping, max_iterations, tolerance)` - PageRank algorithm
  - `kg_louvain()` - Community detection
  - `kg_bfs(start_id, max_depth)` - BFS traversal
  - `kg_shortest_path(from_id, to_id, max_depth)` - Shortest path
  - `kg_connected_components()` - Connected components
- **CLI Tool**: Command-line interface for common operations

## Installation

> **Note**: This crate is not yet published to [crates.io]https://crates.io. Use git dependency or local path for now.

Add this to your `Cargo.toml`:

```toml
[dependencies]
sqlite-knowledge-graph = { git = "https://github.com/hiyenwong/sqlite-knowledge-graph" }
```

Or for local development:

```toml
[dependencies]
sqlite-knowledge-graph = { path = "../sqlite-knowledge-graph" }
```

### Building SQLite Extension

```bash
cd sqlite-knowledge-graph
cargo build --release

# Extension will be at:
# target/release/libsqlite_knowledge_graph.dylib (macOS)
# target/release/libsqlite_knowledge_graph.so (Linux)
```

## Quick Start

```rust
use sqlite_knowledge_graph::{KnowledgeGraph, Entity, Relation, PageRankConfig};

// Open or create a knowledge graph
let kg = KnowledgeGraph::open("knowledge.db")?;

// Create an entity with properties
let mut entity = Entity::new("paper", "Deep Learning Advances");
entity.set_property("author", serde_json::json!("Alice"));
entity.set_property("year", serde_json::json!(2024));
let paper_id = kg.insert_entity(&entity)?;

// Create a relation
let relation = Relation::new(paper_id, other_id, "cites", 0.8)?;
kg.insert_relation(&relation)?;

// Graph traversal (BFS/DFS)
let neighbors = kg.get_neighbors(paper_id, 2)?;

// Shortest path between entities
let path = kg.kg_shortest_path(from_id, to_id, 5)?;

// PageRank centrality
let pagerank = kg.kg_pagerank(None)?;

// Louvain community detection
let communities = kg.kg_louvain()?;

// Connected components
let components = kg.kg_connected_components()?;

// Vector search for similar entities
let embedding = vec![0.1, 0.2, 0.3, ...];
kg.insert_vector(paper_id, embedding)?;
let results = kg.search_vectors(query_embedding, 10)?;
```

## API Overview

### KnowledgeGraph

The main entry point for the library.

```rust
impl KnowledgeGraph {
    // Connection
    pub fn open<P: AsRef<Path>>(path: P) -> Result<Self>
    pub fn open_in_memory() -> Result<Self>

    // Entity operations
    pub fn insert_entity(&self, entity: &Entity) -> Result<i64>
    pub fn get_entity(&self, id: i64) -> Result<Entity>
    pub fn list_entities(&self, entity_type: Option<&str>, limit: Option<i64>) -> Result<Vec<Entity>>
    pub fn update_entity(&self, entity: &Entity) -> Result<()>
    pub fn delete_entity(&self, id: i64) -> Result<()>

    // Relation operations
    pub fn insert_relation(&self, relation: &Relation) -> Result<i64>
    pub fn get_neighbors(&self, entity_id: i64, depth: u32) -> Result<Vec<Neighbor>>

    // Graph traversal
    pub fn kg_bfs_traversal(&self, start_id: i64, direction: Direction, max_depth: u32) -> Result<Vec<TraversalNode>>
    pub fn kg_dfs_traversal(&self, start_id: i64, direction: Direction, max_depth: u32) -> Result<Vec<TraversalNode>>
    pub fn kg_shortest_path(&self, from_id: i64, to_id: i64, max_depth: u32) -> Result<Option<TraversalPath>>
    pub fn kg_graph_stats(&self) -> Result<GraphStats>

    // Graph algorithms
    pub fn kg_pagerank(&self, config: Option<PageRankConfig>) -> Result<Vec<(i64, f64)>>
    pub fn kg_louvain(&self) -> Result<CommunityResult>
    pub fn kg_connected_components(&self) -> Result<Vec<Vec<i64>>>
    pub fn kg_analyze(&self) -> Result<GraphAnalysis>

    // Vector operations
    pub fn insert_vector(&self, entity_id: i64, vector: Vec<f32>) -> Result<()>
    pub fn search_vectors(&self, query: Vec<f32>, k: usize) -> Result<Vec<SearchResult>>

    // RAG functions
    pub fn kg_semantic_search(&self, query_embedding: Vec<f32>, k: usize) -> Result<Vec<SearchResult>>
    pub fn kg_get_context(&self, entity_id: i64, depth: u32) -> Result<EntityContext>
    pub fn kg_hybrid_search(&self, query_text: &str, query_embedding: Vec<f32>, k: usize) -> Result<Vec<HybridSearchResult>>
}
```

## Graph Algorithms

### PageRank

```rust
use sqlite_knowledge_graph::PageRankConfig;

let config = PageRankConfig {
    damping: 0.85,      // Default: 0.85
    max_iterations: 100, // Default: 100
    tolerance: 1e-6,    // Default: 1e-6
};

let rankings = kg.kg_pagerank(Some(config))?;
for (entity_id, score) in rankings.iter().take(10) {
    println!("Entity {}: score = {:.4}", entity_id, score);
}
```

### Louvain Community Detection

```rust
let result = kg.kg_louvain()?;
println!("Found {} communities", result.num_communities);
println!("Modularity: {:.4}", result.modularity);

for (entity_id, community_id) in result.memberships {
    println!("Entity {} -> Community {}", entity_id, community_id);
}
```

### Connected Components

```rust
let components = kg.kg_connected_components()?;
println!("Found {} components", components.len());
println!("Largest component: {} entities", components[0].len());
```

## CLI Tool

```bash
# Show statistics
sqlite-kg stats --db knowledge.db

# Search entities
sqlite-kg search --query "neural network" --top-k 10 --db knowledge.db

# Get entity context
sqlite-kg context --id 123 --depth 2 --db knowledge.db

# Migrate data
sqlite-kg migrate --source knowledge.db --target kg.db
```

## SQLite Extension Usage

```sql
-- Load extension
SELECT load_extension('./libsqlite_knowledge_graph', 'sqlite3_sqlite_knowledge_graph_init');

-- Get version
SELECT kg_version();
-- Returns: "0.7.0"

-- Get stats
SELECT kg_stats();
-- Returns: JSON with graph statistics

-- PageRank (optional parameters: damping, max_iterations, tolerance)
SELECT kg_pagerank();
SELECT kg_pagerank(0.85);           -- with custom damping
SELECT kg_pagerank(0.85, 100);      -- with custom damping and iterations
SELECT kg_pagerank(0.85, 100, 1e-6); -- full parameters
-- Returns: JSON with algorithm info and note to use Rust API for full results

-- Louvain community detection
SELECT kg_louvain();
-- Returns: JSON with algorithm info

-- BFS traversal (required: start_id, optional: max_depth)
SELECT kg_bfs(1);
SELECT kg_bfs(1, 3);
-- Returns: JSON with algorithm parameters

-- Shortest path (required: from_id, to_id, optional: max_depth)
SELECT kg_shortest_path(1, 5);
SELECT kg_shortest_path(1, 5, 10);
-- Returns: JSON with path parameters

-- Connected components
SELECT kg_connected_components();
-- Returns: JSON with algorithm info

-- Graph search example
WITH neural_papers AS (
    SELECT id, name FROM kg_entities 
    WHERE entity_type = 'paper' 
    AND name LIKE '%neural network%'
)
SELECT e.name, r.rel_type
FROM neural_papers np
JOIN kg_relations r ON r.source_id = np.id
JOIN kg_entities e ON r.target_id = e.id
WHERE e.entity_type = 'skill'
LIMIT 10;
```

## Database Schema

### kg_entities

```sql
CREATE TABLE kg_entities (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entity_type TEXT NOT NULL,
    name TEXT NOT NULL,
    properties TEXT,  -- JSON
    created_at INTEGER,
    updated_at INTEGER
);

CREATE INDEX idx_entities_type ON kg_entities(entity_type);
CREATE INDEX idx_entities_name ON kg_entities(name);
```

### kg_relations

```sql
CREATE TABLE kg_relations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_id INTEGER NOT NULL,
    target_id INTEGER NOT NULL,
    rel_type TEXT NOT NULL,
    weight REAL DEFAULT 1.0,
    properties TEXT,  -- JSON
    created_at INTEGER,
    FOREIGN KEY (source_id) REFERENCES kg_entities(id) ON DELETE CASCADE,
    FOREIGN KEY (target_id) REFERENCES kg_entities(id) ON DELETE CASCADE
);

CREATE INDEX idx_relations_source ON kg_relations(source_id);
CREATE INDEX idx_relations_target ON kg_relations(target_id);
CREATE INDEX idx_relations_type ON kg_relations(rel_type);
```

### kg_vectors

```sql
CREATE TABLE kg_vectors (
    entity_id INTEGER NOT NULL PRIMARY KEY,
    vector BLOB NOT NULL,
    dimension INTEGER NOT NULL,
    created_at INTEGER,
    FOREIGN KEY (entity_id) REFERENCES kg_entities(id) ON DELETE CASCADE
);
```

## Performance

Benchmarks on a knowledge graph with 2,619 entities and 1.48M relations:

| Operation | Time |
|-----------|------|
| Entity insert | < 1ms |
| Relation insert | < 1ms |
| BFS (depth 3) | ~50ms |
| PageRank | ~200ms |
| Louvain | ~500ms |
| Vector search (k=10) | ~10ms |

## Implementation Status

| Feature | Status |
|---------|--------|
| Entity/Relation CRUD | ✅ Complete |
| Graph Traversal (BFS/DFS) | ✅ Complete |
| Shortest Path | ✅ Complete |
| PageRank | ✅ Complete |
| Louvain Community Detection | ✅ Complete |
| Connected Components | ✅ Complete |
| Vector Storage | ✅ Complete |
| Semantic Search | ✅ Complete |
| RAG Integration | ✅ Complete |
| SQLite Extension | ✅ Complete |
| CLI Tool | ✅ Complete |
| GitHub Actions CI | ✅ Complete |
| More Extension Functions | ✅ Complete (v0.7.0) |
| **Vector Indexing (TurboQuant)** |**Complete (v0.8.0)** |
| Higher-order Relations | ⏳ Planned |
| Graph Visualization Export | ⏳ Planned |
| Async API | ⏳ Planned |

## Testing

```bash
# Run all tests
cargo test

# Run with verbose output
cargo test -- --nocapture

# Run specific test
cargo test test_pagerank
```

Current test coverage: **38 tests passing**

## Projects Using This Library

- **OpenClaw Knowledge Base**: 2,497 papers, 122 skills, 1.48M relations
- **Research Paper Analysis**: Graph-based paper discovery

## License

MIT License

## Contributing

Contributions are welcome! Please open an issue or submit a pull request.

## Acknowledgments

Built with:
- [rusqlite]https://github.com/rusqlite/rusqlite - SQLite bindings
- [sqlite-loadable]https://github.com/nickolay/nickolay.github.io/tree/main/sqlite-loadable-rs - SQLite extension support
- [serde]https://serde.rs/ - Serialization framework
- [thiserror]https://docs.rs/thiserror/ - Error handling

## Changelog

See [CHANGELOG.md](./CHANGELOG.md) for version history.