# SQLite Knowledge Graph
A Rust library for building and querying knowledge graphs using SQLite as the backend, with graph algorithms and RAG support.
## Features
### Core Features
- **Entity Management**: Create, read, update, and delete typed entities with JSON properties
- **Relation Storage**: Define weighted relations between entities with graph traversal support
- **Vector Search**: Store embeddings and perform semantic search using cosine similarity
- **Transaction Support**: Batch operations with ACID guarantees
- **SQLite Native**: Full SQLite compatibility with bundling for portability
### Graph Algorithms ✅
- **Path-finding**: BFS, DFS, Shortest Path algorithms
- **Centrality**: PageRank algorithm for importance ranking
- **Community Detection**: Louvain algorithm for graph clustering
- **Connectivity**: Connected components (weak and strong)
### RAG Integration ✅
- **Semantic Search**: Vector similarity search
- **Context Retrieval**: Multi-hop context extraction
- **Hybrid Search**: Combine keyword and semantic search
### SQLite Extension ✅
- **Loadable Extension**: Use as SQLite extension (.dylib/.so)
- **SQL Functions**: Graph algorithms exposed as SQL functions
- `kg_version()` - Extension version
- `kg_stats()` - Graph statistics
- `kg_pagerank(damping, max_iterations, tolerance)` - PageRank algorithm
- `kg_louvain()` - Community detection
- `kg_bfs(start_id, max_depth)` - BFS traversal
- `kg_shortest_path(from_id, to_id, max_depth)` - Shortest path
- `kg_connected_components()` - Connected components
- **CLI Tool**: Command-line interface for common operations
## Installation
> **Note**: This crate is not yet published to [crates.io](https://crates.io). Use git dependency or local path for now.
Add this to your `Cargo.toml`:
```toml
[dependencies]
sqlite-knowledge-graph = { git = "https://github.com/hiyenwong/sqlite-knowledge-graph" }
```
Or for local development:
```toml
[dependencies]
sqlite-knowledge-graph = { path = "../sqlite-knowledge-graph" }
```
### Semantic Search Dependencies
Semantic search requires vector embeddings generated by [sentence-transformers](https://www.sbert.net/). Install with:
```bash
pip install sentence-transformers
```
Default model: `all-MiniLM-L6-v2` (384 dimensions, fast and accurate).
To generate embeddings for your knowledge graph:
```bash
sqlite-kg embed --model all-MiniLM-L6-v2 --db knowledge.db
```
### Building SQLite Extension
```bash
cd sqlite-knowledge-graph
cargo build --release
# Extension will be at:
# target/release/libsqlite_knowledge_graph.dylib (macOS)
# target/release/libsqlite_knowledge_graph.so (Linux)
```
## Quick Start
```rust
use sqlite_knowledge_graph::{KnowledgeGraph, Entity, Relation, PageRankConfig};
// Open or create a knowledge graph
let kg = KnowledgeGraph::open("knowledge.db")?;
// Create an entity with properties
let mut entity = Entity::new("paper", "Deep Learning Advances");
entity.set_property("author", serde_json::json!("Alice"));
entity.set_property("year", serde_json::json!(2024));
let paper_id = kg.insert_entity(&entity)?;
// Create a relation
let relation = Relation::new(paper_id, other_id, "cites", 0.8)?;
kg.insert_relation(&relation)?;
// Graph traversal (BFS/DFS)
let neighbors = kg.get_neighbors(paper_id, 2)?;
// Shortest path between entities
let path = kg.kg_shortest_path(from_id, to_id, 5)?;
// PageRank centrality
let pagerank = kg.kg_pagerank(None)?;
// Louvain community detection
let communities = kg.kg_louvain()?;
// Connected components
let components = kg.kg_connected_components()?;
// Vector search for similar entities
let embedding = vec![0.1, 0.2, 0.3, ...];
kg.insert_vector(paper_id, embedding)?;
let results = kg.search_vectors(query_embedding, 10)?;
```
## API Overview
### KnowledgeGraph
The main entry point for the library.
```rust
impl KnowledgeGraph {
// Connection
pub fn open<P: AsRef<Path>>(path: P) -> Result<Self>
pub fn open_in_memory() -> Result<Self>
// Entity operations
pub fn insert_entity(&self, entity: &Entity) -> Result<i64>
pub fn get_entity(&self, id: i64) -> Result<Entity>
pub fn list_entities(&self, entity_type: Option<&str>, limit: Option<i64>) -> Result<Vec<Entity>>
pub fn update_entity(&self, entity: &Entity) -> Result<()>
pub fn delete_entity(&self, id: i64) -> Result<()>
// Relation operations
pub fn insert_relation(&self, relation: &Relation) -> Result<i64>
pub fn get_neighbors(&self, entity_id: i64, depth: u32) -> Result<Vec<Neighbor>>
// Graph traversal
pub fn kg_bfs_traversal(&self, start_id: i64, direction: Direction, max_depth: u32) -> Result<Vec<TraversalNode>>
pub fn kg_dfs_traversal(&self, start_id: i64, direction: Direction, max_depth: u32) -> Result<Vec<TraversalNode>>
pub fn kg_shortest_path(&self, from_id: i64, to_id: i64, max_depth: u32) -> Result<Option<TraversalPath>>
pub fn kg_graph_stats(&self) -> Result<GraphStats>
// Graph algorithms
pub fn kg_pagerank(&self, config: Option<PageRankConfig>) -> Result<Vec<(i64, f64)>>
pub fn kg_louvain(&self) -> Result<CommunityResult>
pub fn kg_connected_components(&self) -> Result<Vec<Vec<i64>>>
pub fn kg_analyze(&self) -> Result<GraphAnalysis>
// Vector operations
pub fn insert_vector(&self, entity_id: i64, vector: Vec<f32>) -> Result<()>
pub fn search_vectors(&self, query: Vec<f32>, k: usize) -> Result<Vec<SearchResult>>
// RAG functions
pub fn kg_semantic_search(&self, query_embedding: Vec<f32>, k: usize) -> Result<Vec<SearchResult>>
pub fn kg_get_context(&self, entity_id: i64, depth: u32) -> Result<EntityContext>
pub fn kg_hybrid_search(&self, query_text: &str, query_embedding: Vec<f32>, k: usize) -> Result<Vec<HybridSearchResult>>
}
```
## Graph Algorithms
### PageRank
```rust
use sqlite_knowledge_graph::PageRankConfig;
let config = PageRankConfig {
damping: 0.85, // Default: 0.85
max_iterations: 100, // Default: 100
tolerance: 1e-6, // Default: 1e-6
};
let rankings = kg.kg_pagerank(Some(config))?;
for (entity_id, score) in rankings.iter().take(10) {
println!("Entity {}: score = {:.4}", entity_id, score);
}
```
### Louvain Community Detection
```rust
let result = kg.kg_louvain()?;
println!("Found {} communities", result.num_communities);
println!("Modularity: {:.4}", result.modularity);
for (entity_id, community_id) in result.memberships {
println!("Entity {} -> Community {}", entity_id, community_id);
}
```
### Connected Components
```rust
let components = kg.kg_connected_components()?;
println!("Found {} components", components.len());
println!("Largest component: {} entities", components[0].len());
```
## CLI Tool
```bash
# Show statistics
sqlite-kg stats --db knowledge.db
# Search entities
sqlite-kg search --query "neural network" --top-k 10 --db knowledge.db
# Get entity context
sqlite-kg context --id 123 --depth 2 --db knowledge.db
# Migrate data
sqlite-kg migrate --source knowledge.db --target kg.db
```
## SQLite Extension Usage
```sql
-- Load extension
SELECT load_extension('./libsqlite_knowledge_graph', 'sqlite3_sqlite_knowledge_graph_init');
-- Get version
SELECT kg_version();
-- Returns: "0.7.0"
-- Get stats
SELECT kg_stats();
-- Returns: JSON with graph statistics
-- PageRank (optional parameters: damping, max_iterations, tolerance)
SELECT kg_pagerank();
SELECT kg_pagerank(0.85); -- with custom damping
SELECT kg_pagerank(0.85, 100); -- with custom damping and iterations
SELECT kg_pagerank(0.85, 100, 1e-6); -- full parameters
-- Returns: JSON with algorithm info and note to use Rust API for full results
-- Louvain community detection
SELECT kg_louvain();
-- Returns: JSON with algorithm info
-- BFS traversal (required: start_id, optional: max_depth)
SELECT kg_bfs(1);
SELECT kg_bfs(1, 3);
-- Returns: JSON with algorithm parameters
-- Shortest path (required: from_id, to_id, optional: max_depth)
SELECT kg_shortest_path(1, 5);
SELECT kg_shortest_path(1, 5, 10);
-- Returns: JSON with path parameters
-- Connected components
SELECT kg_connected_components();
-- Returns: JSON with algorithm info
-- Graph search example
WITH neural_papers AS (
SELECT id, name FROM kg_entities
WHERE entity_type = 'paper'
AND name LIKE '%neural network%'
)
SELECT e.name, r.rel_type
FROM neural_papers np
JOIN kg_relations r ON r.source_id = np.id
JOIN kg_entities e ON r.target_id = e.id
WHERE e.entity_type = 'skill'
LIMIT 10;
```
## Database Schema
### kg_entities
```sql
CREATE TABLE kg_entities (
id INTEGER PRIMARY KEY AUTOINCREMENT,
entity_type TEXT NOT NULL,
name TEXT NOT NULL,
properties TEXT, -- JSON
created_at INTEGER,
updated_at INTEGER
);
CREATE INDEX idx_entities_type ON kg_entities(entity_type);
CREATE INDEX idx_entities_name ON kg_entities(name);
```
### kg_relations
```sql
CREATE TABLE kg_relations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source_id INTEGER NOT NULL,
target_id INTEGER NOT NULL,
rel_type TEXT NOT NULL,
weight REAL DEFAULT 1.0,
properties TEXT, -- JSON
created_at INTEGER,
FOREIGN KEY (source_id) REFERENCES kg_entities(id) ON DELETE CASCADE,
FOREIGN KEY (target_id) REFERENCES kg_entities(id) ON DELETE CASCADE
);
CREATE INDEX idx_relations_source ON kg_relations(source_id);
CREATE INDEX idx_relations_target ON kg_relations(target_id);
CREATE INDEX idx_relations_type ON kg_relations(rel_type);
```
### kg_vectors
```sql
CREATE TABLE kg_vectors (
entity_id INTEGER NOT NULL PRIMARY KEY,
vector BLOB NOT NULL,
dimension INTEGER NOT NULL,
created_at INTEGER,
FOREIGN KEY (entity_id) REFERENCES kg_entities(id) ON DELETE CASCADE
);
```
### kg_hyperedges
```sql
CREATE TABLE kg_hyperedges (
id INTEGER PRIMARY KEY AUTOINCREMENT,
hyperedge_type TEXT NOT NULL,
entity_ids TEXT NOT NULL, -- JSON array of entity IDs
weight REAL DEFAULT 1.0,
arity INTEGER NOT NULL, -- Number of entities in hyperedge
properties TEXT, -- JSON
created_at INTEGER,
updated_at INTEGER
);
CREATE INDEX idx_hyperedges_type ON kg_hyperedges(hyperedge_type);
CREATE INDEX idx_hyperedges_arity ON kg_hyperedges(arity);
```
### kg_hyperedge_entities
```sql
CREATE TABLE kg_hyperedge_entities (
hyperedge_id INTEGER NOT NULL,
entity_id INTEGER NOT NULL,
position INTEGER NOT NULL, -- Position in hyperedge
PRIMARY KEY (hyperedge_id, entity_id),
FOREIGN KEY (hyperedge_id) REFERENCES kg_hyperedges(id) ON DELETE CASCADE,
FOREIGN KEY (entity_id) REFERENCES kg_entities(id) ON DELETE CASCADE
);
CREATE INDEX idx_hyperedge_entities_entity ON kg_hyperedge_entities(entity_id);
```
## Performance
Benchmarks on a knowledge graph with 2,619 entities and 1.48M relations:
| Entity insert | < 1ms |
| Relation insert | < 1ms |
| BFS (depth 3) | ~50ms |
| PageRank | ~200ms |
| Louvain | ~500ms |
| Vector search (k=10) | ~10ms |
## Implementation Status
| Entity/Relation CRUD | ✅ Complete |
| Graph Traversal (BFS/DFS) | ✅ Complete |
| Shortest Path | ✅ Complete |
| PageRank | ✅ Complete |
| Louvain Community Detection | ✅ Complete |
| Connected Components | ✅ Complete |
| Vector Storage | ✅ Complete |
| Semantic Search | ✅ Complete |
| RAG Integration | ✅ Complete |
| SQLite Extension | ✅ Complete |
| CLI Tool | ✅ Complete |
| GitHub Actions CI | ✅ Complete |
| More Extension Functions | ✅ Complete (v0.7.0) |
| **Vector Indexing (TurboQuant)** | ✅ **Complete (v0.8.0)** |
| **Higher-order Relations (Hyperedge)** | ✅ **Complete (v0.9.0)** |
| Graph Visualization Export | ⏳ Planned |
| Async API | ⏳ Planned |
## Testing
```bash
# Run all tests
cargo test
# Run with verbose output
cargo test -- --nocapture
# Run specific test
cargo test test_pagerank
```
Current test coverage: **38 tests passing**
## Projects Using This Library
- **OpenClaw Knowledge Base**: 2,497 papers, 122 skills, 1.48M relations
- **Research Paper Analysis**: Graph-based paper discovery
## License
MIT License
## Contributing
Contributions are welcome! Please open an issue or submit a pull request.
## Acknowledgments
Built with:
- [rusqlite](https://github.com/rusqlite/rusqlite) - SQLite bindings
- [sqlite-loadable](https://github.com/nickolay/nickolay.github.io/tree/main/sqlite-loadable-rs) - SQLite extension support
- [serde](https://serde.rs/) - Serialization framework
- [thiserror](https://docs.rs/thiserror/) - Error handling
## Changelog
See [CHANGELOG.md](./CHANGELOG.md) for version history.
## Star History
<a href="https://www.star-history.com/?repos=hiyenwong%2Fsqlite-knowledge-graph&type=date&legend=top-left">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/image?repos=hiyenwong/sqlite-knowledge-graph&type=date&theme=dark&legend=top-left" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/image?repos=hiyenwong/sqlite-knowledge-graph&type=date&theme=light&legend=top-left" />
<img alt="Star History Chart" src="https://api.star-history.com/image?repos=hiyenwong/sqlite-knowledge-graph&type=date&legend=top-left" />
</picture>
</a>