pg_knowledge_graph
PostgreSQL extension that adds graph algorithm capabilities to complement pgvector. Think of it as "pgvector with graph traversal".
Features
Graph Operations
| Function | Description |
|---|---|
| `kg_version()` | Returns extension version |
| `kg_stats()` | Returns entity count, relation count, graph density |
| `kg_bfs(start_id, max_depth)` | Breadth-first traversal, returns SETOF json |
| `kg_dfs(start_id, max_depth)` | Depth-first traversal, returns SETOF json |
| `kg_shortest_path(from_id, to_id, max_depth)` | BFS shortest path as json |
| `kg_pagerank(damping, max_iter)` | PageRank scores, returns TABLE(entity_id, score) |
| `kg_louvain()` | Louvain community detection, returns TABLE(entity_id, community_id, modularity) |
| `kg_connected_components()` | Weakly connected components |
| `kg_strongly_connected_components()` | Kosaraju's SCC |
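To illustrate what a depth-limited traversal such as `kg_bfs(start_id, max_depth)` computes, here is a minimal Rust sketch. It is not the extension's code: the function name `bfs_with_depth`, the in-memory edge list, and the choice to follow edges only from source to target are assumptions made for the example.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Depth-limited breadth-first traversal over a directed edge list.
/// Returns (node, depth) pairs in visit order, starting with (start, 0).
/// Hypothetical helper for illustration; not part of pg_knowledge_graph.
fn bfs_with_depth(edges: &[(i64, i64)], start: i64, max_depth: u32) -> Vec<(i64, u32)> {
    // Build an adjacency list from (source_id, target_id) pairs.
    let mut adjacency: HashMap<i64, Vec<i64>> = HashMap::new();
    for &(source, target) in edges {
        adjacency.entry(source).or_default().push(target);
    }

    let mut visited: HashSet<i64> = HashSet::from([start]);
    let mut queue: VecDeque<(i64, u32)> = VecDeque::from([(start, 0)]);
    let mut order = Vec::new();

    while let Some((node, depth)) = queue.pop_front() {
        order.push((node, depth));
        if depth == max_depth {
            continue; // do not expand nodes that sit at the depth limit
        }
        if let Some(neighbors) = adjacency.get(&node) {
            for &next in neighbors {
                // insert() returns true only the first time a node is seen
                if visited.insert(next) {
                    queue.push_back((next, depth + 1));
                }
            }
        }
    }
    order
}
```

With the edges `(1, 2)` and `(2, 3)`, calling `bfs_with_depth(&edges, 1, 2)` visits node 1 at depth 0, node 2 at depth 1, and node 3 at depth 2. Because BFS reaches each node by a fewest-hops route, the same loop with parent tracking is one way to obtain an unweighted shortest path.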
Vector Search (Phase 3)
| Function | Description |
|---|---|
| `kg_vector_search(query_vector, k)` | Semantic search using pgvector cosine similarity |
| `kg_hybrid_search(query_vector, k, graph_depth, alpha, beta)` | Hybrid search combining vector similarity + graph structure |
| `kg_get_context(entity_id, depth)` | Extract N-hop neighborhood for RAG context enrichment |
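The README does not spell out how `alpha` and `beta` combine the two signals in `kg_hybrid_search`. One plausible reading is a weighted sum of cosine similarity and a graph-derived score, sketched below. The function names, the linear blend, and the assumption that the graph score lies in [0, 1] are all hypothetical and may differ from the extension's actual scoring.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 when either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical blend: alpha weights the vector similarity,
/// beta weights a graph-structure score (assumed to be in [0, 1]).
fn hybrid_score(vector_similarity: f32, graph_score: f32, alpha: f32, beta: f32) -> f32 {
    alpha * vector_similarity + beta * graph_score
}
```

Under this reading, the Quick Start call with `alpha = 0.7` and `beta = 0.3` would weight embedding similarity a little over twice as heavily as graph structure.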
Vector Quantization (Phase 4 - TurboQuant)
| Function | Description |
|---|---|
| `kg_quantized_search(query_vector, k, level)` | Fast approximate search with configurable quantization level (default 'int8') |
| `kg_quantize_info()` | Returns available quantization levels and compression ratios |
Quantization Levels:
| Level | Compression | Recall Loss | Notes |
|---|---|---|---|
| `int8` | 4x | ~0% | Default; near-lossless |
| `int4` | 8x | ~2% | Good balance |
| `binary` | 32x | ~5% | Maximum compression |
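The compression column follows from the bits stored per coordinate, measured against the 32-bit floats that pgvector's `vector` type uses. The sketch below reproduces that arithmetic; the function names are hypothetical, and the calculation ignores any per-vector overhead such as a stored residual norm.

```rust
/// Bits used per coordinate at each quantization level named in the table.
/// Hypothetical helper for illustration; not the extension's API.
fn bits_per_coordinate(level: &str) -> Option<u32> {
    match level {
        "int8" => Some(8),
        "int4" => Some(4),
        "binary" => Some(1),
        _ => None, // unknown level
    }
}

/// Compression ratio relative to one 32-bit float per coordinate.
fn compression_ratio(level: &str) -> Option<u32> {
    bits_per_coordinate(level).map(|bits| 32 / bits)
}
```

For example, `compression_ratio("int4")` evaluates to `Some(8)`, matching the 8x entry in the table.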
TurboQuant Algorithm (based on arXiv:2504.19874):
- L2-normalize + √d scale: coordinates become approximately N(0,1), satisfying the Gaussian optimality assumption for Lloyd-Max quantization
- Random sign flip (xorshift64 PRNG): lightweight dimension decorrelation (SRHT diagonal matrix D), O(d) storage vs O(d²) for full rotation
- Gaussian-optimal Lloyd-Max codebook: data-independent; no training data required
- Two-stage QJL residual: after main quantization, a 1-bit Quantized Johnson-Lindenstrauss projection of the residual `e = original − decode(main)` is stored:

      qjl_bit = sign(r · e)      # 1 bit
      residual_norm = ‖e‖₂       # 4 bytes (f32)

  At query time the correction `qjl_bit × ‖e‖ × (r·y / ‖r‖) × √(2/π)` is added to the main dot product, making the inner product estimate unbiased (+QJL applied to Int8/Int4 only)
- SIMD-accelerated fused decode+dot: no intermediate `Vec<f32>` allocation; `#[target_feature]` enables auto-vectorisation:
  - ARM64: NEON (always available on ARMv8-A)
  - x86_64: AVX2 + FMA (runtime-detected via `is_x86_feature_detected!`)
  - Other: scalar fallback
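The preprocessing and residual steps in the list above can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not the code in `quantize.rs`: the function names are hypothetical, the sign is taken from the lowest PRNG bit, the shift constants are the common 13/7/17 xorshift64 variant, and `qjl_correction` simply transcribes the expression given in the list without checking it against the crate or the paper.

```rust
/// One xorshift64 step (13/7/17 variant); the state must be non-zero.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// L2-normalize, then scale by sqrt(d) so the squared norm equals d.
fn normalize_and_scale(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec(); // leave the zero vector untouched
    }
    let scale = (v.len() as f32).sqrt() / norm;
    v.iter().map(|x| x * scale).collect()
}

/// Flip the sign of each coordinate according to one PRNG bit.
/// Only the seed needs storing; re-running with the same seed undoes the flip.
fn random_sign_flip(v: &mut [f32], seed: u64) {
    let mut state = seed.max(1); // xorshift64 must not start at zero
    for x in v.iter_mut() {
        if xorshift64(&mut state) & 1 == 1 {
            *x = -*x;
        }
    }
}

/// Correction term as written in the list above:
/// qjl_bit * ||e|| * (r . y / ||r||) * sqrt(2 / pi).
/// `qjl_bit` is +1.0 or -1.0; `r` is the projection vector, `y` the query.
fn qjl_correction(qjl_bit: f32, residual_norm: f32, r: &[f32], y: &[f32]) -> f32 {
    let r_dot_y: f32 = r.iter().zip(y).map(|(a, b)| a * b).sum();
    let r_norm = r.iter().map(|a| a * a).sum::<f32>().sqrt();
    qjl_bit * residual_norm * (r_dot_y / r_norm) * (2.0 / std::f32::consts::PI).sqrt()
}
```

The sign flip needs only a seed rather than a stored matrix, which is the storage advantage the list attributes to the diagonal matrix D over a full rotation.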
Requirements
- PostgreSQL 16, 17, or 18
- Rust 1.75+
- cargo-pgrx 0.17.0
- pgvector (for vector search)
Installation
Then in psql:
CREATE EXTENSION vector; -- pgvector (its extension name is "vector"); required for vector search
CREATE EXTENSION pg_knowledge_graph;
Quick Start
-- Create entities with embeddings
INSERT INTO kg_entities (entity_type, name, properties, embedding)
VALUES ('person', 'Alice', '{"age": 30}', '[0.1, 0.2, ...]'::vector),
('person', 'Bob', '{"age": 25}', '[0.3, 0.4, ...]'::vector),
('person', 'Carol', '{"age": 28}', '[0.5, 0.6, ...]'::vector);
-- Create relations
INSERT INTO kg_relations (source_id, target_id, rel_type, weight)
VALUES (1, 2, 'knows', 1.0),
(2, 3, 'knows', 0.8);
-- BFS from Alice with depth 2
SELECT * FROM kg_bfs(1, 2);
-- PageRank
SELECT * FROM kg_pagerank(0.85, 100) ORDER BY score DESC;
-- Community detection
SELECT * FROM kg_louvain();
-- Shortest path Alice -> Carol
SELECT kg_shortest_path(1, 3, 5);
-- Vector search (find similar entities)
SELECT * FROM kg_vector_search('[0.1, 0.2, ...]'::vector, 10);
-- Hybrid search (vector + graph structure)
SELECT * FROM kg_hybrid_search('[0.1, 0.2, ...]'::vector, 10, 2, 0.7, 0.3);
-- Quantized search (faster, approximate) — default int8
SELECT * FROM kg_quantized_search('[0.1, 0.2, ...]'::vector, 10);
-- Quantized search with explicit level
SELECT * FROM kg_quantized_search('[0.1, 0.2, ...]'::vector, 10, 'int4');
-- View available quantization levels
SELECT kg_quantize_info();
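In the Quick Start graph, Carol has no outgoing relations, which makes her a dangling node for PageRank. The sketch below shows one common way to handle that case: the rank held by dangling nodes is redistributed uniformly on each iteration. It is an assumption about the approach, not the extension's code. The `pagerank` function, the 0-based node indices, and the fixed iteration count without a convergence check are all choices made for the example.

```rust
/// Iterative PageRank over nodes 0..n with directed edges (source, target).
/// Rank held by dangling nodes (no outgoing edges) is spread uniformly.
/// Hypothetical sketch; runs exactly `max_iter` iterations.
fn pagerank(n: usize, edges: &[(usize, usize)], damping: f64, max_iter: usize) -> Vec<f64> {
    if n == 0 {
        return Vec::new();
    }
    let mut out_degree = vec![0usize; n];
    for &(source, _) in edges {
        out_degree[source] += 1;
    }

    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..max_iter {
        // Total rank currently sitting on dangling nodes.
        let dangling: f64 = (0..n)
            .filter(|&i| out_degree[i] == 0)
            .map(|i| rank[i])
            .sum();
        // Teleport share plus the uniformly redistributed dangling mass.
        let base = (1.0 - damping) / n as f64 + damping * dangling / n as f64;
        let mut next = vec![base; n];
        for &(source, target) in edges {
            next[target] += damping * rank[source] / out_degree[source] as f64;
        }
        rank = next;
    }
    rank
}
```

Mapping Alice, Bob, and Carol to indices 0, 1, and 2, `pagerank(3, &[(0, 1), (1, 2)], 0.85, 100)` yields scores that sum to 1, with Carol ranked highest and Alice lowest.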
Development
# Set up environment (macOS)
# Run tests against PG18
# Run a single test
# Lint and format
Architecture
src/
├── lib.rs # #[pg_extern] entry points for all SQL functions
├── quantize.rs # TurboQuant: Lloyd-Max codebook, two-stage QJL, SIMD decode+dot
├── graph/
│ ├── mod.rs # Shared SPI helpers (load_edges, load_entity_ids)
│ ├── traversal.rs # BFS, DFS, shortest path
│ ├── pagerank.rs # Iterative PageRank with dangling node handling
│ ├── louvain.rs # Greedy Louvain community detection
│ └── components.rs # Weakly/strongly connected components (Kosaraju)
├── vector.rs # pgvector integration, semantic search
└── rag.rs # Hybrid search, context extraction for RAG
sql/
└── pg_knowledge_graph--0.1.0.sql # DDL: kg_entities, kg_relations, indexes
Data layer is accessed entirely via pgrx::Spi — no external database drivers.
Development Roadmap
| Phase | Status | Description |
|---|---|---|
| Phase 1 | ✅ Complete | Schema DDL, version, stats, CI setup |
| Phase 2 | ✅ Complete | Graph algorithms (BFS/DFS, PageRank, Louvain, SCC) |
| Phase 3 | ✅ Complete | pgvector integration, hybrid search, RAG context |
| Phase 4 | ✅ Complete | TurboQuant quantization: Lloyd-Max codebook, two-stage QJL residual, SIMD decode+dot |
License
MIT