pg_knowledge_graph 0.1.0

PostgreSQL knowledge graph extension with graph algorithms and pgvector integration
docs.rs failed to build pg_knowledge_graph-0.1.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

pg_knowledge_graph

PostgreSQL extension that adds graph algorithm capabilities to complement pgvector. Think of it as "pgvector with graph traversal".

Features

Graph Operations

Function Description
kg_version() Returns extension version
kg_stats() Returns entity count, relation count, graph density
kg_bfs(start_id, max_depth) Breadth-first traversal, returns SETOF json
kg_dfs(start_id, max_depth) Depth-first traversal, returns SETOF json
kg_shortest_path(from_id, to_id, max_depth) BFS shortest path as json
kg_pagerank(damping, max_iter) PageRank scores, returns TABLE(entity_id, score)
kg_louvain() Louvain community detection, returns TABLE(entity_id, community_id, modularity)
kg_connected_components() Weakly connected components
kg_strongly_connected_components() Kosaraju's SCC

Vector Search (Phase 3)

Function Description
kg_vector_search(query_vector, k) Semantic search using pgvector cosine similarity
kg_hybrid_search(query_vector, k, graph_depth, alpha, beta) Hybrid search combining vector similarity + graph structure
kg_get_context(entity_id, depth) Extract N-hop neighborhood for RAG context enrichment

Vector Quantization (Phase 4 - TurboQuant)

Function Description
kg_quantized_search(query_vector, k, level) Fast approximate search with configurable quantization level (default 'int8')
kg_quantize_info() Returns available quantization levels and compression ratios

Quantization Levels:

Level Compression Recall Loss Notes
int8 4x ~0% Default; near-lossless
int4 8x ~2% Good balance
binary 32x ~5% Maximum compression

TurboQuant Algorithm (based on arXiv:2504.19874):

  1. L2-normalize + √d scale — coordinates become approximately N(0,1), satisfying the Gaussian optimality assumption for Lloyd-Max quantization
  2. Random sign flip (xorshift64 PRNG) — lightweight dimension decorrelation (SRHT diagonal matrix D), O(d) storage vs O(d²) for full rotation
  3. Gaussian-optimal Lloyd-Max codebook — data-independent; no training data required
  4. Two-stage QJL residual — after main quantization, a 1-bit Quantized Johnson-Lindenstrauss projection of the residual e = original − decode(main) is stored:
    qjl_bit = sign(r · e)           # 1 bit
    residual_norm = ‖e‖₂            # 4 bytes (f32)
    
    At query time the correction qjl_bit × ‖e‖ × (r·y / ‖r‖) × √(2/π) is added to the main dot product, making the inner product estimate unbiased (+QJL applied to Int8/Int4 only)
  5. SIMD-accelerated fused decode+dot — no intermediate Vec<f32> allocation; #[target_feature] enables auto-vectorisation:
    • ARM64: NEON (always available on ARMv8-A)
    • x86_64: AVX2 + FMA (runtime-detected via is_x86_feature_detected!)
    • Other: scalar fallback

Requirements

Installation

cargo install cargo-pgrx --version "=0.17.0" --locked
cargo pgrx init --pg18 /Applications/Postgres.app/Contents/Versions/18/bin/pg_config
cargo pgrx install --no-default-features --features pg18

Then in psql:

CREATE EXTENSION pgvector;  -- Required for vector search
CREATE EXTENSION pg_knowledge_graph;

Quick Start

-- Create entities with embeddings
INSERT INTO kg_entities (entity_type, name, properties, embedding)
VALUES ('person', 'Alice', '{"age": 30}', '[0.1, 0.2, ...]'::vector),
       ('person', 'Bob',   '{"age": 25}', '[0.3, 0.4, ...]'::vector),
       ('person', 'Carol', '{"age": 28}', '[0.5, 0.6, ...]'::vector);

-- Create relations
INSERT INTO kg_relations (source_id, target_id, rel_type, weight)
VALUES (1, 2, 'knows', 1.0),
       (2, 3, 'knows', 0.8);

-- BFS from Alice with depth 2
SELECT * FROM kg_bfs(1, 2);

-- PageRank
SELECT * FROM kg_pagerank(0.85, 100) ORDER BY score DESC;

-- Community detection
SELECT * FROM kg_louvain();

-- Shortest path Alice -> Carol
SELECT kg_shortest_path(1, 3, 5);

-- Vector search (find similar entities)
SELECT * FROM kg_vector_search('[0.1, 0.2, ...]'::vector, 10);

-- Hybrid search (vector + graph structure)
SELECT * FROM kg_hybrid_search('[0.1, 0.2, ...]'::vector, 10, 2, 0.7, 0.3);

-- Quantized search (faster, approximate) — default int8
SELECT * FROM kg_quantized_search('[0.1, 0.2, ...]'::vector, 10);

-- Quantized search with explicit level
SELECT * FROM kg_quantized_search('[0.1, 0.2, ...]'::vector, 10, 'int4');

-- View available quantization levels
SELECT kg_quantize_info();

Development

# Set up environment (macOS)
export SDKROOT=$(xcrun --show-sdk-path)
export BINDGEN_EXTRA_CLANG_ARGS="-I$(xcrun --show-sdk-path)/usr/include"

# Run tests against PG18
cargo pgrx test pg18

# Run a single test
cargo pgrx test pg18 -- test_kg_version

# Lint and format
cargo clippy --no-default-features --features pg18 -- -D warnings
cargo fmt

Architecture

src/
├── lib.rs                  # #[pg_extern] entry points for all SQL functions
├── quantize.rs             # TurboQuant: Lloyd-Max codebook, two-stage QJL, SIMD decode+dot
├── graph/
│   ├── mod.rs              # Shared SPI helpers (load_edges, load_entity_ids)
│   ├── traversal.rs        # BFS, DFS, shortest path
│   ├── pagerank.rs         # Iterative PageRank with dangling node handling
│   ├── louvain.rs          # Greedy Louvain community detection
│   └── components.rs       # Weakly/strongly connected components (Kosaraju)
├── vector.rs               # pgvector integration, semantic search
└── rag.rs                  # Hybrid search, context extraction for RAG
sql/
└── pg_knowledge_graph--0.1.0.sql  # DDL: kg_entities, kg_relations, indexes

Data layer is accessed entirely via pgrx::Spi — no external database drivers.

Development Roadmap

Phase Status Description
Phase 1 ✅ Complete Schema DDL, version, stats, CI setup
Phase 2 ✅ Complete Graph algorithms (BFS/DFS, PageRank, Louvain, SCC)
Phase 3 ✅ Complete pgvector integration, hybrid search, RAG context
Phase 4 ✅ Complete TurboQuant quantization: Lloyd-Max codebook, two-stage QJL residual, SIMD decode+dot

License

MIT