docs.rs failed to build pg_knowledge_graph-0.1.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

pg_knowledge_graph

PostgreSQL extension that adds graph algorithm capabilities to complement pgvector. Think of it as "pgvector with graph traversal".

Features

Graph Operations

Function	Description
`kg_version()`	Returns extension version
`kg_stats()`	Returns entity count, relation count, graph density
`kg_bfs(start_id, max_depth)`	Breadth-first traversal, returns `SETOF json`
`kg_dfs(start_id, max_depth)`	Depth-first traversal, returns `SETOF json`
`kg_shortest_path(from_id, to_id, max_depth)`	BFS shortest path as `json`
`kg_pagerank(damping, max_iter)`	PageRank scores, returns `TABLE(entity_id, score)`
`kg_louvain()`	Louvain community detection, returns `TABLE(entity_id, community_id, modularity)`
`kg_connected_components()`	Weakly connected components
`kg_strongly_connected_components()`	Kosaraju's SCC

Vector Search (Phase 3)

Function	Description
`kg_vector_search(query_vector, k)`	Semantic search using pgvector cosine similarity
`kg_hybrid_search(query_vector, k, graph_depth, alpha, beta)`	Hybrid search combining vector similarity + graph structure
`kg_get_context(entity_id, depth)`	Extract N-hop neighborhood for RAG context enrichment

Vector Quantization (Phase 4 - TurboQuant)

Function	Description
`kg_quantized_search(query_vector, k, level)`	Fast approximate search with configurable quantization level (default `'int8'`)
`kg_quantize_info()`	Returns available quantization levels and compression ratios

Quantization Levels:

Level	Compression	Recall Loss	Notes
`int8`	4x	~0%	Default; near-lossless
`int4`	8x	~2%	Good balance
`binary`	32x	~5%	Maximum compression

TurboQuant Algorithm (based on arXiv:2504.19874):

L2-normalize + √d scale — coordinates become approximately N(0,1), satisfying the Gaussian optimality assumption for Lloyd-Max quantization
Random sign flip (xorshift64 PRNG) — lightweight dimension decorrelation (SRHT diagonal matrix D), O(d) storage vs O(d²) for full rotation
Gaussian-optimal Lloyd-Max codebook — data-independent; no training data required
Two-stage QJL residual — after main quantization, a 1-bit Quantized Johnson-Lindenstrauss projection of the residual e = original − decode(main) is stored:
```
qjl_bit = sign(r · e)           # 1 bit
residual_norm = ‖e‖₂            # 4 bytes (f32)
```
At query time the correction qjl_bit × ‖e‖ × (r·y / ‖r‖) × √(2/π) is added to the main dot product, making the inner product estimate unbiased (+QJL applied to Int8/Int4 only)
SIMD-accelerated fused decode+dot — no intermediate Vec<f32> allocation; #[target_feature] enables auto-vectorisation:
- ARM64: NEON (always available on ARMv8-A)
- x86_64: AVX2 + FMA (runtime-detected via is_x86_feature_detected!)
- Other: scalar fallback

Requirements

PostgreSQL 16, 17, or 18
Rust 1.75+
cargo-pgrx 0.17.0
pgvector (for vector search)

Installation

cargo install cargo-pgrx --version "=0.17.0" --locked
cargo pgrx init --pg18 /Applications/Postgres.app/Contents/Versions/18/bin/pg_config
cargo pgrx install --no-default-features --features pg18

Then in psql:

CREATE EXTENSION pgvector;  -- Required for vector search
CREATE EXTENSION pg_knowledge_graph;

Quick Start

-- Create entities with embeddings
INSERT INTO kg_entities (entity_type, name, properties, embedding)
VALUES ('person', 'Alice', '{"age": 30}', '[0.1, 0.2, ...]'::vector),
       ('person', 'Bob',   '{"age": 25}', '[0.3, 0.4, ...]'::vector),
       ('person', 'Carol', '{"age": 28}', '[0.5, 0.6, ...]'::vector);

-- Create relations
INSERT INTO kg_relations (source_id, target_id, rel_type, weight)
VALUES (1, 2, 'knows', 1.0),
       (2, 3, 'knows', 0.8);

-- BFS from Alice with depth 2
SELECT * FROM kg_bfs(1, 2);

-- PageRank
SELECT * FROM kg_pagerank(0.85, 100) ORDER BY score DESC;

-- Community detection
SELECT * FROM kg_louvain();

-- Shortest path Alice -> Carol
SELECT kg_shortest_path(1, 3, 5);

-- Vector search (find similar entities)
SELECT * FROM kg_vector_search('[0.1, 0.2, ...]'::vector, 10);

-- Hybrid search (vector + graph structure)
SELECT * FROM kg_hybrid_search('[0.1, 0.2, ...]'::vector, 10, 2, 0.7, 0.3);

-- Quantized search (faster, approximate) — default int8
SELECT * FROM kg_quantized_search('[0.1, 0.2, ...]'::vector, 10);

-- Quantized search with explicit level
SELECT * FROM kg_quantized_search('[0.1, 0.2, ...]'::vector, 10, 'int4');

-- View available quantization levels
SELECT kg_quantize_info();

Development

# Set up environment (macOS)
export SDKROOT=$(xcrun --show-sdk-path)
export BINDGEN_EXTRA_CLANG_ARGS="-I$(xcrun --show-sdk-path)/usr/include"

# Run tests against PG18
cargo pgrx test pg18

# Run a single test
cargo pgrx test pg18 -- test_kg_version

# Lint and format
cargo clippy --no-default-features --features pg18 -- -D warnings
cargo fmt

Architecture

src/
├── lib.rs                  # #[pg_extern] entry points for all SQL functions
├── quantize.rs             # TurboQuant: Lloyd-Max codebook, two-stage QJL, SIMD decode+dot
├── graph/
│   ├── mod.rs              # Shared SPI helpers (load_edges, load_entity_ids)
│   ├── traversal.rs        # BFS, DFS, shortest path
│   ├── pagerank.rs         # Iterative PageRank with dangling node handling
│   ├── louvain.rs          # Greedy Louvain community detection
│   └── components.rs       # Weakly/strongly connected components (Kosaraju)
├── vector.rs               # pgvector integration, semantic search
└── rag.rs                  # Hybrid search, context extraction for RAG
sql/
└── pg_knowledge_graph--0.1.0.sql  # DDL: kg_entities, kg_relations, indexes

Data layer is accessed entirely via pgrx::Spi — no external database drivers.

Development Roadmap

Phase	Status	Description
Phase 1	✅ Complete	Schema DDL, version, stats, CI setup
Phase 2	✅ Complete	Graph algorithms (BFS/DFS, PageRank, Louvain, SCC)
Phase 3	✅ Complete	pgvector integration, hybrid search, RAG context
Phase 4	✅ Complete	TurboQuant quantization: Lloyd-Max codebook, two-stage QJL residual, SIMD decode+dot

License

MIT

pg_knowledge_graph 0.1.0