Crate oxirs_graphrag

§OxiRS GraphRAG

OxiRS GraphRAG is a production-ready Rust library for Graph Retrieval-Augmented Generation (GraphRAG). It combines knowledge-graph topology traversal with vector similarity search to deliver context-rich answers for LLM pipelines, without any network dependencies at query time.

It is the JVM-free, pure-Rust counterpart of Microsoft’s GraphRAG and LangChain’s knowledge-graph QA stack, integrated directly with the OxiRS semantic-web engine.

§Data-flow overview

Natural-Language Query
        │
        ▼
┌───────────────────┐
│  Query Embedding  │  (oxirs-embed / Node2Vec / TransE)
└────────┬──────────┘
         │
  ┌──────┴──────┐
  │             │
  ▼             ▼
Vector        Keyword
KNN           BM25
Search        Search
  │             │
  └──────┬──────┘
         │
         ▼
 ┌───────────────┐
 │  RRF Fusion   │  Reciprocal Rank Fusion → Seed Entities
 └───────┬───────┘
         │
         ▼
 ┌────────────────────────┐
 │  SPARQL N-hop Expansion│  Graph traversal (up to 500 triples)
 └────────────┬───────────┘
              │
              ▼
 ┌────────────────────────┐
 │  Community Detection   │  Louvain / Leiden clustering
 └────────────┬───────────┘
              │
              ▼
 ┌────────────────────────┐
 │  Context Building      │  Subgraph → natural-language context
 └────────────┬───────────┘
              │
              ▼
 ┌────────────────────────┐
 │  LLM Generation        │  Answer + citations
 └────────────────────────┘
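
The fusion step in the diagram can be illustrated with a minimal sketch of Reciprocal Rank Fusion, where each candidate scores the sum of 1 / (k + rank) over every ranked list it appears in. The function name `rrf_fuse`, its signature, and the constant k = 60 are assumptions made for illustration; this is not the crate's `retrieval::fusion` API.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists with Reciprocal Rank Fusion (RRF).
/// Hypothetical helper for illustration, not part of oxirs_graphrag.
/// `k` is the smoothing constant; 60.0 is a commonly used value.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            // Ranks are 1-based: the top hit contributes 1 / (k + 1).
            let rank = (idx + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by id for determinism.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Calling `rrf_fuse(&[vector_hits, keyword_hits], 60.0)` returns candidates ordered by fused score; the top entries would play the role of the seed entities passed to graph expansion.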

§Key modules

| Module | Purpose |
|---|---|
| triple_extractor | Rule-based NLP → RDF triple extraction |
| community_detector | Greedy label-propagation community detection |
| path_finder | BFS / DFS shortest-path retrieval in KGs |
| graph_embedder | Node2Vec-style random-walk structural embeddings |
| summarizer | Cluster-based subgraph summarization for LLM context |
| path_ranker | Predicate-weighted path ranking |
| context_builder | N-hop subgraph extraction and truncation |
| knowledge_fusion | Multi-source KG fusion with provenance |
| graph_summarization | PageRank-style community summary generation |
| entity_linking | Entity linking and disambiguation |
| explainability | Attention weights, path explanation, provenance |
| feedback | Session-scoped user-feedback weight adaptation |
| graph | Core community detection and graph traversal primitives |
| retrieval | Hybrid vector + keyword retrieval with RRF fusion |
| generation | Prompt templates and LLM context building |
| temporal | Temporal knowledge graph retrieval |
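
To make the "greedy label-propagation" entry above more concrete, here is a minimal sketch of the general technique: every node starts in its own community and repeatedly adopts the most frequent label among its neighbours until nothing changes. The function `label_propagation`, its parameters, and the tie-breaking rule (smallest label wins) are assumptions for illustration and do not reflect the internals of `community_detector`.

```rust
use std::collections::HashMap;

/// Hypothetical label-propagation sketch, not the crate's implementation.
/// Nodes are indexed 0..n; every edge endpoint must be < n.
/// Returns one community label per node.
fn label_propagation(n: usize, edges: &[(usize, usize)], max_iters: usize) -> Vec<usize> {
    // Build an undirected adjacency list.
    let mut adj: Vec<Vec<usize>> = vec![Vec::new(); n];
    for &(a, b) in edges {
        adj[a].push(b);
        adj[b].push(a);
    }
    // Each node starts in its own community.
    let mut labels: Vec<usize> = (0..n).collect();
    for _ in 0..max_iters {
        let mut changed = false;
        for node in 0..n {
            // Count how often each label occurs among the neighbours.
            let mut counts: HashMap<usize, usize> = HashMap::new();
            for &nb in &adj[node] {
                *counts.entry(labels[nb]).or_insert(0) += 1;
            }
            // Pick the most frequent label; on a tie the smallest label wins.
            let best = counts
                .into_iter()
                .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(&a.0)));
            // Isolated nodes have no neighbours and keep their own label.
            if let Some((label, _)) = best {
                if label != labels[node] {
                    labels[node] = label;
                    changed = true;
                }
            }
        }
        // Stop early once a full pass changes nothing.
        if !changed {
            break;
        }
    }
    labels
}
```

Updates are applied in place, so later nodes in a pass already see the labels chosen earlier in that pass; the iteration cap guards against oscillation on graphs where the labels never settle.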

§Quickstart — standalone pipeline (no network, no LLM)

The example below runs an end-to-end mini-pipeline entirely in memory on a synthetic 8-node knowledge graph: extract triples from text, detect communities, find paths, and summarize the result.

use oxirs_graphrag::triple_extractor::{ExtractionConfig, TripleExtractor};
use oxirs_graphrag::community_detector::{CommunityGraph, CommunityDetector};
use oxirs_graphrag::path_finder::{KnowledgeEdge, PathFinder, PathFinderConfig};
use oxirs_graphrag::summarizer::{KgEdge, KgNode, KgSubgraph, SubgraphSummarizer};

// ── Step 1: Extract triples from natural language ─────────────────────────
let corpus = [
    "Alice is a data scientist.",
    "Bob works at ACME.",
    "Carol is a software engineer.",
    "Dave is part of the AI team.",
    "ACME has a research division.",
];
let extractor = TripleExtractor::with_defaults(ExtractionConfig::default());
let all_triples: Vec<_> = corpus
    .iter()
    .flat_map(|sentence| extractor.extract(sentence))
    .collect();
assert!(!all_triples.is_empty(), "at least one triple extracted");

// ── Step 2: Build community graph and detect clusters ─────────────────────
let mut cg = CommunityGraph::new();
// 8 synthetic nodes
for (id, label) in [
    (1u64, "Alice"), (2, "Bob"), (3, "Carol"), (4, "Dave"),
    (5, "ACME"),    (6, "AI-Team"), (7, "Research"), (8, "Berlin"),
] {
    cg.add_node(id, label);
}
for (a, b) in [(1,5),(2,5),(3,6),(4,6),(5,7),(6,7),(7,8),(1,2)] {
    cg.add_edge(a, b, 1.0);
}
let detector = CommunityDetector::new(2, 50);
let detection = detector.detect(&mut cg);
assert!(!detection.communities.is_empty(), "at least one community");

// ── Step 3: Graph path retrieval ──────────────────────────────────────────
let edges = vec![
    KnowledgeEdge::new("Alice",    "works_at",    "ACME"),
    KnowledgeEdge::new("ACME",     "located_in",  "Berlin"),
    KnowledgeEdge::new("Bob",      "knows",       "Alice"),
    KnowledgeEdge::new("Alice",    "member_of",   "AI-Team"),
    KnowledgeEdge::new("AI-Team",  "part_of",     "ACME"),
    KnowledgeEdge::new("Carol",    "works_at",    "ACME"),
    KnowledgeEdge::new("Dave",     "leads",       "AI-Team"),
    KnowledgeEdge::new("Research", "division_of", "ACME"),
];
let finder = PathFinder::new(edges, PathFinderConfig::default());
let paths = finder.bfs_paths("Bob", "Berlin", 4);
assert!(!paths.is_empty(), "path Bob→Berlin found");

// ── Step 4: Summarize subgraph for LLM context ────────────────────────────
let mut subgraph = KgSubgraph::new();
for (id, label, ty) in [
    ("alice",    "Alice",    "Person"),
    ("bob",      "Bob",      "Person"),
    ("carol",    "Carol",    "Person"),
    ("acme",     "ACME",     "Organization"),
    ("berlin",   "Berlin",   "Place"),
    ("ai_team",  "AI-Team",  "Team"),
    ("research", "Research", "Department"),
    ("dave",     "Dave",     "Person"),
] {
    subgraph.add_node(KgNode::simple(id, label, ty));
}
subgraph.add_edge(KgEdge::unweighted("alice", "acme",  "works_at"));
subgraph.add_edge(KgEdge::unweighted("acme",  "berlin","located_in"));

let summarizer = SubgraphSummarizer::new();
let clusters = summarizer.summarize(&subgraph, 10);
assert!(!clusters.is_empty(), "at least one cluster");
let text_summary = summarizer.generate_text_summary(&clusters);
assert!(!text_summary.is_empty(), "non-empty summary text");

§Full engine usage (async, requires trait impls)

For production usage with a real vector index, embedding model, SPARQL engine, and LLM client:

use oxirs_graphrag::{GraphRAGEngine, GraphRAGConfig};
use std::sync::Arc;

let config = GraphRAGConfig {
    top_k: 20,
    expansion_hops: 2,
    enable_communities: true,
    ..Default::default()
};

// Provide your own implementations of VectorIndexTrait, EmbeddingModelTrait,
// SparqlEngineTrait, and LlmClientTrait:
let engine = GraphRAGEngine::new(
    Arc::new(my_vec_index),
    Arc::new(my_embedder),
    Arc::new(my_sparql),
    Arc::new(my_llm),
    config,
);

let result = engine.query("What safety issues affect battery cells?").await?;
println!("Answer: {}", result.answer);
println!("Confidence: {:.2}", result.confidence);

See docs/tutorial.md for a step-by-step walkthrough.
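
As a rough illustration of what a setting like `expansion_hops: 2` controls, the sketch below expands a set of seed entities hop by hop over an in-memory triple list and stops at a triple budget, mirroring the "up to 500 triples" cap in the data-flow overview. The names `SimpleTriple`, `triple`, and `expand_n_hops` are hypothetical; the engine itself performs this step through its SPARQL engine trait, whose signatures are not shown here.

```rust
use std::collections::HashSet;

/// Hypothetical (subject, predicate, object) tuple; not the crate's `Triple` struct.
type SimpleTriple = (String, String, String);

/// Convenience constructor for the sketch.
fn triple(s: &str, p: &str, o: &str) -> SimpleTriple {
    (s.to_string(), p.to_string(), o.to_string())
}

/// Collect triples reachable from `seeds` within `hops` hops, treating
/// edges as undirected, and return at most `max_triples` of them.
fn expand_n_hops(
    triples: &[SimpleTriple],
    seeds: &[&str],
    hops: usize,
    max_triples: usize,
) -> Vec<SimpleTriple> {
    let mut visited: HashSet<String> = seeds.iter().map(|s| s.to_string()).collect();
    let mut frontier: HashSet<String> = visited.clone();
    let mut taken: HashSet<usize> = HashSet::new();
    let mut out = Vec::new();
    for _ in 0..hops {
        let mut next = HashSet::new();
        for (i, (s, _p, o)) in triples.iter().enumerate() {
            // Skip triples already collected or not touching the frontier.
            if taken.contains(&i) || !(frontier.contains(s) || frontier.contains(o)) {
                continue;
            }
            // Enforce the triple budget before adding anything new.
            if out.len() >= max_triples {
                return out;
            }
            taken.insert(i);
            out.push(triples[i].clone());
            // Newly discovered entities form the next hop's frontier.
            for e in [s, o] {
                if visited.insert(e.clone()) {
                    next.insert(e.clone());
                }
            }
        }
        if next.is_empty() {
            break;
        }
        frontier = next;
    }
    out
}
```

With the seed `"Bob"` and two hops over the chain Bob → Alice → ACME → Berlin, this sketch returns the first two triples; a third hop would add the ACME → Berlin edge, and a smaller `max_triples` truncates the result earlier.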

Re-exports§

pub use summarizer::GraphSummarizer;
pub use summarizer::GraphSummary;
pub use feedback::Relevance;
pub use feedback::TripleId;
pub use feedback::TripleRelevanceFeedback;
pub use gnn_encoder::AdjacencyGraph;
pub use gnn_encoder::EdgeList;
pub use gnn_encoder::GnnEncoder;
pub use gnn_encoder::GnnEncoderConfig;
pub use gnn_encoder::ScaledDotProductAttention;
pub use cache::query_cache::CacheEntry;
pub use cache::query_cache::CacheStats;
pub use cache::query_cache::QueryCache;
pub use cache::query_cache::QueryCacheConfig;
pub use config::CacheConfiguration;
pub use config::GraphRAGConfig;
pub use embeddings::node2vec::Node2VecConfig;
pub use embeddings::node2vec::Node2VecEmbedder;
pub use embeddings::node2vec::Node2VecEmbeddings;
pub use embeddings::node2vec::Node2VecWalkConfig;
pub use graph::community::CommunityAlgorithm;
pub use graph::community::CommunityConfig;
pub use graph::community::CommunityDetector;
pub use graph::embeddings::CommunityAwareEmbeddings;
pub use graph::embeddings::CommunityStructure;
pub use graph::embeddings::EmbeddingConfig;
pub use graph::traversal::GraphTraversal;
pub use hybrid::lora::LoraAdapter;
pub use hybrid::lora::LoraTrainer;
pub use query::planner::QueryPlanner;
pub use retrieval::fusion::FusionStrategy;
pub use model_loader::GgufMetadata;
pub use model_loader::GgufModelArch;
pub use model_loader::GgufParseError;
pub use model_loader::GgufParser;
pub use model_loader::GgufTensorInfo;
pub use model_loader::GgufValue;
pub use model_loader::ModelHandle;
pub use model_loader::ModelInfo;
pub use model_loader::ModelRegistry;
pub use model_loader::RegistryError;

Modules§

cache
Cache module for GraphRAG query results
community_detector
Graph community detection using a greedy label-propagation approach.
config
GraphRAG configuration
context_builder
Context building for graph-based RAG.
distributed
Distributed GraphRAG: federated subgraph expansion across multiple SPARQL endpoints.
embeddings
Graph embedding algorithms for GraphRAG.
entity_classifier
Entity type classification for knowledge graph nodes.
entity_linker
String-to-RDF entity linking: mention detection and candidate ranking.
entity_linking
Entity linking and disambiguation for knowledge graphs.
explainability
Explainability engine for graph-based RAG — attention weights, path explanation, provenance.
federation
Federation layer for distributed GraphRAG queries.
feedback
Interactive feedback loop for graph-based RAG retrieval refinement.
fusion
Fusion and reranking module for GraphRAG
generation
Answer generation module
gnn_encoder
GraphSAGE encoder for knowledge-graph entity embeddings.
graph
Graph processing module
graph_embedder
Graph Embedder
graph_partitioner
Graph partitioning using greedy and label-propagation methods.
graph_summarization
Graph Summarization for GraphRAG
hybrid
Hybrid GNN+LLM architecture — phases b, c, and d.
knowledge_fusion
Multi-source knowledge fusion.
model_loader
Pure-Rust GGUF model metadata loader and thread-safe model registry.
neuro_symbolic
Neuro-symbolic module: physics-informed entity scoring for knowledge graphs.
path_finder
Path Finder for Graph-RAG
path_ranker
Knowledge Graph Path Ranker
query
Query processing module
reasoning
Reasoning module for GraphRAG
retrieval
Retrieval module for GraphRAG
sparql
SPARQL extension functions for GraphRAG
streaming
Streaming subgraph extraction using SPARQL-like patterns.
summarizer
Knowledge Graph Subgraph Summarizer
temporal
Temporal reasoning and time-aware retrieval for GraphRAG
transe_model
TransE Knowledge Graph Embedding Model
triple_extractor
Triple Extractor

Structs§

CacheConfig
Cache configuration
CommunitySummary
Community summary for hierarchical retrieval
GraphRAGEngine
Main GraphRAG engine
GraphRAGResult2
GraphRAG query result
QueryProvenance
Query provenance for attribution
ScoredEntity
Entity with relevance score
Triple
Triple representation for RDF data

Enums§

GraphRAGError
GraphRAG error types
ScoreSource
Source of entity score

Traits§

EmbeddingModelTrait
Trait for embedding model operations
LlmClientTrait
Trait for LLM client operations
SparqlEngineTrait
Trait for SPARQL engine operations
VectorIndexTrait
Trait for vector index operations

Type Aliases§

GraphRAGResult