Crate oxirs_graphrag

§OxiRS GraphRAG

OxiRS GraphRAG is a production-ready Rust library for Graph Retrieval-Augmented Generation (GraphRAG). It combines knowledge-graph topology traversal with vector similarity search to deliver context-rich answers for LLM pipelines, without any network dependencies at query time.

It is the JVM-free, pure-Rust counterpart of Microsoft’s GraphRAG and LangChain’s knowledge-graph QA stack, integrated directly with the OxiRS semantic-web engine.

§Data-flow overview

Natural-Language Query
        │
        ▼
┌───────────────────┐
│  Query Embedding  │  (oxirs-embed / Node2Vec / TransE)
└────────┬──────────┘
         │
  ┌──────┴──────┐
  │             │
  ▼             ▼
Vector        Keyword
KNN           BM25
Search        Search
  │             │
  └──────┬──────┘
         │
         ▼
 ┌───────────────┐
 │  RRF Fusion   │  Reciprocal Rank Fusion → Seed Entities
 └───────┬───────┘
         │
         ▼
 ┌────────────────────────┐
 │  SPARQL N-hop Expansion│  Graph traversal (up to 500 triples)
 └────────────┬───────────┘
              │
              ▼
 ┌────────────────────────┐
 │  Community Detection   │  Louvain / Leiden clustering
 └────────────┬───────────┘
              │
              ▼
 ┌────────────────────────┐
 │  Context Building      │  Subgraph → natural-language context
 └────────────┬───────────┘
              │
              ▼
 ┌────────────────────────┐
 │  LLM Generation        │  Answer + citations
 └────────────────────────┘
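
The RRF Fusion stage merges the vector and keyword rankings into one list of seed entities. The following is a minimal, standalone sketch of Reciprocal Rank Fusion, not the crate's own implementation: the function name `rrf_fuse` is hypothetical, and the constant `k = 60` used in the usage note is a commonly chosen default that is assumed here rather than taken from this crate. Each input list is assumed to contain an id at most once.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists with Reciprocal Rank Fusion (illustrative sketch).
///
/// Each id receives `1 / (k + rank)` from every list it appears in, where
/// `rank` is 1-based; the contributions are summed across lists.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Calling `rrf_fuse(&[vec!["alice", "acme", "berlin"], vec!["acme", "bob", "alice"]], 60.0)` ranks `acme` first and `alice` second, because both appear near the top of both lists, while ids found in only one list fall behind.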

§Key modules

Module	Purpose
`triple_extractor`	Rule-based NLP → RDF triple extraction
`community_detector`	Greedy label-propagation community detection
`path_finder`	BFS / DFS shortest-path retrieval in KGs
`graph_embedder`	Node2Vec-style random-walk structural embeddings
`summarizer`	Cluster-based subgraph summarization for LLM context
`path_ranker`	Predicate-weighted path ranking
`context_builder`	N-hop subgraph extraction and truncation
`knowledge_fusion`	Multi-source KG fusion with provenance
`graph_summarization`	PageRank-style community summary generation
`entity_linking`	Entity linking and disambiguation
`explainability`	Attention weights, path explanation, provenance
`feedback`	Session-scoped user-feedback weight adaptation
`graph`	Core community detection and graph traversal primitives
`retrieval`	Hybrid vector + keyword retrieval with RRF fusion
`generation`	Prompt templates and LLM context building
`temporal`	Temporal knowledge graph retrieval
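
To make the "label-propagation community detection" entry for `community_detector` more concrete, here is a small standalone sketch of the general technique. It is an assumption-laden illustration, not the module's code: the function `label_propagation`, its unweighted edge list, the fixed node visiting order, and the rule that ties go to the largest label are all choices made for this example.

```rust
use std::collections::{BTreeMap, HashMap};

/// Asynchronous label propagation on an undirected, unweighted graph
/// (illustrative sketch). Returns a map from node id to community label.
pub fn label_propagation(edges: &[(u64, u64)], max_iters: usize) -> BTreeMap<u64, u64> {
    // Adjacency lists; BTreeMap keeps the node visiting order fixed.
    let mut adj: BTreeMap<u64, Vec<u64>> = BTreeMap::new();
    for &(a, b) in edges {
        adj.entry(a).or_default().push(b);
        adj.entry(b).or_default().push(a);
    }
    // Every node starts in its own community.
    let mut labels: BTreeMap<u64, u64> = adj.keys().map(|&n| (n, n)).collect();
    for _ in 0..max_iters {
        let mut changed = false;
        for (&node, neighbours) in &adj {
            // Count how often each label occurs among the neighbours.
            let mut counts: HashMap<u64, usize> = HashMap::new();
            for n in neighbours {
                *counts.entry(labels[n]).or_insert(0) += 1;
            }
            // Adopt the most frequent label; ties go to the largest label.
            let best = counts
                .into_iter()
                .max_by_key(|&(label, count)| (count, label))
                .map(|(label, _)| label)
                .unwrap_or(node);
            if labels[&node] != best {
                labels.insert(node, best);
                changed = true;
            }
        }
        if !changed {
            break; // Converged: no node changed its label in this pass.
        }
    }
    labels
}
```

On two triangles joined by a single bridge edge, this sketch settles on one label per triangle after two passes. The outcome of label propagation generally depends on the visiting order and the tie-breaking rule, so a different rule can merge or split communities differently.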

§Quickstart — standalone pipeline (no network, no LLM)

The example below runs an end-to-end mini-pipeline entirely in memory on a synthetic 8-node knowledge graph: extract triples from text, detect communities, find paths, and summarize the result.

use oxirs_graphrag::triple_extractor::{ExtractionConfig, TripleExtractor};
use oxirs_graphrag::community_detector::{CommunityGraph, CommunityDetector};
use oxirs_graphrag::path_finder::{KnowledgeEdge, PathFinder, PathFinderConfig};
use oxirs_graphrag::summarizer::{KgEdge, KgNode, KgSubgraph, SubgraphSummarizer};

// ── Step 1: Extract triples from natural language ─────────────────────────
let corpus = [
    "Alice is a data scientist.",
    "Bob works at ACME.",
    "Carol is a software engineer.",
    "Dave is part of the AI team.",
    "ACME has a research division.",
];
let extractor = TripleExtractor::with_defaults(ExtractionConfig::default());
let all_triples: Vec<_> = corpus
    .iter()
    .flat_map(|sentence| extractor.extract(sentence))
    .collect();
assert!(!all_triples.is_empty(), "at least one triple extracted");

// ── Step 2: Build community graph and detect clusters ─────────────────────
let mut cg = CommunityGraph::new();
// 8 synthetic nodes
for (id, label) in [
    (1u64, "Alice"), (2, "Bob"), (3, "Carol"), (4, "Dave"),
    (5, "ACME"),    (6, "AI-Team"), (7, "Research"), (8, "Berlin"),
] {
    cg.add_node(id, label);
}
for (a, b) in [(1,5),(2,5),(3,6),(4,6),(5,7),(6,7),(7,8),(1,2)] {
    cg.add_edge(a, b, 1.0);
}
let detector = CommunityDetector::new(2, 50);
let detection = detector.detect(&mut cg);
assert!(!detection.communities.is_empty(), "at least one community");

// ── Step 3: Graph path retrieval ──────────────────────────────────────────
let edges = vec![
    KnowledgeEdge::new("Alice",    "works_at",    "ACME"),
    KnowledgeEdge::new("ACME",     "located_in",  "Berlin"),
    KnowledgeEdge::new("Bob",      "knows",       "Alice"),
    KnowledgeEdge::new("Alice",    "member_of",   "AI-Team"),
    KnowledgeEdge::new("AI-Team",  "part_of",     "ACME"),
    KnowledgeEdge::new("Carol",    "works_at",    "ACME"),
    KnowledgeEdge::new("Dave",     "leads",       "AI-Team"),
    KnowledgeEdge::new("Research", "division_of", "ACME"),
];
let finder = PathFinder::new(edges, PathFinderConfig::default());
let paths = finder.bfs_paths("Bob", "Berlin", 4);
assert!(!paths.is_empty(), "path Bob→Berlin found");

// ── Step 4: Summarize subgraph for LLM context ────────────────────────────
let mut subgraph = KgSubgraph::new();
for (id, label, ty) in [
    ("alice",    "Alice",    "Person"),
    ("bob",      "Bob",      "Person"),
    ("carol",    "Carol",    "Person"),
    ("acme",     "ACME",     "Organization"),
    ("berlin",   "Berlin",   "Place"),
    ("ai_team",  "AI-Team",  "Team"),
    ("research", "Research", "Department"),
    ("dave",     "Dave",     "Person"),
] {
    subgraph.add_node(KgNode::simple(id, label, ty));
}
subgraph.add_edge(KgEdge::unweighted("alice", "acme",  "works_at"));
subgraph.add_edge(KgEdge::unweighted("acme",  "berlin","located_in"));

let summarizer = SubgraphSummarizer::new();
let clusters = summarizer.summarize(&subgraph, 10);
assert!(!clusters.is_empty(), "at least one cluster");
let text_summary = summarizer.generate_text_summary(&clusters);
assert!(!text_summary.is_empty(), "non-empty summary text");
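
Step 3 relies on a hop-limited breadth-first search. For readers who want to see the idea without the crate, the following standalone sketch finds one shortest directed path within a hop budget. It is an assumption, not the `path_finder` implementation: the function `bfs_path` and its tuple-based edge representation are hypothetical, and it returns a single path rather than a list of paths.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// A directed (subject, predicate, object) edge borrowed from the caller.
type Edge<'a> = (&'a str, &'a str, &'a str);

/// Find one shortest directed path from `start` to `goal` that uses at most
/// `max_hops` edges (illustrative sketch).
pub fn bfs_path<'a>(
    edges: &[Edge<'a>],
    start: &'a str,
    goal: &'a str,
    max_hops: usize,
) -> Option<Vec<Edge<'a>>> {
    // Index outgoing edges by subject.
    let mut out: HashMap<&'a str, Vec<Edge<'a>>> = HashMap::new();
    for &e in edges {
        out.entry(e.0).or_default().push(e);
    }
    // For each reached node, remember the edge that reached it first.
    let mut parent: HashMap<&'a str, Edge<'a>> = HashMap::new();
    let mut visited: HashSet<&'a str> = HashSet::from([start]);
    let mut queue: VecDeque<(&'a str, usize)> = VecDeque::from([(start, 0)]);
    while let Some((node, depth)) = queue.pop_front() {
        if node == goal {
            // Walk the parent links back to `start`, then reverse.
            let mut path = Vec::new();
            let mut cur = goal;
            while let Some(&edge) = parent.get(cur) {
                path.push(edge);
                cur = edge.0;
            }
            path.reverse();
            return Some(path);
        }
        if depth == max_hops {
            continue; // Hop budget exhausted on this branch.
        }
        for &edge in out.get(node).into_iter().flatten() {
            if visited.insert(edge.2) {
                parent.insert(edge.2, edge);
                queue.push_back((edge.2, depth + 1));
            }
        }
    }
    None
}
```

With the edges `Bob knows Alice`, `Alice works_at ACME`, and `ACME located_in Berlin`, a search from `"Bob"` to `"Berlin"` succeeds with a budget of 4 hops and returns three edges, while a budget of 2 hops yields `None`. Edges are treated as directed here, so a search from `"Berlin"` back to `"Bob"` also yields `None`.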

§Full engine usage (async, requires trait impls)

For production usage with a real vector index, embedding model, SPARQL engine, and LLM client:

use oxirs_graphrag::{GraphRAGEngine, GraphRAGConfig};
use std::sync::Arc;

let config = GraphRAGConfig {
    top_k: 20,
    expansion_hops: 2,
    enable_communities: true,
    ..Default::default()
};

// Provide your own implementations of VectorIndexTrait, EmbeddingModelTrait,
// SparqlEngineTrait, and LlmClientTrait:
let engine = GraphRAGEngine::new(
    Arc::new(my_vec_index),
    Arc::new(my_embedder),
    Arc::new(my_sparql),
    Arc::new(my_llm),
    config,
);

// `.await?` must run inside an async function that returns a `Result`.
let result = engine.query("What safety issues affect battery cells?").await?;
println!("Answer: {}", result.answer);
println!("Confidence: {:.2}", result.confidence);

See docs/tutorial.md for a step-by-step walkthrough.

Re-exports§

pub use summarizer::GraphSummarizer;
pub use summarizer::GraphSummary;
pub use feedback::Relevance;
pub use feedback::TripleId;
pub use feedback::TripleRelevanceFeedback;
pub use gnn_encoder::AdjacencyGraph;
pub use gnn_encoder::EdgeList;
pub use gnn_encoder::GnnEncoder;
pub use gnn_encoder::GnnEncoderConfig;
pub use gnn_encoder::ScaledDotProductAttention;
pub use cache::query_cache::CacheEntry;
pub use cache::query_cache::CacheStats;
pub use cache::query_cache::QueryCache;
pub use cache::query_cache::QueryCacheConfig;
pub use config::CacheConfiguration;
pub use config::GraphRAGConfig;
pub use embeddings::node2vec::Node2VecConfig;
pub use embeddings::node2vec::Node2VecEmbedder;
pub use embeddings::node2vec::Node2VecEmbeddings;
pub use embeddings::node2vec::Node2VecWalkConfig;
pub use graph::community::CommunityAlgorithm;
pub use graph::community::CommunityConfig;
pub use graph::community::CommunityDetector;
pub use graph::embeddings::CommunityAwareEmbeddings;
pub use graph::embeddings::CommunityStructure;
pub use graph::embeddings::EmbeddingConfig;
pub use graph::traversal::GraphTraversal;
pub use hybrid::lora::LoraAdapter;
pub use hybrid::lora::LoraTrainer;
pub use query::planner::QueryPlanner;
pub use retrieval::fusion::FusionStrategy;
pub use model_loader::GgufMetadata;
pub use model_loader::GgufModelArch;
pub use model_loader::GgufParseError;
pub use model_loader::GgufParser;
pub use model_loader::GgufTensorInfo;
pub use model_loader::GgufValue;
pub use model_loader::ModelHandle;
pub use model_loader::ModelInfo;
pub use model_loader::ModelRegistry;
pub use model_loader::RegistryError;

Modules§

cache: Cache module for GraphRAG query results
community_detector: Graph community detection using a greedy label-propagation approach.
config: GraphRAG configuration
context_builder: Context building for graph-based RAG.
distributed: Distributed GraphRAG: federated subgraph expansion across multiple SPARQL endpoints.
embeddings: Graph embedding algorithms for GraphRAG.
entity_classifier: Entity type classification for knowledge graph nodes.
entity_linker: String-to-RDF entity linking: mention detection and candidate ranking.
entity_linking: Entity linking and disambiguation for knowledge graphs.
explainability: Explainability engine for graph-based RAG — attention weights, path explanation, provenance.
federation: Federation layer for distributed GraphRAG queries.
feedback: Interactive feedback loop for graph-based RAG retrieval refinement.
fusion: Fusion and reranking module for GraphRAG
generation: Answer generation module
gnn_encoder: GraphSAGE encoder for knowledge-graph entity embeddings.
graph: Graph processing module
graph_embedder: Graph Embedder
graph_partitioner: Graph partitioning using greedy and label-propagation methods.
graph_summarization: Graph Summarization for GraphRAG
hybrid: Hybrid GNN+LLM architecture — phases b, c, and d.
knowledge_fusion: Multi-source knowledge fusion.
model_loader: Pure-Rust GGUF model metadata loader and thread-safe model registry.
neuro_symbolic: Neuro-symbolic module: physics-informed entity scoring for knowledge graphs.
path_finder: Path Finder for Graph-RAG
path_ranker: Knowledge Graph Path Ranker
query: Query processing module
reasoning: Reasoning module for GraphRAG
retrieval: Retrieval module for GraphRAG
sparql: SPARQL extension functions for GraphRAG
streaming: Streaming subgraph extraction using SPARQL-like patterns.
summarizer: Knowledge Graph Subgraph Summarizer
temporal: Temporal reasoning and time-aware retrieval for GraphRAG
transe_model: TransE Knowledge Graph Embedding Model
triple_extractor: Triple Extractor

Structs§

CacheConfig: Cache configuration
CommunitySummary: Community summary for hierarchical retrieval
GraphRAGEngine: Main GraphRAG engine
GraphRAGResult2: GraphRAG query result
QueryProvenance: Query provenance for attribution
ScoredEntity: Entity with relevance score
Triple: Triple representation for RDF data

Enums§

GraphRAGError: GraphRAG error types
ScoreSource: Source of entity score

Traits§

EmbeddingModelTrait: Trait for embedding model operations
LlmClientTrait: Trait for LLM client operations
SparqlEngineTrait: Trait for SPARQL engine operations
VectorIndexTrait: Trait for vector index operations

Type Aliases§

GraphRAGResult