# oxirs-graphrag Tutorial

This tutorial walks you through the full **extract → embed → retrieve → generate**
lifecycle of `oxirs-graphrag` — from raw text to a KGQA answer.

---

## 1. Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
oxirs-graphrag = { workspace = true }
```

Or with explicit version:

```toml
[dependencies]
oxirs-graphrag = "0.3"
```

The crate's default features enable community detection and hierarchical summarization.
Streaming SPARQL support requires the `streaming` feature:

```toml
oxirs-graphrag = { version = "0.3", features = ["streaming"] }
```

---

## 2. Extracting triples from text

`triple_extractor` converts natural-language sentences into RDF-like triples using
declarative pattern matching. It requires no external models.

```rust
use oxirs_graphrag::triple_extractor::{ExtractionConfig, TripleExtractor};

let config = ExtractionConfig {
    min_confidence: 0.3,
    max_triples_per_sentence: 10,
    normalize_predicates: true,
};
let extractor = TripleExtractor::with_defaults(config);

let triples = extractor.extract(
    "Alice is a data scientist. \
     Alice works at ACME. \
     ACME has a research division.",
);

for t in &triples {
    println!("({}, {}, {})", t.subject, t.predicate, t.object);
}
```

To add custom patterns (e.g. domain-specific relations):

```rust
use oxirs_graphrag::triple_extractor::{ExtractionConfig, ExtractionPattern, TripleExtractor};

let mut extractor = TripleExtractor::with_defaults(ExtractionConfig::default());
extractor.add_pattern(ExtractionPattern::new(
    "inhibits",
    "drug",
    vec!["inhibits".to_string()],
    "target",
));
```
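
To make the idea of declarative pattern matching concrete, here is a minimal std-only sketch. It is not the crate's implementation: `SimplePattern` and `extract_simple` are hypothetical names, and the matching rule (first trigger phrase found in a sentence splits it into subject and object) is an assumption chosen for brevity.

```rust
/// Hypothetical stand-in for a declarative pattern: a predicate name plus
/// the trigger phrases that signal it. Not the crate's `ExtractionPattern`.
pub struct SimplePattern {
    pub predicate: String,
    pub triggers: Vec<String>,
}

/// Split `text` into sentences on '.', then emit at most one
/// (subject, predicate, object) triple per sentence: the text before the
/// first matching trigger phrase is the subject, the text after it the object.
pub fn extract_simple(text: &str, patterns: &[SimplePattern]) -> Vec<(String, String, String)> {
    let mut triples = Vec::new();
    for sentence in text.split('.').map(str::trim).filter(|s| !s.is_empty()) {
        'patterns: for pattern in patterns {
            for trigger in &pattern.triggers {
                // Pad with spaces so a trigger only matches whole words.
                let needle = format!(" {} ", trigger);
                if let Some(pos) = sentence.find(&needle) {
                    let subject = sentence[..pos].trim();
                    let object = sentence[pos + needle.len()..].trim();
                    if !subject.is_empty() && !object.is_empty() {
                        triples.push((
                            subject.to_string(),
                            pattern.predicate.clone(),
                            object.to_string(),
                        ));
                        // One triple per sentence is enough for this sketch.
                        break 'patterns;
                    }
                }
            }
        }
    }
    triples
}
```

Calling `extract_simple("Aspirin inhibits COX-1.", &patterns)` with a pattern whose trigger is `"inhibits"` yields a single `(Aspirin, inhibits, COX-1)` triple. Splitting on `.` is deliberately naive and would break on abbreviations or decimal numbers.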

---

## 3. Building the knowledge graph

Once triples are extracted, load them into a `KgSubgraph` for summarization or
into a `CommunityGraph` / `PathFinder` for structural analysis.

```rust
use oxirs_graphrag::summarizer::{KgEdge, KgNode, KgSubgraph};

let mut graph = KgSubgraph::new();

// Add typed nodes
graph.add_node(KgNode::simple("alice",    "Alice",    "Person"));
graph.add_node(KgNode::simple("acme",     "ACME",     "Organization"));
graph.add_node(KgNode::simple("research", "Research", "Department"));

// Add edges
graph.add_edge(KgEdge::unweighted("alice",    "acme",     "works_at"));
graph.add_edge(KgEdge::unweighted("acme",     "research", "has_division"));

println!("Nodes: {}  Edges: {}", graph.node_count(), graph.edge_count());
```
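
The step from extracted triples to nodes and edges is mostly bookkeeping: derive a stable id for each entity, deduplicate nodes, and keep one edge per triple. The sketch below shows that bookkeeping with std types only. `slug` and `triples_to_graph` are hypothetical helpers, and the tuple layout `(source, target, relation)` simply mirrors the argument order of `KgEdge::unweighted` above.

```rust
use std::collections::BTreeMap;

/// Turn a label into an id: ASCII letters are lowercased, every
/// non-alphanumeric character becomes '_'.
pub fn slug(label: &str) -> String {
    label
        .chars()
        .map(|c| if c.is_alphanumeric() { c.to_ascii_lowercase() } else { '_' })
        .collect()
}

/// Collect unique nodes (id -> original label) and one edge per triple.
/// Input triples are (subject, predicate, object).
pub fn triples_to_graph(
    triples: &[(&str, &str, &str)],
) -> (BTreeMap<String, String>, Vec<(String, String, String)>) {
    let mut nodes: BTreeMap<String, String> = BTreeMap::new();
    let mut edges: Vec<(String, String, String)> = Vec::new();
    for &(subject, predicate, object) in triples {
        let source = slug(subject);
        let target = slug(object);
        // Keep the first label seen for each id.
        nodes.entry(source.clone()).or_insert_with(|| subject.to_string());
        nodes.entry(target.clone()).or_insert_with(|| object.to_string());
        edges.push((source, target, slug(predicate)));
    }
    (nodes, edges)
}
```

Node types such as `"Person"` or `"Organization"` cannot be derived from a bare triple, so in practice they would have to come from a separate typing step or a lookup table.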

---

## 4. Embedding the graph for similarity search

`graph_embedder` provides structural node embeddings using Node2Vec-style
biased random walks and neighborhood aggregation — no deep learning required.

```rust
use oxirs_graphrag::graph_embedder::{Graph, GraphEmbedder, WalkConfig};

let mut g = Graph::new(5); // 5 nodes
g.add_edge(0, 1, 1.0);
g.add_edge(1, 2, 1.0);
g.add_edge(2, 3, 0.5);
g.add_edge(3, 4, 1.0);
g.add_edge(0, 3, 0.5);

// Structural embeddings (topology-only, deterministic)
let embeddings = GraphEmbedder::structural_embedding(&g, 16);
println!("Node 0 embedding dim: {}", embeddings[0].vector.len());

// Random-walk based embeddings (stochastic, call `embed` for EmbeddingResult)
let walk_config = WalkConfig {
    walk_length: 10,
    walks_per_node: 5,
    return_param_p: 1.0,
    in_out_param_q: 1.0,
};
let result = GraphEmbedder::embed(&g, &walk_config, 16);
println!("Embeddings generated: {}", result.embeddings.len());
```
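
The two `WalkConfig` parameters `return_param_p` and `in_out_param_q` correspond to the p and q of Node2Vec. As a rough illustration of what they control, the sketch below implements one second-order biased walk with std types only. It assumes an undirected graph with positive edge weights; `Adjacency`, `XorShift64`, `add_undirected`, and `biased_walk` are hypothetical names, not part of the crate.

```rust
/// Undirected weighted adjacency list: adj[u] = [(v, weight), ...].
pub type Adjacency = Vec<Vec<(usize, f64)>>;

/// Tiny xorshift64 generator so the sketch needs no external crates.
/// The seed must be non-zero.
pub struct XorShift64(pub u64);

impl XorShift64 {
    /// Next pseudo-random value in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x >> 11) as f64 / (1u64 << 53) as f64
    }
}

pub fn add_undirected(adj: &mut Adjacency, u: usize, v: usize, w: f64) {
    adj[u].push((v, w));
    adj[v].push((u, w));
}

/// One Node2Vec-style walk containing at most `walk_length` nodes.
/// Relative to the previous node t, a candidate x is weighted by
/// 1/p if x == t, 1 if x is a neighbor of t, and 1/q otherwise.
pub fn biased_walk(
    adj: &Adjacency,
    start: usize,
    walk_length: usize,
    p: f64,
    q: f64,
    rng: &mut XorShift64,
) -> Vec<usize> {
    let mut walk = vec![start];
    while walk.len() < walk_length {
        let cur = walk[walk.len() - 1];
        let neighbors = &adj[cur];
        if neighbors.is_empty() {
            break; // dead end
        }
        let prev = if walk.len() >= 2 { Some(walk[walk.len() - 2]) } else { None };
        // Unnormalized transition weights: edge weight times search bias.
        let weights: Vec<f64> = neighbors
            .iter()
            .map(|&(next, w)| {
                let bias = match prev {
                    None => 1.0,
                    Some(t) if next == t => 1.0 / p,
                    Some(t) if adj[t].iter().any(|&(n, _)| n == next) => 1.0,
                    Some(_) => 1.0 / q,
                };
                w * bias
            })
            .collect();
        // Roulette-wheel selection over the weights.
        let total: f64 = weights.iter().sum();
        let mut r = rng.next_f64() * total;
        let mut chosen = neighbors[neighbors.len() - 1].0; // fallback for rounding
        for (i, &wt) in weights.iter().enumerate() {
            if r < wt {
                chosen = neighbors[i].0;
                break;
            }
            r -= wt;
        }
        walk.push(chosen);
    }
    walk
}
```

With p = q = 1.0, as in the configuration above, the bias disappears and the walk reduces to a plain weighted random walk. A larger p discourages stepping straight back to the previous node, and a smaller q pushes the walk outward, away from the previous node's neighborhood.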

---

## 5. Community detection

Use `community_detector` to group related entities (Louvain-inspired greedy
label propagation):

```rust
use oxirs_graphrag::community_detector::{CommunityDetector, CommunityGraph};

let mut cg = CommunityGraph::new();
for (id, label) in [(1u64, "Alice"), (2, "Bob"), (3, "ACME"), (4, "Research")] {
    cg.add_node(id, label);
}
cg.add_edge(1, 3, 1.0); // Alice – ACME
cg.add_edge(2, 3, 1.0); // Bob – ACME
cg.add_edge(3, 4, 1.0); // ACME – Research

let detector = CommunityDetector::new(/*min_size=*/ 2, /*max_iter=*/ 50);
let result = detector.detect(&mut cg);

println!("Communities: {}", result.communities.len());
println!("Modularity:  {:.4}", result.modularity);

for community in &result.communities {
    println!("  Community {:>2}: {} members", community.id, community.size());
}
```

For the full community pipeline on top of `Triple` data (SPARQL-backed),
see the `graph::community` module, which implements the Leiden algorithm and
is configured through the `CommunityConfig` struct.
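
The `modularity` value printed in the example above measures how much denser the communities are than a random graph with the same degrees would predict. Assuming the crate reports the standard Newman modularity (the tutorial does not confirm which variant is used), it can be computed as in the sketch below. The `modularity` function here is a hypothetical std-only helper for undirected graphs without self-loops.

```rust
/// Newman modularity of a partition of an undirected weighted graph.
///
/// `edges` lists each undirected edge once as (u, v, weight);
/// `community[i]` is the community index of node i.
/// Q = sum over communities c of [ in_c / 2m - (tot_c / 2m)^2 ], where
/// in_c is twice the internal edge weight and tot_c the degree sum of c.
pub fn modularity(edges: &[(usize, usize, f64)], community: &[usize]) -> f64 {
    let two_m: f64 = 2.0 * edges.iter().map(|e| e.2).sum::<f64>();
    if two_m == 0.0 {
        return 0.0; // no edges: modularity is taken to be zero here
    }
    let num_comms = community.iter().copied().max().map_or(0, |c| c + 1);
    let mut internal: Vec<f64> = vec![0.0; num_comms];
    let mut total: Vec<f64> = vec![0.0; num_comms];
    for &(u, v, w) in edges {
        // Each endpoint contributes the edge weight to its community's degree sum.
        total[community[u]] += w;
        total[community[v]] += w;
        if community[u] == community[v] {
            // An internal edge is counted once from each endpoint.
            internal[community[u]] += 2.0 * w;
        }
    }
    internal
        .iter()
        .zip(&total)
        .map(|(inn, tot)| inn / two_m - (tot / two_m).powi(2))
        .sum::<f64>()
}
```

For two triangles joined by a single bridge edge, splitting the graph into the two triangles gives Q = 5/14 (about 0.357), while putting every node in one community gives Q = 0.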

---

## 6. Running a KGQA query

Use `PathFinder` to answer "how is X connected to Y?" queries over the
in-memory graph:

```rust
use oxirs_graphrag::path_finder::{KnowledgeEdge, PathFinder, PathFinderConfig};

let edges = vec![
    KnowledgeEdge::new("Alice",  "works_at",   "ACME"),
    KnowledgeEdge::new("ACME",   "located_in", "Berlin"),
    KnowledgeEdge::new("Bob",    "knows",       "Alice"),
];

let config = PathFinderConfig {
    max_depth: 3,
    max_paths: 10,
    ..Default::default()
};
let finder = PathFinder::new(edges, config);

let paths = finder.bfs_paths("Bob", "Berlin", 3);
for path in &paths {
    println!("{}", path.narrative());
    // e.g. "Bob —[knows]→ Alice —[works_at]→ ACME —[located_in]→ Berlin"
}
```

For path scoring with predicate relevance weights, populate
`PathFinderConfig::predicate_weights`.
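
For readers curious about what a hop-bounded path search involves, the following std-only sketch enumerates simple paths breadth-first. It is an illustration under stated assumptions, not the crate's `bfs_paths`: edges are followed only in their stored direction, and `Edge`, `bfs_simple_paths`, and `narrative` are hypothetical names (the arrows are plain ASCII).

```rust
use std::collections::{HashMap, VecDeque};

/// (subject, predicate, object)
pub type Edge<'a> = (&'a str, &'a str, &'a str);

/// Enumerate all simple paths from `from` to `to` with at most
/// `max_depth` edges, shortest paths first.
pub fn bfs_simple_paths<'a>(
    edges: &[Edge<'a>],
    from: &'a str,
    to: &str,
    max_depth: usize,
) -> Vec<Vec<Edge<'a>>> {
    // Index outgoing edges by subject.
    let mut out_edges: HashMap<&'a str, Vec<Edge<'a>>> = HashMap::new();
    for &edge in edges {
        out_edges.entry(edge.0).or_default().push(edge);
    }

    let mut found: Vec<Vec<Edge<'a>>> = Vec::new();
    let mut queue: VecDeque<(&'a str, Vec<Edge<'a>>)> = VecDeque::new();
    queue.push_back((from, Vec::new()));

    while let Some((node, path)) = queue.pop_front() {
        if node == to && !path.is_empty() {
            found.push(path);
            continue;
        }
        if path.len() == max_depth {
            continue; // hop budget exhausted
        }
        if let Some(nexts) = out_edges.get(node) {
            for &edge in nexts {
                // Skip nodes already on the path so that paths stay simple.
                let revisits = edge.2 == from || path.iter().any(|e| e.2 == edge.2);
                if revisits {
                    continue;
                }
                let mut extended = path.clone();
                extended.push(edge);
                queue.push_back((edge.2, extended));
            }
        }
    }
    found
}

/// Render a path as "A -[p]-> B -[q]-> C".
pub fn narrative(path: &[Edge<'_>]) -> String {
    let mut text = String::new();
    for (i, &(subject, predicate, object)) in path.iter().enumerate() {
        if i == 0 {
            text.push_str(subject);
        }
        text.push_str(&format!(" -[{}]-> {}", predicate, object));
    }
    text
}
```

On the three edges from the example above, searching from `"Bob"` to `"Berlin"` with a depth of 3 finds exactly one path, while a depth of 2 finds none. Storing whole paths in the queue keeps the code short but uses memory proportional to the number of partial paths, so a real implementation would likely also cap the number of results, as `max_paths` suggests.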

---

## 7. Subgraph summarization (LLM context)

Before sending the retrieved subgraph to an LLM, compress it into a readable
summary using `SubgraphSummarizer`:

```rust
use oxirs_graphrag::summarizer::{KgEdge, KgNode, KgSubgraph, SubgraphSummarizer};

let mut graph = KgSubgraph::new();
graph.add_node(KgNode::simple("alice", "Alice", "Person"));
graph.add_node(KgNode::simple("acme",  "ACME",  "Organization"));
graph.add_edge(KgEdge::unweighted("alice", "acme", "works_at"));

let summarizer = SubgraphSummarizer::new();

// Cluster nodes by type
let clusters = summarizer.summarize(&graph, 10);

// Generate natural-language paragraph
let context = summarizer.generate_text_summary(&clusters);
println!("{}", context);

// Top relation types for prompt engineering
let top_rels = summarizer.extract_key_relations(&graph, 5);
for (rel, count) in &top_rels {
    println!("  {rel}: {count}×");
}
```
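
Conceptually, the two steps shown above (clustering nodes by type and ranking relation types by frequency) need nothing more than grouping and counting. The sketch below reproduces that idea with std types only. `summarize_by_type` and `top_relations` are hypothetical helpers, and the output format is an assumption rather than what `generate_text_summary` actually produces.

```rust
use std::collections::BTreeMap;

/// Group (label, type) pairs by type and render one sentence per type,
/// e.g. "Person (2): Alice, Bob." Types appear in alphabetical order.
pub fn summarize_by_type(nodes: &[(&str, &str)]) -> String {
    let mut by_type: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(label, node_type) in nodes {
        by_type.entry(node_type).or_default().push(label);
    }
    by_type
        .iter()
        .map(|(node_type, labels)| {
            format!("{} ({}): {}.", node_type, labels.len(), labels.join(", "))
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Count relation names over (source, relation, target) edges and return
/// the `k` most frequent, ties broken alphabetically.
pub fn top_relations<'a>(edges: &[(&'a str, &'a str, &'a str)], k: usize) -> Vec<(&'a str, usize)> {
    let mut counts: BTreeMap<&'a str, usize> = BTreeMap::new();
    for &(_, relation, _) in edges {
        *counts.entry(relation).or_insert(0) += 1;
    }
    let mut ranked: Vec<(&'a str, usize)> = counts.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    ranked.truncate(k);
    ranked
}
```

A compact per-type summary like this keeps the prompt short while still telling the LLM which kinds of entities and relations dominate the retrieved subgraph.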

---

## 8. Integrating with an LLM for RAG

For full RAG integration, implement the four engine traits and construct a
`GraphRAGEngine`:

```rust,ignore
use oxirs_graphrag::{
    GraphRAGConfig, GraphRAGEngine,
    VectorIndexTrait, EmbeddingModelTrait, SparqlEngineTrait, LlmClientTrait,
};
use std::sync::Arc;

// Implement the four traits on your types:
//   VectorIndexTrait  — wraps oxirs-vec HNSW index
//   EmbeddingModelTrait — wraps oxirs-embed model
//   SparqlEngineTrait  — wraps oxirs-arq engine
//   LlmClientTrait    — wraps your LLM HTTP client

let config = GraphRAGConfig {
    top_k: 20,
    expansion_hops: 2,
    max_subgraph_size: 500,
    enable_communities: true,
    vector_weight: 0.7,
    keyword_weight: 0.3,
    ..Default::default()
};

let engine = GraphRAGEngine::new(
    Arc::new(my_vec_index),
    Arc::new(my_embedder),
    Arc::new(my_sparql_engine),
    Arc::new(my_llm),
    config,
);

// Issue a KGQA query
let result = engine.query("What safety issues affect battery cells?").await?;
println!("Answer:     {}", result.answer);
println!("Confidence: {:.2}", result.confidence);
println!("Seed nodes: {}", result.seeds.len());
println!("Subgraph:   {} triples", result.subgraph.len());
```

The engine pipeline is:
1. Embed query → vector KNN search
2. BM25 keyword search via SPARQL REGEX
3. RRF fusion → seed entities
4. N-hop SPARQL graph expansion → subgraph
5. Community detection → hierarchical clusters
6. Context building → LLM prompt
7. LLM generation → answer + citations
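
Step 3 of the pipeline above relies on Reciprocal Rank Fusion (RRF), which merges ranked lists by summing 1 / (k + rank) for every list an item appears in. The sketch below shows the plain, unweighted form with std types only. `rrf_fuse` is a hypothetical helper, k = 60 is the value conventionally used for RRF, and how the engine applies `vector_weight` and `keyword_weight` is not shown here.

```rust
use std::collections::HashMap;

/// Fuse several rankings (best item first) with Reciprocal Rank Fusion.
/// Each item scores the sum of 1 / (k + rank) over the lists containing
/// it, with ranks starting at 1. Returns the `top_n` items, best first;
/// ties are broken alphabetically so the output is deterministic.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64, top_n: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, &id) in ranking.iter().enumerate() {
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (idx + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused.truncate(top_n);
    fused
}
```

Given a vector ranking `["a", "b", "c"]` and a keyword ranking `["b", "d", "a"]` with k = 60, item `b` comes out first because it ranks highly in both lists, followed by `a` and then `d`. Because RRF uses only ranks, it needs no calibration between cosine similarities and keyword scores.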

---

## Next steps

- See the [architecture overview](architecture.md) for module internals.
- Run the provided examples: `cargo run --example kgqa_basic -p oxirs-graphrag`
- Browse the full API at `cargo doc -p oxirs-graphrag --open`.