# GraphRAG Core
The core library for GraphRAG-rs, providing portable functionality for both native and WASM deployments.
## Overview
`graphrag-core` is the foundational library that powers GraphRAG-rs. It provides:
- **Embedding Generation**: 8 provider backends (HuggingFace, OpenAI, Voyage AI, Cohere, Jina, Mistral, Together AI, Ollama)
- **Entity Extraction**: LLM-based gleaning extraction with multi-round refinement (Microsoft GraphRAG-style)
- **Graph Construction**: Incremental updates, PageRank, community detection
- **Retrieval Strategies**: Vector, BM25, PageRank, hybrid, adaptive
- **Configuration System**: Hierarchical TOML-based configuration with environment variable overrides
- **Cross-Platform**: Works on native (Linux, macOS, Windows) and WASM
## Quick Start (5 Lines!)
```rust
use graphrag_core::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let mut graphrag = GraphRAG::quick_start("Your document text here").await?;
    let answer = graphrag.ask("What is the main topic?").await?;
    println!("{}", answer);
    Ok(())
}
```
**Or with detailed explanations:**
```rust
let explained = graphrag.ask_explained("What is the main topic?").await?;
println!("Answer: {}", explained.answer);
println!("Confidence: {:.0}%", explained.confidence * 100.0);
for step in &explained.reasoning_steps {
println!("Step {}: {}", step.step_number, step.description);
}
```
## Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
# Choose a feature bundle:
graphrag-core = { version = "0.1", features = ["starter"] } # Basic setup
# OR
graphrag-core = { version = "0.1", features = ["full"] } # Production-ready
# OR
graphrag-core = { version = "0.1", features = ["research"] } # Advanced features
```
### Feature Bundles
| Bundle | Description | Includes |
|---|---|---|
| **`starter`** | Minimal setup to get started | async, ollama, memory-storage, basic-retrieval |
| **`full`** | Production-ready with common features | starter + pagerank, lightrag, caching, parallel-processing, leiden |
| **`wasm-bundle`** | Browser-safe features only | memory-storage, basic-retrieval, leiden |
| **`research`** | Advanced experimental features | full + rograg, cross-encoder, incremental, monitoring |
## Three Ways to Configure
### 1. TypedBuilder (Compile-Time Safety)
```rust
use graphrag_core::prelude::*;
// Build won't compile until required fields are set!
let graphrag = TypedBuilder::new()
.with_output_dir("./output") // Required
.with_ollama() // Required: choose LLM backend
.with_chunk_size(512) // Optional
.with_top_k(10) // Optional
.build()?;
```
**Available LLM backends:**
- `.with_ollama()` - Local Ollama (recommended)
- `.with_ollama_custom("host", 8080, "model")` - Custom Ollama config
- `.with_hash_embeddings()` - Offline, no LLM needed
- `.with_candle_embeddings()` - Local neural embeddings
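For a fully offline setup, the hash-embedding backend listed above can replace a running LLM service entirely. A minimal sketch using only the builder methods documented here (the output directory is a placeholder):

```rust
use graphrag_core::prelude::*;

// Offline build: hash embeddings need no network, GPU, or Ollama server.
// "./offline-output" is a placeholder path for this sketch.
let graphrag = TypedBuilder::new()
    .with_output_dir("./offline-output")
    .with_hash_embeddings()
    .build()?;
```

Hash embeddings trade retrieval quality for zero dependencies, which makes them handy for CI and smoke tests.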
### 2. Hierarchical Config (with figment)
Enable with the `hierarchical-config` feature:
```rust
// Loads configuration from 5 sources (in priority order):
// 1. Code defaults (lowest priority)
// 2. ~/.graphrag/config.toml (user config)
// 3. ./graphrag.toml (project config)
// 4. Environment variables (GRAPHRAG_*)
// 5. Builder overrides (highest priority)
let config = Config::load()?; // Automatically merges all sources
let graphrag = GraphRAG::new(config)?;
```
**Environment variable overrides:**
```bash
export GRAPHRAG_OLLAMA_HOST=my-server
export GRAPHRAG_OLLAMA_PORT=8080
export GRAPHRAG_CHUNK_SIZE=1000
```
### 3. TOML Configuration File
```toml
# graphrag.toml
output_dir = "./output"
approach = "hybrid" # semantic, algorithmic, or hybrid
chunk_size = 1000
chunk_overlap = 200
[embeddings]
backend = "ollama"
dimension = 768
model = "nomic-embed-text:latest"
[ollama]
enabled = true
host = "localhost"
port = 11434
chat_model = "llama3.2:3b"
[entities]
min_confidence = 0.7
use_gleaning = true
max_gleaning_rounds = 3
entity_types = ["PERSON", "ORGANIZATION", "LOCATION", "DATE", "EVENT"]
```
Load with:
```rust
let config = Config::from_toml_file("graphrag.toml")?;
let graphrag = GraphRAG::new(config)?;
```
## Sectoral Templates
Pre-configured templates for specific domains:
| Template | Use Case | Entity Types |
|---|---|---|
| `general.toml` | Mixed documents | PERSON, ORGANIZATION, LOCATION, DATE, EVENT |
| `legal.toml` | Contracts, agreements | PARTY, JURISDICTION, CLAUSE_TYPE, OBLIGATION |
| `medical.toml` | Clinical notes | PATIENT, DIAGNOSIS, MEDICATION, SYMPTOM |
| `financial.toml` | Reports, filings | COMPANY, TICKER, MONETARY_VALUE, METRIC |
| `technical.toml` | API docs, code | FUNCTION, CLASS, MODULE, API_ENDPOINT |
**Using templates:**
```rust
let config = Config::from_toml_file("templates/legal.toml")?;
```
**Or via CLI:**
```bash
graphrag-cli setup --template legal
```
## Explained Answers
Get transparency into how answers are generated:
```rust
let explained = graphrag.ask_explained("Who founded the company?").await?;
// Access detailed information:
println!("Answer: {}", explained.answer);
println!("Confidence: {:.0}%", explained.confidence * 100.0);
// Reasoning trace
for step in &explained.reasoning_steps {
println!("{}. {} (confidence: {:.0}%)",
step.step_number,
step.description,
step.confidence * 100.0
);
}
// Source references
for source in &explained.sources {
println!("Source: {} ({:?})", source.id, source.source_type);
println!(" Excerpt: {}", source.excerpt);
}
// Or get formatted output
println!("{}", explained.format_display());
```
**Output:**
```
**Answer:** John Smith founded Acme Corp in 2015.
**Confidence:** 85%
**Reasoning:**
1. Analyzed query: "Who founded the company?" (confidence: 95%)
2. Found 3 relevant entities (confidence: 85%)
3. Retrieved 5 relevant text chunks (confidence: 85%)
4. Synthesized answer from retrieved information (confidence: 85%)
**Sources:**
1. [TextChunk] chunk_123 (relevance: 92%)
2. [Entity] john_smith (relevance: 88%)
```
## Error Handling
Errors implement standard `std::error::Error` and carry descriptive messages:
```rust
match graphrag.ask("question").await {
    Ok(answer) => println!("{}", answer),
    Err(e) => {
        println!("Error: {}", e);
    }
}
```
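Because the errors implement `std::error::Error`, the standard source chain is available without any crate-specific API:

```rust
use std::error::Error;

// Walk the generic error chain; nothing graphrag-specific is assumed here.
if let Err(e) = graphrag.ask("question").await {
    eprintln!("error: {}", e);
    let mut source = e.source();
    while let Some(cause) = source {
        eprintln!("caused by: {}", cause);
        source = cause.source();
    }
}
```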
## CLI Setup Wizard
Interactive configuration wizard:
```bash
graphrag-cli setup
# With template:
graphrag-cli setup --template legal
# Custom output:
graphrag-cli setup --output ./my-config.toml
```
**Wizard prompts:**
1. Select use case (General, Legal, Medical, Financial, Technical)
2. Choose LLM provider (Ollama or pattern-based)
3. Configure Ollama settings (if selected)
4. Set output directory
## Full Usage Example
```rust
use graphrag_core::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // Option 1: Quick start (simplest)
    let mut graphrag = GraphRAG::quick_start("Your document text").await?;

    // Option 2: TypedBuilder (compile-time safe; shadows Option 1, so pick one in real code)
    let mut graphrag = TypedBuilder::new()
        .with_output_dir("./output")
        .with_ollama()
        .with_chunk_size(512)
        .build_and_init()?;

    // Add documents
    graphrag.add_document_from_text("Document content here")?;

    // Build knowledge graph
    graphrag.build_graph().await?;

    // Query
    let answer = graphrag.ask("What are the main topics?").await?;
    println!("{}", answer);

    // Or with explanations
    let explained = graphrag.ask_explained("What are the main topics?").await?;
    println!("{}", explained.format_display());

    Ok(())
}
```
## Embedding Providers
GraphRAG Core supports 8 embedding backends:
| Provider | Cost | Quality | Backend | Notes |
|---|---|---|---|---|
| **HuggingFace** | Free | ★★★★ | `huggingface-hub` | Offline, 100+ models |
| **OpenAI** | $0.13/1M | ★★★★★ | `ureq` | Best quality |
| **Voyage AI** | Medium | ★★★★★ | `ureq` | Anthropic recommended |
| **Cohere** | $0.10/1M | ★★★★ | `ureq` | Multilingual (100+ langs) |
| **Jina AI** | $0.02/1M | ★★★★ | `ureq` | Cost-optimized |
| **Mistral** | $0.10/1M | ★★★★ | `ureq` | RAG-optimized |
| **Together AI** | $0.008/1M | ★★★★ | `ureq` | Cheapest |
| **Ollama** | Free | ★★★★ | `ollama` + `async` | Local GPU + LLM |
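Providers are chosen through the `[embeddings]` table from the TOML example earlier. A sketch of swapping providers at load time, assuming `Config` exposes those keys under `config.embeddings` (the field paths are assumptions):

```rust
// Hypothetical provider swap, mirroring the [embeddings] TOML keys above.
// `config.embeddings.backend` / `.dimension` are assumed field paths.
let mut config = Config::from_toml_file("graphrag.toml")?;
config.embeddings.backend = "jina".to_string(); // cost-optimized, per the table
config.embeddings.dimension = 768;
let graphrag = GraphRAG::new(config)?;
```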
## Advanced Features
### LightRAG (Dual-Level Retrieval)
```toml
[retrieval]
strategy = "hybrid"
enable_lightrag = true # 6000x token reduction!
```
### PageRank (Fast-GraphRAG)
```toml
[graph]
enable_pagerank = true # 27x performance boost
```
### RoGRAG (Logic Form Reasoning)
```rust
// Enable with feature flag: rograg
let answer = graphrag.ask_with_reasoning("Why did X cause Y?").await?;
```
### Intelligent Caching
```toml
[generation]
enable_caching = true # 80%+ hit rate, 6x cost reduction
```
## Pipeline Architecture
GraphRAG uses a configurable pipeline with different methods for each phase:
```
┌─────────────────────────────────────────────────────────────────────────┐
│ build_graph() │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ │
│ │ CHUNKING │ TextProcessor splits document into chunks │
│ │ (always runs) │ Configurable: chunk_size, chunk_overlap │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ ENTITY EXTRACTION │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │
│ │ │ Algorithmic │ │ Semantic │ │ Hybrid │ │ │
│ │ │ (pattern-based) │ │ (LLM-based) │ │ (both + fusion) │ │ │
│ │ │ ⚡ Fast │ │ 🎯 Accurate │ │ ⚖️ Balanced │ │ │
│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ RELATIONSHIP EXTRACTION │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │
│ │ │ Co-occurrence │ │ LLM-based │ │ Gleaning │ │ │
│ │ │ entity proximity│ │ GraphRAG method │ │ multi-round LLM │ │ │
│ │ │ ⚡ Fast │ │ 🎯 Semantic │ │ 🔄 Iterative │ │ │
│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │
│ │ Optional: config.graph.extract_relationships = true/false │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ GRAPH │ Entities + Relationships → KnowledgeGraph │
│ │ CONSTRUCTION │ Supports: PageRank, Community Detection │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ ask() / query │
├─────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ │
│ │ EMBEDDING │ Generated on-demand (lazy evaluation) │
│ │ GENERATION │ 8 providers: Ollama, OpenAI, HuggingFace, etc. │
│ └────────┬────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ RETRIEVAL STRATEGIES │ │
│ │ Vector │ BM25 │ PageRank │ Hybrid │ Adaptive │ LightRAG │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ ANSWER │ LLM synthesis (if Ollama enabled) │
│ │ GENERATION │ Or: concatenated search results │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
```
### Phase Configuration Quick Reference
| Phase | Key Parameters | Example |
|---|---|---|
| **1. Chunking** | `chunk_size`, `chunk_overlap` | `chunk_size = 1000` |
| **2. Entity Extraction** | `approach`, `entity_types`, `use_gleaning` | `approach = "hybrid"` |
| **3. Relationship Extraction** | `extract_relationships`, `use_gleaning` | `[graph] extract_relationships = true` |
| **4. Graph Construction** | `enable_pagerank`, `max_connections` | `[graph] enable_pagerank = true` |
| **5. Embedding** | `backend`, `dimension`, `model` | `[embeddings] backend = "ollama"` |
| **6. Retrieval** | `strategy`, `top_k` | `[retrieval] strategy = "hybrid"` |
| **7. Answer Generation** | `chat_model`, `temperature` | `[ollama] enabled = true` |
### Method Selection by Phase
| Phase | Methods | Configuration |
|---|---|---|
| **Entity Extraction** | Algorithmic / Semantic / Hybrid | `approach = "algorithmic\|semantic\|hybrid"` |
| **Relationship Extraction** | Co-occurrence / LLM-based / Gleaning | `entities.use_gleaning = true\|false` |
| **Embedding** | Ollama / Hash / OpenAI / HuggingFace / 8 providers | `embeddings.backend = "ollama"` |
| **Retrieval** | Vector / BM25 / PageRank / Hybrid / Adaptive / LightRAG | `retrieval.strategy = "hybrid"` |
### Key Notes
- **Embedding is NOT part of `build_graph()`** - generated lazily during queries
- **Relationship extraction is optional** - controlled by `config.graph.extract_relationships`
- **Gleaning extracts entities AND relationships together** in multi-round LLM calls (see the sketch after this list)
- **See [PIPELINE_ARCHITECTURE.md](PIPELINE_ARCHITECTURE.md) for full parameter reference**
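As a rough illustration of the gleaning loop, here is conceptual pseudocode in the style of Microsoft GraphRAG. Every type and method below is invented for the sketch; none of it is graphrag-core API:

```rust
// Conceptual sketch only: `Llm` and `Extraction` are invented for illustration.
struct Extraction {
    entities: Vec<String>,
    relations: Vec<(String, String)>,
}

impl Extraction {
    fn is_empty(&self) -> bool {
        self.entities.is_empty() && self.relations.is_empty()
    }
    fn merge(&mut self, other: Extraction) {
        self.entities.extend(other.entities);
        self.relations.extend(other.relations);
    }
}

trait Llm {
    // Initial extraction pass over a chunk.
    async fn extract(&self, chunk: &str) -> Extraction;
    // Follow-up prompt: "did you miss any entities or relationships?"
    async fn extract_missed(&self, chunk: &str, so_far: &Extraction) -> Extraction;
}

async fn glean(llm: &impl Llm, chunk: &str, max_rounds: usize) -> Extraction {
    let mut found = llm.extract(chunk).await; // round 1
    for _ in 1..max_rounds {
        let missed = llm.extract_missed(chunk, &found).await;
        if missed.is_empty() {
            break; // converged before hitting max_gleaning_rounds
        }
        found.merge(missed);
    }
    found
}
```

This mirrors what `use_gleaning = true` and `max_gleaning_rounds = 3` control in the `[entities]` section.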
## Module Structure
```
graphrag-core/
├── src/
│ ├── builder/ # TypedBuilder with type-state pattern
│ ├── config/ # Hierarchical configuration (figment)
│ ├── core/ # Core traits, errors with suggestions
│ ├── embeddings/ # 8 embedding providers
│ ├── entity/ # LLM-based gleaning extraction
│ ├── graph/ # Knowledge graph construction
│ ├── retrieval/ # ExplainedAnswer, search strategies
│ └── templates/ # Sectoral configuration templates
└── examples/
```
## Testing
```bash
# Quick test with starter features
cargo test --features starter
# Full test suite
cargo test --all-features
# Test specific modules
cargo test --features starter builder::
cargo test --features starter retrieval::
```
## Documentation
- **[QUICKSTART.md](QUICKSTART.md)** - 5-minute getting started guide
- **[PIPELINE_ARCHITECTURE.md](PIPELINE_ARCHITECTURE.md)** - Pipeline phases and methods
- **[templates/README.md](templates/README.md)** - Sectoral template guide
- **[EMBEDDINGS_CONFIG.md](EMBEDDINGS_CONFIG.md)** - Embedding configuration
- **[ENTITY_EXTRACTION.md](ENTITY_EXTRACTION.md)** - LLM-based extraction guide
- **[OLLAMA_INTEGRATION.md](OLLAMA_INTEGRATION.md)** - Ollama setup guide
## Cross-Platform Support
- ✅ **Linux** - Full support with all features
- ✅ **macOS** - Full support with Metal GPU acceleration
- ✅ **Windows** - Full support with CUDA GPU acceleration
- ✅ **WASM** - Core functionality (use `wasm-bundle` feature)
## License
MIT License - see [../LICENSE](../LICENSE) for details.
---