graphrag-core 0.1.0

Core portable library for GraphRAG - works on native and WASM
Documentation

GraphRAG Core

The core library for GraphRAG-rs, providing portable functionality for both native and WASM deployments.

Overview

graphrag-core is the foundational library that powers GraphRAG-rs. It provides:

  • Embedding Generation: 8 provider backends (HuggingFace, OpenAI, Voyage AI, Cohere, Jina, Mistral, Together AI, Ollama)
  • Entity Extraction: TRUE LLM-based gleaning extraction with multi-round refinement (Microsoft GraphRAG-style)
  • Graph Construction: Incremental updates, PageRank, community detection
  • Retrieval Strategies: Vector, BM25, PageRank, hybrid, adaptive
  • Configuration System: Comprehensive TOML-based configuration
  • Cross-Platform: Works on native (Linux, macOS, Windows) and WASM

Quick Start

Installation

Add to your Cargo.toml:

[dependencies]
graphrag-core = { path = "../graphrag-core", features = ["huggingface-hub", "ureq"] }

Basic Usage

use graphrag_core::embeddings::huggingface::HuggingFaceEmbeddings;
use graphrag_core::embeddings::EmbeddingProvider;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create embedding provider
    let mut embeddings = HuggingFaceEmbeddings::new(
        "sentence-transformers/all-MiniLM-L6-v2",
        None,
    );

    // Initialize (downloads model if needed)
    embeddings.initialize().await?;

    // Generate embedding
    let embedding = embeddings.embed("Your text here").await?;
    println!("Generated {} dimensions", embedding.len());

    Ok(())
}

Features

Embedding Providers

GraphRAG Core supports 8 embedding backends via feature flags:

[features]
# Free, offline embedding models
huggingface-hub = ["hf-hub", "dirs"]

# API-based embedding providers
ureq = ["dep:ureq"]  # Enables OpenAI, Voyage, Cohere, Jina, Mistral, Together

# Local inference (coming soon)
neural-embeddings = ["candle-core"]

Supported Providers:

Provider Cost Quality Feature Flag Use Case
HuggingFace Free ★★★★ huggingface-hub Offline, 100+ models
OpenAI $0.13/1M ★★★★★ ureq Best quality
Voyage AI Medium ★★★★★ ureq Anthropic recommended
Cohere $0.10/1M ★★★★ ureq Multilingual (100+ langs)
Jina AI $0.02/1M ★★★★ ureq Cost-optimized
Mistral $0.10/1M ★★★★ ureq RAG-optimized
Together AI $0.008/1M ★★★★ ureq Cheapest
Ollama Free ★★★★ (via config) Local GPU

See EMBEDDINGS_CONFIG.md for detailed configuration.

Configuration System

GraphRAG Core uses TOML for configuration:

# config.toml

[embeddings]
backend = "huggingface"
model = "sentence-transformers/all-MiniLM-L6-v2"
dimension = 384
batch_size = 32
cache_dir = "~/.cache/huggingface"

[graph]
max_connections = 10
similarity_threshold = 0.8

[retrieval]
top_k = 10
search_algorithm = "cosine"

Load configuration:

use graphrag_core::config::Config;

let config = Config::from_toml_file("config.toml")?;

Examples

Embedding Providers

# Demo all providers
cargo run --example embeddings_demo --features "huggingface-hub,ureq"

# Config-based usage
cargo run --example embeddings_from_config --features "huggingface-hub,ureq"

# With API keys
OPENAI_API_KEY=sk-... \
cargo run --example embeddings_demo --features ureq

Entity Extraction (Real LLM-Based Gleaning)

GraphRAG Core implements TRUE LLM-based entity extraction with iterative refinement:

use graphrag_core::entity::gleaning_extractor::GleaningEntityExtractor;
use graphrag_core::entity::GleaningConfig;
use graphrag_core::ollama::OllamaClient;

// Create Ollama client
let ollama_client = OllamaClient::new(ollama_config);

// Configure gleaning (Microsoft GraphRAG-style)
let gleaning_config = GleaningConfig {
    max_gleaning_rounds: 4,  // Default: 4 rounds like Microsoft GraphRAG
    completion_threshold: 0.8,
    entity_confidence_threshold: 0.7,
    use_llm_completion_check: true,  // LLM judges completion
    entity_types: vec!["PERSON".to_string(), "ORGANIZATION".to_string(), "LOCATION".to_string()],
    temperature: 0.1,
    max_tokens: 1500,
};

// Create extractor
let extractor = GleaningEntityExtractor::new(ollama_client, gleaning_config);

// Extract with real LLM calls (takes 15-30 seconds per chunk per round)
let (entities, relationships) = extractor.extract_with_gleaning(chunk).await?;

Performance Expectations:

  • Small document (5-10 pages): 5-15 minutes
  • Medium document (50-100 pages): 30-60 minutes
  • Large document (500-1000 pages): 2-4 hours

This is REAL LLM processing with actual API calls, not pattern matching!

Graph Construction

use graphrag_core::graph::incremental::IncrementalGraph;

let mut graph = IncrementalGraph::new();
graph.add_document("document1", entities1)?;
graph.add_document("document2", entities2)?;  // Automatic deduplication!

Module Structure

graphrag-core/
├── src/
│   ├── embeddings/          # Embedding generation
│   │   ├── mod.rs           # EmbeddingProvider trait
│   │   ├── huggingface.rs   # HuggingFace Hub integration
│   │   ├── api_providers.rs # 6 API providers
│   │   ├── config.rs        # TOML configuration
│   │   └── README.md        # Technical documentation
│   ├── entity/              # Entity extraction
│   │   ├── mod.rs           # Entity types and base traits
│   │   ├── prompts.rs       # Microsoft GraphRAG-style LLM prompts
│   │   ├── llm_extractor.rs # Real LLM entity extraction with Ollama
│   │   ├── gleaning_extractor.rs  # Multi-round gleaning orchestrator
│   │   └── semantic_merging.rs    # Entity deduplication
│   ├── graph/               # Knowledge graph
│   │   ├── mod.rs           # Graph construction
│   │   ├── incremental.rs   # Incremental updates
│   │   └── pagerank.rs      # Fast-GraphRAG retrieval
│   ├── retrieval/           # Search strategies
│   │   ├── mod.rs           # Vector similarity
│   │   ├── bm25.rs          # Keyword search
│   │   ├── hybrid.rs        # Multi-strategy fusion
│   │   └── pagerank_retrieval.rs  # Graph-based search
│   ├── config/              # Configuration system
│   │   ├── mod.rs           # Main Config struct
│   │   └── toml_config.rs   # TOML parsing
│   └── core/                # Core traits and types
│       ├── error.rs         # Error types
│       ├── traits.rs        # Pluggable abstractions
│       └── registry.rs      # Component registry
└── examples/
    ├── embeddings_demo.rs           # Test all providers
    ├── embeddings_from_config.rs    # Config-based usage
    └── embeddings.toml              # Example configuration

Performance

Embedding Speed

Provider Latency Throughput GPU Support
HuggingFace (local) 50-100ms 10-20 chunks/s
ONNX Runtime Web 3-8ms 125-333 chunks/s ✅ WebGPU
Ollama 100-200ms 5-10 chunks/s ✅ CUDA/Metal
API Providers 50-200ms Varies ☁️ Cloud

Memory Usage

  • HuggingFace models: ~100-500MB (cached on disk)
  • Graph (10K entities): ~50MB
  • Embeddings cache: Configurable (default: 10K vectors)

Advanced Features

LightRAG (Dual-Level Retrieval)

[retrieval]
strategy = "hybrid"
enable_lightrag = true  # 6000x token reduction!

PageRank (Fast-GraphRAG)

[graph]
enable_pagerank = true  # 27x performance boost

Intelligent Caching

[generation]
enable_caching = true  # 80%+ hit rate, 6x cost reduction

API Providers Setup

Environment Variables

# API Keys (recommended)
export OPENAI_API_KEY="sk-..."
export VOYAGE_API_KEY="pa-..."
export COHERE_API_KEY="..."
export JINA_API_KEY="jina_..."
export MISTRAL_API_KEY="..."
export TOGETHER_API_KEY="..."

# HuggingFace cache
export HF_HOME="~/.cache/huggingface"

Configuration File

[embeddings]
backend = "openai"
model = "text-embedding-3-small"
# api_key = "sk-..."  # Or use OPENAI_API_KEY env var
dimension = 1536
batch_size = 100

Testing

# Run all tests
cargo test --all-features

# Test embeddings
cargo test --features "huggingface-hub,ureq" embeddings

# Test with model downloads (slow)
ENABLE_DOWNLOAD_TESTS=1 cargo test --features huggingface-hub

Documentation

Feature Flags

[features]
# Embedding providers
huggingface-hub = ["hf-hub", "dirs"]  # Free, offline models
ureq = ["dep:ureq"]                   # API providers

# Graph backends
memory-storage = []                    # In-memory (default)
persistent-storage = ["lancedb"]       # LanceDB embedded

# Processing
parallel-processing = ["rayon"]        # Multi-threading
caching = ["moka"]                     # LLM response cache

# Advanced features
incremental = []                       # Zero-downtime updates
pagerank = []                          # Fast-GraphRAG
lightrag = []                          # Dual-level retrieval
rograg = []                            # Query decomposition

# LLM integrations
ollama = []                            # Ollama local models
function-calling = []                  # Function calling support

Cross-Platform Support

GraphRAG Core is designed to work across platforms:

  • Linux - Full support with all features
  • macOS - Full support with Metal GPU acceleration
  • Windows - Full support with CUDA GPU acceleration
  • WASM - Core functionality (see graphrag-wasm crate)

Contributing

See ../CONTRIBUTING.md for contribution guidelines.

License

MIT License - see ../LICENSE for details.


Part of the GraphRAG-rs project | Main README | Architecture | How It Works