# LightRAG Integration Guide
## Overview
LightRAG dual-level retrieval has been successfully integrated into `graphrag-core`. This implementation provides **6000x token reduction** compared to traditional GraphRAG through intelligent keyword extraction at two abstraction levels.
## Implementation Status
✅ **COMPLETED** - Core modules compiled and tested:
- `lightrag::keyword_extraction` - Dual-level keyword extractor
- `lightrag::dual_retrieval` - Parallel high/low-level retrieval with merge strategies
- Integration with existing `graphrag-core` traits
## Architecture
### Key Components
1. **KeywordExtractor**
- Extracts **high-level** (topics, themes) and **low-level** (entities, specifics) keywords
- LLM-based extraction with fallback to query terms
- Enforces <20 keyword limit for token optimization
2. **DualLevelRetriever**
- Parallel retrieval at two granularities
- 4 merge strategies: Interleave, HighFirst, LowFirst, Weighted
- Automatic deduplication
3. **SemanticSearcher** trait
- Abstraction for semantic search that combines embedder + vector store
- String query → SearchResult conversion
## Usage
### Enable Feature
```toml
[dependencies]
graphrag-core = { path = "../graphrag-core", features = ["lightrag"] }
```
### Basic Example
```rust
use graphrag_core::{
DualLevelRetriever, DualRetrievalConfig,
KeywordExtractor, KeywordExtractorConfig,
SemanticSearcher, MergeStrategy,
};
use std::sync::Arc;
// 1. Create keyword extractor
let llm: Arc<dyn AsyncLanguageModel<Error = GraphRAGError>> = /* your LLM */;
let keyword_config = KeywordExtractorConfig {
max_keywords: 20,
language: "English".to_string(),
enable_cache: true,
};
let keyword_extractor = Arc::new(KeywordExtractor::new(llm, keyword_config));
// 2. Create semantic searchers for high and low levels
let high_level_searcher: Arc<dyn SemanticSearcher> = /* community/topic search */;
let low_level_searcher: Arc<dyn SemanticSearcher> = /* entity/chunk search */;
// 3. Configure dual-level retrieval
let retrieval_config = DualRetrievalConfig {
high_level_weight: 0.6,
low_level_weight: 0.4,
merge_strategy: MergeStrategy::Weighted,
};
// 4. Create retriever
let retriever = DualLevelRetriever::new(
keyword_extractor,
high_level_searcher,
low_level_searcher,
retrieval_config,
);
// 5. Retrieve with dual-level optimization
let results = retriever.retrieve("How do Alice and Bob collaborate?", 10).await?;
println!("High-level: {} results", results.high_level_chunks.len());
println!("Low-level: {} results", results.low_level_chunks.len());
println!("Merged: {} results", results.merged_chunks.len());
println!("Keywords: {:?}", results.keywords);
```
## Implementing SemanticSearcher
The `SemanticSearcher` trait wraps an embedder + vector store to provide semantic search:
```rust
use async_trait::async_trait;
use graphrag_core::{SemanticSearcher, retrieval::SearchResult};
use std::sync::Arc;
pub struct MySemanticSearch {
embedder: Arc<dyn AsyncEmbedder<Error = GraphRAGError>>,
vector_store: Arc<dyn AsyncVectorStore<Error = GraphRAGError>>,
}
#[async_trait]
impl SemanticSearcher for MySemanticSearch {
async fn search(&self, query: &str, top_k: usize) -> Result<Vec<SearchResult>, GraphRAGError> {
// 1. Embed query
let query_vector = self.embedder.embed(query).await?;
// 2. Vector search
let core_results = self.vector_store.search(&query_vector, top_k).await?;
// 3. Convert to SearchResult
let results = core_results.into_iter().map(|r| {
SearchResult {
id: r.id,
content: /* fetch content */,
score: r.distance,
result_type: ResultType::Chunk,
entities: vec![],
source_chunks: vec![],
}
}).collect();
Ok(results)
}
}
```
## Merge Strategies
- **Interleave**: Alternate between high and low-level results
- **HighFirst**: Topics first, then entities
- **LowFirst**: Entities first, then topics
- **Weighted**: Score-based merge using configured weights
## Performance Benefits
Based on LightRAG paper (arXiv:2410.05779):
- **Token Reduction**: 6000x (from 600-10k → <100 tokens)
- **API Cost**: 99% reduction
- **Latency**: 3-5x faster retrieval
- **Accuracy**: Comparable or better vs traditional GraphRAG
## Next Steps
To fully integrate LightRAG into your application:
1. **Implement SemanticSearcher** for your vector stores
2. **Create community-level index** (high-level) with topic summaries
3. **Create entity-level index** (low-level) with chunk embeddings
4. **Configure merge strategy** based on your use case
5. **Add caching** for keyword extraction (optional but recommended)
## Testing
Run tests with:
```bash
cargo test --package graphrag-core --features lightrag
```
## References
- Paper: "LightRAG: Simple and Fast Retrieval-Augmented Generation" (EMNLP 2025)
- arXiv: [2410.05779](https://arxiv.org/abs/2410.05779)
- Implementation Plan: `/home/dio/graphrag-rs/IMPLEMENTATION_PLAN_QUALITY_IMPROVEMENTS.md`