# RAG
A Rust library and CLI for Retrieval-Augmented Generation (RAG) that combines vector similarity, graph structure, and search-style retrieval rather than embeddings alone. Dense vectors cover semantic match, a knowledge graph encodes entities and relations, and configurable top-k plus metadata filtering make retrieval behave like a search layer over your corpus.
Project docs: [SPEC.md](SPEC.md) (scope and requirements), [ARCHITECTURE.md](ARCHITECTURE.md) (modules and data flow), [TODO.md](TODO.md) (backlog).
## Features
- Pure Rust implementation with async/await support
- Vector RAG: multiple embedding backends (OpenAI, Ollama), pluggable indexes and distance metrics (cosine, Euclidean, dot product, Manhattan)
- Graph RAG: graph store for nodes and edges, entity extraction hooks, and a `GraphRagEngine` that ties documents, vectors, and the graph together
- In-memory vector stores with parallel batch search (`InMemoryVectorStore`, `MinimalVectorDB`)
- Search-oriented retrieval: configurable top-k, score-ranked results, and metadata filtering over stored chunks
- Ingestion helpers: `Source` implementations for PDF, codebase trees, and wiki-style URLs (`ingestion` module)
- Multiple text chunking strategies (fixed-size, paragraph, sentence)
- CLI for ingest and query with **persistent state** (`RAG_STATE_DIR`, default `.rag`): vector, **hybrid-query (BM25 + embeddings)**, and **graph** subcommands
- MCP server (`rag-mcp`) with vector tools (`rag_*`) and graph or hybrid tools (`graph_*`)
- Library API suitable for custom pipelines
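
The four metrics named above can be sketched in plain Rust as follows. This is an illustration only, not the crate's implementation: the function names are hypothetical, and vectors are assumed to be `f32` slices of equal length. Note that cosine and dot product are similarities (higher means closer), while Euclidean and Manhattan are distances (lower means closer).

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity in [-1, 1]; returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let norm = dot(a, a).sqrt() * dot(b, b).sqrt();
    if norm == 0.0 {
        0.0
    } else {
        dot(a, b) / norm
    }
}

/// Euclidean (L2) distance.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f32>()
        .sqrt()
}

/// Manhattan (L1) distance.
fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

fn main() {
    let a: [f32; 3] = [1.0, 2.0, 3.0];
    let b: [f32; 3] = [4.0, 6.0, 3.0];
    println!("dot       = {}", dot(&a, &b));
    println!("cosine    = {}", cosine_similarity(&a, &b));
    println!("euclidean = {}", euclidean(&a, &b));
    println!("manhattan = {}", manhattan(&a, &b));
}
```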
## Installation
### From source
```bash
cargo install --path .
```
### As a library
Add to your `Cargo.toml`:
```toml
[dependencies]
rag = { git = "https://github.com/yingkitw/rag" }
```
## Quick Start
The CLI persists its state under **`RAG_STATE_DIR`** (default `.rag`): `vectors.json`, plus the optional `graph.json` and `graph_rag.json` written by `rag graph-build`.
### CLI Usage
```bash
# Set your API key (OpenAI) or use Ollama
export OPENAI_API_KEY="your-api-key-here"
# Optional when using Ollama for the CLI or rag-mcp:
export OLLAMA_MODEL="nomic-embed-text"
# Add a document (persists chunks to $RAG_STATE_DIR/vectors.json)
rag add --file document.txt --source "my-docs"
# Vector-only query
rag query --query "What is Rust?" --top-k 3
# Vector + BM25 hybrid (alpha = vector weight in [0,1])
rag hybrid-query --query "What is Rust?" --top-k 5 --alpha 0.65
# Build GraphRAG snapshot from a file (writes graph_rag.json + graph.json)
rag graph-build --file document.txt --source "my-docs"
# Graph stats from a saved graph file
rag graph-stats
# Query using saved GraphRAG snapshot
rag graph-hybrid-query --query "Who is mentioned?" --top-k 5
# List documents
rag list --limit 10 --offset 0
# Count documents
rag count
```
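
To make the `--alpha` flag concrete, here is a minimal sketch of one common way to fuse vector and BM25 scores. It assumes min-max normalization of each score list followed by a linear blend, `alpha * vector + (1 - alpha) * bm25`. The README only states that alpha is the vector weight in [0,1], so the crate's actual fusion may differ; `normalize`, `hybrid_scores`, and `rank` are hypothetical names.

```rust
/// Min-max normalize scores into [0, 1]; a constant list maps to all zeros.
fn normalize(scores: &[f32]) -> Vec<f32> {
    let min = scores.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    scores
        .iter()
        .map(|s| if range > 0.0 { (s - min) / range } else { 0.0 })
        .collect()
}

/// Assumed fusion rule: alpha weights the vector score, (1 - alpha) the BM25 score.
/// Both slices must score the same candidates in the same order.
fn hybrid_scores(vector: &[f32], bm25: &[f32], alpha: f32) -> Vec<f32> {
    let alpha = alpha.clamp(0.0, 1.0);
    normalize(vector)
        .iter()
        .zip(normalize(bm25))
        .map(|(v, b)| alpha * v + (1.0 - alpha) * b)
        .collect()
}

/// Candidate indices sorted by descending fused score.
fn rank(scores: &[f32]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..scores.len()).collect();
    order.sort_by(|&i, &j| scores[j].total_cmp(&scores[i]));
    order
}

fn main() {
    let vector = [0.9, 0.5, 0.1]; // cosine similarities for three chunks
    let bm25 = [2.0, 8.0, 5.0]; // raw BM25 scores for the same chunks
    let fused = hybrid_scores(&vector, &bm25, 0.65);
    println!("fused scores: {:?}", fused);
    println!("ranking:      {:?}", rank(&fused));
}
```

Under this rule, `--alpha 1.0` reduces to pure vector ranking and `--alpha 0.0` to pure BM25 ranking.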
### Library Usage
```rust
use rag::{
    chunker::FixedSizeChunker,
    embeddings::OpenAIEmbeddingModel,
    retriever::Retriever,
    vector_store::MinimalVectorDB,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create embedding model and vector store
    let embedding_model = OpenAIEmbeddingModel::new("your-api-key".to_string());
    let vector_store = MinimalVectorDB::new();

    // Create retriever
    let retriever = Retriever::new(embedding_model, vector_store)
        .with_chunker(Box::new(FixedSizeChunker::new(500, 50)))
        .with_top_k(5);

    // Add documents
    retriever.add_document("Your document content here".to_string()).await?;

    // Retrieve relevant chunks
    let results = retriever.retrieve("Your query here").await?;
    for (i, content) in results.iter().enumerate() {
        println!("{}. {}", i + 1, content);
    }

    Ok(())
}
```
## Examples
The `examples/` directory contains runnable examples, including:
```bash
cargo run --example simple_rag
cargo run --example graph_store_basic
cargo run --example graph_rag_example
cargo run --example ingest_fixture_rag
cargo run --example ingest_pdf
cargo run --example ingest_codebase
cargo run --example ingest_wiki
cargo run --example mcp_example
```
## Configuration
### Environment Variables
- `OPENAI_API_KEY`: Your OpenAI API key (optional; if unset, embeddings use Ollama)
- `OLLAMA_URL`: Ollama server URL (default: `http://localhost:11434`)
- `OLLAMA_MODEL`: Embedding model when using **Ollama** (CLI, `rag-mcp`, and examples; default: `nomic-embed-text`)
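
The selection rule described by these variables can be sketched as below. This is a hedged illustration using only the standard library, not the crate's code: the `Backend` enum and `select_backend` function are hypothetical, and treating an empty `OPENAI_API_KEY` as unset is an assumption.

```rust
use std::env;

/// Hypothetical description of the chosen embedding backend.
#[derive(Debug, PartialEq)]
enum Backend {
    OpenAi { api_key: String },
    Ollama { url: String, model: String },
}

/// Pick a backend from optional variable values, applying the documented defaults.
/// Assumption: an empty API key counts as unset.
fn select_backend(
    api_key: Option<String>,
    url: Option<String>,
    model: Option<String>,
) -> Backend {
    match api_key.filter(|k| !k.is_empty()) {
        Some(api_key) => Backend::OpenAi { api_key },
        None => Backend::Ollama {
            url: url.unwrap_or_else(|| "http://localhost:11434".to_string()),
            model: model.unwrap_or_else(|| "nomic-embed-text".to_string()),
        },
    }
}

fn main() {
    let backend = select_backend(
        env::var("OPENAI_API_KEY").ok(),
        env::var("OLLAMA_URL").ok(),
        env::var("OLLAMA_MODEL").ok(),
    );
    // Avoid printing the key itself.
    match backend {
        Backend::OpenAi { .. } => println!("using OpenAI embeddings"),
        Backend::Ollama { url, model } => println!("using Ollama at {url} with {model}"),
    }
}
```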
### MCP Server
Run the stdio MCP server (for clients that spawn the process):
```bash
export OPENAI_API_KEY="..." # or rely on Ollama + OLLAMA_URL / OLLAMA_MODEL
cargo run --bin rag-mcp
```
- Vector tools: `rag_add_document`, `rag_query`, `rag_list_documents`, `rag_count`
- Graph and hybrid tools: `graph_build`, `graph_query`, `graph_get_entity`, `graph_get_neighbors`, `graph_info`, `graph_communities`
### Chunking Strategies
- `FixedSizeChunker`: Splits text into chunks of fixed size with overlap
- `ParagraphChunker`: Splits text by paragraphs (double newlines)
- `SentenceChunker`: Splits text by sentences
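
For intuition, the fixed-size and paragraph strategies can be sketched as standalone functions. This is not the crate's code: the function names are hypothetical, and sizes are assumed to be counted in characters (the crate may count bytes or tokens instead). With overlap, each chunk starts `size - overlap` characters after the previous one.

```rust
/// Split `text` into chunks of at most `size` characters, with `overlap`
/// characters shared between consecutive chunks.
fn fixed_size_chunks(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "need overlap < size and size > 0");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the last chunk reached the end of the text
        }
        start += step;
    }
    chunks
}

/// Split on blank lines (double newlines), dropping empty paragraphs.
fn paragraph_chunks(text: &str) -> Vec<String> {
    text.split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(String::from)
        .collect()
}

fn main() {
    for chunk in fixed_size_chunks("abcdefghij", 4, 1) {
        println!("{chunk}");
    }
    for paragraph in paragraph_chunks("First paragraph.\n\nSecond paragraph.") {
        println!("{paragraph}");
    }
}
```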
### Embedding Models
#### OpenAI
```rust
// API key only
let model = OpenAIEmbeddingModel::new("your-api-key".to_string());
// API key plus an explicit model name
let model = OpenAIEmbeddingModel::with_model("your-api-key".to_string(), "text-embedding-ada-002".to_string());
```
#### Ollama
```rust
// Model name only
let model = OllamaEmbeddingModel::new("nomic-embed-text".to_string());
// Model name plus an explicit server URL
let model = OllamaEmbeddingModel::new("nomic-embed-text".to_string())
    .with_base_url("http://localhost:11434".to_string());
```
## API Reference
### Core Types
- `EmbeddingModel`: Trait for embedding models
- `VectorStore`: Trait for vector storage backends
- `Retriever`: Main interface for vector-centric RAG operations
- `GraphStore`, `GraphNode`, `GraphEdge`: Graph storage and structure for graph-augmented retrieval
- `GraphRagEngine`, `EntityExtractor`: Orchestration and entity linking for graph RAG
- `Source`, `ExtractedDocument`: Ingestion from PDF, codebase, wiki, and other sources
- `Document`: Represents a stored document with content, metadata, and optional embedding
- `TextChunker`: Trait for text chunking strategies
- `RagMcpServer`: MCP tool router combining vector store and graph (see `mcp` module)
### Retriever Methods
- `add_document(content)`: Add a single document
- `add_document_with_metadata(content, metadata)`: Add a document with metadata
- `retrieve(query)`: Retrieve relevant chunks
- `retrieve_with_scores(query)`: Retrieve chunks with similarity scores
- `retrieve_filtered(query, metadata_filter)`: Retrieve with metadata filtering
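
Conceptually, scored and filtered retrieval comes down to three steps: filter by metadata, score against the query embedding, and keep the top-k. The sketch below illustrates that flow with hypothetical types and names (`Chunk`, `chunk`, `filtered_top_k`). It does not reproduce the signatures of the `Retriever` methods, and it assumes cosine similarity and exact-match metadata filters.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a stored chunk; not the crate's `Document` type.
struct Chunk {
    content: String,
    metadata: HashMap<String, String>,
    embedding: Vec<f32>,
}

/// Convenience constructor with a single `source` metadata entry.
fn chunk(content: &str, source: &str, embedding: Vec<f32>) -> Chunk {
    Chunk {
        content: content.to_string(),
        metadata: HashMap::from([("source".to_string(), source.to_string())]),
        embedding,
    }
}

/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Keep chunks whose metadata contains every key/value pair in `filter`,
/// score them against `query`, and return the `top_k` best (content, score) pairs.
fn filtered_top_k<'a>(
    chunks: &'a [Chunk],
    query: &[f32],
    filter: &HashMap<String, String>,
    top_k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = chunks
        .iter()
        .filter(|c| filter.iter().all(|(k, v)| c.metadata.get(k) == Some(v)))
        .map(|c| (c.content.as_str(), cosine(&c.embedding, query)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // highest score first
    scored.truncate(top_k);
    scored
}

fn main() {
    let chunks = vec![
        chunk("rust intro", "docs", vec![1.0, 0.0]),
        chunk("cooking tips", "blog", vec![1.0, 0.0]),
        chunk("rust async", "docs", vec![0.6, 0.8]),
    ];
    let filter = HashMap::from([("source".to_string(), "docs".to_string())]);
    for (content, score) in filtered_top_k(&chunks, &[1.0, 0.0], &filter, 5) {
        println!("{score:.3}  {content}");
    }
}
```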
## Development
Run tests:
```bash
cargo test
```
Run examples:
```bash
cargo run --example simple_rag
cargo run --example graph_store_basic
cargo run --example graph_rag_example
cargo run --example ingest_fixture_rag
```
## License
Apache-2.0
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.