rag 0.1.1

A Rust library and CLI for Retrieval-Augmented Generation

RAG

A Rust library and CLI for Retrieval-Augmented Generation (RAG) that combines vector similarity, graph structure, and search-style retrieval rather than embeddings alone. Dense vectors cover semantic match, a knowledge graph encodes entities and relations, and configurable top-k plus metadata filtering make retrieval behave like a search layer over your corpus.

Project docs: SPEC.md (scope and requirements), ARCHITECTURE.md (modules and data flow), TODO.md (backlog).

Features

  • Pure Rust implementation with async/await support
  • Vector RAG: multiple embedding backends (OpenAI, Ollama), pluggable indexes and distance metrics (cosine, Euclidean, dot product, Manhattan)
  • Graph RAG: graph store for nodes and edges, entity extraction hooks, and a GraphRagEngine that ties documents, vectors, and the graph together
  • In-memory vector stores with parallel batch search (InMemoryVectorStore, MinimalVectorDB)
  • Search-oriented retrieval: configurable top-k, score-ranked results, and metadata filtering over stored chunks
  • Ingestion helpers: Source implementations for PDF, codebase trees, and wiki-style URLs (ingestion module)
  • Multiple text chunking strategies (fixed-size, paragraph, sentence)
  • CLI for ingestion and querying with persistent state (RAG_STATE_DIR, default .rag), with vector, hybrid (BM25 + embeddings), and graph subcommands
  • MCP server (rag-mcp) with vector tools (rag_*) and graph or hybrid tools (graph_*)
  • Library API suitable for custom pipelines
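
To make the four distance metrics listed above concrete, here is a minimal standalone sketch over plain f32 slices. The function names are illustrative and are not taken from this crate's API; the crate's own metric types may differ in naming and in edge-case handling.

```rust
/// Dot product of two vectors (pairs beyond the shorter length are ignored).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity in [-1, 1]. Returning 0.0 for a zero-norm vector
/// is a choice made for this sketch.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let norm_a = dot(a, a).sqrt();
    let norm_b = dot(b, b).sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot(a, b) / (norm_a * norm_b)
}

/// Euclidean (L2) distance: square root of the summed squared differences.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f32>()
        .sqrt()
}

/// Manhattan (L1) distance: sum of absolute differences.
fn manhattan_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}
```

Cosine similarity and dot product are similarities (higher is closer), while Euclidean and Manhattan are distances (lower is closer), so a ranking step has to sort in the direction that matches the chosen metric.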

Installation

From source

cargo install --path .

As a library

Add to your Cargo.toml:

[dependencies]
rag = { git = "https://github.com/yingkitw/rag" }

Quick Start

The CLI stores its state under RAG_STATE_DIR (default .rag): vectors.json, and optionally graph.json and graph_rag.json.

CLI Usage

# Set your API key (OpenAI) or use Ollama
export OPENAI_API_KEY="your-api-key-here"
# Optional when using Ollama for the CLI or rag-mcp:
export OLLAMA_MODEL="nomic-embed-text"

# Add a document (persists chunks to $RAG_STATE_DIR/vectors.json)
rag add --file document.txt --source "my-docs"

# Vector-only query
rag query --query "What is Rust?" --top-k 3

# Vector + BM25 hybrid (alpha = vector weight in [0,1])
rag hybrid-query --query "What is Rust?" --top-k 5 --alpha 0.65

# Graph stats from a saved graph file
rag graph-stats

# Build GraphRAG snapshot from a file (writes graph_rag.json + graph.json)
rag graph-build --file document.txt --source "my-docs"

# Query using saved GraphRAG snapshot
rag graph-hybrid-query --query "Who is mentioned?" --top-k 5

# List documents
rag list --limit 10 --offset 0

# Count documents
rag count
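
The --alpha flag of hybrid-query is described above as the vector weight in [0,1]. One common way to apply such a weight is a linear blend of normalized scores; the sketch below assumes that scheme and min-max normalization. Both are assumptions made for illustration, not a description of what the CLI does internally, and the function names are hypothetical.

```rust
/// Min-max normalize scores into [0, 1]. A list with no spread maps to zeros.
fn min_max_normalize(scores: &[f32]) -> Vec<f32> {
    let min = scores.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    scores
        .iter()
        .map(|s| if range > 0.0 { (s - min) / range } else { 0.0 })
        .collect()
}

/// Blend per-document scores: alpha * vector + (1 - alpha) * bm25.
/// `vector` and `bm25` are assumed to be aligned by document index.
fn hybrid_scores(vector: &[f32], bm25: &[f32], alpha: f32) -> Vec<f32> {
    let alpha = alpha.clamp(0.0, 1.0);
    let v = min_max_normalize(vector);
    let b = min_max_normalize(bm25);
    v.iter()
        .zip(&b)
        .map(|(v, b)| alpha * v + (1.0 - alpha) * b)
        .collect()
}
```

Under this scheme, alpha = 1.0 reduces to pure vector ranking and alpha = 0.0 to pure BM25 ranking. Normalizing first matters because raw BM25 scores are unbounded, while cosine similarities are not.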

Library Usage

use rag::{
    chunker::FixedSizeChunker,
    embeddings::OpenAIEmbeddingModel,
    retriever::Retriever,
    vector_store::MinimalVectorDB,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create embedding model and vector store
    let embedding_model = OpenAIEmbeddingModel::new("your-api-key".to_string());
    let vector_store = MinimalVectorDB::new();
    
    // Create retriever
    let retriever = Retriever::new(embedding_model, vector_store)
        .with_chunker(Box::new(FixedSizeChunker::new(500, 50)))
        .with_top_k(5);
    
    // Add documents
    retriever.add_document("Your document content here".to_string()).await?;
    
    // Retrieve relevant chunks
    let results = retriever.retrieve("Your query here").await?;
    
    for (i, content) in results.iter().enumerate() {
        println!("{}. {}", i + 1, content);
    }
    
    Ok(())
}

Examples

The examples/ directory contains runnable programs, including:

cargo run --example simple_rag
cargo run --example graph_store_basic
cargo run --example graph_rag_example
cargo run --example ingest_fixture_rag
cargo run --example ingest_pdf
cargo run --example ingest_codebase
cargo run --example ingest_wiki
cargo run --example mcp_example

Configuration

Environment Variables

  • OPENAI_API_KEY: Your OpenAI API key (optional; if unset, embeddings use Ollama)
  • OLLAMA_URL: Ollama server URL (default: http://localhost:11434)
  • OLLAMA_MODEL: Embedding model when using Ollama (CLI, rag-mcp, and examples; default: nomic-embed-text)
  • RAG_STATE_DIR: Directory where the CLI persists its state (default: .rag)
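
The fallback described above (OpenAI when OPENAI_API_KEY is set, otherwise Ollama with its defaults) can be sketched as a small selection function. The enum and function here are hypothetical and exist only to illustrate the precedence; treating an empty key as unset is an assumption of this sketch.

```rust
/// Hypothetical description of which embedding backend to use.
#[derive(Debug, PartialEq)]
enum EmbeddingBackend {
    OpenAi { api_key: String },
    Ollama { url: String, model: String },
}

/// Pick a backend from environment-style lookups.
/// `get` is injected so the logic can be exercised without touching the
/// real process environment.
fn select_backend(get: impl Fn(&str) -> Option<String>) -> EmbeddingBackend {
    // Assumption: an empty OPENAI_API_KEY counts as unset.
    match get("OPENAI_API_KEY").filter(|k| !k.is_empty()) {
        Some(api_key) => EmbeddingBackend::OpenAi { api_key },
        None => EmbeddingBackend::Ollama {
            url: get("OLLAMA_URL")
                .unwrap_or_else(|| "http://localhost:11434".to_string()),
            model: get("OLLAMA_MODEL")
                .unwrap_or_else(|| "nomic-embed-text".to_string()),
        },
    }
}
```

In a real program the lookup would typically be `select_backend(|name| std::env::var(name).ok())`.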

MCP server

Run the stdio MCP server (for clients that spawn the process):

export OPENAI_API_KEY="..."   # or rely on Ollama + OLLAMA_URL / OLLAMA_MODEL
cargo run --bin rag-mcp

Vector tools: rag_add_document, rag_query, rag_list_documents, rag_count. Graph and hybrid tools: graph_build, graph_query, graph_get_entity, graph_get_neighbors, graph_info, graph_communities.

Chunking Strategies

  • FixedSizeChunker: Splits text into chunks of fixed size with overlap
  • ParagraphChunker: Splits text by paragraphs (double newlines)
  • SentenceChunker: Splits text by sentences
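
As an illustration of the fixed-size strategy, the sketch below splits text into character windows that share `overlap` characters with the previous window. It is a standalone approximation: whether FixedSizeChunker counts characters, bytes, or tokens is not stated here, so the character-based windowing and the function name are assumptions.

```rust
/// Split `text` into windows of at most `chunk_size` characters, where each
/// window starts `chunk_size - overlap` characters after the previous one.
fn fixed_size_chunks(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(
        chunk_size > 0 && overlap < chunk_size,
        "overlap must be smaller than chunk_size"
    );
    // Work on chars so multi-byte UTF-8 sequences are never split.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break; // the last window already reached the end of the text
        }
        start += step;
    }
    chunks
}
```

With a chunk size of 4 and an overlap of 1, "abcdefghij" becomes "abcd", "defg", "ghij": each chunk repeats the last character of the one before it, which helps keep context that straddles a boundary retrievable from either side.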

Embedding Models

OpenAI

// API key only
let model = OpenAIEmbeddingModel::new("your-api-key".to_string());
// API key plus an explicit embedding model name
let model = OpenAIEmbeddingModel::with_model("your-api-key".to_string(), "text-embedding-ada-002".to_string());

Ollama

// Model name only
let model = OllamaEmbeddingModel::new("nomic-embed-text".to_string());
// Model name plus an explicit Ollama server URL
let model = OllamaEmbeddingModel::new("nomic-embed-text".to_string())
    .with_base_url("http://localhost:11434".to_string());

API Reference

Core Types

  • EmbeddingModel: Trait for embedding models
  • VectorStore: Trait for vector storage backends
  • Retriever: Main interface for vector-centric RAG operations
  • GraphStore, GraphNode, GraphEdge: Graph storage and structure for graph-augmented retrieval
  • GraphRagEngine, EntityExtractor: Orchestration and entity linking for graph RAG
  • Source, ExtractedDocument: Ingestion from PDF, codebase, wiki, and other sources
  • Document: Represents a stored document with content, metadata, and optional embedding
  • TextChunker: Trait for text chunking strategies
  • RagMcpServer: MCP tool router combining vector store and graph (see mcp module)
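
To give a feel for what a graph store of nodes and edges provides, here is a deliberately tiny adjacency-list sketch with a neighbor lookup. TinyGraph is a hypothetical type written for this illustration and is not the crate's GraphStore; treating edges as undirected and keying nodes by name are both assumptions.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory graph: node name -> set of neighbor names.
#[derive(Default)]
struct TinyGraph {
    adjacency: HashMap<String, BTreeSet<String>>,
}

impl TinyGraph {
    /// Record an undirected edge between two named nodes.
    fn add_edge(&mut self, from: &str, to: &str) {
        self.adjacency
            .entry(from.to_string())
            .or_default()
            .insert(to.to_string());
        self.adjacency
            .entry(to.to_string())
            .or_default()
            .insert(from.to_string());
    }

    /// Neighbors of `node` in sorted order; empty if the node is unknown.
    fn neighbors(&self, node: &str) -> Vec<&str> {
        self.adjacency
            .get(node)
            .map(|set| set.iter().map(String::as_str).collect())
            .unwrap_or_default()
    }
}
```

In graph-augmented retrieval, a lookup like this lets a query that matches one entity pull in chunks attached to related entities, which pure vector similarity may miss.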

Retriever Methods

  • add_document(content): Add a single document
  • add_document_with_metadata(content, metadata): Add a document with metadata
  • retrieve(query): Retrieve relevant chunks
  • retrieve_with_scores(query): Retrieve chunks with similarity scores
  • retrieve_filtered(query, metadata_filter): Retrieve with metadata filtering
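
The combination of metadata filtering, score ranking, and top-k can be sketched independently of the crate. The ScoredChunk struct and filter_and_rank function below are hypothetical; the exact-match filter semantics (every key in the filter must be present with an equal value) are an assumption, and the actual behavior of retrieve_filtered may differ.

```rust
use std::collections::HashMap;

/// Hypothetical scored search hit with string metadata.
struct ScoredChunk {
    content: String,
    score: f32,
    metadata: HashMap<String, String>,
}

/// Keep chunks whose metadata matches every key/value pair in `filter`,
/// sort by descending score, and return at most `top_k` results.
fn filter_and_rank(
    mut chunks: Vec<ScoredChunk>,
    filter: &HashMap<String, String>,
    top_k: usize,
) -> Vec<ScoredChunk> {
    chunks.retain(|c| filter.iter().all(|(k, v)| c.metadata.get(k) == Some(v)));
    // total_cmp gives a total order on f32, so NaN scores cannot cause a panic.
    chunks.sort_by(|a, b| b.score.total_cmp(&a.score));
    chunks.truncate(top_k);
    chunks
}
```

Filtering before truncation is the important design point: applying top-k first and filtering afterwards can return fewer than k results even when enough matching chunks exist.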

Development

Run tests:

cargo test

Run examples:

cargo run --example simple_rag
cargo run --example graph_store_basic
cargo run --example graph_rag_example
cargo run --example ingest_fixture_rag

License

Apache-2.0

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.