adk-rag

Give your AI agents a knowledge base. adk-rag adds Retrieval-Augmented Generation (RAG) to ADK-Rust so your agents can search documents and answer questions using your own data.

ADK RAG

The adk-rag crate provides Retrieval-Augmented Generation capabilities for the ADK-Rust workspace. It offers a modular, trait-based architecture for document chunking, embedding generation, vector storage, similarity search, reranking, and agentic retrieval. The crate follows the ADK-Rust conventions of feature-gated backends, async-trait interfaces, and builder-pattern configuration. It integrates with existing ADK crates (adk-gemini for embeddings, adk-core for the Tool trait) and supports multiple vector store backends (in-memory, Qdrant, LanceDB, pgvector, SurrealDB).

What is RAG?

RAG stands for Retrieval-Augmented Generation. Instead of relying only on what an LLM was trained on, RAG lets your agent look up relevant information from your documents before answering:

Ingest — Documents are split into chunks, converted to vector embeddings, and stored
Query — A user question is embedded and matched against stored chunks
Generate — The most relevant chunks are passed to the LLM as context

Quick Start

The fastest way to get a working RAG pipeline. Uses Gemini for embeddings (free API key from Google AI Studio).

[dependencies]
adk-rag = { version = "1.0.0", features = ["gemini"] }
tokio = { version = "1", features = ["full"] }

use std::sync::Arc;
use adk_rag::*;

#[tokio::main]
async fn main() -> std::result::Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("GOOGLE_API_KEY")?;

    let pipeline = RagPipeline::builder()
        .config(RagConfig::default())
        .embedding_provider(Arc::new(GeminiEmbeddingProvider::new(&api_key)?))
        .vector_store(Arc::new(InMemoryVectorStore::new()))
        .chunker(Arc::new(RecursiveChunker::new(512, 100)))
        .build()?;

    pipeline.create_collection("docs").await?;

    pipeline.ingest("docs", &Document {
        id: "intro".into(),
        text: "Rust is a systems programming language focused on safety and speed.".into(),
        metadata: Default::default(),
        source_uri: None,
    }).await?;

    let results = pipeline.query("docs", "safe programming language").await?;
    for r in &results {
        println!("[score: {:.3}] {}", r.score, r.chunk.text);
    }
    Ok(())
}

GOOGLE_API_KEY=your-key-here cargo run

Agent with RAG Tool

The practical use case — an agent that searches your knowledge base to answer questions. The agent decides when to call rag_search and uses the retrieved context to generate answers.

[dependencies]
adk-rag = { version = "1.0.0", features = ["gemini"] }
adk-agent = "1.0.0"
adk-model = "1.0.0"
adk-cli = "1.0.0"
tokio = { version = "1", features = ["full"] }

use std::sync::Arc;
use adk_agent::LlmAgentBuilder;
use adk_model::GeminiModel;
use adk_rag::*;

#[tokio::main]
async fn main() -> std::result::Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("GOOGLE_API_KEY")?;

    // Build the RAG pipeline
    let pipeline = Arc::new(
        RagPipeline::builder()
            .config(RagConfig::builder().chunk_size(300).chunk_overlap(50).top_k(3).build()?)
            .embedding_provider(Arc::new(GeminiEmbeddingProvider::new(&api_key)?))
            .vector_store(Arc::new(InMemoryVectorStore::new()))
            .chunker(Arc::new(RecursiveChunker::new(300, 50)))
            .build()?,
    );

    // Ingest your documents
    pipeline.create_collection("kb").await?;
    pipeline.ingest("kb", &Document {
        id: "faq".into(),
        text: "Our return policy allows returns within 30 days with a receipt.".into(),
        metadata: Default::default(),
        source_uri: None,
    }).await?;

    // Create an agent with RAG as a tool
    let agent = LlmAgentBuilder::new("support_agent")
        .instruction("Answer questions using the rag_search tool. Cite your sources.")
        .model(Arc::new(GeminiModel::new(&api_key, "gemini-2.5-flash")?))
        .tool(Arc::new(RagTool::new(pipeline, "kb")))
        .build()?;

    // Interactive chat
    adk_cli::console::run_console(Arc::new(agent), "app".into(), "user1".into()).await?;
    Ok(())
}

The agent will automatically call rag_search when it needs information from your knowledge base.

How It Works

adk-rag is built from four pluggable components:

Documents → [Chunker] → [EmbeddingProvider] → [VectorStore]
                                                     ↓
Query → [EmbeddingProvider] → [VectorStore search] → [Reranker] → Results

Component	What it does	Built-in options
Chunker	Splits documents into smaller pieces	`FixedSizeChunker`, `RecursiveChunker`, `MarkdownChunker`
EmbeddingProvider	Converts text to vector embeddings	`GeminiEmbeddingProvider`¹, `OpenAIEmbeddingProvider`²
VectorStore	Stores and searches embeddings	`InMemoryVectorStore`, `QdrantVectorStore`³, `LanceDBVectorStore`⁴, `PgVectorStore`⁵, `SurrealVectorStore`⁶
Reranker	Re-scores results after search	`NoOpReranker` (default), or write your own

¹ gemini feature ² openai feature ³ qdrant feature ⁴ lancedb feature ⁵ pgvector feature ⁶ surrealdb feature

The RagPipeline wires these together. The RagTool wraps the pipeline as an adk_core::Tool so any ADK agent can call it.

Embedding Providers

Gemini (recommended)

Uses Google's gemini-embedding-001 model (3072 dimensions). Free tier available.

adk-rag = { version = "1.0.0", features = ["gemini"] }

let provider = GeminiEmbeddingProvider::new(&api_key)?;

OpenAI

Uses text-embedding-3-small (1536 dimensions) by default. Supports dimension truncation via Matryoshka.

adk-rag = { version = "1.0.0", features = ["openai"] }

// Default model
let provider = OpenAIEmbeddingProvider::new("sk-...")?;

// Or read from OPENAI_API_KEY env var
let provider = OpenAIEmbeddingProvider::from_env()?;

// With a different model and custom dimensions
let provider = OpenAIEmbeddingProvider::new("sk-...")?
    .with_model("text-embedding-3-large")
    .with_dimensions(256);

Custom Embedding Provider

Implement the EmbeddingProvider trait to use any embedding model — a local model, a different API, or a mock for testing.

use async_trait::async_trait;
use adk_rag::{EmbeddingProvider, Result};

struct MyEmbedder { /* your client */ }

#[async_trait]
impl EmbeddingProvider for MyEmbedder {
    async fn embed(&self, text: &str) -> Result<Vec<f32>> {
        // Call your embedding model here
        todo!()
    }

    fn dimensions(&self) -> usize {
        384 // Return your model's output dimensions
    }
}

Add async-trait = "0.1" to your Cargo.toml when implementing traits.

Vector Stores

InMemoryVectorStore (default)

No external dependencies. Good for development, testing, and small datasets. Data is lost when the process exits.

let store = InMemoryVectorStore::new();

Qdrant

Production-ready vector database with filtering, snapshots, and clustering.

adk-rag = { version = "1.0.0", features = ["qdrant"] }

let store = QdrantVectorStore::new("http://localhost:6334").await?;

LanceDB

Embedded vector database with no server required. Data persists to disk.

adk-rag = { version = "1.0.0", features = ["lancedb"] }

Requires protoc installed: brew install protobuf (macOS), apt install protobuf-compiler (Ubuntu).

let store = LanceDBVectorStore::new("/tmp/my-vectors").await?;

pgvector (PostgreSQL)

Use your existing PostgreSQL database for vector search.

adk-rag = { version = "1.0.0", features = ["pgvector"] }

let store = PgVectorStore::new("postgres://user:pass@localhost/mydb").await?;

SurrealDB

Embedded or remote multi-model database with built-in vector search.

adk-rag = { version = "1.0.0", features = ["surrealdb"] }

let store = SurrealVectorStore::new_memory().await?;
// or
let store = SurrealVectorStore::new_rocksdb("/tmp/surreal-data").await?;

Choosing a Chunker

Chunker	Best for	How it splits
`FixedSizeChunker`	General text, logs	Every N characters with overlap
`RecursiveChunker`	Articles, docs, code	Paragraphs → sentences → words (natural boundaries)
`MarkdownChunker`	Markdown files, READMEs	By headers, preserving section hierarchy in metadata

// Fixed: 512 chars per chunk, 100 char overlap
let chunker = FixedSizeChunker::new(512, 100);

// Recursive: tries paragraph breaks first, then sentences
let chunker = RecursiveChunker::new(512, 100);

// Markdown: splits by headers, stores header path in metadata
let chunker = MarkdownChunker::new(512, 100);

Configuration

let config = RagConfig::builder()
    .chunk_size(256)            // max characters per chunk (default: 512)
    .chunk_overlap(50)          // overlap between chunks (default: 100)
    .top_k(5)                   // number of results to return (default: 10)
    .similarity_threshold(0.5)  // minimum score to include (default: 0.0)
    .build()?;

chunk_size — Smaller chunks are more precise but may lose context. 200–500 is a good range.
chunk_overlap — Prevents information loss at chunk boundaries. 10–20% of chunk_size works well.
top_k — More results give the LLM more context but increase token usage.
similarity_threshold — Filter out low-quality matches. 0.0 returns everything, 0.3–0.7 keeps strong matches only.

Writing a Custom Reranker

The default NoOpReranker passes results through unchanged. Write your own to improve precision:

[dependencies]
adk-rag = { version = "1.0.0", features = ["gemini"] }
async-trait = "0.1"
tokio = { version = "1", features = ["full"] }

use async_trait::async_trait;
use adk_rag::{Reranker, SearchResult};

struct KeywordBoostReranker {
    boost: f32,
}

#[async_trait]
impl Reranker for KeywordBoostReranker {
    async fn rerank(
        &self,
        query: &str,
        mut results: Vec<SearchResult>,
    ) -> adk_rag::Result<Vec<SearchResult>> {
        let keywords: Vec<String> = query.split_whitespace()
            .filter(|w| w.len() > 3)
            .map(|w| w.to_lowercase())
            .collect();

        for result in &mut results {
            let text = result.chunk.text.to_lowercase();
            let hits = keywords.iter().filter(|kw| text.contains(kw.as_str())).count();
            result.score += hits as f32 * self.boost;
        }

        results.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
        Ok(results)
    }
}

Use it in the pipeline:

let pipeline = RagPipeline::builder()
    .config(config)
    .embedding_provider(embedder)
    .vector_store(store)
    .chunker(chunker)
    .reranker(Arc::new(KeywordBoostReranker { boost: 0.1 }))
    .build()?;

Feature Flags

# Core only (in-memory store, all chunkers, no external deps)
adk-rag = "1.0.0"

# With Gemini embeddings (recommended)
adk-rag = { version = "1.0.0", features = ["gemini"] }

# With OpenAI embeddings
adk-rag = { version = "1.0.0", features = ["openai"] }

# With a persistent vector store
adk-rag = { version = "1.0.0", features = ["gemini", "qdrant"] }

# Everything
adk-rag = { version = "1.0.0", features = ["full"] }

Feature	Enables	Extra dependency
(default)	Core traits, `InMemoryVectorStore`, all chunkers	none
`gemini`	`GeminiEmbeddingProvider`	`adk-gemini`
`openai`	`OpenAIEmbeddingProvider`	`reqwest`
`qdrant`	`QdrantVectorStore`	`qdrant-client`
`lancedb`	`LanceDBVectorStore`	`lancedb`, `arrow`
`pgvector`	`PgVectorStore`	`sqlx`
`surrealdb`	`SurrealVectorStore`	`surrealdb`
`full`	All of the above	all

Testing Without API Keys

For unit tests or CI where you don't have API keys, implement a deterministic mock embedder:

use async_trait::async_trait;
use adk_rag::{EmbeddingProvider, Result};

struct MockEmbedder;

#[async_trait]
impl EmbeddingProvider for MockEmbedder {
    async fn embed(&self, text: &str) -> Result<Vec<f32>> {
        // Deterministic hash-based embedding for testing.
        // NOT suitable for production — use a real provider.
        let hash = text.bytes().fold(0u64, |acc, b| acc.wrapping_mul(31).wrapping_add(b as u64));
        let mut v = vec![0.0f32; 64];
        for (i, x) in v.iter_mut().enumerate() {
            *x = ((hash.wrapping_add(i as u64)) as f32).sin();
        }
        let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 { v.iter_mut().for_each(|x| *x /= norm); }
        Ok(v)
    }
    fn dimensions(&self) -> usize { 64 }
}

This produces stable vectors so your tests are reproducible, but the similarity scores won't be meaningful. Use it for testing pipeline wiring, not search quality.

Examples

Run from the ADK-Rust workspace root:

Example	What it shows	API key?	Command
`rag_basic`	Pipeline with mock embeddings	No	`cargo run --example rag_basic --features rag`
`rag_markdown`	Markdown chunking with header metadata	No	`cargo run --example rag_markdown --features rag`
`rag_agent`	LlmAgent with RagTool	Yes	`cargo run --example rag_agent --features rag-gemini`
`rag_recursive`	Codebase Q&A with RecursiveChunker	Yes	`cargo run --example rag_recursive --features rag-gemini`
`rag_reranker`	Custom keyword reranker	Yes	`cargo run --example rag_reranker --features rag-gemini`
`rag_multi_collection`	Multi-collection search	Yes	`cargo run --example rag_multi_collection --features rag-gemini`
`rag_surrealdb`	SurrealDB vector store	Yes	`cargo run --example rag_surrealdb --features rag-surrealdb`

For examples that need an API key, set GOOGLE_API_KEY in your environment or .env file.

API Reference

Core Types

// A document to ingest
Document {
    id: String,
    text: String,
    metadata: HashMap<String, String>,
    source_uri: Option<String>,
}

// A chunk produced by a Chunker (with embedding attached after processing)
Chunk {
    id: String,
    text: String,
    embedding: Vec<f32>,
    metadata: HashMap<String, String>,
    document_id: String,
}

// A search result with relevance score
SearchResult {
    chunk: Chunk,
    score: f32,
}

Pipeline Methods

let pipeline = RagPipeline::builder()
    .config(config)
    .embedding_provider(embedder)
    .vector_store(store)
    .chunker(chunker)
    .reranker(reranker)  // optional
    .build()?;

// Collection management
pipeline.create_collection("name").await?;
pipeline.delete_collection("name").await?;

// Ingestion (chunk → embed → store)
let chunks = pipeline.ingest("collection", &document).await?;
let chunks = pipeline.ingest_batch("collection", &documents).await?;

// Query (embed → search → rerank → filter)
let results = pipeline.query("collection", "search text").await?;

RagTool

Wraps a pipeline as an adk_core::Tool for agent use:

let tool = RagTool::new(pipeline, "default_collection");

// The agent calls it with JSON:
// { "query": "How do I reset my password?" }
// { "query": "pricing info", "collection": "faq", "top_k": 5 }

License

Apache-2.0

Part of ADK-Rust

This crate is part of the ADK-Rust framework for building AI agents in Rust.

adk-rag 1.0.0