quorumrag 0.1.0

Quorum-based retrieval-augmented generation: fuse multiple retrievers and keep only the evidence they agree on.

-----------------------------------------------------

➤ QuorumRAG

Multi-Retriever Retrieval-Augmented Generation via Quorum Consensus

Query → Multi-Retriever Ensemble → RRF Scoring → Evidence Clustering → Quorum Filtering → LLM Generation

A research implementation of QuorumRAG — a RAG architecture that requires cross-retriever consensus before surfacing evidence, built entirely in Rust with Ollama for local LLM inference.


-----------------------------------------------------

➤ ⚡ What is QuorumRAG?

QuorumRAG is a retrieval strategy that runs multiple independent retrievers (dense semantic search at different chunk granularities + BM25 keyword search) over the same corpus, then only surfaces evidence that achieves quorum — agreement from at least N retrievers. Evidence clusters are scored using Reciprocal Rank Fusion (RRF) before being passed to the LLM, producing answers grounded in cross-validated evidence rather than the output of a single retriever.


-----------------------------------------------------

➤ ✨ Why it stands out

  • Quorum consensus — only evidence agreed upon by multiple retrievers reaches the LLM
  • RRF scoring — rank-based fusion robust to BM25/cosine scale mismatch
  • Overlapping chunks — 50% stride prevents answers being split at chunk boundaries
  • Parallel embedding — batched concurrent requests for fast corpus indexing
  • Embedding cache — cold start only happens once per retriever configuration
  • Full eval harness — baseline vs. QuorumRAG recall comparison on every run
  • Pipeline built entirely in Rust: it ships as a single binary and needs no Python runtime (Python is used only by the corpus fetch script in the Quickstart)
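
The overlapping-chunk idea above can be illustrated with a minimal sketch. It assumes chunk sizes are counted in whitespace-separated words, which the README does not state; `chunk_words` is a hypothetical helper, not the crate's API.

```rust
/// Split `text` into overlapping chunks of `chunk_size` words, where
/// consecutive chunks share `overlap` words (stride = chunk_size - overlap).
/// Assumption: sizes are measured in whitespace-separated words.
fn chunk_words(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(
        chunk_size > 0 && overlap < chunk_size,
        "overlap must be smaller than chunk_size"
    );
    let words: Vec<&str> = text.split_whitespace().collect();
    let stride = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + chunk_size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // this window already reaches the end of the text
        }
        start += stride;
    }
    chunks
}
```

With `chunk_size = 4` and `overlap = 2`, the text "a b c d e f" yields the windows "a b c d" and "c d e f", so a phrase such as "c d" that straddles the midpoint appears whole in both.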

-----------------------------------------------------

➤ 🧠 Design Decisions

| Decision | Why it matters |
| --- | --- |
| Reciprocal Rank Fusion | Normalizes scores across retrievers without manual scaling; 1/(k+rank) is robust and well established (Cormack et al., 2009) |
| Quorum filtering | Reduces hallucination risk by requiring cross-retriever agreement before evidence reaches the LLM |
| Multi-granularity dense retrieval | Dense-50, Dense-100, and Dense-200 capture answers at different levels of context, from fine detail to broad context |
| BM25 as a quorum voter | Keyword retrieval is a complementary signal to semantic search; if both agree, confidence is higher |
| Overlapping chunks (50% stride) | Answers near chunk boundaries are captured by at least one window |
| Embedding cache per retriever | Avoids re-embedding thousands of chunks on every run; the cache is keyed by retriever ID, which includes chunk size and overlap |
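
To make the RRF row concrete, here is a small sketch of rank-based scoring. The `Candidate` struct and `score_ranking` helper are hypothetical names; `k = 60` in the usage note is the value commonly used following Cormack et al., while the README only says the constant is configurable.

```rust
/// Hypothetical candidate record; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    retriever: String,
    text: String,
    rrf: f64,
}

/// Reciprocal Rank Fusion score: 1 / (k + rank), with rank 1 for the top hit.
fn rrf_score(rank: usize, k: f64) -> f64 {
    1.0 / (k + rank as f64)
}

/// Turn one retriever's hits (ordered best first) into scored candidates.
/// Raw BM25 or cosine scores are deliberately ignored: only the rank matters.
fn score_ranking(retriever: &str, ranked_texts: &[&str], k: f64) -> Vec<Candidate> {
    ranked_texts
        .iter()
        .enumerate()
        .map(|(i, text)| Candidate {
            retriever: retriever.to_string(),
            text: text.to_string(),
            rrf: rrf_score(i + 1, k),
        })
        .collect()
}
```

Calling `score_ranking("bm25-100", &hits, 60.0)` and `score_ranking("dense-50", &hits, 60.0)` gives the top hit of each retriever the same score, 1/61, regardless of how different the underlying BM25 and cosine scales are.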

-----------------------------------------------------

➤ 🏗️ Architecture

Query
  │
  ├─► Dense-50   (ov25)  ─┐
  ├─► Dense-100  (ov50)  ─┤  RRF Scoring
  ├─► Dense-200  (ov100) ─┤  1 / (k + rank)
  └─► BM25-100   (ov50)  ─┘
              │
              ▼
     Embedding Similarity
        Clustering (0.85)
              │
              ▼
     Quorum Filter
     (support ≥ 2 retrievers)
              │
              ▼
     Rank Clusters
     (0.7 × avg_score + 0.3 × support)
              │
              ▼
     Build Context (top 5 clusters,
     all members deduplicated by score)
              │
              ▼
     Ollama LLM Generation
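
The quorum filter and cluster ranking steps in the diagram could be sketched as follows. The `Cluster` shape is hypothetical, and the sketch uses the raw retriever count as `support` in the weighted sum; whether the crate normalizes support before weighting is not stated in this README.

```rust
use std::collections::HashSet;

/// Hypothetical cluster shape: one (retriever id, RRF score) pair per member.
struct Cluster {
    members: Vec<(String, f64)>,
}

impl Cluster {
    /// Number of distinct retrievers that contributed at least one member.
    fn support(&self) -> usize {
        self.members
            .iter()
            .map(|(retriever, _)| retriever.as_str())
            .collect::<HashSet<_>>()
            .len()
    }

    /// Mean RRF score over all members (callers must skip empty clusters).
    fn avg_score(&self) -> f64 {
        let sum: f64 = self.members.iter().map(|(_, score)| score).sum();
        sum / self.members.len() as f64
    }
}

/// Drop clusters below the quorum threshold, score the rest with
/// 0.7 * avg_score + 0.3 * support, and keep the `top_n` best.
fn quorum_rank(clusters: Vec<Cluster>, threshold: usize, top_n: usize) -> Vec<(f64, Cluster)> {
    let mut ranked: Vec<(f64, Cluster)> = clusters
        .into_iter()
        .filter(|c| !c.members.is_empty() && c.support() >= threshold)
        .map(|c| (0.7 * c.avg_score() + 0.3 * c.support() as f64, c))
        .collect();
    ranked.sort_by(|a, b| b.0.total_cmp(&a.0)); // highest score first
    ranked.truncate(top_n);
    ranked
}
```

With `threshold = 2` and `top_n = 5`, a cluster backed only by Dense-50 is discarded even if its RRF score is the highest, which is the behavior the quorum step is meant to enforce.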

-----------------------------------------------------

➤ 🧩 Pipeline Modules

| Module | Purpose |
| --- | --- |
| corpus | Loads .txt files, chunks with configurable size and overlap |
| embedding | HTTP client for Ollama nomic-embed-text embeddings |
| retrievers/dense | Cosine similarity search over embedded chunks |
| retrievers/bm25 | Tantivy-powered BM25 keyword search |
| clustering | Greedy cosine similarity clustering of candidates |
| quorum | Filters clusters below the minimum retriever support threshold |
| ranking | Scores clusters by RRF avg + support weighting |
| context | Builds the LLM context string from top-ranked clusters |
| generation | Ollama generation API client |
| evaluation | Word-overlap recall metric, baseline vs. QuorumRAG comparison |
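
As an illustration of what the clustering module describes, the sketch below shows one common form of greedy cosine clustering. It assumes each candidate is compared against the first member (seed) of every existing cluster; the crate may compare against centroids or all members instead, and the function names are hypothetical.

```rust
/// Cosine similarity between two vectors; returns 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Greedy clustering: each embedding joins the first cluster whose seed is at
/// least `threshold` similar, otherwise it starts a new cluster.
/// Returns clusters as lists of indices into `embeddings`.
fn greedy_cluster(embeddings: &[Vec<f32>], threshold: f32) -> Vec<Vec<usize>> {
    let mut clusters: Vec<Vec<usize>> = Vec::new();
    for (i, emb) in embeddings.iter().enumerate() {
        let home = clusters
            .iter()
            .position(|c| cosine(&embeddings[c[0]], emb) >= threshold);
        match home {
            Some(idx) => clusters[idx].push(i),
            None => clusters.push(vec![i]),
        }
    }
    clusters
}
```

A single pass like this is order-dependent, which is the usual trade-off of greedy clustering: it is cheap, but the result can change if candidates arrive in a different order.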

-----------------------------------------------------

➤ 🚀 Quickstart

Prerequisites

  • Rust 1.85 or newer (edition 2024)
  • Ollama running locally at http://localhost:11434
  • Required models pulled:
ollama pull nomic-embed-text
ollama pull mistral

1) Fetch the corpus

pip install wikipedia-api
python3 scripts/fetch_corpus.py

2) Run eval (builds embedding cache on first run)

cargo run

3) Ask a single question

cargo run -- --query "What is backpropagation?"

-----------------------------------------------------

➤ 📦 Use as a library

Add it to your project:

[dependencies]
quorumrag = "0.1"

Build a pipeline from a Config and ask questions. The pipeline indexes the corpus (using the embedding cache when available) on build:

use quorumrag::{Config, QuorumRag};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let config: Config = toml::from_str(&std::fs::read_to_string("config.toml")?)?;
    let rag = QuorumRag::build(config).await?;

    // Full RAG: quorum retrieval + generation.
    let answer = rag.answer("What is backpropagation?").await?;
    println!("{answer}");

    // Or inspect the evidence yourself before generating.
    let result = rag.retrieve("What is backpropagation?", true).await?;
    println!("support={}, clusters={}", result.max_support, result.clusters.len());
    Ok(())
}

Config exposes corpus_dir, cache_dir, the embedding model, the RRF constant, and the ranking weights, each with a default value, so the library makes no assumptions about your working directory.


-----------------------------------------------------

➤ ⚙️ Configuration

config.toml controls the full pipeline:

quorum_threshold = 2      # minimum retrievers that must agree
top_k = 15                # candidates per retriever per query
cluster_threshold = 0.85  # cosine similarity threshold for clustering

[[retrievers]]
retriever_type = "dense"
chunk_size = 50
overlap = 25

[[retrievers]]
retriever_type = "dense"
chunk_size = 100
overlap = 50

[[retrievers]]
retriever_type = "dense"
chunk_size = 200
overlap = 100

[[retrievers]]
retriever_type = "bm25"
chunk_size = 100
overlap = 50

[ollama]
url = "http://localhost:11434"
model = "mistral"
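
Since the embedding cache is keyed by retriever ID including chunk size and overlap, each [[retrievers]] entry above maps to its own cache. The sketch below shows one way such a key could be derived and an entry sanity-checked. `RetrieverConfig`, `cache_id`, `validate`, and the key format are all assumptions modeled on the diagram labels such as "Dense-50 (ov25)", not the crate's actual types.

```rust
/// Hypothetical mirror of one [[retrievers]] entry in config.toml.
struct RetrieverConfig {
    retriever_type: String,
    chunk_size: usize,
    overlap: usize,
}

impl RetrieverConfig {
    /// Cache key that changes whenever the chunking parameters change,
    /// e.g. "dense-50-ov25". The exact format is an assumption.
    fn cache_id(&self) -> String {
        format!("{}-{}-ov{}", self.retriever_type, self.chunk_size, self.overlap)
    }

    /// Reject entries whose overlap would leave a stride of zero or less.
    fn validate(&self) -> Result<(), String> {
        if self.chunk_size == 0 || self.overlap >= self.chunk_size {
            return Err(format!(
                "{}: overlap {} must be smaller than chunk_size {}",
                self.retriever_type, self.overlap, self.chunk_size
            ));
        }
        Ok(())
    }
}
```

Including both parameters in the key means that editing `chunk_size` or `overlap` in config.toml naturally triggers a fresh embedding pass instead of reusing stale vectors.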

-----------------------------------------------------

➤ 📊 Eval Results

Evaluated on 20 Wikipedia-based Q&A pairs across CS and ML topics.

| System | Recall |
| --- | --- |
| Baseline (Dense-50 only) | 14 / 20 |
| QuorumRAG (4 retrievers) | 19 / 20 |

QuorumRAG additionally provides a support score (1–4) on every answer, indicating how many independent retrievers agreed on the evidence — a confidence signal the baseline cannot produce.
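
For readers curious how a word-overlap recall metric can be computed, here is a minimal sketch. The tokenization, the `min_overlap` cutoff, and all function names are assumptions for illustration; the README does not specify how the evaluation module decides that a question counts as recalled.

```rust
use std::collections::HashSet;

/// Lowercased alphanumeric tokens of `text`, deduplicated.
fn tokens(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Fraction of distinct expected-answer words that also occur in `context`.
fn word_overlap(expected: &str, context: &str) -> f64 {
    let want = tokens(expected);
    if want.is_empty() {
        return 0.0;
    }
    let have = tokens(context);
    want.intersection(&have).count() as f64 / want.len() as f64
}

/// Count (expected answer, retrieved context) pairs whose overlap reaches
/// `min_overlap`. The cutoff is a hypothetical parameter.
fn recall_hits(pairs: &[(&str, &str)], min_overlap: f64) -> usize {
    pairs
        .iter()
        .filter(|(expected, context)| word_overlap(expected, context) >= min_overlap)
        .count()
}
```

Running `recall_hits` once over the baseline contexts and once over the QuorumRAG contexts would give two counts out of the number of questions, in the same form as the table above.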


-----------------------------------------------------

➤ 🧠 Tech Stack

  • Rust (edition 2024) — entire implementation
  • Tokio — async runtime, parallel embedding
  • Tantivy — BM25 full-text search index
  • Ollama — local LLM inference (nomic-embed-text, mistral)
  • Serde / TOML — config and cache serialization
  • Futures — batched concurrent HTTP embedding

-----------------------------------------------------

➤ 🛣️ What's next

  • Confidence-weighted answer generation using support scores
  • Streaming LLM responses for interactive mode
  • PyO3 bindings to expose the Rust core to a Python research harness
  • Additional retrievers (TF-IDF, hybrid sparse-dense)
  • Ablation study tooling (sweep quorum threshold, chunk size, cluster threshold)
  • REST API mode for integration with external frontends

-----------------------------------------------------

➤ Authors


➤ License

Licensed under either of MIT or Apache-2.0 at your option.