# trueno-rag

Pure-Rust, SIMD-accelerated Retrieval-Augmented Generation (RAG) pipeline built on Trueno compute primitives. Part of the Sovereign AI Stack.
## Features
- Pure Rust - Zero Python/C++ dependencies
- Chunking - Recursive, Fixed, Sentence, Paragraph, Semantic, Structural
- Hybrid Retrieval - Dense (vector) + Sparse (BM25) search
- Fusion - RRF, Linear, DBSF, Convex, Union, Intersection (see the RRF sketch after this list)
- Reranking - Lexical, cross-encoder, and composite rerankers
- Metrics - Recall, Precision, MRR, NDCG, MAP
- Semantic Embeddings - Production ONNX models via FastEmbed (optional)
- Nemotron Embeddings - NVIDIA Embed Nemotron 8B via GGUF (optional)
- Index Compression - LZ4/ZSTD compressed persistence (optional)
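
Reciprocal Rank Fusion (RRF), listed under Fusion above, is a common way to merge a dense ranking with a sparse (BM25) ranking: each document scores the sum of 1/(k + rank) over the lists it appears in, with k typically set to 60. The following is a standalone sketch of the idea, independent of the crate's own fusion API; the `rrf` helper and its signature are not part of trueno-rag.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: combine several ranked lists of document IDs.
/// Each document receives sum(1 / (k + rank)) over the lists it appears in.
fn rrf(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc_id) in ranking.iter().enumerate() {
            // Ranks are 1-based in the usual RRF formulation.
            *scores.entry(doc_id.to_string()).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    // Highest fused score first.
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

// Example: merge a dense and a sparse ranking with the usual k = 60.
// let fused = rrf(&[vec!["doc2", "doc1"], vec!["doc1", "doc3"]], 60.0);
```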
## Installation

```toml
[dependencies]
trueno-rag = "0.1.8"
```
## Quick Start

```rust
use trueno_rag::prelude::*; // illustrative import path; adjust to the items you use

// RagPipelineBuilder and Document are illustrative type names; the arguments
// elided below are placeholders for the strategies and content you choose.
let mut pipeline = RagPipelineBuilder::new()
    .chunker(/* chunking strategy */)
    .embedder(/* embedding backend */)
    .reranker(/* reranker */)
    .fusion(/* fusion strategy */)
    .build()?;

// Index a document, then query with retrieved context.
let doc = Document::new(/* document text */).with_title(/* title */);
pipeline.index_document(doc)?;

let results = pipeline.query_with_context(/* query */)?;
```
## Examples

```bash
# Basic examples
cargo run --example <name>

# With semantic embeddings (downloads ~90MB ONNX model on first run)
cargo run --example <name> --features embeddings

# With compressed index persistence
cargo run --example <name> --features compression

# With NVIDIA Nemotron embeddings (requires GGUF model file)
NEMOTRON_MODEL_PATH=/path/to/model.gguf cargo run --example <name> --features nemotron
```

Replace `<name>` with one of the examples shipped in the crate's `examples/` directory.
## Optional Features
### Semantic Embeddings (FastEmbed)

Production-quality vector embeddings via FastEmbed (ONNX Runtime):

```toml
trueno-rag = { version = "0.1.8", features = ["embeddings"] }
```

```rust
use trueno_rag::prelude::*; // illustrative import path

// FastEmbedder is an illustrative type name; the default model is AllMiniLmL6V2.
let embedder = FastEmbedder::new()?;
let embedding = embedder.embed(/* text */)?;
// 384-dimensional embeddings
```
Available models:

- `AllMiniLmL6V2` - Fast, 384 dims (default)
- `AllMiniLmL12V2` - Better quality, 384 dims
- `BgeSmallEnV15` - Balanced, 384 dims
- `BgeBaseEnV15` - Higher quality, 768 dims
- `NomicEmbedTextV1` - Retrieval optimized, 768 dims
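
Embeddings are plain vectors of `f32`, so the similarity of two embedded texts can be scored with ordinary cosine similarity. A minimal sketch, assuming `embed` returns a `Vec<f32>`; the helper below is not part of the crate:

```rust
/// Cosine similarity between two embedding vectors of equal length.
/// Values closer to 1.0 indicate more similar texts.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```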
### NVIDIA Embed Nemotron 8B

High-quality 4096-dimensional embeddings via GGUF model inference:

```toml
trueno-rag = { version = "0.1.8", features = ["nemotron"] }
```

```rust
use trueno_rag::prelude::*; // illustrative import path

// NemotronConfig and NemotronEmbedder are illustrative type names;
// the arguments elided below are placeholders.
let config = NemotronConfig::new(/* GGUF model path */)
    .with_gpu(/* enable GPU offload */)
    .with_normalize(/* L2-normalize embeddings */);
let embedder = NemotronEmbedder::new(config)?;

// Asymmetric retrieval - different prefixes for queries vs documents
let query_emb = embedder.embed_query(/* query text */)?;
let doc_emb = embedder.embed_document(/* document text */)?;
```
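
Because query and document embeddings live in the same space, and can be unit-length when normalization is enabled, a small corpus can be ranked against a query by dot product. A standalone sketch under those assumptions; none of the helper code below comes from the crate:

```rust
/// Rank normalized document embeddings against a normalized query embedding
/// by dot product (equal to cosine similarity for unit-length vectors) and
/// return the indices of the `top_k` best matches.
fn top_k_by_dot(query: &[f32], docs: &[Vec<f32>], top_k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, query.iter().zip(d).map(|(q, x)| q * x).sum()))
        .collect();
    // Sort by descending score; NaN-free input is assumed.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(top_k).map(|(i, _)| i).collect()
}
```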
### Index Compression

LZ4/ZSTD compressed index persistence:

```toml
trueno-rag = { version = "0.1.8", features = ["compression"] }
```

```rust
use trueno_rag::prelude::*; // illustrative import path

// Serialize the index into a compressed byte buffer.
let bytes = index.to_compressed_bytes()?;
// 4-6x compression ratio
```
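
The compressed buffer is ordinary bytes, so persisting it is as simple as writing it to disk with the standard library. A minimal sketch; the file name is illustrative, and loading the index back uses whatever counterpart the crate provides for `to_compressed_bytes`:

```rust
use std::fs;

// Write the compressed index to disk and read the raw bytes back later.
fs::write("index.bin.zst", &bytes)?;
let restored_bytes = fs::read("index.bin.zst")?;
```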
## Stack Dependencies
trueno-rag is part of the Sovereign AI Stack:
| Crate | Version | Purpose |
|---|---|---|
| trueno | 0.11 | SIMD/GPU compute primitives |
| trueno-db | 0.3.10 | GPU-first analytics database |
| realizar | 0.5.1 | GGUF/APR model inference |
| fastembed | 5.x | ONNX embeddings |
## Development

## Documentation

## License

MIT