Crate oxibonsai_rag

§oxibonsai-rag

Pure Rust Retrieval-Augmented Generation (RAG) pipeline for OxiBonsai.

This crate provides a self-contained, dependency-light RAG stack:

  • vector_store — in-memory flat index with cosine similarity search.
  • chunker — split documents into overlapping character windows, sentence groups, or paragraphs.
  • embedding — Embedder trait plus two built-in backends: IdentityEmbedder (deterministic hash, for tests) and TfIdfEmbedder (bag-of-words TF-IDF, no external deps).
  • retriever — top-k chunk retrieval given a query string.
  • pipeline — composes retrieval + prompt building for inference.
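To make the flow through these components concrete, here is a minimal standalone sketch of the same ideas: overlapping character windows, a deterministic hash embedding, and a flat cosine-similarity top-k scan. It does not use or reproduce this crate's code. Every name in it (`chunk_chars`, `toy_embed`, `cosine`, `top_k`) is hypothetical, and the hashing scheme is an assumption chosen only for illustration.

```rust
// Hypothetical sketch: none of these names come from oxibonsai_rag.

/// Split `text` into windows of `size` characters; consecutive windows share `overlap` characters.
fn chunk_chars(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "overlap must be smaller than size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}

/// Toy deterministic embedding: hash each lowercase word into one of `dim` buckets and count.
fn toy_embed(text: &str, dim: usize) -> Vec<f32> {
    let mut v = vec![0.0f32; dim];
    for word in text.split(|c: char| !c.is_alphanumeric()).filter(|w| !w.is_empty()) {
        let h = word
            .to_lowercase()
            .bytes()
            .fold(0usize, |acc, b| acc.wrapping_mul(31).wrapping_add(b as usize));
        v[h % dim] += 1.0;
    }
    v
}

/// Cosine similarity; defined as 0.0 when either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Flat search: score every stored vector against the query, sort descending, keep `k`.
fn top_k(index: &[(String, Vec<f32>)], query: &[f32], k: usize) -> Vec<(f32, String)> {
    let mut scored: Vec<(f32, String)> = index
        .iter()
        .map(|(text, vector)| (cosine(vector, query), text.clone()))
        .collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.truncate(k);
    scored
}

fn main() {
    let docs = [
        "Rust is a systems programming language.",
        "Bananas are rich in potassium.",
    ];
    let dim = 64;
    let mut index = Vec::new();
    for doc in docs {
        for chunk in chunk_chars(doc, 24, 8) {
            let vector = toy_embed(&chunk, dim);
            index.push((chunk, vector));
        }
    }
    let query = toy_embed("What is Rust?", dim);
    for (score, text) in top_k(&index, &query, 2) {
        println!("{score:.3}  {text}");
    }
}
```

A flat scan costs time linear in the number of stored vectors per query, which is usually acceptable for small in-memory corpora and keeps the code free of external dependencies.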

§Quick Start

use oxibonsai_rag::embedding::IdentityEmbedder;
use oxibonsai_rag::pipeline::{RagConfig, RagPipeline};

let embedder = IdentityEmbedder::new(64).expect("valid dim");
let mut pipeline = RagPipeline::new(embedder, RagConfig::default());

pipeline
    .index_document("Rust is a systems programming language.")
    .expect("failed to index document");
let prompt = pipeline.build_prompt("What is Rust?").expect("failed to build prompt");
assert!(prompt.contains("Question: What is Rust?"));
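The assertion above only tells us that the built prompt contains a `Question: ...` line; the actual template is not shown on this page. As a rough illustration of what prompt assembly from retrieved chunks might look like, here is a standalone sketch. The function below is hypothetical (it only shares a name with the pipeline method), and the `Context:` / `Answer:` layout is an assumption, not the crate's format.

```rust
/// Hypothetical prompt template: numbered context chunks followed by the question.
/// The layout is assumed for illustration and is not taken from oxibonsai_rag.
fn build_prompt(chunks: &[&str], question: &str) -> String {
    let mut prompt = String::from("Context:\n");
    for (i, chunk) in chunks.iter().enumerate() {
        prompt.push_str(&format!("[{}] {}\n", i + 1, chunk));
    }
    prompt.push_str(&format!("\nQuestion: {}\nAnswer:", question));
    prompt
}

fn main() {
    let chunks = ["Rust is a systems programming language."];
    let prompt = build_prompt(&chunks, "What is Rust?");
    assert!(prompt.contains("Question: What is Rust?"));
    println!("{prompt}");
}
```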

Re-exports§

pub use advanced_chunker::ChunkStrategy;
pub use advanced_chunker::ChunkerRegistry;
pub use advanced_chunker::MarkdownChunker;
pub use advanced_chunker::RecursiveCharSplitter;
pub use advanced_chunker::RichChunk;
pub use advanced_chunker::SentenceChunker;
pub use advanced_chunker::SlidingWindowChunker;
pub use chunker::chunk_by_paragraphs;
pub use chunker::chunk_by_sentences;
pub use chunker::chunk_document;
pub use chunker::Chunk;
pub use chunker::ChunkConfig;
pub use code_chunker::CodeChunker;
pub use code_chunker::Language;
pub use distance::Distance;
pub use embedding::Embedder;
pub use embedding::IdentityEmbedder;
pub use embedding::TfIdfEmbedder;
pub use error::RagError;
pub use metadata_filter::MetadataFilter;
pub use metadata_filter::MetadataValue;
pub use persistence::IndexSnapshot;
pub use persistence::RetrieverSnapshot;
pub use persistence::SCHEMA_VERSION;
pub use pipeline::PipelineStats;
pub use pipeline::RagConfig;
pub use pipeline::RagPipeline;
pub use retriever::Retriever;
pub use retriever::RetrieverConfig;
pub use semantic_chunker::SemanticChunker;
pub use vector_store::cosine_similarity;
pub use vector_store::dot_product;
pub use vector_store::l2_normalize;
pub use vector_store::SearchResult;
pub use vector_store::VectorStore;
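
The three numeric helpers re-exported from vector_store are closely related: cosine similarity equals the dot product of the two vectors after each is scaled to unit L2 length. Their signatures are not shown on this page, so the sketch below uses standalone stand-ins (`dot`, `normalize`, `cosine`) with assumed slice-based signatures to demonstrate the identity.

```rust
// Standalone stand-ins with assumed signatures; not the crate's functions.

/// Sum of element-wise products.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Scale `v` to unit L2 length; a zero vector is returned unchanged.
fn normalize(v: &[f32]) -> Vec<f32> {
    let norm = dot(v, v).sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

/// Cosine similarity computed directly from the definition.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let denom = dot(a, a).sqrt() * dot(b, b).sqrt();
    if denom == 0.0 {
        0.0
    } else {
        dot(a, b) / denom
    }
}

fn main() {
    let a = [3.0, 4.0];
    let b = [4.0, 3.0];
    let direct = cosine(&a, &b);
    let via_unit_vectors = dot(&normalize(&a), &normalize(&b));
    // Both routes agree up to floating-point rounding.
    assert!((direct - via_unit_vectors).abs() < 1e-6);
    println!("cosine = {direct:.4}");
}
```

One practical consequence: if vectors are normalized once at insertion time, each query comparison reduces to a single dot product.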

Modules§

advanced_chunker
Advanced document chunking strategies for RAG pipelines.
chunker
Document chunking strategies for the RAG pipeline.
code_chunker
Language-aware chunking for source files.
distance
Distance / similarity metrics for the RAG pipeline.
embedding
Embedding backends for the RAG pipeline.
error
Error types for the OxiBonsai RAG pipeline.
metadata_filter
Metadata filtering for the vector store.
persistence
JSON persistence for the vector store and retriever.
pipeline
End-to-end RAG pipeline: index → retrieve → build prompt.
retriever
Retrieval pipeline: indexes documents and answers top-k queries.
semantic_chunker
Semantic (embedding-driven) sentence grouping.
vector_store
In-memory flat vector store with configurable distance metric.