§oxibonsai-rag
Pure Rust Retrieval-Augmented Generation (RAG) pipeline for OxiBonsai.
This crate provides a self-contained, dependency-light RAG stack:
- vector_store: in-memory flat index with cosine similarity search.
- chunker: split documents into overlapping character windows, sentence groups, or paragraphs.
- embedding: the Embedder trait plus two built-in backends, IdentityEmbedder (deterministic hash, for tests) and TfIdfEmbedder (bag-of-words TF-IDF, no external deps).
- retriever: top-k chunk retrieval given a query string.
- pipeline: composes retrieval + prompt building for inference.
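To make "flat index with cosine similarity search" and "top-k retrieval" concrete, here is a minimal standalone sketch using only the standard library. It is not this crate's implementation: the names `cosine` and `top_k` are hypothetical and chosen for illustration, and the crate's own `cosine_similarity`, `VectorStore`, and `Retriever` may behave differently (for example in how they treat zero vectors or mismatched dimensions).

```rust
/// Cosine similarity between two equal-length vectors.
/// Assumption for this sketch: returns 0.0 if the lengths differ
/// or if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Brute-force ("flat") search: score every stored vector against the
/// query and return the indices and scores of the `k` best matches,
/// highest score first.
fn top_k(store: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = store
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(v, query)))
        .collect();
    // Sort descending by score; `total_cmp` gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A flat index scans every stored vector on each query, so a lookup costs time linear in the number of chunks. That is usually acceptable for small in-memory corpora and avoids any index-building step.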
§Quick Start
use oxibonsai_rag::embedding::IdentityEmbedder;
use oxibonsai_rag::pipeline::{RagConfig, RagPipeline};

// Deterministic hash embedder with dimension 64, intended for tests.
let embedder = IdentityEmbedder::new(64).expect("valid dim");
let mut pipeline = RagPipeline::new(embedder, RagConfig::default());

// Index one document, then build a prompt for a query against it.
pipeline
    .index_document("Rust is a systems programming language.")
    .expect("failed to index document");
let prompt = pipeline.build_prompt("What is Rust?").expect("failed to build prompt");
assert!(prompt.contains("Question: What is Rust?"));
Re-exports§
pub use advanced_chunker::ChunkStrategy;
pub use advanced_chunker::ChunkerRegistry;
pub use advanced_chunker::MarkdownChunker;
pub use advanced_chunker::RecursiveCharSplitter;
pub use advanced_chunker::RichChunk;
pub use advanced_chunker::SentenceChunker;
pub use advanced_chunker::SlidingWindowChunker;
pub use chunker::chunk_by_paragraphs;
pub use chunker::chunk_by_sentences;
pub use chunker::chunk_document;
pub use chunker::Chunk;
pub use chunker::ChunkConfig;
pub use code_chunker::CodeChunker;
pub use code_chunker::Language;
pub use distance::Distance;
pub use embedding::Embedder;
pub use embedding::IdentityEmbedder;
pub use embedding::TfIdfEmbedder;
pub use error::RagError;
pub use metadata_filter::MetadataFilter;
pub use metadata_filter::MetadataValue;
pub use persistence::IndexSnapshot;
pub use persistence::RetrieverSnapshot;
pub use persistence::SCHEMA_VERSION;
pub use pipeline::PipelineStats;
pub use pipeline::RagConfig;
pub use pipeline::RagPipeline;
pub use retriever::Retriever;
pub use retriever::RetrieverConfig;
pub use semantic_chunker::SemanticChunker;
pub use vector_store::cosine_similarity;
pub use vector_store::dot_product;
pub use vector_store::l2_normalize;
pub use vector_store::SearchResult;
pub use vector_store::VectorStore;
Modules§
- advanced_chunker: Advanced document chunking strategies for RAG pipelines.
- chunker: Document chunking strategies for the RAG pipeline.
- code_chunker: Language-aware chunking for source files.
- distance: Distance / similarity metrics for the RAG pipeline.
- embedding: Embedding backends for the RAG pipeline.
- error: Error types for the OxiBonsai RAG pipeline.
- metadata_filter: Metadata filtering for the vector store.
- persistence: JSON persistence for the vector store and retriever.
- pipeline: End-to-end RAG pipeline: index → retrieve → build prompt.
- retriever: Retrieval pipeline: indexes documents and answers top-k queries.
- semantic_chunker: Semantic (embedding-driven) sentence grouping.
- vector_store: In-memory flat vector store with configurable distance metric.
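As an illustration of the "overlapping character windows" strategy that the chunker module describes, the following is a minimal standalone sketch. It does not use this crate's API: `chunk_chars` and its `size` / `overlap` parameters are hypothetical names, and the crate's `chunk_document` and `ChunkConfig` may define windows and edge cases differently.

```rust
/// Split `text` into windows of at most `size` characters, where each
/// window starts `size - overlap` characters after the previous one.
/// Assumption for this sketch: invalid settings (`size == 0` or
/// `overlap >= size`) yield an empty result instead of an error.
fn chunk_chars(text: &str, size: usize, overlap: usize) -> Vec<String> {
    if size == 0 || overlap >= size {
        return Vec::new();
    }
    // Work on `char`s so multi-byte UTF-8 characters are never split.
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        // Stop once a window reaches the end of the text.
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

The overlap repeats the tail of one window at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough to cover it.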