Capsa - A compact, lightweight library for embedding-based document storage and retrieval.
This library provides the core functionality for implementing RAG (Retrieval-Augmented Generation) systems. It handles document chunking, embedding generation, vector storage, and semantic search.
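Document chunking splits a text into pieces small enough to embed one at a time. The Quick Start's search results report `start`/`end` byte offsets, which suggests chunks are tracked as byte ranges into the original text. The std-only sketch below shows one common way to produce such ranges: fixed-size windows with overlap, snapped to UTF-8 character boundaries. The `Chunk` type, the `chunk_text` function and its `max_bytes`/`overlap` parameters are hypothetical. Capsa's actual chunking strategy is not documented here and may differ (for example, by splitting on sentences or tokens).

```rust
/// A chunk of a document, identified by its byte range in the source text.
#[derive(Debug, PartialEq)]
struct Chunk {
    start: usize,
    end: usize,
}

/// Hypothetical chunker: split `text` into chunks of at most `max_bytes` bytes,
/// with consecutive chunks overlapping by up to `overlap` bytes. Boundaries are
/// moved back to UTF-8 character boundaries so `&text[start..end]` is always valid.
fn chunk_text(text: &str, max_bytes: usize, overlap: usize) -> Vec<Chunk> {
    // 4 bytes is the longest UTF-8 encoding, so every chunk holds at least one char.
    assert!(max_bytes >= 4, "max_bytes must be at least 4");
    assert!(overlap < max_bytes, "overlap must be smaller than max_bytes");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        // Tentative end, moved back to a character boundary if needed.
        let mut end = (start + max_bytes).min(text.len());
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        chunks.push(Chunk { start, end });
        if end == text.len() {
            break;
        }
        // The next chunk starts `overlap` bytes before this one ended (snapped
        // back to a boundary); fall back to `end` if that would not advance.
        let mut next = end.saturating_sub(overlap);
        while !text.is_char_boundary(next) {
            next -= 1;
        }
        start = if next > start { next } else { end };
    }
    chunks
}

fn main() {
    let text = "Capsa splits documents into overlapping chunks.";
    for c in chunk_text(text, 16, 4) {
        println!("bytes {}-{}: {:?}", c.start, c.end, &text[c.start..c.end]);
    }
}
```

Overlap keeps some shared context between neighbouring chunks, so a passage that straddles a boundary can still match a query as a whole.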
§Quick Start
```rust
use capsa::{config::Config, documentdb::DocumentDatabase};
use secrecy::SecretString;
use serde_json::json;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Configure the embedding service and database
    let api_key = std::env::var("CAPSA_API_KEY").ok().map(SecretString::from);
    let config = Config::new(
        "http://localhost:9000/v1".to_string(),
        "nomic-ai/nomic-embed-text-v1.5".to_string(),
        "./documents.db".to_string(),
        api_key,
    );

    // Connect to the database
    let db = DocumentDatabase::new(&config).await?;
    let conn = db.connect().await?;

    // Index a document (embeddings are generated automatically)
    let _doc_id = conn.insert(
        json!({"title": "Example Document"}),
        "Your document text here",
    ).await?;

    // Search for similar content (top 5 matches)
    let results = conn.search_topk("your search query", 5).await?;
    for (doc_id, _metadata, start, end) in results {
        println!("Found match in document {}: bytes {}-{}", doc_id, start, end);
    }

    Ok(())
}
```
§Architecture
The library is organized into several modules:
- config: Configuration types for embedding services and databases
- documentdb: High-level document storage and retrieval API
- embedder: Text embedding generation and chunking
- vectordb: Low-level vector database operations
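Semantic search in an architecture like this usually compares a query embedding against the stored chunk embeddings and keeps the best k. The std-only sketch below ranks in-memory vectors by cosine similarity. The names `cosine_similarity` and `top_k` are hypothetical and not part of `vectordb`, and cosine similarity is an assumed metric: Capsa's libSQL-backed storage may rank results differently. Assuming the `5` passed to `search_topk` in the Quick Start is a result count, it plays the role of `k` here.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Hypothetical top-k search: return (index, score) for the `k` stored
/// vectors most similar to `query`, best match first.
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Sort by descending score; total_cmp gives a total order even with NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    let stored = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.7, 0.7, 0.0],
        vec![0.0, 0.0, 1.0],
    ];
    let query = [1.0, 0.1, 0.0];
    for (idx, score) in top_k(&query, &stored, 2) {
        println!("vector {idx}: similarity {score:.3}");
    }
}
```

A brute-force scan like this is O(n) per query; a real vector store may use an index to avoid comparing against every stored vector.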
Most applications should use documentdb, which provides automatic embedding
generation. Use vectordb directly only if you need fine-grained control
over vector storage.
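The split between documentdb and vectordb is a layering choice: the low-level layer stores whatever vectors the caller provides, while the high-level layer accepts raw text and calls the embedder itself. The std-only sketch below illustrates that layering with in-memory types. `Embed`, `LetterCounts`, `VectorStore`, and `DocumentStore` are hypothetical names, not Capsa types, and the letter-count "embedding" is a toy stand-in for a real embedding model.

```rust
use std::collections::HashMap;

/// Hypothetical embedding interface; a real implementation would call an
/// embedding service instead of computing vectors locally.
trait Embed {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Toy embedder that counts ASCII letters a-z (illustration only).
struct LetterCounts;

impl Embed for LetterCounts {
    fn embed(&self, text: &str) -> Vec<f32> {
        let mut v = vec![0.0; 26];
        for b in text.bytes().filter(u8::is_ascii_alphabetic) {
            v[(b.to_ascii_lowercase() - b'a') as usize] += 1.0;
        }
        v
    }
}

/// Low-level layer: the caller supplies every vector explicitly.
#[derive(Default)]
struct VectorStore {
    vectors: HashMap<u64, Vec<f32>>,
}

impl VectorStore {
    fn put(&mut self, id: u64, vector: Vec<f32>) {
        self.vectors.insert(id, vector);
    }

    fn get(&self, id: u64) -> Option<&Vec<f32>> {
        self.vectors.get(&id)
    }
}

/// High-level layer: accepts text and generates embeddings automatically.
struct DocumentStore<E: Embed> {
    embedder: E,
    vectors: VectorStore,
    texts: HashMap<u64, String>,
    next_id: u64,
}

impl<E: Embed> DocumentStore<E> {
    fn new(embedder: E) -> Self {
        Self { embedder, vectors: VectorStore::default(), texts: HashMap::new(), next_id: 0 }
    }

    /// Store `text` together with its embedding and return the new document id.
    fn insert(&mut self, text: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.vectors.put(id, self.embedder.embed(text));
        self.texts.insert(id, text.to_string());
        id
    }
}

fn main() {
    let mut docs = DocumentStore::new(LetterCounts);
    let id = docs.insert("Hello");
    let v = docs.vectors.get(id).expect("vector stored on insert");
    println!("doc {id}: {:?} -> {} dims", docs.texts[&id], v.len());
}
```

Keeping the embedder behind a trait lets the high-level store swap embedding backends without touching the low-level storage code.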
§Modules
- config: Configuration constants for the embedding system.
- documentdb: High-level document storage and retrieval with automatic embedding generation.
- embedder: A module for generating text embeddings using OpenAI-compatible APIs.
- error: Error types for the Capsa library.
- vectordb: Low-level vector database operations using libSQL.
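The embedder module talks to OpenAI-compatible APIs. In the OpenAI embeddings API, a client POSTs a JSON body with `model` and `input` fields to `{base_url}/embeddings` and, when a key is configured, sends it in an `Authorization: Bearer` header. The std-only sketch below builds that URL and body from the Quick Start's example values. The helper names are hypothetical, whether Capsa batches inputs this way is an assumption, and a real client would use an HTTP library and a JSON serializer such as serde_json rather than hand-rolled escaping.

```rust
/// Join an OpenAI-compatible base URL (e.g. ".../v1") with the embeddings path.
fn embeddings_url(base_url: &str) -> String {
    format!("{}/embeddings", base_url.trim_end_matches('/'))
}

/// Minimal JSON string encoding: quotes the string and escapes the
/// characters JSON requires (quote, backslash, control characters).
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical helper: JSON body for a batch embeddings request.
fn embeddings_body(model: &str, inputs: &[&str]) -> String {
    let items: Vec<String> = inputs.iter().map(|s| json_string(s)).collect();
    format!("{{\"model\":{},\"input\":[{}]}}", json_string(model), items.join(","))
}

fn main() {
    // Values taken from the Quick Start example.
    let url = embeddings_url("http://localhost:9000/v1");
    let body = embeddings_body("nomic-ai/nomic-embed-text-v1.5", &["Your document text here"]);
    println!("POST {url}");
    println!("{body}");
}
```

Sending several chunks in one `input` array reduces round trips; in the OpenAI API the response returns one vector per input under `data[i].embedding`.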