Crate frame_catalog


§Frame Catalog - Vector Similarity Search and RAG Infrastructure

High-performance vector search, embeddings, and retrieval-augmented generation (RAG) for AI systems.

§🔍 HNSW Vector Search

Fast approximate nearest neighbor search using Hierarchical Navigable Small World graphs:

  • Low-latency queries: ~0.5-2 ms on a 10K-document index
  • 384-dimensional embeddings: MiniLM-L6-v2 compatible
  • In-memory index: Optimized for speed
  • Thread-safe: Concurrent read access with RwLock
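For intuition about what the index computes, here is a minimal exact top-k search over L2-normalized vectors. This is a sketch, not the crate's API (`top_k` and its signature are illustrative): HNSW returns approximately the same ranking while visiting only a small fraction of the stored vectors, which is where the sub-2 ms latency comes from.

```rust
/// Exact top-k retrieval: score every vector by dot product and sort.
/// With L2-normalized vectors the dot product equals cosine similarity.
/// This O(n) linear scan is the baseline that HNSW approximates sublinearly.
fn top_k(query: &[f32], index: &[(String, Vec<f32>)], k: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = index
        .iter()
        .map(|(id, v)| (id.clone(), query.iter().zip(v).map(|(a, b)| a * b).sum()))
        .collect();
    // Sort descending by similarity score (scores are finite, so unwrap is safe).
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}
```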

§🧠 ONNX Embeddings

Text-to-vector conversion using ONNX Runtime:

  • MiniLM-L6-v2 model (87MB, 384-dim vectors)
  • Batch processing: Encode multiple texts efficiently
  • Normalization: L2-normalized embeddings
  • Fallback: Simple hash-based embeddings for testing
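The idea behind a hash-based fallback can be sketched in a few lines: hash each token into one of 384 buckets, then L2-normalize so that dot products behave like cosine similarity. The bucketing scheme below is an assumption for illustration; the crate's `SimpleEmbeddingGenerator` may differ in detail.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Deterministic hash-based embedding: hash each whitespace token into one
/// of `dim` buckets, count hits, then L2-normalize. Captures no semantics;
/// useful only as a cheap stand-in for a real model in tests.
fn hash_embed(text: &str, dim: usize) -> Vec<f32> {
    let mut v = vec![0.0_f32; dim];
    for token in text.split_whitespace() {
        let mut h = DefaultHasher::new();
        token.hash(&mut h);
        v[(h.finish() as usize) % dim] += 1.0;
    }
    // L2 normalization: after this, dot product == cosine similarity.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}
```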

§💾 Persistent Storage

SQLite-backed vector store with optional compression:

  • Document references: Store file paths or spool offsets
  • BytePunch compression: 40-70% space savings
  • DataSpool integration: Bundle multiple documents
  • Lazy loading: Load embeddings on demand

§📚 RAG System

High-level retrieval interface:

  • Automatic chunking: Split documents into overlapping chunks
  • Index + search: One-step document indexing
  • Configurable: Chunk size, overlap, HNSW parameters
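Chunking with overlap can be sketched as a sliding window: each chunk shares its tail with the head of the next, so a sentence cut at a boundary still appears whole in one of the two chunks. The function below is illustrative only; the real chunk size and overlap come from `RetrievalConfig`, and the crate may split on tokens rather than characters.

```rust
/// Split `text` into chunks of up to `size` characters, with `overlap`
/// characters shared between consecutive chunks. Requires overlap < size
/// so the window always advances.
fn chunk_with_overlap(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap; // how far the window advances each iteration
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```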

§🗄️ Event Database

Conversation and event storage:

  • Conversation tracking: Session-based organization
  • Event history: Timestamped event log
  • Metadata storage: JSON metadata per event
  • Search support: Retrieve events by conversation ID
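Retrieval by conversation ID amounts to filtering the event log on a session key and ordering by timestamp. The struct and function below are hypothetical shapes for illustration; the fields of the crate's actual `StoredEvent` and `Conversation` types are not shown in this documentation.

```rust
/// Hypothetical event record: a session key, a timestamp, and JSON metadata.
#[derive(Debug, Clone)]
struct StoredEventSketch {
    conversation_id: String,
    timestamp_ms: u64,
    metadata_json: String,
}

/// Return all events belonging to one conversation, oldest first.
fn events_for(events: &[StoredEventSketch], conversation_id: &str) -> Vec<StoredEventSketch> {
    let mut out: Vec<StoredEventSketch> = events
        .iter()
        .filter(|e| e.conversation_id == conversation_id)
        .cloned()
        .collect();
    out.sort_by_key(|e| e.timestamp_ms);
    out
}
```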

§Usage

use frame_catalog::{VectorStore, VectorStoreConfig};
use frame_catalog::{OnnxEmbeddingGenerator, EmbeddingGenerator};
use frame_catalog::DocumentChunk;

// Create embedding generator
let embedder = OnnxEmbeddingGenerator::new().unwrap();

// Create vector store
let config = VectorStoreConfig::default();
let mut store = VectorStore::new(config).unwrap();

// Index documents
let chunk = DocumentChunk {
    id: "doc1".to_string(),
    content: "Rust is a systems programming language".to_string(),
    source: "rust-docs".to_string(),
    metadata: None,
};

let embedding = embedder.generate(&chunk.content).unwrap();
store.add_chunk(chunk, &embedding).unwrap();

// Search
let query_embedding = embedder.generate("programming languages").unwrap();
let results = store.search(&query_embedding, 5).unwrap();

for result in results {
    println!("{:.3}: {}", result.score, result.chunk.content);
}

Re-exports§

pub use vector_store::VectorStore;
pub use vector_store::VectorStoreConfig;
pub use vector_store::VectorStoreError;
pub use vector_store::DocumentChunk;
pub use vector_store::SearchResult;
pub use vector_store::EMBEDDING_DIM;
pub use embeddings::EmbeddingGenerator;
pub use embeddings::EmbeddingError;
pub use embeddings::SimpleEmbeddingGenerator;
pub use database::Database;
pub use database::DatabaseError;
pub use database::StoredEvent;
pub use database::Conversation;
pub use retrieval::RetrievalSystem;
pub use retrieval::RetrievalConfig;
pub use retrieval::RetrievalError;

Modules§

database
SQLite database for persistent storage
embeddings
Embedding generation for text chunks
retrieval
Document indexing and retrieval system
vector_store
Vector store for knowledge retrieval using HNSW