Foxstash
High-performance local RAG library for Rust
Foxstash is a local-first Retrieval-Augmented Generation (RAG) library featuring SIMD-accelerated vector operations, HNSW indexing, vector quantization, ONNX embeddings, hybrid search (BM25 + vector), and WebAssembly support.
Features
- SIMD-Accelerated - AVX2/SSE/NEON vector operations with 3-4x speedup
- HNSW Indexing - Hierarchical Navigable Small World graphs for fast similarity search
- Vector Quantization - Int8 (4x), Binary (32x), and Product Quantization (192x)
- Hybrid Search - Combine BM25 keyword search with vector similarity for best-of-both recall
- ONNX Embeddings - Generate embeddings locally with MiniLM-L6-v2 or any ONNX model
- WASM Support - Run in the browser with IndexedDB persistence
- Compression - Gzip, LZ4, and Zstd support for efficient storage
- Incremental Persistence - Write-ahead log for fast updates without full rewrites
- Local-First - Your data never leaves your machine
Quick Start
Add to your Cargo.toml:
```toml
[dependencies]
foxstash-core = "0.3"
```
Basic Usage
```rust
// NB: import paths and field names are indicative; see the crate docs.
use foxstash_core::Document;
use foxstash_core::index::HNSWIndex;

// Create an HNSW index (384 dimensions, matching MiniLM-L6-v2)
let mut index = HNSWIndex::with_defaults(384);

// Add documents with embeddings
let doc = Document {
    id: "doc1".into(),
    content: "Foxes are small omnivorous mammals.".into(),
    embedding: vec![0.0_f32; 384], // your embedding here
};
index.add(doc)?;

// Search for the 10 nearest documents
let query = vec![0.0_f32; 384]; // your query embedding
let results = index.search(&query, 10)?;
for result in results {
    println!("{}: {:.3}", result.id, result.score);
}
```
Memory-Efficient Indexing with Quantization
For large datasets, use quantized indexes to reduce memory by 4-192x:
```rust
// NB: type names and argument shapes are indicative.
use foxstash_core::index::{BinaryHNSWIndex, SQ8HNSWIndex};
use foxstash_core::Document;

// Scalar quantization (4x compression, ~95% recall)
let mut sq8_index = SQ8HNSWIndex::for_normalized(384);

// Binary quantization (32x compression, use with reranking)
let mut binary_index = BinaryHNSWIndex::with_full_precision(384);

// Add documents
let doc = Document { /* id, content, embedding */ };
sq8_index.add(doc.clone())?;
binary_index.add_with_full_precision(doc)?;

// Search with SQ8 (high quality, 4x memory savings)
let results = sq8_index.search(&query, 10)?;

// Two-phase search with binary: fast filter, then precise rerank
// (k = 10, reranking the top 100 binary candidates)
let results = binary_index.search_and_rerank(&query, 10, 100)?;
```
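Conceptually, binary quantization keeps one sign bit per dimension, so a 384-dim f32 vector (1,536 bytes) shrinks to 48 bytes, and the first search phase compares candidates by Hamming distance. A standalone sketch of that idea (illustrative only, not Foxstash's internal code):

```rust
/// Quantize an f32 vector to one sign bit per dimension, packed into u64 words.
fn binarize(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance between two binary codes: popcount of the XOR.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

fn main() {
    // These two vectors differ in sign only at dimension 1.
    let a = binarize(&[0.5, -0.2, 0.1, -0.9]);
    let b = binarize(&[0.4, 0.3, 0.2, -0.8]);
    println!("hamming = {}", hamming(&a, &b)); // prints: hamming = 1

    // 384 dims -> 6 u64 words -> 48 bytes, i.e. 32x smaller than f32.
    assert_eq!(binarize(&vec![1.0_f32; 384]).len() * 8, 48);
}
```

The second phase then reranks the surviving candidates with the stored full-precision vectors, which is why `add_with_full_precision` keeps both representations.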
Product Quantization (Extreme Compression)
For massive datasets, use Product Quantization for up to 192x compression:
```rust
// NB: type names and argument values are indicative.
use foxstash_core::index::PQHNSWIndex;
use foxstash_core::vector::PQConfig;

// Configure PQ: 8 subvectors, 256 centroids each
let pq_config = PQConfig::new(8, 256)
    .with_kmeans_iterations(20);

// Train on sample vectors
let training_data = load_sample_vectors();
let mut index = PQHNSWIndex::train(&training_data, pq_config)?;

// Add documents (automatically compressed)
for doc in documents {
    index.add(doc)?;
}

// Search using Asymmetric Distance Computation (ADC)
let results = index.search(&query, 10)?;
```
Memory Comparison (1M vectors, 384 dimensions)
| Index Type | Memory | Compression | Recall |
|---|---|---|---|
| HNSW (f32) | 1.5 GB | 1x | ~98% |
| SQ8 HNSW | 384 MB | 4x | ~95% |
| Binary HNSW | 48 MB | 32x | ~90%* |
| PQ HNSW (M=8) | 8 MB | 192x | ~80%** |
*With two-phase reranking. **Using ADC search.
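The per-vector numbers in the table are simple arithmetic over the encodings (graph overhead excluded). A quick sanity check:

```rust
/// Bytes needed to store one 384-dim vector under each scheme.
fn bytes_per_vector(dims: usize, scheme: &str) -> usize {
    match scheme {
        "f32" => dims * 4,    // 4 bytes per dimension
        "sq8" => dims,        // 1 byte per dimension
        "binary" => dims / 8, // 1 bit per dimension
        "pq8" => 8,           // 8 subvector codes, 1 byte each
        _ => unreachable!("unknown scheme"),
    }
}

fn main() {
    for s in ["f32", "sq8", "binary", "pq8"] {
        let per_vec = bytes_per_vector(384, s);
        let total_mb = per_vec * 1_000_000 / 1_000_000; // MB for 1M vectors
        println!("{s}: {per_vec} B/vector, ~{total_mb} MB for 1M vectors");
    }
    // f32: 1536 MB (~1.5 GB); sq8: 384 MB; binary: 48 MB; pq8: 8 MB
    assert_eq!(bytes_per_vector(384, "f32") / bytes_per_vector(384, "pq8"), 192);
}
```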
Streaming Batch Ingestion
For large datasets, use streaming batch ingestion with progress tracking:
```rust
// NB: builder and config names are indicative.
use foxstash_core::index::{BatchConfig, BatchIndexBuilder, HNSWIndex};

let mut index = HNSWIndex::with_defaults(384);

let config = BatchConfig::default()
    .with_batch_size(1_000)
    .with_total(num_docs)
    .with_progress(|done, total| eprintln!("indexed {done}/{total}"));

let mut builder = BatchIndexBuilder::new(&mut index, config);
for doc in document_iterator {
    builder.add(doc)?;
}
let result = builder.finish()?;
println!("indexed {} documents", result.count);
```
Incremental Persistence (WAL)
Avoid rewriting the entire index on every update:
```rust
// NB: type names are indicative.
use foxstash_core::storage::{IncrementalStorage, WalConfig};

let config = WalConfig::default()
    .with_checkpoint_threshold(10_000) // full snapshot every 10K ops
    .with_wal_sync_interval(100);      // sync to disk every 100 ops

let mut storage = IncrementalStorage::new("./index", config)?;

// Fast append-only writes to the WAL
for doc in new_documents {
    storage.add(doc)?;
}

// Periodic checkpoint
if storage.needs_checkpoint() {
    storage.checkpoint()?;
}
```
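The checkpoint-threshold behavior boils down to counting operations since the last snapshot. A toy, in-memory illustration of the idea (Foxstash's real storage persists the log and snapshots to disk):

```rust
// Minimal WAL sketch: append is O(1); a checkpoint snapshots state and
// truncates the log so recovery never replays more than `threshold` ops.
struct Wal {
    ops_since_checkpoint: usize,
    checkpoint_threshold: usize,
    log: Vec<String>,
}

impl Wal {
    fn new(checkpoint_threshold: usize) -> Self {
        Self { ops_since_checkpoint: 0, checkpoint_threshold, log: Vec::new() }
    }

    /// Append-only write: no index rewrite.
    fn append(&mut self, op: &str) {
        self.log.push(op.to_string());
        self.ops_since_checkpoint += 1;
    }

    fn needs_checkpoint(&self) -> bool {
        self.ops_since_checkpoint >= self.checkpoint_threshold
    }

    /// Full snapshot: persist current state, then truncate the log.
    fn checkpoint(&mut self) {
        self.log.clear();
        self.ops_since_checkpoint = 0;
    }
}

fn main() {
    let mut wal = Wal::new(3);
    wal.append("add doc1");
    wal.append("add doc2");
    assert!(!wal.needs_checkpoint());
    wal.append("delete doc1");
    assert!(wal.needs_checkpoint());
    wal.checkpoint();
    assert_eq!(wal.log.len(), 0);
}
```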
With ONNX Embeddings
Enable the onnx feature:
```toml
[dependencies]
foxstash-core = { version = "0.3", features = ["onnx"] }
```

```rust
use foxstash_core::embedding::OnnxEmbedder; // path indicative

let mut embedder = OnnxEmbedder::new("models/minilm-l6-v2.onnx")?; // your model path
let embedding = embedder.embed("What do foxes eat?")?;
assert_eq!(embedding.len(), 384);
```
Database Layer (foxstash-db)
For production use, foxstash-db provides a high-level document store with named collections, metadata filtering, BM25 full-text search, and hybrid search built on top of foxstash-core.
```toml
[dependencies]
foxstash-db = "0.3"
```
VectorStore and Collections
```rust
// NB: config names and argument shapes are indicative; method signatures
// follow the API tables below.
use foxstash_db::{Filter, HybridConfig, StoreConfig, VectorStore};
use serde_json::json;

// Open a persistent store (recovers existing collections from disk)
let config = StoreConfig::default().with_embedding_dim(384);
let store = VectorStore::open("./data", config)?;

// Get or create a collection
let col = store.get_or_create_collection("articles")?;

// Insert documents with optional metadata
col.insert("doc1", "Foxes are omnivores.", embedding1, Some(json!({ "topic": "animals" })))?;
col.insert("doc2", "Rust has no garbage collector.", embedding2, None)?;

// Upsert (insert or replace) a document
col.upsert("doc1", "Foxes are omnivorous mammals.", embedding1, None)?;

// Vector similarity search
let query_embedding = vec![0.0_f32; 384];
let results = col.search(&query_embedding, 10, None)?;

// Vector search with metadata filter
let filter = Filter::eq("topic", json!("animals"));
let filtered = col.search(&query_embedding, 10, Some(&filter))?;

// BM25 full-text search
let text_results = col.search_text("garbage collector", 10, None)?;

// Hybrid search: combines vector + BM25 with Reciprocal Rank Fusion
let hybrid_results =
    col.search_hybrid(&query_embedding, "garbage collector", 10, None, HybridConfig::default())?;

// Look up a document by ID
if let Some(doc) = col.get("doc1")? {
    println!("{}", doc.content);
}

// Delete a document
col.delete("doc2")?;

// Compact tombstoned entries
col.compact()?;

// Flush WAL to disk
col.flush()?;

// Flush all collections at once
store.flush_all()?;
```
VectorStore API
| Method | Description |
|---|---|
| `VectorStore::open(path, config)` | Open a store, recovering existing collections from disk |
| `get_or_create_collection(name)` | Return existing collection or create a new one |
| `create_collection(name)` | Create a new collection; error if it already exists |
| `get_collection(name)` | Get an existing collection; error if not found |
| `collections()` | List all collection names |
| `unload_collection(name)` | Remove from memory (files remain; can be re-opened) |
| `delete_collection(name)` | Permanently delete from memory and disk |
| `flush_all()` | Flush all collections to disk |
Collection API
| Method | Description |
|---|---|
| `insert(id, content, embedding, metadata)` | Insert a document; error on duplicate ID |
| `upsert(id, content, embedding, metadata)` | Insert or replace a document |
| `delete(id)` | Tombstone a document by ID |
| `get(id)` | Retrieve a document by ID |
| `search(query, k, filter)` | Vector similarity search with optional metadata filter |
| `search_text(query, k, filter)` | BM25 keyword search with optional metadata filter |
| `search_hybrid(query, text, k, filter, config)` | Hybrid vector + BM25 search |
| `flush()` | Flush WAL to disk |
| `compact()` | Remove tombstoned entries and rebuild index |
Metadata Filtering
Filter supports dot-notation field access into JSON metadata:
```rust
// NB: argument shapes are indicative.
use foxstash_db::Filter;
use serde_json::json;

// Equality
let f = Filter::eq("topic", json!("animals"));

// Inequality
let f = Filter::ne("status", json!("draft"));

// Range comparisons
let f = Filter::gt("stats.views", json!(100));
let f = Filter::lte("price", json!(9.99));

// Set membership
let f = Filter::is_in("lang", json!(["en", "de"]));

// Field existence
let f = Filter::exists("author");

// Logical composition
let f = Filter::and(vec![
    Filter::eq("topic", json!("animals")),
    Filter::gt("stats.views", json!(100)),
]);
let f = Filter::or(vec![
    Filter::eq("lang", json!("en")),
    Filter::eq("lang", json!("de")),
]);
let f = Filter::not(Filter::eq("status", json!("draft")));
```
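Dot-notation lookup itself is just a walk through nested objects, returning nothing if any hop is missing. A self-contained sketch using a toy JSON value type (Foxstash's `Filter` operates on the real JSON metadata):

```rust
use std::collections::HashMap;

// Toy stand-in for a JSON value, for illustration only.
#[derive(Debug, PartialEq)]
enum Value {
    Num(f64),
    Str(String),
    Obj(HashMap<String, Value>),
}

/// Resolve a dot-notation path like "stats.views" against nested objects.
fn resolve<'a>(root: &'a Value, path: &str) -> Option<&'a Value> {
    let mut cur = root;
    for key in path.split('.') {
        match cur {
            Value::Obj(map) => cur = map.get(key)?,
            _ => return None, // tried to descend into a non-object
        }
    }
    Some(cur)
}

fn main() {
    let mut stats = HashMap::new();
    stats.insert("views".to_string(), Value::Num(250.0));
    let mut meta = HashMap::new();
    meta.insert("stats".to_string(), Value::Obj(stats));
    let doc = Value::Obj(meta);

    assert_eq!(resolve(&doc, "stats.views"), Some(&Value::Num(250.0)));
    assert_eq!(resolve(&doc, "stats.likes"), None);
}
```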
Hybrid Search Configuration
```rust
// NB: names are indicative.
use foxstash_db::{HybridConfig, MergeStrategy};

let config = HybridConfig::default()
    .with_weights(0.7, 0.3)            // vector_weight = 0.7, keyword_weight = 0.3
    .with_strategy(MergeStrategy::Rrf) // Reciprocal Rank Fusion (default)
    .with_rrf_k(60.0);                 // RRF smoothing constant

// Alternatively, use WeightedSum with min-max normalized scores
let config = HybridConfig::default()
    .with_weights(0.5, 0.5)
    .with_strategy(MergeStrategy::WeightedSum);
```
| Field | Default | Description |
|---|---|---|
| `vector_weight` | 0.7 | Weight for vector similarity scores |
| `keyword_weight` | 0.3 | Weight for BM25 keyword scores |
| `merge_strategy` | `Rrf` | `Rrf` (rank-based) or `WeightedSum` (score-based) |
| `rrf_k` | 60.0 | RRF smoothing constant (only used with `Rrf`) |
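Reciprocal Rank Fusion scores each document by summing `1 / (k + rank)` over every result list it appears in, so items that rank well in both the vector and BM25 lists rise to the top. A minimal, unweighted sketch of the rank-based idea:

```rust
use std::collections::HashMap;

/// Fuse ranked result lists with RRF: score(d) = sum over lists of 1 / (k + rank),
/// with ranks starting at 1.
fn rrf(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + i as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let vector_hits = vec!["a", "b", "c"]; // best-first
    let keyword_hits = vec!["b", "c", "d"];
    let fused = rrf(&[vector_hits, keyword_hits], 60.0);
    // "b" wins: 1/62 + 1/61 beats "a"'s single 1/61.
    assert_eq!(fused[0].0, "b");
    assert_eq!(fused[1].0, "c");
}
```

Because RRF uses ranks rather than raw scores, the vector and BM25 lists never need to be on the same scale, which is why it is the default strategy; `WeightedSum` instead min-max normalizes the scores before combining them.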
Index and Text Index Trait Abstractions
foxstash-core exposes VectorIndex and VectorIndexSnapshot traits that abstract over
concrete index types (HNSW, Flat, SQ8, Binary, PQ). The foxstash-db crate additionally
exports a TextIndex trait for BM25-backed keyword indexes. These traits make it straightforward
to swap implementations or build generic search pipelines without coupling to a specific type.
```rust
use foxstash_core::index::{VectorIndex, VectorIndexSnapshot}; // paths indicative
use foxstash_db::TextIndex;
```
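Programming against such traits looks roughly like the following sketch, which uses a hypothetical stand-in trait rather than Foxstash's exact signatures: callers depend only on the trait, so a flat index can be swapped for an HNSW (or quantized) index without touching the pipeline code.

```rust
// Hypothetical trait for illustration; not Foxstash's actual VectorIndex.
trait VectorIndexLike {
    /// Return up to k (doc_id, score) pairs, best-first.
    fn search(&self, query: &[f32], k: usize) -> Vec<(usize, f32)>;
}

/// Brute-force implementation: scores every vector by dot product.
struct FlatIndex {
    vectors: Vec<Vec<f32>>,
}

impl VectorIndexLike for FlatIndex {
    fn search(&self, query: &[f32], k: usize) -> Vec<(usize, f32)> {
        let mut scored: Vec<(usize, f32)> = self
            .vectors
            .iter()
            .enumerate()
            .map(|(i, v)| (i, v.iter().zip(query).map(|(a, b)| a * b).sum()))
            .collect();
        scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        scored.truncate(k);
        scored
    }
}

/// Generic pipeline stage: any index implementing the trait works here.
fn top_hit(index: &impl VectorIndexLike, query: &[f32]) -> Option<usize> {
    index.search(query, 1).first().map(|(id, _)| *id)
}

fn main() {
    let index = FlatIndex { vectors: vec![vec![1.0, 0.0], vec![0.0, 1.0]] };
    assert_eq!(top_hit(&index, &[0.1, 0.9]), Some(1));
}
```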
Crates
| Crate | Description |
|---|---|
| `foxstash-core` | Core library with indexes, embeddings, and storage |
| `foxstash-db` | Document storage, collections, hybrid search, BM25 |
| `foxstash-wasm` | WebAssembly bindings with IndexedDB persistence |
| `foxstash-native` | Native bindings with full ONNX support |
Architecture
```
foxstash/
├── crates/
│   ├── core/            # Main library
│   │   ├── embedding/   # ONNX Runtime + caching
│   │   ├── index/       # HNSW, Flat, SQ8, Binary, PQ indexes
│   │   ├── storage/     # File persistence, compression, WAL
│   │   └── vector/      # SIMD ops, quantization
│   ├── db/              # Database layer
│   │   ├── collection/  # Named collections with WAL
│   │   ├── filter/      # Metadata filtering
│   │   ├── hybrid/      # BM25 + vector hybrid search
│   │   └── store/       # VectorStore (multi-collection manager)
│   ├── wasm/            # Browser target
│   ├── native/          # Desktop/server target
│   └── benches/         # Comprehensive benchmarks
```
Benchmarks
HNSW Performance @ 100,000 Vectors
128 dimensions, 10,000 queries, Recall@10
| Library | Build Time | Search QPS | Recall |
|---|---|---|---|
| Foxstash (batch) | 7.6s | 13,366 | 61.0% |
| Foxstash (single-threaded) | 7.6s | 1,322 | 61.0% |
| hnswlib (C++, ef=64) | 5.7s | 4,004 | 39.5% |
| faiss-hnsw (C++, ef=64) | 8.6s | 3,139 | 44.9% |
| instant-distance (Rust) | 73.9s | 575 | 60.2% |
Key takeaways:
- 2.3x faster single-threaded search than instant-distance with equivalent recall
- 23x faster batch search than instant-distance via rayon
- 9.7x faster build than instant-distance
- hnswlib/faiss use a lower `ef_search` (64 vs 100), which inflates their QPS relative to Foxstash
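The multipliers in the takeaways follow directly from the table above:

```rust
fn main() {
    let foxstash_single_qps = 1_322.0_f64;
    let foxstash_batch_qps = 13_366.0_f64;
    let instant_distance_qps = 575.0_f64;

    // Search speedups vs instant-distance
    println!("single-threaded: {:.1}x", foxstash_single_qps / instant_distance_qps); // ~2.3x
    println!("batch:           {:.1}x", foxstash_batch_qps / instant_distance_qps);  // ~23.2x

    // Build speedup: 73.9s vs 7.6s
    println!("build:           {:.1}x", 73.9 / 7.6); // ~9.7x
}
```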
Build Strategies @ 100,000 Vectors
| Strategy | Build Time | Search QPS | Recall | Use Case |
|---|---|---|---|---|
| Sequential | 541s | 1,274 | 58.8% | Maximum quality |
| Parallel | 7.6s | 1,322 | 61.0% | Production (71x faster) |
Running Benchmarks
The full benchmark suite sets up a Python venv automatically; benchmarks can also be run individually. See crates/benches/ for the benchmark implementations.
Roadmap
- Int8/Binary quantization (4-32x memory reduction)
- Streaming add/search for large datasets
- Incremental persistence (WAL + checkpointing)
- Product quantization (PQ) - up to 192x compression
- Diversity-aware neighbor selection (Algorithm 4)
- Hybrid search (BM25 + vector, RRF and WeightedSum)
- VectorIndex / TextIndex trait abstractions
- Constrained graph traversal for efficient pre-filtering
- Cache-locality optimizations for quantized indices (flattened L0 cache)
- High-concurrency scaling (sharded-lock or lock-free index updates)
- GPU acceleration (optional)
- Multi-vector support (late interaction)
License
MIT License - see LICENSE for details.
Credits
Built by Narcoleptic Fox