LEANN-rs
Rust implementation of LEANN -- a lightweight vector database and RAG system that achieves 97% storage reduction through graph-based selective embedding recomputation.
Overview
LEANN-rs is a full rewrite of the Python LEANN system in Rust, providing:
- Pure Rust HNSW engine -- build and search without FAISS or any C++ dependencies
- Embedding recomputation -- prune stored embeddings and recompute on-the-fly via ZMQ, reducing index size by ~97%
- Multiple embedding backends -- OpenAI, Ollama (pipelined async), Gemini APIs (ONNX local inference planned)
- RAG pipeline -- search + LLM chat with Ollama, OpenAI, and Anthropic providers
- Python bindings -- PyO3-based native module, drop-in replacement for the Python version
- HTTP server -- Axum-based REST API for search
Crates
| Crate | Description |
|---|---|
| leann-core | Core library: HNSW graph (SIMD-optimized), embeddings, search, builder, passages, BM25, metadata filtering (~9K LOC) |
| leann-cli | CLI binary (leann build, search, ask, react, list, remove, watch, serve) |
| leann-server | Standalone HTTP server with index management and search endpoints |
| leann-python | PyO3 bindings exposing LeannBuilder, LeannSearcher, LeannChat, ReActAgent |
Quick Start
Build
Build an index
# From a directory of text/code files
# With Ollama (local, no API key needed)
# From multiple directories and individual files
# Build with AST-aware code chunking
# Force rebuild with custom file types
# Disable compact storage / recomputation
Search
RAG Q&A
# Single question
# Interactive mode
# With a specific LLM provider
# With thinking budget for reasoning models
ReAct Agent
# Multi-turn retrieval and reasoning
Other commands
Global options
HTTP Server
Start the standalone server:
# Via the server binary
LEANN_INDEX_DIR=./indexes
# Or via the CLI
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /indexes | List all indexes |
| GET | /indexes/{name} | Get index info |
| POST | /indexes/{name}/search | Search an index |
Search request body:
Python Bindings
The leann-python crate provides native Python bindings via PyO3 and ships .pyi type stubs for IDE autocompletion and type checking. Build with maturin:
Run binding tests:
Usage:
# Build an index
# Search
# RAG Q&A
Architecture
HNSW Engine
The core HNSW implementation in leann-core/src/hnsw/ includes:
- simd.rs -- NEON (aarch64) and AVX2 (x86_64) SIMD-optimized L2 and inner product distance with batch-4 processing, plus FlatMinHeap/FlatMaxHeap and VisitedList data structures
- build.rs -- FAISS-style insert algorithm with parallel construction (rayon), monomorphized distance functions, early termination, and deterministic RNG seed support
- search.rs -- Two-phase beam search with SIMD batch-4 distance, flat heaps, generation-counter visited tracking, and support for both stored-vector and recompute modes
- csr.rs -- Compact CSR format conversion for pruned indexes
- io.rs -- Binary serialization for both standard and compact graph formats
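To illustrate the idea behind the compact CSR format, here is a minimal sketch of converting per-node adjacency lists into compressed sparse row form. The `CsrGraph` type and its methods are hypothetical names chosen for this example; the actual layout in csr.rs may differ, and a real HNSW graph would need one such structure per layer.

```rust
/// Graph neighbors stored in compressed sparse row (CSR) form.
/// Hypothetical type for illustration; not the actual csr.rs layout.
#[derive(Debug, PartialEq)]
pub struct CsrGraph {
    /// `offsets[i]..offsets[i + 1]` is the range of `neighbors` owned by node `i`.
    pub offsets: Vec<usize>,
    /// All neighbor IDs, concatenated in node order.
    pub neighbors: Vec<u32>,
}

impl CsrGraph {
    /// Flattens one `Vec` per node into two contiguous arrays.
    pub fn from_adjacency(adjacency: &[Vec<u32>]) -> Self {
        let total: usize = adjacency.iter().map(Vec::len).sum();
        let mut offsets = Vec::with_capacity(adjacency.len() + 1);
        let mut neighbors = Vec::with_capacity(total);
        offsets.push(0);
        for list in adjacency {
            neighbors.extend_from_slice(list);
            // The running length marks where the next node's range begins.
            offsets.push(neighbors.len());
        }
        Self { offsets, neighbors }
    }

    /// Returns the neighbor slice of `node` without any allocation.
    pub fn neighbors_of(&self, node: usize) -> &[u32] {
        &self.neighbors[self.offsets[node]..self.offsets[node + 1]]
    }
}
```

Compared with a `Vec<Vec<u32>>`, this layout avoids one heap allocation per node and keeps neighbor IDs contiguous, which is the usual reason to choose CSR for a read-only graph.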
Embedding Recomputation
Instead of storing all embedding vectors (which dominate index size), LEANN prunes them and recomputes distances on-the-fly during search via a ZMQ REQ/REP protocol:
- The search algorithm encounters a pruned node
- It sends node IDs + query vector to the embedding server via ZMQ
- The server recomputes embeddings and returns distances
- Search continues with fresh distances
This achieves ~97% storage reduction with minimal latency impact.
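The flow above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the `EmbeddingServer` trait stands in for the ZMQ REQ/REP round trip, `ToyServer` fakes recomputation with a lookup table, and `candidate_distances` is a hypothetical helper name. Mixing stored and pruned nodes in one call is also an assumption made to show both paths.

```rust
use std::collections::HashMap;

/// Squared L2 distance between two equal-length vectors.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical stand-in for the embedding server reached over ZMQ.
pub trait EmbeddingServer {
    /// Re-embeds the given nodes and returns their distances to `query`,
    /// in the same order as `node_ids`.
    fn distances(&mut self, node_ids: &[u32], query: &[f32]) -> Vec<f32>;
}

/// Toy server that "recomputes" embeddings by looking them up in a table.
pub struct ToyServer {
    pub embeddings: HashMap<u32, Vec<f32>>,
    /// Number of requests served, to show that pruned nodes are batched.
    pub requests: usize,
}

impl EmbeddingServer for ToyServer {
    fn distances(&mut self, node_ids: &[u32], query: &[f32]) -> Vec<f32> {
        self.requests += 1;
        node_ids
            .iter()
            .map(|id| l2_squared(&self.embeddings[id], query))
            .collect()
    }
}

/// Computes distances for a batch of search candidates. Nodes whose vectors
/// are still stored are handled locally; pruned nodes are sent to the server
/// in a single request.
pub fn candidate_distances(
    stored: &HashMap<u32, Vec<f32>>,
    server: &mut dyn EmbeddingServer,
    candidates: &[u32],
    query: &[f32],
) -> Vec<(u32, f32)> {
    let mut out = Vec::with_capacity(candidates.len());
    let mut pruned = Vec::new();
    for &id in candidates {
        match stored.get(&id) {
            Some(vector) => out.push((id, l2_squared(vector, query))),
            None => pruned.push(id),
        }
    }
    if !pruned.is_empty() {
        let dists = server.distances(&pruned, query);
        out.extend(pruned.into_iter().zip(dists));
    }
    out
}
```

Batching all pruned candidates into one request matters in this sketch because each round trip to the server carries fixed overhead, so fewer, larger requests keep the added latency down.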
Index File Format
A LEANN index consists of:
| File | Contents |
|---|---|
| <name>.meta.json | Index metadata (model, dimensions, backend config) |
| <name>.passages.jsonl | Raw text chunks with metadata |
| <name>.passages.idx | Byte offsets for random passage access (one offset per line) |
| <name>.index | HNSW graph (standard or compact CSR) |
| <name>.ids.txt | Node ID to passage ID mapping |
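To show why a byte-offset file enables random passage access, here is a small sketch that records the starting offset of each JSONL line and then seeks directly to one passage. The function names `build_offsets` and `read_passage` are hypothetical, and the on-disk encoding of the .passages.idx file is not specified here, so the offsets are simply kept in memory as a `Vec<u64>`.

```rust
use std::io::{self, BufRead, Read, Seek, SeekFrom};

/// Scans JSONL data once and records the byte offset at which each line starts.
pub fn build_offsets<R: BufRead>(mut reader: R) -> io::Result<Vec<u64>> {
    let mut offsets = Vec::new();
    let mut pos = 0u64;
    let mut line = Vec::new();
    loop {
        line.clear();
        // read_until returns the number of bytes consumed, newline included.
        let n = reader.read_until(b'\n', &mut line)?;
        if n == 0 {
            break; // end of input
        }
        offsets.push(pos);
        pos += n as u64;
    }
    Ok(offsets)
}

/// Reads passage `index` by seeking straight to its offset instead of
/// scanning the file from the beginning.
pub fn read_passage<R: Read + Seek>(
    mut source: R,
    offsets: &[u64],
    index: usize,
) -> io::Result<String> {
    source.seek(SeekFrom::Start(offsets[index]))?;
    let mut line = String::new();
    io::BufReader::new(source).read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}
```

With the offsets available, fetching the text for a search hit costs one seek and one line read, regardless of how many passages the file holds.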
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | -- | OpenAI API key for embeddings and chat |
| ANTHROPIC_API_KEY | -- | Anthropic API key for chat |
| GOOGLE_API_KEY / GEMINI_API_KEY | -- | Gemini API key for embeddings |
| OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
| OPENAI_BASE_URL | https://api.openai.com/v1 | OpenAI-compatible API base URL |
| PORT | 8080 | HTTP server port |
| LEANN_INDEX_DIR | . | Directory for index storage (server) |
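As an illustration of how these variables and their defaults fit together, the sketch below resolves a few of them with fallbacks. `ServerConfig`, `load_config`, and `load_config_from_env` are hypothetical names, not the project's API, and falling back to the default when PORT fails to parse is an assumption of this example.

```rust
use std::env;

/// Hypothetical settings struct covering the variables that have defaults.
#[derive(Debug, PartialEq)]
pub struct ServerConfig {
    pub port: u16,
    pub index_dir: String,
    pub ollama_host: String,
    pub openai_base_url: String,
}

/// Resolves settings through `lookup`, falling back to the documented defaults.
/// Taking the lookup as a parameter keeps the function testable without
/// mutating the process environment.
pub fn load_config(lookup: impl Fn(&str) -> Option<String>) -> ServerConfig {
    ServerConfig {
        // Assumption: an unparsable PORT falls back to 8080 rather than erroring.
        port: lookup("PORT").and_then(|p| p.parse().ok()).unwrap_or(8080),
        index_dir: lookup("LEANN_INDEX_DIR").unwrap_or_else(|| ".".to_string()),
        ollama_host: lookup("OLLAMA_HOST")
            .unwrap_or_else(|| "http://localhost:11434".to_string()),
        openai_base_url: lookup("OPENAI_BASE_URL")
            .unwrap_or_else(|| "https://api.openai.com/v1".to_string()),
    }
}

/// Convenience wrapper that reads the real process environment.
pub fn load_config_from_env() -> ServerConfig {
    load_config(|key| env::var(key).ok())
}
```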
Benchmarks
The pure-Rust HNSW engine matches or exceeds FAISS C++ performance: 3.3x geometric mean speedup across 25 benchmarks (distance, build, search, recompute, full pipeline, passage lookup, index size). Distance computations are 9-111x faster via SIMD; search is on par; passage lookups are 2.5-2.8x faster; index files are 85% smaller.
See RUST_PERFORMANCE.md for full results, methodology, and reproduction instructions.
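For readers unfamiliar with the summary statistic, a geometric mean of speedups is the exponential of the mean of their logarithms, so a 2x gain and a 0.5x slowdown cancel out exactly. The helper below is a generic sketch with a hypothetical name; it is not the benchmark harness, and any inputs passed to it here are illustrative rather than measured results.

```rust
/// Geometric mean of per-benchmark speedup ratios.
/// Returns `None` for an empty slice or any non-positive ratio.
pub fn geometric_mean(speedups: &[f64]) -> Option<f64> {
    if speedups.is_empty() || speedups.iter().any(|&s| s <= 0.0) {
        return None;
    }
    // Averaging in log space avoids overflow from multiplying many ratios.
    let mean_ln = speedups.iter().map(|s| s.ln()).sum::<f64>() / speedups.len() as f64;
    Some(mean_ln.exp())
}
```

Unlike an arithmetic mean, this statistic is not dominated by a few very large ratios, which is why it is commonly used when individual speedups span orders of magnitude.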
Quick start:
# All-in-one comparison (Rust vs Python/FAISS)
# Criterion benchmarks only (HTML reports in target/criterion/)
Development
# Run tests
# Check all crates
# Build release
# Build Python bindings
# Test Python bindings (maturin develop + pytest)
License
MIT