sevensense-vector
Ultra-fast vector similarity search using HNSW for bioacoustic embeddings.
sevensense-vector implements Hierarchical Navigable Small World (HNSW) graphs for approximate nearest neighbor search. It is roughly 150x faster than brute-force search (56x to 529x in the benchmarks below, growing with index size) while keeping recall@10 at 0.95 or higher, enabling real-time similarity queries over millions of bird call embeddings.
Features
- HNSW Index: Graph-based approximate nearest neighbor (ANN) search, roughly 150x faster than brute force
- Hyperbolic Geometry: Poincaré ball model for hierarchical data
- Multiple Distance Metrics: Cosine, Euclidean, Angular, Hyperbolic
- Dynamic Updates: Insert and delete without full rebuild
- Persistence: Save/load indices to disk
- Filtered Search: Query with metadata constraints
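For readers unfamiliar with the metrics listed above, the following is a minimal standalone sketch of the Cosine, Euclidean, and Angular distances using only the standard library. The function names are illustrative and are not taken from the crate; the crate's own definitions (for example, whether "Angular" is normalized to [0, 1]) may differ.

```rust
/// Dot product of two equal-length slices.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Cosine similarity in [-1, 1]; defined as 0.0 when either vector is all zeros.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let denom = dot(a, a).sqrt() * dot(b, b).sqrt();
    if denom == 0.0 {
        0.0
    } else {
        (dot(a, b) / denom).clamp(-1.0, 1.0)
    }
}

/// Cosine distance: 1 - cosine similarity, in [0, 2].
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    1.0 - cosine_similarity(a, b)
}

/// Angular distance: the angle between the vectors, scaled to [0, 1].
/// (Assumed normalization; other libraries report the raw angle.)
fn angular_distance(a: &[f32], b: &[f32]) -> f32 {
    cosine_similarity(a, b).acos() / std::f32::consts::PI
}
```

Unlike cosine distance, the angular form satisfies the triangle inequality, which is one reason a library might offer both.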
Use Cases
| Use Case | Description | Key Functions |
|---|---|---|
| Similarity Search | Find similar bird calls | search(), search_with_filter() |
| Index Building | Build searchable index | build(), add() |
| Dynamic Updates | Add/remove vectors | insert(), delete() |
| Persistence | Save/load index | save(), load() |
| Hyperbolic Search | Hierarchical similarity | HyperbolicIndex::search() |
Installation
Add to your Cargo.toml:
[dependencies]
sevensense-vector = "0.1"
Quick Start
use ;
Basic Index Construction
use ;
// Configure the index
let config = HnswConfig ;
let mut index = new;
// Add vectors one by one
for in vectors.iter.enumerate
Batch Construction
use HnswIndex;
// Build from a batch of vectors (more efficient)
let index = build?;
println!;
Progress Monitoring
let index = build_with_progress?;
Basic Search
use HnswIndex;
let results = index.search?;
for result in &results
Search with EF Parameter
The ef parameter controls the accuracy/speed tradeoff at query time:
use SearchParams;
// Higher ef = more accurate but slower
let params = SearchParams ;
let results = index.search_with_params?;
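To make the role of ef concrete, here is a minimal sketch of the best-first search that HNSW runs within a single graph layer, written against plain adjacency lists. It is a textbook-style illustration under stated assumptions, not the crate's implementation: `Candidate`, `sq_dist`, and `search_layer` are hypothetical names, and distances are squared Euclidean.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::{BinaryHeap, HashSet};

/// A (distance, node id) pair with a total order so it can live in a heap.
struct Candidate {
    dist: f32,
    id: usize,
}

impl Ord for Candidate {
    fn cmp(&self, other: &Self) -> Ordering {
        self.dist.total_cmp(&other.dist).then(self.id.cmp(&other.id))
    }
}
impl PartialOrd for Candidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Candidate {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Candidate {}

fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Best-first search over one layer, keeping at most `ef` results while exploring.
fn search_layer(
    vectors: &[Vec<f32>],
    neighbors: &[Vec<usize>],
    query: &[f32],
    entry: usize,
    ef: usize,
    k: usize,
) -> Vec<(usize, f32)> {
    let ef = ef.max(1);
    let d = |id: usize| sq_dist(&vectors[id], query);
    let mut visited = HashSet::from([entry]);
    // Min-heap of nodes still to expand; max-heap of the best `ef` nodes seen so far.
    let mut frontier = BinaryHeap::from([Reverse(Candidate { dist: d(entry), id: entry })]);
    let mut best = BinaryHeap::from([Candidate { dist: d(entry), id: entry }]);

    while let Some(Reverse(current)) = frontier.pop() {
        let worst = best.peek().map_or(f32::INFINITY, |c| c.dist);
        if current.dist > worst {
            break; // nothing left to expand can improve the result set
        }
        for &n in &neighbors[current.id] {
            if !visited.insert(n) {
                continue; // already seen
            }
            let dist = d(n);
            let worst = best.peek().map_or(f32::INFINITY, |c| c.dist);
            if best.len() < ef || dist < worst {
                frontier.push(Reverse(Candidate { dist, id: n }));
                best.push(Candidate { dist, id: n });
                if best.len() > ef {
                    best.pop(); // evict the current worst
                }
            }
        }
    }

    let mut out: Vec<(usize, f32)> = best.into_iter().map(|c| (c.id, c.dist)).collect();
    out.sort_by(|a, b| a.1.total_cmp(&b.1));
    out.truncate(k);
    out
}
```

With ef = 1 this degenerates to greedy descent and can stop in a local minimum; a larger ef keeps worse candidates around long enough to route past it, at the cost of more distance computations.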
Filtered Search
use ;
// Search with metadata filter
let filter = new
.species_in
.confidence_gte;
let results = index.search_with_filter?;
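The semantics of a filtered query can be sketched without the index at all: keep only the vectors whose metadata passes a predicate, then rank the survivors by distance. The sketch below is a brute-force illustration with hypothetical types (`CallMeta`, `filtered_search`); how the crate actually combines filtering with graph traversal is not stated in this document.

```rust
/// Hypothetical per-vector metadata for a detected bird call.
struct CallMeta {
    species: String,
    confidence: f32,
}

fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Return the `k` nearest vectors (index, squared distance) among those
/// whose metadata satisfies `keep`.
fn filtered_search<F>(
    vectors: &[Vec<f32>],
    meta: &[CallMeta],
    query: &[f32],
    k: usize,
    keep: F,
) -> Vec<(usize, f32)>
where
    F: Fn(&CallMeta) -> bool,
{
    let mut hits = Vec::new();
    for (i, (v, m)) in vectors.iter().zip(meta).enumerate() {
        if keep(m) {
            hits.push((i, sq_dist(v, query)));
        }
    }
    hits.sort_by(|a, b| a.1.total_cmp(&b.1));
    hits.truncate(k);
    hits
}
```

A predicate equivalent to a species list plus a confidence threshold would be `|m| ["robin", "thrush"].contains(&m.species.as_str()) && m.confidence >= 0.8`.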
Batch Search
let queries = vec!;
// Search all queries in parallel
let all_results = index.search_batch?;
for in all_results.iter.enumerate
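Because queries are independent and the index is read-only during search, batch search parallelizes naturally. The following sketch shows one way to do it with scoped threads from the standard library, using brute-force search as a stand-in for the index. `nearest` and `search_batch_sketch` are hypothetical names, and the crate may use a different threading strategy.

```rust
use std::thread;

fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Brute-force k-nearest neighbours; stands in for an index lookup.
fn nearest(vectors: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, sq_dist(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

/// Split the queries into contiguous chunks and search each chunk on its own thread.
/// Results come back in the same order as `queries`.
fn search_batch_sketch(
    vectors: &[Vec<f32>],
    queries: &[Vec<f32>],
    k: usize,
    threads: usize,
) -> Vec<Vec<usize>> {
    let threads = threads.max(1);
    let chunk = ((queries.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        // Spawn every worker first, then join them in order.
        let handles: Vec<_> = queries
            .chunks(chunk)
            .map(|qs| {
                s.spawn(move || {
                    qs.iter()
                        .map(|q| nearest(vectors, q, k))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("search worker panicked"))
            .collect()
    })
}
```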
Saving an Index
use HnswIndex;
// Build and save
let index = build?;
index.save?;
println!;
Loading an Index
let index = load?;
println!;
// Ready to search
let results = index.search?;
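As a rough illustration of what saving and loading involves, the sketch below round-trips a set of vectors through a simple little-endian binary layout (count, dimension, then raw f32 values). The layout and the names `save_vectors` and `load_vectors` are assumptions made for this example; the crate's on-disk format is not described here, and a real index file would also need to store the graph links.

```rust
use std::io::{self, Read, Write};

/// Write `vectors` as: u64 count, u64 dimension, then count * dimension f32 values.
fn save_vectors<W: Write>(mut out: W, vectors: &[Vec<f32>]) -> io::Result<()> {
    let dim = vectors.first().map_or(0, Vec::len);
    out.write_all(&(vectors.len() as u64).to_le_bytes())?;
    out.write_all(&(dim as u64).to_le_bytes())?;
    for v in vectors {
        if v.len() != dim {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "all vectors must have the same dimension",
            ));
        }
        for x in v {
            out.write_all(&x.to_le_bytes())?;
        }
    }
    Ok(())
}

/// Read vectors back from the layout produced by `save_vectors`.
fn load_vectors<R: Read>(mut input: R) -> io::Result<Vec<Vec<f32>>> {
    let mut header = [0u8; 8];
    input.read_exact(&mut header)?;
    let count = u64::from_le_bytes(header);
    input.read_exact(&mut header)?;
    let dim = u64::from_le_bytes(header);

    let mut vectors = Vec::new();
    let mut buf = [0u8; 4];
    for _ in 0..count {
        let mut v = Vec::new();
        for _ in 0..dim {
            input.read_exact(&mut buf)?;
            v.push(f32::from_le_bytes(buf));
        }
        vectors.push(v);
    }
    Ok(vectors)
}
```

Any `Write` or `Read` works here, so the same functions can target a `std::fs::File` wrapped in a `BufWriter` or `BufReader`, or an in-memory `Vec<u8>` for testing.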
Memory-Mapped Loading
For large indices that don't fit in RAM:
use MmapIndex;
// Memory-map the index (lazy loading)
let index = open?;
// Search works the same way
let results = index.search?;
Poincaré Ball Model
Hyperbolic space is ideal for hierarchical data like taxonomies:
use ;
let config = PoincareConfig ;
let mut index = new;
// Project Euclidean embeddings to Poincaré ball
for in embeddings.iter.enumerate
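One common way to project a Euclidean embedding into the Poincaré ball is the exponential map at the origin, which for curvature -1 scales a vector v to tanh(‖v‖) · v / ‖v‖. The sketch below assumes that map and unit curvature; the document does not say which projection the crate uses, and `project_to_ball` is a hypothetical name.

```rust
/// Map a Euclidean vector into the open unit ball via the exponential map
/// at the origin (curvature -1 assumed). The resulting norm is capped at
/// `1 - eps` so that later distance computations stay finite.
fn project_to_ball(v: &[f32], eps: f32) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec(); // the origin maps to itself
    }
    let target_norm = norm.tanh().min(1.0 - eps);
    let scale = target_norm / norm;
    v.iter().map(|x| x * scale).collect()
}
```

The direction of the vector is preserved; only its length is squashed into [0, 1). Large-norm inputs all land near the boundary, which is why some pipelines rescale embeddings before projecting.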
Hyperbolic Distance
use ;
// Distance in the Poincaré ball
let dist = poincare_distance;
// Möbius addition (hyperbolic translation)
let translated = mobius_add;
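For reference, the two operations named above have closed forms in the unit Poincaré ball (curvature -1): the distance is arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))), and Möbius addition is x ⊕ y = ((1 + 2⟨x,y⟩ + ‖y‖²)x + (1 − ‖x‖²)y) / (1 + 2⟨x,y⟩ + ‖x‖²‖y‖²). The standalone functions below borrow the names used in this section but are only a sketch of those formulas; the crate's actual signatures and curvature handling may differ.

```rust
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Geodesic distance between two points strictly inside the unit ball.
fn poincare_distance(u: &[f64], v: &[f64]) -> f64 {
    let diff_sq: f64 = u.iter().zip(v).map(|(a, b)| (a - b) * (a - b)).sum();
    let denom = (1.0 - dot(u, u)) * (1.0 - dot(v, v));
    (1.0 + 2.0 * diff_sq / denom).acosh()
}

/// Möbius addition x ⊕ y, the hyperbolic analogue of translating y by x.
fn mobius_add(x: &[f64], y: &[f64]) -> Vec<f64> {
    let (xy, xx, yy) = (dot(x, y), dot(x, x), dot(y, y));
    let denom = 1.0 + 2.0 * xy + xx * yy;
    let coeff_x = (1.0 + 2.0 * xy + yy) / denom;
    let coeff_y = (1.0 - xx) / denom;
    x.iter()
        .zip(y)
        .map(|(xi, yi)| coeff_x * xi + coeff_y * yi)
        .collect()
}
```

Two quick sanity checks follow from the formulas: the distance from the origin to a point of norm r is 2·artanh(r), and x ⊕ (−x) is the origin.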
Hierarchical Similarity
// Hyperbolic distance captures hierarchical relationships
// Closer to origin = more general, farther = more specific
let genus_embedding = index.get?;
let species_embedding = index.get?;
// Species is "below" genus in the hierarchy
let genus_norm = l2_norm;
let species_norm = l2_norm;
assert!; // Further from origin
Parameter Selection
use HnswConfig;
// High accuracy configuration
let accurate_config = HnswConfig ;
// Fast configuration
let fast_config = HnswConfig ;
// Balanced (default)
let balanced_config = HnswConfig::default();
Benchmarking Recall
use ;
// Build index
let index = build?;
// Benchmark against brute force
let recall = benchmark_recall?;
println!; // Should be >0.95
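Recall@k is the fraction of the true k nearest neighbours (from brute force) that the approximate search also returns, averaged over all queries. A minimal sketch of that computation, assuming each result list holds unique vector ids, is shown below; `recall_at_k` is a hypothetical helper and not the crate's benchmark function.

```rust
use std::collections::HashSet;

/// Fraction of ground-truth neighbours recovered by the approximate results.
/// `approx[i]` and `exact[i]` are the id lists for query `i`, best first.
fn recall_at_k(approx: &[Vec<usize>], exact: &[Vec<usize>], k: usize) -> f64 {
    let mut hits = 0usize;
    let mut total = 0usize;
    for (a, e) in approx.iter().zip(exact) {
        let truth: HashSet<usize> = e.iter().take(k).copied().collect();
        hits += a.iter().take(k).filter(|id| truth.contains(*id)).count();
        total += truth.len();
    }
    if total == 0 {
        0.0
    } else {
        hits as f64 / total as f64
    }
}
```

For example, if one query recovers 2 of its 3 true neighbours and another recovers all 3, recall@3 is 5/6.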
Memory Estimation
use estimate_memory;
let num_vectors = 1_000_000;
let dimensions = 1536;
let m = 16;
let estimated_bytes = estimate_memory;
println!;
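A back-of-the-envelope estimate can be derived from the parameters alone. The sketch below assumes f32 vector components, 4-byte neighbour ids, m0 = 2·m links at layer 0, and the default level multiplier ml = 1/ln(m), under which a node appears in an expected 1/(m − 1) upper layers. It ignores allocator and metadata overhead, and `estimate_memory_bytes` is a hypothetical stand-in rather than the crate's estimate_memory.

```rust
/// Rough lower bound on index size in bytes under the stated assumptions.
fn estimate_memory_bytes(num_vectors: u64, dimensions: u64, m: u64) -> u64 {
    const F32_BYTES: u64 = 4;
    const ID_BYTES: u64 = 4;

    // Raw vector storage dominates for high-dimensional embeddings.
    let vector_bytes = num_vectors * dimensions * F32_BYTES;
    // Every node has up to m0 = 2 * m links at layer 0.
    let layer0_bytes = num_vectors * 2 * m * ID_BYTES;
    // Each node sits in about 1 / (m - 1) upper layers, with up to m links in each.
    let upper_bytes = num_vectors * m * ID_BYTES / m.saturating_sub(1).max(1);

    vector_bytes + layer0_bytes + upper_bytes
}
```

For 1,000,000 vectors of 1536 dimensions with m = 16 this gives 6,276,266,666 bytes, of which about 98% is the vectors themselves, so the graph parameters have little effect on memory at this dimensionality.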
Configuration
HnswConfig Parameters
| Parameter | Default | Description | Impact |
|---|---|---|---|
| m | 16 | Connections per node | Higher = better recall, more memory |
| m0 | 32 | Layer 0 connections | Usually 2×m |
| ef_construction | 200 | Build-time search width | Higher = better quality, slower build |
| ml | 1/ln(m) | Level multiplier | Controls layer distribution |
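The effect of ml on the layer distribution can be illustrated with the level-assignment rule from the original HNSW formulation: a new node's top layer is floor(−ln(u) · ml) for u drawn uniformly from (0, 1]. The sketch below takes u as an argument so it stays deterministic; `level_for` is a hypothetical name, and whether the crate samples levels exactly this way is an assumption.

```rust
/// Top layer for a node, given a uniform sample `u` in (0, 1] and the
/// connectivity parameter `m`. Uses the default multiplier ml = 1 / ln(m).
fn level_for(u: f64, m: usize) -> usize {
    let ml = 1.0 / (m as f64).ln();
    (-u.ln() * ml).floor() as usize
}
```

With ml = 1/ln(m), the probability of reaching layer l or higher is m^(−l), so for m = 16 about 1 node in 16 appears above layer 0 and about 1 in 256 above layer 1. A larger ml makes the hierarchy taller; a smaller one flattens it.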
Search Parameters
| Parameter | Default | Description |
|---|---|---|
| ef | 50 | Search-time width |
| k | 10 | Number of results |
Performance Benchmarks
| Index Size | Build Time | Search (p99) | Recall@10 | Memory |
|---|---|---|---|---|
| 100K | 5s | 0.8ms | 0.97 | 620 MB |
| 1M | 55s | 2.1ms | 0.96 | 6.0 GB |
| 10M | 12min | 8.5ms | 0.95 | 58 GB |
Speedup vs Brute Force
| Index Size | HNSW (ms) | Brute Force (ms) | Speedup |
|---|---|---|---|
| 100K | 0.8 | 45 | 56x |
| 1M | 2.1 | 450 | 214x |
| 10M | 8.5 | 4500 | 529x |
Links
- Homepage: ruv.io
- Repository: github.com/ruvnet/ruvector
- Crates.io: crates.io/crates/sevensense-vector
- Documentation: docs.rs/sevensense-vector
License
MIT License - see LICENSE for details.
Part of the 7sense Bioacoustic Intelligence Platform by rUv