lattice-embed
Vector embedding generation with SIMD-accelerated operations for semantic search and similarity matching.
Features
- Local Embeddings: Generate embeddings locally using BGE models via fastembed
- SIMD Acceleration: AVX2/AVX-512/NEON optimized vector operations (roughly 6.5-7x faster than scalar for f32 operations on 384-dim vectors, 12x for int8 dot products)
- LRU Cache: embeddings keyed by Blake3 hashes, so repeated inputs are not recomputed
- Async API: Full async/await support with tokio
Models
| Model | Dimensions | Use Case | HuggingFace ID |
|---|---|---|---|
| BgeSmallEnV15 | 384 | Fast, general purpose (default) | BAAI/bge-small-en-v1.5 |
| BgeBaseEnV15 | 768 | Balanced quality/speed | BAAI/bge-base-en-v1.5 |
| BgeLargeEnV15 | 1024 | Highest quality | BAAI/bge-large-en-v1.5 |
All models have a 512-token input limit.
Services
LocalEmbeddingService
Single-model service with lazy initialization. Inference is serialized (one call at a time).
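use lattice_embed::{EmbeddingService, LocalEmbeddingService}; // import paths assumed
let service = LocalEmbeddingService::new(); // assumed: no-argument constructor using the default model
let embedding = service.embed_one("Hello, world!").await?;
assert_eq!(embedding.len(), 384); // BgeSmallEnV15 (default) produces 384-dim vectors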
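To make "lazy initialization" and "serialized inference" concrete, here is a minimal std-only sketch of that pattern, assuming a `OnceLock` for one-time model loading and a `Mutex` held for the duration of each call. `Model`, `LazyService`, and the zero-vector output are hypothetical placeholders, not the crate's types, and async/tokio is omitted to keep the example self-contained.

```rust
use std::sync::{Mutex, OnceLock};

// Hypothetical stand-in for a loaded embedding model.
struct Model {
    dim: usize,
}

impl Model {
    // Placeholder for the expensive step (loading model weights).
    fn load() -> Self {
        Model { dim: 384 }
    }

    // Placeholder inference: returns a zero vector of the model's dimension.
    fn embed(&mut self, _text: &str) -> Vec<f32> {
        vec![0.0; self.dim]
    }
}

// Hypothetical service: the model is created on first use and guarded by a
// Mutex, so concurrent callers run inference one at a time.
struct LazyService {
    model: OnceLock<Mutex<Model>>,
}

impl LazyService {
    fn new() -> Self {
        LazyService { model: OnceLock::new() }
    }

    fn is_loaded(&self) -> bool {
        self.model.get().is_some()
    }

    fn embed_one(&self, text: &str) -> Vec<f32> {
        // get_or_init runs Model::load at most once, on the first call.
        let model = self.model.get_or_init(|| Mutex::new(Model::load()));
        // The lock is held for the whole call, which serializes inference.
        let mut guard = model.lock().unwrap();
        guard.embed(text)
    }
}

fn main() {
    let service = LazyService::new();
    assert!(!service.is_loaded()); // nothing is loaded at construction
    let v = service.embed_one("hello");
    assert!(service.is_loaded());
    assert_eq!(v.len(), 384);
}
```

The trade-off is simple: one model in memory, at the cost of callers queueing behind each other.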
CachedEmbeddingService
Wraps any EmbeddingService with LRU caching. Identical texts return cached embeddings.
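use lattice_embed::{CachedEmbeddingService, EmbeddingService, LocalEmbeddingService}; // import paths assumed
use std::sync::Arc;
let inner = Arc::new(LocalEmbeddingService::new());
let cached = CachedEmbeddingService::new(inner, 1000); // 1000-entry cache (argument order assumed)
let emb1 = cached.embed_one("same text").await?;
let emb2 = cached.embed_one("same text").await?; // cache hit: identical text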
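The wrapping pattern can be sketched generically: any type implementing a trait gets a caching layer that forwards only misses to the inner service. The `Embed` trait, `CountingModel`, and `Cached` are hypothetical stand-ins for `EmbeddingService` and `CachedEmbeddingService`; calls are synchronous, and the map is unbounded for brevity, unlike the service described above, which evicts with LRU.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

// Hypothetical minimal trait standing in for EmbeddingService.
trait Embed {
    fn embed_one(&self, text: &str) -> Vec<f32>;
}

// Stand-in model that counts how often it actually runs inference.
struct CountingModel {
    calls: Cell<usize>,
}

impl Embed for CountingModel {
    fn embed_one(&self, text: &str) -> Vec<f32> {
        self.calls.set(self.calls.get() + 1);
        vec![text.len() as f32] // placeholder "embedding"
    }
}

// Generic wrapper: works for any E: Embed and forwards only cache misses.
// Unbounded HashMap for brevity (no eviction in this sketch).
struct Cached<E: Embed> {
    inner: E,
    cache: RefCell<HashMap<String, Vec<f32>>>,
}

impl<E: Embed> Cached<E> {
    fn new(inner: E) -> Self {
        Cached { inner, cache: RefCell::new(HashMap::new()) }
    }
}

impl<E: Embed> Embed for Cached<E> {
    fn embed_one(&self, text: &str) -> Vec<f32> {
        if let Some(hit) = self.cache.borrow().get(text) {
            return hit.clone(); // identical text: served from the cache
        }
        let emb = self.inner.embed_one(text);
        self.cache.borrow_mut().insert(text.to_string(), emb.clone());
        emb
    }
}

fn main() {
    let cached = Cached::new(CountingModel { calls: Cell::new(0) });
    let a = cached.embed_one("hello");
    let b = cached.embed_one("hello"); // second call hits the cache
    assert_eq!(a, b);
    assert_eq!(cached.inner.calls.get(), 1); // inner model ran only once
}
```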
PooledEmbeddingService
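Maintains N model instances for parallel inference. Memory scales linearly with N (~100-300 MB per instance).
use lattice_embed::PooledEmbeddingService; // import path assumed
let service = PooledEmbeddingService::new(4); // 4 concurrent inference slots (signature assumed)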
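A pool like this can be approximated with a `Mutex<Vec<M>>` of idle instances plus a `Condvar` for waiting callers: a caller takes an instance, runs inference outside the lock, and hands it back. This is a hypothetical std-threads sketch (`Pool` and `with_instance` are illustrative names); the crate's async implementation may work differently.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical fixed-size pool: N instances, each lent to one caller at a time.
struct Pool<M> {
    free: Mutex<Vec<M>>,
    available: Condvar,
}

impl<M> Pool<M> {
    fn new(instances: Vec<M>) -> Self {
        Pool { free: Mutex::new(instances), available: Condvar::new() }
    }

    // Waits until an instance is free, runs `f` on it, then returns it to the pool.
    fn with_instance<R>(&self, f: impl FnOnce(&mut M) -> R) -> R {
        let mut free = self.free.lock().unwrap();
        while free.is_empty() {
            free = self.available.wait(free).unwrap();
        }
        let mut model = free.pop().unwrap();
        drop(free); // release the lock before running inference
        // Up to N callers can be inside `f` at once, one per instance.
        let result = f(&mut model);
        self.free.lock().unwrap().push(model);
        self.available.notify_one();
        result
    }
}

fn main() {
    // Four placeholder "instances" (ids stand in for loaded models).
    let pool = Arc::new(Pool::new(vec![0usize, 1, 2, 3]));
    let handles: Vec<_> = (0..8)
        .map(|job| {
            let pool = Arc::clone(&pool);
            thread::spawn(move || pool.with_instance(|id| (job, *id)))
        })
        .collect();
    for h in handles {
        let (_job, id) = h.join().unwrap();
        assert!(id < 4);
    }
    // Every instance is back in the pool once all jobs finish.
    assert_eq!(pool.free.lock().unwrap().len(), 4);
}
```

Eight jobs share four instances here, which mirrors why memory grows with N rather than with the number of callers.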
SIMD Vector Operations
All operations use runtime feature detection with automatic scalar fallback.
| Platform | Instruction sets (preferred first) |
|---|---|
| x86_64 | AVX-512 VNNI > AVX2 + FMA > scalar |
| aarch64 | ARM NEON (always available on aarch64) |
| Other | Scalar fallback |
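As an illustration of runtime detection with scalar fallback, the sketch below dispatches a dot product to an AVX2 + FMA kernel when `is_x86_feature_detected!` reports both features, and to plain scalar code otherwise. It covers only the AVX2 + FMA tier on x86_64; the AVX-512 VNNI and NEON paths from the table are not shown, and the function names are hypothetical.

```rust
// Scalar fallback: used on non-x86_64 targets or CPUs without AVX2 + FMA.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    let n = a.len().min(b.len());
    let mut acc = _mm256_setzero_ps();
    let mut i = 0;
    // Process 8 f32 lanes per iteration with fused multiply-add.
    while i + 8 <= n {
        let va = _mm256_loadu_ps(a.as_ptr().add(i));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i));
        acc = _mm256_fmadd_ps(va, vb, acc);
        i += 8;
    }
    // Horizontal sum of the 8 accumulator lanes.
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut sum: f32 = lanes.iter().sum();
    // Scalar tail for lengths not divisible by 8.
    while i < n {
        sum += a[i] * b[i];
        i += 1;
    }
    sum
}

fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // Safety: the required CPU features were verified at runtime.
            return unsafe { dot_avx2(a, b) };
        }
    }
    dot_scalar(a, b)
}

fn main() {
    let a: Vec<f32> = (0..384).map(|i| (i % 7) as f32).collect();
    let b: Vec<f32> = (0..384).map(|i| (i % 5) as f32).collect();
    // Small integer values keep every partial sum exact, so both paths agree exactly.
    assert_eq!(dot_product(&a, &b), dot_scalar(&a, &b));
}
```

Detection runs on every call here for simplicity; caching the chosen kernel (for example in a function pointer) would avoid repeating the check.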
Performance (384-dim vectors)
| Operation | Scalar | SIMD | Speedup |
|---|---|---|---|
| cosine_similarity | ~650ns | ~90ns | 7x |
| dot_product | ~230ns | ~35ns | 6.5x |
| normalize | ~400ns | ~60ns | 6.7x |
| dot_product_i8 | ~300ns | ~25ns | 12x |
Usage
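use lattice_embed::{cosine_similarity, dot_product, euclidean_distance, normalize}; // import paths assumed
let a = vec![1.0_f32, 0.0];
let b = vec![1.0_f32, 0.0];
let sim = cosine_similarity(&a, &b); // 1.0 (identical direction)
let dot = dot_product(&a, &b); // 1.0
let dist = euclidean_distance(&a, &b); // 0.0
let mut v = vec![3.0_f32, 4.0];
normalize(&mut v); // v = [0.6, 0.8], magnitude = 1.0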
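For reference, a scalar fallback for these four operations might look like this; SIMD kernels compute the same quantities using wider registers. Returning 0.0 similarity for a zero-norm input, and leaving a zero vector unchanged in `normalize`, are assumptions of this sketch rather than documented behavior.

```rust
// Scalar reference versions of the four operations.
fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn norm(v: &[f32]) -> f32 {
    dot_product(v, v).sqrt()
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let denom = norm(a) * norm(b);
    // Assumed convention: a zero vector has similarity 0.0 with anything.
    if denom == 0.0 { 0.0 } else { dot_product(a, b) / denom }
}

fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

// Scales v in place to unit length; a zero vector is left unchanged.
fn normalize(v: &mut [f32]) {
    let n = norm(v);
    if n > 0.0 {
        for x in v.iter_mut() {
            *x /= n;
        }
    }
}

fn main() {
    let a = vec![1.0_f32, 0.0];
    let b = vec![1.0_f32, 0.0];
    assert_eq!(cosine_similarity(&a, &b), 1.0);
    assert_eq!(dot_product(&a, &b), 1.0);
    assert_eq!(euclidean_distance(&a, &b), 0.0);
    let mut v = vec![3.0_f32, 4.0];
    normalize(&mut v); // 3/5 and 4/5
    assert!((v[0] - 0.6).abs() < 1e-6 && (v[1] - 0.8).abs() < 1e-6);
}
```

Note that for vectors already normalized to unit length, cosine similarity reduces to the dot product, which is why normalizing once at insertion time is a common optimization.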
Int8 Quantization
Int8 quantization gives a 4x memory reduction (1 byte per dimension instead of 4) with ~99% accuracy:
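use lattice_embed::QuantizedVector; // import path assumed
let v = vec![0.1_f32, 0.2, 0.3];
let w = vec![0.1_f32, 0.25, 0.3];
let q = QuantizedVector::from_f32(&v);
let q2 = QuantizedVector::from_f32(&w);
// Compare quantized vectors directly
let sim = q.cosine_similarity(&q2);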
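One common way to get the 4x reduction is symmetric per-vector quantization: scale each vector so its largest magnitude maps to 127, round to `i8`, and accumulate dot products in `i32`. The sketch assumes that scheme (the crate's exact scaling rule is not stated here); `Quantized`, `quantize`, `dot_i8`, and `cosine_i8` are hypothetical names.

```rust
// Symmetric per-vector int8 quantization (assumed scheme).
struct Quantized {
    values: Vec<i8>,
    scale: f32, // original value ≈ values[i] as f32 * scale
}

fn quantize(v: &[f32]) -> Quantized {
    let max_abs = v.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let values = v
        .iter()
        .map(|x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    Quantized { values, scale }
}

// Integer dot product; i32 accumulation avoids i8 overflow.
fn dot_i8(a: &[i8], b: &[i8]) -> i32 {
    a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum()
}

// Each vector's scale cancels in the cosine ratio, so only integers are needed.
fn cosine_i8(a: &Quantized, b: &Quantized) -> f32 {
    let dot = dot_i8(&a.values, &b.values) as f32;
    let na = (dot_i8(&a.values, &a.values) as f32).sqrt();
    let nb = (dot_i8(&b.values, &b.values) as f32).sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    let a = vec![0.12_f32, -0.50, 0.33, 0.90];
    let b = vec![0.10_f32, -0.45, 0.40, 0.85];
    // Reference cosine similarity computed in f32.
    let dot: f32 = a.iter().zip(&b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    let exact = dot / (na * nb);
    let approx = cosine_i8(&quantize(&a), &quantize(&b));
    assert!((exact - approx).abs() < 0.01);
    println!("scale for a: {}", quantize(&a).scale);
}
```

The 4x comes from storage: a 384-dim vector takes 384 bytes as `i8` instead of 1,536 bytes as `f32`, plus one `f32` scale per vector.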
Cache
Blake3-based hashing with LRU eviction. Default capacity: 4000 entries (~6 MB for 384-dim f32 vectors: 4000 × 384 × 4 bytes ≈ 6.1 MB).
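use lattice_embed::EmbeddingCache; // type name and path assumed
let cache = EmbeddingCache::new(4000); // capacity in entries
let key = cache.compute_key("some text", "BAAI/bge-small-en-v1.5"); // argument list assumed
cache.put(key, embedding);
let stats = cache.stats();
println!("{:?}", stats);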
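The LRU policy and the ~6 MB figure can be illustrated with a small std-only sketch. It substitutes `std`'s `DefaultHasher` for Blake3 to avoid an external dependency, keys on model plus text (an assumption about what the real key covers), and tracks recency in a `VecDeque`, which makes updates O(n); a production cache would use an O(1) structure. `Lru`, `Stats`, `compute_key`, and `approx_bytes` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, VecDeque};
use std::hash::{Hash, Hasher};

// Stand-in for a Blake3 key: std's DefaultHasher over (model, text).
fn compute_key(model: &str, text: &str) -> u64 {
    let mut h = DefaultHasher::new();
    model.hash(&mut h);
    text.hash(&mut h);
    h.finish()
}

// Raw payload size of `entries` cached f32 vectors of length `dim`.
fn approx_bytes(entries: usize, dim: usize) -> usize {
    entries * dim * std::mem::size_of::<f32>()
}

#[derive(Debug, Default, PartialEq)]
struct Stats { hits: u64, misses: u64, evictions: u64 }

// Minimal LRU: HashMap for lookup, VecDeque of keys ordered by recency.
struct Lru {
    capacity: usize,
    map: HashMap<u64, Vec<f32>>,
    order: VecDeque<u64>, // front = least recently used
    stats: Stats,
}

impl Lru {
    fn new(capacity: usize) -> Self {
        Lru { capacity, map: HashMap::new(), order: VecDeque::new(), stats: Stats::default() }
    }

    // Moves `key` to the most-recently-used end.
    fn touch(&mut self, key: u64) {
        if let Some(pos) = self.order.iter().position(|&k| k == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key);
    }

    fn get(&mut self, key: u64) -> Option<&Vec<f32>> {
        if self.map.contains_key(&key) {
            self.stats.hits += 1;
            self.touch(key);
            self.map.get(&key)
        } else {
            self.stats.misses += 1;
            None
        }
    }

    fn put(&mut self, key: u64, value: Vec<f32>) {
        // When full, evict the least recently used entry before inserting a new key.
        if !self.map.contains_key(&key) && self.map.len() == self.capacity {
            if let Some(old) = self.order.pop_front() {
                self.map.remove(&old);
                self.stats.evictions += 1;
            }
        }
        self.map.insert(key, value);
        self.touch(key);
    }
}

fn main() {
    let mut cache = Lru::new(2);
    let k1 = compute_key("bge-small-en-v1.5", "first text");
    let k2 = compute_key("bge-small-en-v1.5", "second text");
    let k3 = compute_key("bge-small-en-v1.5", "third text");
    cache.put(k1, vec![0.1; 384]);
    cache.put(k2, vec![0.2; 384]);
    assert!(cache.get(k1).is_some()); // hit; k1 becomes most recently used
    cache.put(k3, vec![0.3; 384]); // full, so k2 (least recently used) is evicted
    assert!(cache.get(k2).is_none()); // miss
    assert_eq!(cache.stats, Stats { hits: 1, misses: 1, evictions: 1 });
    // Default capacity from the text: 4000 entries x 384 dims x 4 bytes.
    assert_eq!(approx_bytes(4000, 384), 6_144_000);
}
```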
Feature Flags
| Feature | Default | Description |
|---|---|---|
| local | Yes | Enable local embedding via fastembed |
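[dependencies]
# SIMD only (disables the default `local` feature):
lattice-embed = { version = "0.1", default-features = false }
# Full (local + SIMD), use this line instead:
# lattice-embed = { version = "0.1" }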
Batch Processing
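let texts = vec!["first document".to_string(), "second document".to_string()];
let embeddings = service.embed(&texts).await?; // argument type assumed
assert_eq!(embeddings.len(), texts.len()); // one embedding per input text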
Maximum batch size: 1000 texts (to prevent OOM).
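Inputs larger than the limit can be split on the caller's side. This sketch assumes a batch call that returns an error above 1000 texts (whether oversized batches are rejected or truncated is not stated here) and uses `slice::chunks` to stay under it; `embed_batch` and `embed_all` are hypothetical, synchronous stand-ins for `service.embed`.

```rust
// Documented per-call limit.
const MAX_BATCH: usize = 1000;

// Hypothetical batch call: rejects oversized batches (assumed behavior) and
// returns one placeholder 384-dim zero vector per input text.
fn embed_batch(texts: &[String]) -> Result<Vec<Vec<f32>>, String> {
    if texts.len() > MAX_BATCH {
        return Err(format!("batch of {} exceeds limit of {}", texts.len(), MAX_BATCH));
    }
    Ok(texts.iter().map(|_| vec![0.0f32; 384]).collect())
}

// Splits any number of texts into chunks of at most MAX_BATCH, preserving order.
fn embed_all(texts: &[String]) -> Result<Vec<Vec<f32>>, String> {
    let mut out = Vec::with_capacity(texts.len());
    for chunk in texts.chunks(MAX_BATCH) {
        out.extend(embed_batch(chunk)?);
    }
    Ok(out)
}

fn main() {
    let texts: Vec<String> = (0..2500).map(|i| format!("document {i}")).collect();
    assert!(embed_batch(&texts).is_err());
    let embeddings = embed_all(&texts).unwrap(); // 3 calls: 1000 + 1000 + 500
    assert_eq!(embeddings.len(), 2500);
}
```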
API Reference
EmbeddingService Trait
SIMD Config
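use lattice_embed::simd_config; // import path assumed
let config = simd_config();
println!("{:?}", config); // assumes the config type implements Debug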