sevensense-embedding
Neural embedding generation using Perch 2.0 for bioacoustic analysis.
sevensense-embedding transforms audio segments into rich 1536-dimensional embedding vectors using Google's Perch 2.0 model via ONNX Runtime. These embeddings capture the acoustic essence of bird vocalizations, enabling similarity search, clustering, and species identification.
Features
- Perch 2.0 Integration: State-of-the-art bird audio embeddings
- ONNX Runtime: Cross-platform GPU/CPU inference
- 1536-Dimensional Vectors: Rich semantic representation
- Batch Processing: Efficient multi-segment inference
- Product Quantization (PQ): Up to 64x memory reduction for storage
- L2 Normalization: Optimized for cosine similarity search (see the sketch below)
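
Because embeddings come out L2-normalized, cosine similarity between two segments reduces to a plain dot product. A minimal, crate-agnostic sketch:

```rust
/// Cosine similarity of two L2-normalized embeddings is just their dot
/// product; values near 1.0 indicate acoustically similar segments.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```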
Use Cases
| Use Case | Description | Key Functions |
|---|---|---|
| Single Inference | Embed one audio segment | embed() |
| Batch Processing | Embed multiple segments efficiently | embed_batch() |
| Streaming | Real-time embedding generation (see the sketch below) | EmbeddingStream::new() |
| Quantization | Compress embeddings for storage | quantize_pq() |
| Validation | Verify embedding quality | validate() |
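
The streaming use case is not demonstrated elsewhere in this README; the following is a rough sketch that assumes `EmbeddingStream` implements `futures::Stream` and yields one embedding result per incoming segment (everything beyond the `EmbeddingStream::new()` name is an assumption):

```rust
use futures::StreamExt;
use sevensense_embedding::{EmbeddingConfig, EmbeddingPipeline, EmbeddingStream};

// Hypothetical streaming loop: consume embeddings as segments arrive.
async fn stream_embeddings() -> anyhow::Result<()> {
    let pipeline = EmbeddingPipeline::new(EmbeddingConfig::default()).await?;
    let mut stream = EmbeddingStream::new(pipeline);
    while let Some(embedding) = stream.next().await {
        let embedding = embedding?;
        println!("received a {}-dimensional embedding", embedding.len());
    }
    Ok(())
}
```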
Installation
Add to your Cargo.toml:
```toml
[dependencies]
sevensense-embedding = "0.1"
```
ONNX Model Setup
The Perch 2.0 ONNX model is automatically downloaded on first use. For manual setup:
```bash
# Download model manually
```
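
If you already have the model file locally, you can point the pipeline at it instead of relying on auto-download. A sketch assuming `model_path` takes an optional filesystem path (the field type and the example path are assumptions; see the configuration table below):

```rust
use sevensense_embedding::{EmbeddingConfig, EmbeddingPipeline};

// Use a local copy of the Perch 2.0 ONNX model instead of auto-downloading.
let config = EmbeddingConfig {
    model_path: Some("models/perch_v2.onnx".into()),
    ..Default::default()
};
let pipeline = EmbeddingPipeline::new(config).await?;
```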
Quick Start
```rust
// Minimal end-to-end sketch; module paths and signatures are inferred from
// the type names in this README and may differ from the published API.
use sevensense_embedding::{AudioLoader, EmbeddingConfig, EmbeddingPipeline};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // The Perch 2.0 ONNX model is downloaded automatically on first use.
    let pipeline = EmbeddingPipeline::new(EmbeddingConfig::default()).await?;

    // Load an audio segment and embed it into a 1536-dimensional vector.
    let segment = AudioLoader::load("recording.wav").await?;
    let embedding = pipeline.embed(&segment).await?;
    assert_eq!(embedding.len(), 1536);
    Ok(())
}
```
Single Audio Embedding
```rust
// Sketch: embed one pre-loaded segment. The `AudioSegment` type name and the
// return type are assumptions.
use sevensense_embedding::{AudioSegment, EmbeddingConfig, EmbeddingPipeline};

async fn embed_one(segment: &AudioSegment) -> anyhow::Result<Vec<f32>> {
    let pipeline = EmbeddingPipeline::new(EmbeddingConfig::default()).await?;
    // Returns a 1536-dimensional, L2-normalized vector by default.
    Ok(pipeline.embed(segment).await?)
}
```
From Raw Audio
```rust
use sevensense_embedding::{AudioLoader, EmbeddingConfig, EmbeddingPipeline};

// Signatures are assumptions; the file path is only an example.
let audio = AudioLoader::load("recording.wav").await?;
let pipeline = EmbeddingPipeline::new(EmbeddingConfig::default()).await?;

// The pipeline handles mel spectrogram computation internally.
let embedding = pipeline.embed_audio(&audio).await?;
```
Efficient Batch Embedding
```rust
use sevensense_embedding::{BatchConfig, EmbeddingConfig, EmbeddingPipeline};

let pipeline = EmbeddingPipeline::new(EmbeddingConfig::default()).await?;

// Configure batching (the `batch_size` field name is an assumption).
let batch_config = BatchConfig {
    batch_size: 32,
    ..Default::default()
};

// Embed multiple segments with batched inference (`load_segments` is a
// placeholder for your own segment-loading code).
let segments = load_segments()?;
let embeddings = pipeline.embed_batch(&segments, &batch_config).await?;
println!("Generated {} embeddings", embeddings.len());
```
Progress Tracking
```rust
use sevensense_embedding::{EmbeddingConfig, EmbeddingPipeline};

let pipeline = EmbeddingPipeline::new(EmbeddingConfig::default()).await?;

// Report progress via a callback (the callback signature is an assumption).
let embeddings = pipeline
    .embed_batch_with_progress(&segments, |done, total| println!("Embedded {done}/{total} segments"))
    .await?;
```
Parallel Processing
```rust
use std::sync::Arc;
use futures::stream::{self, StreamExt};
use sevensense_embedding::{EmbeddingConfig, EmbeddingPipeline};

// Share one pipeline across tasks (the Arc wrapping is an assumption).
let pipeline = Arc::new(EmbeddingPipeline::new(EmbeddingConfig::default()).await?);

let embeddings: Vec<_> = stream::iter(segments)
    .map(|segment| {
        let pipeline = Arc::clone(&pipeline);
        async move { pipeline.embed(&segment).await }
    })
    .buffer_unordered(8) // 8 concurrent embeddings
    .collect()
    .await;
```
Product Quantization (PQ)
Product Quantization compresses each 1536-dimensional f32 embedding (6,144 bytes) into 96 one-byte codes, a 64x reduction, with only a small loss in search quality.
```rust
use sevensense_embedding::{EmbeddingConfig, EmbeddingPipeline, ProductQuantizer};

let pipeline = EmbeddingPipeline::new(EmbeddingConfig::default()).await?;

// Generate full-precision embeddings (`generate_embeddings` is a placeholder
// for your own batch-embedding code).
let embeddings: Vec<Vec<f32>> = generate_embeddings(&pipeline).await?;

// Train PQ codebook on embeddings: 96 subvectors, 256 centroids each
// (the constructor name and argument order are assumptions).
let pq = ProductQuantizer::train(&embeddings, 96, 256)?;

// Quantize embeddings: each becomes 96 one-byte centroid codes.
let quantized: Vec<Vec<u8>> = embeddings.iter()
    .map(|e| pq.quantize(e))
    .collect();

// Memory reduction
let original_size = embeddings.len() * 1536 * 4; // f32 = 4 bytes
let quantized_size = quantized.len() * 96;       // 1 byte per subvector
println!("Compression ratio: {:.1}x", original_size as f64 / quantized_size as f64);
// Output: Compression ratio: 64.0x
```
Asymmetric Distance Computation
```rust
use sevensense_embedding::ProductQuantizer;

// Query embedding (full precision); `query_segment` stands in for any segment.
let query = pipeline.embed(&query_segment).await?;

// Compute asymmetric distances: full-precision query against quantized
// database vectors (the `asymmetric_distance` method name is an assumption).
let distances: Vec<f32> = quantized.iter()
    .map(|codes| pq.asymmetric_distance(&query, codes))
    .collect();

// Find nearest neighbors by sorting indices by distance.
let mut indexed: Vec<(usize, f32)> = distances.iter().copied().enumerate().collect();
indexed.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
let top_10: Vec<(usize, f32)> = indexed.iter().copied().take(10).collect();
```
Custom ONNX Configuration
```rust
use sevensense_embedding::{EmbeddingConfig, EmbeddingPipeline, ExecutionProvider};

// Field names follow the parameter table below; exact types and variant
// names are assumptions.
let config = EmbeddingConfig {
    execution_provider: ExecutionProvider::Cuda,
    num_threads: 8,
    normalize: true,
    warmup: true,
    ..Default::default()
};

let pipeline = EmbeddingPipeline::new(config).await?;
```
Execution Providers
```rust
use sevensense_embedding::{EmbeddingConfig, ExecutionProvider};

// Variant names are assumptions based on the providers listed here.

// CPU (default)
let cpu_config = EmbeddingConfig { execution_provider: ExecutionProvider::Cpu, ..Default::default() };

// CUDA (NVIDIA GPU)
let cuda_config = EmbeddingConfig { execution_provider: ExecutionProvider::Cuda, ..Default::default() };

// CoreML (Apple Silicon)
let coreml_config = EmbeddingConfig { execution_provider: ExecutionProvider::CoreMl, ..Default::default() };
```
Memory Optimization
```rust
use sevensense_embedding::EmbeddingConfig;

// Sketch using only the knobs named elsewhere in this README; the crate may
// expose additional memory-specific options.
let config = EmbeddingConfig {
    num_threads: 2,  // smaller CPU thread pool
    warmup: false,   // skip the warmup inference pass
    ..Default::default()
};
```
Quality Checks
```rust
use sevensense_embedding::{EmbeddingValidator, ValidationResult};

// `EmbeddingValidator` and `ValidationResult` are assumed names inferred from
// the validate()/ValidationCriteria identifiers used in this README.
let validator = EmbeddingValidator::new();

let embedding = pipeline.embed(&segment).await?;
let result = validator.validate(&embedding)?;

match result {
    ValidationResult::Valid => println!("Embedding passed quality checks"),
    ValidationResult::Invalid(reason) => eprintln!("Rejected: {reason}"),
}
```
Validation Criteria
```rust
use sevensense_embedding::{EmbeddingValidator, ValidationCriteria};

// The available criteria are not listed in this README; the fields below are
// hypothetical examples of typical checks.
let criteria = ValidationCriteria {
    min_norm: 0.99,    // hypothetical: expect (near) unit-length vectors
    max_nan_count: 0,  // hypothetical: reject any NaN components
    ..Default::default()
};
let validator = EmbeddingValidator::with_criteria(criteria);
```
Batch Validation
```rust
// `is_valid()` on the per-embedding result is an assumption.
let results = validator.validate_batch(&embeddings);
let valid_count = results.iter().filter(|r| r.is_valid()).count();
let invalid_count = results.len() - valid_count;
println!("{valid_count} valid, {invalid_count} invalid embeddings");
```
Configuration
EmbeddingConfig Parameters
| Parameter | Default | Description |
|---|---|---|
| model_path | Auto-download | Path to ONNX model |
| execution_provider | CPU | CUDA, CoreML, or CPU |
| num_threads | 4 | CPU inference threads |
| normalize | true | L2 normalize embeddings |
| warmup | true | Run warmup inference |
Model Specifications
| Property | Value |
|---|---|
| Input | Mel spectrogram [batch, 128, 312] |
| Output | Embedding vector [batch, 1536] |
| Model Size | ~25 MB |
| Inference Time | ~15ms (CPU) / ~3ms (GPU) |
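
The shapes in the table map onto tensors as follows; this sketch uses the `ndarray` crate purely to illustrate dimensions and is not part of the sevensense-embedding API:

```rust
use ndarray::{Array2, Array3};

// One batch element: a 128-band mel spectrogram with 312 time frames.
let input: Array3<f32> = Array3::zeros((1, 128, 312)); // [batch, 128, 312]
// The model maps each spectrogram to a 1536-dimensional embedding.
let output: Array2<f32> = Array2::zeros((1, 1536));    // [batch, 1536]

assert_eq!(input.shape(), &[1, 128, 312]);
assert_eq!(output.shape(), &[1, 1536]);
```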
Performance
| Operation | CPU (i7-12700) | GPU (RTX 3080) |
|---|---|---|
| Single Inference | 15ms | 3ms |
| Batch (32) | 120ms | 20ms |
| Throughput | 260/s | 1600/s |
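
The throughput figures follow directly from the batch latency; a quick check of the arithmetic using the numbers in the table above:

```rust
// 32 segments per 120 ms on CPU and per 20 ms on GPU.
let cpu_throughput = 32.0 / 0.120; // ≈ 267 segments/s, consistent with ~260/s
let gpu_throughput = 32.0 / 0.020; // = 1600 segments/s
println!("CPU: {cpu_throughput:.0}/s, GPU: {gpu_throughput:.0}/s");
```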
Links
- Homepage: ruv.io
- Repository: github.com/ruvnet/ruvector
- Crates.io: crates.io/crates/sevensense-embedding
- Documentation: docs.rs/sevensense-embedding
License
MIT License - see LICENSE for details.
Part of the 7sense Bioacoustic Intelligence Platform by rUv