sevensense-audio
Audio ingestion and preprocessing pipeline for bioacoustic analysis.
sevensense-audio handles all aspects of audio input—loading files, streaming from microphones, segmenting into fixed-length chunks, computing Mel spectrograms, and normalizing for neural network input. It's the gateway for raw audio into the 7sense platform.
Features
- Multi-Format Support: WAV, MP3, FLAC, OGG via symphonia
- Streaming Input: Real-time microphone/line-in capture
- Smart Segmentation: Fixed-length or voice-activity-based splitting
- Mel Spectrograms: Configurable FFT, hop length, and mel bins
- Audio Augmentation: Time stretch, pitch shift, noise injection
- Batch Processing: Process multiple files in parallel
Use Cases
| Use Case | Description | Key Functions |
|---|---|---|
| File Loading | Load audio from various formats | AudioLoader::load() |
| Segmentation | Split recordings into analysis chunks | Segmenter::segment() |
| Spectrogram | Convert audio to mel spectrogram | MelSpectrogram::compute() |
| Streaming | Real-time audio capture | AudioStream::new() |
| Augmentation | Data augmentation for training | Augmenter::augment() |
Installation
Add to your Cargo.toml:
[dependencies]
sevensense-audio = "0.1"
Quick Start
use ;
async
Basic File Loading
use ;
async
Loading with Options
use ;
let options = LoadOptions ;
let audio = load_with_options.await?;
assert_eq!;
assert_eq!;
Batch Loading
use AudioLoader;
let paths = vec!;
let audios = load_batch.await?;
for in paths.iter.zip
Fixed-Length Segmentation
use ;
let audio = load.await?;
// 5-second segments with 50% overlap
let segmenter = new;
let segments = segmenter.segment;
println!;
for in segments.iter.enumerate
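To make the fixed-length windowing concrete, the following is a minimal standalone sketch of how segment boundaries could be derived from a window length and an overlap. It is not the crate's implementation: `segment_bounds` is a hypothetical helper, the overlap is treated as a fraction of the window (as the "50% overlap" comment suggests), and a trailing partial window is simply dropped. If the crate takes overlap in seconds instead, the hop would be the window length minus the overlap.

```rust
/// Returns (start, end) sample indices for fixed-length windows.
/// `overlap` is assumed to be a fraction of the window in [0, 1).
/// Hypothetical helper for illustration; not part of sevensense-audio.
fn segment_bounds(
    n_samples: usize,
    sample_rate: u32,
    window_secs: f32,
    overlap: f32,
) -> Vec<(usize, usize)> {
    let window = (window_secs * sample_rate as f32).round() as usize;
    let mut bounds = Vec::new();
    if window == 0 {
        return bounds;
    }
    // Distance between consecutive segment starts; clamped to at least
    // one sample so the loop below always advances.
    let hop = ((window as f32) * (1.0 - overlap)).round().max(1.0) as usize;
    let mut start = 0;
    // Emit only full windows; a trailing partial window is dropped.
    while start + window <= n_samples {
        bounds.push((start, start + window));
        start += hop;
    }
    bounds
}
```

For example, `segment_bounds(384_000, 32_000, 5.0, 0.5)` describes a 12-second recording at an assumed 32 kHz sample rate and yields three full 5-second windows whose starts are 2.5 seconds apart.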
Voice Activity Detection (VAD)
use ;
let audio = load.await?;
let config = VadConfig ;
let segmenter = new;
let segments = segmenter.segment;
println!;
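This README does not describe which detection method the crate uses for VAD-based splitting, so the sketch below swaps in the simplest common approach, a per-frame RMS energy threshold, purely to illustrate the idea of activity-based segmentation. The function name `energy_vad` and its parameters are hypothetical and are not fields of `VadConfig`.

```rust
/// Returns (start, end) sample ranges whose per-frame RMS energy exceeds
/// `threshold`. A simple energy-based detector for illustration only;
/// not the algorithm used by sevensense-audio.
fn energy_vad(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<(usize, usize)> {
    let mut regions = Vec::new();
    if frame_len == 0 {
        return regions;
    }
    // Start index of the currently open active region, if any.
    let mut current: Option<usize> = None;
    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let rms = (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt();
        let start = i * frame_len;
        match (rms > threshold, current) {
            // Silence -> activity: open a region.
            (true, None) => current = Some(start),
            // Activity -> silence: close the region at this frame's start.
            (false, Some(s)) => {
                regions.push((s, start));
                current = None;
            }
            _ => {}
        }
    }
    // Close a region that runs to the end of the signal.
    if let Some(s) = current {
        regions.push((s, samples.len()));
    }
    regions
}
```

A practical detector would usually add smoothing (for example a minimum region length or a hangover period) so that brief dips in energy do not split one call into several segments.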
Segment Iterator
use ;
let audio = load.await?;
// Lazy iteration over segments
for segment in new
Basic Mel Computation
use ;
let audio = load.await?;
// Default configuration (128 mel bins, 2048 FFT, 512 hop)
let mel = compute?;
println!;
// Shape: [n_frames, n_mels] e.g., [312, 128]
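The frame count in that shape follows from the number of samples and the hop length. The sketch below shows two common framing conventions; which one the crate uses is not stated here. The sample rate is also not given in this README: 32 kHz is an assumption, chosen because a 5-second segment at that rate (160,000 samples) with a hop of 512 gives 312 frames under the padded convention, matching the example shape. Both function names are hypothetical.

```rust
/// Frame count when the signal is padded so every hop yields a frame:
/// floor(n_samples / hop_length).
fn n_frames_padded(n_samples: usize, hop_length: usize) -> usize {
    n_samples / hop_length
}

/// Frame count when only windows that fit entirely inside the signal
/// are kept: 1 + floor((n_samples - n_fft) / hop_length).
fn n_frames_valid(n_samples: usize, n_fft: usize, hop_length: usize) -> usize {
    if n_samples < n_fft {
        0
    } else {
        1 + (n_samples - n_fft) / hop_length
    }
}
```

With 160,000 samples, `n_fft = 2048`, and `hop_length = 512`, the padded convention gives 312 frames and the unpadded one gives 309.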
Custom Configuration
use ;
let config = MelConfig ;
let mel = compute?;
Log-Mel Spectrogram
use ;
let mel = compute?;
// Convert to log scale (commonly used for neural networks)
let log_mel = mel.to_log_scale; // Add small constant to avoid log(0)
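The reason for the small constant is that silent bins have zero energy and the logarithm of zero is negative infinity. A minimal standalone sketch of the idea follows; `log_compress` is a hypothetical helper, and whether the crate uses the natural log or a decibel scale (10 log10) is not stated in this README, so the natural log here is an assumption.

```rust
/// Applies ln(x + eps) to every bin of a mel spectrogram stored as
/// one Vec per frame. `eps` keeps zero-energy bins finite.
/// Hypothetical helper for illustration; not the crate's method.
fn log_compress(mel: &[Vec<f32>], eps: f32) -> Vec<Vec<f32>> {
    mel.iter()
        .map(|frame| frame.iter().map(|&v| (v + eps).ln()).collect())
        .collect()
}
```

With `eps = 1e-6`, a zero-energy bin maps to roughly -13.8 instead of negative infinity, while bins with energy well above `eps` are essentially unaffected.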
Visualizing Spectrograms
use ;
let mel = compute?;
// Save as PNG image
save_spectrogram?;
// Get as RGB buffer
let rgb_buffer = to_rgb?;
Basic Augmentation
use ;
let audio = load.await?;
let config = AugmentConfig ;
let augmenter = new;
let augmented = augmenter.augment?;
Specific Augmentations
use ;
// Time stretch (slow down by 10%)
let stretched = apply?;
// Pitch shift (up by 2 semitones)
let shifted = apply?;
// Add background noise
let noisy = apply?;
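The numbers behind these three augmentations can be worked out with a few lines of arithmetic. The sketch below is standalone and does not use the crate's API; all three function names are hypothetical. It assumes that "slow down by 10%" means a playback rate of 0.9 and that noise level is expressed as a signal-to-noise ratio in dB, neither of which this README confirms.

```rust
/// Frequency ratio for a pitch shift of `semitones`
/// (12-tone equal temperament: one octave = 12 semitones = ratio 2).
fn pitch_ratio(semitones: f32) -> f32 {
    2.0_f32.powf(semitones / 12.0)
}

/// Output length in samples after time-stretching by `rate`.
/// A rate below 1.0 slows the audio down, so the output gets longer.
fn stretched_len(n_samples: usize, rate: f32) -> usize {
    (n_samples as f32 / rate).round() as usize
}

/// Linear gain to apply to a noise signal so that the ratio of signal
/// RMS to scaled-noise RMS corresponds to `snr_db` decibels.
fn noise_gain(signal_rms: f32, noise_rms: f32, snr_db: f32) -> f32 {
    signal_rms / (noise_rms * 10.0_f32.powf(snr_db / 20.0))
}
```

Under these assumptions, a shift of 2 semitones multiplies every frequency by about 1.122, and a 5-second segment of 160,000 samples stretched at rate 0.9 grows to 177,778 samples.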
Augmentation Pipeline
use ;
let pipeline = new
.add
.add
.add
.add
.add;
let augmented = pipeline.apply?;
Configuration
Mel Spectrogram Parameters
| Parameter | Default | Description |
|---|---|---|
| n_mels | 128 | Number of mel frequency bins |
| n_fft | 2048 | FFT window size |
| hop_length | 512 | Samples between frames |
| f_min | 50.0 | Minimum frequency (Hz) |
| f_max | 14000.0 | Maximum frequency (Hz) |
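To show how `n_mels`, `f_min`, and `f_max` interact, here is a minimal sketch that places mel filter center frequencies between the two bounds. It uses the HTK mel formula, mel = 2595 log10(1 + f / 700); that choice is an assumption, since this README does not say which mel scale variant the crate uses (the Slaney variant is another common choice). The function names are hypothetical.

```rust
/// Hz -> mel using the HTK formula (an assumed choice of mel scale).
fn hz_to_mel(hz: f32) -> f32 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

/// Inverse of `hz_to_mel`.
fn mel_to_hz(mel: f32) -> f32 {
    700.0 * (10.0_f32.powf(mel / 2595.0) - 1.0)
}

/// Center frequencies (Hz) of `n_mels` triangular filters spaced
/// evenly on the mel scale between `f_min` and `f_max`.
fn mel_centers(n_mels: usize, f_min: f32, f_max: f32) -> Vec<f32> {
    let (m_min, m_max) = (hz_to_mel(f_min), hz_to_mel(f_max));
    // n_mels filters need n_mels + 2 equally spaced mel points
    // (both edges included); the centers are the interior points.
    let step = (m_max - m_min) / (n_mels as f32 + 1.0);
    (1..=n_mels)
        .map(|i| mel_to_hz(m_min + step * i as f32))
        .collect()
}
```

Calling `mel_centers(128, 50.0, 14000.0)` with the defaults from the table gives 128 increasing frequencies that are densely packed at the low end and progressively wider apart toward 14 kHz, which is the point of the mel scale.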
Segmentation Parameters
| Parameter | Default | Description |
|---|---|---|
| window_size | 5.0 | Segment duration in seconds |
| overlap | 0.5 | Overlap between segments in seconds |
| min_duration | 1.0 | Minimum segment duration |
Performance
| Operation | Throughput | Notes |
|---|---|---|
| File Loading | ~500 MB/s | With SSD |
| Mel Spectrogram | ~1000 segments/s | 5s segments |
| Resampling | ~200 MB/s | Using libsamplerate |
Links
- Homepage: ruv.io
- Repository: github.com/ruvnet/ruvector
- Crates.io: crates.io/crates/sevensense-audio
- Documentation: docs.rs/sevensense-audio
License
MIT License - see LICENSE for details.
Part of the 7sense Bioacoustic Intelligence Platform by rUv