audiofp
Audio fingerprinting library for Rust with classical landmark and band-power algorithms, streaming extraction, file decoding, and AudioSeal-compatible watermark detection.
Overview
audiofp provides three complementary classical fingerprinters for music identification, each with offline and streaming variants:
| Method | Use Case | Sample Rate | Frame Rate | Output Size |
|---|---|---|---|---|
| Wang | Music ID, Shazam-style matching | 8 kHz | 62.5 fps | ~2.4 KB/s (fan-out 10) |
| Panako | Music ID with ±5 % tempo robustness | 8 kHz | 62.5 fps | ~2.0 KB/s (fan-out 5) |
| Haitsma | Compact dense IDs, fastest extraction | 5 kHz | 78.125 fps | 312 B/s |
| Streaming | Real-time hash emission | (per algorithm) | (per algorithm) | Bit-exact offline parity |
| Watermark | AudioSeal detection (BYO ONNX) | 16 kHz | (per model) | Detection + 16-bit message |
Perfect for:
- Music identification ("what is this song?")
- Audio deduplication at scale
- Royalty / rights enforcement against re-encoded content
- Cover and remix detection (with neural fingerprinting on the roadmap)
- Watermark verification on generative-AI audio
Features
- Three Classical Algorithms - Wang (landmark pairs) + Panako (triplet hashes with tempo β) + Haitsma–Kalker (32-bit/frame band sign)
- Streaming + Offline Variants - Every fingerprinter has a `StreamingFingerprinter` impl with bit-exact parity to the offline `extract`
- Bit-Exact Determinism - Same input always produces the same hashes; verified down to 1-sample-per-push streaming chunks
- `bytemuck::Pod` Hash Types - Persist hashes directly to mmap'd files or ship over a C ABI without serialization
- Audio File Decoding - MP3, FLAC, WAV, OGG-Vorbis, AAC-in-MP4, raw PCM via Symphonia
- High-Quality Resampling - Built-in windowed-sinc Kaiser resampler with auto anti-aliasing cutoff
- Watermark Detection - AudioSeal-compatible ONNX wrapper (Tract backend)
- DSP Primitives Reusable - Public `dsp::stft`, `dsp::mel`, `dsp::peaks`, `dsp::resample`, `dsp::windows`
- Allocation-Free Hot Path - Streaming `push` reuses pre-allocated scratch after warmup
- `no_std + alloc` Capable - DSP and classical fingerprinters compile without std (host-only today; bare-metal on the roadmap)
- Feature-Gated Heavy Deps - Symphonia and Tract both opt-in via Cargo features
- Optional `mimalloc` - Single-flag opt-in to install `mimalloc` as the global allocator
Installation
[dependencies]
audiofp = "0.1"
Feature Flags
| Feature | Default | Description |
|---|---|---|
| `std` | Yes | Enables audiofp::io (Symphonia file decoder) |
| `watermark` | No | Enables audiofp::watermark via Tract ONNX runtime |
| `neural` | No | Reserved for the upcoming Phase 5 neural fingerprinter |
| `mimalloc` | No | Installs mimalloc::MiMalloc as the process-wide #[global_allocator] |
Minimal build (no_std + alloc, DSP and classical only):
[dependencies]
audiofp = { version = "0.1", default-features = false }
With watermark detection (pulls in Tract):
[dependencies]
audiofp = { version = "0.1", features = ["watermark"] }
With mimalloc for a faster global allocator:
[dependencies]
audiofp = { version = "0.1", features = ["mimalloc"] }
Quick Start
use Wang;
use decode_to_mono_at;
use ;
Streaming Mode
use StreamingWang;
use StreamingFingerprinter;
Documentation
For complete API reference and usage examples, see USAGE.md.
Architecture
Fingerprint Types
Each algorithm emits a strongly-typed, bytemuck::Pod-castable result:
Wang offline Panako offline
┌──────────────────────────┐ ┌──────────────────────────┐
│ WangFingerprint │ │ PanakoFingerprint │
│ hashes: Vec<WangHash> │ │ hashes: Vec<PanakoHash>│
│ frames_per_sec: f32 │ │ frames_per_sec: f32 │
└──────────────────────────┘ └──────────────────────────┘
WangHash (8 bytes, repr(C)) PanakoHash (16 bytes, repr(C))
├── hash: u32 ├── hash: u32
└── t_anchor: u32 ├── t_anchor: u32
├── t_b: u32
└── t_c: u32
Haitsma offline
┌──────────────────────────┐
│ HaitsmaFingerprint │
│ frames: Vec<u32> │ one u32 per spectrogram frame ≥ 1
│ frames_per_sec: f32 │
└──────────────────────────┘
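The byte sizes in the diagram follow directly from the field lists. The sketch below uses local stand-in structs (`WangHashMirror` and `PanakoHashMirror` are hypothetical names, not the crate's own types) to show why `repr(C)` structs made only of `u32` fields have no padding, and how such records could be written out as little-endian bytes without any unsafe code.

```rust
use std::mem::size_of;

/// Local stand-in mirroring the documented WangHash fields.
/// Hypothetical type for illustration; the crate defines its own.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct WangHashMirror {
    pub hash: u32,
    pub t_anchor: u32,
}

/// Local stand-in mirroring the documented PanakoHash fields.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PanakoHashMirror {
    pub hash: u32,
    pub t_anchor: u32,
    pub t_b: u32,
    pub t_c: u32,
}

/// Serialises hashes field by field as little-endian bytes.
/// This is a safe, portable alternative to a raw byte cast.
pub fn wang_to_le_bytes(hashes: &[WangHashMirror]) -> Vec<u8> {
    let mut out = Vec::with_capacity(hashes.len() * size_of::<WangHashMirror>());
    for h in hashes {
        out.extend_from_slice(&h.hash.to_le_bytes());
        out.extend_from_slice(&h.t_anchor.to_le_bytes());
    }
    out
}
```

Because every field is a `u32`, the struct size equals the sum of the field sizes (8 and 16 bytes), which is the property a zero-copy cast relies on.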
Algorithm Pipeline
- Decode: Parse any supported format (MP3, FLAC, WAV, OGG-Vorbis, AAC-in-MP4, PCM) via Symphonia and downmix to mono `f32`
- Resample: Built-in windowed-sinc Kaiser resampler (default 32 taps, β=8.6) brings the audio to the algorithm's required rate
- STFT: `realfft`-backed real-input transform with reusable scratch; Hann window, configurable hop and `n_fft`
- Algorithm-specific extraction:
  - Wang: dB log-mag → 31×31 peak picker (capped at 30/s) → anchor-target landmark pairs in `Δt ∈ [1, 63], |Δf| ≤ 64`
  - Panako: same front-end → triplet enumeration in cone `Δt < 96, |Δf| < 96` → tempo-invariant β packing
  - Haitsma: 33 log-spaced bands (300–2000 Hz) → 32 sign bits per frame from band-difference deltas
- Streaming variants mirror offline pipelines and emit hashes once each anchor's full lookahead has elapsed, guaranteeing bit-exact equivalence under arbitrary chunking
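The Wang pairing step in the list above can be sketched as a scan over time-sorted peaks. This is a minimal illustration, not the crate's implementation: the `Peak` type and `pair_landmarks` function are hypothetical names, and picking the nearest-in-time targets first is an assumption, since the source only states the target zone (1 ≤ Δt ≤ 63, |Δf| ≤ 64) and the fan-out.

```rust
/// A spectral peak: frame index and frequency-bin index.
/// Hypothetical type for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Peak {
    pub t: u32,
    pub f: u32,
}

/// Pairs each anchor with up to `fan_out` later peaks inside the target zone
/// 1 <= dt <= 63 and |df| <= 64. `peaks` must be sorted by `t`.
/// Assumption: targets are taken in time order (nearest first).
pub fn pair_landmarks(peaks: &[Peak], fan_out: usize) -> Vec<(Peak, Peak)> {
    const MAX_DT: u32 = 63;
    const MAX_DF: u32 = 64;
    let mut pairs = Vec::new();
    for (i, &anchor) in peaks.iter().enumerate() {
        let mut taken = 0;
        for &target in &peaks[i + 1..] {
            // Cannot underflow because the input is sorted by time.
            let dt = target.t - anchor.t;
            if dt > MAX_DT {
                break; // every later peak is even further away
            }
            if dt >= 1 && target.f.abs_diff(anchor.f) <= MAX_DF {
                pairs.push((anchor, target));
                taken += 1;
                if taken == fan_out {
                    break;
                }
            }
        }
    }
    pairs
}
```

The early `break` on `dt > MAX_DT` is also what bounds the streaming lookahead: once 63 frames have passed, no further target can pair with that anchor.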
Hash Layouts
WangHash::hash (32 bits)
[31..23] f_a_q 9 bits, anchor frequency (quantised to 512 buckets)
[22..14] f_b_q 9 bits, target frequency (same quantisation)
[13.. 0] Δt 14 bits, frames between anchor and target
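The Wang layout above maps onto plain shifts and masks. The helper names `pack_wang` and `unpack_wang` below are hypothetical and only illustrate the documented bit positions; they are not taken from the crate.

```rust
/// Packs quantised anchor frequency, target frequency and frame delta
/// into the documented layout: [31..23] f_a_q, [22..14] f_b_q, [13..0] dt.
/// Hypothetical helper for illustration.
pub fn pack_wang(f_a_q: u32, f_b_q: u32, dt: u32) -> u32 {
    debug_assert!(f_a_q < 512 && f_b_q < 512 && dt < (1 << 14));
    ((f_a_q & 0x1FF) << 23) | ((f_b_q & 0x1FF) << 14) | (dt & 0x3FFF)
}

/// Inverse of `pack_wang`: returns (f_a_q, f_b_q, dt).
pub fn unpack_wang(hash: u32) -> (u32, u32, u32) {
    ((hash >> 23) & 0x1FF, (hash >> 14) & 0x1FF, hash & 0x3FFF)
}
```

The three fields use 9 + 9 + 14 = 32 bits, so every bit of the word is accounted for.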
PanakoHash::hash (32 bits)
[31..30] sign 2 bits, signs of Δf_ab and Δf_bc
[29..28] mag_order 2 bits, which of {a, b, c} has the largest magnitude
[27..23] β 5 bits, round((t_c - t_b) / (t_c - t_a) · 31)
[22..15] Δf_ab 8 bits signed, clamped to ±127
[14.. 7] Δf_bc 8 bits signed, clamped to ±127
[ 6.. 0] reserved 7 bits, zero
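The β field is what gives the triplet hash its tempo tolerance: it is a ratio of time differences, so stretching all three timestamps by the same factor leaves it unchanged. The sketch below follows the documented bit positions, but several details are assumptions made for illustration: the function names, the sign-bit order (bit 31 for Δf_ab, bit 30 for Δf_bc), the `mag_order` encoding as a plain index, and storing the clamped deltas as two's-complement bytes.

```rust
/// beta = round((t_c - t_b) / (t_c - t_a) * 31), as in the layout above.
/// Requires t_a <= t_b <= t_c and t_a < t_c.
pub fn beta(t_a: u32, t_b: u32, t_c: u32) -> u32 {
    let ratio = (t_c - t_b) as f32 / (t_c - t_a) as f32;
    (ratio * 31.0).round() as u32
}

/// Packs a triplet hash into the documented field positions.
/// Assumptions (not confirmed by the source): sign bit 31 is set when
/// df_ab < 0, sign bit 30 when df_bc < 0, and the deltas are stored as
/// two's-complement bytes after clamping to [-127, 127].
pub fn pack_panako(df_ab: i32, df_bc: i32, mag_order: u32, beta: u32) -> u32 {
    let sign = (((df_ab < 0) as u32) << 1) | ((df_bc < 0) as u32);
    let ab = df_ab.clamp(-127, 127) as i8 as u8 as u32;
    let bc = df_bc.clamp(-127, 127) as i8 as u8 as u32;
    (sign << 30)
        | ((mag_order & 0x3) << 28)
        | ((beta & 0x1F) << 23)
        | (ab << 15)
        | (bc << 7) // low 7 bits stay zero (reserved)
}
```

For example, timestamps (0, 10, 40) and their doubled version (0, 20, 80) give the same β, which is why a uniformly slowed-down recording can still produce matching hashes.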
Haitsma frame (32 bits, "MSB-zero" packing)
bit 31 → band 0, bit 0 → band 31
F[n][b] = ((E[n][b] − E[n][b+1]) − (E[n−1][b] − E[n−1][b+1])) > 0
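The frame formula above can be written out directly. In this sketch `haitsma_frame` is a hypothetical name, and the inputs are assumed to be the 33 band energies of the previous and current spectrogram frames; how those energies are computed is outside its scope.

```rust
/// Number of band energies per frame; 33 bands yield 32 difference bits.
pub const BANDS: usize = 33;

/// Computes one 32-bit frame from the band energies of frames n-1 and n:
/// bit = ((E[n][b] - E[n][b+1]) - (E[n-1][b] - E[n-1][b+1])) > 0,
/// with band 0 stored in bit 31 and band 31 in bit 0.
pub fn haitsma_frame(prev: &[f32; BANDS], cur: &[f32; BANDS]) -> u32 {
    let mut bits = 0u32;
    for b in 0..BANDS - 1 {
        let delta = (cur[b] - cur[b + 1]) - (prev[b] - prev[b + 1]);
        if delta > 0.0 {
            bits |= 1 << (31 - b);
        }
    }
    bits
}
```

Since each bit depends on the previous frame, the first spectrogram frame has no predecessor and produces no output, which matches the "frame ≥ 1" note in the type diagram.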
Performance
Measured on Intel i5-1135G7 (4 cores, 8 threads, 2.40 GHz) with cargo bench --bench extract:
| Algorithm | 2 s of audio | 5 s | 30 s | Realtime factor (30 s) |
|---|---|---|---|---|
| `Wang` | 5.6 ms | 15.9 ms | 109 ms | 275× |
| `Panako` | 5.8 ms | 15.7 ms | 109 ms | 275× |
| `Haitsma` | 3.1 ms | 9.0 ms | 65 ms | 462× |
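The realtime-factor column is the audio duration divided by the extraction time. A small helper (hypothetical name) makes the arithmetic explicit and can be reused with timings from your own machine.

```rust
/// Realtime factor: seconds of audio processed per second of wall-clock time.
pub fn realtime_factor(audio_seconds: f64, elapsed_ms: f64) -> f64 {
    audio_seconds / (elapsed_ms / 1000.0)
}
```

With the 30 s figures from the table, 30 s / 109 ms rounds to 275 and 30 s / 65 ms rounds to 462.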
Hot-path design notes:
- All three classical fingerprinters share the same Hann-windowed STFT, and Wang and Panako also share the Lemire monotonic-deque peak picker (amortised O(N · M)), so cost is dominated by the FFT.
- Streaming `push` reuses pre-allocated scratch; no allocation per frame after the initial ring is sized.
- `SincResampler` with the default 32-tap Kaiser kernel costs O(2 · half_taps) per output sample, or O(N · 2 · half_taps) for N output samples, with a precomputed Bessel I₀(β).
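The monotonic-deque idea mentioned above can be shown in one dimension. This is a generic sketch of the sliding-window maximum technique with hypothetical function names, not the crate's peak picker. A 31-wide neighbourhood corresponds to `radius = 15`, and because a maximum over a rectangle is separable, a 2-D version could run the same pass along time and then along frequency.

```rust
use std::collections::VecDeque;

/// Maximum over every centred window [i - radius, i + radius], clipped at
/// the edges. Each index enters and leaves the deque at most once, so the
/// total cost is linear in the input length.
pub fn sliding_max(x: &[f32], radius: usize) -> Vec<f32> {
    let n = x.len();
    let mut out = Vec::with_capacity(n);
    // Indices whose values are strictly decreasing from front to back.
    let mut dq: VecDeque<usize> = VecDeque::new();
    let mut next = 0; // next index to admit into the deque
    for i in 0..n {
        let hi = (i + radius).min(n - 1);
        while next <= hi {
            // Drop candidates that the new sample dominates.
            while let Some(&back) = dq.back() {
                if x[back] <= x[next] {
                    dq.pop_back();
                } else {
                    break;
                }
            }
            dq.push_back(next);
            next += 1;
        }
        // Drop candidates that slid out of the window on the left.
        let lo = i.saturating_sub(radius);
        while let Some(&front) = dq.front() {
            if front < lo {
                dq.pop_front();
            } else {
                break;
            }
        }
        out.push(x[dq[0]]);
    }
    out
}

/// Indices whose value equals the maximum of their neighbourhood.
pub fn local_peaks(x: &[f32], radius: usize) -> Vec<usize> {
    let max = sliding_max(x, radius);
    (0..x.len()).filter(|&i| x[i] == max[i]).collect()
}
```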
| Streaming type | `latency_ms()` | Notes |
|---|---|---|
| `StreamingWang` | 2256 ms | Includes 1 s for per-second adaptive peak thresholding |
| `StreamingPanako` | 2784 ms | Wider target zone (96 frames vs Wang's 63) |
| `StreamingHaitsma` | 409 ms | No peak picker → bounded by n_fft / sr |
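The lookahead part of these latencies is a frame count converted to milliseconds at the algorithm's frame rate. The helpers below (hypothetical names) show the conversion. At 62.5 fps, the 33-frame difference between the 96-frame and 63-frame target zones is 528 ms, which matches the gap between the two reported latencies (2784 ms vs 2256 ms). The remaining components of each figure are not broken down in the source, so the sketch does not attempt to reproduce them.

```rust
/// Converts a frame count to milliseconds at a given frame rate.
pub fn frames_to_ms(frames: u32, frames_per_sec: f64) -> f64 {
    frames as f64 * 1000.0 / frames_per_sec
}

/// Hop size in samples implied by a sample rate and a frame rate.
pub fn hop_samples(sample_rate_hz: f64, frames_per_sec: f64) -> f64 {
    sample_rate_hz / frames_per_sec
}
```

The same arithmetic recovers the hop sizes implied by the overview table: 8000 / 62.5 = 128 samples for Wang and Panako, and 5000 / 78.125 = 64 samples for Haitsma.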
Run benchmarks for your own host with `cargo bench --bench extract`.
Memory Safety
- Sample-rate-strict APIs reject mismatched inputs with `AfpError::UnsupportedSampleRate`
- Audio length checks reject buffers shorter than each algorithm's minimum (≥ 2 s)
- Allocation-free streaming hot path after warmup (no `Vec::push` in the inner loop)
- `bytemuck::Pod` derive on hash types is sound: every field is `repr(C)` with explicit padding
Determinism
- Identical inputs → identical outputs: same audio, same fingerprinter, same config produces bit-for-bit identical hashes on every call and every supported target
- Stable algorithm IDs: `Fingerprinter::name()` returns versioned strings ("wang-v1", "panako-v2", "haitsma-v1"); a future major bump that changes hash bytes will change the version suffix
- Stable hash layouts: bit positions in `WangHash::hash`, `PanakoHash::hash`, and Haitsma frames are stable across patch and minor versions inside `0.x`
- Verified streaming/offline parity: the test suite feeds randomised chunk sequences (down to 1 sample per push) through the streaming impl and asserts the output hash multiset matches `extract`
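The parity property can be illustrated with a toy framer that stands in for a fingerprinter. Everything here is hypothetical and deliberately simple: the "hash" is just the sum of each window, and the streaming side buffers samples until a full window is available. Unlike the crate's hot path, this toy allocates on every push; the point is only that output must not depend on how the input is chunked.

```rust
/// Offline reference: sum of each `win`-sample frame, advancing by `hop`.
pub fn offline_frames(samples: &[i64], win: usize, hop: usize) -> Vec<i64> {
    let mut out: Vec<i64> = Vec::new();
    let mut start = 0;
    while start + win <= samples.len() {
        let s: i64 = samples[start..start + win].iter().sum();
        out.push(s);
        start += hop;
    }
    out
}

/// Toy streaming counterpart. Requires 1 <= hop <= win.
pub struct StreamingFrames {
    win: usize,
    hop: usize,
    buf: Vec<i64>,
}

impl StreamingFrames {
    pub fn new(win: usize, hop: usize) -> Self {
        assert!(hop >= 1 && hop <= win);
        Self { win, hop, buf: Vec::new() }
    }

    /// Accepts a chunk of any length and returns the frames completed by it.
    pub fn push(&mut self, chunk: &[i64]) -> Vec<i64> {
        self.buf.extend_from_slice(chunk);
        let mut out: Vec<i64> = Vec::new();
        while self.buf.len() >= self.win {
            let s: i64 = self.buf[..self.win].iter().sum();
            out.push(s);
            self.buf.drain(..self.hop); // advance the window by one hop
        }
        out
    }
}

/// Feeds `samples` in fixed-size chunks (chunk > 0) and collects all output.
pub fn stream_all(samples: &[i64], win: usize, hop: usize, chunk: usize) -> Vec<i64> {
    let mut s = StreamingFrames::new(win, hop);
    let mut out = Vec::new();
    for c in samples.chunks(chunk) {
        out.extend(s.push(c));
    }
    out
}
```

A parity test then compares `stream_all` against `offline_frames` for several chunk sizes, including 1 sample per push.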
Robustness
- Codec-tolerant by design: Wang and Panako are spectral-peak based; Haitsma is band-power-difference based. All three are intended to survive lossy re-encoding (MP3 / AAC / Opus) and modest noise. Quantitative robustness benchmarks against a held-out corpus are on the roadmap.
- Mono only: multi-channel inputs must be downmixed by the caller (the file decoder does this for you).
- Sample-rate-strict: each fingerprinter requires its native rate (8 kHz / 5 kHz). Resample with `dsp::resample::SincResampler` or `decode_to_mono_at` if your source differs.
- Resilient decoder: recoverable per-packet failures inside Symphonia are silently skipped so a single corrupt block doesn't kill a whole-file decode.
Comparison with Alternatives
| Feature | audiofp | chromaprint-rust | dejavu (Python) |
|---|---|---|---|
| Pure Rust | Yes | No (FFI to C lib) | No |
| Wang landmarks | Yes | No | Yes |
| Panako triplets (tempo-robust) | Yes | No | No |
| Haitsma–Kalker | Yes | No | No |
| Streaming variants | Yes | Limited | No |
| Bit-exact streaming/offline parity | Yes | No | N/A |
| File decoding included | Yes (Symphonia) | Yes (limited) | Yes (FFmpeg) |
| Watermark detection | Yes (AudioSeal) | No | No |
| `no_std + alloc` capable | Yes (host) | No | N/A |
| `bytemuck::Pod` hash types | Yes | No | N/A |
| Built-in resampler | Yes | No | No |
Examples
The examples/ directory will house complete working programs in a future release; for now, the snippets in USAGE.md and the doctests across the public API are the recommended starting point.
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/your-feature`)
- Run tests: `cargo test --all-features`
- Run clippy: `cargo clippy --all-targets --all-features -- -D warnings`
- Run formatter: `cargo fmt --all -- --check`
- Commit your changes
- Push the branch and open a Pull Request
Development Setup
# Clone
# Run all tests
# Run no_std build path
# Generate documentation
RUSTDOCFLAGS="-D warnings"
CI (.github/workflows/ci.yml) runs fmt, clippy, and test jobs in parallel on every push and PR.
License
MIT License — see LICENSE for details.
References
- Avery Wang, An Industrial-Strength Audio Search Algorithm (ISMIR 2003) — Wang landmarks
- Joren Six & Marc Leman, Panako: A Scalable Acoustic Fingerprinting System (ISMIR 2014); 2021 update — triplet β hash
- Jaap Haitsma & Ton Kalker, A Highly Robust Audio Fingerprinting System (ISMIR 2002) — band-power sign bits
- Robin San Roman et al., Proactive Detection of Voice Cloning with Localized Watermarking (AudioSeal, 2024) — watermark model