audiofp

Audio fingerprinting library for Rust with classical landmark and band-power algorithms, streaming extraction, file decoding, and AudioSeal-compatible watermark detection.

Overview

audiofp provides three complementary classical fingerprinters for music identification, each with offline and streaming variants, plus an AudioSeal-compatible watermark detector:

Method	Use Case	Sample Rate	Frame Rate	Output Size
Wang	Music ID, Shazam-style matching	8 kHz	62.5 fps	~2.4 KB/s (fan-out 10)
Panako	Music ID with ±5 % tempo robustness	8 kHz	62.5 fps	~2.0 KB/s (fan-out 5)
Haitsma	Compact dense IDs, fastest extraction	5 kHz	78.125 fps	312 B/s
Streaming	Real-time hash emission	(per algorithm)	(per algorithm)	Bit-exact offline parity
Watermark	AudioSeal detection (BYO ONNX)	16 kHz	(per model)	Detection + 16-bit message
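
The frame rates and output sizes in the table follow from simple arithmetic. The sketch below reproduces them under stated assumptions: a hop of 128 samples for the 8 kHz algorithms and 64 samples for Haitsma (inferred from the listed frame rates, not confirmed API constants), one 4-byte word per Haitsma frame, and for Wang an upper bound of 30 anchors per second, each paired with up to 10 targets at 8 bytes per hash. All function names are hypothetical and not part of audiofp.

```rust
/// Frames per second produced by an STFT with the given hop size.
pub fn frames_per_sec(sample_rate_hz: f64, hop_samples: f64) -> f64 {
    sample_rate_hz / hop_samples
}

/// Bytes per second for a dense fingerprint that emits one fixed-size word per frame.
pub fn dense_bytes_per_sec(fps: f64, bytes_per_frame: f64) -> f64 {
    fps * bytes_per_frame
}

/// Upper bound on bytes per second for a landmark fingerprint:
/// every peak acts as an anchor and is paired with `fan_out` targets.
pub fn landmark_bytes_per_sec(peaks_per_sec: f64, fan_out: f64, bytes_per_hash: f64) -> f64 {
    peaks_per_sec * fan_out * bytes_per_hash
}
```

With these assumptions, `frames_per_sec(8000.0, 128.0)` gives the 62.5 fps listed for Wang and Panako, and `dense_bytes_per_sec(78.125, 4.0)` gives the roughly 312 B/s listed for Haitsma.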

Perfect for:

Music identification ("what is this song?")
Audio deduplication at scale
Royalty / rights enforcement against re-encoded content
Cover and remix detection (with neural fingerprinting on the roadmap)
Watermark verification on generative-AI audio

Features

Three Classical Algorithms - Wang (landmark pairs) + Panako (triplet hashes with tempo β) + Haitsma–Kalker (32-bit/frame band sign)
Streaming + Offline Variants - Every fingerprinter has a StreamingFingerprinter impl with bit-exact parity to the offline extract
Bit-Exact Determinism - Same input always produces the same hashes; verified down to 1-sample-per-push streaming chunks
bytemuck::Pod Hash Types - Persist hashes directly to mmap'd files or ship over a C ABI without serialization
Audio File Decoding - MP3, FLAC, WAV, OGG-Vorbis, AAC-in-MP4, raw PCM via Symphonia
High-Quality Resampling - Built-in windowed-sinc Kaiser resampler with auto anti-aliasing cutoff
Watermark Detection - AudioSeal-compatible ONNX wrapper (Tract backend)
DSP Primitives Reusable - Public dsp::stft, dsp::mel, dsp::peaks, dsp::resample, dsp::windows
Allocation-Free Hot Path - Streaming push reuses pre-allocated scratch after warmup
no_std + alloc Capable - DSP and classical fingerprinters compile without std (host-only today; bare-metal in roadmap)
Feature-Gated Heavy Deps - Symphonia and Tract both opt-in via Cargo features
Optional mimalloc - Single-flag opt-in to install mimalloc as the global allocator

Installation

[dependencies]
audiofp = "0.1"

Feature Flags

Feature	Default	Description
`std`	Yes	Enables `audiofp::io` (Symphonia file decoder)
`watermark`	No	Enables `audiofp::watermark` via Tract ONNX runtime
`neural`	No	Reserved for the upcoming Phase 5 neural fingerprinter
`mimalloc`	No	Installs `mimalloc::MiMalloc` as the process-wide `#[global_allocator]`

Minimal build (no_std + alloc, DSP and classical only):

[dependencies]
audiofp = { version = "0.1", default-features = false }

With watermark detection (pulls in Tract):

[dependencies]
audiofp = { version = "0.1", features = ["watermark"] }

With mimalloc for a faster global allocator:

[dependencies]
audiofp = { version = "0.1", features = ["mimalloc"] }

Quick Start

use audiofp::classical::Wang;
use audiofp::io::decode_to_mono_at;
use audiofp::{AudioBuffer, Fingerprinter, SampleRate};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Decode any supported file format and resample to Wang's 8 kHz.
    let samples = decode_to_mono_at("song.mp3", 8_000)?;

    let mut wang = Wang::default();
    let buf = AudioBuffer { samples: &samples, rate: SampleRate::HZ_8000 };
    let fp = wang.extract(buf)?;

    println!("{} hashes at {:.1} fps", fp.hashes.len(), fp.frames_per_sec);
    for h in fp.hashes.iter().take(5) {
        println!("  t_anchor={} hash={:08x}", h.t_anchor, h.hash);
    }

    Ok(())
}

Streaming Mode

use audiofp::classical::StreamingWang;
use audiofp::StreamingFingerprinter;

fn main() {
    let mut s = StreamingWang::default();

    // Stand-in for incoming 8 kHz mono f32 chunks (e.g. 128 samples = 16 ms each).
    for chunk in audio_chunks() {
        for (timestamp, hash) in s.push(&chunk) {
            println!("{:?} {:08x}", timestamp, hash.hash);
        }
    }

    // Drain whatever's pending at end-of-stream.
    for (timestamp, hash) in s.flush() {
        println!("{:?} {:08x}", timestamp, hash.hash);
    }

    println!("latency: {} ms", s.latency_ms());
}
// Placeholder chunk source that yields nothing; replace it with a real capture or decode loop.
fn audio_chunks() -> impl Iterator<Item = Vec<f32>> { std::iter::empty() }
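
For experimenting without a live audio source, one option is to slice an already-decoded buffer into fixed-size chunks. This is a minimal sketch using only the standard library; `chunk_samples` is a hypothetical helper, not part of audiofp, and a chunk length of 128 simply mirrors the 16 ms example in the snippet.

```rust
/// Split a mono buffer into owned chunks of at most `chunk_len` samples.
/// The final chunk may be shorter than `chunk_len`.
pub fn chunk_samples<'a>(
    samples: &'a [f32],
    chunk_len: usize,
) -> impl Iterator<Item = Vec<f32>> + 'a {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    samples.chunks(chunk_len).map(|c| c.to_vec())
}
```

Each yielded `Vec<f32>` can be passed to `push` by reference, in the same way as the chunks in the loop shown earlier.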

Documentation

For complete API reference and usage examples, see USAGE.md.

Architecture

Fingerprint Types

Each algorithm emits a strongly-typed result whose hash elements are bytemuck::Pod-castable:

Wang offline                         Panako offline
┌──────────────────────────┐         ┌──────────────────────────┐
│ WangFingerprint          │         │ PanakoFingerprint        │
│   hashes: Vec<WangHash>  │         │   hashes: Vec<PanakoHash>│
│   frames_per_sec: f32    │         │   frames_per_sec: f32    │
└──────────────────────────┘         └──────────────────────────┘

WangHash (8 bytes, repr(C))          PanakoHash (16 bytes, repr(C))
├── hash: u32                        ├── hash: u32
└── t_anchor: u32                    ├── t_anchor: u32
                                     ├── t_b: u32
                                     └── t_c: u32

Haitsma offline
┌──────────────────────────┐
│ HaitsmaFingerprint       │
│   frames: Vec<u32>       │   one u32 per spectrogram frame ≥ 1
│   frames_per_sec: f32    │
└──────────────────────────┘
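
To make the sizes in the diagram concrete, here is a standard-library-only sketch that mirrors the two hash structs and checks their layout. The `*Sketch` types and helper functions are illustrative stand-ins, not the crate's own definitions, and the little-endian byte order in `wang_to_le_bytes` is a choice made for this example.

```rust
use std::mem::{align_of, size_of};

/// Stand-in with the same field order as the WangHash diagram.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct WangHashSketch {
    pub hash: u32,
    pub t_anchor: u32,
}

/// Stand-in with the same field order as the PanakoHash diagram.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PanakoHashSketch {
    pub hash: u32,
    pub t_anchor: u32,
    pub t_b: u32,
    pub t_c: u32,
}

/// (size, alignment) of the two sketch structs, in bytes.
pub fn layout_report() -> [(usize, usize); 2] {
    [
        (size_of::<WangHashSketch>(), align_of::<WangHashSketch>()),
        (size_of::<PanakoHashSketch>(), align_of::<PanakoHashSketch>()),
    ]
}

/// Serialise one hash explicitly as little-endian bytes: `hash`, then `t_anchor`.
pub fn wang_to_le_bytes(h: &WangHashSketch) -> [u8; 8] {
    let mut out = [0u8; 8];
    out[..4].copy_from_slice(&h.hash.to_le_bytes());
    out[4..].copy_from_slice(&h.t_anchor.to_le_bytes());
    out
}
```

Because both structs consist only of `u32` fields, `repr(C)` leaves no padding between them. A zero-copy cast such as `bytemuck::cast_slice` would expose the same bytes in native byte order, so the explicit little-endian form is the safer choice when files move between architectures.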

Algorithm Pipeline

Decode — Parse any supported format (MP3, FLAC, WAV, OGG-Vorbis, AAC-in-MP4, PCM) via Symphonia and downmix to mono f32
Resample — Built-in windowed-sinc Kaiser resampler (default 32 taps, β=8.6) brings the audio to the algorithm's required rate
STFT — realfft-backed real-input transform with reusable scratch; Hann window, configurable hop and n_fft
Algorithm-specific extraction:
- Wang: dB log-mag → 31×31 peak picker (capped at 30/s) → anchor-target landmark pairs in Δt ∈ [1, 63], |Δf| ≤ 64
- Panako: same front-end → triplet enumeration in cone Δt < 96, |Δf| < 96 → tempo-invariant β packing
- Haitsma: 33 log-spaced bands (300–2000 Hz) → 32 sign bits per frame from band-difference deltas
Streaming variants mirror offline pipelines and emit hashes once each anchor's full lookahead has elapsed, guaranteeing bit-exact equivalence under arbitrary chunking
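
The peak-picking step can be illustrated in one dimension. The sketch below computes a sliding-window maximum with a monotonic deque and marks samples that equal the maximum of their neighbourhood. It is an illustration of the general technique, not the crate's implementation; the function names are hypothetical, and a 2-D picker such as the 31×31 one could in principle apply the same pass along time and then along frequency, since a rectangular max filter is separable.

```rust
use std::collections::VecDeque;

/// Maximum over a centred window of `2 * radius + 1` samples at every position,
/// truncated at the edges. Each index is pushed and popped at most once.
pub fn sliding_max(x: &[f32], radius: usize) -> Vec<f32> {
    let n = x.len();
    let mut out = Vec::with_capacity(n);
    // Indices into `x`; their values decrease strictly from front to back.
    let mut dq: VecDeque<usize> = VecDeque::new();
    let mut next = 0; // next index to admit into the window
    for i in 0..n {
        let hi = (i + radius).min(n - 1);
        while next <= hi {
            // Drop candidates that the new sample dominates.
            while let Some(&j) = dq.back() {
                if x[j] <= x[next] {
                    dq.pop_back();
                } else {
                    break;
                }
            }
            dq.push_back(next);
            next += 1;
        }
        // Drop candidates that fell out of the left edge of the window.
        let lo = i.saturating_sub(radius);
        while let Some(&j) = dq.front() {
            if j < lo {
                dq.pop_front();
            } else {
                break;
            }
        }
        out.push(x[dq[0]]);
    }
    out
}

/// Indices whose value equals the maximum of their neighbourhood.
pub fn local_peaks(x: &[f32], radius: usize) -> Vec<usize> {
    let m = sliding_max(x, radius);
    (0..x.len()).filter(|&i| x[i] == m[i]).collect()
}
```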

Hash Layouts

WangHash::hash (32 bits)
[31..23]  f_a_q  9 bits, anchor frequency (quantised to 512 buckets)
[22..14]  f_b_q  9 bits, target frequency (same quantisation)
[13.. 0]  Δt    14 bits, frames between anchor and target
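
The Wang layout above can be written as a pair of shift-and-mask helpers. This is a minimal sketch of the bit positions listed; `pack_wang` and `unpack_wang` are hypothetical names, not audiofp functions.

```rust
/// Pack quantised anchor frequency, target frequency and frame delta
/// into the 9 + 9 + 14 bit layout.
pub fn pack_wang(f_a_q: u32, f_b_q: u32, dt: u32) -> u32 {
    debug_assert!(f_a_q < 512 && f_b_q < 512 && dt < (1 << 14));
    ((f_a_q & 0x1FF) << 23) | ((f_b_q & 0x1FF) << 14) | (dt & 0x3FFF)
}

/// Recover `(f_a_q, f_b_q, dt)` from a packed hash.
pub fn unpack_wang(h: u32) -> (u32, u32, u32) {
    ((h >> 23) & 0x1FF, (h >> 14) & 0x1FF, h & 0x3FFF)
}
```

For example, `pack_wang(1, 2, 3)` yields `0x0080_8003`, and `unpack_wang` returns the original triple.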

PanakoHash::hash (32 bits)
[31..30]  sign       2 bits, signs of Δf_ab and Δf_bc
[29..28]  mag_order  2 bits, which of {a, b, c} has the largest magnitude
[27..23]  β          5 bits, round((t_c - t_b) / (t_c - t_a) · 31)
[22..15]  Δf_ab      8 bits signed, clamped to ±127
[14.. 7]  Δf_bc      8 bits signed, clamped to ±127
[ 6.. 0]  reserved   7 bits, zero
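
A sketch of the Panako packing, following the bit positions and the β formula listed above. Several details are assumptions made for illustration: the higher sign bit is taken to flag a negative Δf_ab and the lower a negative Δf_bc, the 8-bit deltas are stored in two's complement, and `mag_order` is an index 0..=2. The function names are hypothetical.

```rust
/// Quantise the time ratio to 5 bits: round((t_c - t_b) / (t_c - t_a) * 31).
pub fn beta_q(t_a: u32, t_b: u32, t_c: u32) -> u32 {
    assert!(t_a <= t_b && t_b <= t_c && t_a < t_c, "need t_a <= t_b <= t_c and t_a < t_c");
    let ratio = (t_c - t_b) as f32 / (t_c - t_a) as f32;
    (ratio * 31.0).round() as u32
}

/// Pack a triplet into the 2 + 2 + 5 + 8 + 8 + 7 bit layout (low 7 bits stay zero).
pub fn pack_panako(t_a: u32, t_b: u32, t_c: u32, df_ab: i32, df_bc: i32, mag_order: u32) -> u32 {
    // Assumed sign encoding: bit 31 for df_ab < 0, bit 30 for df_bc < 0.
    let sign = (((df_ab < 0) as u32) << 1) | ((df_bc < 0) as u32);
    // Clamp to +/-127, then keep the two's-complement byte.
    let ab = df_ab.clamp(-127, 127) as i8 as u8 as u32;
    let bc = df_bc.clamp(-127, 127) as i8 as u8 as u32;
    (sign << 30) | ((mag_order & 0x3) << 28) | (beta_q(t_a, t_b, t_c) << 23) | (ab << 15) | (bc << 7)
}
```

Because β depends only on a ratio of time differences, stretching all three timestamps by the same factor leaves it unchanged, which is where the tempo tolerance comes from.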

Haitsma frame (32 bits, "MSB-zero" packing)
bit 31 → band 0,  bit 0 → band 31
F[n][b] = ((E[n][b] − E[n][b+1]) − (E[n−1][b] − E[n−1][b+1])) > 0
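
The sign-bit rule above translates directly into a loop over the 32 adjacent band pairs. This sketch takes the 33 band energies of the previous and current frames and follows the stated bit order (band 0 in bit 31); `haitsma_frame` is a hypothetical name, and how the band energies are computed is outside its scope.

```rust
/// Derive one 32-bit frame word from two consecutive frames of 33 band energies.
pub fn haitsma_frame(prev: &[f32; 33], cur: &[f32; 33]) -> u32 {
    let mut word = 0u32;
    for b in 0..32 {
        // Difference across adjacent bands, then across consecutive frames.
        let d = (cur[b] - cur[b + 1]) - (prev[b] - prev[b + 1]);
        if d > 0.0 {
            word |= 1u32 << (31 - b); // band 0 maps to bit 31
        }
    }
    word
}
```

The first frame has no predecessor, which is why the fingerprint holds one word per frame starting from frame 1.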

Performance

Measured on Intel i5-1135G7 (4 cores, 8 threads, 2.40 GHz) with cargo bench --bench extract:

Algorithm	2 s of audio	5 s	30 s	Realtime factor (30 s)
`Wang`	5.6 ms	15.9 ms	109 ms	275×
`Panako`	5.8 ms	15.7 ms	109 ms	275×
`Haitsma`	3.1 ms	9.0 ms	65 ms	462×
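
The realtime factor in the last column is the audio duration divided by the extraction time. A minimal sketch of that arithmetic, with a hypothetical helper name:

```rust
/// How many seconds of audio are processed per second of wall-clock time.
pub fn realtime_factor(audio_secs: f64, elapsed_ms: f64) -> f64 {
    audio_secs * 1000.0 / elapsed_ms
}
```

For the 30 s column, 109 ms corresponds to about 275× and 65 ms to about 462×.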

Hot-path design notes:

All three classical fingerprinters share the same Hann-windowed STFT, and Wang and Panako also share a Lemire monotonic-deque peak picker (amortised O(N · M) over an N × M spectrogram), so cost is dominated by the FFT.
Streaming push reuses pre-allocated scratch; no allocation per frame after the initial ring is sized.
SincResampler with the default 32-tap Kaiser kernel costs O(2 · half_taps) per output sample, or O(N · 2 · half_taps) for N output samples, with Bessel I₀(β) precomputed.
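
For readers unfamiliar with Kaiser-windowed sinc kernels, the sketch below shows the standard formulas: a power-series evaluation of the modified Bessel function I₀, the Kaiser window, and a single low-pass tap. It is a textbook construction under stated assumptions, not the crate's SincResampler; the function names and the normalised `cutoff` parameter (a fraction of Nyquist) are hypothetical.

```rust
/// Modified Bessel function of the first kind, order 0, via its power series.
pub fn bessel_i0(x: f64) -> f64 {
    let half = x / 2.0;
    let mut term = 1.0; // (x/2)^k / k!
    let mut sum = 1.0;
    for k in 1..=50 {
        term *= half / k as f64;
        let t2 = term * term;
        sum += t2;
        if t2 < 1e-18 * sum {
            break;
        }
    }
    sum
}

/// Kaiser window at normalised position `r` in [-1, 1], where 0 is the centre.
pub fn kaiser(r: f64, beta: f64) -> f64 {
    if r.abs() > 1.0 {
        return 0.0;
    }
    bessel_i0(beta * (1.0 - r * r).sqrt()) / bessel_i0(beta)
}

/// One windowed-sinc tap at offset `t` (in input samples) from the kernel centre.
pub fn sinc_tap(t: f64, half_taps: f64, cutoff: f64, beta: f64) -> f64 {
    let x = std::f64::consts::PI * cutoff * t;
    let sinc = if x.abs() < 1e-12 { 1.0 } else { x.sin() / x };
    cutoff * sinc * kaiser(t / half_taps, beta)
}
```

The denominator `bessel_i0(beta)` is the constant that a resampler would precompute once, since it does not depend on the tap position.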

Streaming type	`latency_ms()`	Notes
`StreamingWang`	2 256 ms	Includes 1 s for per-second adaptive peak thresholding
`StreamingPanako`	2 784 ms	Wider target zone (96 frames vs Wang's 63)
`StreamingHaitsma`	409 ms	No peak picker → bounded by `n_fft / sr`
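
The latency figures are consistent with frame arithmetic. Assuming a 128-sample hop at 8 kHz (16 ms per frame, inferred from the 62.5 fps figure), the 33-frame difference between the Panako and Wang target zones accounts for the 528 ms gap between their latencies. The Haitsma figure matches a 2048-sample FFT at 5 kHz; that `n_fft` value is an inference from the table, not a documented constant. The helper names are hypothetical.

```rust
/// Duration of `frames` STFT hops in milliseconds.
pub fn frames_to_ms(frames: u32, hop_samples: u32, sample_rate_hz: u32) -> f64 {
    frames as f64 * hop_samples as f64 * 1000.0 / sample_rate_hz as f64
}

/// Duration of one analysis window of `n_fft` samples in milliseconds.
pub fn window_ms(n_fft: u32, sample_rate_hz: u32) -> f64 {
    n_fft as f64 * 1000.0 / sample_rate_hz as f64
}
```

`frames_to_ms(96, 128, 8000) - frames_to_ms(63, 128, 8000)` is 528 ms, which equals 2 784 ms minus 2 256 ms.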

Run benchmarks for your own host:

cargo bench --bench extract
cargo bench --bench extract -- --save-baseline main   # save for diffing later

Memory Safety

Sample-rate-strict APIs reject mismatched inputs with AfpError::UnsupportedSampleRate
Audio length checks reject buffers shorter than each algorithm's minimum (≥ 2 s)
Allocation-free streaming hot path after warmup (no Vec::push in the inner loop)
bytemuck::Pod derive on hash types is sound: each hash struct is repr(C) and consists only of u32 fields (WangHash: 8 bytes, PanakoHash: 16 bytes), so it contains no implicit padding

Determinism

Identical inputs → identical outputs — same audio, same fingerprinter, same config produces bit-for-bit identical hashes on every call and every supported target
Stable algorithm IDs — Fingerprinter::name() returns versioned strings ("wang-v1", "panako-v2", "haitsma-v1"); a future major bump that changes hash bytes will change the version suffix
Stable hash layouts — bit positions in WangHash::hash, PanakoHash::hash, and Haitsma frames are stable across patch and minor versions inside 0.x
Verified streaming/offline parity — the test suite feeds randomised chunk sequences (down to 1 sample per push) through the streaming impl and asserts the output hash multiset matches extract
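
The parity property can be illustrated with a toy example that is independent of audiofp. Below, an offline framer and a buffering streaming framer compute the same per-frame value (a plain sum stands in for a hash), and the streaming output is identical however the input is chunked. All names are hypothetical, and unlike the real hot path this toy allocates freely.

```rust
/// Offline: sum of each full `frame_len` window, advancing by `hop`.
pub fn offline_frames(x: &[f32], frame_len: usize, hop: usize) -> Vec<f32> {
    let mut out = Vec::new();
    let mut start = 0;
    while start + frame_len <= x.len() {
        out.push(x[start..start + frame_len].iter().sum::<f32>());
        start += hop;
    }
    out
}

/// Streaming: buffers input and emits a value as soon as a full frame is available.
pub struct StreamingFramer {
    buf: Vec<f32>,
    frame_len: usize,
    hop: usize,
}

impl StreamingFramer {
    pub fn new(frame_len: usize, hop: usize) -> Self {
        assert!(hop > 0 && hop <= frame_len, "need 0 < hop <= frame_len");
        StreamingFramer { buf: Vec::new(), frame_len, hop }
    }

    /// Accept a chunk of any length and return the frames it completes.
    pub fn push(&mut self, chunk: &[f32]) -> Vec<f32> {
        self.buf.extend_from_slice(chunk);
        let mut out = Vec::new();
        while self.buf.len() >= self.frame_len {
            out.push(self.buf[..self.frame_len].iter().sum::<f32>());
            self.buf.drain(..self.hop); // advance by one hop
        }
        out
    }
}
```

Both paths sum the same samples in the same order, so the results match bit for bit. A parity test then only needs to vary the chunk size, including one sample per push, and compare against the offline output.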

Robustness

Codec-tolerant by design — Wang and Panako are spectral-peak based; Haitsma is band-power-difference based. All three are intended to survive lossy re-encoding (MP3 / AAC / Opus) and modest noise. Quantitative robustness benchmarks against a held-out corpus are in the roadmap.
Mono only — multi-channel inputs must be downmixed by the caller (the file decoder does this for you).
Sample-rate-strict — each fingerprinter requires its native rate (8 kHz / 5 kHz). Resample with dsp::resample::SincResampler or decode_to_mono_at if your source differs.
Resilient decoder — recoverable per-packet failures inside Symphonia are silently skipped so a single corrupt block doesn't kill a whole-file decode.
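
If you bypass the file decoder and hold interleaved multi-channel samples, a simple equal-weight downmix is one way to meet the mono requirement. This is a generic sketch; `downmix_to_mono` is a hypothetical helper, and the crate's decoder may weight channels differently.

```rust
/// Average each interleaved frame of `channels` samples into one mono sample.
/// A trailing partial frame, if any, is dropped.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channels must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```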

Comparison with Alternatives

Feature	audiofp	chromaprint-rust	dejavu (Python)
Pure Rust	Yes	No (FFI to C lib)	No
Wang landmarks	Yes	No	Yes
Panako triplets (tempo-robust)	Yes	No	No
Haitsma–Kalker	Yes	No	No
Streaming variants	Yes	Limited	No
Bit-exact streaming/offline parity	Yes	No	N/A
File decoding included	Yes (Symphonia)	Yes (limited)	Yes (FFmpeg)
Watermark detection	Yes (AudioSeal)	No	No
`no_std + alloc` capable	Yes (host)	No	N/A
`bytemuck::Pod` hash types	Yes	No	N/A
Built-in resampler	Yes	No	No

Examples

The examples/ directory will house complete working programs in a future release; for now, the snippets in USAGE.md and the doctests across the public API are the recommended starting point.

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/your-feature)
Run tests: cargo test --all-features
Run clippy: cargo clippy --all-targets --all-features -- -D warnings
Run formatter: cargo fmt --all -- --check
Commit your changes
Push the branch and open a Pull Request

Development Setup

# Clone
git clone https://github.com/themankindproject/audiofp
cd audiofp

# Run all tests
cargo test --all-features

# Run no_std build path
cargo build --no-default-features

# Generate documentation
RUSTDOCFLAGS="-D warnings" cargo doc --all-features --no-deps --open

CI (.github/workflows/ci.yml) runs fmt, clippy, and test jobs in parallel on every push and PR.

License

MIT License — see LICENSE for details.

References

Avery Wang, An Industrial-Strength Audio Search Algorithm (ISMIR 2003) — Wang landmarks
Joren Six & Marc Leman, Panako: A Scalable Acoustic Fingerprinting System (ISMIR 2014); 2021 update — triplet β hash
Jaap Haitsma & Ton Kalker, A Highly Robust Audio Fingerprinting System (ISMIR 2002) — band-power sign bits
Robin San Roman et al., Proactive Detection of Voice Cloning with Localized Watermarking (AudioSeal, 2024) — watermark model

audiofp 0.1.1