soundevents
Production-oriented Rust inference for CED AudioSet sound-event classifiers — load an ONNX model, feed it 16 kHz mono audio, get back ranked `RatedSoundEvent` predictions with names, ids, and confidences. Long clips are handled via configurable chunking.
§Highlights
- Drop-in CED inference — load any CED AudioSet ONNX model (or use the bundled `tiny` variant) and run it directly on `&[f32]` PCM samples. No Python, no preprocessing pipeline.
- Typed labels, not bare integers — every prediction comes back as an `EventPrediction` carrying a `&'static RatedSoundEvent` from `soundevents-dataset`, so you get the canonical AudioSet name, the `/m/...` id, the model class index, and the confidence in one struct.
- Compile-time class-count guarantee — the `NUM_CLASSES = 527` constant comes from the rated dataset at codegen time. If a model returns the wrong number of classes you get a typed `ClassifierError::UnexpectedClassCount` instead of a silent mismatch.
- Long-clip chunking built in — `classify_chunked`/`classify_all_chunked` window the input at a configurable hop, run inference on each chunk, and aggregate the per-chunk confidences with either `Mean` or `Max`. Defaults match CED’s 10 s training window (160 000 samples at 16 kHz), and fixed-size chunk batches can now be packed into one model call.
- Top-k via a tiny min-heap — `classify(samples, k)` does not allocate a full 527-element scores vector to find the top results.
- Batch-ready low-level API — `predict_raw_scores_batch`, `predict_raw_scores_batch_flat`, `predict_raw_scores_batch_into`, `classify_all_batch`, and `classify_batch` accept equal-length clip batches for service-layer batching.
- Bring-your-own model or bundle one — load from a path, from in-memory bytes, or enable the `bundled-tiny` feature to embed `models/tiny.onnx` directly into your binary.
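The bounded min-heap idea behind top-k selection can be sketched in plain Rust. This is an illustrative sketch, not the crate's internal code, and `top_k_indices` is a hypothetical helper: keep a heap of at most k entries so picking the best k of 527 scores never builds a fully sorted vector.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Indices of the `k` highest scores, best first, in O(n log k).
///
/// Confidences are non-negative, and for non-negative IEEE-754 floats the
/// bit pattern orders the same way as the value, so `to_bits()` works as
/// an `Ord` sort key without a float-ordering wrapper type.
fn top_k_indices(scores: &[f32], k: usize) -> Vec<usize> {
    let mut heap: BinaryHeap<Reverse<(u32, usize)>> = BinaryHeap::with_capacity(k + 1);
    for (i, &s) in scores.iter().enumerate() {
        heap.push(Reverse((s.to_bits(), i)));
        if heap.len() > k {
            heap.pop(); // evict the current minimum; only the best k remain
        }
    }
    // `into_sorted_vec` sorts `Reverse` values ascending, i.e. best score first.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse((_, i))| i)
        .collect()
}

fn main() {
    let scores = [0.1_f32, 0.9, 0.5, 0.7];
    println!("{:?}", top_k_indices(&scores, 2)); // indices of the two best scores
}
```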
§Quick start

```toml
[dependencies]
soundevents = "0.2"
```

```rust
use soundevents::{Classifier, Options};

fn load_mono_16k_audio(_: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    Ok(vec![0.0; 16_000])
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut classifier = Classifier::from_file("soundevents/models/tiny.onnx")?;

    // Bring your own decoder/resampler — soundevents expects mono f32
    // samples at 16 kHz, in [-1.0, 1.0].
    let samples: Vec<f32> = load_mono_16k_audio("clip.wav")?;

    // Top-5 predictions for a clip up to ~10 s long.
    for prediction in classifier.classify(&samples, 5)? {
        println!(
            "{:>5.1}% {:>3} {} ({})",
            prediction.confidence() * 100.0,
            prediction.index(),
            prediction.name(),
            prediction.id(),
        );
    }
    Ok(())
}
```

§Long clips: chunked inference
`Classifier::classify_chunked` slides a window over the input and aggregates each chunk’s per-class confidences. The defaults (10 s window, 10 s hop, mean aggregation) match CED’s training setup; tune them for overlap or peak-pooling.
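Independent of the crate API, the windowing and aggregation arithmetic described here can be sketched in a few lines. `chunk_starts` and `aggregate` are illustrative names, not crate functions, and the crate's internals may differ in detail.

```rust
/// Start offsets of windows of `window` samples taken every `hop` samples,
/// with a final short tail window if trailing samples remain.
fn chunk_starts(len: usize, window: usize, hop: usize) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut start = 0;
    while start < len {
        starts.push(start);
        if start + window >= len {
            break; // this window (possibly a short tail) reaches the end
        }
        start += hop;
    }
    starts
}

/// Reduce one class's per-chunk confidences: max keeps the loudest
/// detection in any single window, mean averages over the whole clip.
fn aggregate(per_chunk: &[f32], use_max: bool) -> f32 {
    if use_max {
        per_chunk.iter().copied().fold(f32::MIN, f32::max)
    } else {
        per_chunk.iter().sum::<f32>() / per_chunk.len() as f32
    }
}

fn main() {
    // 20 s clip at 16 kHz, 10 s window, 5 s hop => windows at 0 s, 5 s, 10 s.
    println!("{:?}", chunk_starts(320_000, 160_000, 80_000));
    println!("{}", aggregate(&[0.2, 0.6, 0.4], true));
}
```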
```rust
use soundevents::{ChunkAggregation, ChunkingOptions, Classifier};

fn load_long_clip() -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    Ok(vec![0.0; 320_000])
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut classifier = Classifier::from_file("soundevents/models/tiny.onnx")?;
    let samples: Vec<f32> = load_long_clip()?;

    let opts = ChunkingOptions::default()
        // 5 s overlap (50%) between adjacent windows
        .with_hop_samples(80_000)
        // Batch up to 4 equal-length windows per session.run()
        .with_batch_size(4)
        // Keep the loudest detection in any window instead of averaging
        .with_aggregation(ChunkAggregation::Max);

    let top3 = classifier.classify_chunked(&samples, 3, opts)?;
    for prediction in top3 {
        println!("{}: {:.2}", prediction.name(), prediction.confidence());
    }
    Ok(())
}
```

§Models
The four CED variants are sourced from the mispeech Hugging Face organisation, exported to ONNX, and checked into this repo under `soundevents/models/`. You should not normally need to download anything — `git clone` gives you a working classifier out of the box.
| Variant | File | Size | Hugging Face source |
|---|---|---|---|
| tiny | `soundevents/models/tiny.onnx` | 6.4 MB | mispeech/ced-tiny |
| mini | `soundevents/models/mini.onnx` | 10 MB | mispeech/ced-mini |
| small | `soundevents/models/small.onnx` | 22 MB | mispeech/ced-small |
| base | `soundevents/models/base.onnx` | 97 MB | mispeech/ced-base |
All four expose the same input/output contract: mono f32 PCM at 16 kHz in, 527-class scores out (`SAMPLE_RATE_HZ` / `NUM_CLASSES`). They differ only in parameter count and accuracy/latency trade-off, so you can swap variants without touching application code.
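Meeting that contract means decoding (and, if needed, resampling) before calling the classifier. As a minimal sketch of the conversion step only, assuming the source is already at 16 kHz (resampling needs a dedicated resampler), interleaved stereo i16 PCM can be downmixed to mono f32 in [-1.0, 1.0]; `stereo_i16_to_mono_f32` is a hypothetical helper, not part of the crate:

```rust
/// Downmix interleaved stereo i16 PCM to mono f32 in [-1.0, 1.0].
/// Assumes the stream is already sampled at 16 kHz.
fn stereo_i16_to_mono_f32(interleaved: &[i16]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|frame| {
            // Average both channels in i32 to avoid i16 overflow,
            // then scale by 1/32768 to land inside [-1.0, 1.0].
            let mixed = (i32::from(frame[0]) + i32::from(frame[1])) / 2;
            mixed as f32 / 32_768.0
        })
        .collect()
}

fn main() {
    // One silent frame (channels cancel) and one near-full-scale frame.
    let samples = stereo_i16_to_mono_f32(&[16_384, -16_384, 32_767, 32_767]);
    println!("{:?}", samples);
}
```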
Note — the four ONNX files together are ~135 MB. If you fork this repo and want to keep the working tree slim, consider tracking `soundevents/models/*.onnx` with Git LFS.
§Refreshing models from upstream
If upstream releases new weights, or you cloned without the model files, refetch them with:
```sh
# Requires huggingface_hub: pip install --user huggingface_hub
./scripts/download_models.sh

# Or just one variant
./scripts/download_models.sh tiny
```

The script downloads the `*.onnx` artifact from each `mispeech/ced-*` Hugging Face repo and writes it as `soundevents/models/<variant>.onnx`.
See THIRD_PARTY_NOTICES.md for upstream model sources and attribution details.
§Bundled tiny model
Enable the `bundled-tiny` feature to embed `models/tiny.onnx` into your binary — useful for CLI tools and self-contained services where you don’t want to ship a separate model file.
```toml
soundevents = { version = "0.2", features = ["bundled-tiny"] }
```

```rust
use soundevents::{Classifier, Options};

let mut classifier = Classifier::tiny(Options::default())?;
```

§Features
| Feature | Default | What you get |
|---|---|---|
| `bundled-tiny` | No | Embeds `models/tiny.onnx` into the crate so `Classifier::tiny()` works without an external file. |
The full input/output contract:
| Constant | Value | Meaning |
|---|---|---|
| `SAMPLE_RATE_HZ` | `16_000` | Required input sample rate (mono f32). |
| `DEFAULT_CHUNK_SAMPLES` | `160_000` | Default 10 s window/hop for chunked inference. |
| `NUM_CLASSES` | `527` | Number of CED output classes — derived at compile time from `RatedSoundEvent::events().len()`. |
For low-level batching, every clip passed to the `predict_raw_scores_batch*` / `classify_*_batch` methods must be non-empty and have the same sample count. `predict_raw_scores_batch_flat` returns one row-major `Vec<f32>`, and `predict_raw_scores_batch_into` lets callers reuse their own output buffer to avoid per-call result allocations. `classify_chunked` applies the same equal-length restriction internally when `ChunkingOptions::batch_size() > 1`; fixed-size windows satisfy it naturally, and the final short tail chunk automatically falls back to a smaller batch.
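The row-major layout means clip `i`'s score for class `c` sits at offset `i * NUM_CLASSES + c` in the flat buffer. A small sketch of that addressing, using a stand-in class count and illustrative helper names (not crate API):

```rust
/// One clip's contiguous score row inside a row-major [batch, classes] buffer.
fn clip_scores(flat: &[f32], num_classes: usize, clip: usize) -> &[f32] {
    &flat[clip * num_classes..(clip + 1) * num_classes]
}

/// Score of `class` for `clip` in the same buffer.
fn score_at(flat: &[f32], num_classes: usize, clip: usize, class: usize) -> f32 {
    flat[clip * num_classes + class]
}

fn main() {
    // Two clips, three classes (stand-in for NUM_CLASSES = 527).
    let flat = [0.0_f32, 0.1, 0.2, 1.0, 1.1, 1.2];
    println!("{:?}", clip_scores(&flat, 3, 1)); // second clip's row
    println!("{}", score_at(&flat, 3, 1, 2)); // its third class score
}
```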
§Development
Regenerate the dataset from upstream sources:
```sh
cargo xtask codegen
```

Run the test suite:

```sh
cargo test
```

§License
soundevents is licensed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for details. Bundled third-party model attributions and source licenses are documented in THIRD_PARTY_NOTICES.md.
Copyright (c) 2026 FinDIT studio authors.
Structs§
- `ChunkingOptions` - Options for chunked inference over long clips.
- `Classifier` - CED sound event classifier.
- `EventPrediction` - A single classification result with both model-space and ontology-space metadata.
- `Options` - Options for constructing a `Classifier` from an ONNX model on disk.

Enums§
- `ChunkAggregation` - Controls how chunked inference aggregates chunk confidences.
- `ClassifierError` - Errors from `Classifier` operations.

Constants§
- `DEFAULT_CHUNK_SAMPLES` - The default window size used by the chunked inference helpers: 10 seconds at 16 kHz.
- `NUM_CLASSES` - Number of model output classes.
- `SAMPLE_RATE_HZ` - The expected input sample rate for CED models.