Production-oriented Rust inference for CED AudioSet sound-event classifiers — load an ONNX model, feed it 16 kHz mono audio, get back ranked RatedSoundEvent predictions with names, ids, and confidences. Long clips are handled via configurable chunking.
## Highlights

- Drop-in CED inference — load any CED AudioSet ONNX model (or use the bundled `tiny` variant) and run it directly on `&[f32]` PCM samples. No Python, no preprocessing pipeline.
- Typed labels, not bare integers — every prediction comes back as an `EventPrediction` carrying a `&'static RatedSoundEvent` from `soundevents-dataset`, so you get the canonical AudioSet name, the `/m/...` id, the model class index, and the confidence in one struct.
- Compile-time class-count guarantee — the `NUM_CLASSES = 527` constant comes from the rated dataset at codegen time. If a model returns the wrong number of classes you get a typed `ClassifierError::UnexpectedClassCount` instead of a silent mismatch.
- Long-clip chunking built in — `classify_chunked`/`classify_all_chunked` window the input at a configurable hop, run inference on each chunk, and aggregate the per-chunk confidences with either `Mean` or `Max`. Defaults match CED's 10 s training window (160 000 samples at 16 kHz).
- Top-k via a tiny min-heap — `classify(samples, k)` does not allocate a full 527-element scores vector to find the top results.
- Bring-your-own model or bundle one — load from a path, from in-memory bytes, or enable the `bundled-tiny` feature to embed `models/tiny.onnx` directly into your binary.
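The bounded min-heap trick behind top-k selection can be sketched in plain std Rust. This is an illustrative standalone version, not the crate's actual implementation; it assumes scores are non-negative confidences, so an `f32`'s bit pattern orders the same way as its value:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Return the k highest-scoring (class_index, score) pairs, best first,
/// keeping a min-heap of at most k candidates instead of sorting all scores.
fn top_k(scores: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut heap: BinaryHeap<Reverse<(u32, usize)>> = BinaryHeap::with_capacity(k + 1);
    for (idx, &s) in scores.iter().enumerate() {
        // Non-negative f32s compare the same as their raw bit patterns.
        heap.push(Reverse((s.to_bits(), idx)));
        if heap.len() > k {
            heap.pop(); // drop the weakest candidate seen so far
        }
    }
    let mut out: Vec<(usize, f32)> = heap
        .into_iter()
        .map(|Reverse((bits, idx))| (idx, f32::from_bits(bits)))
        .collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}

fn main() {
    let scores = [0.1, 0.9, 0.3, 0.7];
    println!("{:?}", top_k(&scores, 2)); // [(1, 0.9), (3, 0.7)]
}
```

Because the heap never holds more than k entries, finding the top k of 527 class scores costs O(527 · log k) with O(k) extra memory.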
## Quick start

```toml
[dependencies]
soundevents = "0.1"
```

```rust
use soundevents::Classifier;
```
## Long clips: chunked inference

`Classifier::classify_chunked` slides a window over the input and aggregates each chunk's per-class confidences. The defaults (10 s window, 10 s hop, mean aggregation) match CED's training setup; tune them for overlap or peak-pooling.
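The windowing-plus-aggregation step can be modeled in a few lines of plain Rust. This is a sketch of the idea with a stand-in scoring closure, not the crate's code; the real `classify_chunked` runs ONNX inference per window:

```rust
/// Split `samples` into windows of `chunk` samples every `hop` samples
/// (final partial window included), score each window with `score_fn`,
/// and mean-pool the per-class scores across windows.
fn classify_chunked_mean(
    samples: &[f32],
    chunk: usize,
    hop: usize,
    num_classes: usize,
    score_fn: impl Fn(&[f32]) -> Vec<f32>,
) -> Vec<f32> {
    let mut acc = vec![0.0f32; num_classes];
    let mut windows = 0usize;
    let mut start = 0usize;
    loop {
        let end = (start + chunk).min(samples.len());
        let scores = score_fn(&samples[start..end]);
        for (a, s) in acc.iter_mut().zip(&scores) {
            *a += *s;
        }
        windows += 1;
        if end == samples.len() {
            break; // last (possibly partial) window consumed
        }
        start += hop;
    }
    for a in &mut acc {
        *a /= windows as f32;
    }
    acc
}

fn main() {
    // Toy "model": the single class's score is the window's mean amplitude.
    let samples = vec![1.0f32; 10];
    let out = classify_chunked_mean(&samples, 4, 4, 1, |w| {
        vec![w.iter().sum::<f32>() / w.len() as f32]
    });
    println!("{:?}", out); // [1.0]
}
```

With a hop smaller than the window you get overlapping chunks; swapping the mean for an elementwise max gives the peak-pooling behavior the `Max` aggregation describes.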
```rust
use soundevents::Classifier;
```
## Models

The four CED variants are sourced from the mispeech Hugging Face organisation, exported to ONNX, and checked into this repo under `soundevents/models/`. You should not normally need to download anything — `git clone` gives you a working classifier out of the box.

| Variant | File | Size | Hugging Face source |
|---|---|---|---|
| tiny | `soundevents/models/tiny.onnx` | 6.4 MB | mispeech/ced-tiny |
| mini | `soundevents/models/mini.onnx` | 10 MB | mispeech/ced-mini |
| small | `soundevents/models/small.onnx` | 22 MB | mispeech/ced-small |
| base | `soundevents/models/base.onnx` | 97 MB | mispeech/ced-base |
All four expose the same input/output contract: mono f32 PCM at 16 kHz in (`SAMPLE_RATE_HZ`), 527-class scores out (`NUM_CLASSES`). They differ only in parameter count and the accuracy/latency trade-off, so you can swap variants without touching application code.
Note — the four ONNX files together are ~135 MB. If you fork this repo and want to keep the working tree slim, consider tracking `soundevents/models/*.onnx` with Git LFS.
## Refreshing models from upstream

If upstream releases new weights, or you cloned without the model files, refetch them with:

```sh
# Requires huggingface_hub: pip install --user huggingface_hub

# Or just one variant
```

The script downloads the `*.onnx` artifact from each `mispeech/ced-*` Hugging Face repo and writes it as `soundevents/models/<variant>.onnx`.
## Bundled tiny model

Enable the `bundled-tiny` feature to embed `models/tiny.onnx` into your binary — useful for CLI tools and self-contained services where you don't want to ship a separate model file.

```toml
soundevents = { version = "0.1", features = ["bundled-tiny"] }
```

```rust
use soundevents::Classifier;

let mut classifier = Classifier::tiny()?;
```
## Features

| Feature | Default | What you get |
|---|---|---|
| `bundled-tiny` | off | Embeds `models/tiny.onnx` into the crate so `Classifier::tiny()` works without an external file. |
The full input/output contract:

| Constant | Value | Meaning |
|---|---|---|
| `SAMPLE_RATE_HZ` | `16_000` | Required input sample rate (mono f32). |
| `DEFAULT_CHUNK_SAMPLES` | `160_000` | Default 10 s window/hop for chunked inference. |
| `NUM_CLASSES` | `527` | Number of CED output classes — derived at compile time from `RatedSoundEvent::events().len()`. |
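The compile-time guarantee behind `NUM_CLASSES` can be mimicked with a plain const assertion — a sketch of the pattern, not the crate's actual codegen:

```rust
// Hypothetical stand-in for the codegen'd constant.
const NUM_CLASSES: usize = 527;

// Evaluated at compile time: the build fails here if the dataset
// and the expected class count ever diverge, instead of at runtime.
const _: () = assert!(NUM_CLASSES == 527, "AudioSet class count changed");

fn main() {
    println!("{NUM_CLASSES}"); // 527
}
```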
## Development
Regenerate the dataset from upstream sources:
Run the test suite:
## License

soundevents is licensed under the terms of both the MIT license and the Apache License (Version 2.0). See LICENSE-APACHE and LICENSE-MIT for details.

Copyright (c) 2026 FinDIT studio authors.