# soundevents

Production-oriented Rust inference for CED AudioSet sound-event classifiers — load an ONNX model, feed it 16 kHz mono audio, and get back ranked `RatedSoundEvent` predictions with names, ids, and confidences. Long clips are handled via configurable chunking.
## Highlights

- Drop-in CED inference — load any CED AudioSet ONNX model (or use the bundled `tiny` variant) and run it directly on `&[f32]` PCM samples. No Python, no preprocessing pipeline.
- Typed labels, not bare integers — every prediction comes back as an `EventPrediction` carrying a `&'static RatedSoundEvent` from `soundevents-dataset`, so you get the canonical AudioSet name, the `/m/...` id, the model class index, and the confidence in one struct.
- Compile-time class-count guarantee — the `NUM_CLASSES = 527` constant comes from the rated dataset at codegen time. If a model returns the wrong number of classes you get a typed `ClassifierError::UnexpectedClassCount` instead of a silent mismatch.
- Long-clip chunking built in — `classify_chunked` / `classify_all_chunked` window the input at a configurable hop, run inference on each chunk, and aggregate the per-chunk confidences with either `Mean` or `Max`. Defaults match CED’s 10 s training window (160 000 samples at 16 kHz), and fixed-size chunk batches can now be packed into one model call.
- Top-k via a tiny min-heap — `classify(samples, k)` does not allocate a full 527-element scores vector to find the top results.
- Batch-ready low-level API — `predict_raw_scores_batch`, `predict_raw_scores_batch_flat`, `predict_raw_scores_batch_into`, `classify_all_batch`, and `classify_batch` accept equal-length clip batches for service-layer batching.
- Bring-your-own model or bundle one — load from a path, from in-memory bytes, or enable the `bundled-tiny` feature to embed `models/tiny.onnx` directly into your binary.
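The "top-k via a tiny min-heap" bullet can be illustrated without the crate itself. Below is a self-contained sketch of that selection strategy, assuming non-negative confidence scores (so `f32` bit patterns order monotonically); it is not the crate's actual implementation:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Return the indices and values of the `k` highest scores without
/// sorting or cloning the full score slice: keep a k-sized min-heap of
/// (score, index) pairs and evict the smallest as better scores arrive.
fn top_k(scores: &[f32], k: usize) -> Vec<(usize, f32)> {
    // f32 is not Ord; for non-negative floats the IEEE 754 bit pattern
    // is monotonic, so we compare via `to_bits` keys.
    let mut heap: BinaryHeap<Reverse<(u32, usize)>> = BinaryHeap::with_capacity(k);
    for (i, &s) in scores.iter().enumerate() {
        let key = s.to_bits();
        if heap.len() < k {
            heap.push(Reverse((key, i)));
        } else if let Some(&Reverse((min_key, _))) = heap.peek() {
            if key > min_key {
                // A better score arrived: drop the current minimum.
                heap.pop();
                heap.push(Reverse((key, i)));
            }
        }
    }
    let mut out: Vec<(usize, f32)> = heap
        .into_iter()
        .map(|Reverse((key, i))| (i, f32::from_bits(key)))
        .collect();
    // Highest confidence first.
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}

fn main() {
    let scores = [0.1_f32, 0.9, 0.05, 0.7, 0.3];
    // Keep only the two best (index, score) pairs.
    println!("{:?}", top_k(&scores, 2)); // [(1, 0.9), (3, 0.7)]
}
```

This keeps the memory cost at O(k) regardless of the 527-class output width, which is why the crate can skip allocating a full scores vector for small `k`.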
## Quick start

```toml
[dependencies]
soundevents = "0.3"
```

## Models
The four CED variants are sourced from the mispeech Hugging Face organisation, exported to ONNX, and checked into this repo under `soundevents/models/`. You should not normally need to download anything — `git clone` gives you a working classifier out of the box.
| Variant | File | Size | Hugging Face source |
|---|---|---|---|
| tiny | `soundevents/models/tiny.onnx` | 6.4 MB | mispeech/ced-tiny |
| mini | `soundevents/models/mini.onnx` | 10 MB | mispeech/ced-mini |
| small | `soundevents/models/small.onnx` | 22 MB | mispeech/ced-small |
| base | `soundevents/models/base.onnx` | 97 MB | mispeech/ced-base |
All four expose the same input/output contract: mono f32 PCM at 16 kHz in (`SAMPLE_RATE_HZ`), 527-class scores out (`NUM_CLASSES`). They differ only in parameter count and accuracy/latency trade-off, so you can swap variants without touching application code.
Note — the four ONNX files together are ~135 MB. If you fork this repo and want to keep the working tree slim, consider tracking `soundevents/models/*.onnx` with git LFS.
## Refreshing models from upstream
If upstream releases new weights, or you cloned without the model files, refetch them with:
```sh
# Requires huggingface_hub: pip install --user huggingface_hub
./scripts/download_models.sh

# Or just one variant
./scripts/download_models.sh tiny
```

The script downloads the `*.onnx` artifact from each `mispeech/ced-*` Hugging Face repo and writes it as `soundevents/models/<variant>.onnx`.
See THIRD_PARTY_NOTICES.md for upstream model sources and attribution details.
## Bundled tiny model

Enable the `bundled-tiny` feature to embed `models/tiny.onnx` into your binary — useful for CLI tools and self-contained services where you don’t want to ship a separate model file.

```toml
soundevents = { version = "0.3", features = ["bundled-tiny"] }
```

## Features
| Feature | Default | What you get |
|---|---|---|
| `bundled-tiny` | off | Embeds `models/tiny.onnx` into the crate so `Classifier::tiny()` works without an external file. |
The full input/output contract:

| Constant | Value | Meaning |
|---|---|---|
| `SAMPLE_RATE_HZ` | `16_000` | Required input sample rate (mono f32). |
| `DEFAULT_CHUNK_SAMPLES` | `160_000` | Default 10 s window/hop for chunked inference. |
| `NUM_CLASSES` | `527` | Number of CED output classes — derived at compile time from `RatedSoundEvent::events().len()`. |
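Given these constants, fixed-hop windowing over a long clip can be sketched as follows. This is illustrative only (it does not show the crate's actual `classify_chunked` internals); it demonstrates how a clip is cut into 10 s windows with a short tail:

```rust
/// Split a clip of `len` samples into windows of `window` samples
/// advanced by `hop`, keeping a final short tail so no audio is dropped.
fn chunk_ranges(len: usize, window: usize, hop: usize) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut start = 0;
    while start < len {
        let end = (start + window).min(len);
        out.push((start, end));
        if end == len {
            break;
        }
        start += hop;
    }
    out
}

fn main() {
    // 25 s of 16 kHz audio with the documented defaults:
    // window == hop == 160_000 samples (10 s).
    let len = 25 * 16_000;
    let chunks = chunk_ranges(len, 160_000, 160_000);
    // Two full 10 s chunks plus one 5 s tail.
    println!("{:?}", chunks); // [(0, 160000), (160000, 320000), (320000, 400000)]
}
```

Each range would then be run through the model separately, and the per-chunk confidences combined by the configured aggregation (`Mean` or `Max`).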
For low-level batching, every clip in `predict_raw_scores_batch*` / `classify_*_batch` must be non-empty and have the same sample count. `predict_raw_scores_batch_flat` returns one row-major `Vec<f32>`, and `predict_raw_scores_batch_into` lets callers reuse their own output buffer to avoid per-call result allocations. `classify_chunked` applies the same equal-length restriction internally when `ChunkingOptions::batch_size() > 1`; this is naturally satisfied by fixed-size windows, and the final short tail chunk automatically falls back to a smaller batch.
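Row-major here means the score for clip `i` and class `j` lives at index `i * NUM_CLASSES + j` of the flat buffer. A small illustrative sketch of that indexing, with the 527-class width shrunk to 3 for readability (the helper name is hypothetical, not part of the crate's API):

```rust
// Hypothetical stand-in for NUM_CLASSES, shrunk for the example.
const CLASSES: usize = 3;

/// Read the score of `class` for `clip` out of a row-major flat buffer
/// shaped like the output of a *_flat batch call.
fn score_at(flat: &[f32], clip: usize, class: usize) -> f32 {
    flat[clip * CLASSES + class]
}

fn main() {
    // Two clips, three classes each, laid out row-major.
    let flat = [
        0.1, 0.8, 0.1, // clip 0
        0.6, 0.2, 0.2, // clip 1
    ];
    println!("{}", score_at(&flat, 1, 0)); // clip 1, class 0 -> 0.6
}
```

Reusing one such buffer across calls via `predict_raw_scores_batch_into` is what lets a service layer avoid a fresh allocation per request.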
## Development

Regenerate the dataset from upstream sources:

```sh
cargo xtask codegen
```

Run the test suite:

```sh
cargo test
```

## License
soundevents is distributed under the terms of both the MIT license and the Apache License (Version 2.0).
See LICENSE-APACHE and LICENSE-MIT for details. Bundled third-party model attributions and source licenses are documented in THIRD_PARTY_NOTICES.md.
Copyright (c) 2026 FinDIT studio authors.
## Structs

- `ChunkingOptions` — Options for chunked inference over long clips.
- `Classifier` — CED sound event classifier.
- `EventPrediction` — A single classification result with both model-space and ontology-space metadata.
- `Options` — Options for constructing a `Classifier` from an ONNX model on disk.
## Enums

- `ChunkAggregation` — Controls how chunked inference aggregates chunk confidences.
- `ClassifierError` — Errors from `Classifier` operations.
## Constants

- `DEFAULT_CHUNK_SAMPLES` — The default window size used by the chunked inference helpers: 10 seconds at 16 kHz.
- `NUM_CLASSES` — Number of model output classes.
- `SAMPLE_RATE_HZ` — The expected input sample rate for CED models.