Crate soundevents

soundevents

Production-oriented Rust inference for CED AudioSet sound-event classifiers — load an ONNX model, feed it 16 kHz mono audio, get back ranked RatedSoundEvent predictions with names, ids, and confidences. Long clips are handled via configurable chunking.


§Highlights

  • Drop-in CED inference — load any CED AudioSet ONNX model (or use the bundled tiny variant) and run it directly on &[f32] PCM samples. No Python, no preprocessing pipeline.
  • Typed labels, not bare integers — every prediction comes back as an EventPrediction carrying a &'static RatedSoundEvent from soundevents-dataset, so you get the canonical AudioSet name, the /m/... id, the model class index, and the confidence in one struct.
  • Compile-time class-count guarantee — the NUM_CLASSES = 527 constant comes from the rated dataset at codegen time. If a model returns the wrong number of classes you get a typed ClassifierError::UnexpectedClassCount instead of a silent mismatch.
  • Long-clip chunking built in — classify_chunked / classify_all_chunked window the input at a configurable hop, run inference on each chunk, and aggregate the per-chunk confidences with either Mean or Max. Defaults match CED’s 10 s training window (160 000 samples at 16 kHz), and fixed-size chunk batches can be packed into one model call.
  • Top-k via a tiny min-heap — classify(samples, k) finds the top results without allocating and sorting a full 527-element scores vector.
  • Batch-ready low-level API — predict_raw_scores_batch, predict_raw_scores_batch_flat, predict_raw_scores_batch_into, classify_all_batch, and classify_batch accept equal-length clip batches for service-layer batching.
  • Bring-your-own model or bundle one — load from a path, from in-memory bytes, or enable the bundled-tiny feature to embed models/tiny.onnx directly into your binary.
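
The bounded min-heap technique behind the top-k bullet can be sketched with the standard library alone. This is a standalone illustration of the idea, not the crate's actual implementation; it relies on the fact that confidences are non-negative, so `f32::to_bits` preserves numeric order and gives us an `Ord` key for `BinaryHeap`:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Return the k largest (index, score) pairs without sorting all entries.
/// For non-negative IEEE-754 floats, `to_bits` preserves numeric ordering,
/// so a `BinaryHeap<Reverse<..>>` works as a bounded min-heap of size k.
fn top_k(scores: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut heap: BinaryHeap<Reverse<(u32, usize)>> = BinaryHeap::with_capacity(k + 1);
    for (idx, &score) in scores.iter().enumerate() {
        heap.push(Reverse((score.to_bits(), idx)));
        if heap.len() > k {
            heap.pop(); // evict the current minimum, keeping the k largest
        }
    }
    let mut out: Vec<(usize, f32)> = heap
        .into_iter()
        .map(|Reverse((bits, idx))| (idx, f32::from_bits(bits)))
        .collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}

fn main() {
    let scores = [0.1_f32, 0.9, 0.3, 0.7, 0.2];
    println!("{:?}", top_k(&scores, 3)); // [(1, 0.9), (3, 0.7), (2, 0.3)]
}
```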

§Quick start

[dependencies]
soundevents = "0.2"

use soundevents::{Classifier, Options};

fn load_mono_16k_audio(_: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    Ok(vec![0.0; 16_000])
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut classifier = Classifier::from_file("soundevents/models/tiny.onnx")?;

    // Bring your own decoder/resampler — soundevents expects mono f32
    // samples at 16 kHz, in [-1.0, 1.0].
    let samples: Vec<f32> = load_mono_16k_audio("clip.wav")?;

    // Top-5 predictions for a clip up to ~10 s long.
    for prediction in classifier.classify(&samples, 5)? {
        println!(
            "{:>5.1}%  {:>3}  {}  ({})",
            prediction.confidence() * 100.0,
            prediction.index(),
            prediction.name(),
            prediction.id(),
        );
    }
    Ok(())
}
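
Decoding and resampling are left to the caller. As one hedged illustration of the "mono f32 in [-1.0, 1.0]" requirement — assuming your decoder yields interleaved stereo i16 PCM, which is common but not something soundevents itself produces — the conversion can look like:

```rust
/// Convert interleaved stereo i16 PCM to mono f32 in [-1.0, 1.0] by
/// normalizing each sample and averaging channel pairs. Resampling to
/// 16 kHz is still up to you (e.g. via a resampling crate of your choice).
fn stereo_i16_to_mono_f32(interleaved: &[i16]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|lr| {
            let l = lr[0] as f32 / i16::MAX as f32;
            let r = lr[1] as f32 / i16::MAX as f32;
            (l + r) * 0.5
        })
        .collect()
}

fn main() {
    // One full-scale frame followed by one silent frame.
    let frames: [i16; 4] = [i16::MAX, i16::MAX, 0, 0];
    println!("{:?}", stereo_i16_to_mono_f32(&frames)); // [1.0, 0.0]
}
```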

§Long clips: chunked inference

Classifier::classify_chunked slides a window over the input and aggregates each chunk’s per-class confidences. The defaults (10 s window, 10 s hop, mean aggregation) match CED’s training setup; tune them for overlap or peak-pooling.

use soundevents::{ChunkAggregation, ChunkingOptions, Classifier};

fn load_long_clip() -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    Ok(vec![0.0; 320_000])
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut classifier = Classifier::from_file("soundevents/models/tiny.onnx")?;
    let samples: Vec<f32> = load_long_clip()?;

    let opts = ChunkingOptions::default()
        // 5 s overlap (50%) between adjacent windows
        .with_hop_samples(80_000)
        // Batch up to 4 equal-length windows per session.run()
        .with_batch_size(4)
        // Keep the loudest detection in any window instead of averaging
        .with_aggregation(ChunkAggregation::Max);

    let top3 = classifier.classify_chunked(&samples, 3, opts)?;
    for prediction in top3 {
        println!("{}: {:.2}", prediction.name(), prediction.confidence());
    }
    Ok(())
}
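
The windowing arithmetic behind classify_chunked can be sketched in plain Rust. This is a simplified model of the behavior described above (window, hop, short tail chunk, Mean vs Max pooling), not the crate's internals; the `score_fn` closure stands in for one model call:

```rust
/// Slide a `chunk`-sample window with a `hop`-sample stride over `samples`,
/// score each window with `score_fn`, and aggregate per-class confidences
/// element-wise by mean or max. The final tail window may be shorter.
fn classify_chunked_sketch(
    samples: &[f32],
    chunk: usize,
    hop: usize,
    use_max: bool,
    score_fn: impl Fn(&[f32]) -> Vec<f32>,
) -> Vec<f32> {
    let mut agg: Vec<f32> = Vec::new();
    let mut n_chunks = 0usize;
    let mut start = 0usize;
    while start < samples.len() {
        let end = (start + chunk).min(samples.len()); // tail may be short
        let scores = score_fn(&samples[start..end]);
        if agg.is_empty() {
            agg = vec![0.0; scores.len()];
        }
        for (a, s) in agg.iter_mut().zip(&scores) {
            *a = if use_max { a.max(*s) } else { *a + s };
        }
        n_chunks += 1;
        start += hop;
    }
    if !use_max {
        for a in &mut agg {
            *a /= n_chunks as f32; // mean over chunks
        }
    }
    agg
}

fn main() {
    // Two 4-sample windows; the stand-in "model" reports the window mean.
    let samples = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0];
    let score = |w: &[f32]| vec![w.iter().sum::<f32>() / w.len() as f32];
    let mean = classify_chunked_sketch(&samples, 4, 4, false, score);
    let max = classify_chunked_sketch(&samples, 4, 4, true, score);
    println!("mean={mean:?} max={max:?}"); // mean=[0.5] max=[1.0]
}
```

Mean pooling rewards events present across the whole clip; max pooling keeps a one-window detection from being diluted by silent windows.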

§Models

The four CED variants are sourced from the mispeech Hugging Face organisation, exported to ONNX, and checked into this repo under soundevents/models/. You should not normally need to download anything — git clone gives you a working classifier out of the box.

| Variant | File                         | Size   | Hugging Face source |
|---------|------------------------------|--------|---------------------|
| tiny    | soundevents/models/tiny.onnx | 6.4 MB | mispeech/ced-tiny   |
| mini    | soundevents/models/mini.onnx | 10 MB  | mispeech/ced-mini   |
| small   | soundevents/models/small.onnx | 22 MB | mispeech/ced-small  |
| base    | soundevents/models/base.onnx | 97 MB  | mispeech/ced-base   |

All four expose the same input/output contract: mono f32 PCM at 16 kHz in (SAMPLE_RATE_HZ), 527-class scores out (NUM_CLASSES). They differ only in parameter count and the accuracy/latency trade-off, so you can swap variants without touching application code.

Note — the four ONNX files together are ~135 MB. If you fork this repo and want to keep the working tree slim, consider tracking soundevents/models/*.onnx with git LFS.

§Refreshing models from upstream

If upstream releases new weights, or you cloned without the model files, refetch them with:

# Requires huggingface_hub:  pip install --user huggingface_hub
./scripts/download_models.sh

# Or just one variant
./scripts/download_models.sh tiny

The script downloads the *.onnx artifact from each mispeech/ced-* Hugging Face repo and writes it as soundevents/models/<variant>.onnx.

See THIRD_PARTY_NOTICES.md for upstream model sources and attribution details.

§Bundled tiny model

Enable the bundled-tiny feature to embed models/tiny.onnx into your binary — useful for CLI tools and self-contained services where you don’t want to ship a separate model file.

soundevents = { version = "0.2", features = ["bundled-tiny"] }
use soundevents::{Classifier, Options};

let mut classifier = Classifier::tiny(Options::default())?;

§Features

| Feature      | Default | What you get |
|--------------|---------|--------------|
| bundled-tiny | no      | Embeds models/tiny.onnx into the crate so Classifier::tiny() works without an external file. |

The full input/output contract:

| Constant              | Value   | Meaning |
|-----------------------|---------|---------|
| SAMPLE_RATE_HZ        | 16_000  | Required input sample rate (mono f32). |
| DEFAULT_CHUNK_SAMPLES | 160_000 | Default 10 s window/hop for chunked inference. |
| NUM_CLASSES           | 527     | Number of CED output classes — derived at compile time from RatedSoundEvent::events().len(). |

For low-level batching, every clip in predict_raw_scores_batch* / classify_*_batch must be non-empty and have the same sample count. predict_raw_scores_batch_flat returns one row-major Vec<f32>, and predict_raw_scores_batch_into lets callers reuse their own output buffer to avoid per-call result allocations. classify_chunked uses the same equal-length restriction internally when ChunkingOptions::batch_size() > 1, which is naturally satisfied for fixed-size windows and automatically falls back to smaller batches for the final short tail chunk.
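
The row-major layout of predict_raw_scores_batch_flat means the score for clip i, class c sits at index i * NUM_CLASSES + c. A minimal indexing sketch, with 527 shrunk to 3 classes purely for illustration:

```rust
const NUM_CLASSES: usize = 3; // the real crate uses 527

/// Read one clip's score row out of a flat, row-major batch buffer.
fn row(flat: &[f32], clip: usize) -> &[f32] {
    &flat[clip * NUM_CLASSES..(clip + 1) * NUM_CLASSES]
}

fn main() {
    // Two clips * three classes in one contiguous buffer.
    let flat = [0.1_f32, 0.2, 0.7, 0.5, 0.4, 0.1];
    assert_eq!(row(&flat, 1), &[0.5, 0.4, 0.1]);
    println!("clip 1, class 0 = {}", row(&flat, 1)[0]); // 0.5
}
```

Borrowing rows as slices like this is why the flat variant avoids per-clip `Vec` allocations compared to returning nested vectors.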

§Development

Regenerate the dataset from upstream sources:

cargo xtask codegen

Run the test suite:

cargo test

§License

soundevents is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE, LICENSE-MIT for details. Bundled third-party model attributions and source licenses are documented in THIRD_PARTY_NOTICES.md.

Copyright (c) 2026 FinDIT studio authors.

Structs§

ChunkingOptions
Options for chunked inference over long clips.
Classifier
CED sound event classifier.
EventPrediction
A single classification result with both model-space and ontology-space metadata.
Options
Options for constructing a Classifier from an ONNX model on disk.

Enums§

ChunkAggregation
Controls how chunked inference aggregates chunk confidences.
ClassifierError
Errors from Classifier operations.

Constants§

DEFAULT_CHUNK_SAMPLES
The default window size used by the chunked inference helpers: 10 seconds at 16 kHz.
NUM_CLASSES
Number of model output classes.
SAMPLE_RATE_HZ
The expected input sample rate for CED models.