soundevents-dataset 0.2.0

The AudioSet ontology aims to provide a comprehensive set of categories for describing sound events.

Typed, zero-allocation Rust access to Google's AudioSet sound-event taxonomy. Two views are available — the full 632-entry ontology and the 527-class rated label set used by released AudioSet models — both baked in at compile time as &'static data, with case-insensitive perfect-hash lookup.

Installation

[dependencies]
soundevents-dataset = "0.2"

By default this pulls in the rated module — the 527-class label set used by released AudioSet/YAMNet/VGGish models. To use the ontology view instead (or in addition), pick the features explicitly:

# Just the full AudioSet ontology, no rated set.
soundevents-dataset = { version = "0.2", default-features = false, features = ["std", "ontology"] }

# Both views.
soundevents-dataset = { version = "0.2", features = ["ontology"] }

Two views, two modules

| Module | Source | Entries | Use when… |
| --- | --- | --- | --- |
| rated | class_labels_indices.csv | 527 | You're working with model outputs / multi-hot label tensors. Each entry carries its index so the position in a 527-vector resolves to a name in O(1). |
| ontology | ontology.json | 632 | You need the full taxonomy, including abstract container nodes ("Human voice", "Music", …) and the 105 entries that aren't in the released rated set. |

The two are independent: each lives in its own module, has its own &'static consts, its own perfect-hash map, and its own type (SoundEvent vs RatedSoundEvent). Enable only what you need to keep the binary small.
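To illustrate the O(1) index-to-name resolution that the rated view is meant for, here is a minimal, std-only sketch. It does not call this crate's API: `NAMES`, `name_at`, and `decode_multi_hot` are hypothetical names, and the three-row table is only a stand-in for the real 527-entry one.

```rust
/// Illustrative stand-in for the rated label table (the real one has 527 rows).
pub static NAMES: [&str; 3] = [
    "Speech",
    "Male speech, man speaking",
    "Female speech, woman speaking",
];

/// Resolve a position in a model output vector to a label name.
/// This is a plain array index, so it is O(1) and allocation-free.
pub fn name_at(index: usize) -> Option<&'static str> {
    NAMES.get(index).copied()
}

/// Collect the names of every active position in a multi-hot vector.
/// Positions beyond the table are silently skipped.
pub fn decode_multi_hot(multi_hot: &[bool]) -> Vec<&'static str> {
    multi_hot
        .iter()
        .enumerate()
        .filter_map(|(i, &on)| if on { name_at(i) } else { None })
        .collect()
}
```

With this table, `decode_multi_hot(&[true, false, true])` returns the names stored at positions 0 and 2. The `Vec` is only for the demonstration; the lookup itself needs no allocation.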

rated — AudioSet rated label set (527 entries)

RatedSoundEvent exposes the same metadata accessors as SoundEvent (id, name, description, aliases, citation_uri, children, restrictions) plus a rated-only index(): an integer in 0..527 (that is, 0 through 526) giving the entry's position in the output vectors of released AudioSet models. Walking children() stays inside the rated namespace: any ontology child that is not in the rated set is dropped, so the hierarchy remains self-consistent.
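The filtering rule for children() can be sketched as follows. This is an assumption-laden illustration, not the crate's implementation: `Node`, `rated_children`, and the placeholder ids are all hypothetical, and the linear `contains` check stands in for whatever membership test the generated code actually uses.

```rust
/// Hypothetical ontology node: an id plus the child ids listed in the full ontology.
pub struct Node {
    pub id: &'static str,
    pub children: &'static [&'static str],
}

/// Keep only the children that are themselves members of the rated set,
/// so that walking the result never leaves the rated namespace.
pub fn rated_children(node: &Node, rated_ids: &[&'static str]) -> Vec<&'static str> {
    node.children
        .iter()
        .copied()
        .filter(|child| rated_ids.contains(child))
        .collect()
}
```

For a node with children `["kept-1", "dropped", "kept-2"]` and a rated set that contains only the two `kept-*` ids, the result preserves ontology order and omits `"dropped"`.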

Case-insensitive, separator-distinct lookup

from_key is keyed by UncasedStr, so any case form of an alias resolves to the same entry without the codegen having to enumerate every case variant.
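The behaviour can be approximated with the standard library alone. The sketch below is not the crate's lookup: it swaps the phf map for a linear scan using `str::eq_ignore_ascii_case`, and `ALIASES` and `lookup_ignoring_case` are hypothetical names with made-up index values.

```rust
/// Illustrative alias table: (alias, index). The crate uses a generated
/// perfect-hash map instead; a linear scan keeps this sketch dependency-free.
pub static ALIASES: [(&str, usize); 2] = [("speech", 0), ("man speaking", 1)];

/// Look up an alias while ignoring ASCII case.
/// Separators are compared literally, so "man_speaking" does not match "man speaking".
pub fn lookup_ignoring_case(key: &str) -> Option<usize> {
    ALIASES
        .iter()
        .find(|(alias, _)| alias.eq_ignore_ascii_case(key))
        .map(|&(_, index)| index)
}
```

Here `"SPEECH"`, `"Speech"`, and `"speech"` all resolve to the same entry, while a key with a different separator resolves only if that shape is present in the table.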

Separator styles are still indexed independently: "man speaking", "man_speaking", "man-speaking", and "manSpeaking" are four distinct keys. You only pay for the four shapes the codegen actually emits, and every case variant of each shape collapses into one phf bucket.
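As a rough picture of what "four shapes" means, the sketch below derives the space, underscore, hyphen, and camel-case forms from a list of words. How the real codegen tokenizes names is not documented here, so `separator_shapes` and `capitalize` are hypothetical helpers and the word splitting is assumed to have happened already.

```rust
/// Uppercase the first character of a word and leave the rest unchanged.
fn capitalize(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}

/// Build the four separator shapes for one alias:
/// space-separated, snake_case, kebab-case, and camelCase.
pub fn separator_shapes(words: &[&str]) -> [String; 4] {
    let camel: String = words
        .iter()
        .enumerate()
        .map(|(i, word)| if i == 0 { word.to_string() } else { capitalize(word) })
        .collect();
    [words.join(" "), words.join("_"), words.join("-"), camel]
}
```

Each of the four strings would be its own key; case differences within a shape are absorbed by the case-insensitive key type rather than by emitting more keys.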

Features

| Feature | Default | What you get |
| --- | --- | --- |
| std | yes | Standard library + std-dependent error reporting via thiserror. Disable for no_std. |
| rated | yes | The rated module (527 entries, ~1900 phf keys). |
| ontology | | The ontology module (632 entries, ~2400 phf keys). |
| alloc | | Opt-in alloc support for no_std targets with an allocator. |
| serde | | Derives Serialize for SoundEvent, RatedSoundEvent, and Restriction. |

The crate is #![no_std]-compatible (default-features = false). The entire dataset lives in &'static memory: no allocations, no startup cost, and the perfect-hash map is generated by phf_codegen at codegen time so the dataset crate's compile graph contains no proc-macros from phf.

Regenerating the dataset

src/ontology/generated.rs and src/rated/generated.rs are checked in and produced from assets/ontology.json and assets/class_labels_indices.csv by an xtask binary. After updating either source file or the codegen logic, regenerate both with one command:

cargo xtask codegen

License

soundevents-dataset is under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE, LICENSE-MIT for details. Bundled AudioSet metadata attribution and upstream license details are documented in THIRD_PARTY_NOTICES.md.

Copyright (c) 2026 FinDIT studio authors.