Typed, zero-allocation Rust access to Google's AudioSet sound-event taxonomy. Two views are available — the full 632-entry ontology and the 527-class rated label set used by released AudioSet models — both baked in at compile time as &'static data, with case-insensitive perfect-hash lookup.
Installation
[dependencies]
soundevents-dataset = "0.2"
By default this pulls in the rated module — the 527-class label set used by released AudioSet/YAMNet/VGGish models. To use the ontology view instead (or in addition), pick the features explicitly:
# Just the full AudioSet ontology, no rated set.
soundevents-dataset = { version = "0.2", default-features = false, features = ["std", "ontology"] }
# Both views.
soundevents-dataset = { version = "0.2", features = ["ontology"] }
Two views, two modules
| Module | Source | Entries | Use when… |
|---|---|---|---|
| rated | class_labels_indices.csv | 527 | You're working with model outputs / multi-hot label tensors. Each entry carries its index so the position in a 527-vector resolves to a name in O(1). |
| ontology | ontology.json | 632 | You need the full taxonomy, including abstract container nodes ("Human voice", "Music", …) and the 105 entries that aren't in the released rated set. |
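To make the "position in a 527-vector resolves to a name" idea concrete, here is a minimal, self-contained sketch of decoding a score vector by position. It does not use this crate's API: the `NAMES` table and the `decode` function are hypothetical stand-ins, and the four labels are toy data rather than the real 527-entry set.

```rust
/// Toy stand-in for the rated name table, ordered by model output index.
/// The real set has 527 entries; these four are illustrative only.
static NAMES: [&str; 4] = ["Speech", "Music", "Dog", "Silence"];

/// Returns `(index, name, score)` for every position whose score is at
/// least `threshold`, highest score first. Positions past the end of the
/// table are ignored rather than treated as an error.
fn decode(scores: &[f32], threshold: f32) -> Vec<(usize, &'static str, f32)> {
    let mut hits: Vec<_> = scores
        .iter()
        .copied()
        .enumerate()
        // NaN scores fail this comparison and are dropped.
        .filter(|&(_, s)| s >= threshold)
        // Position -> name is a plain slice index, i.e. O(1).
        .filter_map(|(i, s)| NAMES.get(i).map(|&name| (i, name, s)))
        .collect();
    hits.sort_by(|a, b| b.2.total_cmp(&a.2));
    hits
}
```

With the real crate, the slice index would presumably be replaced by whatever index-based lookup the rated module exposes; the shape of the loop stays the same.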
The two are independent: each lives in its own module, has its own &'static consts, its own perfect-hash map, and its own type (SoundEvent vs RatedSoundEvent). Enable only what you need to keep the binary small.
rated — AudioSet rated label set (527 entries)
RatedSoundEvent exposes the same metadata accessors as SoundEvent (id, name, description, aliases, citation_uri, children, restrictions) plus a rated-only index(): an integer in 0..527 (that is, 0 through 526) giving the entry's position in the output vectors of released AudioSet models. Walking children() stays inside the rated namespace: any ontology child that is not in the rated set is dropped, so the hierarchy remains self-consistent.
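The filtering rule for children can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: `Entry`, `RATED`, `by_index`, and `rated_children` are hypothetical names, the ids are made up, and a linear scan stands in for whatever lookup the generated code actually uses.

```rust
/// Hypothetical stand-in for a rated entry (the crate's `RatedSoundEvent`
/// exposes more accessors than this).
struct Entry {
    index: usize,
    id: &'static str,
    name: &'static str,
    /// Child ids as listed in the full ontology; some may be unrated.
    ontology_children: &'static [&'static str],
}

// Toy three-entry table with made-up ids, ordered by index.
static RATED: [Entry; 3] = [
    Entry {
        index: 0,
        id: "toy/speech",
        name: "Speech",
        // "toy/unrated" exists only in the (imaginary) full ontology.
        ontology_children: &["toy/male", "toy/unrated"],
    },
    Entry { index: 1, id: "toy/male", name: "Male speech", ontology_children: &[] },
    Entry { index: 2, id: "toy/female", name: "Female speech", ontology_children: &[] },
];

/// Resolves a model output position to its entry via a plain slice index.
fn by_index(i: usize) -> Option<&'static Entry> {
    RATED.get(i).filter(|e| e.index == i)
}

/// Yields only the children that are themselves in the rated table,
/// silently dropping ids that are not.
fn rated_children(parent: &'static Entry) -> impl Iterator<Item = &'static Entry> {
    parent
        .ontology_children
        .iter()
        .filter_map(|child_id| RATED.iter().find(|e| e.id == *child_id))
}
```

Because unrated ids never surface, any entry reached by walking children can itself be passed back to an index-based lookup, which is what keeps the rated hierarchy self-consistent.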
Case-insensitive, separator-distinct lookup
from_key is keyed by UncasedStr, so any case form of an alias resolves to the same entry without the codegen having to enumerate every case variant.
Separator styles are still indexed independently ("man speaking" ≠ "man_speaking" ≠ "man-speaking" ≠ "manSpeaking"), so you only pay for the four shapes the codegen actually emits — every case variant of each shape collapses into one phf bucket.
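The behaviour described above can be approximated with the standard library alone. The sketch below is not the crate's `from_key`: `ALIASES` and `lookup_alias` are hypothetical, the table is toy data, a linear scan replaces the phf map, and ASCII case folding via `eq_ignore_ascii_case` is assumed to be close enough to what `UncasedStr` does.

```rust
/// Toy alias table: one key per separator shape, all pointing at the
/// same entry name. Case is irrelevant because lookup folds ASCII case.
static ALIASES: &[(&str, &str)] = &[
    ("man speaking", "Male speech, man speaking"),
    ("man_speaking", "Male speech, man speaking"),
    ("man-speaking", "Male speech, man speaking"),
    // The camelCase shape "manSpeaking" folds to this key.
    ("manspeaking", "Male speech, man speaking"),
];

/// Case-insensitive but separator-sensitive lookup (linear scan here,
/// standing in for a perfect-hash map).
fn lookup_alias(key: &str) -> Option<&'static str> {
    ALIASES
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(key))
        .map(|(_, name)| *name)
}
```

Any case variant of the four stored shapes resolves, while a separator style that was never emitted (for example `man.speaking`) does not, which mirrors the trade-off the paragraph describes.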
Features
| Feature | Default | What you get |
|---|---|---|
| std | ✓ | Standard library + std-dependent error reporting via thiserror. Disable for no_std. |
| rated | ✓ | The rated module (527 entries, ~1900 phf keys). |
| ontology | | The ontology module (632 entries, ~2400 phf keys). |
| alloc | | Opt-in alloc support for no_std targets with an allocator. |
| serde | | Derives Serialize for SoundEvent, RatedSoundEvent, and Restriction. |
The crate is #![no_std]-compatible (default-features = false). The entire dataset lives in &'static memory: no allocations, no startup cost, and the perfect-hash map is generated by phf_codegen at codegen time so the dataset crate's compile graph contains no proc-macros from phf.
Regenerating the dataset
src/ontology/generated.rs and src/rated/generated.rs are checked in and produced from assets/ontology.json and assets/class_labels_indices.csv by an xtask binary. After updating either source file or the codegen logic, regenerate both by re-running that xtask binary.
License
soundevents-dataset is under the terms of both the MIT license and the
Apache License (Version 2.0).
See LICENSE-APACHE, LICENSE-MIT for details. Bundled AudioSet metadata attribution and upstream license details are documented in THIRD_PARTY_NOTICES.md.
Copyright (c) 2026 FinDIT studio authors.