oxideav-opus
Pure-Rust Opus audio codec — RFC 6716 bitstream + RFC 7845 Ogg mapping. SILK + CELT + Hybrid decode (mono + stereo) plus encoders for a CELT-only full-band path, the full SILK-only config matrix (NB / MB / WB, mono + stereo, 10 / 20 / 40 / 60 ms), and Hybrid 20 ms mono + stereo (SWB / FB). Zero C dependencies.
Part of the oxideav framework but usable standalone.
Installation
[]
= "0.1"
= "0.1"
= "0.0"
Status
Decode
- CELT-only frames at every bandwidth — Narrowband (4 kHz), Wideband (8 kHz), Superwideband (12 kHz), Fullband (20 kHz).
- CELT frame sizes: 2.5 / 5 / 10 / 20 ms (config 16–31).
- SILK-only frames at NB / MB / WB (8 / 12 / 16 kHz internal rate).
- SILK frame sizes: 10 / 20 / 40 / 60 ms (config 0–11).
- Hybrid frames (SILK + CELT, RFC 6716 §4.4) — SILK-WB covers the 0..8 kHz low band; CELT starts at band 17 (the 8 kHz edge) and fills 8..12 kHz (SWB) or 8..20 kHz (FB) on the same range-coded bitstream. All four configs (12 = SWB 10 ms, 13 = SWB 20 ms, 14 = FB 10 ms, 15 = FB 20 ms) decode mono and stereo.
- Stereo: CELT, SILK, and Hybrid stereo paths — SILK includes the mid/side unmixing filter with prediction-weight interpolation.
- Mono: CELT, SILK, and Hybrid mono paths.
- Framing codes 0, 1, 2, 3 — single-frame, paired-equal, paired-variable, and VBR/CBR multi-frame packets (RFC 6716 §3.2).
- Silence / DTX packets (0 / 1-byte frames) emit correctly-sized silence.
- CELT-frame silence flag decoded per RFC 6716 §4.3.
- Output: 48 kHz, S16 PCM, 1 or 2 channels.
OpusHead identification packet parsing (RFC 7845 §5.1), channel mapping family 0, and the raw mapping-table bytes for families 1 / 2.
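The config ranges and framing codes listed above all come from the first byte of each packet, the TOC byte (RFC 6716 §3.1). As a rough illustration, the sketch below splits that byte into its fields. The names `Toc`, `Mode` and `parse_toc` are hypothetical and are not taken from this crate's API.

```rust
/// Opus coding mode selected by the TOC `config` field (RFC 6716 §3.1).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Silk,
    Hybrid,
    Celt,
}

/// Fields carried by the first byte of an Opus packet.
/// Hypothetical type for illustration; not part of the crate's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Toc {
    pub config: u8,
    pub mode: Mode,
    pub frame_duration_us: u32,
    pub stereo: bool,
    pub framing_code: u8,
}

/// Split a TOC byte into config (top 5 bits), stereo flag (1 bit)
/// and framing code (low 2 bits).
pub fn parse_toc(byte: u8) -> Toc {
    let config = byte >> 3;
    let idx = usize::from(config);
    let (mode, frame_duration_us): (Mode, u32) = match config {
        // Configs 0..=11: SILK-only, 10 / 20 / 40 / 60 ms per bandwidth.
        0..=11 => (Mode::Silk, [10_000, 20_000, 40_000, 60_000][idx % 4]),
        // Configs 12..=15: Hybrid, 10 / 20 ms for SWB then FB.
        12..=15 => (Mode::Hybrid, [10_000, 20_000][idx % 2]),
        // Configs 16..=31: CELT-only, 2.5 / 5 / 10 / 20 ms per bandwidth.
        _ => (Mode::Celt, [2_500, 5_000, 10_000, 20_000][idx % 4]),
    };
    Toc {
        config,
        mode,
        frame_duration_us,
        stereo: (byte & 0b100) != 0,
        framing_code: byte & 0b11,
    }
}
```

For example, `parse_toc(31 << 3)` describes a mono CELT-only 20 ms packet with framing code 0, which is the layout the CELT-only encoder in this README emits.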
Encode
Three explicit entry points, one per Opus mode:
- CELT-only, Fullband, 20 ms, 48 kHz (OpusEncoder::new / OpusEncoder::new_celt_only_full_band).
  - Packet layout: TOC byte config = 31 + CELT bitstream, framing code 0 (single frame per packet).
  - Mono input is encoded as-is.
  - Stereo input is downmixed to mono on the way in: the underlying CELT encoder is mono-only today, so the TOC stereo bit is set to zero and the per-channel detail is lost. The signal survives end-to-end and the decoder splats it back across two channels when asked.
  - Input sample rate: 48 kHz only. Any other rate returns Error::Unsupported; resample upstream.
- SILK-only, full config matrix (configs 0..=11), mono and stereo, 10 / 20 / 40 / 60 ms frames. There are 24 named constructors, one per (bandwidth, channels, duration) tuple:
  - 20 ms mono: SilkEncoder::new_nb_mono_20ms (config 1), new_mb_mono_20ms (config 5), new_wb_mono_20ms (config 9).
  - 20 ms stereo: new_nb_stereo_20ms / new_mb_stereo_20ms / new_wb_stereo_20ms (configs 1 / 5 / 9 + stereo bit).
  - 10 ms mono + stereo: configs 0 / 4 / 8. Each embedded SILK frame has 2 sub-frames instead of 4.
  - 40 ms mono + stereo: configs 2 / 6 / 10. Packet carries 2 back-to-back 20 ms SILK frame bodies per RFC §4.2.4 (still framing code 0).
  - 60 ms mono + stereo: configs 3 / 7 / 11. 3 back-to-back bodies.

  Each constructor accepts either the SILK internal rate (8 / 12 / 16 kHz for NB / MB / WB) or 48 kHz; 48 kHz input is downsampled by a simple box-average pre-filter.
- Hybrid (SILK + CELT) 20 ms and 10 ms at SWB and FB, mono and stereo (HybridEncoder::new_{swb,fb}_{mono,stereo}_{10,20}ms): TOC configs 12 / 13 (SWB 10 / 20 ms) and 14 / 15 (FB 10 / 20 ms). Per RFC 6716 §4.4 the SILK part runs WB (16 kHz internal, covering 0..8 kHz) regardless of TOC bandwidth; the CELT part starts at band 17 (the 8 kHz edge) and covers 8..12 kHz (SWB) or 8..20 kHz (FB) on the same range-coded bitstream. The CELT body is appended to the in-flight RangeEncoder after the SILK body, so the whole packet is one shared arithmetic stream, exactly what the decoder expects.

  At 20 ms the SILK frame encoder uses 4 sub-frames and the CELT high-band runs at LM=3 (960-sample MDCT). At 10 ms the SILK frame encoder uses 2 sub-frames and the CELT high-band runs at LM=2 (480-sample MDCT) via oxideav_celt::CeltEncoder::new_with_frame_samples(_, 480).

  Stereo Hybrid runs a mid/side pair of WB SILK frame encoders for the low band (with the RFC §4.2.7.1 prediction header; weights shipped as (0, 0) for this MVP) and a dual-stereo CELT high-band via oxideav_celt::CeltEncoder::encode_hybrid_body_stereo. Packets are capped at the RFC 6716 §3.2.1 1275-byte per-frame limit so libopus / ffmpeg accept them.

  Input: 48 kHz mono or stereo.
Round-trip through our own decoder on a 300 Hz low-band tone:
- SWB 20 ms hybrid mono: ~24 dB low-band SNR
- FB 20 ms hybrid mono: ~24 dB low-band SNR
- SWB 20 ms hybrid stereo (300 Hz L / 400 Hz R): ~23 dB L / ~23 dB R
Cross-decode through libopus / ffmpeg: SWB / FB mono and stereo at both 10 ms and 20 ms decode without error to non-trivial PCM.
The high band is exercised by swept-sine tests (hybrid_*_sweep_*) that confirm both the < 4 kHz (SILK) and > 8 kHz (CELT) regions carry recovered energy after a round-trip, including per-channel on stereo Hybrid.
Stereo paths feed a mid/side pair into two SILK frame encoders and emit the RFC §4.2.7.1 prediction header per embedded 20 ms SILK frame (weights are shipped as 0 for this pass, which is enough for a clean round-trip; see the follow-up list below).
Packet layout: TOC byte + SILK bitstream. Always framing code 0; 40 / 60 ms packets use the RFC §4.2.4 multi-SILK-frame-per-Opus-frame mechanism rather than framing codes 1/2/3.
- Analysis-by-synthesis design: each per-bandwidth SilkFrameEncoder runs the same LPC filter the decoder reconstructs from the NLSF stage-1 index (shared BandwidthParams descriptor: NB/MB use LPC order 10, WB uses LPC order 16), computes the residual sample-by-sample against the decoder's reconstructed past, and emits quantised residual magnitudes. Round-trip SNR through our own decoder clears 20 dB on speech-like tones. Typical measured values (see encoder_roundtrip.rs):
  - NB mono: ~24 dB
  - MB mono: ~25 dB
  - WB mono: ~29 dB
  - NB stereo: ~31 dB (L) / ~27 dB (R) at all of 10 / 20 / 40 / 60 ms
  - MB stereo: ~36 dB (L) / ~31 dB (R) at 20 ms
  - WB stereo: ~43 dB (L) / ~33 dB (R) at 20 ms
- Bitstream layout follows RFC 6716 §4.2 header order (frame type → gains → NLSF → LTP (skipped for unvoiced) → LCG seed → excitation); the excitation body uses an MVP carrier format documented in src/silk/excitation.rs (nibble-pair + sign per sample in place of the RFC's shell-pulse split). Byte-exact parity with libopus' silk_enc bit-stream is a tracked follow-up.
- Input sample formats (all encoders): S16, S16P, F32, F32P.
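The SILK constructors are described as downsampling 48 kHz input with a simple box-average pre-filter. The sketch below shows one plausible reading of that: average each run of `factor` consecutive samples, with `factor` being 6, 4 or 3 for the 8, 12 and 16 kHz internal rates. The function name and the rounding behaviour are assumptions; the crate's actual filter may differ.

```rust
/// Downsample mono S16 PCM by an integer `factor`, replacing each run of
/// `factor` input samples with its truncated mean.
/// Hypothetical helper for illustration; not the crate's implementation.
pub fn box_average_downsample(input: &[i16], factor: usize) -> Vec<i16> {
    assert!(factor > 0, "factor must be non-zero");
    input
        // Trailing samples that do not fill a whole run are dropped.
        .chunks_exact(factor)
        .map(|run| {
            // Accumulate in i32 so the sum of a few i16 values cannot overflow.
            let sum: i32 = run.iter().map(|&s| i32::from(s)).sum();
            (sum / factor as i32) as i16
        })
        .collect()
}
```

With a factor of 6, one 20 ms block of 960 samples at 48 kHz becomes 160 samples at 8 kHz, which matches the NB frame length used in the SILK usage example. A box average is a weak anti-aliasing filter, so a sharper low-pass would normally be preferred where quality matters.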
Not yet supported
- SILK LBRR redundancy frames: the LBRR flags are parsed (so the range coder stays aligned) but the redundancy payload itself is not yet decoded. Packets that enable LBRR return Error::Unsupported.
- Channel mapping family 1 / 2 (Vorbis / ambisonic multistream, more than 2 channels).
- SILK stereo predictor: the stereo encoder currently emits prediction weights of (0, 0). Wiring the full Wiener-filter analysis path in silk::encoder::stereo_predict_weights_q13 is a follow-up (the function is already in place; the remaining work is subtracting the predicted side from the coded side before it reaches the SilkFrameEncoder).
- 10 ms Hybrid (configs 12 / 14): 20 ms mono + stereo Hybrid is wired (configs 13 / 15); 10 ms Hybrid needs the LM=2 CELT encoder path which still runs LM=3 only.
- Voiced / LTP-path SILK encoding: the encoder emits signal_type = unvoiced on every frame so the LTP loop-back is not exercised; this still round-trips speech-like tones at ≥ 20 dB SNR but gives up the pitch-prediction gain that voiced LTP provides.
- CELT encoding of 2.5 / 5 / 10 ms frames, 40 / 60 ms multi-frame packets, and framing codes 1 / 2 / 3 on the encoder side.
- Native CELT stereo encoding (coupled L/R PVQ with intensity and dual-stereo), tracked in oxideav-celt.
- Bit-exact CELT PVQ + IMDCT output. The current CELT decoder preserves energy (roughly 90 % of the input energy on a 1 kHz sine round-trip) but the reconstructed waveform phase can drift vs libopus. The round-trip PSNR bar in the integration tests is ~8 dB today: good enough to prove encode+decode work end-to-end, short of the 25+ dB a bit-exact decoder would give. Tracked in oxideav-celt module docs.
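Several figures in this README are round-trip SNR values in dB. For readers who want to reproduce that kind of measurement, here is a minimal sketch of a plain signal-to-noise ratio between a reference and a decoded buffer. The function name is hypothetical, and the sketch assumes the two buffers are already time-aligned (any codec delay removed); it is not taken from the crate's tests, which may measure things differently.

```rust
/// Signal-to-noise ratio in dB between `reference` and `decoded`,
/// compared sample by sample over their common length.
/// Hypothetical helper; assumes the buffers are already time-aligned.
pub fn snr_db(reference: &[f32], decoded: &[f32]) -> f64 {
    let n = reference.len().min(decoded.len());
    let mut signal_energy = 0.0f64;
    let mut noise_energy = 0.0f64;
    for i in 0..n {
        let r = f64::from(reference[i]);
        let err = r - f64::from(decoded[i]);
        signal_energy += r * r;
        noise_energy += err * err;
    }
    if noise_energy == 0.0 {
        // Identical buffers: no error energy to divide by.
        return f64::INFINITY;
    }
    10.0 * (signal_energy / noise_energy).log10()
}
```

A decoder that preserves energy but lets the phase drift scores poorly on this metric, because the sample-wise error stays large even when the spectrum looks right.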
Usage
Decode
use ;
use ;
let mut codecs = new;
register;
let mut params = audio;
params.channels = Some;
params.sample_rate = Some;
let mut dec = codecs.make_decoder?;
let opus_packet_bytes: = read_opus_packet_bytes;
let pkt = new;
dec.send_packet?;
if let Audio = dec.receive_frame?
#
# Ok::
For Opus-in-Ogg, pull packets via the oxideav-ogg demuxer first; the
first Ogg packet is the OpusHead which this crate parses with
oxideav_opus::parse_opus_head.
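For orientation, the fixed part of the OpusHead packet defined in RFC 7845 §5.1 is 19 bytes with little-endian fields. The sketch below reads only that fixed part. The type and function names are hypothetical and deliberately differ from `oxideav_opus::parse_opus_head`, whose real signature and return type are not shown in this README.

```rust
/// Fixed-size fields of an OpusHead packet (RFC 7845 §5.1).
/// Hypothetical type for illustration; not the crate's own struct.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OpusHeadSketch {
    pub version: u8,
    pub channels: u8,
    pub pre_skip: u16,
    pub input_sample_rate: u32,
    /// Output gain in dB, Q7.8 fixed point.
    pub output_gain_q8: i16,
    pub mapping_family: u8,
}

/// Parse the first 19 bytes of an OpusHead packet. Returns `None` on a bad
/// magic, a short buffer, an unknown major version, or zero channels.
pub fn parse_opus_head_sketch(data: &[u8]) -> Option<OpusHeadSketch> {
    if data.len() < 19 || !data.starts_with(b"OpusHead") {
        return None;
    }
    let version = data[8];
    let channels = data[9];
    // The upper four bits carry the major version, which must be 0.
    if (version >> 4) != 0 || channels == 0 {
        return None;
    }
    Some(OpusHeadSketch {
        version,
        channels,
        pre_skip: u16::from_le_bytes([data[10], data[11]]),
        input_sample_rate: u32::from_le_bytes([data[12], data[13], data[14], data[15]]),
        output_gain_q8: i16::from_le_bytes([data[16], data[17]]),
        mapping_family: data[18],
    })
}
```

When the mapping family is not 0, the header continues with a stream count, a coupled-stream count and one mapping byte per channel; the sketch leaves those bytes unread.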
Encode (CELT-only, 48 kHz)
use Encoder;
use ;
use ;
let mut params = audio;
params.channels = Some;
params.sample_rate = Some;
let mut enc = new?;
// One Opus frame = 960 samples at 48 kHz = 20 ms.
let pcm_s16 = vec!;
let frame = Audio;
enc.send_frame?;
let pkt = enc.receive_packet?;
// pkt.data[0] is the TOC byte: (31 << 3) | (stereo_bit << 2) | 0
# Ok::
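The 960-sample figure in the example above is just the sample rate multiplied by the frame duration. The sketch below spells out that arithmetic and shows one way to cut a longer interleaved S16 buffer into encoder-sized frames. Both helper names are hypothetical and are not part of the crate.

```rust
/// Samples per channel in one frame: sample rate times frame duration.
/// Durations are in microseconds so that 2.5 ms frames stay integral.
pub fn samples_per_frame(sample_rate_hz: u32, frame_duration_us: u32) -> usize {
    (u64::from(sample_rate_hz) * u64::from(frame_duration_us) / 1_000_000) as usize
}

/// Split interleaved PCM into whole frames of `frame_samples` samples per
/// channel. A trailing partial frame is dropped; the caller decides whether
/// to pad it. Panics if `channels * frame_samples` is zero.
pub fn split_into_frames(pcm: &[i16], channels: usize, frame_samples: usize) -> Vec<&[i16]> {
    pcm.chunks_exact(channels * frame_samples).collect()
}
```

At 48 kHz a 20 ms frame is 960 samples per channel, and at the 8 kHz SILK NB internal rate the same duration is 160 samples.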
Encode (SILK-only, NB mono, 20 ms)
Analogous constructors exist for MB mono (new_mb_mono_20ms, 12 kHz
internal), WB mono (new_wb_mono_20ms, 16 kHz internal) and NB stereo
(new_nb_stereo_20ms, 8 kHz internal, 2-channel input).
use Encoder;
use ;
use ;
let mut params = audio;
params.channels = Some;
params.sample_rate = Some; // 8 000 Hz
let mut enc = new_nb_mono_20ms?;
// One SILK NB frame at the internal rate = 160 samples = 20 ms.
let pcm_s16 = vec!;
let frame = Audio;
enc.send_frame?;
// One Opus packet per 20 ms of input.
let pkt = enc.receive_packet?;
// pkt.data[0] is the TOC byte: (1 << 3) | 0 — SILK NB 20 ms mono.
# Ok::
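The TOC value in the last comment follows from the RFC 6716 §3.1 config table: SILK-only configs are numbered bandwidth-major (NB 0..=3, MB 4..=7, WB 8..=11), with the 10 / 20 / 40 / 60 ms durations in order inside each group. The sketch below computes the TOC byte for any SILK-only combination with framing code 0. `SilkBandwidth` and `silk_toc_byte` are hypothetical names used only for illustration.

```rust
/// SILK-only audio bandwidths, in TOC config order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SilkBandwidth {
    Narrowband,
    Mediumband,
    Wideband,
}

/// TOC byte for a SILK-only packet with framing code 0 (one frame per
/// packet). Returns `None` for durations SILK does not define.
/// Hypothetical helper; not part of the crate's API.
pub fn silk_toc_byte(bandwidth: SilkBandwidth, frame_ms: u32, stereo: bool) -> Option<u8> {
    let bandwidth_index: u8 = match bandwidth {
        SilkBandwidth::Narrowband => 0,
        SilkBandwidth::Mediumband => 1,
        SilkBandwidth::Wideband => 2,
    };
    let duration_index: u8 = match frame_ms {
        10 => 0,
        20 => 1,
        40 => 2,
        60 => 3,
        _ => return None,
    };
    let config = bandwidth_index * 4 + duration_index;
    // Layout: config in bits 7..3, stereo flag in bit 2, framing code 0 in bits 1..0.
    Some((config << 3) | (u8::from(stereo) << 2))
}
```

NB 20 ms mono yields config 1 and a TOC byte of `1 << 3`, matching the comment in the usage example above.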
Codec IDs and capabilities
- Codec ID: "opus" (registered via oxideav_opus::register).
- The capability entry reports max_channels = 2 and max_sample_rate = 48_000, which matches what the decoder + encoder actually accept today.
License
MIT — see LICENSE.