oxideav-videotoolbox
macOS VideoToolbox hardware decode/encode bridge for the oxideav framework.
Why a bridge crate?
Apple's VideoToolbox exposes the dedicated media engine on Apple Silicon (and the equivalent IP on Intel Macs). For codecs the chip supports natively this is 5-50× faster than software decoding and orders of magnitude more energy-efficient.
This crate is a thin runtime-loaded bridge — no compile-time link dependency on VideoToolbox, no Objective-C / Swift. The framework is opened via [libloading] on first use.
Fallback behaviour
Two distinct failure paths fall back automatically to the pure-Rust codec:
- Load failure: older macOS, missing framework, or a sandboxed environment without VT entitlements. `register()` logs and returns without registering, so the SW codec is the only candidate at dispatch.
- Init failure: `VTDecompressionSessionCreate` / `VTCompressionSessionCreate` returns a non-zero `OSStatus` for the requested parameters. Common triggers: stream above the device's max resolution, hardware encoder slot already busy (concurrent-session cap), unsupported pixel format, codec profile the device doesn't accelerate. The factory returns `Err`; the registry's `make_decoder_with` / `make_encoder_with` retries the next-priority impl (typically the SW one).
Pipelines that require hardware (e.g. real-time low-latency capture where the SW path can't keep up) can opt out of the SW fallback by setting CodecPreferences { require_hardware: true, .. } — the registry will then surface the OSStatus error instead of degrading silently.
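The init-failure fallback and the `require_hardware` opt-out can be pictured with a small model. This is a sketch under stated assumptions, not the registry's actual code: `Prefs`, `Factory`, `MakeError`, the two sample factories and this simplified `make_decoder_with` signature are all hypothetical stand-ins.

```rust
/// Hypothetical stand-in for the registry's preference struct.
#[derive(Clone, Copy, Default)]
pub struct Prefs {
    pub require_hardware: bool,
}

/// Hypothetical error type: either nothing was eligible, or the last
/// factory tried failed with an OSStatus-like code.
#[derive(Debug, PartialEq)]
pub enum MakeError {
    NoCandidate,
    Init(i32),
}

/// Hypothetical factory record; lower `priority` is tried first.
pub struct Factory {
    pub name: &'static str,
    pub priority: u32,
    pub hardware: bool,
    pub make: fn() -> Result<&'static str, i32>,
}

/// Try factories in priority order, moving to the next one on init failure.
pub fn make_decoder_with(factories: &[Factory], prefs: Prefs) -> Result<&'static str, MakeError> {
    let mut ordered: Vec<&Factory> = factories.iter().collect();
    ordered.sort_by_key(|f| f.priority);
    let mut last = MakeError::NoCandidate;
    for f in ordered {
        if prefs.require_hardware && !f.hardware {
            continue; // never degrade to software when hardware is required
        }
        match (f.make)() {
            Ok(decoder) => return Ok(decoder),
            Err(status) => last = MakeError::Init(status),
        }
    }
    Err(last)
}

/// Sample hardware factory that fails at session creation (placeholder status).
pub fn vt_fails() -> Result<&'static str, i32> {
    Err(-12909)
}

/// Sample software factory that always succeeds.
pub fn sw_ok() -> Result<&'static str, i32> {
    Ok("software")
}
```

With default preferences the failed hardware factory is skipped over and the software one wins; with `require_hardware: true` the status code from the hardware attempt is what the caller sees.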
Platform gating
The whole crate is #![cfg(target_os = "macos")]. On Linux / Windows it compiles to an empty rlib; the umbrella oxideav crate gates the register call behind the same cfg.
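A minimal sketch of the gating pattern, assuming a hypothetical `videotoolbox_bridge` module and a hypothetical `register_all` on the umbrella side (neither name comes from the crate):

```rust
/// Hypothetical bridge module: compiled only on macOS, absent elsewhere.
#[cfg(target_os = "macos")]
mod videotoolbox_bridge {
    pub fn register(names: &mut Vec<&'static str>) {
        names.push("h264_videotoolbox");
    }
}

/// Hypothetical umbrella-side registration. The call into the bridge sits
/// behind the same cfg, so non-macOS builds never reference the module.
pub fn register_all(names: &mut Vec<&'static str>) {
    names.push("h264_sw");
    #[cfg(target_os = "macos")]
    videotoolbox_bridge::register(names);
}
```

On Linux or Windows the function registers only the software entry; on macOS both entries appear.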
Priority
Hardware factories register with CodecCapabilities::with_priority(10) — lower numbers win at resolution time, so on macOS hardware paths are preferred over the pure-Rust impls (which sit at priority 100+).
Opt-out
Users who want to force the pure-Rust path globally can pass --no-hwaccel to the oxideav CLI; this sets CodecPreferences { no_hardware: true }, which the pipeline forwards to make_decoder_with / make_encoder_with so HW factories are skipped at dispatch time. The runtime context still registers VT — oxideav list shows the *_videotoolbox rows regardless of the flag — only resolution is biased.
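The split between listing and resolution, together with the lower-number-wins priority rule, can be sketched as follows. `Prefs`, `Entry`, `list` and `resolve` are hypothetical names for illustration; the real registry types may look different.

```rust
/// Hypothetical preference struct carrying only the opt-out flag.
#[derive(Clone, Copy, Default)]
pub struct Prefs {
    pub no_hardware: bool,
}

/// Hypothetical registry row.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Entry {
    pub name: &'static str,
    pub priority: u32,
    pub hardware: bool,
}

/// Listing ignores preferences: every registered row is shown.
pub fn list(entries: &[Entry]) -> Vec<&'static str> {
    entries.iter().map(|e| e.name).collect()
}

/// Resolution skips hardware rows when `no_hardware` is set, then picks
/// the lowest priority number among what remains.
pub fn resolve(entries: &[Entry], prefs: Prefs) -> Option<Entry> {
    entries
        .iter()
        .copied()
        .filter(|e| !(prefs.no_hardware && e.hardware))
        .min_by_key(|e| e.priority)
}
```

In this model the flag changes only what `resolve` returns; `list` output is identical with or without it.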
Coverage roadmap
| Codec | Decode (M-series) | Encode (M-series) | Status |
|---|---|---|---|
| H.264 | hardware | hardware | wired (≈ 51 dB PSNR_Y) |
| HEVC | hardware | hardware | wired (≈ 54 dB PSNR_Y) |
| ProRes | hardware | hardware | wired (≈ 52 dB PSNR_Y) |
| JPEG (MJPEG) | hardware | hardware | wired (≈ 36 dB PSNR_Y) |
| MPEG-2 | hardware | — (no VT encoder) | wired (≈ 61 dB PSNR_Y, decode-only) |
| VP9 | hardware (M1+) | — (no VT encoder) | wired (decode-only) |
| MPEG-4 Pt 2 | hardware | — (no VT encoder) | wired (decode-only, VOL→ESDS extension atoms, ≈ 72 dB PSNR_Y) |
| AV1 | hardware (M3+) / VT-internal SW elsewhere | hardware (M3+) | decode wired (round 8); encode roadmap |
- Round 1: scaffolding.
- Round 2: H.264 + HEVC decode + encode.
- Round 3: JPEG (MJPEG) + ProRes decode + encode via a shared blob-codec module (`blob.rs`). Frames are single blobs built on `CMVideoFormatDescriptionCreate(width, height, codecType)` rather than the parameter-set extraction H.264/HEVC need.
- Round 4: MPEG-2 video decode (`kCMVideoCodecType_MPEG2Video`). Decode-only, since VideoToolbox exposes an MPEG-2 decoder but no encoder; an elementary-stream framer (`FrameSplit::Mpeg2Es`) carves the incoming bitstream into per-picture access units.
- Round 5: VP9 decode (`kCMVideoCodecType_VP9 = 'vp09'`). Decode-only (no VT VP9 encoder); hardware decode on M1+ Apple Silicon, with VT falling back to software on older Macs that lack the dedicated VP9 IP. VP9 frames are container-framed (IVF / Matroska / MP4), so `FrameSplit::Whole` applies unchanged and no per-picture splitter is needed.
- Round 6: MPEG-4 Part 2 video decode (`kCMVideoCodecType_MPEG4Video = 'mp4v'`). This is the DivX / Xvid / Visual ASP / SP family, not H.264 (which is MPEG-4 Part 10 and ships via `'avc1'`). Decode-only as well: VideoToolbox exposes no MPEG-4 Pt 2 compression session. A new `FrameSplit::Mpeg4PartTwoEs` framer splits the elementary stream on VOP start codes (`00 00 01 B6`) and attaches preceding VOS / Visual Object / VO / VOL / GOV / user-data headers to the first VOP.
- Round 7: VOL→ESDS extension-atom path. On hosts where VT enforces VOL-via-extradata for MPEG-4 Pt 2, the decoder extracts the configuration prefix (everything up to the first VOP start code) from the first packet, wraps it in a proper ISO/IEC 14496-1 ES_Descriptor + DecoderConfigDescriptor + DecoderSpecificInfo blob (ObjectTypeIndication = 0x20, streamType = VisualStream), and supplies it to `CMVideoFormatDescriptionCreate` via `kCMFormatDescriptionExtension_SampleDescriptionExtensionAtoms` keyed by `"esds"`. Measured PSNR_Y vs ffmpeg's software decode jumps to ≈ 72.8 dB (sample-exact within IDCT tolerance) on the integration fixture.
- Round 8 (this commit): AV1 video decode (`kCMVideoCodecType_AV1 = 'av01'`). Decode-only for now; hardware decode is gated to Apple Silicon M3+, with VT falling back to its internal software AV1 path on older hardware where that path exists (and to `oxideav-av1` via the registry's SW fallback elsewhere). AV1 frames are container-framed (IVF / Matroska / MP4 / WebM / RTP), so the existing `FrameSplit::Whole` path applies unchanged.

Remaining roadmap: AV1 encode (the M3+ `'av01'` compression session, macOS 14+), plus an optional `av1C` extension-atom path analogous to round 7's ESDS for hosts that require the AV1 Sequence Header out-of-band.
Round 8 implementation notes
- AV1 decode wiring via VideoToolbox. `make_av1_decoder` registers a decoder against `CodecId::new("av1")` (tags `av01 / AV01 / V_AV1`) backed by a `BlobDecoder` over `kCMVideoCodecType_AV1 = 'av01' = 0x6176_3031`. The factory is decode-only: a VT AV1 compression session exists on macOS 14+ for M3+ hardware, but it needs its own callback/pixel-buffer wiring and is a follow-up round.
- Hardware path is M3+; older silicon falls back gracefully. Apple Silicon M3 (and later) carries dedicated AV1 hardware-decode IP that VT routes to automatically. On M1 / M2 and Intel hosts VideoToolbox either takes its internal software AV1 path (where the macOS build includes one) or returns a non-zero `OSStatus` at `VTDecompressionSessionCreate`. The registry's lower-priority pure-Rust `oxideav-av1` decoder is the fallback in the latter case. The round-8 wiring does not invent a software path; it just exposes the hardware path when present.
- Framing is `FrameSplit::Whole`, with no in-codec splitter. AV1 is container-framed (IVF / Matroska / MP4 / WebM / RTP). Each demuxed `Packet` carries exactly one AV1 temporal unit (one or more OBUs that together compose a single decoded frame), and that temporal unit goes straight into a `CMSampleBuffer` for VT. AV1 has no Annex-B / picture-start-code mechanism that would require an in-codec carve like MPEG-2's or MPEG-4 Part 2's.
- Configuration record (`av1C`) is a follow-up round. AV1 in ISOBMFF / Matroska carries an `av1C` configuration record that holds the AV1 Sequence Header OBU (per the AV1 ISOBMFF mapping under `docs/container/mpeg4/av1-isobmff/`). On hosts where VT requires the Sequence Header via `kCMFormatDescriptionExtension_SampleDescriptionExtensionAtoms` rather than extracted from the first packet, supplying the `av1C` blob via the extension atoms follows the same pattern as the MPEG-4 Part 2 ESDS wired in round 7. The round-7 ESDS plumbing in `BlobDecoder::ensure_session` already supports an arbitrary extension-atom key, so adding `av1C` is a small follow-up once a host needs it.
- Validated against ffmpeg as a black-box. `av1_decode_against_ffmpeg` asks `ffmpeg -c:v libaom-av1 -f ivf` to produce a 320×240 / 10-frame gradient stream (the same shape as the VP9 fixture), parses the IVF container (32-byte file header + per-frame 12-byte header + payload), feeds each frame as one `Packet` through `make_av1_decoder`, and compares to ffmpeg's own software decode at PSNR_Y ≥ 30 dB. The test self-skips when ffmpeg / libaom-av1 / the framework / the VT AV1 decoder is unavailable on the host: older OS, M1/M2 hosts without a VT AV1 fallback, ffmpeg builds without libaom-av1.
- Two new unit/integration tests. `register_installs_av1_decode_only` (decoder registered for `CodecId::new("av1")`, no encoder); `av1_codec_type_is_av01_fourcc` (the codec-type constant equals `u32::from_be_bytes(*b"av01")` = `0x6176_3031`). Plus the integration `av1_decode_against_ffmpeg` covering the end-to-end IVF→VT→PSNR path.
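The FourCC arithmetic behind the codec-type constant can be checked in a few lines. This is a small sketch; `fourcc`, `CODEC_TYPE_AV1` and `fourcc_to_string` are illustrative names, not identifiers from the crate.

```rust
/// Pack a four-character code into a big-endian u32, which is how
/// CoreMedia codec-type constants such as 'av01' are laid out.
pub const fn fourcc(code: &[u8; 4]) -> u32 {
    u32::from_be_bytes(*code)
}

/// Illustrative constant mirroring `kCMVideoCodecType_AV1 = 'av01'`.
pub const CODEC_TYPE_AV1: u32 = fourcc(b"av01");

/// Render a packed code back to text, replacing non-printable bytes with '.'.
pub fn fourcc_to_string(value: u32) -> String {
    value
        .to_be_bytes()
        .iter()
        .map(|&b| if b.is_ascii_graphic() { b as char } else { '.' })
        .collect()
}
```

Note that `u32::from_be_bytes` takes a `[u8; 4]` by value, so a byte-string literal has to be dereferenced (`*b"av01"`) before it is passed in.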
Round 7 implementation notes
- VOL→ESDS extension-atom path for MPEG-4 Part 2. Round 6 wired the decoder against `(codec_type, width, height)` only. That is fine on hosts where VT's MPEG-4 Pt 2 decoder is lenient about extracting the VOL from the bitstream prefix, but on stricter VT hosts session creation returned `kVTVideoDecoderBadDataErr` (the registry then fell back to the pure-Rust impl). Round 7 closes that gap. Before opening the session, `BlobDecoder` (under `FrameSplit::Mpeg4PartTwoEs`) sniffs the leading bytes of the first packet up to but not including the first VOP start code (`00 00 01 B6`), wraps them in a full ISO/IEC 14496-1 ESDS descriptor (ES_Descriptor 0x03 → DecoderConfigDescriptor 0x04 [ObjectTypeIndication 0x20 = MPEG-4 Visual, streamType 0x04 = VisualStream] → DecoderSpecificInfo 0x05 = VOL bytes, with a 1-byte `SLConfigDescriptor 0x06 predefined=2` tail), and feeds the result through `kCMFormatDescriptionExtension_SampleDescriptionExtensionAtoms = { "esds": CFData }` on the format-description creation call. The ffmpeg-fixture decode test (`mpeg4_part_two_decode_against_ffmpeg`) now reaches ≈ 72.8 dB PSNR_Y vs ffmpeg's reference decode on the gradient fixture (10/10 frames returned).
- Hosts that don't need the ESDS extension still work. The extraction is best-effort: if the first packet starts with a VOP (no headers to capture) or the buffer has no VOP marker, `extract_mpeg4_part_two_vol` returns `None` and the decoder falls back to the round-6 plain `(codec_type, width, height)` path. Existing JPEG / ProRes / MPEG-2 / VP9 framers go through the same code path without an extensions dictionary, so their session creation is byte-for-byte unchanged from round 6.
- ISO/IEC 14496-1 reference. The ESDS descriptor structure is documented in `docs/container/mpeg4/ISO_IEC_14496-1-System-2010.pdf` §7.2.6; the `oxideav-mp4` muxer uses the same shape for AAC `mp4a` sample entries, so the BER length helper and field layout are consistent with the rest of the workspace.
- CoreFoundation symbol added. `sys.rs` now resolves `CFDataCreate` and exposes a `cf_data(vt, &[u8])` helper that copies the slice into a fresh `CFData`. Used only by the MPEG-4 Pt 2 path so far; available for any future codec needing a binary-blob extension atom (FLAC `dfLa`, AV1 `av1C`, …).
- Ten new unit tests. `mpeg4_extract_vol_*` (5 cases) cover the VOL-prefix extractor (returns prefix before VOP, includes GOV/user-data, `None` when no VOP, `None` when buffer starts with VOP, `None` on empty buffer); `esds_*` (5 cases) cover the ESDS builder (FullBox header zeros, ES_Descriptor tag, DCD tag + ObjectTypeIndication + streamType byte, DSI carries the VOL verbatim, SLC `predefined = 2`).
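The prefix extraction and descriptor nesting described above can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: `extract_vol_prefix`, `push_descriptor` and `build_esds` are hypothetical names, and the ES_ID, buffer-size and bitrate fields are simply left at zero.

```rust
const VOP_START: [u8; 4] = [0x00, 0x00, 0x01, 0xB6];

/// Return the bytes before the first VOP start code, or None when there
/// is no VOP or the packet begins with one (nothing to capture).
pub fn extract_vol_prefix(packet: &[u8]) -> Option<&[u8]> {
    let pos = packet.windows(4).position(|w| w == VOP_START.as_slice())?;
    if pos == 0 { None } else { Some(&packet[..pos]) }
}

/// Append one descriptor: tag, expandable length (7 bits per byte, high
/// bit set on every byte except the last), then the payload.
fn push_descriptor(out: &mut Vec<u8>, tag: u8, payload: &[u8]) {
    out.push(tag);
    let mut groups = vec![(payload.len() & 0x7F) as u8];
    let mut rest = payload.len() >> 7;
    while rest > 0 {
        groups.push(((rest & 0x7F) as u8) | 0x80);
        rest >>= 7;
    }
    groups.reverse(); // most-significant group first
    out.extend_from_slice(&groups);
    out.extend_from_slice(payload);
}

/// Build an `esds` payload: FullBox header, then ES_Descriptor (0x03)
/// wrapping DecoderConfigDescriptor (0x04), DecoderSpecificInfo (0x05)
/// and SLConfigDescriptor (0x06).
pub fn build_esds(vol: &[u8]) -> Vec<u8> {
    let mut dcd = vec![
        0x20, // ObjectTypeIndication: MPEG-4 Visual
        0x11, // streamType 0x04 (VisualStream) << 2, upStream 0, reserved 1
        0, 0, 0, // bufferSizeDB (zero in this sketch)
        0, 0, 0, 0, // maxBitrate (zero in this sketch)
        0, 0, 0, 0, // avgBitrate (zero in this sketch)
    ];
    push_descriptor(&mut dcd, 0x05, vol); // DSI carries the VOL verbatim

    let mut es = vec![0, 0, 0]; // ES_ID (2 bytes) + flags (1 byte)
    push_descriptor(&mut es, 0x04, &dcd);
    push_descriptor(&mut es, 0x06, &[0x02]); // SLConfig, predefined = 2

    let mut esds = vec![0, 0, 0, 0]; // FullBox version + flags
    push_descriptor(&mut esds, 0x03, &es);
    esds
}
```

For short VOL prefixes every length fits in a single byte; the multi-byte branch only matters once a payload reaches 128 bytes.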
Round 6 implementation notes
- MPEG-4 Part 2 is decode-only. VideoToolbox ships an MPEG-4 Part 2 decoder (historically used for DivX / Xvid playback) but no MPEG-4 Pt 2 compression session, so `make_mpeg4_part_two_decoder` registers a decoder against `CodecId::new("mpeg4")` (tags `mp4v / MP4V / M4S2 / m4s2 / DIVX / divx / DX50 / XVID / xvid / FMP4 / fmp4 / V_MPEG4/ISO/ASP`) and there is deliberately no matching encoder factory. This is MPEG-4 Pt 2, distinct from MPEG-4 Pt 10 (H.264), which uses `'avc1'` and stays on its own `CodecId::new("h264")` row.
- Elementary-stream framer. Like MPEG-2 ES, an MPEG-4 Pt 2 ES is not pre-framed. `FrameSplit::Mpeg4PartTwoEs` splits on the VOP start code (`00 00 01 B6`), attaching any leading VOS (`B0`) / Visual Object (`B5`) / VO (`00..1F`) / VOL (`20..2F`) / GOV (`B3`) / user-data (`B2`) bytes to the first VOP so the VOL travels with it. This is intrinsic bitstream framing (the codec's job, not the container's).
- Session-creation caveat. VT's MPEG-4 Pt 2 decoder typically requires the VOL configuration to be supplied via `kCMFormatDescriptionExtension_*` extension atoms (the ESDS `DecoderSpecificInfo` shape), not extracted from the bitstream as it would be for MPEG-2. Building the format description from just `(codec_type, width, height)` can therefore return `-12909 / kVTVideoDecoderBadDataErr` at session create. When that happens the registry's SW fallback takes over and the pure-Rust MPEG-4 Pt 2 decoder handles the stream. A follow-up round (round 7, described above) extracts the VOL prefix from the elementary stream and supplies it via the format-description extension atoms to enable the hardware path on those hosts.
- Validated against ffmpeg as a black-box. The decode test generates an MPEG-4 Pt 2 elementary stream with `ffmpeg -c:v mpeg4 -f m4v`, feeds it through the VT bridge, and (when the session creates successfully) compares to ffmpeg's own software decode at PSNR_Y ≥ 30 dB. The test self-skips when ffmpeg or the framework is unavailable, and self-skips gracefully when the session-creation caveat above triggers, so CI on a runner without VOL-extradata leniency still passes.
- Splitter unit tests. Four new unit tests cover the MPEG-4 Pt 2 access-unit splitter: single VOP with headers, two VOPs with first inheriting headers, no-VOP-found pass-through, and a regression test that confirms non-VOP start codes (`B0`, `B3`) don't trigger spurious splits.
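A minimal sketch of the VOP-based access-unit split, assuming a hypothetical free function `split_on_vop` rather than the crate's `FrameSplit` machinery. One simplification is labeled in the code: headers that appear between two VOPs stay at the tail of the earlier unit.

```rust
/// Split an MPEG-4 Part 2 elementary stream into access units on VOP
/// start codes. Headers before the first VOP are kept with that VOP.
/// Simplification: headers between later VOPs stay with the earlier unit.
pub fn split_on_vop(es: &[u8]) -> Vec<&[u8]> {
    const VOP: &[u8] = &[0x00, 0x00, 0x01, 0xB6];

    // Byte offsets of every VOP start code in the buffer.
    let starts: Vec<usize> = (0..es.len().saturating_sub(3))
        .filter(|&i| &es[i..i + 4] == VOP)
        .collect();

    if starts.is_empty() {
        // No VOP found: pass the buffer through as a single unit.
        return if es.is_empty() { Vec::new() } else { vec![es] };
    }

    let mut units = Vec::with_capacity(starts.len());
    for (n, &start) in starts.iter().enumerate() {
        // The first unit begins at byte 0 so leading headers travel with it.
        let begin = if n == 0 { 0 } else { start };
        let end = starts.get(n + 1).copied().unwrap_or(es.len());
        units.push(&es[begin..end]);
    }
    units
}
```

Start codes other than `B6` (for example `B0` or `B3`) never appear in `starts`, so they cannot cause a split.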
Round 5 implementation notes
- VP9 is decode-only. VideoToolbox ships a VP9 decoder (M1+) but no VP9 compression session, so `make_vp9_decoder` registers a decoder against `CodecId::new("vp9")` (tags `vp09 / VP90 / V_VP9`) and there is deliberately no matching encoder factory.
- Container-framed, no in-codec splitter. Unlike MPEG-2's elementary-stream input, VP9 has no Annex-B / picture-start-code mechanism: frames are framed by the surrounding container (IVF, Matroska, MP4) and arrive as one self-contained payload per `Packet`. `BlobDecoder` is therefore instantiated with `FrameSplit::Whole`; bytes flow straight from `Packet::data` into the `CMSampleBuffer` without any in-codec carving.
- Validated against ffmpeg. The decode test asks `ffmpeg -c:v libvpx-vp9 -f ivf` to produce a 320×240 / 10-frame gradient stream, parses the IVF container (32-byte file header + per-frame 12-byte header + payload) to recover individual VP9 frames, feeds each as one `Packet`, and compares to ffmpeg's own software decode (PSNR_Y ≥ 30 dB threshold). The test self-skips when ffmpeg / libvpx-vp9 / the framework / the VT VP9 decoder is unavailable: older OS, Intel Macs without VP9 IP, ffmpeg builds without libvpx.
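The IVF layout the test relies on (a 32-byte file header, then a 12-byte header plus payload per frame) can be walked with a short parser. This is a sketch, not the test's actual helper; `IvfFrame` and `parse_ivf` are hypothetical names.

```rust
/// One frame recovered from an IVF file.
#[derive(Debug, PartialEq)]
pub struct IvfFrame<'a> {
    pub pts: u64,
    pub data: &'a [u8],
}

fn le_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

fn le_u64(b: &[u8]) -> u64 {
    // Fold from the most-significant byte down to build a little-endian value.
    b[..8].iter().rev().fold(0u64, |acc, &x| (acc << 8) | x as u64)
}

/// Parse an IVF buffer into (codec FourCC, frames). Returns None on a bad
/// signature or a truncated frame.
pub fn parse_ivf<'a>(file: &'a [u8]) -> Option<(u32, Vec<IvfFrame<'a>>)> {
    if file.len() < 32 || !file.starts_with(b"DKIF") {
        return None;
    }
    // Bytes 6..8 hold the header length (normally 32); bytes 8..12 the FourCC.
    let header_len = u16::from_le_bytes([file[6], file[7]]) as usize;
    let fourcc = u32::from_be_bytes([file[8], file[9], file[10], file[11]]);

    let mut frames = Vec::new();
    let mut pos = header_len.max(32);
    while pos + 12 <= file.len() {
        // Frame header: 4-byte little-endian size, 8-byte little-endian pts.
        let size = le_u32(&file[pos..pos + 4]) as usize;
        let pts = le_u64(&file[pos + 4..pos + 12]);
        let start = pos + 12;
        let end = start.checked_add(size)?;
        if end > file.len() {
            return None; // truncated payload
        }
        frames.push(IvfFrame { pts, data: &file[start..end] });
        pos = end;
    }
    Some((fourcc, frames))
}
```

Each returned `data` slice is one container-framed payload, which is the unit that would be handed on as a single packet.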
Round 4 implementation notes
- MPEG-2 is decode-only. VideoToolbox ships an MPEG-2 decoder but no MPEG-2 compression session, so `make_mpeg2_decoder` registers a decoder against `CodecId::new("mpeg2video")` (tags `mp2v / MPG2 / mpg2 / hdv2 / m2v1 / V_MPEG2`) and there is deliberately no matching encoder factory.
- Elementary-stream framer. Unlike the container-framed JPEG/ProRes path (one `Packet` == one frame), an MPEG-2 elementary stream is not pre-framed. `BlobDecoder` gained a `FrameSplit` mode; `FrameSplit::Mpeg2Es` splits on the picture start code (`00 00 01 00`), attaching any leading sequence (`b3`) / GOP (`b8`) / extension (`b5`) headers to the first picture so VT can size the decoder. This is intrinsic bitstream framing (the codec's job), not container parsing.
- Validated against ffmpeg as a black-box. The decode test generates an MPEG-2 elementary stream with `ffmpeg` (opaque validator), decodes it through VideoToolbox, and compares the result to ffmpeg's own software decode: PSNR_Y ≈ 61 dB on a 320×240 / 10-frame gradient. The test self-skips when ffmpeg or the framework is unavailable.
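The PSNR_Y figures quoted throughout follow the usual definition, 10·log10(255² / MSE) over the luma plane. A minimal sketch, assuming 8-bit luma and a hypothetical helper name `psnr_y`:

```rust
/// PSNR in dB between two 8-bit luma planes of equal size.
/// Returns None for empty or mismatched planes, and infinity when the
/// planes are identical (zero mean squared error).
pub fn psnr_y(reference: &[u8], decoded: &[u8]) -> Option<f64> {
    if reference.is_empty() || reference.len() != decoded.len() {
        return None;
    }
    // Sum of squared per-sample differences.
    let sse: f64 = reference
        .iter()
        .zip(decoded)
        .map(|(&a, &b)| {
            let d = a as f64 - b as f64;
            d * d
        })
        .sum();
    let mse = sse / reference.len() as f64;
    if mse == 0.0 {
        return Some(f64::INFINITY);
    }
    Some(10.0 * (255.0 * 255.0 / mse).log10())
}
```

As a reference point, an error of exactly one code value on every sample gives an MSE of 1 and a PSNR of about 48.13 dB, so the 30 dB test threshold leaves room for ordinary decoder rounding differences.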
Round 3 implementation notes
- Blob codec module (`src/blob.rs`): `BlobDecoder` and `BlobEncoder` share one VTDecompression/VTCompression driver for every codec whose format description is `(width, height, codecType)` with no parameter sets. Currently used by MJPEG (`'jpeg'`) and ProRes (`'apcn'`).
- Pixel-format adaptive callback: VT decoders return different `CVPixelBuffer` formats depending on the codec. H.264/HEVC honour the NV12 destination-attribute request (`'420v'`), but ProRes returns 16-bit biplanar 4:2:2 (`'sv22'`) regardless. The blob decoder callback inspects `CVPixelBufferGetPixelFormatType` and dispatches to one of four converters: NV12 (`'420v'` / `'420f'`), packed UYVY (`'2vuy'`), packed YUY2 (`'yuvs'`), or biplanar 16-bit 4:2:2 (`'sv22'`).
- ProRes profile selection: defaults to ProRes 422 (`'apcn'`) for both encode and decode. The decoder format description carries the codec-type, and VT internally dispatches to the right ProRes flavour when it sees the frame header (`'icpf'` magic at offset 4). Explicit profile selection via `CodecParameters::tag` is a future-round item.
- Roundtrip tests use a smooth diagonal gradient: the previous test pattern `(col + row/2 + frame*10) % 255` had a modulo-wraparound discontinuity that JPEG's DCT could not represent without ~10 dB of error. The new gradient (clipped to video-range `[16, 235]`) reaches ≥ 36 dB on every codec.
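The pixel-format dispatch can be sketched as a match on the FourCC that `CVPixelBufferGetPixelFormatType` reports. The `Converter` enum and `pick_converter` function are hypothetical names; only the five FourCC values come from the notes above.

```rust
/// Hypothetical labels for the four conversion routines.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Converter {
    Nv12,
    Uyvy,
    Yuy2,
    Biplanar16Bit422,
}

/// Pack a four-character code into a big-endian u32.
const fn fourcc(code: &[u8; 4]) -> u32 {
    u32::from_be_bytes(*code)
}

/// Choose a converter from the pixel-format code of the decoded buffer.
/// Unknown formats yield None so the caller can report an error.
pub fn pick_converter(pixel_format: u32) -> Option<Converter> {
    const NV12_VIDEO: u32 = fourcc(b"420v");
    const NV12_FULL: u32 = fourcc(b"420f");
    const UYVY: u32 = fourcc(b"2vuy");
    const YUY2: u32 = fourcc(b"yuvs");
    const SV22: u32 = fourcc(b"sv22");

    match pixel_format {
        NV12_VIDEO | NV12_FULL => Some(Converter::Nv12),
        UYVY => Some(Converter::Uyvy),
        YUY2 => Some(Converter::Yuy2),
        SV22 => Some(Converter::Biplanar16Bit422),
        _ => None,
    }
}
```

Dispatching on what the buffer actually contains, rather than on what was requested, is what lets one callback serve codecs that ignore the NV12 request.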
Workspace policy
Calling a system OS framework via FFI is the same shape as calling libc::malloc — it's the platform, not a copied algorithm. The workspace's clean-room rule (no embedding source from libvpx, libwebp, libjxl, etc.) does not apply to this crate.
License
MIT.