oxideav-videotoolbox 0.0.3

macOS VideoToolbox hardware decode/encode bridge for the oxideav framework — runtime-loaded via libloading, no compile-time framework dep
oxideav-videotoolbox

macOS VideoToolbox hardware decode/encode bridge for the oxideav framework.

Why a bridge crate?

Apple's VideoToolbox exposes the dedicated media engine on Apple Silicon (and the equivalent IP on Intel Macs). For codecs the chip supports natively, this is 5 to 50× faster than software decoding and orders of magnitude more energy-efficient.

This crate is a thin runtime-loaded bridge: it has no compile-time link dependency on VideoToolbox and uses no Objective-C or Swift. The framework is opened via libloading on first use.

Fallback behaviour

Two distinct failure paths fall back automatically to the pure-Rust codec:

  1. Load failure: older macOS, missing framework, or a sandboxed environment without VT entitlements. register() logs and returns without registering, so the SW codec is the only candidate at dispatch.
  2. Init failure: VTDecompressionSessionCreate / VTCompressionSessionCreate returns a non-zero OSStatus for the requested parameters. Common triggers: a stream above the device's max resolution, a hardware encoder slot that is already busy (concurrent-session cap), an unsupported pixel format, or a codec profile the device doesn't accelerate. The factory returns Err; the registry's make_decoder_with / make_encoder_with retries the next-priority impl (typically the SW one).

Pipelines that require hardware (e.g. real-time low-latency capture where the SW path can't keep up) can opt out of the SW fallback by setting CodecPreferences { require_hardware: true, .. } — the registry will then surface the OSStatus error instead of degrading silently.
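
The init-failure fallback and the require_hardware opt-out can be pictured with a small model. This is a minimal sketch, not the registry's actual code: the Candidate struct, the factory signature, the "h264_sw" name, and the NO_CANDIDATE sentinel are all hypothetical, and candidates are assumed to arrive already sorted by priority.

```rust
/// Hypothetical stand-in for the framework's preference struct; only the
/// field named in the text is modelled.
#[derive(Clone, Copy, Default)]
pub struct CodecPreferences {
    pub require_hardware: bool,
}

/// One registered implementation. The factory yields either a decoder
/// (modelled here as just its name) or an OSStatus-style error code.
pub struct Candidate {
    pub hardware: bool,
    pub factory: fn() -> Result<&'static str, i32>,
}

/// Hypothetical sentinel for "nothing was eligible to try".
pub const NO_CANDIDATE: i32 = -1;

/// Try each candidate in order; an Err from one factory moves on to the next.
/// With require_hardware set, software candidates are skipped, so the last
/// hardware error reaches the caller instead of a silent downgrade.
pub fn make_decoder_with(
    candidates: &[Candidate],
    prefs: CodecPreferences,
) -> Result<&'static str, i32> {
    let mut last_err = NO_CANDIDATE;
    for candidate in candidates {
        if prefs.require_hardware && !candidate.hardware {
            continue;
        }
        match (candidate.factory)() {
            Ok(decoder) => return Ok(decoder),
            Err(status) => last_err = status,
        }
    }
    Err(last_err)
}

// Demo factories: a VT session that is refused, and a software decoder.
fn vt_session_refused() -> Result<&'static str, i32> {
    Err(-12909) // kVTVideoDecoderBadDataErr
}

fn software_decoder() -> Result<&'static str, i32> {
    Ok("h264_sw")
}

pub fn demo_candidates() -> Vec<Candidate> {
    vec![
        Candidate { hardware: true, factory: vt_session_refused },
        Candidate { hardware: false, factory: software_decoder },
    ]
}
```

With the default preferences the demo resolves to the software decoder; with require_hardware set it returns the hardware error code unchanged.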

Platform gating

The whole crate is #![cfg(target_os = "macos")]. On Linux / Windows it compiles to an empty rlib; the umbrella oxideav crate gates the register call behind the same cfg.
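
A minimal sketch of that gating pattern as seen from the umbrella side. The vt module, register_all, and the codec names are hypothetical placeholders; only the cfg attribute matches the text.

```rust
// Stand-in for the bridge crate: the module only exists on macOS.
#[cfg(target_os = "macos")]
mod vt {
    pub fn register(names: &mut Vec<&'static str>) {
        names.push("h264_videotoolbox");
    }
}

/// Stand-in for the umbrella crate's registration entry point.
pub fn register_all(names: &mut Vec<&'static str>) {
    // The pure-Rust codec is registered on every platform.
    names.push("h264_sw");

    // The call is gated behind the same cfg as the module, so non-macOS
    // builds never reference a symbol that was compiled out.
    #[cfg(target_os = "macos")]
    vt::register(names);
}
```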

Priority

Hardware factories register with CodecCapabilities::with_priority(10). Lower numbers win at resolution time, so on macOS the hardware paths are preferred over the pure-Rust impls (which sit at priority 100+).

Opt-out

Users who want to force the pure-Rust path globally can pass --no-hwaccel to the oxideav CLI; this sets CodecPreferences { no_hardware: true }, which the pipeline forwards to make_decoder_with / make_encoder_with so HW factories are skipped at dispatch time. The runtime context still registers VT — oxideav list shows the *_videotoolbox rows regardless of the flag — only resolution is biased.
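
As a rough illustration of how priority ordering and the no_hardware preference interact, the sketch below filters and sorts a registration list. The Entry struct, resolution_order, and the "h264_sw" name are hypothetical; the priorities (10 for hardware, 100 for software) follow the text.

```rust
/// Hypothetical stand-in for the framework's preference struct; only the
/// field named in the text is modelled.
#[derive(Clone, Copy, Default)]
pub struct CodecPreferences {
    pub no_hardware: bool,
}

/// One row of the registry, as a listing command might show it.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Entry {
    pub name: &'static str,
    pub hardware: bool,
    pub priority: u32,
}

/// Order in which factories would be tried. Registration is untouched:
/// the filter only biases resolution, and lower priority numbers win.
pub fn resolution_order(registered: &[Entry], prefs: CodecPreferences) -> Vec<&'static str> {
    let mut picked: Vec<&Entry> = registered
        .iter()
        .filter(|e| !(prefs.no_hardware && e.hardware))
        .collect();
    picked.sort_by_key(|e| e.priority);
    picked.into_iter().map(|e| e.name).collect()
}

pub fn demo_registry() -> Vec<Entry> {
    vec![
        Entry { name: "h264_sw", hardware: false, priority: 100 },
        Entry { name: "h264_videotoolbox", hardware: true, priority: 10 },
    ]
}
```

The registry keeps both rows in either case, which is why a listing still shows the hardware entry when the flag is set.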

Coverage roadmap

| Codec | Decode (M-series) | Encode (M-series) | Status |
| --- | --- | --- | --- |
| H.264 | hardware | hardware | wired (≈ 51 dB PSNR_Y) |
| HEVC | hardware | hardware | wired (≈ 54 dB PSNR_Y) |
| ProRes | hardware | hardware | wired (≈ 52 dB PSNR_Y) |
| JPEG (MJPEG) | hardware | hardware | wired (≈ 36 dB PSNR_Y) |
| MPEG-2 | hardware | none (no VT encoder) | wired (≈ 61 dB PSNR_Y, decode-only) |
| VP9 | hardware (M1+) | none (no VT encoder) | wired (decode-only) |
| MPEG-4 Pt 2 | hardware | none (no VT encoder) | wired (decode-only, VOL→ESDS extension atoms, ≈ 72 dB PSNR_Y) |
| AV1 | hardware (M3+) / VT-internal SW elsewhere | hardware (M3+) | decode wired (round 8); encode roadmap |

  • Round 1: scaffolding.
  • Round 2: H.264 + HEVC decode + encode.
  • Round 3: JPEG (MJPEG) + ProRes decode + encode via a shared blob-codec module (blob.rs): single-blob frames built on CMVideoFormatDescriptionCreate(width, height, codecType) rather than the parameter-set extraction H.264/HEVC need.
  • Round 4: MPEG-2 video decode (kCMVideoCodecType_MPEG2Video). Decode-only, since VideoToolbox exposes an MPEG-2 decoder but no encoder; an elementary-stream framer (FrameSplit::Mpeg2Es) carves the incoming bitstream into per-picture access units.
  • Round 5: VP9 decode (kCMVideoCodecType_VP9 = 'vp09'). Decode-only (no VT VP9 encoder); hardware decode on M1+ Apple Silicon, with VT falling back to software on older Macs that lack the dedicated VP9 IP. VP9 frames are container-framed (IVF / Matroska / MP4), so FrameSplit::Whole applies unchanged and no per-picture splitter is needed.
  • Round 6: MPEG-4 Part 2 video decode (kCMVideoCodecType_MPEG4Video = 'mp4v'): the DivX / Xvid / Visual ASP / SP family, not H.264 (which is MPEG-4 Part 10 and ships via 'avc1'). Decode-only as well: VideoToolbox exposes no MPEG-4 Pt 2 compression session. A new FrameSplit::Mpeg4PartTwoEs framer splits the elementary stream on VOP start codes (00 00 01 B6) and attaches preceding VOS / Visual Object / VO / VOL / GOV / user-data headers to the first VOP.
  • Round 7: VOL→ESDS extension-atom path. On hosts where VT enforces VOL-via-extradata for MPEG-4 Pt 2, the decoder extracts the configuration prefix (everything up to the first VOP start code) from the first packet, wraps it in a proper ISO/IEC 14496-1 ES_Descriptor + DecoderConfigDescriptor + DecoderSpecificInfo blob (ObjectTypeIndication = 0x20, streamType = VisualStream), and supplies it to CMVideoFormatDescriptionCreate via kCMFormatDescriptionExtension_SampleDescriptionExtensionAtoms keyed by "esds". Measured PSNR_Y vs ffmpeg's software decode jumps to ≈ 72.8 dB (sample-exact within IDCT tolerance) on the integration fixture.
  • Round 8 (this commit): AV1 video decode (kCMVideoCodecType_AV1 = 'av01'). Decode-only for now; hardware decode is gated to Apple Silicon M3+, with VT falling back to its internal software AV1 path on older hardware where that path exists (and to oxideav-av1 via the registry's SW-fallback elsewhere). AV1 frames are container-framed (IVF / Matroska / MP4 / WebM / RTP), so the existing FrameSplit::Whole path applies unchanged.
  • Remaining roadmap: AV1 encode (the M3+ 'av01' compression session, macOS 14+), plus an optional av1C extension-atom path analogous to round 7's ESDS for hosts that require the AV1 Sequence Header out-of-band.

Round 8 implementation notes

  • AV1 decode wiring via VideoToolbox. make_av1_decoder registers a decoder against CodecId::new("av1") (tags av01 / AV01 / V_AV1) backed by a BlobDecoder over kCMVideoCodecType_AV1 = 'av01' = 0x6176_3031. The factory is decode-only — a VT AV1 compression session exists on macOS 14+ for M3+ hardware but needs its own callback/pixel-buffer wiring and is a follow-up round.
  • Hardware path is M3+; older silicon falls back gracefully. Apple Silicon M3 (and later) carries dedicated AV1 hardware-decode IP that VT routes to automatically. On M1 / M2 and Intel hosts VideoToolbox either takes its internal software AV1 path (where the macOS build includes one) or returns a non-zero OSStatus at VTDecompressionSessionCreate. The registry's lower-priority pure-Rust oxideav-av1 decoder is the fallback in the latter case — the round-8 wiring does not invent a software path, it just exposes the hardware path when present.
  • Framing is FrameSplit::Whole — no in-codec splitter. AV1 is container-framed (IVF / Matroska / MP4 / WebM / RTP). Each demuxed Packet carries exactly one AV1 temporal unit (one or more OBUs that together compose a single decoded frame), and that temporal unit goes straight into a CMSampleBuffer for VT. AV1 has no Annex-B / picture-start-code mechanism that would require an in-codec carve like MPEG-2's or MPEG-4 Part 2's.
  • Configuration record (av1C) is a follow-up round. AV1 in ISOBMFF / Matroska carries an av1C configuration record whose payload is the AV1 Sequence Header OBU (per the AV1 ISOBMFF mapping under docs/container/mpeg4/av1-isobmff/). On hosts where VT requires the Sequence Header via kCMFormatDescriptionExtension_SampleDescriptionExtensionAtoms rather than extracted from the first packet, supplying the av1C blob via the extension atoms is the same pattern as MPEG-4 Part 2's ESDS wired in round 7 — the round-7 ESDS plumbing in BlobDecoder::ensure_session already supports an arbitrary extension-atom key, so adding av1C is a small follow-up once a host needs it.
  • Validated against ffmpeg as a black-box. av1_decode_against_ffmpeg asks ffmpeg -c:v libaom-av1 -f ivf to produce a 320×240 / 10-frame gradient stream (the same shape as the VP9 fixture), parses the IVF container (32-byte file header + per-frame 12-byte header + payload), feeds each frame as one Packet through make_av1_decoder, and compares to ffmpeg's own software decode at PSNR_Y ≥ 30 dB. The test self-skips when ffmpeg / libaom-av1 / the framework / the VT AV1 decoder is unavailable on the host — older OS, M1/M2 hosts without a VT AV1 fallback, ffmpeg builds without libaom-av1.
  • Two new unit tests plus one integration test. register_installs_av1_decode_only (decoder registered for CodecId::new("av1"), no encoder); av1_codec_type_is_av01_fourcc (the codec-type constant equals u32::from_be_bytes(*b"av01") = 0x6176_3031). The integration test av1_decode_against_ffmpeg covers the end-to-end IVF→VT→PSNR path.
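
The IVF layout the integration test relies on (32-byte file header, then a 12-byte header before each frame payload) is simple enough to sketch. The parser below is an illustration under that layout, not the test's actual helper: parse_ivf, IvfHeader, and fourcc are hypothetical names, and the header field offsets are those of the commonly documented IVF format.

```rust
/// Big-endian FourCC, as used for CoreMedia codec-type constants.
pub const fn fourcc(tag: &[u8; 4]) -> u32 {
    u32::from_be_bytes(*tag)
}

/// Subset of the IVF file header that a decode test would care about.
pub struct IvfHeader {
    pub fourcc: [u8; 4],
    pub width: u16,
    pub height: u16,
    pub frame_count: u32,
}

/// Split an IVF file into its header and per-frame payloads.
/// Returns None on a bad signature or a truncated frame.
pub fn parse_ivf(data: &[u8]) -> Option<(IvfHeader, Vec<&[u8]>)> {
    if data.len() < 32 || !data.starts_with(b"DKIF") {
        return None;
    }
    // Bytes 6..8 hold the header length (32 in practice), little-endian.
    let header_len = u16::from_le_bytes([data[6], data[7]]) as usize;
    if header_len < 32 || header_len > data.len() {
        return None;
    }
    let header = IvfHeader {
        fourcc: [data[8], data[9], data[10], data[11]],
        width: u16::from_le_bytes([data[12], data[13]]),
        height: u16::from_le_bytes([data[14], data[15]]),
        frame_count: u32::from_le_bytes([data[24], data[25], data[26], data[27]]),
    };

    let mut frames = Vec::new();
    let mut pos = header_len;
    // Each frame: 4-byte little-endian payload size, 8-byte timestamp, payload.
    while pos + 12 <= data.len() {
        let size =
            u32::from_le_bytes([data[pos], data[pos + 1], data[pos + 2], data[pos + 3]]) as usize;
        let start = pos + 12;
        let end = start.checked_add(size)?;
        if end > data.len() {
            return None;
        }
        frames.push(&data[start..end]);
        pos = end;
    }
    Some((header, frames))
}
```

Each returned slice would then be wrapped in one Packet, matching the FrameSplit::Whole behaviour described above.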

Round 7 implementation notes

  • VOL→ESDS extension-atom path for MPEG-4 Part 2. Round 6 wired the decoder against (codec_type, width, height) only — fine on hosts where VT's MPEG-4 Pt 2 decoder is lenient about extracting the VOL from the bitstream prefix, but on stricter VT hosts session creation returned kVTVideoDecoderBadDataErr (the registry then fell back to the pure-Rust impl). Round 7 closes that gap. Before opening the session, BlobDecoder (under FrameSplit::Mpeg4PartTwoEs) sniffs the leading bytes of the first packet up to but not including the first VOP start code (00 00 01 B6), wraps them in a full ISO/IEC 14496-1 ESDS descriptor (ES_Descriptor 0x03 → DecoderConfigDescriptor 0x04 [ObjectTypeIndication 0x20 = MPEG-4 Visual, streamType 0x04 = VisualStream] → DecoderSpecificInfo 0x05 = VOL bytes, with a 1-byte SLConfigDescriptor 0x06 predefined=2 tail), and feeds the result through kCMFormatDescriptionExtension_SampleDescriptionExtensionAtoms = { "esds": CFData } on the format-description creation call. The ffmpeg-fixture decode test (mpeg4_part_two_decode_against_ffmpeg) now reaches ≈ 72.8 dB PSNR_Y vs ffmpeg's reference decode on the gradient fixture (10/10 frames returned).
  • Hosts that don't need the ESDS extension still work. The extraction is best-effort: if the first packet starts with a VOP (no headers to capture) or the buffer has no VOP marker, extract_mpeg4_part_two_vol returns None and the decoder falls back to the round-6 plain (codec_type, width, height) path. Existing JPEG / ProRes / MPEG-2 / VP9 framers go through the same code path without an extensions dictionary, so their session creation is byte-for-byte unchanged from round 6.
  • ISO/IEC 14496-1 reference. The ESDS descriptor structure is documented in docs/container/mpeg4/ISO_IEC_14496-1-System-2010.pdf §7.2.6; the oxideav-mp4 muxer uses the same shape for AAC mp4a sample entries, so the BER length helper and field layout are consistent with the rest of the workspace.
  • CoreFoundation symbol added. sys.rs now resolves CFDataCreate and exposes a cf_data(vt, &[u8]) helper that copies the slice into a fresh CFData. Used only by the MPEG-4 Pt 2 path so far; available for any future codec needing a binary-blob extension atom (FLAC dfLa, AV1 av1C, …).
  • Ten new unit tests. mpeg4_extract_vol_* (5 cases) cover the VOL-prefix extractor (returns prefix before VOP, includes GOV/user-data, None when no VOP, None when buffer starts with VOP, None on empty buffer); esds_* (5 cases) cover the ESDS builder (FullBox header zeros, ES_Descriptor tag, DCD tag + ObjectTypeIndication + streamType byte, DSI carries the VOL verbatim, SLC predefined = 2).

Round 6 implementation notes

  • MPEG-4 Part 2 is decode-only. VideoToolbox ships an MPEG-4 Part 2 decoder (historically used for DivX / Xvid playback) but no MPEG-4 Pt 2 compression session, so make_mpeg4_part_two_decoder registers a decoder against CodecId::new("mpeg4") (tags mp4v / MP4V / M4S2 / m4s2 / DIVX / divx / DX50 / XVID / xvid / FMP4 / fmp4 / V_MPEG4/ISO/ASP) and there is deliberately no matching encoder factory. This is MPEG-4 Pt 2 — distinct from MPEG-4 Pt 10 (H.264), which uses 'avc1' and stays on its own CodecId::new("h264") row.
  • Elementary-stream framer. Like MPEG-2 ES, an MPEG-4 Pt 2 ES is not pre-framed. FrameSplit::Mpeg4PartTwoEs splits on the VOP start code (00 00 01 B6), attaching any leading VOS (B0) / Visual Object (B5) / VO (00..1F) / VOL (20..2F) / GOV (B3) / user-data (B2) bytes to the first VOP so the VOL travels with it. This is intrinsic bitstream framing (codec's job, not container's).
  • Session-creation caveat. VT's MPEG-4 Pt 2 decoder typically requires the VOL configuration to be supplied via kCMFormatDescriptionExtension_* extension atoms (the ESDS DecoderSpecificInfo shape), not extracted from the bitstream as it would be for MPEG-2. Building the format description from just (codec_type, width, height) therefore can return -12909 / kVTVideoDecoderBadDataErr at session create. When that happens the registry's SW fallback takes over and the pure-Rust MPEG-4 Pt 2 decoder handles the stream. A follow-up round will extract the VOL prefix from the elementary stream and supply it via the format-description extension atoms to enable the hardware path on those hosts.
  • Validated against ffmpeg as a black-box. The decode test generates an MPEG-4 Pt 2 elementary stream with ffmpeg -c:v mpeg4 -f m4v, feeds it through the VT bridge, and (when the session creates successfully) compares to ffmpeg's own software decode at PSNR_Y ≥ 30 dB. The test self-skips when ffmpeg or the framework is unavailable, and self-skips gracefully when the session-creation caveat above triggers — so CI on a runner without VOL-extradata leniency still passes.
  • Splitter unit tests. Four new unit tests cover the MPEG-4 Pt 2 access-unit splitter: single VOP with headers, two VOPs with first inheriting headers, no-VOP-found pass-through, and a regression test that confirms non-VOP start codes (B0, B3) don't trigger spurious splits.

Round 5 implementation notes

  • VP9 is decode-only. VideoToolbox ships a VP9 decoder (M1+) but no VP9 compression session, so make_vp9_decoder registers a decoder against CodecId::new("vp9") (tags vp09 / VP90 / V_VP9) and there is deliberately no matching encoder factory.
  • Container-framed, no in-codec splitter. Unlike MPEG-2's elementary-stream input, VP9 has no Annex-B / picture-start-code mechanism — frames are framed by the surrounding container (IVF, Matroska, MP4) and arrive as one self-contained payload per Packet. BlobDecoder is therefore instantiated with FrameSplit::Whole; bytes flow straight from Packet::data into the CMSampleBuffer without any in-codec carving.
  • Validated against ffmpeg. The decode test asks ffmpeg -c:v libvpx-vp9 -f ivf to produce a 320×240 / 10-frame gradient stream, parses the IVF container (32-byte file header + per-frame 12-byte header + payload) to recover individual VP9 frames, feeds each as one Packet, and compares to ffmpeg's own software decode (PSNR_Y ≥ 30 dB threshold). The test self-skips when ffmpeg / libvpx-vp9 / the framework / the VT VP9 decoder is unavailable — older OS, Intel Macs without VP9 IP, ffmpeg builds without libvpx.

Round 4 implementation notes

  • MPEG-2 is decode-only. VideoToolbox ships an MPEG-2 decoder but no MPEG-2 compression session, so make_mpeg2_decoder registers a decoder against CodecId::new("mpeg2video") (tags mp2v / MPG2 / mpg2 / hdv2 / m2v1 / V_MPEG2) and there is deliberately no matching encoder factory.
  • Elementary-stream framer. Unlike the container-framed JPEG/ProRes path (one Packet == one frame), an MPEG-2 elementary stream is not pre-framed. BlobDecoder gained a FrameSplit mode; FrameSplit::Mpeg2Es splits on the picture start code (00 00 01 00), attaching any leading sequence (b3) / GOP (b8) / extension (b5) headers to the first picture so VT can size the decoder. This is intrinsic bitstream framing (the codec's job), not container parsing.
  • Validated against ffmpeg as a black-box. The decode test generates an MPEG-2 elementary stream with ffmpeg (opaque validator), decodes it through VideoToolbox, and compares the result to ffmpeg's own software decode: PSNR_Y ≈ 61 dB on a 320×240 / 10-frame gradient. The test self-skips when ffmpeg or the framework is unavailable.
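
A minimal sketch of the picture-start-code split described above, assuming the simplest reading of the rule: cut before every picture start code except the first, so leading headers stay with the first picture. split_mpeg2_es is a hypothetical name, and the pass-through for input without any picture start code is an assumption borrowed from the MPEG-4 Pt 2 splitter's behaviour.

```rust
/// MPEG-2 picture start code from the text: 00 00 01 00.
const PICTURE_START: [u8; 4] = [0x00, 0x00, 0x01, 0x00];

/// Carve an elementary-stream buffer into per-picture access units.
pub fn split_mpeg2_es(data: &[u8]) -> Vec<&[u8]> {
    // Byte offsets of every picture start code in the buffer.
    let starts: Vec<usize> = data
        .windows(4)
        .enumerate()
        .filter_map(|(i, w)| (w == &PICTURE_START[..]).then_some(i))
        .collect();

    if starts.is_empty() {
        // Assumption: with no picture found, hand the bytes through as one unit.
        return if data.is_empty() { Vec::new() } else { vec![data] };
    }

    let mut units = Vec::with_capacity(starts.len());
    for (n, &start) in starts.iter().enumerate() {
        // The first unit begins at offset 0 so that sequence / GOP / extension
        // headers ahead of the first picture travel with it.
        let begin = if n == 0 { 0 } else { start };
        let end = starts.get(n + 1).copied().unwrap_or(data.len());
        units.push(&data[begin..end]);
    }
    units
}
```

One consequence of this simple rule is that headers sitting between two pictures end up at the tail of the earlier unit; a production framer may choose to handle that differently.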

Round 3 implementation notes

  • Blob codec module (src/blob.rs). BlobDecoder and BlobEncoder share one VTDecompression/VTCompression driver for every codec whose format description is (width, height, codecType) with no parameter sets. Currently used by MJPEG ('jpeg') and ProRes ('apcn').
  • Pixel-format adaptive callback. VT decoders return different CVPixelBuffer formats depending on the codec: H.264/HEVC honour the NV12 destination-attribute request ('420v'), but ProRes returns 16-bit biplanar 4:2:2 ('sv22') regardless. The blob decoder callback inspects CVPixelBufferGetPixelFormatType and dispatches to one of four converters: NV12 ('420v'/'420f'), packed UYVY ('2vuy'), packed YUY2 ('yuvs'), or biplanar 16-bit 4:2:2 ('sv22').
  • ProRes profile selection. Defaults to ProRes 422 ('apcn') for both encode and decode. The decoder format description carries the codec-type, and VT internally dispatches to the right ProRes flavour when it sees the frame header ('icpf' magic at offset 4). Explicit profile selection via CodecParameters::tag is a future-round item.
  • Roundtrip tests use a smooth diagonal gradient. The previous test pattern (col + row/2 + frame*10) % 255 had a modulo-wraparound discontinuity that JPEG's DCT could not represent without ~10 dB of error. The new gradient (clipped to video-range [16, 235]) reaches ≥ 36 dB on every codec.
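
To make the wraparound issue and the PSNR_Y figures concrete, here is a small sketch. old_pattern reproduces the formula quoted above; smooth_gradient is only an assumed shape for a diagonal gradient confined to [16, 235] (the crate's actual fixture formula is not given here), and psnr_y is the standard 8-bit luma PSNR, 10·log10(255² / MSE).

```rust
/// The earlier test pattern quoted in the text. The value wraps from 254
/// back to 0, which creates a hard edge in an otherwise smooth ramp.
pub fn old_pattern(col: u32, row: u32, frame: u32) -> u8 {
    ((col + row / 2 + frame * 10) % 255) as u8
}

/// Assumed stand-in for the replacement: a diagonal ramp that spans the
/// video range 16..=235 across the frame without wrapping.
pub fn smooth_gradient(col: u32, row: u32, width: u32, height: u32) -> u8 {
    let span = (width + height).saturating_sub(2).max(1);
    (16 + (col + row) * 219 / span).min(235) as u8
}

/// PSNR over 8-bit luma planes; identical planes give +infinity.
pub fn psnr_y(reference: &[u8], decoded: &[u8]) -> f64 {
    assert_eq!(reference.len(), decoded.len(), "planes must match in size");
    let sse: f64 = reference
        .iter()
        .zip(decoded)
        .map(|(&a, &b)| {
            let d = f64::from(a) - f64::from(b);
            d * d
        })
        .sum();
    if sse == 0.0 {
        return f64::INFINITY;
    }
    let mse = sse / reference.len() as f64;
    10.0 * (255.0_f64 * 255.0 / mse).log10()
}
```

For scale, a uniform error of 10 code values on every sample works out to roughly 28 dB under this formula, so the 30 dB thresholds used in the decode tests correspond to a somewhat smaller average error.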

Workspace policy

Calling a system OS framework via FFI is the same shape as calling libc::malloc — it's the platform, not a copied algorithm. The workspace's clean-room rule (no embedding source from libvpx, libwebp, libjxl, etc.) does not apply to this crate.

License

MIT.