rust_h265
A pure Rust H.265 / HEVC video decoder.
Status: functional. Main and Main 10 profile HEVC (8-bit and 10-bit 4:2:0) decodes end-to-end with CTU 16/32/64, I/P/B slices (including hierarchical B), WPP, tiles, dependent slice segments, SAO, deblocking, AQ (cu_qp_delta), scaling lists, sign-data hiding, weighted prediction, transform skip, transquant bypass, constrained intra prediction, and PCM. 127 tests pass. Byte-exact against FFmpeg on every in-tree fixture plus real 1080p Big Buck Bunny at x265 presets ultrafast/medium/slow in both 8-bit and 10-bit. No threading or SIMD yet: ~5× slower than single-threaded FFmpeg on 8-bit content, ~2.5× on 10-bit; see BENCHMARK.md.
While working on rust_media it became clear that there isn't a sufficiently good open source software HEVC decoder that ships as a standalone library. FFmpeg has one, but it isn't split out. Most devices have hardware HEVC decoders, but a portable software fallback is still useful when you want one binary that runs anywhere.
A pure Rust H.264 decoder (rust_h264) already exists in this style. HEVC is a substantively different codec, not an extension of H.264, so this is a fresh implementation rather than a port.
Design
- Input: Annex B bytestream (start code delimited, 00 00 00 01 / 00 00 01). HVCC (length-prefixed, used in MP4) is not supported; callers must convert to Annex B before feeding data to the decoder.
- Streaming: Decoder::decode_nal(&[u8]) -> Result<Option<Frame>, DecodeError> plus flush(). NAL units are fed incrementally and decoded frames are emitted as they become available, in decode order (callers re-sort by POC for display).
- Performance: The decoder aims to be fast, with FFmpeg's software HEVC decoder as the target benchmark. Current gap vs single-threaded FFmpeg: ~5× on 8-bit, ~2.5× on 10-bit real 1080p content. No NEON / SSE kernels yet.
- Multi-bit-depth: 8-bit and 10-bit (Main / Main 10 profile) via a generic Pixel trait. 12-bit infrastructure is in place but untested. Pixel planes are PixelData::U8(Vec<u8>) or PixelData::U16(Vec<u16>); check frame.bit_depth to determine which.
- Pure Rust, no unsafe in the current codebase. unsafe will be reserved for SIMD paths once they land.
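Since HVCC input has to be converted by the caller, the following is a minimal sketch of that conversion, not part of rust_h265. The function name `length_prefixed_to_annex_b` is hypothetical, and it assumes the caller already knows the length-prefix width (MP4 files record it in the hvcC box as lengthSizeMinusOne + 1). Parameter sets stored in the container's configuration record are not handled here and would need to be emitted separately.

```rust
/// Convert length-prefixed (HVCC-style) NAL units to an Annex B bytestream.
/// `length_size` is the byte width of each big-endian length prefix (1, 2, or 4).
/// Returns None if the input is truncated or `length_size` is unsupported.
/// Hypothetical helper for illustration; not part of the rust_h265 API.
pub fn length_prefixed_to_annex_b(data: &[u8], length_size: usize) -> Option<Vec<u8>> {
    if !matches!(length_size, 1 | 2 | 4) {
        return None;
    }
    let mut out = Vec::with_capacity(data.len() + 16);
    let mut pos = 0usize;
    while pos < data.len() {
        // Read the big-endian length prefix.
        let prefix_end = pos.checked_add(length_size)?;
        let prefix = data.get(pos..prefix_end)?;
        let nal_len = prefix.iter().fold(0usize, |acc, &b| (acc << 8) | b as usize);
        pos = prefix_end;

        // Copy the NAL unit behind a 4-byte start code.
        let nal_end = pos.checked_add(nal_len)?;
        let nal = data.get(pos..nal_end)?;
        out.extend_from_slice(&[0, 0, 0, 1]);
        out.extend_from_slice(nal);
        pos = nal_end;
    }
    Some(out)
}
```

The output of this function is what a caller would pass on to an Annex B parser. Returning `None` on truncated input keeps a malformed sample from silently producing a partial bytestream.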
Usage
The public API mirrors rust_h264:
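use rust_h265::{parse_annex_b, Decoder}; // import path assumed
let h265_data = std::fs::read("input.h265").unwrap(); // placeholder path
let nals = parse_annex_b(&h265_data);
let mut decoder = Decoder::new();
for nal in &nals {
    if let Ok(Some(frame)) = decoder.decode_nal(nal) {
        // Use the decoded frame (check frame.bit_depth for the plane type)
    }
}
// Flush the last buffered frame
if let Some(frame) = decoder.flush() {
    // Use the final frame
}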
Important: frame ordering
decode_nal returns frames in decode order, not display order. With B-frames, the decoder must buffer reference frames before it can decode the B-frames that depend on them. The output order differs from the intended display order.
To display frames correctly, sort by pic_order_cnt (POC). If the stream has multiple IDR boundaries (GOPs), also track IDR boundaries to avoid mixing frames from different GOPs:
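use rust_h265::NalUnitType; // import path assumed
let mut idr_count: u32 = 0;
let mut frames = Vec::new();
for nal in &nals {
    // Raw header check, assuming nal starts at the 2-byte NAL header:
    // types 19 (IDR_W_RADL) and 20 (IDR_N_LP) are IDR pictures.
    // NalUnitType is the typed form of these values.
    let is_idr = matches!((nal[0] >> 1) & 0x3F, 19 | 20);
    if let Ok(Some(frame)) = decoder.decode_nal(nal) {
        frames.push((idr_count, frame));
    }
    if is_idr {
        idr_count += 1;
    }
}
if let Some(frame) = decoder.flush() {
    frames.push((idr_count, frame));
}
// Sort by (GOP, POC) for display order.
frames.sort_by_key(|(gop, frame)| (*gop, frame.pic_order_cnt));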
Common pitfall: IDR count timing
The most common mistake is incrementing idr_count before calling decode_nal. This causes the last frame of each GOP to be placed after the next IDR in display order, producing a visible glitch at every scene cut.
Wrong:
if is_idr {
    idr_count += 1;
}
let frame = decoder.decode_nal(nal)?;
// frame belongs to the OLD GOP but gets the NEW idr_count
Correct:
let frame = decoder.decode_nal(nal)?;
// Push frame with current idr_count first
if is_idr {
    idr_count += 1;
}
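To see the effect of the two orderings without a real stream, here is a small self-contained simulation. It is a sketch under stated assumptions: `DelayDecoder` and `tag_frames` are hypothetical stand-ins, not rust_h265 types, and the stand-in models the decoder as emitting the previous picture each time a new one is fed (a one-picture delay).

```rust
#[derive(Clone, Copy)]
struct Frame {
    pic_order_cnt: i32,
}

/// Hypothetical stand-in decoder: feeding a picture returns the previous one.
struct DelayDecoder {
    pending: Option<Frame>,
}

impl DelayDecoder {
    fn decode_nal(&mut self, poc: i32) -> Option<Frame> {
        self.pending.replace(Frame { pic_order_cnt: poc })
    }

    fn flush(&mut self) -> Option<Frame> {
        self.pending.take()
    }
}

/// Tag each output frame with a GOP index and sort by (GOP, POC).
/// `nals` is a list of (is_idr, poc) pairs in decode order.
/// `bump_first = true` reproduces the buggy ordering.
fn tag_frames(nals: &[(bool, i32)], bump_first: bool) -> Vec<(u32, i32)> {
    let mut dec = DelayDecoder { pending: None };
    let mut idr_count = 0u32;
    let mut out = Vec::new();
    for &(is_idr, poc) in nals {
        if bump_first && is_idr {
            idr_count += 1; // wrong: the frame returned below is from the old GOP
        }
        if let Some(f) = dec.decode_nal(poc) {
            out.push((idr_count, f.pic_order_cnt));
        }
        if !bump_first && is_idr {
            idr_count += 1; // correct: bump only after pushing the old frame
        }
    }
    if let Some(f) = dec.flush() {
        out.push((idr_count, f.pic_order_cnt));
    }
    out.sort();
    out
}
```

With two GOPs in decode order (IDR POC 0, POC 4, POC 2, then IDR POC 0, POC 2), the correct ordering keeps the POC 2 picture of the first GOP in its own group, while the buggy ordering moves it into the second GOP.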
Differences from H.264
If you're coming from rust_h264, the major changes are not at the API level: the public surface is essentially the same, but the codec internals are almost entirely different. A few user-visible differences are worth knowing:
- NAL header is 2 bytes, not 1. nal_unit_type is 6 bits, with VPS=32, SPS=33, PPS=34. IDR pictures are types 19 (IDR_W_RADL) and 20 (IDR_N_LP), not type 5.
- VPS (Video Parameter Set) exists in addition to SPS and PPS.
- Block structure is a quad-tree of Coding Tree Units (typically 64×64) instead of fixed 16×16 macroblocks.
- More intra modes (35: planar, DC, 33 angular) and larger transforms (up to 32×32, plus 4×4 DST for intra luma).
- SAO (Sample Adaptive Offset) is a new in-loop filter on top of deblocking.
- Tiles, WPP, and slice segments add picture-level parallelism options absent in H.264.
These do not change how you call the decoder; they change what the decoder has to implement internally.
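For callers that inspect NAL units themselves, the 2-byte header can be unpacked as sketched below. The field layout (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1) follows the HEVC specification; the `NalHeader` struct and function names are hypothetical and are not the crate's own types. The input is assumed to start at the header, with any start code already stripped.

```rust
/// Fields of the 2-byte HEVC NAL unit header (illustrative type, not from rust_h265).
#[derive(Debug, PartialEq, Eq)]
pub struct NalHeader {
    pub nal_unit_type: u8, // 6 bits
    pub nuh_layer_id: u8,  // 6 bits
    pub temporal_id: u8,   // nuh_temporal_id_plus1 - 1
}

/// Parse the header from a NAL unit whose start code has been removed.
/// Returns None if the unit is shorter than 2 bytes or the header is invalid.
pub fn parse_nal_header(nal: &[u8]) -> Option<NalHeader> {
    let b0 = *nal.first()?;
    let b1 = *nal.get(1)?;
    // forbidden_zero_bit must be 0.
    if b0 & 0x80 != 0 {
        return None;
    }
    // nuh_temporal_id_plus1 must be nonzero.
    let temporal_id_plus1 = b1 & 0x07;
    if temporal_id_plus1 == 0 {
        return None;
    }
    Some(NalHeader {
        nal_unit_type: (b0 >> 1) & 0x3F,
        // The layer id straddles the byte boundary: 1 bit from b0, 5 bits from b1.
        nuh_layer_id: ((b0 & 0x01) << 5) | (b1 >> 3),
        temporal_id: temporal_id_plus1 - 1,
    })
}

/// True for IDR_W_RADL (19) and IDR_N_LP (20).
pub fn is_idr(header: &NalHeader) -> bool {
    matches!(header.nal_unit_type, 19 | 20)
}
```

For example, the common header bytes `40 01` parse as nal_unit_type 32 (VPS), and `26 01` as type 19 (IDR_W_RADL).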
Tools
# Play an H.265 file in a window (press Escape to quit):
# Decode to raw YUV in display order (8-bit: yuv420p, 10-bit: yuv420p10le):
# Throughput measurement on a single file:
# Real-world benchmark matrix (downloads Big Buck Bunny, transcodes with
# x265 at several presets in 8-bit and 10-bit, runs both rust_h265 and
# FFmpeg on each):
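The raw YUV formats mentioned above differ only in sample width: yuv420p stores one byte per sample, yuv420p10le stores each sample as a 16-bit little-endian word. A minimal sketch of serializing one plane follows. The `PixelData` enum here is a local stand-in that mirrors the variants described in the Design section, and `plane_to_raw_bytes` is a hypothetical helper, not part of the crate.

```rust
/// Local stand-in mirroring the PixelData variants described in this README.
pub enum PixelData {
    U8(Vec<u8>),
    U16(Vec<u16>),
}

/// Serialize one plane to raw bytes: 8-bit samples are copied as-is,
/// 16-bit samples are written little-endian (as in yuv420p10le).
/// Hypothetical helper for illustration only.
pub fn plane_to_raw_bytes(plane: &PixelData) -> Vec<u8> {
    match plane {
        PixelData::U8(samples) => samples.clone(),
        PixelData::U16(samples) => samples
            .iter()
            .flat_map(|sample| sample.to_le_bytes())
            .collect(),
    }
}
```

Writing the Y, U, and V planes of each frame in that order, frame after frame in display order, gives a file that raw-video tools can read when told the resolution and pixel format.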
Testing
127 tests covering the full feature matrix — 8-bit and 10-bit, CTU 16/32/64,
I/P/B slices, WPP, tiles, dependent slices, SAO, deblocking, transform skip,
transquant bypass, constrained intra, PCM, multi-slice, and more. All
in-tree fixtures are byte-exact against FFmpeg; tests that need >1 MB of
reference output (e.g. 1080p) use a SHA-256 hash of the decoded planes. The
bench_realworld example covers real 1080p Big Buck Bunny in both 8-bit
and 10-bit and requires ffmpeg and x265 on $PATH.
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.