rivet
A modular, GPU-accelerated video transcoding library and command-line
tool, written in Rust. Install the CLI with cargo install rivet-transcoder
(the command is rivet), or add the library with cargo add rivet-transcoder.
rivet takes an arbitrary input file and transcodes it to AV1, as a
single MP4, a multi-rendition ABR ladder, or a segmented CMAF/HLS package.
The output is fully configurable: you choose the output mode, the codec,
the quality, the container/muxer, and the exact rungs, and you get
an asynchronous progress callback with a uniform per-rung status struct.
It is built from clean-room demuxers, muxers, and hardware-codec dispatch. No FFmpeg is required by default (FFmpeg is available as an optional decode backend behind a feature flag).
Detailed docs live in docs/. Start with
Architecture (the codebase map) and
Design decisions (the why); then
Pipeline (data flow), the per-crate references
(codec decode · codec encode ·
container · engine), and the usage guides
(OutputSpec · Batch manifest ·
CLI Β· HTTP API). This README is the quick tour.
Why "rivet"
It fastens generic transcoding logic into a single, reusable component: a library you can embed, a CLI you can run, and an HTTP service you can call.
The usual answer to "just transcode this" is FFmpeg, but FFmpeg is a CLI and a
C library, not a service. There's no job model, no structured per-rendition
progress, no HTTP surface: you shell out, scrape stderr, and build all the
orchestration yourself. rivet ships that part: a configurable job engine, a
uniform async progress callback, and an optional HTTP API (rivet serve) so
another application can signal a transcode over the network and poll it.
Hardware selection is the other half. Getting GPU encode/decode right across
vendors with FFmpeg means hand-picking -hwaccel flags, per-vendor encoder
names, pixel/surface formats, and init options, and it quietly falls back to a
slow software path when any of that is wrong. rivet detects the GPUs, dispatches
to the right framework per vendor (NVDEC/NVENC, AMF, QSV, with an optional FFmpeg
tier), leases them fairly across the ABR ladder, and fails fast instead of
degrading silently.
And it's built to be fast at the ladder. The source is decoded once and
the frames are fanned out to every rendition: a 5-rung ABR ladder decodes the
input one time, not five (the naïve ffmpeg-per-rung approach decodes it N
times). Encode work is then chunked and leased across all available GPUs
with mid-flight helper dispatch: when a fast rung frees its GPU, the freed lease
picks up another rung's chunks, so a slow rung finishes sooner and throughput
scales close to linearly with GPU count. Single-file output uses the same engine:
chunk-encode the one rendition across the GPUs and stitch the segments back
together losslessly. A per-rung codec invariant keeps cross-vendor chunks
bit-compatible, so an NVENC + QSV mix on the same rendition still decodes
cleanly. Stitched chunks always play (each is an independent IDR-led GOP), and
ChunkSeamMode (CLI --seam-mode, API seam) controls quality across the
seams: Parallel (default, fastest), ParallelConstQp (constant-QP, seam-flat),
or Serial (one encoder, seam-free); see the CLI reference.
The full data flow (demux → decode-once pump → per-rung scale → multi-GPU lease engine → mux) is documented in docs/pipeline.md (with a diagram and a code map).
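The decode-once idea can be illustrated with a minimal, standard-library-only sketch. It is not rivet's decode pump: `Frame`, `fan_out`, the 16-byte dummy payload, and the channel depth of 4 are all hypothetical names and values chosen for illustration. Each "decoded" frame is wrapped in an `Arc` and handed to every rung's bounded channel, so the decode work happens once no matter how many rungs consume it.

```rust
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical stand-in for a decoded frame (not a rivet type).
struct Frame {
    data: Vec<u8>,
}

/// "Decodes" `frame_count` frames once and fans each one out to `rungs` workers.
/// Returns the number of decode operations and, per rung, (frames seen, bytes seen).
fn fan_out(frame_count: u64, rungs: usize) -> (u64, Vec<(u64, u64)>) {
    let mut senders = Vec::new();
    let mut workers = Vec::new();
    for _ in 0..rungs {
        // Bounded channel: a slow rung applies back-pressure instead of
        // letting frames pile up in memory.
        let (tx, rx) = mpsc::sync_channel::<Arc<Frame>>(4);
        senders.push(tx);
        workers.push(thread::spawn(move || {
            let mut seen = (0u64, 0u64);
            for frame in rx {
                // A real rung would scale and encode here.
                seen.0 += 1;
                seen.1 += frame.data.len() as u64;
            }
            seen
        }));
    }

    let mut decoded = 0u64;
    for _ in 0..frame_count {
        // The expensive step runs once per frame, not once per rung.
        let frame = Arc::new(Frame { data: vec![0u8; 16] });
        decoded += 1;
        for tx in &senders {
            tx.send(Arc::clone(&frame)).expect("rung worker hung up");
        }
    }
    drop(senders); // closing the channels lets the workers finish

    let per_rung = workers.into_iter().map(|w| w.join().unwrap()).collect();
    (decoded, per_rung)
}
```

Calling `fan_out(10, 5)` performs ten decode steps while each of the five workers still sees all ten frames; sharing via `Arc` avoids copying pixel data per rung.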
"Optimized for web" is a pile of decisions FFmpeg leaves to you. rivet bakes in defaults that just play in a browser (and lets you override them): AV1 (the royalty-clean codec target) + Opus audio, faststart MP4 or segment-aligned CMAF/HLS for ABR, and correct color: HDR is tonemapped down to 8-bit SDR BT.709 by policy, so a clip doesn't land eye-searingly bright or washed-out on a viewer's screen. Picking those knobs correctly per source is exactly the expertise rivet encodes so you don't have to.
Quick start
Library (one file in, one file out):
let outcome = transcode_file?;
println!;
CLI (same thing):
The deeper knobs (ladders, HLS, progress, GPU selection) are in Library usage and CLI usage below.
What you configure
A job is described by an OutputSpec:
| Dimension | Type | Choices |
|---|---|---|
| Output mode | OutputMode | SingleFile, Hls { segment_seconds } |
| Video codec | VideoCodec | Av1 (the only implemented codec; see note) |
| Audio | AudioPolicy | Auto (passthrough/transcode), ForceOpus, Drop |
| Container | Container | Mp4, Cmaf |
| Muxer | Muxer | Mp4File, CmafHls |
| Rungs | Vec<Rung> | each Rung = width × height + per-rung Quality (crf / speed / target / tier / keyframe interval) |
| GPU policy | EncodePolicy / decode_gpu | all GPUs / single / pinned / vendor-family, plus a decode-pump GPU override; see GPU scheduling |
Progress is reported through a ProgressSink as
a uniform RungProgress (status, percent,
frames, segments, bytes) per rung; wire it to a closure, a Tokio mpsc channel,
or your own implementation.
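As a rough illustration of the "uniform status struct over a channel" pattern, here is a self-contained sketch that uses `std::sync::mpsc` rather than Tokio. `RungUpdate`, `Status`, `report_rung`, and `collect_updates` are hypothetical names invented for this example; the real `RungProgress` and `ProgressSink` types may look different.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical status enum (not rivet's).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Running,
    Done,
}

/// Hypothetical per-rung update with the same shape for every rung.
#[derive(Debug, Clone, PartialEq)]
struct RungUpdate {
    rung: usize,
    status: Status,
    percent: f32,
    frames: u64,
}

/// Simulates one rung reporting after each quarter of its frames.
/// `total_frames` is assumed to be non-zero.
fn report_rung(rung: usize, total_frames: u64, tx: mpsc::Sender<RungUpdate>) {
    for step in 1..=4u64 {
        let frames = total_frames * step / 4;
        let status = if step == 4 { Status::Done } else { Status::Running };
        let percent = frames as f32 * 100.0 / total_frames as f32;
        // Ignore send errors: a dropped receiver only means nobody is listening.
        let _ = tx.send(RungUpdate { rung, status, percent, frames });
    }
}

/// Runs `rungs` simulated workers and returns every update the receiver saw.
fn collect_updates(rungs: usize, total_frames: u64) -> Vec<RungUpdate> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = (0..rungs)
        .map(|rung| {
            let tx = tx.clone();
            thread::spawn(move || report_rung(rung, total_frames, tx))
        })
        .collect();
    drop(tx); // the channel closes once every worker's clone is gone
    let updates: Vec<RungUpdate> = rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    updates
}
```

Because every rung sends the same struct, the consumer needs no per-rung special cases: a UI or an HTTP poll handler can keep just the latest update per `rung` index.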
Complete reference: Configuring a transcode. The
OutputSpec guide documents every builder method, enum, and field (rungs/quality, audio, color/bit-depth, GPU policy, chunk seams) with examples and how to run a job. The sections below are a tour of the highlights.
Library usage
[]
= { = "https://github.com/elyerinfox/rivet" }
One file in, one file out
let outcome = transcode_file?;
println!;
let info = probe_file?;
println!;
A configurable job with progress
use Arc;
use ;
use RungProgress;
let bytes = read?;
// A 3-rung HLS ladder, 4-second segments, audio auto-handled.
let spec = hls
.with_audio;
// Uniform progress callback (status + percent + counters per rung).
let sink = new;
// `output_dir` is the HLS asset root; `None` uses a temp dir.
let out = run_job_blocking?;
println!;
For an async progress stream, use channel_sink(tx) with a
tokio::sync::mpsc::Sender<RungProgress> and run_job(...).await from inside a
runtime. Derive a sensible ladder from the source with
rivet::standard_ladder(width, height, max_short_side).
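To make "derive a ladder from the source" concrete, the sketch below shows one plausible way such a helper could work. It is an assumption, not the algorithm behind `rivet::standard_ladder`: the candidate short-side list, the even-rounding rule, and the function name `derive_ladder` are all hypothetical.

```rust
/// Candidate short-side sizes, largest first (an assumption for this sketch).
const SHORT_SIDES: [u32; 6] = [2160, 1440, 1080, 720, 480, 360];

/// Rounds down to an even number, never below 2 (4:2:0 needs even dimensions).
fn even(v: u32) -> u32 {
    (v & !1).max(2)
}

/// Returns (width, height) rungs, largest first, that never upscale the source
/// and never exceed `max_short_side`, preserving the source aspect ratio.
fn derive_ladder(width: u32, height: u32, max_short_side: u32) -> Vec<(u32, u32)> {
    let src_short = width.min(height);
    let src_long = width.max(height);
    let cap = src_short.min(max_short_side);

    let mut rungs = Vec::new();
    for &short in SHORT_SIDES.iter().filter(|&&s| s <= cap) {
        // Scale the long side by the same factor as the short side.
        let long = (src_long as u64 * short as u64 / src_short as u64) as u32;
        let rung = if width >= height {
            (even(long), even(short)) // landscape
        } else {
            (even(short), even(long)) // portrait
        };
        rungs.push(rung);
    }
    if rungs.is_empty() {
        // Source is smaller than every candidate: keep it as the only rung.
        rungs.push((even(width), even(height)));
    }
    rungs
}
```

Under these assumptions a 1920×1080 source capped at 1080 yields 1080p, 720p, 480p, and 360p rungs, while a 320×240 source yields a single rung at its own size.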
Color, bit depth & frame rate
A fully-specified single-file job, picking the codec quality, frame-rate cap, color/tonemap policy, and output bit depth per the table below:
use ;
let spec = single_file
.with_audio
.with_max_frame_rate // cap output cadence at 30 fps
.web_sdr; // BT.709 8-bit SDR, tonemapping any HDR source down (default)
spec.validate?; // rejects e.g. an HDR request on a build with no 10-bit encoder
The .web_sdr() line is a color preset: one call in place of
.with_color(ColorPolicy::TonemapToSdr).with_bit_depth(BitDepth::EightBit).
There are exactly two color/depth knobs: with_color (the ColorPolicy bundles
the gamut and transfer; see Output color & bit
depth) and with_bit_depth. To keep HDR instead of
tonemapping (needs a 10-bit AV1 encoder: nvidia, amd, qsv, or ffmpeg):
let spec = single_file.hdr10; // BT.2020 + PQ, 10-bit, in one call
// also: .hlg() · .passthrough() · or the low-level .with_color(..).with_bit_depth(..)
Jargon, briefly. Gamut = which colors are representable: BT.709 is the standard HD/SDR gamut (what most video uses), BT.2020 is the wider one HDR uses. Transfer = the SDR-vs-HDR brightness curve: PQ (HDR10) and HLG (broadcast HDR). Bit depth is separate and the on-disk pixel format follows from it: 8-bit → yuv420p, 10-bit → yuv420p10le (always 4:2:0). HDR presets imply 10-bit, so you never set both. See Output color & bit depth.
Choosing GPUs
encode_policy controls how encode spreads across GPUs; decode_gpu overrides
the decode-pump device. See GPU scheduling
for what each policy does.
use ;
// All NVIDIA cards (ignore an integrated AMD/Intel GPU), but decode on GPU 0.
let spec = single_file
.encode_policy
.decode_gpu;
// Or pin everything to one GPU:
let spec = single_file
.encode_policy;
Escape hatch
Need finer control than the engine offers? Reach through the re-exported component crates:
use ;
use CmafVideoMuxer;
CLI usage
Full reference: docs/cli.md covers every subcommand, flag, and environment variable. A taste:
# Single MP4 at the source resolution (output defaults to <input>.av1.mp4)
# Explicit rungs → a directory of MP4s
# Auto-derived standard ABR ladder
# CMAF/HLS package with 4-second segments
# Quality + audio knobs
# Inspect without transcoding
# Inspect the host + build
# Stream media in and out (no temp files)
| |
# Convert many files from a YAML/JSON manifest (feature `batch`); see docs/batch.md
GPU selection (mirrors EncodePolicy / decode_gpu):
Set RUST_LOG=debug for verbose logging. Force an encoder backend with
TRANSCODE_ENCODER_BACKEND=nvenc|amf|qsv.
HTTP API (server feature)
Full reference: docs/api.md covers endpoints, the output-spec query params, the job lifecycle, and the OpenAPI/Swagger/Redoc docs.
For a service deployment, where another application signals rivet to
transcode something, build with the server feature and run rivet serve. It
exposes the same engine over HTTP:
POST /v1/transcode takes either a structured JSON body (a server-side
input/output file path or inline base64, plus a structured spec) or a
streamed binary body with the spec in query params (so
streaming the media is optional):
Interactive docs ship with it: /swagger (Swagger UI), /redoc (Redoc),
and the raw /openapi.json (OpenAPI 3.0); / links to all three.
GPU scheduling (the rung benefit)
Both HLS and single-file jobs run on a reactive multi-GPU orchestrator
(multigpu) that makes the ladder cheap:
- Decode once. A single decode pump feeds every rung: a 5-rung ladder decodes the source one time, not five.
- Lease pool. A process-wide GpuPool hands out one encoder lease per GPU (concurrent NVENC sessions on one context deadlock; this is the load-bearing invariant), so work runs in parallel across GPUs.
- Helpers. When a fast unit of work releases its lease, the helper dispatcher grabs the freed lease and attaches an extra worker to a still-busy rung. Segments/chunks are the unit of work, so a slow rung finishes sooner.
- Cross-vendor safety. A helper may land on a different GPU vendor (NVENC + QSV on the same rendition); a per-rung AV1 codec invariant guarantees every segment shares the av1C contract, and a mismatched helper requeues its chunk and exits without aborting the job.
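The lease-pool and helper behaviour described in the list above can be approximated by a shared work queue with exactly one worker thread per GPU. This is a simplified sketch, not the `multigpu` orchestrator: `Unit` and `run_pool` are hypothetical names, the "encode" step is a no-op, and cross-vendor compatibility checks are left out.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// One unit of encode work: chunk `chunk` of rung `rung` (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Unit {
    rung: usize,
    chunk: usize,
}

/// Runs one worker per GPU (one "lease" each) over a shared queue of units.
/// Returns, per GPU, the units that GPU ended up processing.
fn run_pool(gpus: usize, chunks_per_rung: &[usize]) -> Vec<Vec<Unit>> {
    let mut queue = VecDeque::new();
    for (rung, &count) in chunks_per_rung.iter().enumerate() {
        for chunk in 0..count {
            queue.push_back(Unit { rung, chunk });
        }
    }
    let queue = Arc::new(Mutex::new(queue));

    let handles: Vec<_> = (0..gpus)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || {
                let mut done = Vec::new();
                loop {
                    // Hold the lock only long enough to take the next unit.
                    let next = queue.lock().unwrap().pop_front();
                    match next {
                        // A real worker would encode the chunk on its GPU here.
                        Some(unit) => done.push(unit),
                        // Queue drained: this lease has nothing left to help with.
                        None => break,
                    }
                }
                done
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Because a worker that finishes its unit immediately pulls the next one regardless of rung, a GPU freed by a short rung naturally ends up helping a longer one, which is the effect the helper dispatcher is described as achieving.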
For single-file output, each rung is chunked at GOP boundaries and the chunks are encoded across the GPUs, then stitched (in segment order, in memory, no disk round-trip) into one MP4 per rung. Because the encoder runs constant-quality (CQP/CRF), independent chunks have no rate-control discontinuity at the seams; each chunk just starts with an IDR. On a single-GPU host (or when the frame count is unknown) it uses the serial decode-once path instead, with no chunk overhead. Either way, a host without AV1-encode silicon fails fast with a clear error.
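A small sketch of the chunk-and-stitch bookkeeping follows, under stated assumptions: a fixed GOP length, a fixed number of GOPs per chunk, and plain byte concatenation standing in for the real mux-level stitch. `gop_chunks` and `stitch` are hypothetical helper names, not rivet functions.

```rust
use std::ops::Range;

/// Splits `total_frames` into frame ranges of `gops_per_chunk` whole GOPs each,
/// so every chunk starts on a GOP boundary. The last chunk may be shorter.
fn gop_chunks(total_frames: u64, gop_len: u64, gops_per_chunk: u64) -> Vec<Range<u64>> {
    assert!(gop_len > 0 && gops_per_chunk > 0, "lengths must be non-zero");
    let step = gop_len * gops_per_chunk;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < total_frames {
        let end = (start + step).min(total_frames);
        chunks.push(start..end);
        start = end;
    }
    chunks
}

/// Concatenates encoded chunks in chunk-index order, regardless of the order
/// in which the GPUs finished them. Byte concatenation is a placeholder for
/// the real container-level stitch.
fn stitch(mut finished: Vec<(usize, Vec<u8>)>) -> Vec<u8> {
    finished.sort_by_key(|(index, _)| *index);
    finished.into_iter().flat_map(|(_, bytes)| bytes).collect()
}
```

Knowing the frame count up front is what makes the split possible, which is consistent with the serial fallback mentioned above when the count is unknown.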
Encode policy
OutputSpec::encode_policy(..) selects how encode work spreads across GPUs (set
it from the library or the CLI; see above):
| Policy | Single-file | HLS |
|---|---|---|
| EncodePolicy::AllGpus (default) | chunk across all GPUs, stitch | ladder across all GPUs |
| EncodePolicy::SingleGpu(None) | runs on the first GPU | runs on the first GPU |
| EncodePolicy::SingleGpu(Some(i)) | runs on GPU i | runs on GPU i |
| EncodePolicy::Family(GpuFamily::Nvidia) | chunk across that vendor's GPUs | ladder across that vendor's GPUs |
For SingleGpu both modes run the same way (sequentially on one GPU); they just
reach it differently: single-file takes a lean serial path (no GOP chunking,
nothing to parallelize on one GPU), while HLS always runs the lease-pool
orchestrator (one lease) because its output is inherently segmented. For
AllGpus / Family they genuinely differ: single-file chunks-and-stitches,
HLS ladders-and-segments across the selected GPUs.
The decode pump follows the policy: it is pinned to a GPU from the policy's
selected set (round-robin over those indices for per-rung pumps), so a Family
/ SingleGpu constraint governs decode too, not just encode. Override it
independently with OutputSpec::decode_gpu(Some(i)), e.g. decode on an
integrated GPU while the discrete GPUs encode.
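The selection rules above can be sketched as two small pure functions. The enums here are local mirrors written for this example: only the variant names `AllGpus`, `SingleGpu`, `Family`, and `GpuFamily::Nvidia` appear in this README, so `GpuFamily::Amd`, `GpuFamily::Intel`, and both function names are assumptions.

```rust
/// Local mirror of a GPU vendor family (`Amd` and `Intel` are assumed variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GpuFamily {
    Nvidia,
    Amd,
    Intel,
}

/// Local mirror of the encode policy described in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EncodePolicy {
    AllGpus,
    SingleGpu(Option<usize>),
    Family(GpuFamily),
}

/// Indices of the GPUs a policy selects, given each detected GPU's family.
fn selected_gpus(policy: EncodePolicy, detected: &[GpuFamily]) -> Vec<usize> {
    match policy {
        EncodePolicy::AllGpus => (0..detected.len()).collect(),
        // `None` means "the first GPU", if there is one.
        EncodePolicy::SingleGpu(None) => (0..detected.len()).take(1).collect(),
        EncodePolicy::SingleGpu(Some(i)) => {
            if i < detected.len() { vec![i] } else { Vec::new() }
        }
        EncodePolicy::Family(family) => detected
            .iter()
            .enumerate()
            .filter(|(_, g)| **g == family)
            .map(|(i, _)| i)
            .collect(),
    }
}

/// GPU for the decode pump of `rung`: the explicit override if present,
/// otherwise round-robin over the policy's selected set.
fn decode_gpu_for(rung: usize, selected: &[usize], override_gpu: Option<usize>) -> Option<usize> {
    override_gpu.or_else(|| {
        if selected.is_empty() {
            None
        } else {
            Some(selected[rung % selected.len()])
        }
    })
}
```

With two NVIDIA cards at indices 0 and 2 and an integrated Intel GPU at index 1, a `Family(Nvidia)` policy selects `[0, 2]`, and an override of `Some(1)` moves decode to the integrated GPU while encode stays on the discrete cards.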
A note on the output codec
AV1 is the only implemented video codec; it is the project's locked,
royalty-clean target (AV1 + Opus). VideoCodec is an enum so the dimension is
selectable and future codecs can be added without an API break. The encode tier
is GPU-accelerated (NVENC / AMF / QSV).
Compatibility matrix
Input β video decode
GPU decode is feature-gated: each vendor's tier is an opt-in cargo feature, and
ffmpeg adds the software catalogue (incl. ProRes). All decoders plug into the
shared decode pump (create_decoder → push_sample → decode_next).
| Codec | NVDEC nvidia | AMF amd ⚠ | QSV qsv | FFmpeg ffmpeg |
|---|---|---|---|---|
| H.264 / AVC | β | β | β | β |
| HEVC / H.265 | β | β | β | β |
| VP8 | β | β | β | β |
| VP9 | β | β | β | β |
| AV1 | β | β | β | β |
| MPEG-2 | β | β | β | β |
| MPEG-4 Part 2 | β | β | β | β |
| ProRes | β | β | β | β |
- NVDEC nvidia: a single, in-repo hand-rolled CUVID FFI decoder (decode/nvdec.rs, dlopen, no external crate). One path for everything NVDEC does: H.264/HEVC/AV1/VP8/VP9, MPEG-2, MPEG-4 Part 2, and 10-bit P016. Builds on both Windows MSVC and Linux.
- QSV qsv (decode/qsv_dec.rs): hand-rolled oneVPL FFI (our own SDK-mirror code, no external crate). Hardware-verified on 3× Intel Arc (H.264 / HEVC / AV1 / VP9, including 10-bit P010 via the oneVPL 2.x internal-allocation + FrameInterface::Map path). Builds on Windows + Linux.
- AMF amd (decode/amf_dec.rs): hand-rolled AMF decode FFI. ⚠ Verified-by-review only: no AMD card on the dev box yet; tracked in TODO.md. ffmpeg is the fallback if the path proves unreliable.
What happens to a 10-bit / HDR source is the ColorPolicy's call, not a
fixed rule (the decode pump never tonemaps on its own): the default
TonemapToSdr maps HDR → 8-bit SDR BT.709 for maximum web compatibility, while
Hdr10 / Hlg / Passthrough keep it 10-bit HDR through to a 10-bit
encoder (NVENC / AMF / QSV / ffmpeg); see Output color & bit
depth. Decoding 10-bit needs a 10-bit-preserving
decoder: NVIDIA NVDEC decodes 10-bit P016 natively and Intel QSV
decodes 10-bit P010 (both carry 10-bit HEVC Main10 / HDR through), and
ffmpeg decodes 10-bit too.
Output β video encode (by vendor)
rivet encodes AV1 only (the locked, royalty-clean target), 4:2:0. One table
per vendor: rows are codecs (just AV1 today; the layout is ready for more),
columns are the output pixel format. Pair 10-bit with an HDR ColorPolicy
(below) for HDR10/HLG; on its own, 10-bit is higher-precision SDR.
NVENC: NVIDIA Ada+ (nvidia)
| Codec | 8-bit 4:2:0 | 10-bit 4:2:0 |
|---|---|---|
| AV1 | ✓ | ✓ (Yuv420_10bit) |
AMF: AMD RDNA3+ (amd)
| Codec | 8-bit 4:2:0 | 10-bit 4:2:0 |
|---|---|---|
| AV1 | ✓ | ✓ (P010) |
QSV: Intel Arc / Meteor Lake+ (qsv)
| Codec | 8-bit 4:2:0 | 10-bit 4:2:0 |
|---|---|---|
| AV1 | ✓ | ✓ (P010) |
FFmpeg (ffmpeg, software + hwaccel)
| Codec | 8-bit 4:2:0 | 10-bit 4:2:0 |
|---|---|---|
| AV1 | ✓ | ✓ |
GPU-only by default: a host with no AV1-encode silicon (and no ffmpeg) fails
fast at encoder construction. 4:2:2 / 4:4:4 and 12-bit are not produced; AV1
Main 4:2:0 is the web-safe profile. All three hardware encoders are
hand-rolled dlopen FFI in-tree (NVENC YUV420_10BIT, AMF P010, QSV oneVPL
P010) and build on Windows + Linux.
Output color & bit depth
Two orthogonal axes: color (with_color(ColorPolicy), i.e. gamut + SDR/HDR
transfer) and bit depth (with_bit_depth(BitDepth), i.e. bits per sample). Most
callers don't touch them directly, because the presets bundle both:
.web_sdr() (default), .hdr10(), .hlg(), .passthrough(). The decode pump
tonemaps only when the policy says so (it never decides on its own).
validate() rejects any combination this build can't actually produce:
| ColorPolicy | Tonemap | Output signaling | Bit depth | Needs |
|---|---|---|---|---|
| TonemapToSdr (default) | HDR→SDR | BT.709 SDR | 8-bit | any encoder |
| Passthrough | no | source color verbatim | source | 10-bit encoder if source is 10-bit |
| Hdr10 | no | BT.2020 + PQ (ST 2084) | 10-bit | a 10-bit encoder (below) |
| Hlg | no | BT.2020 + ARIB STD-B67 | 10-bit | a 10-bit encoder (below) |
BitDepth is Auto (follow the color policy, the usual choice), EightBit
(yuv420p), or TenBit (yuv420p10le). 10-bit / HDR output works on
hardware (nvidia, amd, or qsv, no ffmpeg needed) or in
software with ffmpeg (per the per-vendor tables above). The 10-bit output is
web-safe AV1 Main profile (4:2:0), HDR-tagged in the container via the
colr/mdcv/clli atoms, which browsers decode and tonemap. On a build with
no 10-bit encoder, validate() returns a clear error; the capability is
queryable at runtime via codec::encode::build_output_caps().
For web compatibility keep the default: .web_sdr() (i.e. TonemapToSdr +
Auto) yields 8-bit SDR BT.709 AV1, which every browser and device that
supports AV1 plays.
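The table and the `validate()` behaviour in this section can be paraphrased as a small decision function. The sketch below is an illustration under assumptions, not rivet's validation code: the enums are local mirrors of the names used in this README, `resolve_depth` and `pixel_format` are hypothetical helpers, and treating "HDR policy + explicit 8-bit" as an error is this example's own choice.

```rust
/// Local mirror of the color policies listed in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColorPolicy {
    TonemapToSdr,
    Passthrough,
    Hdr10,
    Hlg,
}

/// Local mirror of the bit-depth knob.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BitDepth {
    Auto,
    EightBit,
    TenBit,
}

/// Resolves the output bit depth (8 or 10), or explains why this build
/// cannot produce the requested combination.
fn resolve_depth(
    color: ColorPolicy,
    depth: BitDepth,
    source_bits: u8,
    has_10bit_encoder: bool,
) -> Result<u8, String> {
    let bits = match (color, depth) {
        // Assumption: an HDR policy with an explicit 8-bit request is rejected.
        (ColorPolicy::Hdr10 | ColorPolicy::Hlg, BitDepth::EightBit) => {
            return Err("HDR output requires 10-bit".to_string());
        }
        (ColorPolicy::Hdr10 | ColorPolicy::Hlg, _) => 10,
        (_, BitDepth::EightBit) => 8,
        (_, BitDepth::TenBit) => 10,
        (ColorPolicy::TonemapToSdr, BitDepth::Auto) => 8,
        // Passthrough follows the source.
        (ColorPolicy::Passthrough, BitDepth::Auto) => {
            if source_bits > 8 { 10 } else { 8 }
        }
    };
    if bits == 10 && !has_10bit_encoder {
        return Err("this build has no 10-bit encoder".to_string());
    }
    Ok(bits)
}

/// On-disk pixel format for a resolved bit depth (always 4:2:0).
fn pixel_format(bits: u8) -> &'static str {
    if bits == 10 { "yuv420p10le" } else { "yuv420p" }
}
```

The default path (`TonemapToSdr` with `Auto`) resolves to 8-bit on any build, which is why it is the safe choice for web delivery; only the HDR and 10-bit paths depend on encoder capability.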
Containers
| Container | Demux (in) | Mux (out) |
|---|---|---|
| MP4 / MOV | β | β (single-file + CMAF) |
| MKV / WebM | β | β |
| MPEG-TS | β | β |
| AVI (+OpenDML >1 GiB) | β | β |
| CMAF / HLS | β | β (segments + master/media playlists) |
Audio
| Codec | Passthrough | Transcode β Opus |
|---|---|---|
| AAC-LC | β | β |
| Opus | β | (kept as-is) |
| AC-3 | β | β |
| E-AC-3 | β | β |
| MP3 | β | β |
| Vorbis | β | β |
AudioPolicy::Auto passes through AAC/Opus/AC-3/E-AC-3, transcodes MP3/Vorbis to
Opus, and drops the rest. ForceOpus produces Opus from any decodable source;
Drop yields video-only output. (Multichannel ≥3ch transcode is not yet
supported and is dropped with a warning.)
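The audio rules in the paragraph above reduce to a short decision function. This sketch mirrors the prose only; `AudioCodec`, `AudioAction`, and `plan_audio` are hypothetical names, and treating any codec outside the table as undecodable is an assumption of the example.

```rust
/// Source audio codecs from the table above, plus a catch-all.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AudioCodec {
    AacLc,
    Opus,
    Ac3,
    Eac3,
    Mp3,
    Vorbis,
    Other,
}

/// Local mirror of the audio policy choices.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AudioPolicy {
    Auto,
    ForceOpus,
    Drop,
}

/// What happens to the audio track (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AudioAction {
    Passthrough,
    TranscodeToOpus,
    DropTrack,
}

/// Decides what to do with one audio track under the given policy.
fn plan_audio(policy: AudioPolicy, codec: AudioCodec, channels: u8) -> AudioAction {
    use AudioCodec as C;
    let passthrough_ok = matches!(codec, C::AacLc | C::Opus | C::Ac3 | C::Eac3);
    // Assumption: anything outside the table cannot be decoded.
    let decodable = codec != C::Other;
    match policy {
        AudioPolicy::Drop => AudioAction::DropTrack,
        AudioPolicy::Auto if passthrough_ok => AudioAction::Passthrough,
        // Already Opus: kept as-is even when Opus is forced.
        AudioPolicy::ForceOpus if codec == C::Opus => AudioAction::Passthrough,
        // Everything left needs a transcode; 3+ channels are not supported yet.
        _ if !decodable || channels >= 3 => AudioAction::DropTrack,
        _ => AudioAction::TranscodeToOpus,
    }
}
```

One consequence worth noting: under `Auto`, a 5.1 AC-3 track survives because it is passed through untouched, whereas under `ForceOpus` the same track would be dropped until multichannel transcode is supported.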
Output modes
| Mode | Result |
|---|---|
| single | One self-contained MP4 per rung (faststart, AV1 + audio). |
| hls | A CMAF package: per-rung init.mp4 + seg-*.m4s, a shared audio rendition, a media playlist per rung, and a master.m3u8. |
Crates
| Crate | Responsibility |
|---|---|
| codec | Frame types, pixel formats, GPU detection, decode (NVDEC / QSV / optional FFmpeg), AV1 encode (NVENC / AMF / QSV), colorspace + HDR→SDR tonemap, audio decode/encode, probe. |
| container | Demuxers (MP4/MOV/MKV/WebM/TS/AVI), AV1 MP4 muxer with audio, fragmented-MP4 (CMAF) writers, HLS playlist generation, bounded-RSS streaming demuxer. |
| rivet | The configurable job engine (run_job), the output spec, the progress sink, the multi-GPU engine, the ABR ladder helper, the shared decode_pump, plus simple transcode/probe helpers and the rivet CLI. Re-exports codec + container. |
Building
The default build links native libraries, so it needs a C toolchain plus:
- nasm: x86 assembly for the codec stack.
- CMake + a C/C++ compiler: builds libopus (Opus audio encode). Also builds Intel oneVPL when the qsv feature is enabled.
On Windows the project links the static MSVC CRT (see .cargo/config.toml). With
a modern CMake (4.x) you may need CMAKE_POLICY_VERSION_MINIMUM=3.5 so libopus's
older CMakeLists.txt configures.
Optional features
| Feature | Adds |
|---|---|
| nvidia | NVENC AV1 hardware encoder + NVDEC decoder, hand-rolled dlopen FFI (nvEncodeAPI / CUVID). NVIDIA Ada+ for AV1 encode. |
| amd | AMF AV1 hardware encoder, hand-rolled dlopen FFI. AMD RDNA3+. (AMD decode → ffmpeg.) |
| qsv | Intel QSV AV1 hardware encoder, hand-rolled dlopen oneVPL FFI (8-bit + 10-bit). Intel Arc / Meteor Lake+. (Intel decode → ffmpeg.) |
| ffmpeg | libavcodec as the primary decode path (full software catalogue + Vulkan/NVDEC/D3D11/VAAPI hwaccel + AV1 software encode). Needs FFmpeg ≥7.0 dev libs + LLVM/libclang. |
| thumbnail | rivet::thumbnail::generate_thumbnail: capture a frame and encode an AVIF still (pulls ravif/rav1e). |
| batch | rivet batch: a YAML/JSON manifest DSL to convert many files in one run (pulls serde + a YAML/JSON parser + glob). See docs/batch.md. |
| server | HTTP transcode API (rivet serve): an axum webserver so another app can signal transcodes over the network. See HTTP API. |
| ipc | rivet ipc: a Unix-domain-socket server for streaming media in/out (Unix only at runtime). rivet pipe needs no feature. See CLI. |
The hardware encoders are opt-in. All three are hand-rolled dlopen FFI
in-tree (no external wrapper crates, no bindgen, no build-time SDK link), so
they build on both Windows MSVC and Linux (cargo build --features nvidia
etc. works on either). A default build has no hardware encoder; enable nvidia
/ amd / qsv (or ffmpeg) for your target silicon. Decode is in-tree for
all three vendors too: NVDEC (nvidia), AMF (amd), and QSV (qsv), using the same
hand-rolled-FFI approach, with ffmpeg as the cross-vendor fallback.
License
Open Encoding Attribution License v1.0: a source-available license (not OSI "open source"). It is royalty-free for every use. Personal, hobby, nonprofit/academic/research, government, and purely-internal for-profit use are free with no further obligation beyond keeping the existing notices. Shipping it in a commercial product or running it as a commercial service (the "hosted transcoder" case) is also permitted, but must display attribution per §5. All distribution must keep existing notices and carry the NOTICE file (§4). Includes a patent grant with defensive termination (§3). Not GPL-compatible. See LICENSE.md for the full terms and the use-case gist table.
All GPU hardware FFI is hand-rolled in-tree (mirroring the vendor SDK headers); no third-party GPU wrapper crates are used.