Module multigpu

Expand description

Multi-GPU reactive variant phase — the rung benefit.

Decode the source once and dynamically schedule every rung’s CMAF segments across all available GPUs using a fair lease pool with mid-flight helper dispatch:

  decode pump (decode once)
       │  fan out normalized frames
       ▼
  per-rung scaler ──► SegmentChunkQueue ──► encoder worker (holds a GpuLease)
                                       ──► helper worker (claims a freed lease)

One encoder per GPU at a time (GpuPool enforces it — concurrent NVENC sessions on one context deadlock).
A fast rung releases its lease early; the helper dispatcher grabs the freed lease and attaches an extra worker to a still-busy rung, so a slow rung finishes sooner. Segment work is the unit of parallelism.
Helpers may land on a different GPU vendor than the rung’s first worker; the per-rung AV1 codec invariant (RungCodecInvariant) guarantees every contributed segment shares the av1C contract, so a cross-vendor (NVENC + QSV) rendition still decodes cleanly. A mismatched helper requeues its chunk and exits — the run never aborts on it.

Storage/transport specifics stay out of the engine: progress is reported through the generic ProgressSink, so a consumer can layer an uploader (object storage, a status queue, …) on top by watching RungStatus::Completed.

Structs§

MultiGpuParams: Inputs to run_multigpu_hls.
RungManifest: One rung’s finalized CMAF manifest.
RungPackets: One rung’s full ordered AV1 packet stream, stitched from chunks encoded across GPUs. The caller muxes these into a single MP4 (+ audio).

Functions§

detect_gpu_pool: Build a GpuPool from the host’s detected GPU inventory.
gpu_pool_for_policy: Build a GpuPool constrained to the given EncodePolicy. An empty pool (e.g. a pinned index or vendor family that isn’t present) yields capacity 0, so the orchestrator’s pre-flight probe / lease claim surfaces a clear error.
policy_gpu_indices: The GPU indices an EncodePolicy selects, in detection order. Used to pin the decode pump to a device consistent with the policy (so decode honors a Family / SingleGpu constraint, not just encode).
run_multigpu_hls: Run the reactive multi-GPU variant phase. Returns one Option<RungManifest> per rung (in rung order); None means the rung produced no segments.
run_multigpu_single_file: Single-file counterpart to run_multigpu_hls: decode once, fan to per-rung scalers, and dynamically schedule each rung’s GOP-sized chunks across all GPUs (fair lease pool + mid-flight helper dispatch + cross-vendor codec invariant). Each worker encodes its chunk to packets (a fresh encoder per chunk → first frame is an IDR); the finalizer concatenates them in segment order into one ordered packet stream per rung — no disk round-trip.
serial_gpu_for_policy: The GPU index to pin a serial (single-GPU) encode/decode to under a policy: None (auto/first-available) for AllGpus, the pinned index for SingleGpu, the first device of the vendor for Family.

Module multigpu

Module multigpu Copy item path

Structs§

Functions§

Module multigpu