Process-wide GPU reservation pool.
Each detected GPU is a slot. Callers claim() an available slot
and hold the returned GpuLease for the duration of their work;
Drop releases the slot back to the pool. The lease’s
gpu_index field is the device index the work should run on.
Concurrency model: one variant per GPU at any time. With N GPUs and M waiters, the first N waiters get leases immediately and the remaining M−N park on the semaphore until a lease drops. This is the deliberate design decision from 2026-05-02: concurrent NVENC sessions on the same CUDA context deadlocked at session ~5/5 init, leaving the GPU idle with no frames encoded. One-encoder-per-GPU is the load-bearing invariant; the pool’s role is to enforce it while still letting variants run in parallel across GPUs.
CPU-only hosts (no GPUs detected): claim() returns None
immediately, so callers fall back to CPU encode without queuing.
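The claim, hold, and drop cycle described above can be illustrated with a minimal std-only sketch. Everything here is an assumption made for illustration: SketchPool, SketchLease, and max_concurrent_leases are hypothetical names, and a Mutex plus Condvar stands in for the semaphore the real pool uses. It is not the crate's implementation. It only shows the invariant (at most one lease per GPU slot, waiters park until a lease drops) and the CPU-only fast path.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the pool: a list of free device indices.
struct SketchPool {
    free: Mutex<Vec<usize>>,
    slot_freed: Condvar,
    gpu_count: usize,
}

/// Hypothetical stand-in for the lease: returns its slot on drop.
struct SketchLease {
    gpu_index: usize,
    pool: Arc<SketchPool>,
}

impl SketchPool {
    fn new(gpu_count: usize) -> Arc<Self> {
        Arc::new(SketchPool {
            free: Mutex::new((0..gpu_count).collect()),
            slot_freed: Condvar::new(),
            gpu_count,
        })
    }

    /// Returns None at once on a CPU-only host (no slots to wait for);
    /// otherwise blocks until a slot is free and hands out a lease.
    fn claim(pool: &Arc<Self>) -> Option<SketchLease> {
        if pool.gpu_count == 0 {
            return None;
        }
        let mut free = pool.free.lock().unwrap();
        while free.is_empty() {
            // Park until some lease is dropped.
            free = pool.slot_freed.wait(free).unwrap();
        }
        let gpu_index = free.pop().unwrap();
        Some(SketchLease { gpu_index, pool: Arc::clone(pool) })
    }
}

impl Drop for SketchLease {
    fn drop(&mut self) {
        // Return the slot, then wake one parked waiter.
        self.pool.free.lock().unwrap().push(self.gpu_index);
        self.pool.slot_freed.notify_one();
    }
}

/// Runs `waiters` threads against `gpu_count` slots and returns the
/// highest number of leases that were ever held at the same time.
fn max_concurrent_leases(gpu_count: usize, waiters: usize) -> usize {
    let pool = SketchPool::new(gpu_count);
    let stats = Arc::new(Mutex::new((0usize, 0usize))); // (current, max)
    let handles: Vec<_> = (0..waiters)
        .map(|_| {
            let pool = Arc::clone(&pool);
            let stats = Arc::clone(&stats);
            thread::spawn(move || {
                if let Some(lease) = SketchPool::claim(&pool) {
                    {
                        let mut s = stats.lock().unwrap();
                        s.0 += 1;
                        s.1 = s.1.max(s.0);
                    }
                    // Simulated encode work on lease.gpu_index.
                    thread::sleep(Duration::from_millis(5));
                    stats.lock().unwrap().0 -= 1;
                    drop(lease); // slot goes back to the pool here
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let max = stats.lock().unwrap().1;
    max
}
```

With two slots and six waiters, max_concurrent_leases(2, 6) should never report more than two leases held at once; with zero slots, claim returns None and no thread waits.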
Structs
- GpuLease
  RAII guard returned by GpuPool::claim. The slot is released (and the underlying semaphore permit dropped) when this value is dropped, typically at the end of the variant’s encode task.
- GpuLeaseEntry
  Snapshot of one GPU slot’s lease state at a moment in time. Returned by GpuPool::snapshot_leases for Phase 2 worker_load reporting. Field shape matches queue::WsGpuLeaseEntry so the caller can map across without a wire-format-aware translation.
- GpuPool