Module gpu_pool

Process-wide GPU reservation pool.

Each detected GPU is a slot. Callers claim() an available slot and hold the returned GpuLease for the duration of their work; Drop releases the slot back to the pool. The lease’s gpu_index field is the device index the work should run on.

Concurrency model: at most one variant per GPU at any time. With N GPUs and M waiters (M > N), the first N waiters get leases immediately and the remaining M−N park on the semaphore until a lease drops. This is a deliberate design decision from 2026-05-02: concurrent NVENC sessions on the same CUDA context deadlocked at session ~5/5 init, after which the GPU went idle and no frames were encoded. One-encoder-per-GPU is the load-bearing invariant; the pool’s role is to enforce it while still letting variants run in parallel across GPUs.
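The claim, hold, and drop cycle described above can be illustrated with a minimal, standard-library-only sketch. It is not this module's implementation: `SketchPool`, `SketchLease`, `leased_count`, and `peak_concurrency` are hypothetical names, and a `Mutex` plus `Condvar` stands in for the semaphore the real pool parks waiters on. Whether the real `claim()` is blocking or async is not stated here, so the blocking form below is an assumption.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the pool: one flag per GPU slot, `true` while leased.
pub struct SketchPool {
    busy: Mutex<Vec<bool>>,
    freed: Condvar, // assumption: replaces the real pool's semaphore
}

/// Hypothetical stand-in for `GpuLease`: owns one slot until dropped.
pub struct SketchLease {
    pool: Arc<SketchPool>,
    pub gpu_index: usize,
}

impl SketchPool {
    pub fn new(gpu_count: usize) -> Arc<Self> {
        Arc::new(SketchPool {
            busy: Mutex::new(vec![false; gpu_count]),
            freed: Condvar::new(),
        })
    }

    /// Blocks until a slot is free. Returns `None` at once if there are no slots.
    pub fn claim(self: &Arc<Self>) -> Option<SketchLease> {
        let mut busy = self.busy.lock().unwrap();
        if busy.is_empty() {
            return None; // CPU-only host: nothing to wait for
        }
        loop {
            if let Some(i) = busy.iter().position(|b| !*b) {
                busy[i] = true;
                return Some(SketchLease { pool: Arc::clone(self), gpu_index: i });
            }
            // All slots leased: park until some lease is dropped.
            busy = self.freed.wait(busy).unwrap();
        }
    }

    /// Number of slots currently leased.
    pub fn leased_count(&self) -> usize {
        self.busy.lock().unwrap().iter().filter(|b| **b).count()
    }
}

impl Drop for SketchLease {
    fn drop(&mut self) {
        let mut busy = self.pool.busy.lock().unwrap();
        busy[self.gpu_index] = false;
        drop(busy); // release the lock before waking a waiter
        self.pool.freed.notify_one();
    }
}

/// Runs `waiters` threads against `gpus` slots and returns the highest
/// number of leases observed held at the same time.
pub fn peak_concurrency(gpus: usize, waiters: usize) -> usize {
    let pool = SketchPool::new(gpus);
    let peak = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..waiters)
        .map(|_| {
            let pool = Arc::clone(&pool);
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                if let Some(_lease) = pool.claim() {
                    peak.fetch_max(pool.leased_count(), Ordering::SeqCst);
                    thread::sleep(Duration::from_millis(5)); // simulated encode work
                } // `_lease` drops here and frees the slot
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

In this sketch, `peak_concurrency(2, 6)` never reports more than two simultaneous leases: the other four threads wait inside `claim()` until a `SketchLease` is dropped, which mirrors the M−N waiters parking described above.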

CPU-only hosts (no GPUs detected): claim() returns None immediately — callers fall back to CPU encode without queuing.
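On the caller side, the `None` case amounts to a simple branch. The sketch below is illustrative only: `SketchLease`, `EncodePath`, and `select_path` are hypothetical names, and only the documented `gpu_index` field is modeled.

```rust
/// Hypothetical stand-in for `GpuLease`; only `gpu_index` is modeled.
pub struct SketchLease {
    pub gpu_index: usize,
}

/// Hypothetical description of where a variant's encode will run.
#[derive(Debug, PartialEq)]
pub enum EncodePath {
    Gpu { device: usize },
    Cpu,
}

/// Maps the result of a claim to an encode path: a lease selects its
/// device index, and the absence of a lease selects the CPU fallback.
pub fn select_path(lease: Option<&SketchLease>) -> EncodePath {
    match lease {
        Some(lease) => EncodePath::Gpu { device: lease.gpu_index },
        None => EncodePath::Cpu,
    }
}
```

A caller following this pattern would keep the lease alive for the whole encode when it takes the GPU branch, since dropping it early would release the slot while work is still running.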

Structs

GpuLease
RAII guard returned by GpuPool::claim. The slot is released (and the underlying semaphore permit dropped) when this value is dropped — typically at the end of the variant’s encode task.
GpuLeaseEntry
Snapshot of one GPU slot’s lease state at a moment in time. Returned by GpuPool::snapshot_leases for Phase 2 worker_load reporting. Field shape matches queue::WsGpuLeaseEntry so the caller can map across without a wire-format-aware translation.
GpuPool