pub struct MockGpuMonitor { /* private fields */ }
A GpuMonitor backed by an in-memory Vec<GpuSnapshot>.
MockGpuMonitor is the workhorse of Concerto’s test suite: it lets tests
declare the exact fleet they want to see (including edge cases like
overheating GPUs, GPUs with ECC errors, or GPUs that disappear mid-run) and
then drive the system under test through those scenarios.
Cloning is cheap — the monitor is backed by an Arc<RwLock<_>> so clones
share the same underlying state. This lets a test hold a handle for
mutation while also passing the monitor into the system under test.
The lock is a std::sync::RwLock (not tokio::sync::RwLock) so that
GpuMonitor::gpu_count, which is synchronous, can read the length
without entering the async runtime. Critical sections are always tiny
(a clone, a field write, or a retain) so blocking here is harmless.
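The shared-state design described above can be sketched as follows. This is a minimal illustration and not Concerto's actual code: `Snapshot`, `SharedMock`, and `shared_state_demo` are hypothetical stand-ins, and the mutator is synchronous here for brevity, whereas the real mutators are `async`.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for GpuSnapshot; the real type's fields may differ.
#[derive(Clone, Debug, PartialEq)]
pub struct Snapshot {
    pub id: u32,
    pub temperature_celsius: u32,
}

// Sketch of a mock whose clones all point at the same snapshot list.
#[derive(Clone)]
pub struct SharedMock {
    inner: Arc<RwLock<Vec<Snapshot>>>,
}

impl SharedMock {
    pub fn new(snapshots: Vec<Snapshot>) -> Self {
        Self { inner: Arc::new(RwLock::new(snapshots)) }
    }

    // Synchronous read: takes the std lock briefly, no async runtime required.
    pub fn gpu_count(&self) -> usize {
        self.inner.read().unwrap().len()
    }

    // Tiny critical section: a single field write under the write lock.
    pub fn set_temperature(&self, id: u32, celsius: u32) {
        let mut gpus = self.inner.write().unwrap();
        if let Some(gpu) = gpus.iter_mut().find(|g| g.id == id) {
            gpu.temperature_celsius = celsius;
        }
    }

    pub fn temperature(&self, id: u32) -> Option<u32> {
        let gpus = self.inner.read().unwrap();
        gpus.iter().find(|g| g.id == id).map(|g| g.temperature_celsius)
    }
}

// The test keeps `handle` for mutation and hands `under_test` to the code
// being exercised; both observe the same state.
pub fn shared_state_demo() -> (usize, Option<u32>) {
    let handle = SharedMock::new(vec![
        Snapshot { id: 0, temperature_celsius: 40 },
        Snapshot { id: 1, temperature_celsius: 40 },
    ]);
    let under_test = handle.clone();
    handle.set_temperature(1, 95);
    (under_test.gpu_count(), under_test.temperature(1))
}
```

Because `Clone` only bumps the `Arc` reference count, the write made through `handle` is visible through `under_test` without any extra synchronisation in the test.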
Implementations
impl MockGpuMonitor
pub fn new(snapshots: Vec<GpuSnapshot>) -> Self
Create a mock monitor from an explicit list of snapshots.
pub fn with_healthy_gpus(count: usize, memory_per_gpu_gb: u64) -> Self
Create a mock monitor reporting count healthy GPUs, each with
memory_per_gpu_gb gigabytes of VRAM, zero utilisation, zero memory
used, 40 degrees Celsius, and no ECC errors.
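As a rough sketch of the defaults listed above, the helper below builds such a fleet by hand. The `Snapshot` struct, its field names, the sequential ids starting at zero, and the decimal gigabyte conversion are all assumptions made for illustration; the real `GpuSnapshot` layout and unit handling may differ.

```rust
// Assumption: one gigabyte is 10^9 bytes. The real code may use 2^30 instead.
const BYTES_PER_GB: u64 = 1_000_000_000;

// Hypothetical stand-in for GpuSnapshot.
#[derive(Clone, Debug, PartialEq)]
pub struct Snapshot {
    pub id: u32,
    pub memory_total_bytes: u64,
    pub memory_used_bytes: u64,
    pub utilisation_percent: u32,
    pub temperature_celsius: u32,
    pub ecc_errors_uncorrected: u64,
}

// Build `count` idle, healthy GPUs with the documented default values.
pub fn healthy_fleet(count: usize, memory_per_gpu_gb: u64) -> Vec<Snapshot> {
    (0..count)
        .map(|i| Snapshot {
            id: i as u32, // assumed: ids are assigned sequentially from 0
            memory_total_bytes: memory_per_gpu_gb * BYTES_PER_GB,
            memory_used_bytes: 0,
            utilisation_percent: 0,
            temperature_celsius: 40,
            ecc_errors_uncorrected: 0,
        })
        .collect()
}
```

A list built this way could then be passed to a constructor like `new`, with individual entries adjusted afterwards to model a degraded GPU.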
pub async fn set_memory_used(&self, gpu_id: GpuId, bytes: ByteSize)
Overwrite the memory_used field of the GPU with the given id.
No-op if the GPU is not present (e.g. it has been removed via
MockGpuMonitor::remove_gpu).
pub async fn set_temperature(&self, gpu_id: GpuId, celsius: u32)
Overwrite the temperature_celsius field of the GPU with the given id.
pub async fn inject_ecc_error(&self, gpu_id: GpuId)
Increment the uncorrected ECC error count of the GPU with the given id by one.
pub async fn remove_gpu(&self, gpu_id: GpuId)
Remove a GPU from the monitor’s view, simulating a GPU that has dropped off the bus (driver crash, hardware fault, hot-unplug).
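The fault-injection methods above could be implemented along the following lines. This is a synchronous sketch under stated assumptions, not the crate's source: `SketchMonitor`, `Snapshot`, the private `update` helper, and the `snapshots` accessor are hypothetical, and the real methods are `async` and take `GpuId` and `ByteSize` rather than plain integers.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for GpuSnapshot with only the fields used here.
#[derive(Clone, Debug, PartialEq)]
pub struct Snapshot {
    pub id: u32,
    pub memory_used_bytes: u64,
    pub ecc_errors_uncorrected: u64,
}

#[derive(Clone)]
pub struct SketchMonitor {
    inner: Arc<RwLock<Vec<Snapshot>>>,
}

impl SketchMonitor {
    pub fn new(snapshots: Vec<Snapshot>) -> Self {
        Self { inner: Arc::new(RwLock::new(snapshots)) }
    }

    // Apply `f` to the GPU with the given id; do nothing if it is absent.
    fn update(&self, id: u32, f: impl FnOnce(&mut Snapshot)) {
        let mut gpus = self.inner.write().unwrap();
        if let Some(gpu) = gpus.iter_mut().find(|g| g.id == id) {
            f(gpu);
        }
    }

    pub fn set_memory_used(&self, id: u32, bytes: u64) {
        self.update(id, |g| g.memory_used_bytes = bytes);
    }

    pub fn inject_ecc_error(&self, id: u32) {
        self.update(id, |g| g.ecc_errors_uncorrected += 1);
    }

    // Drop the GPU from the list, as if it had fallen off the bus.
    pub fn remove_gpu(&self, id: u32) {
        self.inner.write().unwrap().retain(|g| g.id != id);
    }

    // Hypothetical accessor returning a copy of the current view.
    pub fn snapshots(&self) -> Vec<Snapshot> {
        self.inner.read().unwrap().clone()
    }
}
```

Routing every field write through one lookup helper keeps the "no-op if the GPU is not present" behaviour in a single place, so a write issued after `remove_gpu` silently does nothing instead of panicking.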
Trait Implementations
impl Clone for MockGpuMonitor
fn clone(&self) -> MockGpuMonitor
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.