Struct Library

Source

pub struct Library;

Implementations

impl Library

Source

pub fn create() -> Result<Self>

Initializes NVML, but does not initialize any GPUs yet.

Newer NVML releases also accept initialization flags that adjust this behavior; see Library::create_with_flags. In NVML 5.319, the initialization path used by Library::create replaced the older path (the default in NVML 4.304 and older), which initialized every GPU device in the system.

This allows NVML to communicate with a GPU when other GPUs in the system are unstable or in a bad state. With this initialization mode, GPUs are discovered and initialized lazily when you request a device handle from this crate.

In contrast, the older initialization path in NVML 4.304 and older fails if any detected GPU is in a bad or unstable state.

For all products.

Call this once before invoking any other methods in the library. A reference count of the number of initializations is maintained. Shutdown only occurs when the reference count reaches zero.
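The reference-counting rule can be modeled in a few lines. The sketch below is only an illustration of that rule: InitCounter and its methods are hypothetical names, not part of this crate, and the sketch makes no claim about how the crate or NVML stores the count.

```rust
/// Hypothetical model of reference-counted initialization.
/// It only mirrors the rule described above; it is not this crate's code.
#[derive(Debug, Default)]
pub struct InitCounter {
    /// Initializations that have not yet been released.
    pub active: usize,
    /// How many times a real teardown would have happened.
    pub teardowns: usize,
}

impl InitCounter {
    /// Records one initialization.
    pub fn init(&mut self) {
        self.active += 1;
    }

    /// Releases one initialization. Returns true only when this call
    /// drops the count to zero, which is when shutdown actually occurs.
    pub fn shutdown(&mut self) -> bool {
        if self.active == 0 {
            return false; // nothing to release
        }
        self.active -= 1;
        if self.active == 0 {
            self.teardowns += 1;
            true
        } else {
            false
        }
    }
}
```

With two initializations, the first release leaves the model initialized and only the second one counts as a teardown.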

§Errors

Returns an error if the NVIDIA driver is not running, if NVML does not have permission to communicate with the driver, or if NVML reports an unexpected failure.

Examples found in repository

examples/fan_curve.rs (line 13)

fn main() -> Result<(), Box<dyn Error>> {
    let nvml = Library::create()?;
    let device = nvml.device(GPU_INDEX)?;
    let name = device.name()?;
    let fans = device.num_fans()?;
    let fan_limits = device.min_max_fan_speed()?;

    println!("controlling GPU {GPU_INDEX}: {name}");
    println!("fan speed range: {}%-{}%", fan_limits.min, fan_limits.max);
    println!("curve: {CURVE:?}");
    if !APPLY_CHANGES {
        println!("dry run: set APPLY_CHANGES to true to write fan speeds");
    }

    for _ in 0..POLLS {
        // Read the GPU temperature and choose a speed from the curve.
        let temperature = device.temperature_reading(TemperatureSensor::Gpu)?;
        let target_speed =
            fan_speed_for_temperature(temperature).clamp(fan_limits.min, fan_limits.max);

        // Apply the same target to every fan controller on the selected GPU.
        for fan in 0..fans {
            let current_speed = device.fan_speed(fan)?;
            println!("fan {fan}: {temperature} C -> {target_speed}% (currently {current_speed}%)");

            if APPLY_CHANGES && current_speed != target_speed {
                device.set_fan_speed(fan, target_speed)?;
            }
        }

        thread::sleep(POLL_INTERVAL);
    }

    Ok(())
}

Source

pub fn create_with_flags(flags: InitFlags) -> Result<Self>

Initializes NVML with the provided initialization flags. This follows the same reference-counting behavior as Library::create.

For all products.

§Errors

Returns an error if the NVIDIA driver is not running, if NVML does not have permission to communicate with the driver, or if NVML reports an unexpected failure.

Source

pub fn driver_version(&self) -> Result<String>

Returns the version of the system’s graphics driver.

For all products.

The version identifier is an alphanumeric string. It does not exceed 80 bytes including the terminating NUL byte. This wrapper allocates the required NVML buffer internally.
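As an illustration of the 80-byte limit, the sketch below decodes a fixed-size, NUL-terminated buffer into a String using only the standard library. The names version_from_buffer and sample_buffer are hypothetical, and this is an assumption about how such a conversion could look, not the wrapper's actual code.

```rust
use std::ffi::CStr;

/// Buffer size quoted in the text above, terminating NUL included.
pub const VERSION_BUFFER_SIZE: usize = 80;

/// Hypothetical helper: decodes a NUL-terminated version buffer.
/// Returns None if no NUL byte is present or the bytes are not valid UTF-8.
pub fn version_from_buffer(buf: &[u8; VERSION_BUFFER_SIZE]) -> Option<String> {
    let c_str = CStr::from_bytes_until_nul(buf).ok()?;
    c_str.to_str().ok().map(str::to_owned)
}

/// Builds a demonstration buffer the way a C API would fill it.
pub fn sample_buffer(text: &str) -> [u8; VERSION_BUFFER_SIZE] {
    let mut buf = [0u8; VERSION_BUFFER_SIZE];
    // Copy at most 79 bytes so at least one zero byte remains as terminator.
    let len = text.len().min(VERSION_BUFFER_SIZE - 1);
    buf[..len].copy_from_slice(&text.as_bytes()[..len]);
    buf
}
```

A buffer with no NUL byte at all is rejected rather than read past its end, which is why the size limit includes the terminator.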

§Errors

Returns an error if the internal version buffer is too small, if NVML rejects the output argument, or if NVML has not been initialized.

Source

pub fn version(&self) -> Result<String>

Returns the version of the NVML library.

For all products.

The version identifier is an alphanumeric string. It does not exceed 80 bytes including the terminating NUL byte. This wrapper allocates the required NVML buffer internally.

§Errors

Returns an error if the internal version buffer is too small or if NVML rejects the output argument.

Source

pub fn cuda_driver_version(&self) -> Result<CudaDriverVersion>

Returns the version of the CUDA driver from the shared library.

For all products.

The version is obtained by calling into the CUDA driver.

§Errors

Returns an error if the CUDA driver library or version function cannot be found, or if NVML rejects the output argument.

Source

pub fn cuda_driver_version_fallback(&self) -> Result<CudaDriverVersion>

Returns the version of the CUDA driver.

For all products.

The CUDA driver version is retrieved from the currently installed version of CUDA. If the CUDA library is not found, this returns a known supported version number.

§Errors

Returns an error if NVML rejects the query arguments.

Source

pub fn process_name(&self, pid: Pid) -> Result<String>

Source

pub fn process_name_with_capacity( &self, pid: Pid, capacity: usize, ) -> Result<String>

Returns the name of the process with the provided process ID.

For all products.

The returned process name is truncated to capacity and encoded in ANSI.

§Errors

Returns an error if NVML rejects the process ID or output buffer, if the process does not exist, if the current process lacks permission, if NVML has not been initialized, or if NVML reports an unexpected failure.

Source

pub fn driver_branch(&self) -> Result<String>

Returns the driver branch of the NVIDIA driver installed on the system.

For all products.

The branch identifier is an alphanumeric string. It does not exceed 80 bytes including the terminating NUL byte. This wrapper allocates the required NVML buffer internally.

§Errors

Returns an error if the internal driver-branch buffer is too small, if NVML rejects the query arguments, if NVML has not been initialized, or if NVML reports an unexpected failure.

Source

pub fn vgpu_driver_capability( &self, capability: VgpuDriverCapability, ) -> Result<bool>

Returns the requested vGPU driver capability.

See VgpuDriverCapability for the supported capabilities. Returns a boolean indicating whether the capability is supported.

For Maxwell or newer fully supported devices.

§Errors

Returns an error if NVML rejects the requested capability, if the current driver state does not support vGPU capability queries, if NVML has not been initialized, or if NVML reports an unexpected failure.

Source

pub fn vgpu_version(&self) -> Result<VgpuVersionRange>

Query the ranges of supported vGPU versions.

Returns both the preset linear range of vGPU versions supported by the NVIDIA vGPU Manager and the current range configured by an administrator. If the preset range has not been overridden by Library::set_vgpu_version, both ranges are the same.

§Errors

Returns an error if NVML does not support this query, rejects the output buffers, or fails while fetching the version ranges.

Source

pub fn set_vgpu_version(&self, version: VgpuVersion) -> Result<()>

Overrides the preset range of vGPU versions supported by the NVIDIA vGPU Manager with a range set by an administrator.

The administrator-provided range takes precedence over the preset range and is advertised to the guest VM for negotiating the vGPU version. See Library::vgpu_version for how to query the preset range of supported versions.

After host system reboot or driver reload, the range of supported versions reverts to the range that is preset for the NVIDIA vGPU Manager.

The range set by the administrator must be a subset of the preset range that the NVIDIA vGPU Manager supports. Otherwise, an error is returned.
If the range of supported guest driver versions does not overlap the range set by the administrator, the guest driver fails to load.
If the range of supported guest driver versions overlaps the range set by the administrator, the guest driver loads with a negotiated vGPU version equal to the maximum value in the overlapping range.
No VM may be running on the host when the version range is set. If a VM is running on the host, the call fails.
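The subset and negotiation rules above can be sketched as plain range arithmetic. VersionRange and negotiate are hypothetical names introduced for illustration; they are not this crate's VgpuVersion type, and the version numbers in any usage are made up.

```rust
/// Hypothetical inclusive version range; not this crate's VgpuVersion type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VersionRange {
    pub min: u32,
    pub max: u32,
}

impl VersionRange {
    /// True when `self` is well formed and lies entirely inside `outer`.
    pub fn is_subset_of(&self, outer: &VersionRange) -> bool {
        self.min <= self.max && outer.min <= self.min && self.max <= outer.max
    }
}

/// Returns the negotiated version, which is the maximum value of the
/// overlap, or None when the ranges do not overlap (the case in which
/// the guest driver fails to load).
pub fn negotiate(admin: &VersionRange, guest: &VersionRange) -> Option<u32> {
    let low = admin.min.max(guest.min);
    let high = admin.max.min(guest.max);
    if low <= high {
        Some(high)
    } else {
        None
    }
}
```

For example, an administrator range of 12 to 16 and a guest range of 14 to 20 overlap in 14 to 16, so the negotiated version in this model is 16.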

§Errors

Returns an error if version is invalid or outside the preset supported range, if a VM is running on the host, or if the installed vGPU Manager does not support overriding the version range.

Source

pub fn vgpu_compatibility( &self, vgpu_metadata: &VgpuMetadata, pgpu_metadata: &PgpuMetadata, ) -> Result<VgpuCompatibility>

Takes the metadata of a vGPU instance, read from VgpuInstance::metadata, and the vGPU metadata of a physical GPU, read from Device::vgpu_metadata, and returns compatibility information for the pair.

This wrapper returns compatibility information describing whether the vGPU or VM may be booted on the physical GPU. If the vGPU / VM compatibility with the physical GPU is limited, a limit code indicates the factor limiting compatibility. See the returned compatibility structure for the reported limit code.

vGPU compatibility does not take into account dynamic capacity conditions that may limit a system’s ability to boot a given vGPU or associated VM.

§Errors

Returns an error if NVML rejects either metadata blob or reports an unexpected compatibility-query failure.

Source

pub fn topology_gpu_set(&self, cpu_number: u32) -> Result<Vec<Device>>

Returns the set of GPUs that have CPU affinity with the given CPU number. Supported on Linux only.

§Errors

Returns an error if NVML rejects the CPU number, if topology discovery is not supported on this platform, or if NVML fails while collecting the GPU set.

Source

pub fn excluded_device_count(&self) -> Result<u32>

Returns the number of excluded GPU devices in the system.

For all products.

§Errors

Returns an error if NVML rejects the count output.

Source

pub fn excluded_device(&self, index: u32) -> Result<ExcludedDeviceInfo>

Acquire the device information for an excluded GPU device, based on its index.

For all products.

Valid indices are derived from the count returned by Library::excluded_device_count. For example, if the count is 2 the valid indices are 0 and 1, corresponding to GPU 0 and GPU 1.

§Errors

Returns an error if index is out of range or NVML rejects the device information output.

Source

pub fn excluded_devices(&self) -> Result<Vec<ExcludedDeviceInfo>>

Source

pub fn device_count(&self) -> Result<u32>

Returns the number of compute devices in the system. A compute device is a single GPU.

For all products.

Library::device_count returns the count of all devices in the system even if Library::device returns Status::NoPermission for some devices. Update your code to handle this error, or use the NVML 4.304 or older header file. For backward binary compatibility reasons, the _v1 symbol is still present in the shared library. The old _v1 NVML entry point does not count devices that NVML has no permission to talk to.
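One way to handle this is to walk every index and skip the devices that report a permission error. The sketch below uses stand-ins so that it compiles on its own: the Status enum here is a hypothetical mirror of the crate's error type, and the open closure stands in for a call such as Library::device.

```rust
/// Hypothetical stand-in for the crate's error type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Status {
    NoPermission,
    Other(String),
}

/// Walks indices 0..count and collects the devices that can be opened.
/// Devices that report NoPermission are skipped; any other error aborts.
pub fn accessible_devices<D, F>(count: u32, open: F) -> Result<Vec<(u32, D)>, Status>
where
    F: Fn(u32) -> Result<D, Status>,
{
    let mut devices = Vec::new();
    for index in 0..count {
        match open(index) {
            Ok(device) => devices.push((index, device)),
            // Included in the count, but not usable by this process.
            Err(Status::NoPermission) => continue,
            Err(other) => return Err(other),
        }
    }
    Ok(devices)
}
```

The original index is kept next to each device because the resulting vector can be shorter than the reported count.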

§Errors

Returns an error if NVML rejects the count output, if NVML has not been initialized, or if NVML reports an unexpected failure.

Source

pub fn unit_count(&self) -> Result<u32>

Returns the number of units in the system.

For S-class products.

§Errors

Returns an error if NVML rejects the count output, if NVML has not been initialized, or if NVML reports an unexpected failure.

Source

pub fn unit(&self, index: u32) -> Result<Unit>

Acquire the handle for a particular unit, based on its index.

For S-class products.

Valid indices are derived from the count returned by Library::unit_count. For example, if the count is 2 the valid indices are 0 and 1, corresponding to UNIT 0 and UNIT 1.

The order in which NVML enumerates units has no guarantees of consistency between reboots.

§Errors

Returns an error if index is out of range, if NVML rejects the handle output, if NVML has not been initialized, or if NVML reports an unexpected failure.

Source

pub fn units(&self) -> Result<Vec<Unit>>

Source

pub fn device(&self, index: u32) -> Result<Device>

Acquire the handle for a particular device, based on its index.

For all products.

Valid indices are derived from the count returned by Library::device_count. For example, if the count is 2 the valid indices are 0 and 1, corresponding to GPU 0 and GPU 1. Because that count includes devices the process cannot access, a valid index can still yield Status::NoPermission.

The order in which NVML enumerates devices has no guarantees of consistency between reboots. Prefer PCI bus IDs or UUIDs for stable device lookup. See Library::device_by_uuid and Library::device_by_pci_bus_id.

The NVML index may not correlate with other libraries, such as the CUDA device index.

Starting from NVML 5, this call causes NVML to initialize the target GPU. NVML may initialize additional GPUs if the target GPU is an SLI slave.

Library::device_count returns the count of all devices in the system even if Library::device returns Status::NoPermission for some devices. Update your code to handle this error, or use the NVML 4.304 or older header file. For backward binary compatibility reasons, the _v1 symbol is still present in the shared library. The old _v1 NVML entry point does not count devices that NVML has no permission to talk to.

This means that Library::device and the old _v1 entry point can return different devices for the same index. Code that leaves the default _v1-to-_v2 mappings at the top of the NVML header file untouched is unaffected.

§Errors

Returns an error if index is out of range, if the process cannot access the target GPU, if the GPU cannot be initialized because of power, interrupt, or bus-access problems, if NVML has not been initialized, or if NVML reports an unexpected failure.

Source

pub fn device_by_uuid(&self, uuid: &str) -> Result<Device>

Acquire the handle for a device from its globally unique immutable UUID.

For all products.

Starting from NVML 5, this call causes NVML to initialize the target GPU. NVML may initialize additional GPUs as it searches for the target GPU.

§Errors

Returns an error if uuid contains an interior NUL byte or does not identify a device, if NVML cannot initialize one of the GPUs searched, if NVML has not been initialized, or if NVML reports an unexpected failure.
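The interior NUL case exists because the identifier has to become a C string before it reaches NVML. A minimal sketch of that check with the standard library follows; to_c_identifier is a hypothetical helper, and the UUID in any usage is a placeholder, not a real device.

```rust
use std::ffi::CString;

/// Hypothetical pre-check mirroring the conversion a wrapper has to perform
/// before handing an identifier to a C API. Not this crate's actual code.
pub fn to_c_identifier(id: &str) -> Result<CString, String> {
    CString::new(id)
        .map_err(|e| format!("interior NUL byte at position {}", e.nul_position()))
}
```

CString::new appends the terminating NUL itself, so a caller passes the identifier without one.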

Source

pub fn device_by_pci_bus_id(&self, pci_bus_id: &str) -> Result<Device>

Acquire the handle for a particular device, based on its PCI bus id.

For all products.

This value corresponds to the PCI bus ID returned by Device::pci_info.

Starting from NVML 5, this call causes NVML to initialize the target GPU. NVML may initialize additional GPUs if the target GPU is an SLI slave.

NVML 4.304 and older returned Status::NotFound instead of Status::NoPermission.

§Errors

Returns an error if pci_bus_id contains an interior NUL byte or does not identify a device, if the process cannot access the target GPU, if NVML cannot initialize it because of power, interrupt, or bus-access problems, if NVML has not been initialized, or if NVML reports an unexpected failure.

Source

pub fn device_by_serial(&self, serial: &str) -> Result<Device>

Source

pub fn devices(&self) -> Result<Vec<Device>>

Source

pub fn event_set(&self) -> Result<EventSet>

Creates an empty set of events. The returned event set is freed automatically when dropped.

For Fermi or newer fully supported devices.

§Errors

Returns an error if NVML rejects the event-set output, if NVML has not been initialized, or if NVML reports an unexpected failure.

Source

pub fn system_event_set(&self) -> Result<SystemEventSet>

Creates an empty set of system events. The returned system event set is freed automatically when dropped.

For Fermi or newer fully supported devices.

§Errors

Returns an error if the installed NVML version does not support the request layout, if NVML rejects the request, if NVML has not been initialized, or if NVML reports an unexpected failure.

Source

pub fn nvlink_bw_mode(&self) -> Result<u32>

Returns the global NVLink bandwidth mode.

§Errors

Returns an error if NVML rejects the output, if the system does not support global NVLink bandwidth mode, or if the current process lacks the required privileges.

Source

pub fn conf_compute_capabilities(&self) -> Result<ConfComputeSystemCaps>

Returns Confidential Computing system capabilities.

For Ampere or newer fully supported devices. Supported on Linux, Windows TCC.

§Errors

Returns an error if NVML rejects the output, if the system does not support this Confidential Computing query, or if NVML has not been initialized.

Source

pub fn conf_compute_state(&self) -> Result<ConfComputeSystemState>

Returns Confidential Computing system state.

For Ampere or newer fully supported devices. Supported on Linux, Windows TCC.

§Errors

Returns an error if NVML rejects the output, if the system does not support this Confidential Computing query, or if NVML has not been initialized.

Source

pub fn conf_compute_gpus_ready_state(&self) -> Result<bool>

Returns Confidential Computing GPU ready state.

For Ampere or newer fully supported devices. Supported on Linux, Windows TCC.

§Errors

Returns an error if NVML rejects the output, if the system does not support this Confidential Computing query, or if NVML has not been initialized.

Source

pub fn conf_compute_key_rotation_threshold( &self, ) -> Result<ConfComputeKeyRotationThreshold>

Returns Confidential Computing key rotation threshold detail.

For Hopper or newer fully supported devices. Supported on Linux, Windows TCC.

§Errors

Returns an error if NVML rejects the request, if the system does not support this Confidential Computing query, if NVML has not been initialized, or if NVML reports an unexpected failure.

Source

pub fn conf_compute_settings(&self) -> Result<ConfComputeSystemSettings>

Returns Confidential Computing system settings.

For Hopper or newer fully supported devices. Supported on Linux, Windows TCC.

§Errors

Returns an error if the installed NVML version does not support the request layout, if NVML rejects the request, if a target GPU is inaccessible, if the system does not support Confidential Computing settings, if NVML has not been initialized, or if NVML reports an unexpected failure.

Source

pub fn hic_versions(&self) -> Result<Vec<HwbcEntry>>

Returns the IDs and firmware versions for any Host Interface Cards (HICs) in the system.

For S-class products.

This wrapper queries the required entry count internally. The HIC must be connected to an S-class system to be reported.

§Errors

Returns an error if the number of HIC entries changes while the wrapper is fetching them, if NVML rejects the query, or if NVML has not been initialized.

Source

pub fn gpm_sample(&self) -> Result<GpmSample>

Allocates a sample buffer to be used with NVML GPM. At least two of these buffers are required to use the NVML GPM feature.

For Hopper or newer fully supported devices.

§Errors

Returns an error if NVML rejects the allocation request or if system memory is insufficient.

Source

pub fn gpm_metrics( &self, sample1: &GpmSample, sample2: &GpmSample, metric_ids: &[GpmMetricId], ) -> Result<Vec<GpmMetric>>

Calculate GPM metrics from two samples.

For Hopper or newer fully supported devices.

To retrieve metrics, allocate two sample buffers with Library::gpm_sample, then call Device::gpm_sample twice to obtain two samples of counters. The interval between these two Device::gpm_sample calls must be greater than 100 ms due to the internal sample refresh rate.

Finally, pass both samples as sample1 and sample2, together with the requested metric IDs in metric_ids, to Library::gpm_metrics. The returned vector holds the calculated metrics.
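The 100 ms spacing requirement can be enforced with a small helper. In the sketch below, take_sample is a hypothetical closure standing in for the device sampling call, and the 20 ms margin is an assumption added so that the gap is strictly greater than the minimum.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Minimum spacing between the two samples, as stated above.
pub const MIN_SAMPLE_INTERVAL: Duration = Duration::from_millis(100);
/// Extra wait on top of the minimum (an assumption of this sketch).
pub const MARGIN: Duration = Duration::from_millis(20);

/// Takes two samples separated by more than MIN_SAMPLE_INTERVAL and
/// returns them together with the measured gap.
/// `take_sample` is a hypothetical stand-in for the device sampling call.
pub fn sample_pair<S, E>(
    mut take_sample: impl FnMut() -> Result<S, E>,
) -> Result<(S, S, Duration), E> {
    let first = take_sample()?;
    let started = Instant::now();
    // thread::sleep waits for at least the requested duration.
    thread::sleep(MIN_SAMPLE_INTERVAL + MARGIN);
    let second = take_sample()?;
    Ok((first, second, started.elapsed()))
}
```

The two returned samples would then be passed on as sample1 and sample2.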