Struct nvml_wrapper::Nvml

pub struct Nvml { /* private fields */ }

The main struct that this library revolves around.

According to NVIDIA’s documentation, “It is the user’s responsibility to call nvmlInit() before calling any other methods, and nvmlShutdown() once NVML is no longer being used.” This struct is used to enforce those rules.

Also according to NVIDIA’s documentation, “NVML is thread-safe so it is safe to make simultaneous NVML calls from multiple threads.” In the Rust world, this translates to Nvml being Send + Sync. You can .clone() an Arc-wrapped Nvml and use it on any thread.
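The sharing pattern described above can be sketched with the standard library alone. `FakeNvml` and `query_from_threads` are hypothetical stand-ins introduced for illustration, not part of this library; the point is only how an `Arc` around a `Send + Sync` handle is cloned into worker threads.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for `Nvml`. It is `Send + Sync` automatically
/// because it only holds plain data.
struct FakeNvml {
    device_count: u32,
}

impl FakeNvml {
    fn device_count(&self) -> u32 {
        self.device_count
    }
}

/// Queries the shared handle from `workers` threads and collects the results.
fn query_from_threads(handle: Arc<FakeNvml>, workers: usize) -> Vec<u32> {
    let joins: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning the Arc only bumps a reference count; the handle itself is shared.
            let shared = Arc::clone(&handle);
            thread::spawn(move || shared.device_count())
        })
        .collect();
    joins
        .into_iter()
        .map(|j| j.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    let handle = Arc::new(FakeNvml { device_count: 2 });
    let counts = query_from_threads(Arc::clone(&handle), 4);
    println!("{:?}", counts);
}
```

With the real struct, the same shape should apply: wrap the initialized value in an `Arc` once, then hand clones of that `Arc` to each thread.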

NOTE: If you care about possible errors returned from nvmlShutdown(), use the .shutdown() method on this struct. The Drop implementation ignores errors.

When reading documentation on this struct and its members, remember that a lot of it, especially with regard to errors returned, is copied from NVIDIA’s docs. While they can be found online, the hosted docs are sometimes outdated and may not accurately reflect the version of NVML that this library is written for; beware. You should ideally read the doc comments on an up-to-date NVML API header. Such a header can be downloaded as part of the CUDA toolkit.

Implementations§

impl Nvml

pub fn init() -> Result<Self, NvmlError>

Handles NVML initialization and must be called before doing anything else.

While it is possible to initialize NVML multiple times (NVIDIA’s docs state that reference counting is used internally), you should strive to initialize NVML once at the start of your program’s execution; the constructors handle dynamically loading function symbols from the NVML lib and are therefore somewhat expensive.
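One way to follow the "initialize once" advice is a process-wide handle created lazily on first use. The sketch below uses only `std::sync::OnceLock`; `FakeNvml`, `INIT_CALLS`, and `nvml()` are hypothetical names for illustration, and the stand-in `init()` merely counts how often it runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the (pretend) expensive initialization ran.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for an initialized handle.
struct FakeNvml {
    /// Number of initializations that happened before this one.
    generation: usize,
}

impl FakeNvml {
    fn init() -> Result<Self, String> {
        let generation = INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        Ok(FakeNvml { generation })
    }
}

/// Process-wide handle, created on first use and reused afterwards.
static NVML: OnceLock<FakeNvml> = OnceLock::new();

fn nvml() -> &'static FakeNvml {
    NVML.get_or_init(|| FakeNvml::init().expect("initialization failed"))
}

fn main() {
    // Both calls return the same handle; init() runs only for the first one.
    println!("generation: {}", nvml().generation);
    println!("generation: {}", nvml().generation);
    println!("init calls: {}", INIT_CALLS.load(Ordering::SeqCst));
}
```

A value stored in a `static` is never dropped, so this pattern gives up both the Drop-based cleanup and an explicit shutdown at exit. Whether that trade-off is acceptable depends on the program.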

Note that this will initialize NVML but not any GPUs. This means that NVML can communicate with a GPU even when other GPUs in a system are bad or unstable.

By default, initialization looks for “libnvidia-ml.so” on Linux and “nvml.dll” on Windows. These default names should work for default installs on those platforms; if further specification is required, use Nvml::builder.

§Errors
  • DriverNotLoaded, if the NVIDIA driver is not running
  • NoPermission, if NVML does not have permission to talk to the driver
  • Unknown, on any unexpected error

pub fn init_with_flags(flags: InitFlags) -> Result<Self, NvmlError>

An initialization function that allows you to pass flags to control certain behaviors.

This is the same as init() except for the addition of flags.

§Errors
  • DriverNotLoaded, if the NVIDIA driver is not running
  • NoPermission, if NVML does not have permission to talk to the driver
  • Unknown, on any unexpected error
§Examples
use nvml_wrapper::bitmasks::InitFlags;
use nvml_wrapper::Nvml;

// Don't fail if the system doesn't have any NVIDIA GPUs
//
// Also, don't attach any GPUs during initialization
Nvml::init_with_flags(InitFlags::NO_GPUS | InitFlags::NO_ATTACH)?;
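The `|` in the example above combines two bit flags into one mask. A minimal standard-library sketch of that mechanism follows; the constants mirror the values of `NVML_INIT_FLAG_NO_GPUS` and `NVML_INIT_FLAG_NO_ATTACH` as I understand them from NVML's C header (treat the numbers as an assumption and check your `nvml.h`), and `has_flag` is a hypothetical helper.

```rust
/// Assumed to match NVML_INIT_FLAG_NO_GPUS in nvml.h.
const NO_GPUS: u32 = 1;
/// Assumed to match NVML_INIT_FLAG_NO_ATTACH in nvml.h.
const NO_ATTACH: u32 = 2;

/// Returns true if every bit of `flag` is set in `flags`.
fn has_flag(flags: u32, flag: u32) -> bool {
    flags & flag == flag
}

fn main() {
    // Each flag occupies its own bit, so OR-ing them keeps both.
    let flags = NO_GPUS | NO_ATTACH;
    println!("mask = {}", flags);
    println!("NO_GPUS set: {}", has_flag(flags, NO_GPUS));
    println!("NO_ATTACH set: {}", has_flag(flags, NO_ATTACH));
}
```

`InitFlags` wraps this kind of mask in a typed value, so only defined flags can be combined.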

pub fn builder<'a>() -> NvmlBuilder<'a>

Create an NvmlBuilder for further flexibility in how NVML is initialized.

pub fn shutdown(self) -> Result<(), NvmlError>

Use this to shut down NVML and release allocated resources if you care about handling potential errors (the Drop implementation ignores errors!).

§Errors
  • Uninitialized, if the library has not been successfully initialized
  • Unknown, on any unexpected error
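The difference between an explicit, consuming shutdown and relying on Drop can be sketched as follows. `Handle`, `close_raw`, and the `String` error type are hypothetical stand-ins; this illustrates the general pattern and is not a copy of how this library implements it.

```rust
use std::mem;

/// Hypothetical stand-in for a handle whose cleanup can fail.
struct Handle {
    fail_on_close: bool,
}

impl Handle {
    /// The fallible cleanup step (stands in for the C shutdown call).
    fn close_raw(&mut self) -> Result<(), String> {
        if self.fail_on_close {
            Err("shutdown failed".to_string())
        } else {
            Ok(())
        }
    }

    /// Consumes the handle and reports the cleanup result to the caller.
    fn shutdown(mut self) -> Result<(), String> {
        let result = self.close_raw();
        // Skip Drop so the cleanup does not run a second time.
        mem::forget(self);
        result
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // Drop cannot return a value, so any error is discarded here.
        let _ = self.close_raw();
    }
}

fn main() {
    let ok = Handle { fail_on_close: false };
    println!("{:?}", ok.shutdown());

    let bad = Handle { fail_on_close: true };
    println!("{:?}", bad.shutdown());

    // Dropped implicitly at the end of main: the same failure goes unnoticed.
    let _silent = Handle { fail_on_close: true };
}
```

Because `shutdown` takes `self` by value, the handle cannot be used after it has been shut down; the compiler rejects any later call.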

pub fn device_count(&self) -> Result<u32, NvmlError>

Get the number of compute devices in the system (compute device == one GPU).

Note that this count can include devices you do not have permission to access.

§Errors
  • Uninitialized, if the library has not been successfully initialized
  • Unknown, on any unexpected error

pub fn sys_driver_version(&self) -> Result<String, NvmlError>

Gets the version of the system’s graphics driver and returns it as an alphanumeric string.

§Errors
  • Uninitialized, if the library has not been successfully initialized
  • Utf8Error, if the string obtained from the C function is not valid UTF-8

pub fn sys_nvml_version(&self) -> Result<String, NvmlError>

Gets the version of the system’s NVML library and returns it as an alphanumeric string.

§Errors
  • Utf8Error, if the string obtained from the C function is not valid UTF-8

pub fn sys_cuda_driver_version(&self) -> Result<i32, NvmlError>

Gets the version of the system’s CUDA driver.

Calls into the CUDA library (cuDriverGetVersion()).

You can use cuda_driver_version_major and cuda_driver_version_minor to get the major and minor driver versions from this number.

§Errors
  • FunctionNotFound, if cuDriverGetVersion() is not found in the shared library
  • LibraryNotFound, if libcuda.so.1 or libcuda.dll cannot be found
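The integer returned here packs the version using CUDA's usual scheme, 1000 * major + 10 * minor. The sketch below shows that arithmetic with two hypothetical local helpers, `major` and `minor`; I assume the crate's `cuda_driver_version_major` and `cuda_driver_version_minor` perform the equivalent computation.

```rust
/// Extracts the major version from a CUDA version integer (1000 * major + 10 * minor).
fn major(version: i32) -> i32 {
    version / 1000
}

/// Extracts the minor version from a CUDA version integer (1000 * major + 10 * minor).
fn minor(version: i32) -> i32 {
    (version % 1000) / 10
}

fn main() {
    // Example value only; a real one would come from sys_cuda_driver_version().
    let raw = 12040;
    println!("CUDA {}.{}", major(raw), minor(raw)); // prints "CUDA 12.4"
}
```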

pub fn sys_process_name(&self, pid: u32, length: usize) -> Result<String, NvmlError>

Gets the name of the process for the given process ID, cropped to the provided length.

§Errors
  • Uninitialized, if the library has not been successfully initialized
  • InvalidArg, if the length is 0 (if this is returned without length being 0, file an issue)
  • NotFound, if the process does not exist
  • NoPermission, if the user doesn’t have permission to perform the operation
  • Utf8Error, if the string obtained from the C function is not valid UTF-8. NVIDIA’s docs say that the string encoding is ANSI, so this may very well happen.
  • Unknown, on any unexpected error
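To see why an ANSI-encoded name can trigger the Utf8Error case, consider a byte string that is valid in a single-byte code page but not in UTF-8. This standard-library sketch uses two hypothetical helpers, `decode_strict` and `decode_lossy`. The lossy variant is shown only for contrast and is not something this method does for you.

```rust
use std::string::FromUtf8Error;

/// Strict decoding: fails on any byte sequence that is not valid UTF-8.
fn decode_strict(bytes: &[u8]) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes.to_vec())
}

/// Lossy decoding: replaces invalid sequences with U+FFFD.
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    // "café" in Windows-1252: the final byte 0xE9 encodes 'é' there,
    // but a lone 0xE9 is an incomplete sequence in UTF-8.
    let ansi = [b'c', b'a', b'f', 0xE9];

    match decode_strict(&ansi) {
        Ok(name) => println!("valid UTF-8: {}", name),
        Err(err) => println!("not valid UTF-8: {}", err),
    }
    println!("lossy: {}", decode_lossy(&ansi));
}
```

Pure ASCII process names decode identically under both schemes, so the error is only likely for names containing non-ASCII characters.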

pub fn device_by_index(&self, index: u32) -> Result<Device<'_>, NvmlError>

Acquire the handle for a particular device based on its index (starts at 0).

Usage of this function causes NVML to initialize the target GPU. Additional GPUs may be initialized if the target GPU is an SLI slave.

You can determine valid indices by using .device_count(). This function doesn’t call that for you, but the actual C function to get the device handle will return an error in the case of an invalid index. This means that the InvalidArg error will be returned if you pass in an invalid index.

NVIDIA’s docs state that “The order in which NVML enumerates devices has no guarantees of consistency between reboots. For that reason it is recommended that devices be looked up by their PCI ids or UUID.” In this library, that translates into usage of .device_by_uuid() and .device_by_pci_bus_id().

The NVML index may not correlate with other APIs such as the CUDA device index.

§Errors
  • Uninitialized, if the library has not been successfully initialized
  • InvalidArg, if index is invalid
  • InsufficientPower, if any attached devices have improperly attached external power cables
  • NoPermission, if the user doesn’t have permission to talk to this device
  • IrqIssue, if the NVIDIA kernel detected an interrupt issue with the attached GPUs
  • GpuLost, if the target GPU has fallen off the bus or is otherwise inaccessible
  • Unknown, on any unexpected error

pub fn device_by_pci_bus_id<S: AsRef<str>>(&self, pci_bus_id: S) -> Result<Device<'_>, NvmlError>
where Vec<u8>: From<S>,

Acquire the handle for a particular device based on its PCI bus ID.

Usage of this function causes NVML to initialize the target GPU. Additional GPUs may be initialized if the target GPU is an SLI slave.

The bus ID corresponds to the bus_id returned by Device.pci_info().

§Errors
  • Uninitialized, if the library has not been successfully initialized
  • InvalidArg, if pci_bus_id is invalid
  • NotFound, if pci_bus_id does not match a valid device on the system
  • InsufficientPower, if any attached devices have improperly attached external power cables
  • NoPermission, if the user doesn’t have permission to talk to this device
  • IrqIssue, if the NVIDIA kernel detected an interrupt issue with the attached GPUs
  • GpuLost, if the target GPU has fallen off the bus or is otherwise inaccessible
  • NulError, if pci_bus_id contains an interior nul byte and so cannot be converted into a C string (see the docs on std::ffi::NulError)
  • Unknown, on any unexpected error
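The NulError case arises when a Rust string is turned into a nul-terminated C string. A short standard-library sketch of that conversion follows; `to_c_string` is a hypothetical helper and the bus IDs are made-up example values.

```rust
use std::ffi::{CString, NulError};

/// Converts an identifier into a C string, failing on interior nul bytes.
fn to_c_string(id: &str) -> Result<CString, NulError> {
    CString::new(id)
}

fn main() {
    // A well-formed ID converts without trouble.
    match to_c_string("0000:01:00.0") {
        Ok(c) => println!("ok: {:?}", c),
        Err(e) => println!("error: {}", e),
    }

    // An embedded nul byte would truncate the string on the C side,
    // so the conversion refuses it and reports where the byte sits.
    match to_c_string("0000:01\0:00.0") {
        Ok(c) => println!("ok: {:?}", c),
        Err(e) => println!("nul byte at position {}", e.nul_position()),
    }
}
```

Strings that come from another NVML call or from user-typed text rarely contain nul bytes, so in practice this error mostly signals corrupted input.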

pub fn device_by_serial<S: AsRef<str>>(&self, board_serial: S) -> Result<Device<'_>, NvmlError>
where Vec<u8>: From<S>,

👎Deprecated: use .device_by_uuid(), this errors on dual GPU boards

Not documenting this because it’s deprecated and does not seem to work anymore.

pub fn device_by_uuid<S: AsRef<str>>(&self, uuid: S) -> Result<Device<'_>, NvmlError>
where Vec<u8>: From<S>,

Acquire the handle for a particular device based on its globally unique immutable UUID.

Usage of this function causes NVML to initialize the target GPU. Additional GPUs may be initialized as the function called within searches for the target GPU.

§Errors
  • Uninitialized, if the library has not been successfully initialized
  • InvalidArg, if uuid is invalid
  • NotFound, if uuid does not match a valid device on the system
  • InsufficientPower, if any attached devices have improperly attached external power cables
  • IrqIssue, if the NVIDIA kernel detected an interrupt issue with the attached GPUs
  • GpuLost, if the target GPU has fallen off the bus or is otherwise inaccessible
  • NulError, if uuid contains an interior nul byte and so cannot be converted into a C string (see the docs on std::ffi::NulError)
  • Unknown, on any unexpected error

NVIDIA doesn’t mention NoPermission for this one. Strange!

pub fn topology_common_ancestor(&self, device1: &Device<'_>, device2: &Device<'_>) -> Result<TopologyLevel, NvmlError>

Gets the common ancestor for two devices.

Note: this is the same as Device.topology_common_ancestor().

§Errors
  • InvalidArg, if the device is invalid
  • NotSupported, if this Device or the OS does not support this feature
  • UnexpectedVariant, if NVML returns an enum value this library does not recognize (see the docs for that error variant)
  • Unknown, on any unexpected error
§Platform Support

Only supports Linux.

pub fn unit_by_index(&self, index: u32) -> Result<Unit<'_>, NvmlError>

Acquire the handle for a particular Unit based on its index.

Valid indices are derived from the count returned by .unit_count(). For example, if unit_count is 2 the valid indices are 0 and 1, corresponding to UNIT 0 and UNIT 1.

Note that the order in which NVML enumerates units has no guarantees of consistency between reboots.

§Errors
  • Uninitialized, if the library has not been successfully initialized
  • InvalidArg, if index is invalid
  • Unknown, on any unexpected error
§Device Support

For S-class products.

pub fn are_devices_on_same_board(&self, device1: &Device<'_>, device2: &Device<'_>) -> Result<bool, NvmlError>

Checks if the passed-in Devices are on the same physical board.

Note: this is the same as Device.is_on_same_board_as().

§Errors
  • Uninitialized, if the library has not been successfully initialized
  • InvalidArg, if either Device is invalid
  • NotSupported, if this check is not supported by this Device
  • GpuLost, if this Device has fallen off the bus or is otherwise inaccessible
  • Unknown, on any unexpected error

pub fn topology_gpu_set(&self, cpu_number: u32) -> Result<Vec<Device<'_>>, NvmlError>

Gets the set of GPUs that have a CPU affinity with the given CPU number.

§Errors
  • InvalidArg, if cpu_number is invalid
  • NotSupported, if this Device or the OS does not support this feature
  • Unknown, if an error has occurred in the underlying topology discovery
§Platform Support

Only supports Linux.

pub fn hic_versions(&self) -> Result<Vec<HwbcEntry>, NvmlError>

Gets the IDs and firmware versions for any Host Interface Cards in the system.

§Errors
  • Uninitialized, if the library has not been successfully initialized
§Device Support

Supports S-class products.

pub fn hic_count(&self) -> Result<u32, NvmlError>

Gets the count of Host Interface Cards in the system.

§Errors
  • Uninitialized, if the library has not been successfully initialized
§Device Support

Supports S-class products.

pub fn unit_count(&self) -> Result<u32, NvmlError>

Gets the number of units in the system.

§Errors
  • Uninitialized, if the library has not been successfully initialized
  • Unknown, on any unexpected error
§Device Support

Supports S-class products.

pub fn create_event_set(&self) -> Result<EventSet<'_>, NvmlError>

Create an empty set of events.

§Errors
  • Uninitialized, if the library has not been successfully initialized
  • Unknown, on any unexpected error
§Device Support

Supports Fermi and newer fully supported devices.

pub fn discover_gpus(&self, pci_info: PciInfo) -> Result<(), NvmlError>

Request the OS and the NVIDIA kernel driver to rediscover a portion of the PCI subsystem in search of GPUs that were previously removed.

The portion of the PCI tree can be narrowed by specifying a domain, bus, and device in the passed-in pci_info. If all of these fields are zeroes, the entire PCI tree will be searched. Note that for long-running NVML processes, the enumeration of devices will change based on how many GPUs are discovered and where they are inserted in bus order.

All newly discovered GPUs will be initialized and have their ECC scrubbed, which may take several seconds per GPU. All device handles are no longer guaranteed to be valid post discovery. I am not sure if this means all device handles, literally, or if NVIDIA is referring to handles that had previously been obtained to devices that were then removed and have now been re-discovered.

Must be run as administrator.

§Errors
  • Uninitialized, if the library has not been successfully initialized
  • OperatingSystem, if the operating system is denying this feature
  • NoPermission, if the calling process has insufficient permissions to perform this operation
  • NulError, if an issue is encountered when trying to convert a Rust String into a CString.
  • Unknown, on any unexpected error
§Device Support

Supports Pascal and newer fully supported devices.

Some Kepler devices are also supported (that’s all NVIDIA says, no specifics).

§Platform Support

Only supports Linux.

pub fn excluded_device_count(&self) -> Result<u32, NvmlError>

Gets the number of excluded GPU devices in the system.

§Device Support

Supports all devices.

pub fn excluded_device_info(&self, index: u32) -> Result<ExcludedDeviceInfo, NvmlError>

Gets information for the specified excluded device.

§Errors
  • InvalidArg, if the given index is invalid
  • Utf8Error, if strings obtained from the C function are not valid UTF-8
§Device Support

Supports all devices.

Trait Implementations§

impl Debug for Nvml

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Drop for Nvml

This Drop implementation ignores errors! Use the .shutdown() method on the Nvml struct if you care about handling them.

fn drop(&mut self)

Executes the destructor for this type.

impl EventLoopProvider for Nvml

fn create_event_loop<'nvml>(&'nvml self, devices: Vec<&Device<'nvml>>) -> Result<EventLoop<'_>, NvmlErrorWithSource>

Create an event loop that will register itself to receive events for the given Devices.

This function creates an event set and registers each device’s supported event types for it. The returned EventLoop struct then has methods that you can call to actually utilize it.

§Errors
  • Uninitialized, if the library has not been successfully initialized
  • GpuLost, if any of the given Devices have fallen off the bus or are otherwise inaccessible
  • Unknown, on any unexpected error
§Platform Support

Only supports Linux.

Auto Trait Implementations§

impl !RefUnwindSafe for Nvml

impl Send for Nvml

impl Sync for Nvml

impl Unpin for Nvml

impl !UnwindSafe for Nvml

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.