pub struct Device { /* private fields */ }
Represents a CUDA-capable GPU device.
Wraps a CUdevice handle obtained from the driver API. Devices are
identified by a zero-based ordinal index. The handle is a lightweight
integer that can be freely copied.
§Examples
use oxicuda_driver::device::Device;
oxicuda_driver::init()?;
let device = Device::get(0)?;
println!("GPU: {}", device.name()?);
println!("Memory: {} MB", device.total_memory()? / (1024 * 1024));
let (major, minor) = device.compute_capability()?;
println!("Compute: {major}.{minor}");
§Implementations
impl Device
pub fn get(ordinal: i32) -> CudaResult<Self>
Get a device handle by ordinal (0-indexed).
§Errors
Returns CudaError::InvalidDevice if the ordinal is out of range,
or CudaError::NotInitialized if the driver has not been loaded.
pub fn count() -> CudaResult<i32>
Get the number of CUDA-capable devices in the system.
§Errors
Returns an error if the driver cannot enumerate devices.
pub fn name(&self) -> CudaResult<String>
Get the device name (e.g., "NVIDIA A100-SXM4-80GB").
The returned string is an ASCII identifier provided by the driver.
§Errors
Returns an error if the driver call fails.
pub fn total_memory(&self) -> CudaResult<usize>
Get the total device memory in bytes.
pub fn attribute(&self, attr: CUdevice_attribute) -> CudaResult<i32>
Query an arbitrary device attribute.
This is the low-level building block for all the convenience methods
below. Callers can use any CUdevice_attribute variant directly.
§Errors
Returns an error if the attribute is not supported or the driver call fails.
pub fn compute_capability(&self) -> CudaResult<(i32, i32)>
Get compute capability as (major, minor).
For example, an A100 returns (8, 0) and an RTX 4090 returns (8, 9).
§Errors
Returns an error if the driver call fails.
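On top of compute_capability(), callers often map the raw (major, minor) pair to an architecture name. The lookup below is an illustrative sketch, not part of this crate; the names follow NVIDIA's public architecture naming, so treat the table as an assumption to verify against the driver release notes.

```rust
// Hypothetical helper built on the (major, minor) pair returned by
// compute_capability(); `arch_name` is not a method of this crate.
fn arch_name(major: i32, minor: i32) -> &'static str {
    match (major, minor) {
        (7, 0) | (7, 2) => "Volta",
        (7, 5) => "Turing",
        (8, 0) | (8, 6) | (8, 7) => "Ampere",
        (8, 9) => "Ada Lovelace",
        (9, 0) => "Hopper",
        (10, _) | (12, _) => "Blackwell",
        _ => "unknown",
    }
}

fn main() {
    // Matches the examples above: A100 is (8, 0), RTX 4090 is (8, 9).
    assert_eq!(arch_name(8, 0), "Ampere");
    assert_eq!(arch_name(8, 9), "Ada Lovelace");
    println!("ok");
}
```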
pub fn max_threads_per_block(&self) -> CudaResult<i32>
Get the maximum number of threads per block.
pub fn max_block_dim(&self) -> CudaResult<(i32, i32, i32)>
Get the maximum block dimensions as (x, y, z).
pub fn max_grid_dim(&self) -> CudaResult<(i32, i32, i32)>
Get the maximum grid dimensions as (x, y, z).
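These limits feed directly into launch-configuration arithmetic: a 1D launch over n elements needs ceil(n / block) blocks, which must fit in the device's maximum x grid dimension. A minimal sketch, assuming a helper named grid_for that is not part of this crate:

```rust
// Illustrative launch-configuration arithmetic on top of
// max_threads_per_block() and max_grid_dim(); `grid_for` is hypothetical.
fn grid_for(n: usize, block: usize, max_grid_x: usize) -> Result<usize, String> {
    let grid = n.div_ceil(block); // blocks needed to cover n elements
    if grid > max_grid_x {
        return Err(format!("{n} elements need {grid} blocks, device max is {max_grid_x}"));
    }
    Ok(grid)
}

fn main() {
    // 2_147_483_647 (2^31 - 1) is a commonly reported max x grid dimension.
    let grid = grid_for(1_000_000, 256, 2_147_483_647).unwrap();
    println!("{grid} blocks of 256 threads"); // 3907 blocks of 256 threads
}
```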
pub fn max_threads_per_multiprocessor(&self) -> CudaResult<i32>
Get the maximum number of threads per multiprocessor.
pub fn max_blocks_per_multiprocessor(&self) -> CudaResult<i32>
Get the maximum number of blocks per multiprocessor.
pub fn multiprocessor_count(&self) -> CudaResult<i32>
Get the number of streaming multiprocessors (SMs) on the device.
pub fn warp_size(&self) -> CudaResult<i32>
Get the warp size in threads (32 on all current NVIDIA GPUs).
Get the maximum shared memory per block in bytes.
Get the maximum shared memory per multiprocessor in bytes.
Get the maximum opt-in shared memory per block in bytes.
This is the upper bound achievable via
cuFuncSetAttribute(CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES).
pub fn max_registers_per_block(&self) -> CudaResult<i32>
Get the maximum number of 32-bit registers per block.
pub fn max_registers_per_multiprocessor(&self) -> CudaResult<i32>
Get the maximum number of 32-bit registers per multiprocessor.
pub fn l2_cache_size(&self) -> CudaResult<i32>
Get the L2 cache size in bytes.
pub fn total_constant_memory(&self) -> CudaResult<i32>
Get the total constant memory on the device in bytes.
pub fn clock_rate_khz(&self) -> CudaResult<i32>
Get the core clock rate in kHz.
pub fn memory_clock_rate_khz(&self) -> CudaResult<i32>
Get the memory clock rate in kHz.
pub fn memory_bus_width(&self) -> CudaResult<i32>
Get the global memory bus width in bits.
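The memory clock and bus width together give the theoretical peak memory bandwidth. A sketch of that arithmetic, assuming double-data-rate memory (the factor of 2) and an A100-80GB-like configuration as example inputs; peak_bandwidth_gb_s is not a method of this crate:

```rust
// Peak theoretical bandwidth from memory_clock_rate_khz() and
// memory_bus_width(). The factor of 2 assumes double-data-rate memory.
fn peak_bandwidth_gb_s(mem_clock_khz: i32, bus_width_bits: i32) -> f64 {
    let bytes_per_cycle = bus_width_bits as f64 / 8.0;
    mem_clock_khz as f64 * 1_000.0 * bytes_per_cycle * 2.0 / 1e9
}

fn main() {
    // An A100-SXM4-80GB reports roughly a 1593 MHz memory clock on a
    // 5120-bit HBM2e bus.
    println!("{:.0} GB/s", peak_bandwidth_gb_s(1_593_000, 5120)); // 2039 GB/s
}
```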
pub fn pci_bus_id(&self) -> CudaResult<i32>
Get the PCI bus ID of the device.
pub fn pci_device_id(&self) -> CudaResult<i32>
Get the PCI device ID.
pub fn pci_domain_id(&self) -> CudaResult<i32>
Get the PCI domain ID.
pub fn supports_managed_memory(&self) -> CudaResult<bool>
Check if the device supports managed (unified) memory.
pub fn supports_concurrent_managed_access(&self) -> CudaResult<bool>
Check if the device supports concurrent managed memory access.
pub fn supports_concurrent_kernels(&self) -> CudaResult<bool>
Check if the device supports concurrent kernel execution.
pub fn supports_cooperative_launch(&self) -> CudaResult<bool>
Check if the device supports cooperative kernel launches.
pub fn ecc_enabled(&self) -> CudaResult<bool>
Check if ECC memory is enabled on the device.
pub fn is_integrated(&self) -> CudaResult<bool>
Check if the device is integrated (shares memory with the host).
pub fn can_map_host_memory(&self) -> CudaResult<bool>
Check if the device can map host memory into its address space.
pub fn supports_unified_addressing(&self) -> CudaResult<bool>
Check if the device uses a unified address space with the host.
pub fn supports_stream_priorities(&self) -> CudaResult<bool>
Check if the device supports stream priorities.
pub fn supports_compute_preemption(&self) -> CudaResult<bool>
Check if the device supports compute preemption.
pub fn async_engine_count(&self) -> CudaResult<i32>
Get the number of asynchronous engines (copy engines).
pub fn is_multi_gpu_board(&self) -> CudaResult<bool>
Check if the device is on a multi-GPU board.
pub fn has_kernel_exec_timeout(&self) -> CudaResult<bool>
Check if there is a kernel execution timeout enforced by the OS.
pub fn compute_mode(&self) -> CudaResult<i32>
Get the compute mode (0=default, 1=exclusive-thread, 2=prohibited, 3=exclusive-process).
pub fn tcc_driver(&self) -> CudaResult<bool>
Check if the device uses the TCC (Tesla Compute Cluster) driver model.
TCC mode disables the display driver, giving full GPU resources to compute workloads.
pub fn multi_gpu_board_group_id(&self) -> CudaResult<i32>
Get the multi-GPU board group identifier.
Devices on the same board share the same group ID.
pub fn max_persisting_l2_cache_size(&self) -> CudaResult<i32>
Get the maximum persisting L2 cache size in bytes (Ampere+).
pub fn supports_generic_compression(&self) -> CudaResult<bool>
Check if the device supports generic memory compression.
pub fn supports_pageable_memory_access(&self) -> CudaResult<bool>
Check if the device supports pageable memory access.
pub fn pageable_memory_uses_host_page_tables(&self) -> CudaResult<bool>
Check if pageable memory access uses host page tables.
pub fn supports_direct_managed_mem_from_host(&self) -> CudaResult<bool>
Check if the device supports direct managed memory access from the host.
pub fn memory_pool_supported_handle_types(&self) -> CudaResult<i32>
Get memory pool supported handle types as a bitmask.
pub fn supports_host_native_atomics(&self) -> CudaResult<bool>
Check if the device supports host-visible native atomic operations.
pub fn single_to_double_perf_ratio(&self) -> CudaResult<i32>
Get the ratio of single-precision to double-precision performance.
A higher value means the GPU is relatively faster at FP32 than FP64.
pub fn supports_cooperative_multi_device_launch(&self) -> CudaResult<bool>
Check if the device supports cooperative multi-device kernel launches.
pub fn supports_flush_remote_writes(&self) -> CudaResult<bool>
Check if the device supports flushing outstanding remote writes.
pub fn supports_host_register(&self) -> CudaResult<bool>
Check if the device supports host-side memory register functions.
pub fn can_use_host_pointer_for_registered_mem(&self) -> CudaResult<bool>
Check if the device can use host pointers for registered memory.
pub fn supports_gpu_direct_rdma(&self) -> CudaResult<bool>
Check if the device supports GPU Direct RDMA.
pub fn supports_tensor_map_access(&self) -> CudaResult<bool>
Check if the device supports tensor-map access (Hopper+).
pub fn supports_multicast(&self) -> CudaResult<bool>
Check if the device supports multicast operations.
pub fn mps_enabled(&self) -> CudaResult<bool>
Check if Multi-Process Service (MPS) is enabled on the device.
pub fn max_texture_1d_width(&self) -> CudaResult<i32>
Get the maximum 1D texture width.
pub fn max_texture_2d_dims(&self) -> CudaResult<(i32, i32)>
Get the maximum 2D texture dimensions as (width, height).
pub fn max_texture_3d_dims(&self) -> CudaResult<(i32, i32, i32)>
Get the maximum 3D texture dimensions as (width, height, depth).
pub fn gpu_overlap(&self) -> CudaResult<bool>
Check if the device can copy memory and execute a kernel concurrently.
pub fn max_pitch(&self) -> CudaResult<i32>
Get the maximum pitch for memory copies in bytes.
pub fn texture_alignment(&self) -> CudaResult<i32>
Get the texture alignment requirement in bytes.
pub fn surface_alignment(&self) -> CudaResult<i32>
Get the surface alignment requirement in bytes.
pub fn supports_deferred_mapping(&self) -> CudaResult<bool>
Check if the device supports deferred mapping of CUDA arrays.
pub fn supports_memory_pools(&self) -> CudaResult<bool>
Check if the device supports memory pools (cudaMallocAsync).
pub fn supports_cluster_launch(&self) -> CudaResult<bool>
Check if the device supports cluster launch (Hopper+).
pub fn supports_virtual_memory_management(&self) -> CudaResult<bool>
Check if the device supports virtual memory management APIs.
pub fn supports_handle_type_posix_fd(&self) -> CudaResult<bool>
Check if the device supports POSIX file descriptor handles for IPC.
pub fn supports_handle_type_win32(&self) -> CudaResult<bool>
Check if the device supports Win32 handles for IPC.
pub fn supports_handle_type_win32_kmt(&self) -> CudaResult<bool>
Check if the device supports Win32 KMT handles for IPC.
pub fn supports_gpu_direct_rdma_vmm(&self) -> CudaResult<bool>
Check if the device supports GPU Direct RDMA with CUDA VMM.
pub fn gpu_direct_rdma_flush_writes_options(&self) -> CudaResult<i32>
Get the GPU Direct RDMA flush-writes options bitmask.
pub fn gpu_direct_rdma_writes_ordering(&self) -> CudaResult<i32>
Get the GPU Direct RDMA writes ordering.
pub fn max_access_policy_window_size(&self) -> CudaResult<i32>
Get the maximum access-policy window size for L2 cache.
Get the reserved shared memory per block in bytes.
pub fn supports_timeline_semaphore_interop(&self) -> CudaResult<bool>
Check if timeline semaphore interop is supported.
pub fn supports_mem_sync_domain(&self) -> CudaResult<bool>
Check if memory sync domain operations are supported.
pub fn mem_sync_domain_count(&self) -> CudaResult<i32>
Get the number of memory sync domains.
pub fn supports_gpu_direct_rdma_fabric(&self) -> CudaResult<bool>
Check if GPU-Direct Fabric (RDMA) is supported.
pub fn supports_unified_function_pointers(&self) -> CudaResult<bool>
Check if unified function pointers are supported.
pub fn supports_ipc_events(&self) -> CudaResult<bool>
Check if IPC event handles are supported.
pub fn numa_config(&self) -> CudaResult<i32>
Get the NUMA configuration of the device.
pub fn numa_id(&self) -> CudaResult<i32>
Get the NUMA ID of the device.
pub fn host_numa_id(&self) -> CudaResult<i32>
Get the host NUMA ID of the device.
pub fn texture_pitch_alignment(&self) -> CudaResult<i32>
Get the texture pitch alignment requirement in bytes.
pub fn info(&self) -> CudaResult<DeviceInfo>
Gather comprehensive device information in a single call.
Returns a DeviceInfo with all key properties. Individual attribute
query failures are silently replaced with default values (0 / false)
so that the call succeeds even on older drivers that lack some attributes.
§Errors
Returns an error only if the device name or total memory cannot be queried (fundamental properties).
impl Device
pub fn occupancy_info(&self) -> CudaResult<DeviceOccupancyInfo>
Gather all occupancy-relevant hardware attributes into a
DeviceOccupancyInfo struct.
On macOS (where no NVIDIA driver is available) this returns synthetic values for a typical SM 8.6 (Ampere) GPU so that CPU-side occupancy analysis can still run.
§Errors
Returns a CudaError if an attribute query fails on a real GPU.
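The CPU-side analysis mentioned above boils down to taking the minimum over the per-SM limits (threads, blocks, registers, shared memory). The sketch below shows that arithmetic with the SM 8.6 limits used as the synthetic defaults; the struct and helper names are illustrative, not the fields of DeviceOccupancyInfo.

```rust
// Hypothetical occupancy arithmetic over per-SM hardware limits.
// `SmLimits` and `blocks_per_sm` mirror what DeviceOccupancyInfo feeds,
// but are assumptions for illustration, not this crate's API.
struct SmLimits {
    max_threads_per_sm: i32,
    max_blocks_per_sm: i32,
    regs_per_sm: i32,
    shared_mem_per_sm: i32, // bytes
}

fn blocks_per_sm(l: &SmLimits, threads_per_block: i32, regs_per_thread: i32, smem_per_block: i32) -> i32 {
    let by_threads = l.max_threads_per_sm / threads_per_block;
    let by_regs = if regs_per_thread > 0 {
        l.regs_per_sm / (regs_per_thread * threads_per_block)
    } else {
        l.max_blocks_per_sm
    };
    let by_smem = if smem_per_block > 0 {
        l.shared_mem_per_sm / smem_per_block
    } else {
        l.max_blocks_per_sm
    };
    // The binding constraint is whichever resource runs out first.
    by_threads.min(by_regs).min(by_smem).min(l.max_blocks_per_sm)
}

fn main() {
    // SM 8.6 (Ampere) limits: 1536 threads/SM, 16 blocks/SM,
    // 65536 registers/SM, 102400 bytes of shared memory/SM.
    let ampere = SmLimits {
        max_threads_per_sm: 1536,
        max_blocks_per_sm: 16,
        regs_per_sm: 65536,
        shared_mem_per_sm: 102400,
    };
    let blocks = blocks_per_sm(&ampere, 256, 32, 16384);
    let occupancy = (blocks * 256) as f64 / ampere.max_threads_per_sm as f64;
    println!("{blocks} blocks/SM, occupancy {:.0}%", occupancy * 100.0);
}
```

For real kernels, prefer the driver's own calculator (cuOccupancyMaxActiveBlocksPerMultiprocessor) over hand-rolled arithmetic; this sketch exists so the same reasoning can run without a GPU, as the macOS fallback above intends.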