pub struct GpuDevice { /* private fields */ }
Handle to a single CUDA GPU device.
Holds a CUDA context, default stream, and a cached cuBLAS handle.
The cuBLAS handle is created once and reused for all matmul/bmm ops,
eliminating the ~1.7ms cuModuleLoadData overhead that occurs when
creating a new CudaBlas per operation.
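The create-once, reuse-everywhere pattern described above can be sketched without any CUDA dependency. In this minimal sketch, `ExpensiveHandle`, `CachedDevice`, and `handles_created` are hypothetical stand-ins rather than this crate's types, and a counter takes the place of the real setup cost.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts constructed handles; a stand-in for the real per-handle setup cost.
static HANDLES_CREATED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a handle that is expensive to create.
pub struct ExpensiveHandle;

impl ExpensiveHandle {
    pub fn new() -> Self {
        HANDLES_CREATED.fetch_add(1, Ordering::SeqCst);
        ExpensiveHandle
    }
}

/// Hypothetical device that builds its handle once, in the constructor.
pub struct CachedDevice {
    blas: ExpensiveHandle,
}

impl CachedDevice {
    pub fn new() -> Self {
        CachedDevice { blas: ExpensiveHandle::new() }
    }

    /// Borrows the cached handle; nothing is constructed here.
    pub fn blas(&self) -> &ExpensiveHandle {
        &self.blas
    }
}

/// Number of handles constructed so far in this process.
pub fn handles_created() -> usize {
    HANDLES_CREATED.load(Ordering::SeqCst)
}
```

Calling `blas()` any number of times on one `CachedDevice` leaves the counter unchanged, which is the property the cached cuBLAS handle is meant to provide.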
Implementations
impl GpuDevice
pub fn new(ordinal: usize) -> GpuResult<Self>
pub fn fork_for_capture(parent: &GpuDevice) -> GpuResult<Self>
Create a GpuDevice with a non-blocking stream forked from the
given device’s default stream. The forked stream supports CUDA graph
capture (which the legacy default stream does not).
pub fn context(&self) -> &Arc<CudaContext>
pub fn default_stream(&self) -> &Arc<CudaStream>
The device’s default (legacy) stream.
Prefer current_stream, which respects the
thread-local stream override set by [StreamGuard].
pub fn stream(&self) -> Arc<CudaStream>
The active stream for this device on the current thread.
Returns the thread-local stream set by [StreamGuard] if one is
active, otherwise falls back to the device’s default stream. All
kernel launches and memory operations should use this.
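The fallback behavior described above (a thread-local override if one is active, otherwise the device default) can be sketched with `thread_local!` and an RAII guard. This is an illustration under stated assumptions, not this crate's implementation: `Stream`, `Guard`, and `StreamDevice` are hypothetical stand-ins, and the stream only carries a label.

```rust
use std::cell::RefCell;
use std::sync::Arc;

/// Hypothetical stand-in for a stream; it only carries a label.
#[derive(Debug, PartialEq)]
pub struct Stream(pub &'static str);

thread_local! {
    /// Per-thread override; `None` means "use the device default".
    static OVERRIDE: RefCell<Option<Arc<Stream>>> = RefCell::new(None);
}

/// RAII guard: installs an override and restores the previous one on drop.
pub struct Guard {
    previous: Option<Arc<Stream>>,
}

impl Guard {
    pub fn new(stream: Arc<Stream>) -> Self {
        // `Option::replace` stores the new stream and returns the old value.
        let previous = OVERRIDE.with(|slot| slot.borrow_mut().replace(stream));
        Guard { previous }
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        let previous = self.previous.take();
        OVERRIDE.with(|slot| *slot.borrow_mut() = previous);
    }
}

/// Hypothetical device that owns a default stream.
pub struct StreamDevice {
    default_stream: Arc<Stream>,
}

impl StreamDevice {
    pub fn new() -> Self {
        StreamDevice { default_stream: Arc::new(Stream("default")) }
    }

    /// The override for this thread if one is active, else the default.
    pub fn stream(&self) -> Arc<Stream> {
        OVERRIDE
            .with(|slot| slot.borrow().clone())
            .unwrap_or_else(|| Arc::clone(&self.default_stream))
    }
}
```

Because the guard saves the previous override, guards nest: dropping an inner guard restores whatever the outer guard installed.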
pub fn blas(&self) -> &CudaBlas
The cached cuBLAS handle — reused for all matmul/bmm operations.
pub fn ordinal(&self) -> usize
impl GpuDevice
pub fn memory_info(&self) -> GpuResult<(usize, usize)>
Query free and total GPU memory for this device.
Returns (free_bytes, total_bytes).
Errors
Returns GpuError::Driver if the CUDA driver call fails.
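A caller will typically turn the `(free_bytes, total_bytes)` pair into a utilization figure or an allocation check. The helpers below are a hedged sketch of that arithmetic; `used_fraction` and `fits` are hypothetical names and are not part of this crate.

```rust
/// Fraction of device memory in use, given `(free_bytes, total_bytes)`.
/// Returns `None` if `total_bytes` is zero or the pair is inconsistent.
pub fn used_fraction((free_bytes, total_bytes): (usize, usize)) -> Option<f64> {
    if total_bytes == 0 || free_bytes > total_bytes {
        return None;
    }
    Some((total_bytes - free_bytes) as f64 / total_bytes as f64)
}

/// Whether `requested` bytes fit in free memory while leaving `headroom`
/// bytes spare. Overflow in `requested + headroom` counts as "does not fit".
pub fn fits(free_bytes: usize, requested: usize, headroom: usize) -> bool {
    requested
        .checked_add(headroom)
        .map_or(false, |needed| needed <= free_bytes)
}
```

With a real device, the tuple returned by `memory_info()` (after handling the `GpuResult`) could be passed straight to `used_fraction`.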
Auto Trait Implementations
impl Freeze for GpuDevice
impl RefUnwindSafe for GpuDevice
impl Send for GpuDevice
impl Sync for GpuDevice
impl Unpin for GpuDevice
impl UnsafeUnpin for GpuDevice
impl UnwindSafe for GpuDevice
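Since the type is both `Send` and `Sync`, one handle can be shared across threads behind an `Arc`. The sketch below shows a compile-time check for those bounds and a shared-handle pattern; `FakeDevice`, `assert_send_sync`, and `ordinals_seen_by_workers` are hypothetical stand-ins so the example builds without a GPU.

```rust
use std::sync::Arc;
use std::thread;

/// Compiles only if `T` is both `Send` and `Sync`.
pub fn assert_send_sync<T: Send + Sync>() {}

/// Hypothetical stand-in for a device handle; it only stores an ordinal.
pub struct FakeDevice {
    ordinal: usize,
}

impl FakeDevice {
    pub fn new(ordinal: usize) -> Self {
        FakeDevice { ordinal }
    }

    pub fn ordinal(&self) -> usize {
        self.ordinal
    }
}

/// Shares one handle with `n` worker threads and collects the ordinal
/// each worker observes.
pub fn ordinals_seen_by_workers(device: Arc<FakeDevice>, n: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let device = Arc::clone(&device);
            thread::spawn(move || device.ordinal())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

Substituting the real device type into `assert_send_sync::<T>()` turns the auto trait listing above into a check the compiler enforces.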
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone
impl<T> DistributionExt for T where T: ?Sized
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.