pub struct GpuCapability {
    pub compute_major: i32,
    pub compute_minor: i32,
    pub has_tensor_cores: bool,
    pub has_fp64_tensor_cores: bool,
    pub has_async_copy: bool,
    pub has_cluster_launch: bool,
    pub has_tma: bool,
    pub min_warp_size: i32,
}
Fields§
compute_major: i32
compute_minor: i32
has_tensor_cores: bool
has_fp64_tensor_cores: bool
has_async_copy: bool
has_cluster_launch: bool
has_tma: bool
min_warp_size: i32
Implementations§
impl GpuCapability
pub const fn from_compute_capability(major: i32, minor: i32) -> Self
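The page does not show how this constructor fills in the feature flags. As a rough sketch only, a constructor of this shape could derive them from the major version using the generations in which NVIDIA introduced each feature (tensor cores in 7.0, cp.async in 8.0, thread block clusters and TMA in 9.0). The type name GpuCapabilitySketch and every threshold below are assumptions for illustration, not the crate's actual logic.

```rust
/// Local mirror of the documented struct so the sketch compiles on its own.
/// `GpuCapabilitySketch` is a hypothetical name, not part of the crate.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct GpuCapabilitySketch {
    pub compute_major: i32,
    pub compute_minor: i32,
    pub has_tensor_cores: bool,
    pub has_fp64_tensor_cores: bool,
    pub has_async_copy: bool,
    pub has_cluster_launch: bool,
    pub has_tma: bool,
    pub min_warp_size: i32,
}

impl GpuCapabilitySketch {
    /// Assumed thresholds, keyed on the major version only.
    pub const fn from_compute_capability(major: i32, minor: i32) -> Self {
        Self {
            compute_major: major,
            compute_minor: minor,
            // Tensor cores first shipped with Volta (7.0).
            has_tensor_cores: major >= 7,
            // Assumption: treat every 8.x+ part as having FP64 tensor cores;
            // a real implementation may need to special-case consumer parts.
            has_fp64_tensor_cores: major >= 8,
            // cp.async is available from Ampere (8.0).
            has_async_copy: major >= 8,
            // Thread block clusters and TMA arrived with Hopper (9.0).
            has_cluster_launch: major >= 9,
            has_tma: major >= 9,
            // NVIDIA warps are 32 threads wide on all current architectures.
            min_warp_size: 32,
        }
    }
}
```

Under these assumed thresholds, an 8.0 device reports tensor cores and async copy but neither cluster launch nor TMA.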
pub const fn nvrtc_arch(&self) -> &'static str
Returns the NVRTC --gpu-architecture virtual-arch string for this device’s
compute capability (e.g. compute_80 for an A100, compute capability 8.0).

This is critical for NVRTC correctness, not just performance: with no
--gpu-architecture, NVRTC defaults to a virtual arch below compute_60,
where the atomicAdd(double*, double) overload (added in compute
capability 6.0) does not exist. A kernel source using double atomics
(the SAE arrow/Schur PCG kernels do) then fails to compile, the module
load returns Err, and the whole device path silently falls back to the CPU.
Keying the arch to the real device capability admits those kernels.

Returns a &'static str because cudarc’s CompileOptions::arch is
Option<&'static str>. Unknown/future capabilities round DOWN to the
nearest known major to stay valid for the installed NVRTC, never up
(an arch newer than the toolkit knows would itself fail to compile).
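The round-down rule can be sketched as a small const fn over string literals. The function name nvrtc_arch_sketch and the table of "known" architectures are assumptions for illustration; the crate's real table is not shown on this page and may differ.

```rust
/// Hypothetical stand-in for `GpuCapability::nvrtc_arch`.
/// The set of architectures listed here is assumed, not taken from the crate.
pub const fn nvrtc_arch_sketch(major: i32, minor: i32) -> &'static str {
    match major {
        // Assumption: pre-6.0 devices get an older virtual arch that the
        // installed toolkit is presumed to still accept.
        i32::MIN..=5 => "compute_52",
        6 => {
            if minor >= 1 { "compute_61" } else { "compute_60" }
        }
        7 => {
            // 7.1..=7.4 round down to 7.0; 7.5 and above use 7.5.
            if minor >= 5 { "compute_75" } else { "compute_70" }
        }
        8 => {
            if minor >= 9 {
                "compute_89"
            } else if minor >= 6 {
                "compute_86"
            } else {
                "compute_80"
            }
        }
        // 9.x and anything newer clamp DOWN to the highest arch in the
        // table, never up to an arch the toolkit might not know.
        _ => "compute_90",
    }
}
```

Because every arm is a string literal, the result is a &'static str and can be wrapped in Some(...) for a field typed Option<&'static str>, such as the CompileOptions::arch field mentioned above.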
Trait Implementations§
impl Clone for GpuCapability
fn clone(&self) -> GpuCapability
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for GpuCapability
impl Eq for GpuCapability
impl PartialEq for GpuCapability
fn eq(&self, other: &GpuCapability) -> bool
Tests for self and other values to be equal, and is used by ==.
impl StructuralPartialEq for GpuCapability
Auto Trait Implementations§
impl Freeze for GpuCapability
impl RefUnwindSafe for GpuCapability
impl Send for GpuCapability
impl Sync for GpuCapability
impl Unpin for GpuCapability
impl UnsafeUnpin for GpuCapability
impl UnwindSafe for GpuCapability
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> DistributionExt for T
where
    T: ?Sized,
impl<T, U> Imply<T> for U
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.