pub struct DeviceS2KernelMatrix {
    pub rows: usize,
    pub cols: usize,
    pub ld: usize,
    pub col_major_dev: CudaSlice<f64>,
    pub stream: Arc<CudaStream>,
}
Device-resident (rows × cols) matrix in column-major layout with
leading dimension ld ≥ rows. The slice holds ld * cols f64
elements; entry (i, j) lives at col_major_dev[j * ld + i].
On non-Linux builds the type is intentionally a host shadow so the surrounding orchestration compiles without cudarc.
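The indexing rule above can be sketched on the host with plain `Vec<f64>` storage. This is only an illustration of the `j * ld + i` layout and the `ld * cols` buffer length; the helper names `col_major_offset` and `fill_col_major` are hypothetical and not part of this crate.

```rust
/// Hypothetical helper: linear offset of entry (i, j) in a column-major
/// buffer with leading dimension `ld` (the stride between columns).
fn col_major_offset(i: usize, j: usize, ld: usize) -> usize {
    j * ld + i
}

/// Hypothetical helper: builds a padded column-major buffer of
/// `ld * cols` elements, filling entry (i, j) with `f(i, j)`.
/// Padding rows (i in rows..ld) are left at 0.0.
fn fill_col_major(
    rows: usize,
    cols: usize,
    ld: usize,
    f: impl Fn(usize, usize) -> f64,
) -> Vec<f64> {
    assert!(ld >= rows, "leading dimension must cover all rows");
    let mut buf = vec![0.0; ld * cols];
    for j in 0..cols {
        for i in 0..rows {
            buf[col_major_offset(i, j, ld)] = f(i, j);
        }
    }
    buf
}
```

With `rows = 3`, `cols = 2`, and `ld = 4`, the buffer holds 8 elements and entry (2, 1) sits at offset `1 * 4 + 2 = 6`; offsets 3 and 7 are padding that no logical entry maps to.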
Fields§
§rows: usize
§cols: usize
§ld: usize
§col_major_dev: CudaSlice<f64>
§stream: Arc<CudaStream>
Implementations§
impl DeviceS2KernelMatrix
pub fn to_host_array(&self) -> Result<Array2<f64>, GpuError>
Copy the device matrix back to the host as a regular ndarray
(rows × cols) row-major array. Convenience for tests and parity
comparisons; production paths should keep the matrix resident.
The device matrix is (ld × cols) column-major; the host wants
(rows × cols) row-major. Two costs dominate this round-trip on the
real V100:
- the device→host copy of the full ld·cols·8 B payload, and
- the column-major→row-major transpose.
On Linux the device-to-host copy is staged through a cacheable pinned host buffer (see [PinnedF64]) so the DMA runs at full PCIe bandwidth (~10 GB/s) instead of the ~1.3 GB/s the driver achieves when staging into a pageable destination. The subsequent host reads during the transpose hit L1/L2 normally, unlike reads from write-combined pinned memory. The transpose itself is the parallel cache-blocked [col_major_to_row_major_parallel].
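The two costs described above can be illustrated with a host-only sketch. It is a serial, std-only stand-in for the transpose step, not the crate's [col_major_to_row_major_parallel] (which is parallel and whose internals are not shown here). The function names and the block size of 32 are assumptions chosen for illustration.

```rust
/// Assumed block edge length in elements; a real implementation would tune this.
const BLOCK: usize = 32;

/// Hypothetical helper: size in bytes of the device-to-host payload,
/// including the padding rows implied by `ld`.
fn dtoh_payload_bytes(ld: usize, cols: usize) -> usize {
    ld * cols * std::mem::size_of::<f64>()
}

/// Serial cache-blocked transpose from a column-major buffer with leading
/// dimension `ld` into a dense row-major buffer of `rows * cols` elements.
/// Padding rows (i in rows..ld) are skipped.
fn col_major_to_row_major_blocked(
    src: &[f64],
    rows: usize,
    cols: usize,
    ld: usize,
) -> Vec<f64> {
    assert!(ld >= rows, "leading dimension must cover all rows");
    assert!(src.len() >= ld * cols, "source buffer too short");
    let mut dst = vec![0.0; rows * cols];
    // Walk the matrix in BLOCK x BLOCK tiles so that both the strided
    // reads from `src` and the contiguous writes to `dst` stay in cache.
    for jb in (0..cols).step_by(BLOCK) {
        let j_end = (jb + BLOCK).min(cols);
        for ib in (0..rows).step_by(BLOCK) {
            let i_end = (ib + BLOCK).min(rows);
            for i in ib..i_end {
                for j in jb..j_end {
                    dst[i * cols + j] = src[j * ld + i];
                }
            }
        }
    }
    dst
}
```

The resulting `Vec<f64>` is in standard row-major order, so it could be handed to ndarray's `Array2::from_shape_vec((rows, cols), dst)` to obtain the host array. A parallel variant would typically distribute the outer tile loops across threads, since tiles write to disjoint regions of `dst`.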
Auto Trait Implementations§
impl Freeze for DeviceS2KernelMatrix
impl RefUnwindSafe for DeviceS2KernelMatrix
impl Send for DeviceS2KernelMatrix
impl Sync for DeviceS2KernelMatrix
impl Unpin for DeviceS2KernelMatrix
impl UnsafeUnpin for DeviceS2KernelMatrix
impl UnwindSafe for DeviceS2KernelMatrix
Blanket Implementations§
impl<T> Allocation for T
impl<T> BorrowMut<T> for T where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> DistributionExt for T where
    T: ?Sized,
impl<T, U> Imply<T> for U
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<T> Read<Exclusive, BecauseExclusive> for T where
    T: ?Sized,
impl<SS, SP> SupersetOf<SS> for SP where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.