pub struct GpuTensor<T: GpuFloat> { /* private fields */ }
A tensor residing on a CUDA GPU.
Wraps a CudaBuffer<T> with shape metadata and a reference to the
GpuDevice that owns the memory. Created by tensor_to_gpu or
the convenience functions cuda / cuda_default.
Convert back to a CPU Tensor with GpuTensor::cpu or the free
function tensor_to_cpu.
Implementations§
impl<T: GpuFloat> GpuTensor<T>
pub fn buffer(&self) -> &CudaBuffer<T>
Borrow the underlying CudaBuffer.
pub fn cpu(&self) -> FerrotorchResult<Tensor<T>>
Copy this tensor back to CPU, returning a Tensor<T>.
This is a convenience wrapper around tensor_to_cpu.
impl<T: GpuFloat> GpuTensor<T>
pub fn add(&self, other: &GpuTensor<T>) -> GpuResult<GpuTensor<T>>
Elementwise addition: out[i] = self[i] + other[i].
Uses a PTX kernel for f32; falls back to CPU round-trip for f64.
§Errors
GpuError::LengthMismatch if shapes differ.
GpuError::DeviceMismatch if tensors are on different devices.
GpuError::Driver on CUDA runtime errors.
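The checks and arithmetic described above can be modelled on the host. The following is a minimal sketch, not the crate's implementation: `HostModel`, `SketchError`, and `add_reference` are hypothetical names, the error payloads are invented for illustration, the order of the two checks is an assumption, and the model compares flat lengths rather than full shapes. A reference like this can be useful for checking GPU results in tests.

```rust
/// Hypothetical stand-in for the documented error variants. The real
/// `GpuError` may carry different payloads.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    LengthMismatch { left: usize, right: usize },
    DeviceMismatch { left: usize, right: usize },
}

/// Hypothetical host-side model of a tensor: flat data plus a device ordinal.
#[derive(Debug, Clone, PartialEq)]
pub struct HostModel {
    pub data: Vec<f32>,
    pub device: usize,
}

/// Reference semantics for elementwise addition: out[i] = a[i] + b[i].
pub fn add_reference(a: &HostModel, b: &HostModel) -> Result<HostModel, SketchError> {
    // Assumed order: device check first, then length check.
    if a.device != b.device {
        return Err(SketchError::DeviceMismatch { left: a.device, right: b.device });
    }
    if a.data.len() != b.data.len() {
        return Err(SketchError::LengthMismatch {
            left: a.data.len(),
            right: b.data.len(),
        });
    }
    let data = a.data.iter().zip(&b.data).map(|(x, y)| x + y).collect();
    Ok(HostModel { data, device: a.device })
}
```

Subtraction and multiplication follow the same pattern with `x - y` and `x * y` in the closure.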
pub fn sub(&self, other: &GpuTensor<T>) -> GpuResult<GpuTensor<T>>
Elementwise subtraction: out[i] = self[i] - other[i].
Uses a PTX kernel for f32; falls back to CPU round-trip for f64.
pub fn mul(&self, other: &GpuTensor<T>) -> GpuResult<GpuTensor<T>>
Elementwise multiplication: out[i] = self[i] * other[i].
Uses a PTX kernel for f32; falls back to CPU round-trip for f64.
pub fn neg(&self) -> GpuResult<GpuTensor<T>>
Elementwise negation: out[i] = -self[i].
Uses a PTX kernel for f32; falls back to CPU round-trip for f64.
pub fn relu(&self) -> GpuResult<GpuTensor<T>>
Elementwise ReLU: out[i] = max(self[i], 0).
Uses a PTX kernel for f32; falls back to CPU round-trip for f64.
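For comparison against GPU output, the unary operations can be written as plain host functions. This is a sketch of the documented formulas only; `neg_reference` and `relu_reference` are hypothetical helper names, and the handling of NaN inputs here (Rust's `f32::max` returns the non-NaN operand) is not guaranteed to match the PTX kernel.

```rust
/// Reference semantics for elementwise negation: out[i] = -x[i].
pub fn neg_reference(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| -v).collect()
}

/// Reference semantics for elementwise ReLU: out[i] = max(x[i], 0).
pub fn relu_reference(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| v.max(0.0)).collect()
}
```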
pub fn matmul(&self, other: &GpuTensor<T>) -> GpuResult<GpuTensor<T>>
Matrix multiplication: C = self @ other.
Both tensors must be 2-D. self has shape [m, k] and other has
shape [k, n]. The result has shape [m, n].
Uses cuBLAS SGEMM for f32 and DGEMM for f64.
§Errors
GpuError::ShapeMismatch if either tensor is not 2-D or if the inner dimensions do not match (self.shape[1] != other.shape[0]).
GpuError::DeviceMismatch if tensors are on different devices.
GpuError::Blas on cuBLAS runtime errors.
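The shape rules above can be illustrated with a host-side reference. This is a sketch under stated assumptions: data is taken to be row-major, `matmul_reference` and `SketchError` are hypothetical names, and the string payload on the error is invented for illustration. It does not call cuBLAS.

```rust
/// Hypothetical stand-in for the documented `ShapeMismatch` variant.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    ShapeMismatch(String),
}

/// Reference matmul for row-major data: [m, k] @ [k, n] -> [m, n].
pub fn matmul_reference(
    a: &[f32],
    a_shape: &[usize],
    b: &[f32],
    b_shape: &[usize],
) -> Result<(Vec<f32>, [usize; 2]), SketchError> {
    if a_shape.len() != 2 || b_shape.len() != 2 {
        return Err(SketchError::ShapeMismatch(
            "both operands must be 2-D".to_string(),
        ));
    }
    let (m, k, n) = (a_shape[0], a_shape[1], b_shape[1]);
    if k != b_shape[0] {
        return Err(SketchError::ShapeMismatch(format!(
            "inner dimensions differ: {} vs {}",
            k, b_shape[0]
        )));
    }
    if a.len() != m * k || b.len() != k * n {
        return Err(SketchError::ShapeMismatch(
            "data length does not match shape".to_string(),
        ));
    }
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let a_ip = a[i * k + p];
            for j in 0..n {
                // Accumulate the contribution of a[i, p] * b[p, j] into c[i, j].
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    Ok((c, [m, n]))
}
```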
pub fn conv2d(
    &self,
    weight: &GpuTensor<T>,
    bias: Option<&GpuTensor<T>>,
    stride: (usize, usize),
    padding: (usize, usize),
) -> GpuResult<GpuTensor<T>>
2-D convolution: output = conv2d(self, weight, bias).
Uses im2col on the CPU followed by a cuBLAS GEMM on the GPU; cuDNN is not required.
self must have shape [B, C_in, H, W] and weight must have
shape [C_out, C_in, kH, kW]. bias, if provided, must have
shape [C_out]. The result has shape [B, C_out, H_out, W_out].
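The documentation does not spell out how H_out and W_out are derived. The sketch below assumes the conventional formula without dilation, `H_out = (H + 2 * pad_h - kH) / stride_h + 1` with integer division, and assumes the `stride` and `padding` tuples are ordered (height, width). `conv2d_output_shape` is a hypothetical helper, not part of the crate.

```rust
/// Compute [B, C_out, H_out, W_out] from an input shape [B, C_in, H, W]
/// and a weight shape [C_out, C_in, kH, kW].
/// Returns None when the channel counts disagree, a stride or kernel
/// dimension is zero, or the kernel is larger than the padded input.
pub fn conv2d_output_shape(
    input: [usize; 4],
    weight: [usize; 4],
    stride: (usize, usize),
    padding: (usize, usize),
) -> Option<[usize; 4]> {
    let [b, c_in, h, w] = input;
    let [c_out, w_c_in, kh, kw] = weight;
    if c_in != w_c_in || stride.0 == 0 || stride.1 == 0 || kh == 0 || kw == 0 {
        return None;
    }
    let padded_h = h + 2 * padding.0;
    let padded_w = w + 2 * padding.1;
    if kh > padded_h || kw > padded_w {
        return None;
    }
    let h_out = (padded_h - kh) / stride.0 + 1;
    let w_out = (padded_w - kw) / stride.1 + 1;
    Some([b, c_out, h_out, w_out])
}
```

Under these assumptions, a 3x3 kernel with stride 1 and padding 1 preserves the spatial size of the input.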
Currently only supports f32. For f64 tensors, returns
GpuError::ShapeMismatch (f64 conv path not yet implemented).
§Errors
GpuError::ShapeMismatch if tensor dimensions are wrong, channel counts don't match, or if T is not f32.
GpuError::DeviceMismatch if tensors are on different devices.
GpuError::Blas on cuBLAS runtime errors.
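To make the im2col + GEMM approach concrete, here is a host-only sketch for a single image. It is an illustration of the general technique, not the crate's code: `im2col` and `conv2d_single` are hypothetical functions, the layout is assumed to be row-major, dilation is not modelled, and the matrix product is a naive loop where the real implementation would call cuBLAS.

```rust
/// Unfold one image [c_in, h, w] into a column matrix of shape
/// [c_in * kh * kw, h_out * w_out]; taps that fall in the padding stay zero.
pub fn im2col(
    input: &[f32], c_in: usize, h: usize, w: usize,
    kh: usize, kw: usize, stride: (usize, usize), padding: (usize, usize),
) -> (Vec<f32>, usize, usize) {
    let h_out = (h + 2 * padding.0 - kh) / stride.0 + 1;
    let w_out = (w + 2 * padding.1 - kw) / stride.1 + 1;
    let cols = h_out * w_out;
    let mut out = vec![0.0f32; c_in * kh * kw * cols];
    for c in 0..c_in {
        for ky in 0..kh {
            for kx in 0..kw {
                let row = (c * kh + ky) * kw + kx;
                for oy in 0..h_out {
                    for ox in 0..w_out {
                        // Map the output position and kernel tap back to input coordinates.
                        let iy = (oy * stride.0 + ky) as isize - padding.0 as isize;
                        let ix = (ox * stride.1 + kx) as isize - padding.1 as isize;
                        if iy >= 0 && (iy as usize) < h && ix >= 0 && (ix as usize) < w {
                            out[row * cols + oy * w_out + ox] =
                                input[(c * h + iy as usize) * w + ix as usize];
                        }
                    }
                }
            }
        }
    }
    (out, h_out, w_out)
}

/// Convolve one image: treat the weight as a [c_out, c_in * kh * kw] matrix,
/// multiply it by the im2col matrix, and add the optional per-channel bias.
/// Returns the flat [c_out, h_out, w_out] output plus h_out and w_out.
pub fn conv2d_single(
    input: &[f32], c_in: usize, h: usize, w: usize,
    weight: &[f32], c_out: usize, kh: usize, kw: usize,
    bias: Option<&[f32]>, stride: (usize, usize), padding: (usize, usize),
) -> (Vec<f32>, usize, usize) {
    let (cols, h_out, w_out) = im2col(input, c_in, h, w, kh, kw, stride, padding);
    let k = c_in * kh * kw;
    let n = h_out * w_out;
    let mut out = vec![0.0f32; c_out * n];
    for o in 0..c_out {
        let b = bias.map_or(0.0, |b| b[o]);
        for j in 0..n {
            let mut acc = b;
            for p in 0..k {
                acc += weight[o * k + p] * cols[p * n + j];
            }
            out[o * n + j] = acc;
        }
    }
    (out, h_out, w_out)
}
```

A batched version would repeat this per image, which is one reason the unfold step is often the bottleneck when it runs on the CPU.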
Auto Trait Implementations§
impl<T> Freeze for GpuTensor<T>
impl<T> RefUnwindSafe for GpuTensor<T>
where
    T: RefUnwindSafe,
impl<T> Send for GpuTensor<T>
impl<T> Sync for GpuTensor<T>
impl<T> Unpin for GpuTensor<T>
impl<T> UnsafeUnpin for GpuTensor<T>
impl<T> UnwindSafe for GpuTensor<T>
where
    T: RefUnwindSafe,
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> DistributionExt for T
where
    T: ?Sized,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.