Struct ferrotorch_gpu::tensor_bridge::GpuTensor

pub struct GpuTensor<T: GpuFloat> { /* private fields */ }

A tensor residing on a CUDA GPU.

Wraps a CudaBuffer<T> with shape metadata and a reference to the GpuDevice that owns the memory. Created by tensor_to_gpu or the convenience functions cuda / cuda_default.

Convert back to a CPU Tensor with GpuTensor::cpu or the free function tensor_to_cpu.

Implementations

impl<T: GpuFloat> GpuTensor<T>

pub fn shape(&self) -> &[usize]

The shape of this tensor.

pub fn numel(&self) -> usize

Total number of elements.

pub fn device(&self) -> &GpuDevice

The GPU device that holds this tensor’s data.

pub fn buffer(&self) -> &CudaBuffer<T>

Borrow the underlying CudaBuffer.

pub fn ndim(&self) -> usize

Number of dimensions.
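The three shape accessors are related by simple arithmetic: ndim is the length of the shape and numel is the product of its dimensions. The sketch below illustrates that relationship on a plain CPU struct; `ShapeInfo` is a hypothetical stand-in for the private shape metadata, not a ferrotorch type.

```rust
/// Hypothetical stand-in for the shape metadata a GpuTensor carries.
struct ShapeInfo {
    shape: Vec<usize>,
}

impl ShapeInfo {
    /// Number of dimensions: the length of the shape.
    fn ndim(&self) -> usize {
        self.shape.len()
    }

    /// Total number of elements: the product of all dimensions.
    fn numel(&self) -> usize {
        self.shape.iter().product()
    }
}
```

For example, a tensor of shape [2, 3, 4] has ndim 3 and numel 24.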

pub fn cpu(&self) -> FerrotorchResult<Tensor<T>>

Copy this tensor back to CPU, returning a Tensor<T>.

This is a convenience wrapper around tensor_to_cpu.

impl<T: GpuFloat> GpuTensor<T>

pub fn add(&self, other: &GpuTensor<T>) -> GpuResult<GpuTensor<T>>

Elementwise addition: out[i] = self[i] + other[i].

Uses a PTX kernel for f32; falls back to a CPU round-trip for f64.

Errors

GpuError::LengthMismatch if shapes differ.
GpuError::DeviceMismatch if tensors are on different devices.
GpuError::Driver on CUDA runtime errors.

pub fn sub(&self, other: &GpuTensor<T>) -> GpuResult<GpuTensor<T>>

Elementwise subtraction: out[i] = self[i] - other[i].

Uses a PTX kernel for f32; falls back to a CPU round-trip for f64.

pub fn mul(&self, other: &GpuTensor<T>) -> GpuResult<GpuTensor<T>>

Elementwise multiplication: out[i] = self[i] * other[i].

Uses a PTX kernel for f32; falls back to a CPU round-trip for f64.
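add, sub and mul follow one pattern: one output element per input position, computed from the two elements at that position. The CPU reference below sketches that contract and can be handy for checking GPU results in tests. It assumes, from the documented LengthMismatch error, that no broadcasting is performed; errors are documented only for add, so it further assumes sub and mul reject mismatched shapes the same way. `SketchError` and `elementwise_binary` are hypothetical names, not ferrotorch's GpuError or methods.

```rust
/// Hypothetical error type mirroring the documented GpuError::LengthMismatch;
/// it is not ferrotorch's actual error type.
#[derive(Debug, PartialEq)]
enum SketchError {
    LengthMismatch { left: Vec<usize>, right: Vec<usize> },
}

/// CPU reference for the shared contract of add, sub and mul:
/// out[i] = op(a[i], b[i]) over flat buffers, with no broadcasting.
fn elementwise_binary<F: Fn(f32, f32) -> f32>(
    a: &[f32],
    a_shape: &[usize],
    b: &[f32],
    b_shape: &[usize],
    op: F,
) -> Result<Vec<f32>, SketchError> {
    // Shapes must match exactly; differing shapes are rejected.
    if a_shape != b_shape {
        return Err(SketchError::LengthMismatch {
            left: a_shape.to_vec(),
            right: b_shape.to_vec(),
        });
    }
    Ok(a.iter().zip(b).map(|(&x, &y)| op(x, y)).collect())
}
```

Passing `|x, y| x + y`, `|x, y| x - y` or `|x, y| x * y` reproduces add, sub and mul respectively.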

pub fn neg(&self) -> GpuResult<GpuTensor<T>>

Elementwise negation: out[i] = -self[i].

Uses a PTX kernel for f32; falls back to a CPU round-trip for f64.

pub fn relu(&self) -> GpuResult<GpuTensor<T>>

Elementwise ReLU: out[i] = max(self[i], 0).

Uses a PTX kernel for f32; falls back to a CPU round-trip for f64.
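neg and relu take a single tensor, so the shape and device mismatch errors listed for add cannot arise. The sketch below shows their per-element semantics on the CPU and, separately, one way a generic method could choose between the f32 kernel and the f64 fallback using std::any::TypeId. This is an assumption for illustration only: `ExecPath`, `choose_path`, `neg_reference` and `relu_reference` are hypothetical, and ferrotorch may dispatch differently (for example through a method on GpuFloat).

```rust
use std::any::TypeId;

/// Hypothetical model of the documented split:
/// "PTX kernel for f32; CPU round-trip for f64".
#[derive(Debug, PartialEq)]
enum ExecPath {
    PtxKernel,
    CpuRoundTrip,
}

/// Pick a path from the element type at compile-time-known TypeId.
fn choose_path<T: 'static>() -> ExecPath {
    if TypeId::of::<T>() == TypeId::of::<f32>() {
        ExecPath::PtxKernel
    } else {
        ExecPath::CpuRoundTrip
    }
}

/// CPU reference for neg: out[i] = -x[i].
fn neg_reference(x: &[f32]) -> Vec<f32> {
    x.iter().map(|&v| -v).collect()
}

/// CPU reference for relu: out[i] = max(x[i], 0).
fn relu_reference(x: &[f32]) -> Vec<f32> {
    // f32::max returns the non-NaN operand, so NaN maps to 0.0 here;
    // a GPU kernel is not guaranteed to match this edge case.
    x.iter().map(|&v| v.max(0.0)).collect()
}
```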

pub fn matmul(&self, other: &GpuTensor<T>) -> GpuResult<GpuTensor<T>>

Matrix multiplication: C = self @ other.

Both tensors must be 2-D. self has shape [m, k] and other has shape [k, n]. The result has shape [m, n].

Uses cuBLAS SGEMM for f32 and DGEMM for f64.

Errors

GpuError::ShapeMismatch if either tensor is not 2-D or if the inner dimensions do not match (self.shape[1] != other.shape[0]).
GpuError::DeviceMismatch if tensors are on different devices.
GpuError::Blas on cuBLAS runtime errors.
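A naive CPU reference for the documented matmul contract ([m, k] @ [k, n] gives [m, n], with both operands required to be 2-D) is sketched below. It assumes row-major storage, which the documentation does not state; `MatmulError` and `matmul_reference` are hypothetical names.

```rust
/// Hypothetical stand-in for the documented GpuError::ShapeMismatch.
#[derive(Debug, PartialEq)]
enum MatmulError {
    ShapeMismatch(String),
}

/// Naive CPU reference for C = A @ B with row-major storage,
/// where A is [m, k] and B is [k, n]; returns C and its shape [m, n].
fn matmul_reference(
    a: &[f32],
    a_shape: &[usize],
    b: &[f32],
    b_shape: &[usize],
) -> Result<(Vec<f32>, [usize; 2]), MatmulError> {
    if a_shape.len() != 2 || b_shape.len() != 2 {
        return Err(MatmulError::ShapeMismatch("both operands must be 2-D".into()));
    }
    let (m, k) = (a_shape[0], a_shape[1]);
    let (k2, n) = (b_shape[0], b_shape[1]);
    if k != k2 {
        return Err(MatmulError::ShapeMismatch(format!("inner dimensions differ: {k} vs {k2}")));
    }
    let mut c = vec![0.0f32; m * n];
    // i-p-j loop order walks B and C row by row, which suits row-major data.
    for i in 0..m {
        for p in 0..k {
            let a_ip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    Ok((c, [m, n]))
}
```

cuBLAS itself assumes column-major storage. A common way to obtain a row-major product from it is to compute C^T = B^T A^T, i.e. to pass the operands to GEMM in swapped order; whether ferrotorch uses this trick is not stated here.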

pub fn conv2d(
    &self,
    weight: &GpuTensor<T>,
    bias: Option<&GpuTensor<T>>,
    stride: (usize, usize),
    padding: (usize, usize),
) -> GpuResult<GpuTensor<T>>

2-D convolution of self (the input) with weight, plus an optional per-output-channel bias, using the given stride and padding.

Uses im2col on the CPU followed by a cuBLAS GEMM on the GPU; cuDNN is not required.

self must have shape [B, C_in, H, W] and weight must have shape [C_out, C_in, kH, kW]. bias, if provided, must have shape [C_out]. The result has shape [B, C_out, H_out, W_out].
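H_out and W_out are not defined above. For a convolution with symmetric zero padding and no dilation, the conventional formula is given below, assuming stride = (s_H, s_W) and padding = (p_H, p_W) are ordered (height, width); this is the standard definition and has not been checked against ferrotorch's source.

```latex
H_{\text{out}} = \left\lfloor \frac{H + 2 p_H - k_H}{s_H} \right\rfloor + 1,
\qquad
W_{\text{out}} = \left\lfloor \frac{W + 2 p_W - k_W}{s_W} \right\rfloor + 1
```

For example, a 32×32 input with a 3×3 kernel, stride (2, 2) and padding (1, 1) gives H_out = ⌊(32 + 2 − 3) / 2⌋ + 1 = 16, so the output is 16×16.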

Currently supports only f32. For f64 tensors it returns GpuError::ShapeMismatch, because the f64 convolution path is not yet implemented.

Errors

GpuError::ShapeMismatch if tensor dimensions are wrong, channel counts don’t match, or if T is not f32.
GpuError::DeviceMismatch if tensors are on different devices.
GpuError::Blas on cuBLAS runtime errors.
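The im2col + GEMM strategy can be sketched entirely on the CPU for a single image: unfold every receptive field into a column, then multiply the flattened weight [C_out, C_in·kH·kW] by the resulting column matrix. In the documented implementation only the unfolding runs on the CPU and the GEMM runs on cuBLAS; here both steps are plain Rust. The sketch assumes row-major layout, zero padding, no dilation, (height, width) ordering for stride and padding, and the cross-correlation convention (no kernel flip) used by most deep-learning frameworks. `out_dim`, `im2col` and `conv2d_single` are hypothetical names.

```rust
/// Output size along one spatial axis, assuming zero padding and no dilation.
fn out_dim(input: usize, kernel: usize, stride: usize, pad: usize) -> usize {
    assert!(stride > 0 && input + 2 * pad >= kernel, "kernel larger than padded input");
    (input + 2 * pad - kernel) / stride + 1
}

/// Unfold one image [c_in, h, w] into a row-major column matrix of shape
/// [c_in * kh * kw, h_out * w_out]; padded positions stay 0.0.
fn im2col(
    img: &[f32],
    (c_in, h, w): (usize, usize, usize),
    (kh, kw): (usize, usize),
    (sh, sw): (usize, usize),
    (ph, pw): (usize, usize),
) -> (Vec<f32>, usize, usize) {
    let (h_out, w_out) = (out_dim(h, kh, sh, ph), out_dim(w, kw, sw, pw));
    let n = h_out * w_out;
    let mut cols = vec![0.0f32; c_in * kh * kw * n];
    for c in 0..c_in {
        for ki in 0..kh {
            for kj in 0..kw {
                // Row index matches the flattened weight layout [c, ki, kj].
                let row = (c * kh + ki) * kw + kj;
                for oy in 0..h_out {
                    for ox in 0..w_out {
                        // Input coordinates; negative or too large means padding.
                        let iy = (oy * sh + ki) as isize - ph as isize;
                        let ix = (ox * sw + kj) as isize - pw as isize;
                        if iy >= 0 && ix >= 0 && (iy as usize) < h && (ix as usize) < w {
                            cols[row * n + oy * w_out + ox] =
                                img[(c * h + iy as usize) * w + ix as usize];
                        }
                    }
                }
            }
        }
    }
    (cols, h_out, w_out)
}

/// Single-image conv2d: GEMM of weight [c_out, c_in*kh*kw] with the im2col
/// matrix, plus an optional bias of length c_out. Returns [c_out, h_out, w_out].
fn conv2d_single(
    img: &[f32],
    in_dims: (usize, usize, usize),
    weight: &[f32],
    c_out: usize,
    kernel: (usize, usize),
    bias: Option<&[f32]>,
    stride: (usize, usize),
    padding: (usize, usize),
) -> (Vec<f32>, [usize; 3]) {
    let (cols, h_out, w_out) = im2col(img, in_dims, kernel, stride, padding);
    let kdim = in_dims.0 * kernel.0 * kernel.1;
    let n = h_out * w_out;
    let mut out = vec![0.0f32; c_out * n];
    for o in 0..c_out {
        let b = bias.map_or(0.0, |b| b[o]);
        for j in 0..n {
            let mut acc = b;
            for p in 0..kdim {
                acc += weight[o * kdim + p] * cols[p * n + j];
            }
            out[o * n + j] = acc;
        }
    }
    (out, [c_out, h_out, w_out])
}
```

For a batch of B images the same steps can simply repeat per image; an implementation might instead concatenate the column matrices so one GEMM covers the whole batch.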

Trait Implementations

impl<T: GpuFloat> Debug for GpuTensor<T>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

impl<T> Freeze for GpuTensor<T>

impl<T> RefUnwindSafe for GpuTensor<T>
where T: RefUnwindSafe,

impl<T> Send for GpuTensor<T>

impl<T> Sync for GpuTensor<T>

impl<T> Unpin for GpuTensor<T>

impl<T> UnsafeUnpin for GpuTensor<T>

impl<T> UnwindSafe for GpuTensor<T>
where T: RefUnwindSafe,

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> ByRef<T> for T

fn by_ref(&self) -> &T

impl<T> DistributionExt for T
where T: ?Sized,

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes an object with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,