Struct Tensor

pub struct Tensor { /* private fields */ }

Tensor represents a multi-dimensional array with lazy evaluation.

Operations like addition and multiplication build a computation graph without allocating buffers. Buffers are only allocated when:

Creating input tensors via from_slice()
Evaluating the computation graph via realize()

§Global Graph Substitution

Tensors are registered in a global registry to support atomic graph substitution. When rangeify transforms a UOp (e.g., NEG → BUFFERIZE(NEG)), all tensors referencing it are updated atomically via apply_map_to_tensors().

This is critical for diamond patterns (like argmin’s NEG feeding both MAX and EQ) where different consumers must see the same transformed version.

§Buffer Ownership (RAII)

Tensors own their buffers via Arc<Buffer>. When all Tensor clones referencing a buffer are dropped, the buffer is automatically freed. This provides RAII cleanup without manual buffer management.

§Examples

let a = Tensor::from_slice(&[1.0f32, 2.0, 3.0]);
let b = Tensor::from_slice(&[4.0f32, 5.0, 6.0]);
let mut c = &a + &b;  // Lazy - only builds UOp graph
c.realize().unwrap();  // Executes the computation

Implementations§

impl Tensor

pub fn relu(&self) -> Result<Self>

Rectified Linear Unit: max(0, x).

ReLU is one of the most common activation functions in deep learning. It’s simple, efficient, and helps mitigate the vanishing gradient problem.

§Examples

let x = Tensor::from_slice(&[-2.0f32, -1.0, 0.0, 1.0, 2.0]);
let y = x.relu()?;
// y = [0.0, 0.0, 0.0, 1.0, 2.0]

pub fn sigmoid(&self) -> Result<Self>

Sigmoid activation: 1 / (1 + exp(-x)).

Maps input to the range (0, 1). Commonly used for binary classification.

§Examples

let x = Tensor::from_slice(&[-2.0f32, -1.0, 0.0, 1.0, 2.0]);
let y = x.sigmoid()?;
// y ≈ [0.119, 0.269, 0.5, 0.731, 0.881]

pub fn tanh(&self) -> Result<Self>

Hyperbolic tangent: tanh(x).

Maps input to the range (-1, 1), centered at zero.

§Examples

let x = Tensor::from_slice(&[-2.0f32, -1.0, 0.0, 1.0, 2.0]);
let y = x.tanh()?;
// y ≈ [-0.964, -0.762, 0.0, 0.762, 0.964]

pub fn softmax(&self, axis: impl Into<AxisSpec>) -> Result<Self>

Softmax activation: exp(x - max(x)) / sum(exp(x - max(x))).

Converts logits to a probability distribution over the specified axis. The implementation subtracts the maximum before exponentiating for numerical stability.

§Arguments

axis - Axis along which to compute softmax (pass -1 for the last axis)

§Examples

let logits = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]);
let probs = logits.softmax(-1)?;
// sum(probs) = 1.0, probs[i] > 0 for all i
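
The max-subtraction formula above can be sketched on a plain slice. This is a minimal standalone illustration, not the crate's implementation: the helper name `softmax_slice` and the use of `&[f32]` instead of `Tensor` are assumptions made so the example runs without the library.

```rust
/// Numerically stable softmax over a 1-D slice.
/// Hypothetical helper for illustration; it does not use the Tensor API.
fn softmax_slice(logits: &[f32]) -> Vec<f32> {
    // Subtract the maximum so the largest exponent is exp(0) = 1.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    // Normalize so the outputs sum to 1.
    exps.iter().map(|&e| e / sum).collect()
}
```

Without the subtraction, `exp(1001.0)` overflows to infinity in `f32` and the ratio becomes NaN; with it, `softmax_slice(&[1000.0, 1001.0])` stays finite.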

pub fn log_softmax(&self, axis: impl Into<AxisSpec>) -> Result<Self>

Log-softmax activation: log(softmax(x)).

Numerically stable implementation: x - max(x) - log(sum(exp(x - max(x)))).

More numerically stable than computing log(softmax(x)) separately.

§Arguments

axis - Axis along which to compute log-softmax (pass -1 for the last axis)

§Examples

let logits = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]);
let log_probs = logits.log_softmax(-1)?;
// More numerically stable than logits.softmax(-1)?.try_log()

pub fn logsumexp(&self, axis: impl Into<AxisSpec>) -> Result<Self>

Log-sum-exp: log(sum(exp(x))).

Numerically stable implementation: max(x) + log(sum(exp(x - max(x)))).

§Arguments

axis - Axis along which to compute logsumexp

§Examples

let x = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]);
let lse = x.logsumexp(-1)?;
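
The stable formulas for logsumexp and log_softmax above are closely related: log_softmax(x) is x minus logsumexp(x). A minimal sketch on plain `f64` slices follows; the function names `logsumexp_slice` and `log_softmax_slice` are hypothetical and the code does not reflect how the crate builds its graph.

```rust
/// Stable log(sum(exp(x))) over a slice: max(x) + log(sum(exp(x - max(x)))).
/// Hypothetical helper for illustration only.
fn logsumexp_slice(xs: &[f64]) -> f64 {
    let max = xs.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    // Empty input or infinite maximum: return the maximum itself
    // to avoid computing inf - inf below.
    if !max.is_finite() {
        return max;
    }
    let sum: f64 = xs.iter().map(|&x| (x - max).exp()).sum();
    max + sum.ln()
}

/// log_softmax expressed through logsumexp: x - logsumexp(x).
fn log_softmax_slice(xs: &[f64]) -> Vec<f64> {
    let lse = logsumexp_slice(xs);
    xs.iter().map(|&x| x - lse).collect()
}
```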

pub fn gelu(&self) -> Result<Self>

GELU activation (Gaussian Error Linear Unit).

Smooth approximation: 0.5 * x * (1 + tanh(sqrt(2/π) * (x + 0.044715 * x^3))).

GELU is the standard activation for Transformer models (BERT, GPT, etc.).

§Examples

let x = Tensor::from_slice(&[-2.0f32, -1.0, 0.0, 1.0, 2.0]);
let y = x.gelu()?;
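
For reference, the tanh approximation quoted above can be evaluated on a single scalar. This is a sketch under the assumption that the formula is applied element-wise as written; `gelu_tanh` is a hypothetical name, not part of the crate.

```rust
/// Tanh approximation of GELU for one value:
/// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
/// Hypothetical scalar helper for illustration.
fn gelu_tanh(x: f64) -> f64 {
    let c = (2.0 / std::f64::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x.powi(3))).tanh())
}
```

For large positive inputs the result approaches x, and for large negative inputs it approaches 0, which is the ReLU-like behaviour the smooth curve interpolates.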

pub fn gelu_exact(&self) -> Result<Self>

Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2))).

pub fn hard_sigmoid(&self, alpha: f64, beta: f64) -> Result<Self>

Hard Sigmoid: clamp(alpha * x + beta, 0, 1).

Piecewise linear approximation of sigmoid. Faster to compute.

§Arguments

alpha - Slope (default 0.2 in ONNX)
beta - Offset (default 0.5 in ONNX)

pub fn leaky_relu(&self, alpha: f64) -> Result<Self>

Leaky ReLU: x if x > 0, alpha * x otherwise.

§Arguments

alpha - Negative slope (default 0.01 in ONNX)

pub fn prelu(&self, slope: &Tensor) -> Result<Self>

PReLU: x if x > 0, slope * x otherwise.

Like LeakyReLU but with a learned per-channel slope.

pub fn thresholded_relu(&self, alpha: f64) -> Result<Self>

Thresholded ReLU: x if x > alpha, 0 otherwise.

§Arguments

alpha - Threshold (default 1.0 in ONNX)

pub fn elu(&self, alpha: f64) -> Result<Self>

ELU: x if x > 0, alpha * (exp(x) - 1) otherwise.

§Arguments

alpha - Scale for negative part (default 1.0 in ONNX)

pub fn selu(&self, alpha: f64, gamma: f64) -> Result<Self>

SELU: gamma * (alpha * exp(x) - alpha) if x <= 0, gamma * x if x > 0.

Self-normalizing activation with fixed constants.

§Arguments

alpha - Default 1.6732632…
gamma - Default 1.0507010…

pub fn swish(&self) -> Result<Self>

Swish/SiLU activation: x * sigmoid(x).

Also known as SiLU (Sigmoid Linear Unit). Used in modern CNN architectures and some Transformers.

§Examples

let x = Tensor::from_slice(&[-2.0f32, -1.0, 0.0, 1.0, 2.0]);
let y = x.swish()?;

pub fn silu(&self) -> Result<Self>

Alias for swish (matches PyTorch naming).

pub fn glu(&self, dim: isize) -> Result<Self>

Gated Linear Unit: splits self along dim into two halves, returns first_half * sigmoid(second_half).

pub fn softplus(&self, beta: f64) -> Result<Self>

Softplus: log(1 + exp(beta*x)) / beta, numerically stable via logaddexp.
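
One common way to make this formula stable is to write it as logaddexp(0, beta*x) / beta, which never exponentiates a large positive number. The sketch below assumes that reading; `softplus_scalar` is a hypothetical name and the crate's actual graph may differ.

```rust
/// Stable softplus for one value: log(1 + exp(beta * x)) / beta,
/// computed as (max(z, 0) + ln(1 + exp(-|z|))) / beta with z = beta * x.
/// Hypothetical scalar helper for illustration.
fn softplus_scalar(x: f64, beta: f64) -> f64 {
    let z = beta * x;
    // exp(-|z|) is always in (0, 1], so it cannot overflow.
    (z.max(0.0) + (-z.abs()).exp().ln_1p()) / beta
}
```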

pub fn mish(&self) -> Result<Self>

Mish: x * tanh(softplus(x)).

pub fn relu6(&self) -> Result<Self>

ReLU6: relu(x) - relu(x-6) = clamp(x, 0, 6).

pub fn hardswish(&self) -> Result<Self>

HardSwish: x * relu6(x+3) / 6.
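
The ReLU6 identity and the HardSwish formula above are easy to check on scalars. The following sketch uses hypothetical helper names and plain `f32` values rather than the Tensor API.

```rust
/// ReLU on one value.
fn relu_scalar(x: f32) -> f32 {
    x.max(0.0)
}

/// ReLU6 written as relu(x) - relu(x - 6).
fn relu6_via_relu(x: f32) -> f32 {
    relu_scalar(x) - relu_scalar(x - 6.0)
}

/// ReLU6 written as clamp(x, 0, 6).
fn relu6_via_clamp(x: f32) -> f32 {
    x.clamp(0.0, 6.0)
}

/// HardSwish: x * relu6(x + 3) / 6.
fn hardswish_scalar(x: f32) -> f32 {
    x * relu6_via_clamp(x + 3.0) / 6.0
}
```

Both ReLU6 forms agree for every input; HardSwish is 0 for x <= -3 and equals x for x >= 3.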

pub fn softsign(&self) -> Result<Self>

Softsign: x / (1 + |x|).

pub fn celu(&self, alpha: f64) -> Result<Self>

CELU: max(0, x) + min(0, alpha*(exp(x/alpha)-1)).

pub fn batchnorm<'f1, 'f2, 'f3, 'f4, 'f5>( &'f1 self, ) -> TensorBatchnormBuilder<'f1, 'f2, 'f3, 'f4, 'f5>

Batch Normalization.

Applies: y = scale * (x - mean) * invstd + bias where invstd = 1 / sqrt(var + epsilon)

This is the inference mode batchnorm (no running stats update). The caller provides pre-computed mean and inverse standard deviation.

§Arguments

scale - Gamma/weight parameter (optional, defaults to 1)
bias - Beta parameter (optional, defaults to 0)
mean - Running mean
invstd - Inverse standard deviation (1 / sqrt(var + eps))
axis - Axis/axes to normalize over (default: 1 for NCHW)

§Examples

let x = Tensor::randn(&[8, 4, 16, 16]);
let mean = x.mean(AxisSpec::Multiple(vec![0, 2, 3]))?;
let var = x.var(AxisSpec::Multiple(vec![0, 2, 3]))?;
let eps = Tensor::from_slice(&[1e-5f32]);
let invstd = var.try_add(&eps)?.try_rsqrt()?;
let normalized = x.batchnorm().mean(&mean).invstd(&invstd).call()?;

pub fn logical_not(&self) -> Result<Tensor>

Logical NOT for boolean tensors.

Converts to boolean dtype and applies logical negation. For non-boolean tensors, treats zero as false, non-zero as true.

§Examples

let t = Tensor::from_slice(&[true, false, true]);
let result = t.logical_not()?;  // [false, true, false]

let nums = Tensor::from_slice(&[0.0f32, 1.0, 2.0]);
let result = nums.logical_not()?;  // [true, false, false]

pub fn bitwise_not(&self) -> Result<Tensor>

Bitwise NOT for integer tensors.

Applies bitwise NOT operation using two’s complement: ~x = -x - 1. Only works for integer dtypes.

§Examples

let t = Tensor::from_slice(&[0i32, 1, 2, -1]);
let result = t.bitwise_not()?;  // [-1, -2, -3, 0]

§Errors

Returns error if called on non-integer dtype.

impl Tensor

pub fn bitwise_and(&self, other: &Tensor) -> Result<Tensor>

Bitwise AND operation.

Performs element-wise bitwise AND between two tensors with broadcasting. Both tensors must have integer or boolean dtype.

pub fn bitwise_or(&self, other: &Tensor) -> Result<Tensor>

Bitwise OR operation.

Performs element-wise bitwise OR between two tensors with broadcasting. Both tensors must have integer or boolean dtype.

pub fn bitwise_xor(&self, other: &Tensor) -> Result<Tensor>

Bitwise XOR operation.

Performs element-wise bitwise XOR between two tensors with broadcasting. Both tensors must have integer or boolean dtype.

pub fn lshift(&self, other: &Tensor) -> Result<Tensor>

Left shift operation.

Shifts bits of the tensor to the left by the specified amount with broadcasting. The tensor must have integer or boolean dtype.

pub fn rshift(&self, other: &Tensor) -> Result<Tensor>

Right shift operation.

Shifts bits of the tensor to the right by the specified amount with broadcasting. The tensor must have integer or boolean dtype.

impl Tensor

pub fn broadcast_to(&self, target_shape: &Shape) -> Result<Tensor>

Broadcast tensor to a target shape.

This is the low-level broadcast operation that reshapes (adds explicit 1 dimensions) and then expands (replicates data along size-1 dimensions).

§Algorithm

If shape already matches, return self
Pad shape with 1s on the left to match rank
Reshape to add explicit 1 dimensions
Expand size-1 dimensions to target size
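
The shape bookkeeping behind these steps can be sketched with plain `usize` shapes. This is an illustration under the assumption of concrete (non-symbolic) dimensions; `padded_shape` is a hypothetical helper, not the crate's API, and it returns the shape after the reshape step, which the expand step then stretches to the target.

```rust
/// Compute the left-padded shape used before expansion, or an error if
/// `shape` cannot be broadcast to `target`.
/// Hypothetical helper working on concrete usize dimensions.
fn padded_shape(shape: &[usize], target: &[usize]) -> Result<Vec<usize>, String> {
    // Step 1: nothing to do when the shapes already match.
    if shape == target {
        return Ok(target.to_vec());
    }
    if shape.len() > target.len() {
        return Err(format!("rank {} exceeds target rank {}", shape.len(), target.len()));
    }
    // Steps 2 and 3: pad with 1s on the left to match the target rank.
    let mut padded = vec![1usize; target.len() - shape.len()];
    padded.extend_from_slice(shape);
    // Step 4 precondition: each dimension must be 1 or equal to the target.
    for (axis, (&have, &want)) in padded.iter().zip(target).enumerate() {
        if have != 1 && have != want {
            return Err(format!("axis {axis}: cannot expand {have} to {want}"));
        }
    }
    Ok(padded)
}
```

For example, broadcasting `[3]` to `[2, 3]` first reshapes to `[1, 3]`, and the size-1 leading dimension is then expanded to 2.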

§Examples

// [3] -> [2, 3]
let t = Tensor::from_slice([1.0f32, 2.0, 3.0]);
let target = vec![SInt::from(2), SInt::from(3)];
let broadcasted = t.broadcast_to(&target)?;

§Errors

Returns error if:

Shape has more dimensions than target
Dimension sizes are incompatible (not 1 and not equal to target)

impl Tensor

pub fn where_(&self, condition: &Tensor, other: &Tensor) -> Result<Self>

Element-wise conditional selection: condition ? self : other.

For each element, returns self[i] if condition[i] is true, else other[i].

§Arguments

condition - Boolean tensor (dtype should be Bool or will be treated as boolean)
other - Alternative value tensor

§Shape Requirements

All three tensors (self, condition, other) must be broadcastable to the same shape.

§Examples

let x = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]);
let condition = &x.gt(&Tensor::from_slice(&[2.0f32]))?; // [false, false, true, true]
let zeros = Tensor::from_slice(&[0.0f32]);

// Keep values > 2.0 and replace the rest with 0
let result = x.where_(condition, &zeros)?;
// result = [0.0, 0.0, 3.0, 4.0]

pub fn maximum(&self, other: &Tensor) -> Result<Self>

Element-wise maximum: max(self, other).

Returns the element-wise maximum of two tensors. This is not a reduction: the result has the broadcast shape of the two inputs.

§Shape Requirements

Both tensors must be broadcastable to the same shape.

§Examples

let a = Tensor::from_slice(&[1.0f32, 5.0, 3.0]);
let b = Tensor::from_slice(&[2.0f32, 3.0, 4.0]);
let result = a.maximum(&b)?;
// result = [2.0, 5.0, 4.0]

pub fn minimum(&self, other: &Tensor) -> Result<Self>

Element-wise minimum: min(self, other).

Returns the element-wise minimum of two tensors. This is not a reduction: the result has the broadcast shape of the two inputs.

§Shape Requirements

Both tensors must be broadcastable to the same shape.

§Examples

let a = Tensor::from_slice(&[1.0f32, 5.0, 3.0]);
let b = Tensor::from_slice(&[2.0f32, 3.0, 4.0]);
let result = a.minimum(&b)?;
// result = [1.0, 3.0, 3.0]

pub fn clamp<'f1, 'f2, 'f3>(&'f1 self) -> TensorClampBuilder<'f1, 'f2, 'f3>

Clamp values to a range: max(min_val, min(self, max_val)).

Constrains all elements to be within [min_val, max_val].

§Examples

let x = Tensor::from_slice(&[-1.0f32, 0.0, 1.0, 2.0, 3.0]);
let min = Tensor::from_slice(&[0.0f32, 0.0, 0.0, 0.0, 0.0]);
let max = Tensor::from_slice(&[2.0f32, 2.0, 2.0, 2.0, 2.0]);

// Clamp to [0, 2]
let result = x.clamp().min(&min).max(&max).call()?;
// result = [0.0, 0.0, 1.0, 2.0, 2.0]

// Clamp only lower bound
let result = x.clamp().min(&min).call()?;
// result = [0.0, 0.0, 1.0, 2.0, 3.0]

// Clamp only upper bound
let result = x.clamp().max(&max).call()?;
// result = [-1.0, 0.0, 1.0, 2.0, 2.0]

pub fn clip<'f1, 'f2, 'f3>(&'f1 self) -> TensorClipBuilder<'f1, 'f2, 'f3>

Alias for clamp (matches NumPy/PyTorch naming).

§Examples

let x = Tensor::from_slice(&[-1.0f32, 0.0, 1.0, 2.0, 3.0]);
let min = Tensor::from_slice(&[0.0f32, 0.0, 0.0, 0.0, 0.0]);
let max = Tensor::from_slice(&[2.0f32, 2.0, 2.0, 2.0, 2.0]);

// Clip to [0, 2]
let result = x.clip().min(&min).max(&max).call()?;

impl Tensor

pub fn from_slice<T: HasDType, C: AsRef<[T]>>(source: C) -> Self

Create tensor from slice on CPU (default device).

§Examples

let a = Tensor::from_slice(&[1.0f32, 2.0, 3.0]);

pub fn from_slice_with<T: HasDType, C: AsRef<[T]>>() -> TensorFromSliceWithBuilder<T, C>

Create tensor from slice with explicit device specification using builder pattern.

impl Tensor

pub fn from_raw_bytes( data: &[u8], shape: &[usize], dtype: DType, ) -> Result<Self>

Create tensor from raw bytes with explicit dtype and shape.

The bytes are interpreted as little-endian values of the given dtype. Length must equal product(shape) * dtype.bytes(). Used for types without a native Rust representation (Float16, BFloat16, FP8).

pub fn from_ndarray<T, S, D>(array: &ArrayBase<S, D>) -> Self
where T: HasDType + Clone, S: Data<Elem = T>, D: Dimension,

Create tensor from an ndarray (owned Array or ArrayView).

When the array is already C-contiguous, uses the backing slice directly (no intermediate allocation). Falls back to .iter().cloned().collect() for Fortran-order or non-contiguous layouts.

§Examples

let t = Tensor::from_ndarray(&array![[1.0f32, 2.0, 3.0], [4.0, 5.0, 6.0]]);
let view = t.array_view::<f32>().unwrap();
assert_eq!(view[[1, 2]], 6.0);

pub fn buffer(&self) -> Option<Buffer>

Get the underlying buffer, if one exists.

Returns None for lazy tensors that haven’t been realized yet. Returns Some(buffer) for input tensors and realized tensors.

pub fn as_ndarray<T: HasDType + Default + Clone>(&self) -> Result<ArrayD<T>>

Read realized tensor data as an ndarray.

The tensor must have a buffer (from from_slice, realize(), etc.). Returns error if the tensor has not been realized.

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0]);
let result = t.as_ndarray::<f32>().unwrap();
assert_eq!(result.shape(), &[3]);

pub fn as_vec<T: HasDType + Default + Clone>(&self) -> Result<Vec<T>>

Read realized tensor data as a flat Vec<T>.

The tensor must have a buffer (from from_slice, realize(), etc.). Returns error if the tensor has not been realized.

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0]);
let v = t.as_vec::<f32>().unwrap();
assert_eq!(v, vec![1.0, 2.0, 3.0]);

pub fn array_view<T: HasDType>(&self) -> Result<ArrayViewD<'_, T>>

Typed immutable view into the buffer, shaped by the tensor’s logical shape.

Uses the tensor’s concrete shape for multidimensional indexing. Falls back to the buffer’s flat shape for symbolic tensors.

§Examples

let t = Tensor::from_ndarray(&array![[1.0f32, 2.0], [3.0, 4.0]]);
let view = t.array_view::<f32>().unwrap();
assert_eq!(view[[0, 1]], 2.0);

pub fn array_view_mut<T: HasDType>(&self) -> Result<ArrayViewMutD<'_, T>>

Typed mutable view into the buffer, shaped by the tensor’s logical shape.

§Examples

let t = Tensor::from_ndarray(&array![[0.0f32, 0.0, 0.0], [0.0, 0.0, 0.0]]);
t.array_view_mut::<f32>().unwrap()[[1, 2]] = 42.0;
assert_eq!(t.array_view::<f32>().unwrap()[[1, 2]], 42.0);

impl Tensor

pub fn einsum(formula: &str, operands: &[&Tensor]) -> Result<Tensor>

impl Tensor

pub fn gather(&self, dim: isize, index: &Tensor) -> Result<Self>

Gather values along an axis specified by dim, using index for element selection.

pub fn index_select(&self, dim: isize, index: &Tensor) -> Result<Self>

Select elements along dim using a 1D index tensor.

For input shape [A, B, C] with dim=1 and index shape [K], returns shape [A, K, C].

pub fn one_hot_along_dim( &self, num_classes: usize, dim: isize, ) -> Result<Tensor>

One-hot encoding: self == arange(num_classes) broadcast along dim. Returns a boolean tensor with True at the class positions.

pub fn normalize_negative_indices(&self, dim_size: i64) -> Result<Tensor>

Normalize negative indices: indices[i] = indices[i] < 0 ? indices[i] + dim_size : indices[i]

pub fn scatter( &self, dim: isize, index: &Tensor, src: &Tensor, ) -> Result<Tensor>

Scatter values along dim using index positions.

For each position in index, places the corresponding src value into self at the specified index along dim. When multiple indices map to the same position, the last value wins (matching PyTorch/Tinygrad semantics).
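
The "last value wins" rule is easiest to see in one dimension. The sketch below is a sequential model of that behaviour on plain vectors; `scatter_1d` is a hypothetical helper and says nothing about how the crate lowers scatter into its graph.

```rust
/// 1-D scatter: write src[j] into a copy of dst at position index[j].
/// When an index repeats, later writes overwrite earlier ones.
/// Hypothetical helper for illustration.
fn scatter_1d(dst: &[f32], index: &[usize], src: &[f32]) -> Result<Vec<f32>, String> {
    if index.len() != src.len() {
        return Err("index and src must have the same length".to_string());
    }
    let mut out = dst.to_vec();
    for (&i, &value) in index.iter().zip(src) {
        let slot = out
            .get_mut(i)
            .ok_or_else(|| format!("index {i} out of bounds"))?;
        *slot = value;
    }
    Ok(out)
}
```

With `index = [1, 3, 1]` and `src = [10, 20, 30]`, position 1 is written twice and ends up holding 30.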

pub fn scatter_reduce( &self, dim: isize, index: &Tensor, src: &Tensor, reduce: ScatterReduction, include_self: bool, ) -> Result<Tensor>

Scatter with reduction. Applies reduce (sum/prod/amax/amin) at scatter positions.

pub fn masked_select(&self, mask: &Tensor) -> Result<Tensor>

Select elements where mask is true, returning a flat tensor.

Requires realize() internally (data-dependent output size).

pub fn compress( &self, condition: &[bool], axis: Option<isize>, ) -> Result<Tensor>

Select elements along an axis where condition is true.

If axis is None, the input is flattened first and selection is along axis 0. The condition is a 1D boolean slice; true entries select.

pub fn sort(&self, dim: isize, descending: bool) -> Result<(Tensor, Tensor)>

Bitonic sort along a dimension. Returns (sorted_values, indices).

pub fn topk( &self, k: usize, dim: isize, largest: bool, ) -> Result<(Tensor, Tensor)>

Top-k elements along a dimension. Returns (values, indices).

pub fn nonzero(&self) -> Result<Tensor>

Indices of non-zero elements. Returns [num_nonzero, ndim] tensor.

pub fn reverse_sequence( &self, sequence_lens: &Tensor, time_axis: usize, batch_axis: usize, ) -> Result<Self>

Reverse the first sequence_lens[i] elements along time_axis for each batch element i along batch_axis, leaving the rest unchanged.

pub fn gather_nd(&self, indices: &Tensor, batch_dims: usize) -> Result<Tensor>

Gather values using N-dimensional indices.

pub fn scatter_nd( &self, indices: &Tensor, updates: &Tensor, reduction: &str, ) -> Result<Tensor>

Scatter updates into a tensor using N-dimensional indices.

pub fn tensor_scatter( &self, update: &Tensor, write_indices: Option<&Tensor>, mode: &str, axis: isize, ) -> Result<Tensor>

Batch-aware tensor scatter with write index offsets.

impl Tensor

pub fn sin(&self) -> Result<Tensor>

Sine function: sin(x).

Computes the sine of each element. Requires float dtype.

§Examples

use std::f32::consts::PI;
let t = Tensor::from_slice(&[0.0f32, PI/2.0, PI]);
let result = t.sin()?;  // ≈ [0, 1, 0] (sin(π) is only approximately 0 in f32)

§Errors

Returns error if dtype is not float.

pub fn cos(&self) -> Result<Tensor>

Cosine function: cos(x).

Computes the cosine of each element. Requires float dtype.

§Examples

use std::f32::consts::PI;
let t = Tensor::from_slice(&[0.0f32, PI/2.0, PI]);
let result = t.cos()?;  // ≈ [1, 0, -1] (cos(π/2) is only approximately 0 in f32)

§Errors

Returns error if dtype is not float.

pub fn tan(&self) -> Result<Tensor>

Tangent function: tan(x).

Computes the tangent of each element. Requires float dtype.

§Examples

use std::f32::consts::PI;
let t = Tensor::from_slice(&[0.0f32, PI/4.0]);
let result = t.tan()?;  // ≈ [0, 1]

§Errors

Returns error if dtype is not float.

pub fn floor(&self) -> Result<Tensor>

Floor function: round towards -∞.

Returns the largest integer less than or equal to each element. For integer dtypes, returns the tensor unchanged.

§Examples

let t = Tensor::from_slice(&[1.2f32, -1.2, 2.8, -2.8]);
let result = t.floor()?;  // [1.0, -2.0, 2.0, -3.0]

pub fn ceil(&self) -> Result<Tensor>

Ceiling function: round towards +∞.

Returns the smallest integer greater than or equal to each element. For integer dtypes, returns the tensor unchanged.

§Examples

let t = Tensor::from_slice(&[1.2f32, -1.2, 2.8, -2.8]);
let result = t.ceil()?;  // [2.0, -1.0, 3.0, -2.0]

pub fn round(&self) -> Result<Tensor>

Round function: round to nearest integer (half to even).

Rounds each element to the nearest integer. Ties are rounded to the nearest even number. For integer dtypes, returns the tensor unchanged.

§Examples

let t = Tensor::from_slice(&[1.2f32, 1.5, 2.5, -1.5]);
let result = t.round()?;  // [1.0, 2.0, 2.0, -2.0]
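
Half-to-even rounding differs from Rust's `f64::round`, which rounds ties away from zero. A scalar sketch of the tie rule follows; `round_half_even` is a hypothetical helper written by hand so it does not depend on a particular toolchain version (recent Rust versions also provide `f64::round_ties_even`).

```rust
/// Round to the nearest integer, sending exact .5 ties to the even neighbour.
/// Hypothetical scalar helper for illustration.
fn round_half_even(x: f64) -> f64 {
    if (x - x.trunc()).abs() == 0.5 {
        // Tie: halving, rounding, and doubling always lands on an even integer.
        2.0 * (x / 2.0).round()
    } else {
        // Not a tie: ordinary nearest-integer rounding is unambiguous.
        x.round()
    }
}
```

This reproduces the values in the example above: 1.5 and 2.5 both round to 2.0, and -1.5 rounds to -2.0.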

pub fn trunc(&self) -> Result<Tensor>

Truncate function: round towards zero.

Removes the fractional part, rounding towards zero. For integer dtypes, returns the tensor unchanged.

§Examples

let t = Tensor::from_slice(&[1.2f32, -1.2, 2.8, -2.8]);
let result = t.trunc()?;  // [1.0, -1.0, 2.0, -2.0]

pub fn erf(&self) -> Result<Tensor>

Error function: erf(x).

Computes the error function (Gauss error function) of each element. Requires float dtype. Critical for GELU activation.

§Examples

let t = Tensor::from_slice(&[-1.0f32, 0.0, 1.0]);
let result = t.erf()?;  // [-0.8427, 0, 0.8427]

§Errors

Returns error if dtype is not float.

pub fn reciprocal(&self) -> Result<Tensor>

Reciprocal: 1/x.

Computes the reciprocal of each element.

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 4.0]);
let result = t.reciprocal()?;  // [1.0, 0.5, 0.25]

pub fn square(&self) -> Result<Tensor>

Square: x².

Computes the square of each element.

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0, -4.0]);
let result = t.square()?;  // [1.0, 4.0, 9.0, 16.0]

pub fn sign(&self) -> Result<Tensor>

Sign function: -1 for negative, 0 for zero, 1 for positive.

Returns the sign of each element.

§Examples

let t = Tensor::from_slice(&[-5.0f32, 0.0, 3.0, -0.0]);
let result = t.sign()?;  // [-1.0, 0.0, 1.0, 0.0]

pub fn lerp(&self, end: &Tensor, weight: &Tensor) -> Result<Tensor>

Linear interpolation: self + (end - self) * weight.

pub fn isnan(&self) -> Result<Tensor>

Returns true where elements are NaN: self != self.

pub fn isinf( &self, detect_positive: bool, detect_negative: bool, ) -> Result<Tensor>

Returns true where elements are infinite.

Detects ±∞ via bitcast to the corresponding unsigned integer type and a bit-pattern compare. Operating in integer space sidesteps Svod’s float range analysis, which folds x == ±inf to false because dtype_bounds returns finite ±max for floats. Tinygrad gets away with the float compare because its dtype.min/max are ±inf.
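
The bit-pattern comparison described above can be illustrated for `f32` with the standard library's `to_bits`. This is a scalar sketch only: `is_inf_bits` is a hypothetical name, and the constants are the IEEE 754 binary32 encodings of +∞ and -∞.

```rust
/// Detect infinities by comparing the raw bit pattern instead of the float value.
/// Hypothetical scalar helper for illustration.
fn is_inf_bits(x: f32, detect_positive: bool, detect_negative: bool) -> bool {
    const POS_INF_BITS: u32 = 0x7f80_0000; // sign 0, exponent all ones, mantissa 0
    const NEG_INF_BITS: u32 = 0xff80_0000; // sign 1, exponent all ones, mantissa 0
    let bits = x.to_bits();
    (detect_positive && bits == POS_INF_BITS) || (detect_negative && bits == NEG_INF_BITS)
}
```

NaN values share the all-ones exponent but have a non-zero mantissa, so they never match either pattern.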

pub fn sinh(&self) -> Result<Tensor>

Hyperbolic sine: (exp(x) - exp(-x)) / 2.

pub fn cosh(&self) -> Result<Tensor>

Hyperbolic cosine: (exp(x) + exp(-x)) / 2.

pub fn asinh(&self) -> Result<Tensor>

Inverse hyperbolic sine: log(x + sqrt(x² + 1)).

pub fn acosh(&self) -> Result<Tensor>

Inverse hyperbolic cosine: log(x + sqrt(x² - 1)).

pub fn atanh(&self) -> Result<Tensor>

Inverse hyperbolic tangent: 0.5 * log((1+x)/(1-x)).
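
The logarithmic forms of the three inverse hyperbolic functions above can be checked against the standard library on scalars. The helper names below are hypothetical; the formulas are applied exactly as written, so they inherit the same domain limits (acosh needs x >= 1, atanh needs |x| < 1).

```rust
/// asinh(x) = log(x + sqrt(x^2 + 1)).
fn asinh_via_log(x: f64) -> f64 {
    (x + (x * x + 1.0).sqrt()).ln()
}

/// acosh(x) = log(x + sqrt(x^2 - 1)), defined for x >= 1.
fn acosh_via_log(x: f64) -> f64 {
    (x + (x * x - 1.0).sqrt()).ln()
}

/// atanh(x) = 0.5 * log((1 + x) / (1 - x)), defined for |x| < 1.
fn atanh_via_log(x: f64) -> f64 {
    0.5 * ((1.0 + x) / (1.0 - x)).ln()
}
```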

pub fn asin(&self) -> Result<Tensor>

Arcsine using polynomial approximation (Abramowitz & Stegun 4.4.46).

pub fn acos(&self) -> Result<Tensor>

Arccosine: π/2 - asin(x).

pub fn atan(&self) -> Result<Tensor>

Arctangent: asin(x / sqrt(1 + x²)).

pub fn shrink(&self, bias: f64, lambd: f64) -> Result<Tensor>

Shrinkage operator: applies soft/hard thresholding.

(x < -λ)*(x+bias) + (x > λ)*(x-bias)
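
Read as a piecewise function, the expression above yields x + bias below -λ, x - bias above λ, and 0 in between. A scalar sketch follows, with `shrink_scalar` as a hypothetical name that keeps the method's (bias, lambd) argument order.

```rust
/// Shrinkage on one value, following (x < -λ)*(x+bias) + (x > λ)*(x-bias).
/// Hypothetical scalar helper for illustration.
fn shrink_scalar(x: f64, bias: f64, lambd: f64) -> f64 {
    if x < -lambd {
        x + bias
    } else if x > lambd {
        x - bias
    } else {
        0.0
    }
}
```

Setting bias to 0 gives hard thresholding, and setting bias equal to lambd gives soft thresholding.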

pub fn det(&self) -> Result<Tensor>

Matrix determinant via LU decomposition with partial pivoting.

Input shape: [..., n, n]. Output shape: [...]. Batch dimensions are preserved. Uses O(n³) computation with O(n) graph construction steps (unrolled at compile time).
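
As a point of reference, the textbook algorithm can be written eagerly for a single square matrix. This sketch is not the crate's graph-based implementation: `det_lu` is a hypothetical helper operating on `Vec<Vec<f64>>`, it assumes a square input, and it ignores batch dimensions.

```rust
/// Determinant via Gaussian elimination (LU) with partial pivoting.
/// Hypothetical eager helper; assumes `a` is square.
fn det_lu(mut a: Vec<Vec<f64>>) -> f64 {
    let n = a.len();
    let mut det = 1.0;
    for col in 0..n {
        // Partial pivoting: choose the row with the largest |entry| in this column.
        let mut pivot = col;
        for row in (col + 1)..n {
            if a[row][col].abs() > a[pivot][col].abs() {
                pivot = row;
            }
        }
        if a[pivot][col] == 0.0 {
            return 0.0; // Singular matrix.
        }
        if pivot != col {
            a.swap(pivot, col);
            det = -det; // Each row swap flips the sign.
        }
        det *= a[col][col];
        // Eliminate the entries below the pivot.
        for row in (col + 1)..n {
            let factor = a[row][col] / a[col][col];
            for k in col..n {
                let upper = a[col][k];
                a[row][k] -= factor * upper;
            }
        }
    }
    det
}
```

The determinant is the product of the pivots, with one sign flip per row exchange.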

impl Tensor

pub fn dot(&self, other: &Tensor) -> Result<Tensor>

Dot product / matrix multiplication.

Core method following Tinygrad’s API:

1D @ 1D: dot product (scalar)
2D @ 2D: matrix multiplication
1D @ 2D: vector @ matrix
2D @ 1D: matrix @ vector
3D+: batched matmul (batch dims broadcast)

§Arguments

other - Right-hand tensor

§Examples

// Vector dot product
let a = Tensor::from_slice(&[1.0f32, 2.0, 3.0]);
let b = Tensor::from_slice(&[4.0f32, 5.0, 6.0]);
let result = a.dot(&b)?; // scalar: 32.0

// Matrix multiplication
let a = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]).try_reshape(&[2, 2])?;
let b = Tensor::from_slice(&[5.0f32, 6.0, 7.0, 8.0]).try_reshape(&[2, 2])?;
let result = a.dot(&b)?; // [2, 2]

pub fn matmul(&self, other: &Tensor) -> Result<Tensor>

Matrix multiplication (alias for dot).

Matches PyTorch API. Equivalent to self.dot(other).

§Examples

let a = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]).try_reshape(&[2, 2])?;
let b = Tensor::from_slice(&[5.0f32, 6.0, 7.0, 8.0]).try_reshape(&[2, 2])?;
let result = a.matmul(&b)?;

impl Tensor

pub fn matmul_with<'f1, 'f2>(&'f1 self) -> TensorMatmulWithBuilder<'f1, 'f2>

Matrix multiplication with optional dtype.

§Examples

let a = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]).try_reshape(&[2, 2])?;
let b = Tensor::from_slice(&[5.0f32, 6.0, 7.0, 8.0]).try_reshape(&[2, 2])?;
let result = a.matmul_with(&b).dtype(DType::Float64).call()?;

pub fn gemm<'f1, 'f2, 'f3>(&'f1 self) -> TensorGemmBuilder<'f1, 'f2, 'f3>

General Matrix Multiplication: alpha * A @ B + beta * C

pub fn linear<'f1, 'f2, 'f3>(&'f1 self) -> TensorLinearBuilder<'f1, 'f2, 'f3>

Linear transformation: self @ weight.T + bias.

Common operation in neural networks (fully connected layers). Follows PyTorch convention where weight has shape [out_features, in_features] and is transposed before multiplication.

§Arguments

weight - Weight matrix (shape: [out_features, in_features])
bias - Optional bias vector (shape: [out_features])

§Shape Requirements

self: [..., in_features]
weight: [out_features, in_features]
bias: [out_features] or None
result: [..., out_features]

§Examples

let input = Tensor::from_slice(&[1.0f32, 2.0, 3.0]).try_reshape(&[1, 3])?;
let weight = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0]).try_reshape(&[2, 3])?;
let bias = Tensor::from_slice(&[0.1f32, 0.2f32]);
let result = input.linear().weight(&weight).bias(&bias).call()?;
// result shape: [1, 2]
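
To make the `self @ weight.T + bias` convention concrete, here is the same computation for a single input row on plain vectors. `linear_row` is a hypothetical helper, not the crate's API; it assumes each weight row has `in_features` entries and that the bias, when present, has `out_features` entries.

```rust
/// y[o] = sum_i x[i] * weight[o][i] + bias[o], with weight stored as
/// [out_features][in_features]. Hypothetical helper for illustration.
fn linear_row(x: &[f32], weight: &[Vec<f32>], bias: Option<&[f32]>) -> Vec<f32> {
    weight
        .iter()
        .enumerate()
        .map(|(o, row)| {
            // Each weight row produces one output feature.
            let dot: f32 = row.iter().zip(x).map(|(w, xi)| w * xi).sum();
            dot + bias.map_or(0.0, |b| b[o])
        })
        .collect()
}
```

With the values from the example above, the input `[1, 2, 3]` and weight rows `[1, 2, 3]` and `[4, 5, 6]` give dot products 14 and 32, so the biased result is approximately `[14.1, 32.2]`.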

impl Tensor

pub fn conv2d<'f1, 'f2, 'f3, 'f4, 'f5, 'f6>( &'f1 self, ) -> TensorConv2dBuilder<'f1, 'f2, 'f3, 'f4, 'f5, 'f6>

N-d convolution. Input (N, Cin, *spatial), Weight (Cout, Cin/groups, *kernel).

Computes cross-correlation (conv without kernel flip) by extracting sliding windows via pool, then contracting against the weight tensor. Supports grouped convolution, strided/dilated kernels, and asymmetric padding.
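
Cross-correlation slides the kernel over the input without flipping it. The direct-loop sketch below shows the arithmetic for one channel; it is not how the crate implements it (which, per the description above, goes through pool and a contraction). `cross_correlate_2d` is a hypothetical helper that assumes non-empty rectangular inputs, a kernel no larger than the input, no padding, and stride >= 1.

```rust
/// Single-channel 2-D cross-correlation with a uniform stride and no padding.
/// Hypothetical helper for illustration.
fn cross_correlate_2d(input: &[Vec<f32>], kernel: &[Vec<f32>], stride: usize) -> Vec<Vec<f32>> {
    let (h, w) = (input.len(), input[0].len());
    let (kh, kw) = (kernel.len(), kernel[0].len());
    let out_h = (h - kh) / stride + 1;
    let out_w = (w - kw) / stride + 1;
    let mut out = vec![vec![0.0f32; out_w]; out_h];
    for oy in 0..out_h {
        for ox in 0..out_w {
            // Multiply the window by the kernel element-wise and sum (no flip).
            let mut acc = 0.0f32;
            for ky in 0..kh {
                for kx in 0..kw {
                    acc += input[oy * stride + ky][ox * stride + kx] * kernel[ky][kx];
                }
            }
            out[oy][ox] = acc;
        }
    }
    out
}
```

A 5x5 input of ones with a 3x3 kernel of ones gives a 3x3 output of 9.0 at stride 1 and a 2x2 output of 9.0 at stride 2.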

§Examples

Basic 2D convolution with uniform data:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 5, 5), 1.0f32));
let w = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 1.0f32));
let mut y = x.conv2d().weight(&w).call().unwrap();
y.realize().unwrap();
// 3x3 kernel of ones on input of ones => each output element is 9.0
assert_eq!(y.as_vec::<f32>().unwrap(), vec![9.0; 9]);

With stride:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 5, 5), 1.0f32));
let w = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 1.0f32));
let mut y = x.conv2d().weight(&w).stride(&[2, 2]).call().unwrap();
y.realize().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 2, 2]);
assert_eq!(y.as_vec::<f32>().unwrap(), vec![9.0; 4]);

With padding:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 1.0f32));
let w = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 1.0f32));
// padding=1 on each side: output matches input spatial dims
let mut y = x.conv2d().weight(&w).padding(&[(1, 1), (1, 1)]).call().unwrap();
y.realize().unwrap();
let vals = y.as_vec::<f32>().unwrap();
assert_eq!(vals.len(), 9); // 3x3 output
// Center element sees full 3x3 window of ones = 9.0
assert_eq!(vals[4], 9.0);
// Corner element sees 2x2 window = 4.0
assert_eq!(vals[0], 4.0);

With bias:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 1.0f32));
let w = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 1.0f32));
let b = Tensor::from_slice([10.0f32]);
let mut y = x.conv2d().weight(&w).bias(&b).call().unwrap();
y.realize().unwrap();
// Single output element (3x3 kernel on 3x3 input): 9.0 + 10.0 = 19.0
assert_eq!(y.as_vec::<f32>().unwrap(), vec![19.0]);

pub fn conv_transpose2d<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7>( &'f1 self, ) -> TensorConvTranspose2dBuilder<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7>

Transposed convolution (fractionally-strided convolution).

Computes the gradient of a forward convolution, commonly used for upsampling. Internally flips the kernel, interleaves zeros for stride > 1, computes transposed padding, then delegates to conv2d.

Input (N, Cin, *spatial), Weight (Cin, Cout/groups, *kernel).

§Examples

Basic transposed convolution (upsampling):

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 2, 2), 1.0f32));
let w = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 1.0f32));
let mut y = x.conv_transpose2d().weight(&w).call().unwrap();
y.realize().unwrap();
let vals = y.as_vec::<f32>().unwrap();
assert_eq!(vals.len(), 16); // 4x4 output
// Center elements receive contributions from all four input positions
assert_eq!(vals[5], 4.0);

With stride (stronger upsampling):

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 2, 2), 1.0f32));
let w = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 1.0f32));
let mut y = x.conv_transpose2d().weight(&w).stride(&[2, 2]).call().unwrap();
y.realize().unwrap();
let vals = y.as_vec::<f32>().unwrap();
assert_eq!(vals.len(), 25); // 5x5 output

With padding and output padding:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 2, 2), 1.0f32));
let w = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 1.0f32));
let mut y = x.conv_transpose2d()
    .weight(&w)
    .stride(&[2, 2])
    .padding(&[(1, 1), (1, 1)])
    .output_padding(&[1, 1])
    .call()
    .unwrap();
y.realize().unwrap();
let vals = y.as_vec::<f32>().unwrap();
assert_eq!(vals.len(), 16); // 4x4 output
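
The output sizes in the three examples above are consistent with the usual size relation for transposed convolution with dilation 1. The sketch below encodes that relation per spatial dimension; `conv_transpose_out_size` is a hypothetical helper, and that the crate uses exactly this formula is an assumption.

```rust
/// Output length along one spatial dimension of a transposed convolution,
/// assuming dilation 1:
/// (input - 1) * stride + kernel + output_padding - pad_begin - pad_end.
/// Hypothetical helper; expects input >= 1 and padding small enough
/// that the result does not underflow.
fn conv_transpose_out_size(
    input: usize,
    kernel: usize,
    stride: usize,
    pad_begin: usize,
    pad_end: usize,
    output_padding: usize,
) -> usize {
    (input - 1) * stride + kernel + output_padding - pad_begin - pad_end
}
```

For the 2x2 input and 3x3 kernel used above, this gives 4 per dimension at stride 1, 5 at stride 2, and 4 at stride 2 with padding (1, 1) and output padding 1.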

impl Tensor

pub fn affine_grid<'f1, 'f2>() -> TensorAffineGridBuilder<'f1, 'f2>

Generate an affine sampling grid from transformation parameters.

Produces a grid of normalized coordinates suitable for grid_sample. theta holds affine matrices of shape [N, spatial_dims, spatial_dims+1]. size is the target output shape [N, C, *spatial_dims].

§Examples

Identity transform producing a 4x4 grid:

let theta = Tensor::from_ndarray(&array![[[1.0f32, 0.0, 0.0], [0.0, 1.0, 0.0]]]);
let grid = Tensor::affine_grid().theta(&theta).size(&[1, 1, 4, 4]).call().unwrap();
let shape: Vec<usize> = grid.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 4, 4, 2]); // [N, H, W, 2]

With align_corners:

let theta = Tensor::from_ndarray(&array![[[1.0f32, 0.0, 0.0], [0.0, 1.0, 0.0]]]);
let grid = Tensor::affine_grid()
    .theta(&theta)
    .size(&[1, 1, 4, 4])
    .align_corners(true)
    .call()
    .unwrap();
let shape: Vec<usize> = grid.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 4, 4, 2]);

pub fn grid_sample<'f1, 'f2>(&'f1 self) -> TensorGridSampleBuilder<'f1, 'f2>

Sample input at positions specified by a coordinate grid.

self: Input tensor [N, C, *spatial_dims]
grid: Coordinate grid [N, *output_spatial_dims, n_spatial] with values in [-1, 1]
Returns: [N, C, *output_spatial_dims]

§Examples

Sample with a grid from affine_grid:

let theta = Tensor::from_ndarray(&array![[[1.0f32, 0.0, 0.0], [0.0, 1.0, 0.0]]]);
let grid = Tensor::affine_grid().theta(&theta).size(&[1, 1, 4, 4]).call().unwrap();
let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let y = x.grid_sample().grid(&grid).call().unwrap();
let shape: Vec<usize> = y.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 4, 4]);

With nearest-mode interpolation:

let theta = Tensor::from_ndarray(&array![[[1.0f32, 0.0, 0.0], [0.0, 1.0, 0.0]]]);
let grid = Tensor::affine_grid().theta(&theta).size(&[1, 1, 4, 4]).call().unwrap();
let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let y = x.grid_sample()
    .grid(&grid)
    .mode(GridSampleMode::Nearest)
    .call()
    .unwrap();
let shape: Vec<usize> = y.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 4, 4]);
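
The mapping from a normalized grid value to a pixel position can be sketched in plain Rust as follows. This assumes the PyTorch/ONNX convention and zero padding outside the input; whether this crate uses the same defaults is not stated above. unnormalize and sample_linear_1d are hypothetical names.

```rust
// Reference sketch only; not this crate's implementation.

/// Map a normalized coordinate in [-1, 1] to a pixel position on an axis of `size` pixels.
fn unnormalize(coord: f32, size: usize, align_corners: bool) -> f32 {
    let s = size as f32;
    if align_corners {
        (coord + 1.0) / 2.0 * (s - 1.0)
    } else {
        ((coord + 1.0) * s - 1.0) / 2.0
    }
}

/// Linearly interpolate a 1-D row at a normalized coordinate.
/// Pixels outside the row contribute zero.
fn sample_linear_1d(row: &[f32], coord: f32, align_corners: bool) -> f32 {
    let pos = unnormalize(coord, row.len(), align_corners);
    let lo = pos.floor();
    let frac = pos - lo;
    let at = |i: f32| -> f32 {
        if i >= 0.0 && (i as usize) < row.len() { row[i as usize] } else { 0.0 }
    };
    at(lo) * (1.0 - frac) + at(lo + 1.0) * frac
}
```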

Source §

impl Tensor

Source

pub fn layernorm(&self, axis: isize, eps: f64) -> Result<Tensor>

Layer normalization over axes [axis..ndim). Casts to f32 internally for numerical stability.

Normalizes the input so that the slice along the specified trailing axes has zero mean and unit variance, then returns the result cast back to the original dtype.

§Examples

let x = Tensor::from_ndarray(&array![[1.0f32, 2.0, 3.0], [4.0, 5.0, 6.0]]);
let mut y = x.layernorm(-1, 1e-5).unwrap();
y.realize().unwrap();
let vals = y.as_vec::<f32>().unwrap();
// Each row is independently normalized to mean~0, std~1
assert!((vals[0] + vals[1] + vals[2]).abs() < 1e-5);
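
The per-slice arithmetic can be written out in plain Rust without this crate. The sketch below assumes population variance and eps added inside the square root; layer_norm_row is a hypothetical helper, and it also returns the mean and inverse standard deviation that layernorm_with_stats exposes.

```rust
// Reference sketch only; not this crate's implementation.

/// Normalize one slice to zero mean and unit variance.
/// Returns (normalized values, mean, inverse standard deviation).
fn layer_norm_row(row: &[f32], eps: f32) -> (Vec<f32>, f32, f32) {
    let n = row.len() as f32;
    let mean = row.iter().sum::<f32>() / n;
    // Population variance: divide by n, not n - 1.
    let var = row.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    let inv_std = 1.0 / (var + eps).sqrt();
    let normed = row.iter().map(|v| (v - mean) * inv_std).collect();
    (normed, mean, inv_std)
}
```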

Source

pub fn layernorm_with_stats( &self, axis: isize, eps: f64, ) -> Result<(Tensor, Tensor, Tensor)>

Layer normalization returning (normalized, mean, inv_std_dev).

Computes in f32 for numerical stability (matches ONNX stash_type=1). The mean and inv_std_dev tensors remain in f32 regardless of input dtype.

§Examples

let x = Tensor::from_ndarray(&array![[1.0f32, 2.0, 3.0]]);
let (_normed, mut mean, _inv_std) = x.layernorm_with_stats(-1, 1e-5).unwrap();
mean.realize().unwrap();
let mean_val = mean.as_vec::<f32>().unwrap();
assert!((mean_val[0] - 2.0).abs() < 1e-5);

Source

pub fn rms_norm(&self, axis: isize, eps: f64) -> Result<Tensor>

RMS normalization over axes [axis..ndim).

Like layernorm but without mean subtraction: divides each element by the root-mean-square of its slice. Computes the normalization factor in f32, then multiplies the original (unconverted) input.

§Examples

let x = Tensor::from_ndarray(&array![[1.0f32, 2.0, 3.0]]);
let mut y = x.rms_norm(-1, 1e-5).unwrap();
y.realize().unwrap();
let vals = y.as_vec::<f32>().unwrap();
// RMS of [1,2,3] = sqrt((1+4+9)/3) ≈ 2.16
// Output ≈ [0.46, 0.93, 1.39]
assert!((vals[0] - 1.0 / (14.0f32 / 3.0).sqrt()).abs() < 1e-4);
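
The same computation in plain Rust, as a sketch that does not use this crate. It assumes eps is added to the mean square before the square root; rms_norm_row is a hypothetical name.

```rust
// Reference sketch only; not this crate's implementation.

/// Divide each element by the root-mean-square of the slice.
fn rms_norm_row(row: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = row.iter().map(|v| v * v).sum::<f32>() / row.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    row.iter().map(|v| v * inv_rms).collect()
}
```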

Source

pub fn lp_normalize(&self, axis: isize, p: i64) -> Result<Tensor>

Lp normalization along an axis.

Divides each element by the Lp norm of its slice along axis, so that every such slice has unit Lp norm. Only p=1 (L1) and p=2 (L2) are implemented; any value of p other than 1 is treated as L2.

§Examples

L2 normalization (p=2):

let x = Tensor::from_ndarray(&array![[3.0f32, 4.0]]);
let mut y = x.lp_normalize(-1, 2).unwrap();
y.realize().unwrap();
let vals = y.as_vec::<f32>().unwrap();
// L2 norm of [3,4] = 5, so output ≈ [0.6, 0.8]
assert!((vals[0] - 0.6).abs() < 1e-5);
assert!((vals[1] - 0.8).abs() < 1e-5);

L1 normalization (p=1):

let x = Tensor::from_ndarray(&array![[3.0f32, 4.0]]);
let mut y = x.lp_normalize(-1, 1).unwrap();
y.realize().unwrap();
let vals = y.as_vec::<f32>().unwrap();
// L1 norm of [3,4] = 7, so output ≈ [3/7, 4/7]
assert!((vals[0] - 3.0 / 7.0).abs() < 1e-5);
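
A plain-Rust sketch of the two norms, independent of this crate. It mirrors the documented fallback (anything other than p=1 is treated as L2); lp_normalize_row is a hypothetical name, and an all-zero slice would divide by zero here.

```rust
// Reference sketch only; not this crate's implementation.

/// Scale a slice so that its L1 (p == 1) or L2 (any other p) norm is 1.
fn lp_normalize_row(row: &[f32], p: i64) -> Vec<f32> {
    let norm = if p == 1 {
        row.iter().map(|v| v.abs()).sum::<f32>()
    } else {
        row.iter().map(|v| v * v).sum::<f32>().sqrt()
    };
    row.iter().map(|v| v / norm).collect()
}
```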

Source

pub fn mean_variance_normalize( &self, axes: &[isize], eps: f64, ) -> Result<Tensor>

Mean Variance Normalization.

Subtracts the mean and divides by the population standard deviation (plus eps) over the given axes. Implements the ONNX MeanVarianceNormalization operator.

§Examples

let x = Tensor::from_ndarray(&array![[1.0f32, 2.0, 3.0], [4.0, 5.0, 6.0]]);
let mut y = x.mean_variance_normalize(&[0, 1], 1e-5).unwrap();
y.realize().unwrap();
let vals = y.as_vec::<f32>().unwrap();
// Global mean = 3.5, std ≈ 1.708
assert!((vals[0] - (1.0 - 3.5) / (35.0f32 / 12.0).sqrt()).abs() < 1e-4);
assert!(vals[0] < 0.0);
assert!(vals[5] > 0.0);

Source

pub fn group_norm<'f1, 'f2, 'f3>( &'f1 self, ) -> TensorGroupNormBuilder<'f1, 'f2, 'f3>

Group normalization: reshape into groups, layernorm each group, then apply per-channel scale and bias.

Input must be at least 2-D with shape [N, C, ...]. Channels are split into num_groups groups and each group is independently normalized. Casts to f32 internally for numerical stability.

§Examples

let x = Tensor::from_ndarray(&Array4::from_elem((1, 4, 2, 2), 1.0f32));
let scale = Tensor::from_slice([1.0f32; 4]);
let bias = Tensor::from_slice([0.0f32; 4]);
let y = x.group_norm().scale(&scale).bias(&bias).num_groups(2).call().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, [1, 4, 2, 2]);

Custom epsilon:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 4, 2, 2), 1.0f32));
let scale = Tensor::from_slice([1.0f32; 4]);
let bias = Tensor::from_slice([0.0f32; 4]);
let y = x.group_norm().scale(&scale).bias(&bias).num_groups(2).eps(1e-6).call().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, [1, 4, 2, 2]);
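
The grouping step is easier to see on a flat buffer. The sketch below is plain Rust, does not use this crate, and handles one sample stored channel-major; group_norm_sample is a hypothetical name. Statistics are shared by all channels in a group, while scale and bias stay per channel.

```rust
// Reference sketch only; not this crate's implementation.

/// Group-normalize one sample stored as `channels * spatial` values, channel-major.
fn group_norm_sample(
    x: &[f32],
    channels: usize,
    num_groups: usize,
    scale: &[f32],
    bias: &[f32],
    eps: f32,
) -> Vec<f32> {
    assert_eq!(channels % num_groups, 0);
    let spatial = x.len() / channels;
    let group_len = (channels / num_groups) * spatial;
    let mut out = vec![0.0f32; x.len()];
    for g in 0..num_groups {
        let start = g * group_len;
        let group = &x[start..start + group_len];
        let mean = group.iter().sum::<f32>() / group_len as f32;
        let var = group.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / group_len as f32;
        let inv_std = 1.0 / (var + eps).sqrt();
        for (i, v) in group.iter().enumerate() {
            let c = (start + i) / spatial; // channel that owns this element
            out[start + i] = (v - mean) * inv_std * scale[c] + bias[c];
        }
    }
    out
}
```

A constant input normalizes to zero everywhere, so the output reduces to the per-channel bias.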

Source §

impl Tensor

Source

pub fn try_pad_value( &self, padding: &[(isize, isize)], value: f64, ) -> Result<Tensor>

Pad with a custom fill value. Delegates to try_pad when value == 0.0.

Each element of padding is (before, after) for the corresponding dimension. Non-zero fill is implemented via an additive mask to avoid nested WHERE conditions.

§Examples

Zero padding (delegates to try_pad):

let x = Tensor::from_slice([1.0f32, 2.0, 3.0]);
let mut y = x.try_pad_value(&[(1, 1)], 0.0).unwrap();
y.realize().unwrap();
assert_eq!(y.as_vec::<f32>().unwrap(), vec![0.0, 1.0, 2.0, 3.0, 0.0]);

Negative-infinity padding (useful for max pooling):

let x = Tensor::from_slice([1.0f32, 2.0, 3.0]);
let mut y = x.try_pad_value(&[(1, 0)], f64::NEG_INFINITY).unwrap();
y.realize().unwrap();
assert_eq!(y.as_vec::<f32>().unwrap(), vec![f32::NEG_INFINITY, 1.0, 2.0, 3.0]);

Source

pub fn pad_with<'f1, 'f2>(&'f1 self) -> TensorPadWithBuilder<'f1, 'f2>

Pad with configurable mode and fill value.

Supports four padding modes via PadMode:

Constant (default): fill with value (default 0.0)
Replicate: repeat boundary values
Reflect: mirror without repeating boundary
Circular: wrap around

§Examples

Constant padding (default mode):

let x = Tensor::from_slice([1.0f32, 2.0, 3.0]);
let mut y = x.pad_with().padding(&[(1, 1)]).call().unwrap();
y.realize().unwrap();
assert_eq!(y.as_vec::<f32>().unwrap(), vec![0.0, 1.0, 2.0, 3.0, 0.0]);

Constant padding with a custom fill value:

let x = Tensor::from_slice([1.0f32, 2.0, 3.0]);
let mut y = x.pad_with().padding(&[(1, 1)]).value(-f64::INFINITY).call().unwrap();
y.realize().unwrap();
assert_eq!(y.as_vec::<f32>().unwrap(), vec![f32::NEG_INFINITY, 1.0, 2.0, 3.0, f32::NEG_INFINITY]);

Replicate (edge) padding:

let x = Tensor::from_slice([1.0f32, 2.0, 3.0]);
let mut y = x.pad_with().padding(&[(2, 2)]).mode(PadMode::Replicate).call().unwrap();
y.realize().unwrap();
assert_eq!(y.as_vec::<f32>().unwrap(), vec![1.0, 1.0, 1.0, 2.0, 3.0, 3.0, 3.0]);

Reflect padding:

let x = Tensor::from_slice([1.0f32, 2.0, 3.0]);
let mut y = x.pad_with().padding(&[(2, 2)]).mode(PadMode::Reflect).call().unwrap();
y.realize().unwrap();
assert_eq!(y.as_vec::<f32>().unwrap(), vec![3.0, 2.0, 1.0, 2.0, 3.0, 2.0, 1.0]);

Circular (wrap) padding:

let x = Tensor::from_slice([1.0f32, 2.0, 3.0]);
let mut y = x.pad_with().padding(&[(2, 2)]).mode(PadMode::Circular).call().unwrap();
y.realize().unwrap();
assert_eq!(y.as_vec::<f32>().unwrap(), vec![2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0]);
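
The three non-constant modes differ only in how an out-of-range position is mapped back to a source index. A plain-Rust sketch, independent of this crate (Mode and pad_1d are hypothetical names; the reflect branch assumes the padding is smaller than the input length):

```rust
// Reference sketch only; not this crate's implementation.

#[derive(Clone, Copy)]
enum Mode {
    Replicate,
    Reflect,
    Circular,
}

/// Pad a 1-D slice by mapping every output position back to a source index.
fn pad_1d(x: &[f32], before: usize, after: usize, mode: Mode) -> Vec<f32> {
    let n = x.len() as isize;
    (-(before as isize)..n + after as isize)
        .map(|i| {
            let src = match mode {
                // Clamp to the nearest valid index.
                Mode::Replicate => i.clamp(0, n - 1),
                // Mirror around the first or last element without repeating it.
                Mode::Reflect => {
                    if i < 0 {
                        -i
                    } else if i >= n {
                        2 * (n - 1) - i
                    } else {
                        i
                    }
                }
                // Wrap around modulo the length.
                Mode::Circular => i.rem_euclid(n),
            };
            x[src as usize]
        })
        .collect()
}
```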

Source §

impl Tensor

Source

pub fn pool( &self, kernel: &[usize], stride: &[usize], dilation: &[usize], ) -> Result<Tensor>

Sliding window extraction via shape manipulation (Tinygrad’s _pool).

Input: (..., *spatial) → Output: (..., *out_spatial, *kernel).

This is a low-level building block for pooling and convolution. It extracts all sliding windows of the given kernel size, stride, and dilation from the spatial dimensions, appending the kernel dimensions at the end.
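
To make the output layout concrete, here is a 1-D plain-Rust sketch that does not use this crate. It copies each window explicitly, whereas the method above is described as working through shape manipulation; sliding_windows_1d is a hypothetical name.

```rust
// Reference sketch only; not this crate's implementation.

/// Extract all sliding windows from a 1-D signal.
/// Result has shape [out_len][kernel], matching (*out_spatial, *kernel).
fn sliding_windows_1d(x: &[f32], kernel: usize, stride: usize, dilation: usize) -> Vec<Vec<f32>> {
    let span = dilation * (kernel - 1) + 1; // input extent covered by one window
    if x.len() < span {
        return Vec::new();
    }
    let out_len = (x.len() - span) / stride + 1;
    (0..out_len)
        .map(|o| (0..kernel).map(|k| x[o * stride + k * dilation]).collect())
        .collect()
}
```

Reducing the last axis of the result with a mean or max gives average or max pooling; multiplying by a kernel and summing gives convolution.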

Source §

impl Tensor

Source

pub fn avg_pool2d<'f1, 'f2, 'f3, 'f4, 'f5>( &'f1 self, ) -> TensorAvgPool2dBuilder<'f1, 'f2, 'f3, 'f4, 'f5>

Average pooling over spatial dimensions.

Computes the mean of each sliding window. Supports padding, dilation, count_include_pad (whether padded zeros count in the denominator), and ceil_mode (round output size up instead of down).

Stride defaults to kernel_size when not specified.

§Examples

Basic 2x2 average pooling:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let mut y = x.avg_pool2d().kernel_size(&[2, 2]).call().unwrap();
y.realize().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 2, 2]);
assert_eq!(y.as_vec::<f32>().unwrap(), vec![1.0; 4]);

With explicit stride:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let y = x.avg_pool2d().kernel_size(&[2, 2]).stride(&[1, 1]).call().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 3, 3]);

With padding and count_include_pad disabled:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 2, 2), 1.0f32));
let mut y = x.avg_pool2d()
    .kernel_size(&[2, 2])
    .stride(&[1, 1])
    .padding(&[(1, 1), (1, 1)])
    .count_include_pad(false)
    .call()
    .unwrap();
y.realize().unwrap();
// With count_include_pad=false, only non-padded elements count in the average
assert_eq!(y.as_vec::<f32>().unwrap(), vec![1.0; 9]);
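
The effect of count_include_pad is easiest to see in one dimension. The following plain-Rust sketch does not use this crate; it assumes symmetric zero padding, floor-mode output size, and padding smaller than the kernel. avg_pool_1d is a hypothetical name.

```rust
// Reference sketch only; not this crate's implementation.

/// 1-D average pooling with symmetric zero padding.
fn avg_pool_1d(x: &[f32], kernel: usize, stride: usize, pad: usize, count_include_pad: bool) -> Vec<f32> {
    let padded_len = x.len() + 2 * pad;
    let out_len = (padded_len - kernel) / stride + 1; // floor mode
    (0..out_len)
        .map(|o| {
            let mut sum = 0.0f32;
            let mut real = 0usize; // window positions that fall on real input
            for k in 0..kernel {
                let p = o * stride + k; // position in the padded signal
                if p >= pad && p < pad + x.len() {
                    sum += x[p - pad];
                    real += 1;
                }
            }
            let denom = if count_include_pad { kernel } else { real };
            sum / denom as f32
        })
        .collect()
}
```

On an all-ones input, excluding padding from the denominator keeps every output at 1.0, while including it pulls the border outputs down to 0.5.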

Source

pub fn max_pool2d<'f1, 'f2, 'f3, 'f4, 'f5>( &'f1 self, ) -> TensorMaxPool2dBuilder<'f1, 'f2, 'f3, 'f4, 'f5>

Max pooling over spatial dimensions.

Returns the maximum value in each sliding window. Padded positions are filled with -inf (float) or i64::MIN (integer) so they never win.

Stride defaults to kernel_size when not specified.

§Examples

Basic 2x2 max pooling:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let mut y = x.max_pool2d().kernel_size(&[2, 2]).call().unwrap();
y.realize().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 2, 2]);
assert_eq!(y.as_vec::<f32>().unwrap(), vec![1.0; 4]);

With stride and padding:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let mut y = x.max_pool2d()
    .kernel_size(&[3, 3])
    .stride(&[1, 1])
    .padding(&[(1, 1), (1, 1)])
    .call()
    .unwrap();
y.realize().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 4, 4]);
assert_eq!(y.as_vec::<f32>().unwrap(), vec![1.0; 16]);

Source

pub fn max_pool2d_with_indices<'f1, 'f2, 'f3, 'f4, 'f5>( &'f1 self, ) -> TensorMaxPool2dWithIndicesBuilder<'f1, 'f2, 'f3, 'f4, 'f5>

Max pooling returning both values and flat indices.

Returns (values, indices) where indices are flat offsets into the input spatial dimensions. Indices can be passed to max_unpool2d to invert the operation.

Uses a reverse-arange trick (from Tinygrad) to compute first-occurrence indices without explicit argmax.

§Examples

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let (mut values, indices) = x.max_pool2d_with_indices()
    .kernel_size(&[2, 2])
    .call()
    .unwrap();
let _ = indices;
values.realize().unwrap();
let shape: Vec<_> = values.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 2, 2]);
assert_eq!(values.as_vec::<f32>().unwrap(), vec![1.0; 4]);
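
One reading of the reverse-arange trick, sketched in plain Rust without this crate: weight each window offset j by kernel - j, keep the weight only where the element equals the window maximum, and take the largest surviving weight. Earlier offsets carry larger weights, so ties resolve to the first occurrence. max_pool_1d_with_indices is a hypothetical name, and the sketch assumes finite inputs and no padding.

```rust
// Reference sketch only; not this crate's implementation.

/// 1-D max pooling returning (values, flat indices into the input).
fn max_pool_1d_with_indices(x: &[f32], kernel: usize, stride: usize) -> (Vec<f32>, Vec<usize>) {
    let out_len = (x.len() - kernel) / stride + 1;
    let mut values = Vec::with_capacity(out_len);
    let mut indices = Vec::with_capacity(out_len);
    for o in 0..out_len {
        let window = &x[o * stride..o * stride + kernel];
        let max = window.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        // Reverse arange: offset j gets weight kernel - j, so earlier offsets score higher.
        let best = window
            .iter()
            .enumerate()
            .map(|(j, &v)| if v == max { kernel - j } else { 0 })
            .max()
            .unwrap();
        values.push(max);
        indices.push(o * stride + (kernel - best)); // flat offset into the input
    }
    (values, indices)
}
```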

Source

pub fn max_unpool2d<'f1, 'f2, 'f3, 'f4, 'f5, 'f6>( &'f1 self, ) -> TensorMaxUnpool2dBuilder<'f1, 'f2, 'f3, 'f4, 'f5, 'f6>

Inverse of max pooling: scatter pooled values back to their original positions.

Indices are flat offsets into the inferred output spatial shape (computed from kernel/stride/padding). When output_size exceeds the inferred shape, the result is zero-padded to match.

Uses one-hot encoding of indices to scatter values: one_hot(idx) * vals -> sum.

§Examples

Round-trip with max_pool2d_with_indices:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let (values, indices) = x.max_pool2d_with_indices()
    .kernel_size(&[2, 2])
    .call()
    .unwrap();
let unpooled = values.max_unpool2d()
    .indices(&indices)
    .kernel_size(&[2, 2])
    .call()
    .unwrap();
let shape: Vec<_> = unpooled.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 4, 4]);
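
The one-hot scatter described above, sketched in one dimension in plain Rust without this crate (max_unpool_1d is a hypothetical name). Each output position sums one_hot(idx)[pos] * value over all pooled values, so positions that no index points at stay zero.

```rust
// Reference sketch only; not this crate's implementation.

/// Scatter pooled values back to `indices` in a zero-filled output of `output_len`.
fn max_unpool_1d(values: &[f32], indices: &[usize], output_len: usize) -> Vec<f32> {
    (0..output_len)
        .map(|pos| {
            values
                .iter()
                .zip(indices)
                // one_hot(idx)[pos] * v
                .map(|(&v, &idx)| if idx == pos { v } else { 0.0 })
                .sum::<f32>()
        })
        .collect()
}
```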

Source

pub fn col2im<'f1, 'f2, 'f3, 'f4, 'f5, 'f6>( &'f1 self, ) -> TensorCol2imBuilder<'f1, 'f2, 'f3, 'f4, 'f5, 'f6>

Col2Im: adjoint of im2col. Reconstructs an image from columns, summing overlaps.

Input shape: [N, C * prod(block_shape), L] where L is the number of sliding positions. Output shape: [N, C, *image_shape].

Uses the adjoint of pool: for each kernel position, stride-dilate the column data, pad to the correct offset, and accumulate. This takes O(output_size) memory and O(bl * output_size) compute, with no large one-hot intermediates.

§Examples

Reconstruct a 4x4 image from 2x2 blocks with no overlap:

// 1 batch, 1 channel, 2x2 block = 4 cols, 4 sliding positions
let cols = Tensor::from_ndarray(&Array3::from_elem((1, 4, 4), 1.0f32));
let mut img = cols.col2im()
    .image_shape(&[4, 4])
    .block_shape(&[2, 2])
    .strides(&[2, 2])
    .call()
    .unwrap();
img.realize().unwrap();
let shape: Vec<_> = img.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 4, 4]);
// Non-overlapping blocks of ones reconstruct to all ones
assert_eq!(img.as_vec::<f32>().unwrap(), vec![1.0; 16]);
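
The accumulate-on-overlap behaviour can be seen in a 1-D plain-Rust sketch that does not use this crate. It assumes no padding and no dilation, and the column layout cols[j][l] (kernel offset j, sliding position l) is chosen for illustration; col2im_1d is a hypothetical name.

```rust
// Reference sketch only; not this crate's implementation.

/// Rebuild a 1-D signal from columns, summing where windows overlap.
/// `cols[j][l]` holds the value for kernel offset `j` at sliding position `l`.
fn col2im_1d(cols: &[Vec<f32>], image_len: usize, stride: usize) -> Vec<f32> {
    let mut image = vec![0.0f32; image_len];
    for (j, row) in cols.iter().enumerate() {
        for (l, &v) in row.iter().enumerate() {
            image[l * stride + j] += v; // overlapping positions accumulate
        }
    }
    image
}
```

With stride equal to the block size nothing overlaps; with a smaller stride the interior positions receive more than one contribution.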

Source §

impl Tensor

Source

pub fn clamp_cast(&self, dtype: DType) -> Result<Self>

Clamp to the representable range of dtype, then cast.

Values outside the target type’s range are saturated to its min/max before casting, preventing overflow wrap-around.

§Examples

let x = Tensor::from_slice([300.0f32, -10.0, 128.0]);
let mut y = x.clamp_cast(DType::UInt8).unwrap();
y.realize().unwrap();
let vals = y.as_vec::<u8>().unwrap();
assert_eq!(vals, vec![255, 0, 128]);
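
For comparison, the scalar equivalent in plain Rust, without this crate. clamp_cast_u8 saturates before converting; wrapping_cast_u8 shows the wrap-around that an unclamped integer cast produces. Both names are hypothetical.

```rust
// Reference sketch only; not this crate's implementation.

/// Saturate to the u8 range first, then convert.
fn clamp_cast_u8(v: f32) -> u8 {
    v.clamp(u8::MIN as f32, u8::MAX as f32) as u8
}

/// Integer-to-integer `as` keeps only the low 8 bits, so out-of-range values wrap.
fn wrapping_cast_u8(v: i32) -> u8 {
    v as u8
}
```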

Source

pub fn qlinear_conv<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7, 'f8, 'f9, 'f10, 'f11, 'f12, 'f13>( &'f1 self, ) -> TensorQlinearConvBuilder<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7, 'f8, 'f9, 'f10, 'f11, 'f12, 'f13>

Quantized convolution: zero-point–adjust inputs, convolve in int32, rescale and requantize to the output dtype.

Implements the ONNX QLinearConv operator. The flow is:

Subtract zero points from input and weights
Perform integer convolution
Rescale by (x_scale * w_scale) / y_scale and add y_zero_point

§Examples

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 128u8));
let x_scale = Tensor::from_slice([0.1f32]);
let x_zp = Tensor::from_slice([128u8]);
let weight = Tensor::from_ndarray(&Array4::from_elem((1, 1, 1, 1), 128u8));
let w_scale = Tensor::from_slice([0.1f32]);
let w_zp = Tensor::from_slice([128u8]);
let y_scale = Tensor::from_slice([0.1f32]);
let y_zp = Tensor::from_slice([128u8]);
let y = x.qlinear_conv()
    .x_scale(&x_scale).x_zero_point(&x_zp)
    .weight(&weight).w_scale(&w_scale).w_zero_point(&w_zp)
    .y_scale(&y_scale).y_zero_point(&y_zp)
    .call()
    .unwrap();
let shape: Vec<usize> = y.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 3, 3]);

Source

pub fn conv_integer<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7, 'f8, 'f9>( &'f1 self, ) -> TensorConvIntegerBuilder<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7, 'f8, 'f9>

Integer convolution: zero-point–adjust inputs and convolve in int32. No rescaling — returns raw int32 result.

Implements the ONNX ConvInteger operator. Subtracts optional zero points from input and weights, then convolves in int32. Unlike qlinear_conv, no output rescaling is applied.

§Examples

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 10u8));
let weight = Tensor::from_ndarray(&Array4::from_elem((1, 1, 1, 1), 1u8));
let y = x.conv_integer().weight(&weight).call().unwrap();
let shape: Vec<usize> = y.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 3, 3]);

Source

pub fn qlinear_matmul<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7, 'f8>( &'f1 self, ) -> TensorQlinearMatmulBuilder<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7, 'f8>

Quantized matrix multiplication: zero-point–adjust inputs, matmul in int32, rescale and requantize to the output dtype.

Implements the ONNX QLinearMatMul operator. The flow is:

Subtract zero points from both inputs
Perform integer matrix multiplication
Rescale by (a_scale * b_scale) / y_scale and add y_zero_point

§Examples

let a = Tensor::from_ndarray(&Array2::from_elem((2, 3), 128u8));
let a_scale = Tensor::from_slice([0.1f32]);
let a_zp = Tensor::from_slice([128u8]);
let b = Tensor::from_ndarray(&Array2::from_elem((3, 4), 128u8));
let b_scale = Tensor::from_slice([0.1f32]);
let b_zp = Tensor::from_slice([128u8]);
let y_scale = Tensor::from_slice([0.1f32]);
let y_zp = Tensor::from_slice([128u8]);
let y = a.qlinear_matmul()
    .a_scale(&a_scale).a_zero_point(&a_zp)
    .b(&b).b_scale(&b_scale).b_zero_point(&b_zp)
    .y_scale(&y_scale).y_zero_point(&y_zp)
    .call()
    .unwrap();
let shape: Vec<usize> = y.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![2, 4]);
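
The three steps can be traced on a single output element, which is one dot product. The sketch below is plain Rust and does not use this crate; it assumes u8 data with scalar scales and zero points, and it rounds half away from zero with f32::round, whereas the ONNX specification asks for round-half-to-even. qlinear_dot is a hypothetical name.

```rust
// Reference sketch only; not this crate's implementation.

/// One output element of a quantized matmul.
fn qlinear_dot(
    a: &[u8],
    a_zp: u8,
    a_scale: f32,
    b: &[u8],
    b_zp: u8,
    b_scale: f32,
    y_scale: f32,
    y_zp: u8,
) -> u8 {
    // Steps 1 and 2: subtract zero points, then accumulate in i32.
    let acc: i32 = a
        .iter()
        .zip(b)
        .map(|(&x, &w)| (x as i32 - a_zp as i32) * (w as i32 - b_zp as i32))
        .sum();
    // Step 3: rescale, add the output zero point, and saturate to u8.
    let rescaled = acc as f32 * (a_scale * b_scale / y_scale);
    (rescaled.round() + y_zp as f32).clamp(0.0, 255.0) as u8
}
```

When every input equals its zero point, the accumulator is 0 and the result is exactly y_zero_point, which is the situation in the example above.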

Source §

impl Tensor

Source

pub fn resize<'f1, 'f2, 'f3, 'f4, 'f5>( &'f1 self, ) -> TensorResizeBuilder<'f1, 'f2, 'f3, 'f4, 'f5>

Resize a tensor using interpolation (ONNX Resize operator).

Supports nearest, linear, and cubic interpolation modes with various coordinate transformation modes. Either scales or sizes must be provided to specify the target dimensions.

§Examples

Nearest-mode 2x upscale via scales:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 2, 2), 1.0f32));
let mut y = x.resize().scales(&[1.0, 1.0, 2.0, 2.0]).call().unwrap();
y.realize().unwrap();
let shape: Vec<usize> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 4, 4]);
assert!(y.as_vec::<f32>().unwrap().iter().all(|&v| (v - 1.0).abs() < 1e-5));

Resize to explicit output sizes:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 2, 2), 1.0f32));
let mut y = x.resize().sizes(&[1, 1, 6, 6]).call().unwrap();
y.realize().unwrap();
let shape: Vec<usize> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 6, 6]);
assert!(y.as_vec::<f32>().unwrap().iter().all(|&v| (v - 1.0).abs() < 1e-5));

Linear interpolation mode:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 2, 2), 1.0f32));
let mut y = x.resize()
    .scales(&[1.0, 1.0, 2.0, 2.0])
    .mode(ResizeMode::Linear)
    .call()
    .unwrap();
y.realize().unwrap();
let shape: Vec<usize> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, vec![1, 1, 4, 4]);
assert!(y.as_vec::<f32>().unwrap().iter().all(|&v| (v - 1.0).abs() < 1e-5));
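
A constant input hides what the interpolation does, so here is a 1-D linear resize in plain Rust, without this crate. It assumes the half_pixel coordinate transform (the ONNX default) with clamping at the borders; which transform this crate uses by default is not stated above. resize_linear_1d is a hypothetical name.

```rust
// Reference sketch only; not this crate's implementation.

/// Linearly resize a 1-D signal to `out_len` samples using half-pixel centres.
fn resize_linear_1d(x: &[f32], out_len: usize) -> Vec<f32> {
    let scale = out_len as f32 / x.len() as f32;
    let last = (x.len() - 1) as f32;
    (0..out_len)
        .map(|o| {
            // half_pixel transform: align pixel centres, then clamp to the valid range.
            let src = ((o as f32 + 0.5) / scale - 0.5).clamp(0.0, last);
            let lo = src.floor() as usize;
            let hi = (lo + 1).min(x.len() - 1);
            let frac = src - lo as f32;
            x[lo] * (1.0 - frac) + x[hi] * frac
        })
        .collect()
}
```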

Source §

impl Tensor

Source

pub fn rnn<'f1, 'f2, 'f3, 'f4, 'f5>( &'f1 self, ) -> TensorRnnBuilder<'f1, 'f2, 'f3, 'f4, 'f5>

Simple RNN (Elman network).

H_t = tanh(X_t @ W^T + H_{t-1} @ R^T + Wb + Rb)

x: input [seq_length, batch_size, input_size] (layout=0) or [batch_size, seq_length, input_size] (layout=1)
w: input weights [num_directions, hidden_size, input_size]
r: recurrence weights [num_directions, hidden_size, hidden_size]
bias: optional bias [num_directions, 2 * hidden_size] (Wb ++ Rb)
initial_h: optional initial hidden state [num_directions, batch_size, hidden_size]
layout: 0 = seq-first (default), 1 = batch-first

§Examples

// seq=2, batch=1, input=3
let x = Tensor::from_ndarray(&Array3::from_elem((2, 1, 3), 0.1f32));
let w = Tensor::from_ndarray(&Array3::from_elem((1, 4, 3), 0.1f32)); // [1, hidden=4, input=3]
let r = Tensor::from_ndarray(&Array3::from_elem((1, 4, 4), 0.1f32)); // [1, hidden=4, hidden=4]
let out = x.rnn().w(&w).r(&r).hidden_size(4).call().unwrap();
let y_shape: Vec<usize> = out.y.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(y_shape, vec![2, 1, 1, 4]); // [seq, num_directions, batch, hidden]
let yh_shape: Vec<usize> = out.y_h.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(yh_shape, vec![1, 1, 4]); // [num_directions, batch, hidden]
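
One time step of the recurrence above, written out in plain Rust without this crate, for a single direction and a single batch element. rnn_step is a hypothetical name; w is [hidden][input] and r is [hidden][hidden], matching the weight layouts listed above.

```rust
// Reference sketch only; not this crate's implementation.

/// H_t = tanh(W x + R h_prev + Wb + Rb) for one batch element.
fn rnn_step(
    x: &[f32],
    h_prev: &[f32],
    w: &[Vec<f32>],
    r: &[Vec<f32>],
    wb: &[f32],
    rb: &[f32],
) -> Vec<f32> {
    (0..h_prev.len())
        .map(|j| {
            let wx: f32 = w[j].iter().zip(x).map(|(a, b)| a * b).sum();
            let rh: f32 = r[j].iter().zip(h_prev).map(|(a, b)| a * b).sum();
            (wx + rh + wb[j] + rb[j]).tanh()
        })
        .collect()
}
```

Calling rnn_step once per sequence position and feeding each result back as h_prev yields the rows of y; the final result is y_h.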

Source

pub fn gru<'f1, 'f2, 'f3, 'f4, 'f5>( &'f1 self, ) -> TensorGruBuilder<'f1, 'f2, 'f3, 'f4, 'f5>

GRU (Gated Recurrent Unit).

Gate order: [z, r, h] (update, reset, hidden).

Equations (default, linear_before_reset=0):

z = sigmoid(X @ W_z^T + H @ R_z^T + w_bz + r_bz)
r = sigmoid(X @ W_r^T + H @ R_r^T + w_br + r_br)
h = tanh(X @ W_h^T + (r * H) @ R_h^T + w_bh + r_bh)
H_new = (1 - z) * h + z * H_prev

When linear_before_reset=1:

h = tanh(X @ W_h^T + r * (H @ R_h^T + r_bh) + w_bh)

x: input [seq_length, batch_size, input_size] (layout=0) or [batch_size, seq_length, input_size] (layout=1)
w: input weights [num_directions, 3*hidden_size, input_size]
r_weights: recurrence weights [num_directions, 3*hidden_size, hidden_size]
bias: optional [num_directions, 6*hidden_size] (Wb ++ Rb)
initial_h: optional [num_directions, batch_size, hidden_size]
linear_before_reset: 0 (default) or 1
layout: 0 = seq-first (default), 1 = batch-first

§Examples

// seq=2, batch=1, input=3, hidden=4
let x = Tensor::from_ndarray(&Array3::from_elem((2, 1, 3), 0.1f32));
// GRU: w is [num_directions, 3*hidden_size, input_size]
let w = Tensor::from_ndarray(&Array3::from_elem((1, 12, 3), 0.1f32));
// GRU: r is [num_directions, 3*hidden_size, hidden_size]
let r = Tensor::from_ndarray(&Array3::from_elem((1, 12, 4), 0.1f32));
let out = x.gru().w(&w).r_weights(&r).hidden_size(4).call().unwrap();
let y_shape: Vec<usize> = out.y.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(y_shape, vec![2, 1, 1, 4]); // [seq, num_directions, batch, hidden]
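
The difference between the two linear_before_reset variants is only where the reset gate is applied. A scalar plain-Rust sketch (input_size = hidden_size = 1, no use of this crate; Gate and gru_step are hypothetical names) makes that visible:

```rust
// Reference sketch only; not this crate's implementation.

fn sigmoid(v: f32) -> f32 {
    1.0 / (1.0 + (-v).exp())
}

/// Weights for one gate of a GRU with input_size = hidden_size = 1.
#[derive(Clone, Copy, Default)]
struct Gate {
    w: f32,  // input weight
    r: f32,  // recurrence weight
    wb: f32, // input bias
    rb: f32, // recurrence bias
}

/// One GRU step following the equations above.
fn gru_step(x: f32, h_prev: f32, z: Gate, r: Gate, h: Gate, linear_before_reset: bool) -> f32 {
    let zt = sigmoid(x * z.w + h_prev * z.r + z.wb + z.rb);
    let rt = sigmoid(x * r.w + h_prev * r.r + r.wb + r.rb);
    let cand = if linear_before_reset {
        // Reset gate scales the whole recurrent term, bias included.
        (x * h.w + rt * (h_prev * h.r + h.rb) + h.wb).tanh()
    } else {
        // Reset gate scales the previous state before the recurrent weight.
        (x * h.w + (rt * h_prev) * h.r + h.wb + h.rb).tanh()
    };
    (1.0 - zt) * cand + zt * h_prev
}
```

With only the recurrent candidate bias set to 1.0, the default form sees the full bias while the linear_before_reset form sees it scaled by the reset gate.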

Source

pub fn lstm<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7>( &'f1 self, ) -> TensorLstmBuilder<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7>

LSTM (Long Short-Term Memory).

Gate order: [i, o, f, c] (input, output, forget, cell).

x: input [seq_length, batch_size, input_size] (layout=0) or [batch_size, seq_length, input_size] (layout=1)
w: input weights [num_directions, 4*hidden_size, input_size]
r: recurrence weights [num_directions, 4*hidden_size, hidden_size]
bias: optional [num_directions, 8*hidden_size] (Wb ++ Rb)
initial_h: optional [num_directions, batch_size, hidden_size]
initial_c: optional [num_directions, batch_size, hidden_size]
peepholes: optional [num_directions, 3*hidden_size] (p_i, p_o, p_f)
layout: 0 = seq-first (default), 1 = batch-first

§Examples

// seq=2, batch=1, input=3, hidden=4
let x = Tensor::from_ndarray(&Array3::from_elem((2, 1, 3), 0.1f32));
// LSTM: w is [num_directions, 4*hidden_size, input_size]
let w = Tensor::from_ndarray(&Array3::from_elem((1, 16, 3), 0.1f32));
// LSTM: r is [num_directions, 4*hidden_size, hidden_size]
let r = Tensor::from_ndarray(&Array3::from_elem((1, 16, 4), 0.1f32));
let out = x.lstm().w(&w).r(&r).hidden_size(4).call().unwrap();
let y_shape: Vec<usize> = out.y.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(y_shape, vec![2, 1, 1, 4]); // [seq, num_directions, batch, hidden]
let yc_shape: Vec<usize> = out.y_c.shape().unwrap().iter()
    .map(|d| d.as_const().unwrap()).collect();
assert_eq!(yc_shape, vec![1, 1, 4]); // [num_directions, batch, hidden]
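
The LSTM equations are not spelled out above, so here is a scalar sketch of the standard cell (as in the ONNX LSTM operator) in plain Rust, without this crate and without peepholes. Arrays are packed in the documented gate order [i, o, f, c]; lstm_step is a hypothetical name, and b is assumed to hold Wb + Rb already summed.

```rust
// Reference sketch only; not this crate's implementation.

fn sigmoid(v: f32) -> f32 {
    1.0 / (1.0 + (-v).exp())
}

/// One LSTM step with input_size = hidden_size = 1. Returns (h, c).
fn lstm_step(x: f32, h_prev: f32, c_prev: f32, w: [f32; 4], r: [f32; 4], b: [f32; 4]) -> (f32, f32) {
    let pre = |g: usize| x * w[g] + h_prev * r[g] + b[g];
    let i = sigmoid(pre(0)); // input gate
    let o = sigmoid(pre(1)); // output gate
    let f = sigmoid(pre(2)); // forget gate
    let cand = pre(3).tanh(); // candidate cell value
    let c = f * c_prev + i * cand;
    let h = o * c.tanh();
    (h, c)
}
```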

Source §

impl Tensor

Source

pub fn space_to_depth(&self, blocksize: usize) -> Result<Tensor>

Rearrange spatial data into depth (inverse of depth_to_space).

Reshapes a [N, C, H, W] tensor to [N, C*b*b, H/b, W/b] where b is the blocksize. Both H and W must be divisible by blocksize.

§Examples

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let mut y = x.space_to_depth(2).unwrap();
y.realize().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, [1, 4, 2, 2]);
assert_eq!(y.as_vec::<f32>().unwrap(), vec![1.0; 16]);
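
An all-ones input does not show where each element ends up. The plain-Rust sketch below does not use this crate and assumes the ONNX SpaceToDepth channel ordering, where the position inside each block selects the output channel group; space_to_depth here is a free function written for illustration.

```rust
// Reference sketch only; not this crate's implementation.

/// `x` is one sample in [C, H, W] row-major order; returns [C*b*b, H/b, W/b].
fn space_to_depth(x: &[f32], c: usize, h: usize, w: usize, b: usize) -> Vec<f32> {
    assert!(h % b == 0 && w % b == 0);
    let (oh, ow) = (h / b, w / b);
    let mut out = vec![0.0f32; x.len()];
    for ch in 0..c {
        for y in 0..h {
            for xx in 0..w {
                // Offset inside the b x b block selects the output channel group.
                let oc = ((y % b) * b + (xx % b)) * c + ch;
                out[(oc * oh + y / b) * ow + xx / b] = x[(ch * h + y) * w + xx];
            }
        }
    }
    out
}
```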

Source

pub fn nll_loss<'f1, 'f2, 'f3>(&'f1 self) -> TensorNllLossBuilder<'f1, 'f2, 'f3>

Negative log-likelihood loss.

self is [N, C, ...] log-probabilities, target is [N, ...] class indices (dtype i64). Gathers the log-prob at the target class and negates it.

Supports optional per-class weight, ignore_index to mask out a class, and reduction (default Mean).

§Examples

let logprobs = Tensor::from_ndarray(&array![[-0.5f32, -1.0, -2.0]]);
let target = Tensor::from_slice([0i64]);
let mut loss = logprobs.nll_loss().target(&target).call().unwrap();
loss.realize().unwrap();
let val = loss.as_vec::<f32>().unwrap();
// -(-0.5) = 0.5
assert!((val[0] - 0.5).abs() < 1e-5);

With sum reduction:

let logprobs = Tensor::from_ndarray(&array![[-0.5f32, -1.0], [-2.0, -0.3]]);
let target = Tensor::from_slice([0i64, 1]);
let mut loss = logprobs.nll_loss().target(&target).reduction(Reduction::Sum).call().unwrap();
loss.realize().unwrap();
let val = loss.as_vec::<f32>().unwrap();
// sum of 0.5 + 0.3 = 0.8
assert!((val[0] - 0.8).abs() < 1e-5);
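
The gather-and-negate step, including ignore_index, in plain Rust without this crate. Per-class weights are left out, and the mean is taken over the rows that were not ignored, which is an assumption about the reduction rather than something stated above. nll_loss here is a free function written for illustration.

```rust
// Reference sketch only; not this crate's implementation.

/// `log_probs` is [N][C]; `targets` holds one class index per row.
fn nll_loss(log_probs: &[Vec<f32>], targets: &[usize], ignore_index: Option<usize>, mean: bool) -> f32 {
    let kept: Vec<f32> = log_probs
        .iter()
        .zip(targets)
        // Drop rows whose target equals ignore_index.
        .filter(|pair| Some(*pair.1) != ignore_index)
        // Gather the log-probability of the target class and negate it.
        .map(|(row, &t)| -row[t])
        .collect();
    let total: f32 = kept.iter().sum();
    if mean { total / kept.len() as f32 } else { total }
}
```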

Source

pub fn dropout<'f1>(&'f1 self) -> TensorDropoutBuilder<'f1>

Dropout: randomly zeros elements during training, passes through in inference.

Returns (output, mask) where mask is a boolean tensor (true = kept). In inference mode (training=false, the default), the output is identical to the input and the mask is all-true.

Note: Training mode is not yet implemented (requires RNG); currently returns identity regardless of training.

§Examples

let x = Tensor::from_ndarray(&array![1.0f32, 2.0, 3.0]);
let (mut out, mut mask) = x.dropout().p(0.5).call().unwrap();
out.realize().unwrap();
mask.realize().unwrap();
// Default is inference mode: output == input
assert_eq!(out.as_vec::<f32>().unwrap(), vec![1.0, 2.0, 3.0]);
assert_eq!(mask.as_vec::<bool>().unwrap(), vec![true, true, true]);

Source

pub fn conv<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7>( &'f1 self, ) -> TensorConvBuilder<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7>

Convolution with ONNX-style parameters.

Wraps the lower-level conv2d after resolving ONNX padding conventions (auto_pad, flat pads). Input shape is [N, C, H, W, ...] and weight shape is [out_channels, in_channels/group, kH, kW, ...].

§Examples

Basic convolution with no padding:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 5, 5), 1.0f32));
let w = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 1.0f32));
let mut y = x.conv().weight(&w).call().unwrap();
y.realize().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, [1, 1, 3, 3]);
// Each output element sums a 3x3 window of ones = 9.0
assert_eq!(y.as_vec::<f32>().unwrap(), vec![9.0; 9]);

With explicit padding and strides:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 5, 5), 1.0f32));
let w = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 1.0f32));
let mut y = x.conv().weight(&w).pads(&[1, 1, 1, 1]).strides(&[2, 2]).call().unwrap();
y.realize().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, [1, 1, 3, 3]);
assert_eq!(y.as_vec::<f32>().unwrap(), vec![4.0, 6.0, 4.0, 6.0, 9.0, 6.0, 4.0, 6.0, 4.0]);
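
The output shape and the border values in the padded example follow from the usual convolution arithmetic. A 1-D plain-Rust sketch, without this crate (conv_out_size and conv_1d are hypothetical names; dilation is fixed at 1 in conv_1d):

```rust
// Reference sketch only; not this crate's implementation.

/// Output length along one spatial axis.
fn conv_out_size(
    input: usize,
    kernel: usize,
    stride: usize,
    pad_begin: usize,
    pad_end: usize,
    dilation: usize,
) -> usize {
    (input + pad_begin + pad_end - dilation * (kernel - 1) - 1) / stride + 1
}

/// Direct 1-D convolution with symmetric zero padding.
fn conv_1d(x: &[f32], w: &[f32], stride: usize, pad: usize) -> Vec<f32> {
    let out_len = conv_out_size(x.len(), w.len(), stride, pad, pad, 1);
    (0..out_len)
        .map(|o| {
            w.iter()
                .enumerate()
                .map(|(k, &wk)| {
                    let p = o * stride + k; // position in the zero-padded input
                    if p >= pad && p < pad + x.len() { wk * x[p - pad] } else { 0.0 }
                })
                .sum::<f32>()
        })
        .collect()
}
```

For all-ones data the 1-D result is [2, 3, 2]; because an all-ones 3x3 kernel is separable, the 2-D output is the outer product of that vector with itself, which gives the 4, 6, 9 pattern in the example.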

Source

pub fn conv_transpose<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7, 'f8, 'f9>( &'f1 self, ) -> TensorConvTransposeBuilder<'f1, 'f2, 'f3, 'f4, 'f5, 'f6, 'f7, 'f8, 'f9>

Transposed convolution with ONNX-style parameters.

Wraps conv_transpose2d after resolving ONNX padding conventions. Supports output_shape and output_padding for precise output size control.

§Examples

Basic transposed convolution (upsampling):

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 2, 2), 1.0f32));
let w = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 1.0f32));
let mut y = x.conv_transpose().weight(&w).call().unwrap();
y.realize().unwrap();
let vals = y.as_vec::<f32>().unwrap();
assert_eq!(vals.len(), 16); // 4x4 output
assert_eq!(vals[5], 4.0); // center sees full overlap

With stride (larger upsampling factor):

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 2, 2), 1.0f32));
let w = Tensor::from_ndarray(&Array4::from_elem((1, 1, 3, 3), 1.0f32));
let mut y = x.conv_transpose().weight(&w).strides(&[2, 2]).call().unwrap();
y.realize().unwrap();
let vals = y.as_vec::<f32>().unwrap();
assert_eq!(vals.len(), 25); // 5x5 output
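
The output sizes in these examples follow from the standard transposed-convolution formula. A 1-D plain-Rust sketch, without this crate (conv_transpose_out_size and conv_transpose_1d are hypothetical names; the scatter version assumes no padding and dilation 1):

```rust
// Reference sketch only; not this crate's implementation.

/// Output length along one spatial axis.
fn conv_transpose_out_size(
    input: usize,
    kernel: usize,
    stride: usize,
    pad_begin: usize,
    pad_end: usize,
    output_padding: usize,
    dilation: usize,
) -> usize {
    (input - 1) * stride + dilation * (kernel - 1) + output_padding + 1 - pad_begin - pad_end
}

/// Each input element scatters a scaled copy of the kernel into the output.
fn conv_transpose_1d(x: &[f32], w: &[f32], stride: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; conv_transpose_out_size(x.len(), w.len(), stride, 0, 0, 0, 1)];
    for (i, &xi) in x.iter().enumerate() {
        for (k, &wk) in w.iter().enumerate() {
            out[i * stride + k] += xi * wk;
        }
    }
    out
}
```

The 1-D stride-1 result [1, 2, 2, 1] squares to 4 at the interior positions, matching the "full overlap" value asserted above.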

Source

pub fn avg_pool<'f1, 'f2, 'f3, 'f4, 'f5>( &'f1 self, ) -> TensorAvgPoolBuilder<'f1, 'f2, 'f3, 'f4, 'f5>

Average pooling with ONNX-style parameters.

Wraps avg_pool2d after resolving ONNX padding and stride conventions. Stride defaults to 1 (unlike avg_pool2d which defaults to kernel_size). Input shape is [N, C, H, W].

§Examples

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let mut y = x.avg_pool().kernel_shape(&[2, 2]).call().unwrap();
y.realize().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, [1, 1, 3, 3]);
// Average of all-ones windows is 1.0
assert!(y.as_vec::<f32>().unwrap().iter().all(|&v| (v - 1.0).abs() < 1e-6));

With strides:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let mut y = x.avg_pool().kernel_shape(&[2, 2]).strides(&[2, 2]).call().unwrap();
y.realize().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, [1, 1, 2, 2]);
assert_eq!(y.as_vec::<f32>().unwrap(), vec![1.0; 4]);

Source

pub fn lp_pool<'f1, 'f2, 'f3, 'f4, 'f5>( &'f1 self, ) -> TensorLpPoolBuilder<'f1, 'f2, 'f3, 'f4, 'f5>

Lp norm pooling with ONNX-style parameters.

Computes (sum(|x|^p))^(1/p) over each pooling window. Defaults to p=2 (L2 pooling). Input shape is [N, C, H, W].

§Examples

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let mut y = x.lp_pool().kernel_shape(&[2, 2]).call().unwrap();
y.realize().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, [1, 1, 3, 3]);
// L2 pool of 2x2 window of ones = sqrt(4) = 2.0
assert!((y.as_vec::<f32>().unwrap()[0] - 2.0).abs() < 1e-5);

Source

pub fn depth_to_space<'f1>(&'f1 self) -> TensorDepthToSpaceBuilder<'f1>

Rearrange depth data into spatial blocks (inverse of space_to_depth).

Reshapes a [N, C, H, W] tensor to [N, C/(b*b), H*b, W*b] where b is the blocksize. In CRD mode the element order matches PyTorch’s F.pixel_shuffle.

§Examples

let x = Tensor::from_ndarray(&Array4::from_elem((1, 4, 1, 1), 1.0f32));
let mut y = x.depth_to_space().blocksize(2).call().unwrap();
y.realize().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, [1, 1, 2, 2]);
assert_eq!(y.as_vec::<f32>().unwrap(), vec![1.0; 4]);

Using CRD mode (PyTorch pixel_shuffle order):

let x = Tensor::from_ndarray(&Array4::from_elem((1, 4, 1, 1), 1.0f32));
let mut y = x.depth_to_space().blocksize(2).mode(DepthToSpaceMode::Crd).call().unwrap();
y.realize().unwrap();
assert_eq!(y.as_vec::<f32>().unwrap(), vec![1.0; 4]);
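
The two modes only differ in which input channel feeds each output position, which an all-ones input cannot show. The plain-Rust sketch below does not use this crate and assumes the ONNX DepthToSpace definitions of DCR and CRD; Order and this free depth_to_space function are hypothetical.

```rust
// Reference sketch only; not this crate's implementation.

#[derive(Clone, Copy)]
enum Order {
    Dcr,
    Crd,
}

/// `x` is one sample in [C, H, W] row-major order; returns [C/(b*b), H*b, W*b].
fn depth_to_space(x: &[f32], c: usize, h: usize, w: usize, b: usize, order: Order) -> Vec<f32> {
    let oc = c / (b * b);
    let (oh, ow) = (h * b, w * b);
    let mut out = vec![0.0f32; x.len()];
    for ch in 0..oc {
        for y in 0..oh {
            for xx in 0..ow {
                let block = (y % b) * b + (xx % b); // position inside the b x b block
                let src_ch = match order {
                    Order::Dcr => block * oc + ch,    // block index varies slowest
                    Order::Crd => ch * b * b + block, // output channel varies slowest
                };
                out[(ch * oh + y) * ow + xx] = x[(src_ch * h + y / b) * w + xx / b];
            }
        }
    }
    out
}
```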

Source

pub fn max_pool<'f1, 'f2, 'f3, 'f4, 'f5>( &'f1 self, ) -> TensorMaxPoolBuilder<'f1, 'f2, 'f3, 'f4, 'f5>

Max pooling with ONNX-style parameters.

Always returns (values, indices) where indices are flattened positions (dtype i64). Wraps max_pool2d_with_indices after resolving ONNX padding conventions.

§Examples

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let (vals, indices) = x.max_pool().kernel_shape(&[2, 2]).call().unwrap();
let shape: Vec<_> = vals.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, [1, 1, 3, 3]);

With strides:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 1, 4, 4), 1.0f32));
let (vals, _) = x.max_pool().kernel_shape(&[2, 2]).strides(&[2, 2]).call().unwrap();
let shape: Vec<_> = vals.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, [1, 1, 2, 2]);

Source

pub fn lrn<'f1>(&'f1 self) -> TensorLrnBuilder<'f1>

Local Response Normalization (LRN).

Normalizes each element by dividing it by a scaled sum of squares taken over a local neighborhood of size adjacent channels: y = x / (bias + alpha * avg_pool(x^2, size))^beta.

Input must be 4-D [N, C, H, W].

§Examples

let x = Tensor::from_ndarray(&Array4::from_elem((1, 3, 2, 2), 1.0f32));
let y = x.lrn().size(3).call().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, [1, 3, 2, 2]);

Custom alpha, beta, and bias:

let x = Tensor::from_ndarray(&Array4::from_elem((1, 3, 2, 2), 1.0f32));
let y = x.lrn().size(3).alpha(0.001).beta(0.5).bias(2.0).call().unwrap();
let shape: Vec<_> = y.shape().unwrap().iter().map(|d| d.as_const().unwrap()).collect();
assert_eq!(shape, [1, 3, 2, 2]);
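
The formula can be traced by hand for one spatial position. This is a sketch in plain Rust, assuming the channel window is centered as in the ONNX LRN operator and that out-of-range channels count as zeros in the average; `lrn_channels` is a hypothetical helper, not a library function.

```rust
/// LRN across the channel values at one spatial position.
/// Assumes size >= 1, an ONNX-style centered window, and that channels
/// outside the tensor contribute zero to the average (both are assumptions).
fn lrn_channels(x: &[f32], size: usize, alpha: f32, beta: f32, bias: f32) -> Vec<f32> {
    let c = x.len();
    let before = (size - 1) / 2; // floor((size - 1) / 2) channels before
    let after = size - 1 - before; // ceil((size - 1) / 2) channels after
    (0..c)
        .map(|ch| {
            let start = ch.saturating_sub(before);
            let end = (ch + after).min(c - 1);
            let sq_sum: f32 = x[start..=end].iter().map(|v| v * v).sum();
            let avg = sq_sum / size as f32; // avg_pool(x^2, size)
            x[ch] / (bias + alpha * avg).powf(beta)
        })
        .collect()
}

fn main() {
    // Same parameters as the custom alpha/beta/bias example above.
    let y = lrn_channels(&[1.0, 1.0, 1.0], 3, 0.001, 0.5, 2.0);
    println!("{:?}", y);
}
```

The output has one value per input channel, which is why LRN preserves the input shape.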

Source §

impl Tensor

Source

pub fn sequential(&self, layers: &[&dyn Layer]) -> Result<Tensor>

Apply a sequence of layers to this tensor.

Source §

impl Tensor

Source

pub fn uniform(shape: &[usize], low: f64, high: f64) -> Result<Tensor>

Uniform [low, high) random tensor, float32, on the default (CPU) device.

Convenience wrapper around Tensor::uniform_with_dtype with f32 output.

Source

pub fn uniform_with_dtype( shape: &[usize], low: f64, high: f64, dtype: DType, ) -> Result<Tensor>

Uniform [low, high) random tensor with explicit float dtype.

Generates a [0, 1) sample at f32, scales by (high - low), casts to the target dtype, then adds low. Casting before the offset keeps the addition honest in low-precision targets (f16/bf16) where low might otherwise be lost to rounding if applied at f32.

Source

pub fn randn(shape: &[usize]) -> Result<Tensor>

Standard normal N(0, 1) random tensor (float32, Box-Muller).

Each output element draws from two [0, 1) uniforms via one combined rand([2, *shape]) call, so the RNG counter advances exactly once per randn invocation regardless of shape.
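
For readers unfamiliar with Box-Muller, here is a minimal scalar sketch of how two uniforms become one normal sample. It assumes the common form that uses 1 - u inside the logarithm so that u = 0 stays finite; the library's exact expression may differ, and `box_muller` is a hypothetical name.

```rust
use std::f64::consts::PI;

/// Map two uniforms in [0, 1) to one standard normal sample.
/// Assumption: ln(1 - u) is used so that u_radius = 0 does not give ln(0).
fn box_muller(u_angle: f64, u_radius: f64) -> f64 {
    let radius = (-2.0 * (1.0 - u_radius).ln()).sqrt();
    radius * (2.0 * PI * u_angle).cos()
}

fn main() {
    // Fixed inputs stand in for the two rows of a rand([2, *shape]) draw.
    let angles = [0.0, 0.25, 0.5, 0.75];
    let radii = [0.1, 0.5, 0.9, 0.99];
    for (a, r) in angles.iter().zip(radii.iter()) {
        println!("u=({}, {}) -> z={}", a, r, box_muller(*a, *r));
    }
}
```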

Source

pub fn normal(shape: &[usize], mean: f64, std: f64) -> Result<Tensor>

Normal random tensor with the given mean and standard deviation std, i.e. N(mean, std²). Requires std >= 0.

Source

pub fn randint(shape: &[usize], low: i64, high: i64) -> Result<Tensor>

Uniform integer tensor [low, high), dtype int32. Requires low < high.

Truncates (high - low) · rand to int32 before adding low. Casting after the add would truncate-toward-zero asymmetrically for negative low (e.g. low=-3, rand≈0.005 would yield -2 instead of the correct -3).
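
The difference between the two orderings is easy to reproduce with scalar casts. The sketch below uses Rust's `as i32`, which truncates toward zero, and assumes high = 3 for the low = -3 case mentioned above (the text does not give high); both function names are hypothetical.

```rust
/// Truncate the scaled sample first, then shift by `low` (the order described above).
fn trunc_then_add(low: i32, high: i32, r: f64) -> i32 {
    ((high - low) as f64 * r) as i32 + low
}

/// Shift first, then truncate: biased toward zero when the sum is negative.
fn add_then_trunc(low: i32, high: i32, r: f64) -> i32 {
    ((high - low) as f64 * r + low as f64) as i32
}

fn main() {
    let (low, high, r) = (-3, 3, 0.005); // high = 3 is an assumed value
    println!("trunc then add: {}", trunc_then_add(low, high, r));
    println!("add then trunc: {}", add_then_trunc(low, high, r));
}
```

Truncating before the shift works because the scaled sample is never negative, so truncation toward zero and flooring coincide there.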

Source

pub fn scaled_uniform(shape: &[usize]) -> Result<Tensor>

uniform(-1, 1) · prod(shape)^(-½). Same dtype contract as uniform.

Source

pub fn glorot_uniform(shape: &[usize]) -> Result<Tensor>

Glorot/Xavier uniform initializer, float32 output.

Source

pub fn glorot_uniform_with_dtype( shape: &[usize], dtype: DType, ) -> Result<Tensor>

Glorot/Xavier uniform initializer with explicit dtype. bound = √(6 / (shape[0] + prod(shape[1..]))); uniform(-bound, bound).
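
As a quick check of the bound formula, the following sketch computes it from a shape in plain Rust. `glorot_bound` is a hypothetical helper that only evaluates the expression above; it assumes a non-empty shape.

```rust
/// bound = sqrt(6 / (shape[0] + prod(shape[1..]))).
/// Hypothetical helper; assumes `shape` is non-empty.
fn glorot_bound(shape: &[usize]) -> f64 {
    let rest: usize = shape[1..].iter().product();
    (6.0 / (shape[0] + rest) as f64).sqrt()
}

fn main() {
    for shape in [vec![4, 2], vec![16, 8], vec![64, 3, 3, 3]] {
        println!("{:?} -> bound {:.4}", shape, glorot_bound(&shape));
    }
}
```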

Source

pub fn kaiming_uniform(shape: &[usize], a: f64) -> Result<Tensor>

Kaiming/He uniform initializer for ReLU-family activations, float32 output.

Source

pub fn kaiming_uniform_with_dtype( shape: &[usize], a: f64, dtype: DType, ) -> Result<Tensor>

Kaiming/He uniform initializer with explicit dtype.

bound = √(6 / ((1 + a²) · prod(shape[1..]))); uniform(-bound, bound).

a is the negative slope of the activation:

0.0 — plain ReLU (PyTorch default).
0.01 — leaky-ReLU with default slope.
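
To see how the slope a affects the bound, the formula can be evaluated directly. This is a sketch in plain Rust; `kaiming_uniform_bound` is a hypothetical helper and assumes a non-empty shape.

```rust
/// bound = sqrt(6 / ((1 + a^2) * prod(shape[1..]))).
/// Hypothetical helper; assumes `shape` is non-empty.
fn kaiming_uniform_bound(shape: &[usize], a: f64) -> f64 {
    let fan_in: usize = shape[1..].iter().product();
    (6.0 / ((1.0 + a * a) * fan_in as f64)).sqrt()
}

fn main() {
    let shape = [8, 24];
    println!("ReLU (a = 0.0):        {:.6}", kaiming_uniform_bound(&shape, 0.0));
    println!("leaky-ReLU (a = 0.01): {:.6}", kaiming_uniform_bound(&shape, 0.01));
}
```

A larger slope gives a slightly smaller bound, since the (1 + a²) factor sits in the denominator.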

Source

pub fn kaiming_normal(shape: &[usize], a: f64) -> Result<Tensor>

Kaiming/He normal initializer for ReLU-family activations. std = √(2 / ((1 + a²) · prod(shape[1..]))); randn · std.

Source §

impl Tensor

Source

pub fn rand_like_with_dtype(&self, dtype: DType) -> Result<Tensor>

rand_like with a dtype override (device and shape still inherited).

Source

pub fn rand_like(&self) -> Result<Tensor>

Uniform [0, 1) random tensor with the same shape/dtype/device as self.

Source

pub fn randn_like_with_dtype(&self, dtype: DType) -> Result<Tensor>

randn_like with a dtype override.

Internally generates f32 samples via Box-Muller, then casts to the target dtype. Using f32 inside Box-Muller keeps cos/log/sqrt accurate even when the caller wants low-precision output.

Source

pub fn randn_like(&self) -> Result<Tensor>

Standard normal N(0, 1) random tensor with the same shape/dtype/device as self.

Source

pub fn randint_like(&self, low: i64, high: i64) -> Result<Tensor>

Uniform integer [low, high) random tensor with the same shape/dtype/device as self.

The underlying Tensor::randint returns Int32; if self’s dtype differs the result is cast to match (e.g. Int64 template → Int64 result). Requires low < high.

Source §

impl Tensor

Source

pub fn rand(shape: &[usize]) -> Result<Tensor>

Uniform [0, 1) random tensor with float32 dtype on the default CPU device.

THREEFRY-backed; deterministic for a fixed seed (set via crate::rand::manual_seed).

Source

pub fn rand_with( shape: &[usize], dtype: DType, device: DeviceSpec, ) -> Result<Tensor>

Variant of Tensor::rand with explicit dtype and device.

Supported dtypes: Float16, BFloat16, Float32, Float64. Integer dtypes are not supported here — use Tensor::randint instead.

Source §

impl Tensor

Source

pub fn realize(&mut self) -> Result<()>

Realize (execute) this tensor’s computation graph.

This is a convenience method that prepares and executes in one call. For repeated executions of the same computation, use prepare() instead.

§Pipeline

Prepare: Creates an ExecutionPlan (compiles kernels, allocates buffers)
Execute: Runs all kernels in dependency order
Return: Links output buffer to this tensor’s UOp

§Example

let a = Tensor::from_slice(&[1.0f32, 2.0, 3.0]);
let b = Tensor::from_slice(&[4.0f32, 5.0, 6.0]);
let mut c = &a + &b;
c.realize()?;
// c's buffer now contains [5.0, 7.0, 9.0]

§Errors

Returns error if preparation or execution fails.

Source

pub fn realize_with(&mut self, config: &PrepareConfig) -> Result<()>

Realize tensor with custom configuration.

Like realize() but allows specifying optimization strategy and codegen backend.

§Example

use svod_tensor::PrepareConfig;
use svod_schedule::{OptStrategy, OptimizerConfig};

let mut c = a.matmul(&b)?;
let config = PrepareConfig::from(
    OptimizerConfig::builder()
        .strategy(OptStrategy::Beam { width: 4 })
        .build()
);
c.realize_with(&config)?;

Source

pub fn prepare(&mut self) -> Result<ExecutionPlan>

Prepare an execution plan for this tensor’s computation graph.

This performs all one-time work:

Creates schedule from computation graph
Instantiates strict range-expanded callable schedule items
Compiles all kernels
Allocates all buffers
Builds a dependency-ordered execution plan of prepared ops

The returned ExecutionPlan can then be executed multiple times without recompilation overhead.

§Example

let a = Tensor::from_slice(&[1.0f32, 2.0, 3.0]);
let b = Tensor::from_slice(&[4.0f32, 5.0, 6.0]);
let mut c = &a + &b;

// One-time preparation (wires output tensor to plan buffer)
let plan = c.prepare()?;

// Fast execution (can be called many times)
plan.execute()?;

// Get results
let output = plan.output_buffer();

§Errors

Returns error if:

Rangeify transformation fails
No kernels found after scheduling
Kernel compilation fails
Buffer allocation fails

Source

pub fn prepare_with(&mut self, config: &PrepareConfig) -> Result<ExecutionPlan>

Prepare an execution plan with explicit configuration.

This method allows fine-grained control over kernel optimization settings and codegen backend selection.

§Example

use svod_tensor::PrepareConfig;
use svod_schedule::{OptimizerConfig, OptStrategy, BeamConfig};

// Beam search with width 8 and 120s timeout
let config = PrepareConfig::from(
    OptimizerConfig::builder()
        .strategy(OptStrategy::Beam { width: 8 })
        .beam(BeamConfig::builder()
            .timeout_secs(120)
            .build())
        .build()
);

let plan = tensor.prepare_with(&config)?;
plan.execute()?;

Source

pub fn realize_batch<'a>( tensors: impl IntoIterator<Item = &'a mut Tensor>, ) -> Result<()>

Realize multiple tensors in a single batch, sharing computation.

Merges all tensor computation graphs into one SINK, enabling the scheduler to share kernels across outputs. More efficient than calling realize() individually when tensors share subgraphs.

Source

pub fn realize_batch_with<'a>( tensors: impl IntoIterator<Item = &'a mut Tensor>, config: &PrepareConfig, ) -> Result<()>

Realize multiple tensors with custom configuration.

Source

pub fn prepare_batch<'a>( tensors: impl IntoIterator<Item = &'a mut Tensor>, ) -> Result<ExecutionPlan>

Prepare a batch execution plan for multiple tensors.

Output tensors are wired to plan buffers — after execute/execute_with_vars, results are readable directly via tensor.as_vec() or tensor.array_view().

Source

pub fn prepare_batch_with<'a>( tensors: impl IntoIterator<Item = &'a mut Tensor>, config: &PrepareConfig, ) -> Result<ExecutionPlan>

Prepare a batch execution plan with custom configuration.

Source §

impl Tensor

Source

pub fn sum(&self, axes: impl Into<AxisSpec>) -> Result<Self>

Sum of tensor elements over given axes.

Auto-promotes accumulation dtype (bool→int32, float16→float32) like Tinygrad. Use sum_with().promote(false) to preserve input dtype.

Source

pub fn prod(&self, axes: impl Into<AxisSpec>) -> Result<Self>

Product of tensor elements over given axes.

Preserves input dtype. Use prod_with().promote(true) or .dtype(...) for different accumulation.

Source

pub fn max(&self, axes: impl Into<AxisSpec>) -> Result<Self>

Maximum of tensor elements over given axes.

Always preserves input dtype.

Source

pub fn min(&self, axes: impl Into<AxisSpec>) -> Result<Self>

Minimum of tensor elements over given axes.

Always preserves input dtype.

Source

pub fn mean(&self, axes: impl Into<AxisSpec>) -> Result<Self>

Mean of tensor elements over given axes.

For integer inputs, automatically uses float32 accumulation. For float inputs, preserves input dtype.

Source

pub fn var(&self, axes: impl Into<AxisSpec>) -> Result<Self>

Variance of tensor elements over given axes.

Computes unbiased sample variance (divides by N-1). For integer inputs, automatically uses float32 accumulation. For float inputs, preserves input dtype.

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]);
let v = t.var(())?;  // Variance over all elements
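
For reference, the N-1 divisor gives the following value for the data above. This is a scalar sketch in plain Rust, not the library's implementation; `sample_variance` is a hypothetical name and assumes at least two elements.

```rust
/// Unbiased sample variance: sum((x - mean)^2) / (N - 1).
/// Hypothetical helper; assumes xs.len() >= 2.
fn sample_variance(xs: &[f64]) -> f64 {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0)
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0];
    let var = sample_variance(&data);
    // The squared deviations from the mean 2.5 sum to 5.0; dividing by 3 gives 5/3.
    println!("var = {var:.4}, std = {:.4}", var.sqrt());
}
```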

Source

pub fn std(&self, axes: impl Into<AxisSpec>) -> Result<Self>

Standard deviation of tensor elements over given axes.

Computes unbiased sample standard deviation (divides by N-1). For integer inputs, automatically uses float32 accumulation. For float inputs, preserves input dtype.

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]);
let s = t.std(())?;  // Std dev over all elements

Source

pub fn var_mean(&self, axes: impl Into<AxisSpec>) -> Result<(Self, Self)>

Variance and mean of tensor elements over given axes.

Returns (variance, mean) tuple. More efficient than computing separately. Computes unbiased sample variance (divides by N-1).

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]);
let (v, m) = t.var_mean(())?;

Source

pub fn std_mean(&self, axes: impl Into<AxisSpec>) -> Result<(Self, Self)>

Standard deviation and mean of tensor elements over given axes.

Returns (std, mean) tuple. More efficient than computing separately. Computes unbiased sample standard deviation (divides by N-1).

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]);
let (s, m) = t.std_mean(())?;

Source

pub fn sum_with<'f1, I1>(&'f1 self) -> TensorSumWithBuilder<'f1, I1>
where I1: Into<AxisSpec>,

Sum with additional options (keepdim, dtype, promote).

§Examples

// Explicit dtype
tensor.sum_with(0).dtype(DType::Float32).call()?;

// Auto-promote (int8→int32, etc.)
tensor.sum_with(0).promote(true).call()?;

// With keepdim
tensor.sum_with(0).keepdim(true).call()?;

Source

pub fn prod_with<'f1, I1>(&'f1 self) -> TensorProdWithBuilder<'f1, I1>
where I1: Into<AxisSpec>,

Product with additional options (keepdim, dtype, promote).

Source

pub fn max_with<'f1, I1>(&'f1 self) -> TensorMaxWithBuilder<'f1, I1>
where I1: Into<AxisSpec>,

Maximum with keepdim option.

Source

pub fn min_with<'f1, I1>(&'f1 self) -> TensorMinWithBuilder<'f1, I1>
where I1: Into<AxisSpec>,

Minimum with keepdim option.

Source

pub fn mean_with<'f1, I1>(&'f1 self) -> TensorMeanWithBuilder<'f1, I1>
where I1: Into<AxisSpec>,

Mean with keepdim option.

Source

pub fn var_with<'f1, I1>(&'f1 self) -> TensorVarWithBuilder<'f1, I1>
where I1: Into<AxisSpec>,

Variance with keepdim option.

Source

pub fn std_with<'f1, I1>(&'f1 self) -> TensorStdWithBuilder<'f1, I1>
where I1: Into<AxisSpec>,

Standard deviation with keepdim option.

Source

pub fn var_mean_with<'f1, I1>(&'f1 self) -> TensorVarMeanWithBuilder<'f1, I1>
where I1: Into<AxisSpec>,

Variance and mean with keepdim option.

Source

pub fn std_mean_with<'f1, I1>(&'f1 self) -> TensorStdMeanWithBuilder<'f1, I1>
where I1: Into<AxisSpec>,

Standard deviation and mean with keepdim option.

Source §

impl Tensor

Source

pub fn argmax(&self, axis: impl Into<Option<isize>>) -> Result<Self>

Index of maximum value along axis.

Returns int32 tensor with indices of maximum values. For ties, returns the index of the first occurrence.

§Arguments

axis - Axis to reduce (None = flatten first)

§Examples

let t = Tensor::from_slice(&[[1.0, 3.0, 2.0], [4.0, 2.0, 5.0]]);
t.argmax(None)?;      // 5 (flattened: max is at index 5)
t.argmax(Some(0))?;   // [1, 0, 1] (row indices of max per column)
t.argmax(Some(1))?;   // [1, 2] (column indices of max per row)
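
The first-occurrence tie rule can be illustrated on a flat slice. This is a sketch in plain Rust that does not model NaN handling; `argmax_first` is a hypothetical helper, not a library function.

```rust
/// Index of the maximum; on ties, the earliest index wins.
/// Hypothetical helper; NaN inputs are not handled specially.
fn argmax_first(xs: &[f32]) -> Option<usize> {
    let mut best: Option<usize> = None;
    for (i, &x) in xs.iter().enumerate() {
        match best {
            // Keep the current best unless x is strictly greater.
            Some(b) if x <= xs[b] => {}
            _ => best = Some(i),
        }
    }
    best
}

fn main() {
    // Flattened version of the 2x3 example above.
    println!("{:?}", argmax_first(&[1.0, 3.0, 2.0, 4.0, 2.0, 5.0]));
    // A tie between indices 1 and 2 resolves to the first.
    println!("{:?}", argmax_first(&[2.0, 7.0, 7.0]));
}
```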

Source

pub fn hardmax(&self, axis: isize) -> Result<Self>

Hard maximum: one-hot encoding of the argmax along an axis.

Returns a tensor of the same shape with 1.0 at the position of the maximum value along axis and 0.0 elsewhere, cast to the input dtype.

Source

pub fn argmin(&self, axis: impl Into<Option<isize>>) -> Result<Self>

Index of minimum value along axis.

Returns int32 tensor with indices of minimum values. For ties, returns the index of the first occurrence.

Source

pub fn any(&self, axes: impl Into<AxisSpec>) -> Result<Self>

Test if any element is true along axes.

Logical OR reduction. Returns bool dtype. Non-zero values are treated as true.

§Examples

let t = Tensor::from_slice(&[[true, false], [false, false]]);
t.any(())?;           // true (any element is true)
t.any(0)?;            // [true, false] (any true per column)
t.any(1)?;            // [true, false] (any true per row)

Source

pub fn all(&self, axes: impl Into<AxisSpec>) -> Result<Self>

Test if all elements are true along axes.

Logical AND reduction. Returns bool dtype. Non-zero values are treated as true.

§Examples

let t = Tensor::from_slice(&[[true, true], [true, false]]);
t.all(())?;           // false (not all elements are true)
t.all(0)?;            // [true, false] (all true per column)
t.all(1)?;            // [true, false] (all true per row)

Source

pub fn argmax_with<'f1, I1>(&'f1 self) -> TensorArgmaxWithBuilder<'f1, I1>
where I1: Into<Option<isize>>,

Argmax with keepdim option.

Source

pub fn argmin_with<'f1, I1>(&'f1 self) -> TensorArgminWithBuilder<'f1, I1>
where I1: Into<Option<isize>>,

Argmin with keepdim option.

Source

pub fn any_with<'f1, I1>(&'f1 self) -> TensorAnyWithBuilder<'f1, I1>
where I1: Into<AxisSpec>,

Any with keepdim option.

Source

pub fn all_with<'f1, I1>(&'f1 self) -> TensorAllWithBuilder<'f1, I1>
where I1: Into<AxisSpec>,

All with keepdim option.

Source §

impl Tensor

Source

pub fn try_reshape( &self, new_shape: impl IntoIterator<Item = impl Into<SInt>>, ) -> Result<Tensor>

Reshape tensor to a new shape.

The total number of elements must remain the same. At most one dimension may be given as -1, meaning “infer this dimension” from the remaining ones.

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0]);
let reshaped = t.try_reshape(&[2, 3]).unwrap();  // [6] -> [2, 3]
let inferred = t.try_reshape(&[-1, 2]).unwrap(); // [6] -> [3, 2]

§Errors

Returns error if:

Shape contains negative values other than -1
Multiple -1 dimensions specified
Total elements don’t match
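
The three error conditions above follow from how a -1 entry is resolved. The sketch below shows one way to do that for concrete shapes in plain Rust; `resolve_shape` and its error strings are hypothetical and do not reflect the library's actual error type.

```rust
/// Resolve a requested shape that may contain one -1 placeholder.
/// Hypothetical helper for concrete (non-symbolic) shapes only.
fn resolve_shape(numel: usize, dims: &[i64]) -> Result<Vec<usize>, String> {
    let mut known: usize = 1;
    let mut infer_at: Option<usize> = None;
    for (i, &d) in dims.iter().enumerate() {
        match d {
            -1 if infer_at.is_none() => infer_at = Some(i),
            -1 => return Err("multiple -1 dimensions".to_string()),
            d if d < 0 => return Err(format!("invalid dimension {d}")),
            d => known *= d as usize,
        }
    }
    // Placeholder entries are filled in below.
    let mut out: Vec<usize> = dims.iter().map(|&d| d.max(0) as usize).collect();
    match infer_at {
        Some(i) => {
            if known == 0 || numel % known != 0 {
                return Err("cannot infer dimension".to_string());
            }
            out[i] = numel / known;
        }
        None if known != numel => return Err("element count mismatch".to_string()),
        None => {}
    }
    Ok(out)
}

fn main() {
    println!("{:?}", resolve_shape(6, &[-1, 2]));
    println!("{:?}", resolve_shape(6, &[-1, -1]));
    println!("{:?}", resolve_shape(6, &[4, 2]));
}
```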

Source

pub fn try_expand( &self, new_shape: impl IntoIterator<Item = impl Into<SInt>>, ) -> Result<Tensor>

Expand (broadcast) tensor to a new shape with mixed concrete/symbolic dimensions.

Dimensions of size 1 can be expanded to larger sizes. Use -1 to keep the current dimension size.

§Examples

// Tensor with shape [1, 3, 1]
// t.try_expand(&[4, -1, 5]) -> shape [4, 3, 5]

Source

pub fn try_permute(&self, axes: &[isize]) -> Result<Tensor>

Permute (reorder) tensor dimensions.

The axes parameter specifies the new order of dimensions. Each axis index 0..ndim must appear exactly once.

§Examples

// Tensor with shape [2, 3, 4]
// t.try_permute(&[2, 0, 1]) -> shape [4, 2, 3]
// t.try_permute(&[1, 0, 2]) -> shape [3, 2, 4]

§Errors

Returns error if:

Axes is not a valid permutation
Axis indices out of range

Source

pub fn try_transpose(&self, dim0: isize, dim1: isize) -> Result<Tensor>

Transpose two dimensions.

Convenience method for swapping two dimensions. Equivalent to permute with the two dimensions swapped.

§Examples

// Tensor with shape [2, 3, 4]
// t.try_transpose(0, 1) -> shape [3, 2, 4]
// t.try_transpose(-1, 0) -> shape [4, 3, 2]  (negative indices supported)

§Errors

Returns error if axis indices are out of range.

Source

pub fn try_squeeze(&self, dim: Option<isize>) -> Result<Tensor>

Squeeze dimensions of size 1.

If dim is None, removes all dimensions of size 1. If dim is Some(axis), removes only that dimension, which must have size 1.

§Examples

// Tensor with shape [1, 3, 1, 4]
// t.try_squeeze(None) -> shape [3, 4]
// t.try_squeeze(Some(0)) -> shape [3, 1, 4]
// t.try_squeeze(Some(2)) -> shape [1, 3, 4]

§Errors

Returns error if:

Specified dimension is not size 1
Axis index out of range

Source

pub fn try_unsqueeze(&self, dim: isize) -> Result<Tensor>

Add a dimension of size 1.

Inserts a new dimension at the specified position. Supports negative indices: -1 means after the last dimension.

§Examples

// Tensor with shape [3, 4]
// t.try_unsqueeze(0) -> shape [1, 3, 4]
// t.try_unsqueeze(1) -> shape [3, 1, 4]
// t.try_unsqueeze(-1) -> shape [3, 4, 1]

§Errors

Returns error if axis index is out of range.

Source

pub fn flip(&self, axes: &[isize]) -> Result<Tensor>

Reverse elements along specified axes.

Each axis in the list is flipped (reversed). Supports negative indexing.

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]).try_reshape(&[2, 2])?;
let flipped = t.flip(&[0])?;  // Flip along axis 0

Source

pub fn split(&self, sizes: &[usize], dim: isize) -> Result<Vec<Tensor>>

Split tensor into chunks along a dimension.

Returns a vector of tensors, each with the specified size along the split dimension.

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0, 5.0]);
let parts = t.split(&[2, 3], 0)?;  // [2] and [3]

Source

pub fn repeat(&self, repeats: &[SInt]) -> Result<Tensor>

Repeat tensor along each dimension.

repeats[i] is the number of times to repeat along dimension i. Accepts &[SInt] — supports both concrete and symbolic repeat counts.

§Examples

use svod_ir::SInt;
let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0]).try_reshape(&[1, 3])?;
let tiled = t.repeat(&[SInt::from(3), SInt::from(2)])?;  // Shape [3, 6]

Source

pub fn flatten(&self) -> Result<Tensor>

Flatten tensor to 1D.

Reshapes tensor to have a single dimension containing all elements. Equivalent to try_reshape(&[-1]).

§Examples

let t = Tensor::from_slice(&[[1, 2], [3, 4]]);  // Shape [2, 2]
let flattened = t.flatten()?;  // Shape [4]

Source

pub fn try_pad(&self, padding: &[(isize, isize)]) -> Result<Tensor>

Pad tensor with zeros (or other padding value).

Each tuple in padding specifies (begin, end) padding for a dimension. Use 0 for no padding on that side.

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0]);  // Shape [3]
let padded = t.try_pad(&[(1, 2)]).unwrap();  // Shape [6]: [0, 1, 2, 3, 0, 0]

§Errors

Returns error if:

Padding values are symbolic (not concrete)
Number of padding pairs doesn’t match dimensions

Source

pub fn cat(tensors: &[&Tensor], dim: isize) -> Result<Tensor>

Concatenate tensors along an axis.

All tensors must have the same shape except in the concatenating dimension.

§Examples

let a = Tensor::from_slice(&[1.0f32, 2.0, 3.0]).try_reshape(&[3]).unwrap();
let b = Tensor::from_slice(&[4.0f32, 5.0]).try_reshape(&[2]).unwrap();
let c = Tensor::cat(&[&a, &b], 0).unwrap();  // Shape [5]: [1, 2, 3, 4, 5]

§Errors

Returns error if:

Tensors have different number of dimensions
Non-concat dimensions don’t match

Source

pub fn stack(tensors: &[&Tensor], dim: isize) -> Result<Tensor>

Stack tensors along a new dimension.

Creates a new axis at dim by unsqueezing each tensor, then concatenating.

Source

pub fn unflatten(&self, dim: isize, sizes: &[isize]) -> Result<Tensor>

Replace a single dimension with multiple dimensions.

Inverse of flatten: splits dimension dim into the shape given by sizes.

Source

pub fn meshgrid( tensors: &[&Tensor], indexing: MeshgridIndexing, ) -> Result<Vec<Tensor>>

Create coordinate grids from 1D tensors.

indexing: Ij (matrix/default) or Xy (Cartesian, swaps first two inputs).

Source

pub fn shape_tensor(&self) -> Result<Tensor>

Get the shape of this tensor as a new tensor.

Returns a 1D tensor of int64 containing the shape dimensions. This is useful for ONNX Shape operator compatibility.

§Examples

let t = Tensor::from_slice(&[1.0f32; 6]).try_reshape(&[2, 3]).unwrap();
let shape_tensor = t.shape_tensor().unwrap();  // Tensor([2, 3]) with dtype int64

§Errors

Symbolic dimensions are supported and do not cause an error: each symbolic dim produces a scalar UOp tensor.

Source

pub fn try_shrink<R: IntoShrinkRange>( &self, ranges: impl IntoIterator<Item = R>, ) -> Result<Tensor>

Shrink (slice) tensor along each dimension.

Each tuple in ranges specifies (begin, end) for a dimension. Use (0, size) to keep full dimension.

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0, 5.0]);
let sliced = t.try_shrink(&[(1, 4)]).unwrap();  // Elements [2, 3, 4]

§Errors

Returns error if negative indices are used with symbolic shape dimensions.

Source

pub fn center_crop_pad( &self, target_shape: &[usize], axes: Option<&[usize]>, ) -> Result<Tensor>

Center-crop or center-pad each specified axis to the target size.

For axes where target < current, crops from the center. For axes where target > current, pads symmetrically around the center. Axes where target == current are unchanged.

axes specifies which dimensions to apply (default: all).

Source

pub fn shape(&self) -> Result<Shape>

Get the concrete shape of this tensor.

Source

pub fn ndim(&self) -> Result<usize>

Get the number of dimensions (rank).

Source

pub fn numel(&self) -> Result<usize>

Total number of elements. Fails if any dimension is symbolic.

Source

pub fn triu(&self, diagonal: i64) -> Result<Tensor>

Keep upper triangle, zero below. Matches Tinygrad Tensor.triu(diagonal).

Source

pub fn tril(&self, diagonal: i64) -> Result<Tensor>

Keep lower triangle, zero above. Matches Tinygrad Tensor.tril(diagonal).
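
The role of the diagonal offset is easiest to see on a small matrix. The sketch below assumes the usual convention (as in NumPy and PyTorch) that triu keeps entries with col >= row + diagonal and tril keeps entries with col <= row + diagonal; both functions are plain-Rust stand-ins, not the library methods.

```rust
/// Keep entries with col >= row + diagonal, zero the rest (assumed convention).
fn triu(m: &[Vec<i32>], diagonal: i64) -> Vec<Vec<i32>> {
    mask(m, |r, c| c >= r + diagonal)
}

/// Keep entries with col <= row + diagonal, zero the rest (assumed convention).
fn tril(m: &[Vec<i32>], diagonal: i64) -> Vec<Vec<i32>> {
    mask(m, |r, c| c <= r + diagonal)
}

/// Apply a keep/zero predicate over (row, col) positions.
fn mask(m: &[Vec<i32>], keep: impl Fn(i64, i64) -> bool) -> Vec<Vec<i32>> {
    m.iter()
        .enumerate()
        .map(|(r, row)| {
            row.iter()
                .enumerate()
                .map(|(c, &v)| if keep(r as i64, c as i64) { v } else { 0 })
                .collect()
        })
        .collect()
}

fn main() {
    let m = vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]];
    println!("triu(0) = {:?}", triu(&m, 0));
    println!("triu(1) = {:?}", triu(&m, 1));
    println!("tril(0) = {:?}", tril(&m, 0));
}
```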

Source §

impl Tensor

Source

pub fn slice_with<'f1, 'f2, 'f3, 'f4, 'f5>( &'f1 self, ) -> TensorSliceWithBuilder<'f1, 'f2, 'f3, 'f4, 'f5>

Slice tensor with Python-style indexing: negative indices, steps, and axis selection.

Source §

impl Tensor

Source

pub fn embedding(&self, indices: &Tensor) -> Result<Tensor>

Embedding lookup: self is the weight table [vocab_size, embed_dim]. Returns self[indices] with shape [*indices.shape, embed_dim].

Source

pub fn apply_rotary_emb( &self, cos: &Tensor, sin: &Tensor, interleaved: bool, ) -> Result<Tensor>

Apply rotary position embedding rotation.

§Arguments

self - [..., rot_dim] tensor to rotate
cos, sin - broadcastable against self, with shape [..., rot_dim/2]
interleaved - if true, pairs are (even, odd) indices; if false, pairs are (first_half, second_half)
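
The two pairing schemes can be compared on a single vector. This sketch applies the standard 2-D rotation (a·cos − b·sin, a·sin + b·cos) to each pair in plain Rust; the sign convention is an assumption about this library, and `rotate` is a hypothetical helper.

```rust
/// Rotate pairs of `x` by per-pair angles given as cos/sin (length x.len() / 2).
/// interleaved = true pairs (x[2i], x[2i+1]); false pairs (x[i], x[i + half]).
/// Hypothetical helper; the rotation sign convention is assumed.
fn rotate(x: &[f32], cos: &[f32], sin: &[f32], interleaved: bool) -> Vec<f32> {
    let half = x.len() / 2;
    let mut out = vec![0.0f32; x.len()];
    for i in 0..half {
        let (ia, ib) = if interleaved { (2 * i, 2 * i + 1) } else { (i, i + half) };
        let (a, b) = (x[ia], x[ib]);
        out[ia] = a * cos[i] - b * sin[i];
        out[ib] = a * sin[i] + b * cos[i];
    }
    out
}

fn main() {
    // A 90-degree rotation for both pairs: cos = 0, sin = 1.
    let x = [1.0, 0.0, 0.0, 1.0];
    let (cos, sin) = ([0.0, 0.0], [1.0, 1.0]);
    println!("interleaved:     {:?}", rotate(&x, &cos, &sin, true));
    println!("non-interleaved: {:?}", rotate(&x, &cos, &sin, false));
}
```

The same input gives different outputs under the two layouts, so the flag has to match the layout the weights were trained with.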

Source §

impl Tensor

Source

pub fn scaled_dot_product_attention<'f1, 'f2, 'f3, 'f4>( &'f1 self, ) -> TensorScaledDotProductAttentionBuilder<'f1, 'f2, 'f3, 'f4>

Scaled dot-product attention. self (Q): [B, H, Sq, D], key (K): [B, H, Sk, D], value (V): [B, H, Sk, Dv]. Returns [B, H, Sq, Dv].

Source §

impl Tensor

Source

pub fn from_lazy(uop: Arc<UOp>) -> Self

Create a lazy tensor from a UOp graph (no buffer allocated). Used for deferred computation graphs like ONNX weight views.

Source

pub fn from_path(path: &Path) -> Result<Self>

Create a file-backed tensor using the DISK device (Tinygrad: Tensor(pathlib.Path)). The file is memory-mapped lazily — no data is read until the tensor is realized. The resulting tensor has dtype uint8 and shape (file_size,).

Source

pub fn uop(&self) -> Arc<UOp>

Get the current UOp for this tensor.

This reads from the registry, so it reflects any global substitutions.

Source

pub fn kernels(&self) -> Vec<KernelInfo>

Get kernels for THIS tensor.

Note: Kernel tracking is not yet implemented with the new registry. This returns an empty list for now.

Source

pub fn empty(shape: &[usize], dtype: DType) -> Self

Create an uninitialized buffer-backed tensor with the given shape and dtype.

No device memory is allocated — only the BUFFER UOp is created. Use assign() to bind real data before realize(). Matches Tinygrad’s Tensor.empty(*shape).

Source

pub fn empty_dynamic(shape: &[SInt], dtype: DType) -> Self

Create an uninitialized buffer-backed tensor with symbolic (dynamic) dimensions.

Buffer is sized to prod(vmax) — each symbolic dim uses its Variable’s max_val for allocation. This enables rebinding to any value in [min, max] without reallocation. Matches Tinygrad’s prod([x.vmax if isinstance(x, UOp) else x for x in shape]).

Source

pub fn empty_zero(dtype: DType) -> Self

Create an empty 0-element tensor with the given dtype and shape [0].

Source

pub fn full( shape: &[usize], value: impl Into<ConstValue>, dtype: DType, ) -> Result<Self>

Create a tensor filled with a constant value, broadcast to the given shape.

Source

pub fn zeros(shape: &[usize], dtype: DType) -> Result<Self>

Create a zero-filled tensor with the given concrete shape.

Source

pub fn ones(shape: &[usize], dtype: DType) -> Result<Self>

Create a one-filled tensor with the given concrete shape.

Source

pub fn full_dynamic( shape: &[SInt], value: impl Into<ConstValue>, dtype: DType, ) -> Result<Self>

Create a tensor filled with a constant value, using symbolic (dynamic) dimensions.

Dimensions can be concrete (SInt::Const) or symbolic (SInt::Symbolic from Variable::bind()).

§Example

use svod_tensor::{Tensor, Variable};
use svod_dtype::DType;

let batch = Variable::new("batch", 1, 32);
let x = Tensor::full_dynamic(&[batch.bind(16)?.into(), 784.into()], 0.0, DType::Float32)?;

Source

pub fn zeros_dynamic(shape: &[SInt], dtype: DType) -> Result<Self>

Create a zero-filled tensor with symbolic (dynamic) dimensions.

Source

pub fn ones_dynamic(shape: &[SInt], dtype: DType) -> Result<Self>

Create a one-filled tensor with symbolic (dynamic) dimensions.

Source

pub fn cumsum(&self, axis: isize) -> Result<Self>

Cumulative sum along an axis.

Source

pub fn cumprod(&self, axis: isize) -> Result<Self>

Cumulative product along an axis.

Source

pub fn arange(start: i64, stop: Option<i64>, step: Option<i64>) -> Result<Self>

Create 1D tensor with evenly spaced Int32 values.

Source

pub fn arange_f64( start: f64, stop: f64, step: f64, dtype: DType, ) -> Result<Self>

Create 1D tensor with evenly spaced values (float parameters).

Source

pub fn linspace( start: f64, end: f64, steps: usize, dtype: DType, ) -> Result<Self>

Create 1D tensor with steps evenly spaced values from start to end (inclusive).
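
Since end is inclusive, the spacing is (end - start) / (steps - 1). The sketch below evaluates that in plain Rust; `linspace` here is a free function for illustration, and its handling of steps <= 1 is an assumption rather than documented library behavior.

```rust
/// `steps` evenly spaced values from `start` to `end`, both inclusive.
/// Assumption: steps == 0 gives an empty vector and steps == 1 gives [start].
fn linspace(start: f64, end: f64, steps: usize) -> Vec<f64> {
    if steps <= 1 {
        return vec![start; steps];
    }
    let delta = (end - start) / (steps - 1) as f64;
    (0..steps).map(|i| start + delta * i as f64).collect()
}

fn main() {
    println!("{:?}", linspace(0.0, 1.0, 5));
    println!("{:?}", linspace(-1.0, 1.0, 3));
}
```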

Source

pub fn const_<T: Into<ConstValue>>(value: T, dtype: DType) -> Self

Create a scalar constant tensor.

Creates a 0-dimensional tensor containing a single constant value. The constant is embedded directly in the IR and does not allocate a buffer until realized (if needed).

§Arguments

value - The constant value (will be converted to ConstValue)
dtype - The data type for the tensor

§Examples

// Float constant
let pi = Tensor::const_(3.14159, DType::Float32);

// Integer constant
let forty_two = Tensor::const_(42i64, DType::Int64);

Source

pub fn from_const<T: Into<ConstValue> + HasDType>(value: T) -> Self

Create a scalar constant tensor with dtype auto-inferred from value.

Convenience method that infers dtype from the Rust type.

§Examples

let f = Tensor::from_const(3.14f32);  // DType::Float32
let i = Tensor::from_const(42i32);    // DType::Int32
let b = Tensor::from_const(true);     // DType::Bool

Source

pub fn device(&self) -> DeviceSpec

Get device specification from underlying UOp graph.

Returns the device where this tensor’s data resides. For lazy tensors (not yet realized), returns the target device. Defaults to CPU if no device is found in the graph.

§Examples

let cpu_tensor = Tensor::from_slice(&[1.0f32, 2.0, 3.0]);
assert_eq!(cpu_tensor.device(), DeviceSpec::Cpu);

Source

pub fn to(&self, device: DeviceSpec) -> Self

Move tensor to a different device.

Creates a lazy COPY operation. Data is not transferred until realize(). If already on target device, returns a clone (no-op).

§Examples

let cpu_tensor = Tensor::from_slice(&[1.0f32, 2.0, 3.0]);
let mut gpu_tensor = cpu_tensor.to(DeviceSpec::Cuda { device_id: 0 });
gpu_tensor.realize()?;  // Actually transfers data

Source

pub fn cast(&self, dtype: DType) -> Result<Self>

Cast tensor to a different dtype.

§Examples

let t = Tensor::from_slice(&[1.0f32, 2.0, 3.0]);
let t_int = t.cast(DType::Int32)?;

Source

pub fn custom_kernel<F>( &self, others: &[&Tensor], fxn: F, ) -> Result<Vec<Tensor>>
where F: FnOnce(Vec<Arc<UOp>>) -> Arc<UOp>,

Build and apply a custom UOp kernel over this tensor and additional inputs.

The closure receives PARAM placeholders (as UOps) corresponding to [self, others...] and must return the kernel body UOp (typically a SINK). Returns tensors wrapped with AFTER(CALL) dependencies in argument order.

Source

pub fn custom_kernel_with<F>( &self, others: &[&Tensor], info: CallInfo, fxn: F, ) -> Result<Vec<Tensor>>
where F: FnOnce(Vec<Arc<UOp>>) -> Arc<UOp>,

custom_kernel with explicit CALL metadata.

Source

pub fn bitcast(&self, dtype: DType) -> Result<Self>

Bitcast tensor to a different dtype, reinterpreting bits.

For equal-itemsize dtypes (e.g. f32 ↔ i32) this is the pure IR-level reinterpretation. For different-itemsize dtypes (e.g. u32 → u16 or u32 → u64) the last axis is split or combined via shifts + reshape, matching Tinygrad’s tensor.py::bitcast. The total byte count is preserved; the last axis grows (src_size > dst_size) or shrinks (src_size < dst_size) by rate = max(...)/min(...).

Requires:

source and destination are both scalar (vector dtypes unsupported);
(shape[-1] * src_size) is evenly divisible by dst_size;
the last shape dim is concrete (not symbolic).
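
The split and combine cases can be pictured with plain integer shifts. This is a sketch for u32 ↔ u16 only, assuming the low-order bits come first (little-endian element order); the function names are hypothetical and the library's lowering may differ in detail.

```rust
/// u32 -> u16: each element splits into two, so the last axis doubles.
/// Assumption: the low 16 bits come first.
fn split_u32_to_u16(src: &[u32]) -> Vec<u16> {
    src.iter()
        .flat_map(|&v| [v as u16, (v >> 16) as u16])
        .collect()
}

/// u16 -> u32: pairs combine into one element, so the last axis halves.
/// Requires an even number of elements (the divisibility rule above).
fn combine_u16_to_u32(src: &[u16]) -> Vec<u32> {
    src.chunks_exact(2)
        .map(|pair| (pair[0] as u32) | ((pair[1] as u32) << 16))
        .collect()
}

fn main() {
    let words = [0x1122_3344u32, 0xAABB_CCDD];
    let halves = split_u32_to_u16(&words);
    println!("{:04x?}", halves);
    println!("{:08x?}", combine_u16_to_u32(&halves));
}
```

In both directions the total byte count is unchanged: two u32 values occupy 8 bytes, as do four u16 values.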

Source

pub fn arange_with_dtype() -> TensorArangeWithDtypeBuilder

Create 1D tensor with evenly spaced values and explicit dtype.

Matches Tinygrad’s Tensor.arange(): full(step) → cumsum → + (start - step). Accepts concrete i64 or symbolic Arc<UOp> for start/stop/step. If stop is None, treats start as stop and starts from 0.
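
The full → cumsum → offset construction can be traced on concrete integers. The sketch below mirrors those three steps in plain Rust, assuming a nonzero step and an element count of ceil((stop - start) / step); this `arange` is a standalone function for illustration, not the builder API.

```rust
/// arange built as: full(step) -> cumsum -> + (start - step).
/// Assumptions: step != 0; count = ceil((stop - start) / step), clamped at 0.
fn arange(start: i64, stop: i64, step: i64) -> Vec<i64> {
    let count = ((stop - start) as f64 / step as f64).ceil().max(0.0) as usize;
    let mut running = 0i64;
    (0..count)
        .map(|_| {
            running += step; // cumsum over a vector filled with `step`
            running + (start - step) // shift so the first element is `start`
        })
        .collect()
}

fn main() {
    println!("{:?}", arange(2, 10, 3));
    println!("{:?}", arange(5, 0, -2));
}
```

The first cumsum value is step itself, so adding (start - step) makes the sequence begin exactly at start.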

Source §

impl Tensor

Source

pub fn try_assign(&self, value: &Tensor) -> Result<()>

Assign a value tensor to this tensor in-place.

Embeds the write as AFTER(target, STORE(target, value)).

§Example

let placeholder = Tensor::empty(&[2, 3], DType::Float32);
let real_data = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0])
    .try_reshape(&[2, 3]).unwrap();
placeholder.try_assign(&real_data).unwrap();

Source

pub fn assign(&self, value: &Tensor)

Source

pub fn contiguous(&self) -> Self

Ensure this tensor has contiguous memory layout.

Creates a CONTIGUOUS UOp that forces materialization when realized. Following Tinygrad’s approach, calling .contiguous().realize() on a pure constant tensor will create an actual buffer.

§Examples

// Force a constant to be materialized
let mut c = Tensor::const_(5.0f32, DType::Float32).contiguous();
c.realize()?;
assert!(c.buffer().is_some());

Source §

impl Tensor

Source

pub fn zero(&self) -> Result<Self>

Broadcast a dtype-aware zero to match this tensor’s shape.

Source

pub fn one(&self) -> Result<Self>

Broadcast a dtype-aware one to match this tensor’s shape.

Source

pub fn eye(n: usize, m: usize, dtype: DType) -> Result<Self>

Identity matrix of shape [n, m] with the given dtype.
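
Because n and m may differ, the result can be rectangular. As a rough illustration using nested vectors instead of a Tensor, and assuming the usual convention of ones on the main diagonal (the name eye_rows is hypothetical):

```rust
/// Hypothetical sketch: an n x m matrix with ones where row == column, zeros elsewhere.
fn eye_rows(n: usize, m: usize) -> Vec<Vec<f32>> {
    (0..n)
        .map(|i| (0..m).map(|j| if i == j { 1.0 } else { 0.0 }).collect())
        .collect()
}
```

With n = 2 and m = 3 this produces the rows [1, 0, 0] and [0, 1, 0].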

Source §

impl Tensor

Source

pub fn cumsum_with<'f1>(&'f1 self) -> TensorCumsumWithBuilder<'f1>

Cumulative sum with exclusive and reverse options.

Source

pub fn cumprod_with<'f1>(&'f1 self) -> TensorCumprodWithBuilder<'f1>

Cumulative product with exclusive and reverse options.
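
The exclusive and reverse options are easiest to see on a small 1D input. The sketch below assumes the common (ONNX-style) meaning of both flags: exclusive leaves the current element out of its own running total, and reverse accumulates from the end of the axis. The function name cumsum_sketch is hypothetical and the builder's actual behavior may differ in detail.

```rust
/// Hypothetical sketch of a 1D cumulative sum with exclusive and reverse flags.
fn cumsum_sketch(xs: &[i64], exclusive: bool, reverse: bool) -> Vec<i64> {
    let n = xs.len();
    let mut out = vec![0; n];
    let mut acc = 0i64;
    for k in 0..n {
        // Walk the axis front-to-back, or back-to-front when reverse is set.
        let i = if reverse { n - 1 - k } else { k };
        if exclusive {
            // Write the total of the elements seen so far, then include xs[i].
            out[i] = acc;
            acc += xs[i];
        } else {
            // Include xs[i] first, then write the total.
            acc += xs[i];
            out[i] = acc;
        }
    }
    out
}
```

For the input [1, 2, 3, 4] this gives [1, 3, 6, 10] by default, [0, 1, 3, 6] with exclusive, [10, 9, 7, 4] with reverse, and [9, 7, 4, 0] with both. A cumulative product would follow the same pattern with multiplication and an initial accumulator of 1.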

Trait Implementations§

Source §

impl Add<&Tensor> for Tensor

Source §

type Output = Tensor

The resulting type after applying the + operator.

Source §

fn add(self, other: &Tensor) -> Tensor

Performs the + operation. Read more

Source §

impl Add<Tensor> for &Tensor

Source §

type Output = Tensor

The resulting type after applying the + operator.

Source §

fn add(self, other: Tensor) -> Tensor

Performs the + operation. Read more

Source §

impl Add for &Tensor

Source §

type Output = Tensor

The resulting type after applying the + operator.

Source §

fn add(self, other: &Tensor) -> Tensor

Performs the + operation. Read more

Source §

impl Add for Tensor

Source §

type Output = Tensor

The resulting type after applying the + operator.

Source §

fn add(self, other: Tensor) -> Tensor

Performs the + operation. Read more

Source §

impl Clone for Tensor

Source §

fn clone(&self) -> Self

Returns a duplicate of the value. Read more

1.0.0 (const: unstable) · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more

Source §

impl Div<&Tensor> for Tensor

Source §

type Output = Tensor

The resulting type after applying the / operator.

Source §

fn div(self, other: &Tensor) -> Tensor

Performs the / operation. Read more

Source §

impl Div<Tensor> for &Tensor

Source §

type Output = Tensor

The resulting type after applying the / operator.

Source §

fn div(self, other: Tensor) -> Tensor

Performs the / operation. Read more

Source §

impl Div for &Tensor

Source §

type Output = Tensor

The resulting type after applying the / operator.

Source §

fn div(self, other: &Tensor) -> Tensor

Performs the / operation. Read more

Source §

impl Div for Tensor

Source §

type Output = Tensor

The resulting type after applying the / operator.

Source §

fn div(self, other: Tensor) -> Tensor

Performs the / operation. Read more

Source §

impl Mul<&Tensor> for Tensor

Source §

type Output = Tensor

The resulting type after applying the * operator.

Source §

fn mul(self, other: &Tensor) -> Tensor

Performs the * operation. Read more

Source §

impl Mul<Tensor> for &Tensor

Source §

type Output = Tensor

The resulting type after applying the * operator.

Source §

fn mul(self, other: Tensor) -> Tensor

Performs the * operation. Read more

Source §

impl Mul for &Tensor

Source §

type Output = Tensor

The resulting type after applying the * operator.

Source §

fn mul(self, other: &Tensor) -> Tensor

Performs the * operation. Read more

Source §

impl Mul for Tensor

Source §

type Output = Tensor

The resulting type after applying the * operator.

Source §

fn mul(self, other: Tensor) -> Tensor

Performs the * operation. Read more

Source §

impl Neg for &Tensor

Source §

type Output = Tensor

The resulting type after applying the - operator.

Source §

fn neg(self) -> Tensor

Performs the unary - operation. Read more

Source §

impl Neg for Tensor

Source §

type Output = Tensor

The resulting type after applying the - operator.

Source §

fn neg(self) -> Tensor

Performs the unary - operation. Read more

Source §

impl Sub<&Tensor> for Tensor

Source §

type Output = Tensor

The resulting type after applying the - operator.

Source §

fn sub(self, other: &Tensor) -> Tensor

Performs the - operation. Read more

Source §

impl Sub<Tensor> for &Tensor

Source §

type Output = Tensor

The resulting type after applying the - operator.

Source §

fn sub(self, other: Tensor) -> Tensor

Performs the - operation. Read more

Source §

impl Sub for &Tensor

Source §

type Output = Tensor

The resulting type after applying the - operator.

Source §

fn sub(self, other: &Tensor) -> Tensor

Performs the - operation. Read more

Source §

impl Sub for Tensor

Source §

type Output = Tensor

The resulting type after applying the - operator.

Source §

fn sub(self, other: Tensor) -> Tensor

Performs the - operation. Read more

Auto Trait Implementations§

§

impl Freeze for Tensor

§

impl !RefUnwindSafe for Tensor

§

impl Send for Tensor

§

impl Sync for Tensor

§

impl Unpin for Tensor

§

impl UnsafeUnpin for Tensor

§

impl !UnwindSafe for Tensor

Blanket Implementations§

Source §

impl<T> Any for T
where T: 'static + ?Sized,

Source §

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more

Source §

impl<T> Borrow<T> for T
where T: ?Sized,

Source §

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more

Source §

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source §

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more

Source §

impl<T> CloneToUninit for T
where T: Clone,

Source §

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest. Read more

Source §

impl<T> From<T> for T

Source §

fn from(t: T) -> T

Returns the argument unchanged.

Source §

impl<T> Instrument for T

Source §

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more

Source §

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more

Source §

impl<T, U> Into<U> for T
where U: From<T>,

Source §

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source §

impl<T> IntoEither for T

Source §

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more

Source §

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more

Source §