Struct Buffer

pub struct Buffer<T: Float> { /* private fields */ }

A non-trainable tensor that is part of a module’s persistent state.

Like crate::Parameter, Buffer<T> derefs to Tensor<T> for all tensor operations, and clones share the same underlying Arc identity. Unlike Parameter, its requires_grad is always false.
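The wrapper pattern described above can be sketched with standard-library types only. This is a minimal illustration, not the crate's implementation: `MockTensor` and `MockBuffer` are hypothetical stand-ins for `Tensor<T>` and `Buffer<T>`, and the field layout is an assumption.

```rust
use std::ops::Deref;
use std::sync::Arc;

// Hypothetical stand-in for the crate's tensor type; the real `Tensor<T>` is richer.
#[derive(Clone, Debug)]
pub struct MockTensor {
    pub data: Arc<Vec<f32>>,
    pub requires_grad: bool,
}

// Hypothetical stand-in for `Buffer<T>`: a thin wrapper that never tracks gradients.
#[derive(Clone, Debug)]
pub struct MockBuffer {
    tensor: MockTensor,
}

impl MockBuffer {
    // Wrap a tensor and force `requires_grad` to false, whatever the input says.
    pub fn new(mut tensor: MockTensor) -> Self {
        tensor.requires_grad = false;
        MockBuffer { tensor }
    }
}

// Deref exposes every field and method of the inner tensor through the buffer.
impl Deref for MockBuffer {
    type Target = MockTensor;

    fn deref(&self) -> &MockTensor {
        &self.tensor
    }
}
```

Cloning a `MockBuffer` clones the `Arc`, not the data, so both clones point at the same allocation.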

Implementations

impl<T: Float> Buffer<T>

pub fn new(tensor: Tensor<T>) -> Self

Wrap a tensor as a buffer. requires_grad is forced to false.

pub fn zeros(shape: &[usize]) -> FerrotorchResult<Self>

Create a zero-filled buffer with the given shape.

pub fn ones(shape: &[usize]) -> FerrotorchResult<Self>

Create a one-filled buffer with the given shape.

pub fn from_slice(data: &[T], shape: &[usize]) -> FerrotorchResult<Self>

Create a buffer from a data slice and a shape.

pub fn tensor(&self) -> &Tensor<T>

Borrow the underlying tensor.

pub fn into_tensor(self) -> Tensor<T>

Consume and return the underlying tensor.

pub fn set_data(&mut self, tensor: Tensor<T>)

Replace the buffer’s data. The new tensor is set to requires_grad = false regardless of its input state.

pub fn to(&self, device: Device) -> FerrotorchResult<Self>

Move this buffer to a device.

Methods from Deref<Target = Tensor<T>>

pub fn backward(&self) -> Result<(), FerrotorchError>

Compute gradients of all leaf tensors that contribute to this tensor.

This tensor must be scalar (0-dim or single-element). After this call, leaf tensors with requires_grad = true will have their .grad() set.

pub fn backward_with_gradient(&self, gradient: &Tensor<T>) -> Result<(), FerrotorchError>

Run backward with an external gradient.

This allows backward on non-scalar tensors by providing the initial gradient explicitly. The gradient shape must match this tensor’s shape. Used for multi-head outputs, Jacobian computation, and custom loss functions.

pub fn grad_wrt(&self, inputs: &[&Tensor<T>], retain_graph: bool, create_graph: bool) -> Result<Vec<Option<Tensor<T>>>, FerrotorchError>

Compute gradients of this tensor with respect to inputs, returning the gradient tensors directly (without accumulating on leaves).

See grad for full documentation.

pub fn add_scalar_(&self, value: T) -> Result<&Tensor<T>, FerrotorchError>

Add a scalar to every element in-place: self += value.

Returns &Self for method chaining. Follows PyTorch’s Tensor.add_() semantics — the trailing underscore denotes mutation.

Errors

Returns an error if the tensor is part of the computation graph or is a leaf with requires_grad = true.

pub fn mul_scalar_(&self, value: T) -> Result<&Tensor<T>, FerrotorchError>

Multiply every element by a scalar in-place: self *= value.

Errors

Returns an error if the tensor is part of the computation graph or is a leaf with requires_grad = true.

pub fn fill_(&self, value: T) -> Result<&Tensor<T>, FerrotorchError>

Fill every element with value in-place.

Errors

Returns an error if the tensor is part of the computation graph or is a leaf with requires_grad = true.

pub fn zero_(&self) -> Result<&Tensor<T>, FerrotorchError>

Zero all elements in-place: self = 0.

Equivalent to self.fill_(T::zero()).

Errors

Returns an error if the tensor is part of the computation graph or is a leaf with requires_grad = true.

pub fn add_(&self, other: &Tensor<T>) -> Result<&Tensor<T>, FerrotorchError>

Add another tensor elementwise in-place: self += other.

Equivalent to PyTorch’s Tensor.add_(other) — i.e. add_scaled_ with alpha = 1.0. other may be broadcast to self.shape() as long as the broadcast result equals self.shape() (PyTorch invariant for all in-place ops).

For GPU f32 tensors on the same-shape fast path, uses the GPU add kernel and swaps the storage (no CPU round-trip).

Errors

Returns an error if other cannot be broadcast to self.shape() (or if doing so would change self.shape()), or if the tensor is part of the computation graph or is a leaf with requires_grad = true.
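The shape rule for in-place broadcasting (other may broadcast, but the result must equal self.shape()) can be checked with a few lines of plain Rust. This is a sketch of the rule only; `broadcast_shape` and `in_place_broadcast_ok` are hypothetical helper names and do not come from ferrotorch.

```rust
// Compute the NumPy/PyTorch-style broadcast of two shapes, or None if incompatible.
pub fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let n = a.len().max(b.len());
    let mut out = vec![0usize; n];
    for i in 0..n {
        // Align from the trailing dimension; missing leading dimensions count as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[n - 1 - i] = if da == db || db == 1 {
            da
        } else if da == 1 {
            db
        } else {
            return None;
        };
    }
    Some(out)
}

// An in-place op is allowed only if broadcasting leaves the target shape unchanged.
pub fn in_place_broadcast_ok(self_shape: &[usize], other_shape: &[usize]) -> bool {
    broadcast_shape(self_shape, other_shape).as_deref() == Some(self_shape)
}
```

For example, a `[2, 3]` target accepts a `[3]` operand, while a `[3]` target rejects a `[2, 3]` operand because the broadcast result would be larger than the target.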

pub fn add_scaled_(&self, other: &Tensor<T>, alpha: f64) -> Result<&Tensor<T>, FerrotorchError>

In-place version of torch.add(input, other, *, alpha): self = self + alpha * other.

other may be broadcast to self.shape() (PyTorch parity); the broadcast result must equal self.shape() — an in-place op cannot change the tensor’s shape. The fast same-shape, alpha == 1.0 path uses the GPU add kernel directly when applicable; broadcast or scaled paths route through grad_fns::arithmetic::add_scaled (which itself dispatches CPU/GPU + broadcasting) and swap the resulting storage in.

Errors

Returns an error if shapes are not broadcast-compatible, if the broadcast result differs from self.shape(), or if the tensor is part of the computation graph or is a leaf with requires_grad = true.

pub fn sub_scaled_(&self, other: &Tensor<T>, alpha: f64) -> Result<&Tensor<T>, FerrotorchError>

In-place version of torch.sub(input, other, *, alpha): self = self - alpha * other.

Delegates to Tensor::add_scaled_ with -alpha. PyTorch’s own sub_out at aten/src/ATen/native/BinaryOps.cpp:434-439 does the same: add_stub(device_type(), *this, -alpha). This is the in-place sibling of crate::grad_fns::arithmetic::sub_scaled and the non-test production consumer of that out-of-place entry point (it invokes add_scaled_, which routes through arithmetic::add_scaled; sub_scaled is the symmetric forward caller wired through the parity-sweep "sub" dispatch arm).

other may be broadcast to self.shape(); the broadcast result must equal self.shape() — an in-place op cannot resize the target tensor (PyTorch invariant for all _ ops).

Errors

Returns an error if shapes are not broadcast-compatible, if the broadcast result differs from self.shape(), or if the tensor is part of the computation graph or is a leaf with requires_grad = true.

pub fn sub_(&self, other: &Tensor<T>) -> Result<&Tensor<T>, FerrotorchError>

Subtract another tensor elementwise in-place: self -= other.

Equivalent to PyTorch’s Tensor.sub_(other) — i.e. sub_scaled_ with alpha = 1.0. Mirrors upstream’s aten/src/ATen/native/BinaryOps.cpp:434-439 TORCH_IMPL_FUNC(sub_out) { add_stub(device_type(), *this, -alpha); } with alpha = 1.0, i.e. self += -1.0 * other == self -= other. Delegating here gives sub_scaled_ a non-test production consumer transitively for free (every caller of sub_ becomes a caller of sub_scaled_), and brings sub_ to PyTorch parity with the sub_(other, *, alpha=1) docstring at torch/_tensor_docs.py:5113 (broadcasting from add_scaled_ is inherited; in-place ops cannot resize self).

Errors

Returns an error if other cannot be broadcast to self.shape() (or if doing so would change self.shape()), or if the tensor is part of the computation graph or is a leaf with requires_grad = true.

pub fn mul_(&self, other: &Tensor<T>) -> Result<&Tensor<T>, FerrotorchError>

Multiply another tensor elementwise in-place: self *= other.

other may be broadcast to self.shape() (PyTorch parity for Tensor.mul_(other) — aten/src/ATen/native/BinaryOps.cpp:441 TORCH_IMPL_FUNC(mul_out) inherits broadcasting via TensorIterator); the broadcast result must equal self.shape() — an in-place op cannot resize the target tensor.

The same-shape, both-on-CUDA, T == f32 path takes the GPU mul_f32 kernel and swaps the storage (no CPU round-trip). Anything else (broadcasting or non-f32 or CPU) routes through grad_fns::arithmetic::mul (which itself handles CPU + GPU broadcasting via binary_broadcast / broadcast_mul_*) and swaps the resulting storage in.

Errors

Returns an error if shapes are not broadcast-compatible, if the broadcast result differs from self.shape(), or if the tensor is part of the computation graph or is a leaf with requires_grad = true.

pub fn div_(&self, other: &Tensor<T>) -> Result<&Tensor<T>, FerrotorchError>

Divide by another tensor elementwise in-place: self /= other.

other may be broadcast to self.shape() (PyTorch parity for Tensor.div_(other) — aten/src/ATen/native/BinaryOps.cpp:447 TORCH_IMPL_FUNC(div_out) inherits broadcasting via TensorIterator); the broadcast result must equal self.shape() — an in-place op cannot resize the target tensor.

The same-shape, both-on-CUDA, T == f32 path takes the GPU div_f32 kernel and swaps the storage (no CPU round-trip). Anything else routes through grad_fns::arithmetic::div.

True-division semantics (PyTorch parity, no rounding). For floor / trunc rounding modes use Tensor::div_rounding_.

Errors

Returns an error if shapes are not broadcast-compatible, if the broadcast result differs from self.shape(), or if the tensor is part of the computation graph or is a leaf with requires_grad = true.

pub fn div_rounding_(&self, other: &Tensor<T>, rounding_mode: &str) -> Result<&Tensor<T>, FerrotorchError>

In-place division with a rounding_mode kwarg, mirroring torch.Tensor.div_(other, *, rounding_mode=...) per torch/_tensor_docs.py:1746 and aten/src/ATen/native/BinaryOps.cpp:176 TORCH_META_FUNC2(div, Tensor_mode).

Accepted modes:

"trunc" — self = (self / other).trunc() (rounds toward zero).
"floor" — self = (self / other).floor() (rounds toward negative infinity).

For true-division (no rounding), use Tensor::div_ directly. Any other mode string returns InvalidArgument matching upstream:

div expected rounding_mode to be one of None, 'trunc', or 'floor' but found '...' (BinaryOps.cpp:186)

Broadcasting follows div_ semantics — other may broadcast to self.shape() and the broadcast result must equal self.shape().

Errors

Returns an error if mode is unrecognized, if shapes are not broadcast-compatible, or if the tensor is part of the computation graph or is a leaf with requires_grad = true.
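The difference between the two rounding modes only shows up for negative quotients. The scalar sketch below illustrates that, along with the rejection of unknown mode strings; `div_rounding` is a hypothetical free function on `f64`, not the tensor method itself.

```rust
// Scalar model of division with a rounding mode (hypothetical helper, f64 only).
pub fn div_rounding(a: f64, b: f64, rounding_mode: &str) -> Result<f64, String> {
    match rounding_mode {
        // Round the quotient toward zero.
        "trunc" => Ok((a / b).trunc()),
        // Round the quotient toward negative infinity.
        "floor" => Ok((a / b).floor()),
        // Any other string is rejected, echoing the upstream message quoted above.
        other => Err(format!(
            "div expected rounding_mode to be one of None, 'trunc', or 'floor' but found '{}'",
            other
        )),
    }
}
```

With `a = -7.0` and `b = 2.0` the true quotient is -3.5, so "trunc" yields -3.0 and "floor" yields -4.0.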

pub fn clamp_(&self, min: T, max: T) -> Result<&Tensor<T>, FerrotorchError>

Clamp every element to [min, max] in-place.

Each element x is replaced with min.max(x.min(max)), matching PyTorch’s Tensor.clamp_().

This is the both-bounds-required overload; for the (Option<T>, Option<T>) overload that mirrors torch’s clamp_(min=None, max=None) see Tensor::clamp_opt_.

Errors

Returns an error if min > max.
Returns an error if the tensor is part of the computation graph or is a leaf with requires_grad = true.

pub fn clamp_opt_(&self, min: Option<T>, max: Option<T>) -> Result<&Tensor<T>, FerrotorchError>

Clamp with optional bounds — Tensor.clamp_(min=None, max=None) parity.

Mirrors torch.Tensor.clamp_(min=None, max=None) -> Tensor per torch/_tensor_docs.py:1141 and the structured kernel TORCH_IMPL_FUNC(clamp_out) at aten/src/ATen/native/TensorCompare.cpp:831. Either bound may be None:

clamp_opt_(Some(lo), Some(hi)) — equivalent to clamp_(lo, hi).
clamp_opt_(Some(lo), None) — clamp_min_ (lower bound only).
clamp_opt_(None, Some(hi)) — clamp_max_ (upper bound only).
clamp_opt_(None, None) — rejected with InvalidArgument matching upstream “torch.clamp: At least one of ‘min’ or ‘max’ must not be None” (TensorCompare.cpp:106).

NaN-bound parity: if either supplied bound is NaN, the entire tensor is filled with NaN (PyTorch’s at::fill_(result, NaN) branch at TensorCompare.cpp:844, executed when min.isNan() || max.isNan()).

Per-element NaN inputs propagate (matching the kernel’s std::min(std::max(a, min), max) semantics — when a is NaN, both comparisons evaluate false in this implementation and a is left unchanged, which propagates NaN through).

Errors

Returns an error if both min and max are None.
Returns an error if min > max (when both are Some).
Returns an error if the tensor is part of the computation graph or is a leaf with requires_grad = true.
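The optional-bound and NaN rules listed above can be modelled on a plain `f64` slice. This is a sketch under assumptions: `clamp_opt` is a hypothetical helper, the order of the checks is a guess, and the graph and requires_grad checks of the real method are omitted.

```rust
// Slice-based model of clamping with optional bounds (hypothetical helper).
pub fn clamp_opt(data: &mut [f64], min: Option<f64>, max: Option<f64>) -> Result<(), String> {
    if min.is_none() && max.is_none() {
        return Err("torch.clamp: At least one of 'min' or 'max' must not be None".to_string());
    }
    if let (Some(lo), Some(hi)) = (min, max) {
        // A NaN bound never compares greater, so it falls through to the fill below.
        if lo > hi {
            return Err(format!("min ({}) must not exceed max ({})", lo, hi));
        }
    }
    // NaN-bound rule: a NaN bound fills the whole slice with NaN.
    if min.map_or(false, f64::is_nan) || max.map_or(false, f64::is_nan) {
        for x in data.iter_mut() {
            *x = f64::NAN;
        }
        return Ok(());
    }
    for x in data.iter_mut() {
        // Both comparisons are false for a NaN element, so NaN passes through unchanged.
        if let Some(lo) = min {
            if *x < lo {
                *x = lo;
            }
        }
        if let Some(hi) = max {
            if *x > hi {
                *x = hi;
            }
        }
    }
    Ok(())
}
```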

pub fn add_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>

pub fn sub_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>

pub fn rsub_t(&self, other: &Tensor<T>, alpha: f64) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.rsub(other, *, alpha=1) — reverse subtract: self - alpha * other is the sub_t semantic; rsub is the operand-swapped variant returning other - alpha * self.

Per upstream aten/src/ATen/native/BinaryOps.cpp:1169, Tensor rsub(const Tensor& self, const Tensor& other, const Scalar& alpha) { return at::sub(other, self, alpha); } is a literal operand-swap delegation. The non-test production consumer wiring for arithmetic::rsub per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement.

pub fn rsqrt_t(&self) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.rsqrt() — reciprocal square root: 1 / sqrt(self).

Mirrors torch.rsqrt(input, *, out=None) per torch/_torch_docs.py:9656 and the upstream impl macro at aten/src/ATen/native/UnaryOps.cpp:346 CREATE_UNARY_TORCH_IMPL_FUNC(rsqrt_out, rsqrt_stub). The non-test production consumer wiring for arithmetic::rsqrt per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement.

pub fn reciprocal_t(&self) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.reciprocal() — elementwise reciprocal: 1 / self.

Mirrors torch.reciprocal(input, *, out=None) per torch/_torch_docs.py:2584 and the upstream impl macro at aten/src/ATen/native/UnaryOps.cpp:345 CREATE_UNARY_TORCH_IMPL_FUNC(reciprocal_out, reciprocal_stub). The non-test production consumer wiring for arithmetic::reciprocal per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement.

pub fn abs_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn remainder_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.remainder(other) — elementwise remainder with the sign of the divisor (Python % / NumPy semantics).

Mirrors torch.remainder(input, other, *, out=None) per torch/_torch_docs.py:9453-9472 and the upstream C++ entry at aten/src/ATen/native/BinaryOps.cpp:1184 Tensor remainder(const Tensor& self, const Scalar& other). The float-tensor CPU implementation is at aten/src/ATen/native/cpu/BinaryOpsKernel.cpp:391-409 remainder_kernel. Registration at torch/overrides.py:1100 torch.remainder: lambda input, other, out=None: -1.

Distinct from fmod_t (dividend-sign / C99 semantics, REQ-14 NOT-STARTED): for remainder(-5, 3) ferrotorch returns 1 (sign matches divisor +3); fmod(-5, 3) returns -2 (sign matches dividend -5).

The non-test production consumer wiring for arithmetic::remainder per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement.
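The two sign conventions can be reproduced on scalars. In the sketch below, `remainder` uses the textbook formula a - b * floor(a / b), which is an assumption about how the result could be computed rather than the crate's kernel, and `fmod` relies on the fact that Rust's `%` on floats follows the dividend's sign. Both function names are hypothetical free functions.

```rust
// Divisor-sign remainder (Python `%` semantics): a - b * floor(a / b).
pub fn remainder(a: f64, b: f64) -> f64 {
    a - b * (a / b).floor()
}

// Dividend-sign remainder (C99 fmod semantics): Rust's float `%` behaves this way.
pub fn fmod(a: f64, b: f64) -> f64 {
    a % b
}
```

The two agree whenever the operands share a sign and differ by exactly `b` otherwise, provided the remainder is non-zero.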

pub fn fmod_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>

torch.fmod(input, other, *, out=None) — elementwise remainder with the sign of the dividend (C99 std::fmod semantics).

Mirrors torch.Tensor.fmod via the same upstream registration torch/overrides.py:666 torch.fmod: lambda input, other, out=None: -1.

Distinct from remainder_t (divisor-sign, REQ-13 SHIPPED): for fmod(-5, 3) ferrotorch returns -2 (sign matches dividend -5); remainder(-5, 3) returns 1 (sign matches divisor +3). See arithmetic::fmod docs for the per-quadrant table.

The non-test production consumer wiring for arithmetic::fmod per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement.

pub fn floor_divide_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.floor_divide(other) — elementwise floor division (true floor, toward -infinity).

Mirrors torch.floor_divide(input, other, *, out=None) per torch/_torch_docs.py:4265-4296:

Computes :attr:input divided by :attr:other, elementwise, and floors the result.

.. math:: out_i = floor(input_i / other_i)

Upstream entry at aten/src/ATen/native/BinaryOps.cpp:979 Tensor floor_divide(const Tensor& self, const Tensor& other) dispatching to div_floor_stub -> div_floor_kernel at aten/src/ATen/native/cpu/BinaryOpsKernel.cpp:297-349 -> c10::div_floor_floating at c10/util/generic_math.h:34-58. Registration at torch/overrides.py:664 torch.floor_divide: lambda input, other: -1.

torch.floor_divide was historically broken (performed trunc, NOT floor) and torch/_torch_docs.py:4267-4271 explicitly notes:

.. note:: Before PyTorch 1.13 :func:torch.floor_divide incorrectly performed truncation division. To restore the previous behavior use :func:torch.div with rounding_mode='trunc'.

As of PyTorch 1.13+ (and as of the upstream pin this ferrotorch is translated against), torch.floor_divide performs TRUE FLOOR. Verified live on 2026-05-25: torch.floor_divide(-7.0, 3.0).item() == -3.0.

Distinct from remainder_t and fmod_t. The 3-way identity a == floor_divide(a,b) * b + remainder(a,b) holds; the fmod sibling is the trunc-division remainder. For a=-7, b=3:

floor_divide(-7, 3) = -3 (true floor)
remainder(-7, 3) = 2 (sign of divisor)
fmod(-7, 3) = -1 (sign of dividend / trunc remainder)
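The values in the list above, and the identity that ties them together, can be verified on scalars. This is a sketch with hypothetical helper names (`floor_trunc_table`, `identities_hold`). It pairs floor division with the divisor-sign remainder and truncating division with the dividend-sign remainder.

```rust
// Returns (floor_divide, remainder, trunc_divide, fmod) for scalars a and b.
pub fn floor_trunc_table(a: f64, b: f64) -> (f64, f64, f64, f64) {
    let floor_q = (a / b).floor();
    let trunc_q = (a / b).trunc();
    // The divisor-sign remainder pairs with floor division.
    let rem = a - b * floor_q;
    // The dividend-sign remainder pairs with truncating division.
    let fm = a - b * trunc_q;
    (floor_q, rem, trunc_q, fm)
}

// Checks a == q * b + r for both the floor and the trunc pairing.
pub fn identities_hold(a: f64, b: f64) -> bool {
    let (floor_q, rem, trunc_q, fm) = floor_trunc_table(a, b);
    floor_q * b + rem == a && trunc_q * b + fm == a
}
```

The exact float comparison in `identities_hold` is safe for small integer-valued inputs like the ones used here; general inputs would need a tolerance.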

Backward: torch.floor_divide has no derivative. A live check showed grad_fn=<NotImplemented object>, and calling backward raised "derivative for aten::floor_divide is not implemented". FloorDivideBackward mirrors that by erroring on .backward().

The non-test production consumer wiring for arithmetic::floor_divide per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement.

pub fn addcmul_t(&self, tensor1: &Tensor<T>, tensor2: &Tensor<T>, value: f64) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.addcmul(tensor1, tensor2, *, value=1) — fused self + value * tensor1 * tensor2 (receiver is input).

Mirrors torch.addcmul(input, tensor1, tensor2, *, value=1, out=None) per torch/_torch_docs.py:510-544:

Performs the element-wise multiplication of :attr:tensor1 by :attr:tensor2, multiplies the result by the scalar :attr:value and adds it to :attr:input.

.. math:: \text{out}_i = \text{input}_i + \text{value} \times \text{tensor1}_i \times \text{tensor2}_i

Upstream C++ entry at aten/src/ATen/native/PointwiseOps.cpp:57-64 TORCH_IMPL_FUNC(addcmul_out). Registration at torch/overrides.py:462 torch.addcmul: lambda input, tensor1, tensor2, value=1, out=None: -1.

Broadcasting: the 3 input tensors (self, tensor1, tensor2) are jointly broadcast to a common output shape. Backward: per tools/autograd/derivatives.yaml, d_input = grad, d_tensor1 = grad * value * tensor2, d_tensor2 = grad * value * tensor1 (no gradient with respect to the scalar value).

The non-test production consumer wiring for arithmetic::addcmul per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement.
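The forward formula and the three gradient expressions quoted above can be written out on same-shape slices. This sketch skips broadcasting entirely; `addcmul` and `addcmul_backward` here are hypothetical free functions, not the crate's entry points.

```rust
// Forward: out_i = input_i + value * t1_i * t2_i (same-shape slices, no broadcasting).
pub fn addcmul(input: &[f64], t1: &[f64], t2: &[f64], value: f64) -> Vec<f64> {
    input
        .iter()
        .zip(t1)
        .zip(t2)
        .map(|((&x, &a), &b)| x + value * a * b)
        .collect()
}

// Backward: returns (d_input, d_tensor1, d_tensor2) for an upstream gradient `grad`.
pub fn addcmul_backward(
    grad: &[f64],
    t1: &[f64],
    t2: &[f64],
    value: f64,
) -> (Vec<f64>, Vec<f64>, Vec<f64>) {
    // d_input = grad
    let d_input: Vec<f64> = grad.to_vec();
    // d_tensor1 = grad * value * tensor2
    let d_t1: Vec<f64> = grad.iter().zip(t2).map(|(&g, &b)| g * value * b).collect();
    // d_tensor2 = grad * value * tensor1
    let d_t2: Vec<f64> = grad.iter().zip(t1).map(|(&g, &a)| g * value * a).collect();
    (d_input, d_t1, d_t2)
}
```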

pub fn addcdiv_t(&self, tensor1: &Tensor<T>, tensor2: &Tensor<T>, value: f64) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.addcdiv(tensor1, tensor2, *, value=1) — fused self + value * tensor1 / tensor2 (receiver is input).

Mirrors torch.addcdiv(input, tensor1, tensor2, *, value=1, out=None) per torch/_torch_docs.py:461-473:

Performs the element-wise division of :attr:tensor1 by :attr:tensor2, multiplies the result by the scalar :attr:value and adds it to :attr:input.

.. math:: \text{out}_i = \text{input}_i + \text{value} \times \frac{\text{tensor1}_i}{\text{tensor2}_i}

Upstream C++ entry at aten/src/ATen/native/PointwiseOps.cpp:66-73 TORCH_IMPL_FUNC(addcdiv_out). The integer-dtype deprecation block at PointwiseOps.cpp:38-50 TORCH_META_FUNC(addcdiv) is unreachable for the Tensor<T: Float> family.

Broadcasting: the 3 input tensors (self, tensor1, tensor2) are jointly broadcast to a common output shape. Backward: per tools/autograd/derivatives.yaml, d_input = grad, d_tensor1 = grad * value / tensor2, d_tensor2 = -grad * value * tensor1 / (tensor2 * tensor2) (no gradient with respect to the scalar value). At tensor2=0 the d_tensor2 path produces NaN / ±Inf via IEEE-754 — matches upstream (R-DEV-1).

The non-test production consumer wiring for arithmetic::addcdiv per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement.
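The d_tensor2 expression is the least obvious of the three gradients, so a finite-difference check on scalars is a useful sanity test. The sketch below uses hypothetical scalar helpers (`addcdiv`, `addcdiv_grads`, `numeric_d_t2`); the step size is an arbitrary choice.

```rust
// Scalar forward: input + value * t1 / t2.
pub fn addcdiv(input: f64, t1: f64, t2: f64, value: f64) -> f64 {
    input + value * t1 / t2
}

// Analytic gradients (d_input, d_tensor1, d_tensor2) for an upstream gradient `grad`.
pub fn addcdiv_grads(grad: f64, t1: f64, t2: f64, value: f64) -> (f64, f64, f64) {
    (
        grad,
        grad * value / t2,
        -grad * value * t1 / (t2 * t2),
    )
}

// Central finite difference of the forward function with respect to t2.
pub fn numeric_d_t2(input: f64, t1: f64, t2: f64, value: f64) -> f64 {
    let h = 1e-6;
    (addcdiv(input, t1, t2 + h, value) - addcdiv(input, t1, t2 - h, value)) / (2.0 * h)
}
```

At `t2 = 0.0` the analytic d_tensor2 evaluates to an infinity or NaN under IEEE-754 arithmetic, consistent with the note above.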

pub fn cumsum_t(&self, dim: i64) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.cumsum(dim) — cumulative sum along dim.

Mirrors torch.cumsum(input, dim, *, dtype=None, out=None) per torch/_torch_docs.py:3429 cumsum(input, dim, *, dtype=None, out=None) -> Tensor and the torch.Tensor method docstring at torch/_tensor_docs.py:1500-1506 add_docstr_all("cumsum", r""" cumsum(dim, dtype=None) -> Tensor [...] See :func:torch.cumsum. Upstream C++ entry at aten/src/ATen/native/ReduceOps.cpp:511 TORCH_IMPL_FUNC(cumsum_out) dispatching cumsum_stub. Autograd VJP per tools/autograd/derivatives.yaml:529-531 (name: cumsum(Tensor self, int dim, *, ScalarType? dtype=None) -> Tensor; self: cumsum_backward(grad.to(self.scalar_type()), dim)) which is the reverse_cumsum (flip → cumsum → flip) upper-triangular multiplication at ReduceOps.cpp:527-529 static Tensor reversed_cumsum(const Tensor& w, int64_t dim).

ferrotorch does NOT accept the dtype kwarg (the dtype-promotion branch at ReduceOps.cpp:267 is unreachable for the Tensor<T: Float> family; see .design/ferrotorch-core/grad_fns/cumulative.md REQ-1).

The non-test production consumer wiring for grad_fns::cumulative::cumsum per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement (blocker #1232).
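The flip → cumsum → flip form of the backward pass can be shown on a 1-D slice. This is a minimal sketch with hypothetical helper names; it ignores the `dim` argument and works on a single row.

```rust
// Forward cumulative sum over a 1-D slice.
pub fn cumsum(x: &[f64]) -> Vec<f64> {
    let mut acc = 0.0;
    x.iter()
        .map(|&v| {
            acc += v;
            acc
        })
        .collect()
}

// Backward of cumsum: reverse cumulative sum of the upstream gradient,
// computed as flip, cumsum, flip.
pub fn cumsum_backward(grad: &[f64]) -> Vec<f64> {
    let flipped: Vec<f64> = grad.iter().rev().copied().collect();
    let mut out = cumsum(&flipped);
    out.reverse();
    out
}
```

Input element k contributes to every output at position k or later, which is why its gradient is the sum of the upstream gradients from k to the end.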

pub fn cumprod_t(&self, dim: i64) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.cumprod(dim) — cumulative product along dim.

Mirrors torch.cumprod(input, dim, *, dtype=None, out=None) per torch/_torch_docs.py:3390 cumprod(input, dim, *, dtype=None, out=None) -> Tensor and the torch.Tensor method docstring at torch/_tensor_docs.py:1482-1488 add_docstr_all("cumprod", r""" cumprod(dim, dtype=None) -> Tensor [...] See :func:torch.cumprod. Upstream C++ entry at aten/src/ATen/native/ReduceOps.cpp:519 TORCH_IMPL_FUNC(cumprod_out). Autograd VJP per tools/autograd/derivatives.yaml:525-527 (name: cumprod(Tensor self, int dim, *, ScalarType? dtype=None) -> Tensor; self: cumprod_backward(grad.to(self.scalar_type()), self, dim, result)) routing through cumprod_backward at ReduceOps.cpp:531-790 with the zeros-aware reverse-cumsum-divide algorithm.

ferrotorch does NOT accept the dtype kwarg; the zeros-present path uses an O(n^3) brute-force backward rather than upstream’s composite-compliance masked-fill (numerically identical, slower, not second-order-differentiable — see .design/ferrotorch-core/grad_fns/cumulative.md REQ-2).

The non-test production consumer wiring for grad_fns::cumulative::cumprod per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement (blocker #1232).
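A brute-force backward that stays correct when the input contains zeros can be written directly from the definition of the derivative, without dividing by any input element. The sketch below is one such formulation on a 1-D slice; it is an assumption about what "brute-force" could look like, not a copy of the crate's code, and the helper names are hypothetical.

```rust
// Forward cumulative product over a 1-D slice.
pub fn cumprod(x: &[f64]) -> Vec<f64> {
    let mut acc = 1.0;
    x.iter()
        .map(|&v| {
            acc *= v;
            acc
        })
        .collect()
}

// Brute-force backward: grad_in[k] = sum over i >= k of grad[i] * prod of x[j], j <= i, j != k.
// Cubic in the slice length, but never divides, so zeros in `x` are handled exactly.
pub fn cumprod_backward_bruteforce(grad: &[f64], x: &[f64]) -> Vec<f64> {
    let n = x.len();
    let mut out = vec![0.0; n];
    for k in 0..n {
        for i in k..n {
            // Product of the first i + 1 inputs with x[k] left out.
            let mut partial = 1.0;
            for j in 0..=i {
                if j != k {
                    partial *= x[j];
                }
            }
            out[k] += grad[i] * partial;
        }
    }
    out
}
```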

pub fn logcumsumexp_t(&self, dim: i64) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.logcumsumexp(dim) — numerically stable log(cumsum(exp(self))) along dim.

Mirrors torch.logcumsumexp(input, dim, *, out=None) per torch/_torch_docs.py:3298 logcumsumexp(input, dim, *, out=None) -> Tensor and the torch.Tensor method docstring at torch/_tensor_docs.py:1455-1462 add_docstr_all("logcumsumexp", r""" logcumsumexp(dim) -> Tensor [...] See :func:torch.logcumsumexp. Upstream C++ entry at aten/src/ATen/native/ReduceOps.cpp:475 Tensor logcumsumexp(const Tensor& self, int64_t dim) dispatching _logcumsumexp_cpu at :465-468 → logcumsumexp_stub at :471. Autograd VJP per tools/autograd/derivatives.yaml:521-523 (name: logcumsumexp(Tensor self, int dim) -> Tensor; self: logcumsumexp_backward(grad, self, result, dim)) factors as grad_input[i] = exp(input[i]) * reverse_cumsum(grad_output * exp(-output)) (softmax-weighted reverse cumsum).

The numerical-stability invariant (large inputs ~1000.0 stay finite) is preserved by the two-pass max-rescaling forward algorithm at ops/cumulative.rs:378-410. See .design/ferrotorch-core/grad_fns/cumulative.md REQ-5.

The non-test production consumer wiring for grad_fns::cumulative::logcumsumexp per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement (blocker #1232).
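The effect of max-rescaling on large inputs can be seen in a small 1-D sketch. It assumes a two-pass scheme in which the first pass finds the row maximum and the second accumulates exp(x - max); the actual code in ops/cumulative.rs may differ in detail, and `logcumsumexp` here is a hypothetical free function.

```rust
// Two-pass log(cumsum(exp(x))) on a 1-D slice with max-rescaling.
pub fn logcumsumexp(x: &[f64]) -> Vec<f64> {
    // Pass 1: find the maximum so that every exponent in pass 2 is <= 0.
    let m = x.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    // Fall back to no shift for empty or all-infinite input.
    let shift = if m.is_finite() { m } else { 0.0 };
    // Pass 2: accumulate the rescaled exponentials and undo the shift after the log.
    let mut acc = 0.0;
    x.iter()
        .map(|&v| {
            acc += (v - shift).exp();
            acc.ln() + shift
        })
        .collect()
}
```

Without the shift, `exp(1000.0)` overflows to infinity in `f64`, so the naive formula returns infinity for inputs of that size.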

pub fn exp_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn log_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn sin_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn cos_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn clamp_t(&self, min: T, max: T) -> Result<Tensor<T>, FerrotorchError>

pub fn clip_t(&self, min: T, max: T) -> Result<Tensor<T>, FerrotorchError>

clip is a literal alias of clamp per upstream aten/src/ATen/native/TensorCompare.cpp:918-930 Tensor clip(...) (pass-through to at::clamp(self, min, max)).

pub fn tan_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn asin_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn acos_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn atan_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn sinh_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn cosh_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn asinh_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn acosh_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn atanh_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn exp2_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn expm1_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn log2_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn log10_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn log1p_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn ceil_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn floor_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn round_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn trunc_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn frac_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn sign_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn sinc_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn relu(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn sigmoid(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn tanh_t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn gelu(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn gelu_with(&self, approximate: GeluApproximate) -> Result<Tensor<T>, FerrotorchError>

pub fn silu(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn softmax(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn log_softmax(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn threshold_t(&self, threshold: f64, value: f64) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.threshold(threshold, value) — replace each element below (or equal to) threshold with value, leave the rest unchanged.

Mirrors torch.nn.functional.threshold(input, threshold, value) per torch/nn/functional.py:1682-1700 and TORCH_IMPL_FUNC(threshold_out) at aten/src/ATen/native/Activation.cpp:688-690. The non-test production consumer wiring for grad_fns::activation::threshold per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement (closes #1341 REQ-19).

pub fn rrelu_t(&self, lower: f64, upper: f64, training: bool) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.rrelu(lower, upper, training) — randomized leaky ReLU.

Mirrors torch.nn.functional.rrelu(input, lower, upper, training, inplace) per torch/nn/functional.py:1962-1989 and Tensor& rrelu_with_noise_out_cpu(...) at aten/src/ATen/native/Activation.cpp:611-654. The non-test production consumer wiring for grad_fns::activation::rrelu per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement (closes #1341 REQ-20).

Note: training=true falls back to the deterministic mean-slope inference path (per the GradFn docs at activation.rs). The RNG-stateful training-mode VJP is a separately-tracked follow-up.
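The deterministic mean-slope path mentioned in the note can be sketched on a scalar. It assumes the standard RReLU inference rule, in which negative inputs are scaled by the midpoint of `lower` and `upper`; `rrelu_eval` is a hypothetical helper name.

```rust
// Deterministic RReLU: negative inputs are scaled by the mean of the slope range.
pub fn rrelu_eval(x: f64, lower: f64, upper: f64) -> f64 {
    let slope = (lower + upper) / 2.0;
    if x >= 0.0 {
        x
    } else {
        slope * x
    }
}
```

With `lower = 0.125` and `upper = 0.375` the slope is 0.25, so an input of -2.0 maps to -0.5.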

pub fn celu_t(&self, alpha: f64) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.celu(alpha) — celu(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1)).

Mirrors torch.nn.functional.celu(input, alpha=1.0) per torch/nn/functional.py:1874-1894 and Tensor celu(const Tensor& self, const Scalar& alpha) at aten/src/ATen/native/Activation.cpp:540-545. The non-test production consumer wiring for grad_fns::activation::celu per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement (closes #1341 REQ-21).

pub fn softmin_t(&self) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.softmin() — softmin(x) = softmax(-x) along the last axis (fused single-GradFn variant).

Mirrors torch.nn.functional.softmin(input, dim=None, dtype=None) per torch/nn/functional.py:2095-2125. The non-test production consumer wiring for grad_fns::activation::softmin per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement (closes #1341 REQ-22). The composition-route variant (ferrotorch_nn::functional::softmin = neg -> softmax, two GradFn nodes) remains available; this method routes through the fused VJP.
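The identity softmin(x) = softmax(-x) can be demonstrated on a 1-D slice. Note that this sketch follows the composition route (negate, then softmax), not the fused variant this method uses; both helper names are hypothetical.

```rust
// Numerically stable softmax over a 1-D slice.
pub fn softmax(x: &[f64]) -> Vec<f64> {
    let m = x.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = x.iter().map(|&v| (v - m).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

// softmin(x) = softmax(-x): the smallest input receives the largest weight.
pub fn softmin(x: &[f64]) -> Vec<f64> {
    let neg: Vec<f64> = x.iter().map(|&v| -v).collect();
    softmax(&neg)
}
```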

pub fn sum_all(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn mean_all(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn prod_all(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn amin(&self) -> Result<Tensor<T>, FerrotorchError>

Global minimum across all elements. Mirrors torch.amin(self) with no dim argument. Returns a 0-d tensor. On CUDA f32/f64, dispatches to the native PTX reduce_min kernel; on CPU walks the buffer. (#627)

pub fn amax(&self) -> Result<Tensor<T>, FerrotorchError>

Global maximum across all elements. Mirrors torch.amax(self). (#627)

pub fn lu_factor(&self) -> Result<(Tensor<T>, Vec<i32>), FerrotorchError>

LU factorization in cuSOLVER’s packed form: returns (LU_packed, pivots). Mirrors torch.linalg.lu_factor. On CUDA f32/f64, runs natively via cuSOLVER getrf with no host bounce for the matrix; pivots come back as a host Vec<i32> (O(n)). (#604)

pub fn matmul(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>

pub fn mm(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>

pub fn mm_bt(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>

Fused A @ B^T — avoids materializing the transpose of B. A: [M, K], B: [N, K] -> [M, N].
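Because B is stored as [N, K], each output element is the dot product of a row of A with a row of B, so no transposed copy is needed. The sketch below shows this on row-major `f64` slices; the free function `mm_bt` and its explicit dimension arguments are assumptions for illustration.

```rust
// A is [m, k] and B is [n, k], both row-major; returns A @ B^T as a row-major [m, n] vector.
pub fn mm_bt(a: &[f64], b: &[f64], m: usize, k: usize, n: usize) -> Vec<f64> {
    assert_eq!(a.len(), m * k, "A must hold m * k elements");
    assert_eq!(b.len(), n * k, "B must hold n * k elements");
    let mut out = vec![0.0; m * n];
    for i in 0..m {
        let a_row = &a[i * k..(i + 1) * k];
        for j in 0..n {
            // Row j of B is column j of B^T, so both operands are read contiguously.
            let b_row = &b[j * k..(j + 1) * k];
            out[i * n + j] = a_row.iter().zip(b_row).map(|(x, y)| x * y).sum::<f64>();
        }
    }
    out
}
```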

pub fn bmm(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>

pub fn mv_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>

pub fn dot_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>

pub fn t(&self) -> Result<Tensor<T>, FerrotorchError>

pub fn einsum(&self, equation: &str, others: &[&Tensor<T>]) -> Result<Tensor<T>, FerrotorchError>

Einstein summation with this tensor as the first operand.

others contains the remaining input tensors (if any). The equation must include subscripts for self followed by the others.

// Matrix multiply: self @ other
let c = a.einsum("ij,jk->ik", &[&b])?;

// Trace of self
let t = a.einsum("ii->", &[])?;

pub fn sum_dim(&self, dim: i64, keepdim: bool) -> Result<Tensor<T>, FerrotorchError>

pub fn mean_dim(&self, dim: i64, keepdim: bool) -> Result<Tensor<T>, FerrotorchError>

pub fn logsumexp_t(&self) -> Result<Tensor<T>, FerrotorchError>

Differentiable full-reduction logsumexp. Mirrors torch.logsumexp(self): a numerically stable log(sum(exp(self))) reduced to a 0-D scalar. Backward: grad * exp(self - result). Closes #1310.
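The forward reduction and its backward rule can be sketched on a 1-D slice. The max-shift used for stability is the standard technique and an assumption about the implementation; both helper names are hypothetical.

```rust
// Stable log(sum(exp(x))) using the usual max shift.
pub fn logsumexp(x: &[f64]) -> f64 {
    let m = x.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    // Empty or all-infinite input: return the maximum as-is.
    if !m.is_finite() {
        return m;
    }
    m + x.iter().map(|&v| (v - m).exp()).sum::<f64>().ln()
}

// Backward: grad * exp(x - result), i.e. the upstream gradient times softmax(x).
pub fn logsumexp_backward(grad: f64, x: &[f64]) -> Vec<f64> {
    let result = logsumexp(x);
    x.iter().map(|&v| grad * (v - result).exp()).collect()
}
```

Since exp(x - result) is exactly softmax(x), the gradient entries sum to `grad`.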

pub fn logsumexp_dim_t(&self, dim: i64, keepdim: bool) -> Result<Tensor<T>, FerrotorchError>

Differentiable dim-keyed logsumexp. Mirrors torch.logsumexp(self, dim, keepdim).

pub fn argmax_t(&self) -> Result<IntTensor<i64>, FerrotorchError>

Non-differentiable global argmax. Mirrors torch.argmax(self). Returns a 0-D IntTensor with the flat index of the largest element. Closes #1304 (argmax).

Source

pub fn argmax_dim_t( &self, dim: i64, keepdim: bool, ) -> Result<IntTensor<i64>, FerrotorchError>

Non-differentiable dim-keyed argmax.

Source

pub fn argmin_t(&self) -> Result<IntTensor<i64>, FerrotorchError>

Non-differentiable global argmin. Mirrors torch.argmin(self).

Source

pub fn argmin_dim_t( &self, dim: i64, keepdim: bool, ) -> Result<IntTensor<i64>, FerrotorchError>

Non-differentiable dim-keyed argmin.

Source

pub fn var_t(&self, unbiased: bool) -> Result<Tensor<T>, FerrotorchError>

Differentiable full-reduction variance with optional Bessel correction. unbiased=true divides by n-1; false divides by n. Closes #1301 (var).

Source

pub fn std_t(&self, unbiased: bool) -> Result<Tensor<T>, FerrotorchError>

Differentiable full-reduction standard deviation. Closes #1301 (std).

Source

pub fn var_with_correction_t( &self, correction: f64, ) -> Result<Tensor<T>, FerrotorchError>

Differentiable full-reduction variance with arbitrary Bessel correction. Mirrors torch.var(input, correction=...) — denom = max(0, n - correction). Closes #1346 (audit 7cef63f88 REQ-8 full-reduction correction-API gap).
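The denominator rule can be sketched as follows. The helper is hypothetical and operates on a plain slice; correction = 1.0 corresponds to the n-1 divisor and correction = 0.0 to the n divisor described for var_t.

```rust
/// Hypothetical helper: variance with denominator max(0, n - correction).
fn var_with_correction_sketch(x: &[f64], correction: f64) -> f64 {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    let sum_sq: f64 = x.iter().map(|v| (v - mean).powi(2)).sum();
    let denom = (n - correction).max(0.0);
    // A zero denominator follows IEEE division and yields inf or NaN.
    sum_sq / denom
}
```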

Source

pub fn std_with_correction_t( &self, correction: f64, ) -> Result<Tensor<T>, FerrotorchError>

Differentiable full-reduction standard deviation with arbitrary correction. Mirrors torch.std(input, correction=...). Closes #1346 (audit 7cef63f88 REQ-8 full-reduction correction-API gap).

Source

pub fn any_t(&self) -> Result<BoolTensor, FerrotorchError>

Non-differentiable full-reduction any. Returns a 0-D BoolTensor holding true iff any element is non-zero. Closes #1312 (any).

Source

pub fn all_t(&self) -> Result<BoolTensor, FerrotorchError>

Non-differentiable full-reduction all. Closes #1312 (all).

Source

pub fn count_nonzero_t(&self) -> Result<IntTensor<i64>, FerrotorchError>

Non-differentiable full-reduction count_nonzero. Returns a 0-D IntTensor with the count of non-zero elements. Closes #1312 (count_nonzero).

Source

pub fn reshape_t(&self, shape: &[isize]) -> Result<Tensor<T>, FerrotorchError>

Source

pub fn flatten_t(&self) -> Result<Tensor<T>, FerrotorchError>

Source

pub fn squeeze_t(&self, axis: isize) -> Result<Tensor<T>, FerrotorchError>

Source

pub fn unsqueeze_t(&self, axis: isize) -> Result<Tensor<T>, FerrotorchError>

Source

pub fn permute(&self, dims: &[usize]) -> Result<Tensor<T>, FerrotorchError>

Permute tensor dimensions. Like PyTorch’s tensor.permute(dims).

Zero-copy: returns a view with permuted shape and strides. dims must be a valid permutation of 0..ndim.
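To illustrate why a permutation needs no data movement, the sketch below reorders only the shape and stride metadata. Both helpers are hypothetical and stand apart from the crate's types; strides are in element units.

```rust
/// Hypothetical helper: row-major (contiguous) strides for a shape, in elements.
fn contiguous_strides(shape: &[usize]) -> Vec<isize> {
    let mut strides = vec![1isize; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1] as isize;
    }
    strides
}

/// Hypothetical helper: permuted (shape, strides), or None when dims is not
/// a valid permutation of 0..ndim.
fn permute_sketch(
    shape: &[usize],
    strides: &[isize],
    dims: &[usize],
) -> Option<(Vec<usize>, Vec<isize>)> {
    let ndim = shape.len();
    if dims.len() != ndim || strides.len() != ndim {
        return None;
    }
    let mut seen = vec![false; ndim];
    for &d in dims {
        if d >= ndim || seen[d] {
            return None; // out of range or repeated axis
        }
        seen[d] = true;
    }
    let new_shape = dims.iter().map(|&d| shape[d]).collect();
    let new_strides = dims.iter().map(|&d| strides[d]).collect();
    Some((new_shape, new_strides))
}
```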

Source

pub fn transpose( &self, dim0: usize, dim1: usize, ) -> Result<Tensor<T>, FerrotorchError>

Swap two dimensions. Like PyTorch’s tensor.transpose(dim0, dim1).

Zero-copy: returns a view with swapped strides.

Source

pub fn swapaxes( &self, axis0: usize, axis1: usize, ) -> Result<Tensor<T>, FerrotorchError>

Swap two axes. Like PyTorch’s tensor.swapaxes(axis0, axis1) — a literal alias of transpose per upstream aten/src/ATen/native/TensorShape.cpp:4776.

Source

pub fn swapdims( &self, dim0: usize, dim1: usize, ) -> Result<Tensor<T>, FerrotorchError>

Swap two dims. Like PyTorch’s tensor.swapdims(dim0, dim1) — a literal alias of transpose per upstream aten/src/ATen/native/TensorShape.cpp:4784.

Source

pub fn unflatten_t( &self, dim: isize, sizes: &[isize], ) -> Result<Tensor<T>, FerrotorchError>

Reshape a single dimension dim into multiple sizes. Like PyTorch’s tensor.unflatten(dim, sizes) per upstream aten/src/ATen/native/TensorShape.cpp:4350. At most one -1 inference slot is allowed in sizes.

Source

pub fn expand_as_t( &self, other: &Tensor<T>, ) -> Result<Tensor<T>, FerrotorchError>

Broadcast this tensor to the shape of other. Like PyTorch’s tensor.expand_as(other) per upstream aten/src/ATen/native/TensorShape.cpp:1374.

Source

pub fn flip_t(&self, dims: &[isize]) -> Result<Tensor<T>, FerrotorchError>

Reverse element order along each axis in dims. Like PyTorch’s torch.flip(input, dims) per upstream aten/src/ATen/native/TensorTransformations.cpp:36.

Source

pub fn fliplr_t(&self) -> Result<Tensor<T>, FerrotorchError>

Flip left-to-right (along dim 1). Like PyTorch’s torch.fliplr per upstream aten/src/ATen/native/TensorTransformations.cpp:180.

Source

pub fn flipud_t(&self) -> Result<Tensor<T>, FerrotorchError>

Flip up-to-down (along dim 0). Like PyTorch’s torch.flipud per upstream aten/src/ATen/native/TensorTransformations.cpp:186.

Source

pub fn rot90_t( &self, k: i64, dims: &[isize], ) -> Result<Tensor<T>, FerrotorchError>

Rotate 90° k times in the plane spanned by dims. Like PyTorch’s torch.rot90(input, k, dims) per upstream aten/src/ATen/native/TensorTransformations.cpp:134.

Source

pub fn movedim_t( &self, source: &[isize], destination: &[isize], ) -> Result<Tensor<T>, FerrotorchError>

Reposition dims from source to destination. Like PyTorch’s torch.movedim(input, source, destination) per upstream aten/src/ATen/native/TensorShape.cpp:4657.

Source

pub fn moveaxis_t( &self, source: &[isize], destination: &[isize], ) -> Result<Tensor<T>, FerrotorchError>

Reposition axes from source to destination. Like PyTorch’s torch.moveaxis (an alias of movedim) per upstream aten/src/ATen/native/TensorShape.cpp:4768.

Source

pub fn broadcast_to_t( &self, shape: &[usize], ) -> Result<Tensor<T>, FerrotorchError>

Broadcast this tensor to shape. Like PyTorch’s torch.broadcast_to(input, shape) (an alias of expand) per upstream aten/src/ATen/native/TensorShape.cpp:652.

Source

pub fn repeat_t(&self, repeats: &[isize]) -> Result<Tensor<T>, FerrotorchError>

Tile this tensor repeats[i] times along each axis. Like PyTorch’s tensor.repeat(*repeats) per upstream aten/src/ATen/native/TensorShape.cpp:1909.

Source

pub fn tile_t(&self, reps: &[isize]) -> Result<Tensor<T>, FerrotorchError>

NumPy-style tile. Like PyTorch’s torch.tile(input, reps) per upstream aten/src/ATen/native/TensorShape.cpp:1971.

Source

pub fn repeat_interleave_t( &self, repeats: usize, dim: isize, ) -> Result<Tensor<T>, FerrotorchError>

Repeat each element repeats times consecutively along dim. Like PyTorch’s torch.repeat_interleave(input, repeats, dim).

Source

pub fn unbind_t(&self, dim: isize) -> Result<Vec<Tensor<T>>, FerrotorchError>

Split into size(dim) slices with dim removed. Like PyTorch’s torch.unbind(input, dim) per upstream aten/src/ATen/native/TensorShape.cpp:4367.

Source

pub fn tensor_split_t( &self, indices: &[usize], dim: isize, ) -> Result<Vec<Tensor<T>>, FerrotorchError>

Split at the integer section boundaries indices along dim. Like PyTorch’s torch.tensor_split(input, indices, dim) per upstream aten/src/ATen/native/TensorShape.cpp:1167.

Source

pub fn narrow( &self, dim: usize, start: usize, length: usize, ) -> Result<Tensor<T>, FerrotorchError>

Return a narrowed view along dim starting at start with length elements. Like PyTorch’s tensor.narrow(dim, start, length).

Zero-copy: shares storage with the original tensor.
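A narrowed view can be described purely in metadata: the strides stay the same, one extent shrinks, and the storage offset advances. The helper below is a hypothetical sketch of that bookkeeping and assumes the resulting offset is non-negative.

```rust
/// Hypothetical helper: returns (new_shape, new_offset) for a narrow along dim.
/// Strides are unchanged, so they are not returned.
fn narrow_sketch(
    shape: &[usize],
    strides: &[isize],
    offset: usize,
    dim: usize,
    start: usize,
    length: usize,
) -> Option<(Vec<usize>, usize)> {
    if dim >= shape.len() || start + length > shape[dim] {
        return None; // requested range falls outside the dimension
    }
    let mut new_shape = shape.to_vec();
    new_shape[dim] = length;
    // Skipping `start` elements along dim moves the offset by start * stride[dim].
    let new_offset = offset as isize + start as isize * strides[dim];
    Some((new_shape, new_offset as usize))
}
```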

Source

pub fn view(&self, shape: &[i64]) -> Result<Tensor<T>, FerrotorchError>

View tensor with new shape. Like PyTorch’s tensor.view(shape).

At most one dimension may be -1, in which case it is inferred from the remaining dimensions. Requires the tensor to be contiguous.
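The inference of a -1 entry can be sketched as below. The helper and its error strings are hypothetical, and it assumes PyTorch's convention that a shape carries either no -1 or a single -1.

```rust
/// Hypothetical helper: resolve a requested view shape against the element count.
fn infer_view_shape(numel: usize, shape: &[i64]) -> Result<Vec<usize>, String> {
    let mut known: usize = 1;
    let mut infer_at: Option<usize> = None;
    for (i, &d) in shape.iter().enumerate() {
        match d {
            -1 if infer_at.is_none() => infer_at = Some(i),
            -1 => return Err("only one dimension may be -1".to_string()),
            d if d < 0 => return Err(format!("invalid dimension {d}")),
            d => known *= d as usize,
        }
    }
    // The -1 slot is temporarily stored as 0 and filled in below.
    let mut out: Vec<usize> = shape.iter().map(|&d| d.max(0) as usize).collect();
    match infer_at {
        Some(i) => {
            if known == 0 || numel % known != 0 {
                return Err("cannot infer the -1 dimension".to_string());
            }
            out[i] = numel / known;
        }
        None if known != numel => return Err("shape does not match numel".to_string()),
        None => {}
    }
    Ok(out)
}
```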

Source

pub fn contiguous(&self) -> Result<Tensor<T>, FerrotorchError>

Make tensor contiguous — if already contiguous, returns a cheap clone. Otherwise materializes a new contiguous buffer.

Source

pub fn chunk( &self, chunks: usize, dim: usize, ) -> Result<Vec<Tensor<T>>, FerrotorchError>

Split tensor into chunks roughly equal pieces along dim.

Source

pub fn split( &self, split_sizes: &[usize], dim: usize, ) -> Result<Vec<Tensor<T>>, FerrotorchError>

Split tensor into pieces of given sizes along dim.

Source

pub fn fake_quantize_per_tensor_affine_t( &self, scale: f64, zero_point: i64, quant_min: i64, quant_max: i64, ) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.fake_quantize_per_tensor_affine(scale, zero_point, quant_min, quant_max) — per-tensor affine fake quantization with autograd-tracked clipped STE backward.

Mirrors torch.fake_quantize_per_tensor_affine per torch/overrides.py:622 torch.fake_quantize_per_tensor_affine: lambda input, scale, zero_point, quant_min, quant_max: -1 and the upstream implementation at aten/src/ATen/native/quantized/FakeQuantPerTensorAffine.cpp:31-40 Tensor fake_quantize_per_tensor_affine(const Tensor& self, double scale, int64_t zero_point, int64_t quant_min, int64_t quant_max). Backward per tools/autograd/derivatives.yaml:673-674 fake_quantize_per_tensor_affine_cachemask_backward(grad, mask) returning dY * mask where the mask is 1 iff quant_min <= round_ties_even(input/scale) + zero_point <= quant_max.

The non-test production consumer wiring for grad_fns::quantize_grad::fake_quantize_per_tensor_affine per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement for the per-tensor variant (blocker #1238).
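The forward value and the straight-through mask can be sketched per element as follows. All three helpers are hypothetical; the forward formula (clamp the rounded, shifted value, then dequantize) is the standard fake-quantization definition and is assumed here rather than taken from the crate's source.

```rust
/// Round half to even, written out by hand so the sketch has no toolchain requirements.
fn round_ties_even(x: f64) -> f64 {
    if (x - x.trunc()).abs() == 0.5 {
        2.0 * (x / 2.0).round()
    } else {
        x.round()
    }
}

/// Hypothetical helper: returns (fake-quantized value, mask) for one element.
fn fake_quantize_sketch(
    x: f64,
    scale: f64,
    zero_point: i64,
    quant_min: i64,
    quant_max: i64,
) -> (f64, bool) {
    let q = round_ties_even(x / scale) as i64 + zero_point;
    // mask is true iff the unclamped quantized value already lies in range.
    let mask = quant_min <= q && q <= quant_max;
    let clamped = q.clamp(quant_min, quant_max);
    ((clamped - zero_point) as f64 * scale, mask)
}

/// Clipped straight-through estimator: the gradient passes only where mask is true.
fn fake_quantize_backward_sketch(grad: f64, mask: bool) -> f64 {
    if mask { grad } else { 0.0 }
}
```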

Source

pub fn fake_quantize_per_channel_affine_t( &self, scale: &Tensor<T>, zero_point: &IntTensor<i64>, axis: i64, quant_min: i64, quant_max: i64, ) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.fake_quantize_per_channel_affine(scale, zero_point, axis, quant_min, quant_max) — per-channel affine fake quantization with autograd-tracked clipped STE backward.

Mirrors torch.fake_quantize_per_channel_affine per torch/overrides.py:621 torch.fake_quantize_per_channel_affine: lambda input, scale, zero_point, axis, quant_min, quant_max: -1 and the upstream implementation at aten/src/ATen/native/quantized/FakeQuantPerChannelAffine.cpp:32-42 Tensor fake_quantize_per_channel_affine(const Tensor& self, const Tensor& scale, const Tensor& zero_point, int64_t axis, int64_t quant_min, int64_t quant_max). Backward per tools/autograd/derivatives.yaml fake_quantize_per_channel_affine_cachemask_backward(grad, mask) returning dY * mask where the per-channel mask is 1 iff quant_min <= round_ties_even(input/scale[c]) + zero_point[c] <= quant_max for the channel c along axis.

The non-test production consumer wiring for grad_fns::quantize_grad::fake_quantize_per_channel_affine per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement for the per-channel variant (blocker #1239).

Source

pub fn index_fill_t( &self, dim: i64, index: &IntTensor<i64>, value: f64, ) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.index_fill(dim, index, value) — overwrite slices along dim at index positions with the scalar value.

Mirrors torch.index_fill(input, dim, index, value) per the upstream docstring at torch/_torch_docs.py:6563-6567 index_fill(dim, index, value) -> Tensor [...] "Out-of-place version of torch.Tensor.index_fill_" and torch/_tensor_docs.py:2489-2509, which gives the canonical example:

>>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float)
>>> index = torch.tensor([0, 2])
>>> x.index_fill_(1, index, -1)
tensor([[-1.,  2., -1.],
        [-1.,  5., -1.],
        [-1.,  8., -1.]])

Upstream C++ entry at aten/src/ATen/native/TensorAdvancedIndexing.cpp:1979 Tensor index_fill(const Tensor& self, int64_t dim, const Tensor& index, const Scalar& source) { return self.clone(at::MemoryFormat::Preserve).index_fill_(dim, index, source); }. Registration at torch/overrides.py:710 torch.index_fill: lambda input, dim, index, value: -1.

Backward per tools/autograd/derivatives.yaml:884-887: - name: index_fill.int_Scalar(Tensor self, int dim, Tensor index, Scalar value) -> Tensor / self: grad.index_fill(dim, index, 0) / index: non_differentiable / result: self_t.index_fill(dim, index, 0) — gradient is zeroed at every position the fill overwrote (those positions were replaced by a constant and no longer depend on the input).

dim follows PyTorch’s negative-wrapping convention (at::maybe_wrap_dim at TensorAdvancedIndexing.cpp:1919). The index tensor must be 1-D or scalar (upstream TORCH_CHECK(index.dim() <= 1) at :1920). Negative index values are accepted and wrapped per upstream’s index_fill_kernel at aten/src/ATen/native/cpu/IndexKernel.cpp:224-229 (TORCH_CHECK_INDEX(idx >= -self_dim_size && idx < self_dim_size, ...); if (idx < 0) { idx += self_dim_size; }). Indices strictly outside [-dim_size, dim_size) raise IndexOutOfBounds matching upstream’s TORCH_CHECK_INDEX. A 0-d input is accepted: the implementation mirrors upstream’s self.unsqueeze(-1) at TensorAdvancedIndexing.cpp:1917 by treating the scalar as a length-1 1-d tensor for the fill (only dim ∈ {-1, 0} and index ∈ {-1, 0} are in range for that case).

The non-test production consumer wiring for grad_fns::indexing::index_fill per R-DEFER-1: this method is the public, chainable surface that closes the consumer requirement (blocker #1249).
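The canonical example and the index-wrapping rule can be reproduced with a standalone sketch restricted to dim 1 of a row-major matrix. The helper name, the flat-slice layout, and the error string are assumptions for illustration only.

```rust
/// Hypothetical helper: out-of-place index_fill along dim 1 of a row-major
/// [rows, cols] matrix, with negative indices wrapped as described above.
fn index_fill_cols_sketch(
    data: &[f32],
    rows: usize,
    cols: usize,
    index: &[i64],
    value: f32,
) -> Result<Vec<f32>, String> {
    assert_eq!(data.len(), rows * cols);
    let mut out = data.to_vec(); // clone first so the input is left untouched
    let size = cols as i64;
    for &idx in index {
        // Valid indices lie in [-size, size); anything else is an error.
        if idx < -size || idx >= size {
            return Err(format!("index {idx} out of range for dimension of size {size}"));
        }
        let wrapped = if idx < 0 { idx + size } else { idx };
        let col = wrapped as usize;
        for r in 0..rows {
            out[r * cols + col] = value;
        }
    }
    Ok(out)
}
```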

Source

pub fn scatter_reduce_t( &self, dim: i64, index: &[usize], index_shape: &[usize], src: &Tensor<T>, reduce: &str, include_self: bool, ) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.scatter_reduce(dim, index, src, reduce, *, include_self=True) — reduce-mode scatter onto a clone of self. Mirrors upstream Tensor scatter_reduce(...) at aten/src/ATen/native/TensorAdvancedIndexing.cpp:2354 TORCH_IMPL_FUNC(scatter_reduce_two). reduce ∈ {"sum" SHIPPED, "prod", "amax", "amin"}; backward is implemented only for "sum" per tools/autograd/derivatives.yaml:3074-3077 (other modes return a no-grad tensor — the op_db characterization sweep emits only "sum").

Non-test production consumer wiring for grad_fns::indexing::scatter_reduce per R-DEFER-1: this method is the chainable surface. Closes blocker #1245.

Source

pub fn index_add_t( &self, dim: i64, index: &IntTensor<i64>, source: &Tensor<T>, alpha: f64, ) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.index_add(dim, index, source, *, alpha=1) — out = self.clone(); out[..., index[i], ...] += alpha * source[..., i, ...] along dim. Mirrors upstream Tensor index_add(const Tensor& self, int64_t dim, const Tensor& index, const Tensor& source, const Scalar& alpha) at aten/src/ATen/native/TensorAdvancedIndexing.cpp:1153 TORCH_IMPL_FUNC(index_add_cpu_out). Backward per tools/autograd/derivatives.yaml:862-869 self: grad / source: maybe_multiply(grad.index_select(dim, index).expand_as(source), alpha).

Non-test production consumer wiring for grad_fns::indexing::index_add per R-DEFER-1: this method is the chainable surface. Closes blocker #1247.

Source

pub fn index_copy_t( &self, dim: i64, index: &IntTensor<i64>, source: &Tensor<T>, ) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.index_copy(dim, index, source) — out = self.clone(); out[..., index[i], ...] = source[..., i, ...] along dim. Mirrors upstream Tensor index_copy(...) at aten/src/ATen/native/TensorAdvancedIndexing.cpp:1082 TORCH_IMPL_FUNC(index_copy_out). Backward per tools/autograd/derivatives.yaml:875-883 self: grad.index_fill(dim, index, 0) / source: grad.index_select(dim, index).expand_as(source).

Non-test production consumer wiring for grad_fns::indexing::index_copy per R-DEFER-1: this method is the chainable surface. Closes blocker #1248.

Source

pub fn masked_scatter_t( &self, mask: &BoolTensor, source: &Tensor<T>, ) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.masked_scatter(mask, source) — copy elements from source into a clone of self at positions where mask is true, in C-order. Mirrors upstream Tensor masked_scatter(const Tensor& self, const Tensor& mask, const Tensor& source) at aten/src/ATen/native/TensorAdvancedIndexing.cpp:2402-2409. Backward per tools/autograd/derivatives.yaml:1105-1108 self: grad.masked_fill(mask, 0) / source: masked_scatter_backward(...).

Non-test production consumer wiring for grad_fns::indexing::masked_scatter per R-DEFER-1: this method is the chainable surface. Closes blocker #1252.

Source

pub fn take_t( &self, index: &IntTensor<i64>, ) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.take(index) — out[i] = self.view(-1)[index[i]], a flat-index gather producing a tensor of shape index.shape(). Mirrors upstream Tensor take(const Tensor& self, const Tensor& index) at aten/src/ATen/native/TensorAdvancedIndexing.cpp:1067-1071. Backward per tools/autograd/derivatives.yaml:1766-1769 self: take_backward(grad, self, index) — scatter-add grad into a zeros buffer at the flat index positions.

Non-test production consumer wiring for grad_fns::indexing::take per R-DEFER-1: this method is the chainable surface. Closes blocker #1253.
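The flat-index gather and its scatter-add backward can be sketched on plain slices. Both helpers are hypothetical; they assume non-negative, in-range indices and panic otherwise.

```rust
/// Hypothetical helper: out[i] = flat[index[i]] over a flattened tensor.
fn take_sketch(flat: &[f32], index: &[usize]) -> Vec<f32> {
    index.iter().map(|&i| flat[i]).collect()
}

/// Backward: scatter-add grad into a zero buffer at the flat index positions.
fn take_backward_sketch(grad: &[f32], index: &[usize], numel: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; numel];
    for (&g, &i) in grad.iter().zip(index) {
        out[i] += g; // repeated indices accumulate
    }
    out
}
```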

Source

pub fn put_t( &self, index: &IntTensor<i64>, source: &Tensor<T>, accumulate: bool, ) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.put(index, source, accumulate=False) — flat-index scatter into a clone of self: out.view(-1)[index[i]] = source[i] (or += source[i] when accumulate=true). Mirrors upstream Tensor put(const Tensor& self, const Tensor& index, const Tensor& source, const bool accumulate) at aten/src/ATen/native/TensorAdvancedIndexing.cpp:928-934. Backward per tools/autograd/derivatives.yaml:1421-1424.

Non-test production consumer wiring for grad_fns::indexing::put per R-DEFER-1: this method is the chainable surface. Closes blocker #1254.
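A standalone sketch of the two modes follows. The helper is hypothetical and assumes non-negative, in-range indices; how duplicate indices behave without accumulate is not specified above, so the example avoids that case.

```rust
/// Hypothetical helper: flat-index scatter into a copy of `flat`.
fn put_sketch(flat: &[f32], index: &[usize], source: &[f32], accumulate: bool) -> Vec<f32> {
    let mut out = flat.to_vec();
    for (&i, &s) in index.iter().zip(source) {
        if accumulate {
            out[i] += s; // add into the existing value
        } else {
            out[i] = s; // overwrite the existing value
        }
    }
    out
}
```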

Source

pub fn where_t( &self, condition: &[bool], other: &Tensor<T>, ) -> Result<Tensor<T>, FerrotorchError>

torch.where(condition, self, other) — pointwise ternary selection taking a host &[bool] mask. Returns a tensor where each element is self[i] if condition[i] is true, else other[i]. Differentiable — a WhereBackward node is attached when grad tracking is enabled on either input.

Mirrors torch.where(condition, input, other) per torch/_torch_docs.py:13089 and the upstream impl macro at aten/src/ATen/native/TensorCompare.cpp:646 TORCH_IMPL_FUNC(where_out) — the self-vs-other dispatch shape.

Non-test production consumer wiring for grad_fns::comparison::where_ per R-DEFER-1 (closes blocker #1295): this method is the public, chainable surface that closes the consumer requirement. The boolean-tensor variant is where_bt_t.
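The selection rule is simple enough to state as a standalone sketch over slices. The helper name is hypothetical, and the three inputs are assumed to have equal length.

```rust
/// Hypothetical helper: out[i] = a[i] when condition[i] is true, else b[i].
fn where_sketch(condition: &[bool], a: &[f32], b: &[f32]) -> Vec<f32> {
    assert!(condition.len() == a.len() && a.len() == b.len());
    condition
        .iter()
        .zip(a.iter().zip(b))
        .map(|(&c, (&x, &y))| if c { x } else { y })
        .collect()
}
```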

Source

pub fn where_bt_t( &self, condition: &BoolTensor, other: &Tensor<T>, ) -> Result<Tensor<T>, FerrotorchError>

torch.where(condition, self, other) — BoolTensor overload.

Pointwise ternary selection where condition is a first-class BoolTensor. condition must have self.numel() elements, and self.shape() must equal other.shape(). Delegates to grad_fns::comparison::where_bt, which validates the shapes, materialises the host mask, and dispatches to where_ for the autograd-aware forward.

Mirrors torch.where(cond, x, y) for cond: BoolTensor per torch/_torch_docs.py:13089.

Non-test production consumer wiring for grad_fns::comparison::where_bt per R-DEFER-1 (closes blocker #1297): this method is the public, chainable surface that closes the consumer requirement.

Source

pub fn scatter_value_t( &self, dim: i64, index: &[usize], index_shape: &[usize], value: T, ) -> Result<Tensor<T>, FerrotorchError>

torch.Tensor.scatter_(dim, index, value) (scalar-src overload) — scatter a single scalar value into a clone of self at the positions named by index along dim. Mirrors the upstream scalar overload Tensor& scatter_(int64_t dim, const Tensor& index, const Scalar& value) at aten/src/ATen/native/TensorAdvancedIndexing.cpp:2278 — the scatter.value dispatch arm that op_db emits as a distinct sample family alongside the tensor-src overload.

Equivalent to self.scatter_(dim, index, full_like(index, value)) but avoids the temporary src allocation. No autograd is attached because the scalar value is not a differentiable input.

Non-test production consumer wiring for crate::ops::indexing::scatter_value per R-DEFER-1 (closes blocker #1258): this method is the public, chainable surface that closes the consumer requirement.

Source

pub fn size(&self) -> &[usize]

Alias for shape(). Returns the tensor dimensions like PyTorch’s Tensor.size().

Source

pub fn dim(&self) -> usize

Alias for ndim(). Returns the number of dimensions like PyTorch’s Tensor.dim().

Source

pub fn print(&self) -> &Tensor<T>

Log the tensor’s Display form and return self for chaining.

Emits a tracing::info! event on target ferrotorch::tensor. Behaviour change vs. earlier versions: this no longer writes directly to stdout — callers must install a tracing subscriber (e.g. tracing_subscriber) to see the output. Library code should not write to stdout; downstream consumers control logging policy.

Source

pub fn argmax( &self, dim: Option<isize>, ) -> Result<IntTensor<i64>, FerrotorchError>

Index of the maximum value (PyTorch torch.argmax), as IntTensor<i64>.

dim = None flattens and returns a 0-d index. dim = Some(d) reduces along d (negative indices allowed). Ties resolve to the FIRST (lowest) index. GPU-resident result when self is on CUDA.
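The tie-breaking rule for the flattened case can be sketched as below. The helper is hypothetical and NaN handling is deliberately not modeled.

```rust
/// Hypothetical helper: index of the maximum, keeping the first index on ties.
fn argmax_first_sketch(x: &[f32]) -> Option<usize> {
    let mut best: Option<usize> = None;
    for (i, &v) in x.iter().enumerate() {
        match best {
            // Only a strictly larger value replaces the current best.
            Some(b) if v <= x[b] => {}
            _ => best = Some(i),
        }
    }
    best
}
```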

Source

pub fn argmin( &self, dim: Option<isize>, ) -> Result<IntTensor<i64>, FerrotorchError>

Index of the minimum value (PyTorch torch.argmin). See Self::argmax.

Source

pub fn index_select<I>( &self, dim: isize, indices: &IntTensor<I>, ) -> Result<Tensor<T>, FerrotorchError>
where I: IntElement,

index_select(dim, indices) (PyTorch torch.index_select) using a GPU-resident-or-CPU IntTensor index. The indices tensor must be 1-D. Output keeps self’s dtype; shape is self.shape with shape[dim] replaced by indices.numel(). On CUDA, self and indices must be on the same device; the result stays GPU-resident.

Source

pub fn gather<I>( &self, dim: isize, index: &IntTensor<I>, ) -> Result<Tensor<T>, FerrotorchError>
where I: IntElement,

gather(dim, index) (PyTorch torch.gather) using a GPU-resident-or-CPU IntTensor index. index must have the same ndim as self; output has index’s shape and self’s dtype. On CUDA the result stays resident.

Source

pub fn to_int<I>(&self) -> Result<IntTensor<I>, FerrotorchError>
where I: IntElement,

Cast this float tensor to IntTensor (PyTorch .to(int)): truncate toward zero. GPU-resident result when self is on CUDA.

Source

pub fn as_strided( &self, size: &[usize], stride: &[isize], storage_offset: Option<usize>, ) -> Result<Tensor<T>, FerrotorchError>

Build a zero-copy view with the given shape, strides (element units), and storage offset. If storage_offset is None, the input’s existing offset is used.

Equivalent to torch.Tensor.as_strided(size, stride, storage_offset). Works on any device — no data movement.

Validates that every reachable offset stays inside the underlying storage. Does not reject overlapping views: those are useful for constructing Toeplitz matrices, sliding windows, broadcast views, etc. As in torch, in-place writes against an overlapping view have undefined behaviour.
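The bounds rule and an overlapping sliding-window view can be illustrated with a standalone sketch. Both helpers are hypothetical and work on a plain slice standing in for the storage; the second one materialises the view only so that its contents can be inspected.

```rust
/// Hypothetical helper: true when every offset reachable through
/// (size, stride, offset) lies inside a storage of `storage_len` elements.
fn as_strided_in_bounds(size: &[usize], stride: &[isize], offset: usize, storage_len: usize) -> bool {
    if size.len() != stride.len() {
        return false;
    }
    if size.iter().any(|&s| s == 0) {
        return true; // an empty view reads nothing
    }
    // Track the smallest and largest offsets the view can touch.
    let (mut lo, mut hi) = (offset as isize, offset as isize);
    for (&s, &st) in size.iter().zip(stride) {
        let span = (s as isize - 1) * st;
        if span >= 0 { hi += span } else { lo += span }
    }
    lo >= 0 && (hi as usize) < storage_len
}

/// Hypothetical helper: reads a 2-D strided view into a fresh Vec in C-order.
fn read_strided_2d(storage: &[f32], size: [usize; 2], stride: [isize; 2], offset: usize) -> Vec<f32> {
    let mut out = Vec::with_capacity(size[0] * size[1]);
    for i in 0..size[0] {
        for j in 0..size[1] {
            let pos = offset as isize + i as isize * stride[0] + j as isize * stride[1];
            out.push(storage[pos as usize]);
        }
    }
    out
}
```

With size [3, 3] and stride [1, 1] over five elements, each row is a length-3 window shifted by one element, so rows overlap in storage.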

Source

pub fn as_strided_copy( &self, size: &[usize], stride: &[isize], storage_offset: Option<usize>, ) -> Result<Tensor<T>, FerrotorchError>

Materialised strided copy: returns a new contiguous tensor whose values are the elements that as_strided(size, stride, offset) would read.

On CUDA tensors this dispatches to the existing strided_copy_f32 / strided_copy_f64 GPU kernels (no host bounce). On CPU it walks the multi-index. On other devices (e.g. XPU) it returns FerrotorchError::NotImplementedOnCuda — install a kernel before using this on those devices.

Source

pub fn as_strided_scatter( &self, src: &Tensor<T>, size: &[usize], stride: &[isize], storage_offset: Option<usize>, ) -> Result<Tensor<T>, FerrotorchError>

Inverse of as_strided: return a copy of self with src written into the strided positions described by (size, stride, offset). Positions outside that view retain self’s values.

Equivalent to torch.as_strided_scatter. The CUDA path dispatches through the GPU backend (via the strided_copy + strided_scatter kernels) — no host bounce.

Source

pub fn view_reshape( &self, new_shape: Vec<usize>, ) -> Result<Tensor<T>, FerrotorchError>

Create a view of this tensor with a different shape, sharing the same underlying storage. Zero-copy — no data movement.

The new shape must have the same total number of elements. Non-contiguous tensors are materialized first (requires a copy).

Source

pub fn view_operation( &self, new_shape: Vec<usize>, grad_fn: Arc<dyn GradFn<T>>, ) -> Result<Tensor<T>, FerrotorchError>

Create a zero-copy view with a grad_fn attached. Used for shape ops (squeeze, unsqueeze, reshape, etc.) that don’t change data layout. Shares the underlying storage with the source tensor.

Non-contiguous tensors are materialized first (requires a copy).

Source

pub fn stride_view( &self, new_shape: Vec<usize>, new_strides: Vec<isize>, new_offset: usize, ) -> Tensor<T>

Create a zero-copy view with explicit shape, strides, and offset.

This is the lowest-level view constructor — used by permute, transpose, narrow, and other operations that change the logical layout without copying data. The caller is responsible for ensuring that the given shape + strides + offset are valid for the underlying storage.

Source

pub fn stride_view_operation( &self, new_shape: Vec<usize>, new_strides: Vec<isize>, new_offset: usize, grad_fn: Arc<dyn GradFn<T>>, ) -> Tensor<T>

Create a zero-copy view with explicit shape, strides, and offset, with an attached gradient function for autograd.

Source

pub fn id(&self) -> TensorId

Source

pub fn shape(&self) -> &[usize]

Source

pub fn ndim(&self) -> usize

Source

pub fn numel(&self) -> usize

Source

pub fn strides(&self) -> &[isize]

Source

pub fn storage_offset(&self) -> usize

Offset (in number of elements) into the underlying storage.

Non-zero for views created by narrow, select, or other subregion ops.

Source

pub fn storage_len(&self) -> usize

Number of elements in the underlying storage buffer.

May be larger than numel() for views (transpose, narrow, as_strided, etc.) that address only a subset of the storage. Used by stride-manipulation ops (as_strided, as_strided_copy) for bounds validation.

Source

pub fn storage(&self) -> &TensorStorage<T>

Borrow the underlying TensorStorage. Used by ops that need access to the GPU buffer handle or to share storage Arc-wise.

Source

pub fn device(&self) -> Device

Source

pub fn requires_grad(&self) -> bool

Source

pub fn is_leaf(&self) -> bool

Source

pub fn grad_fn(&self) -> Option<&Arc<dyn GradFn<T>>>

Source

pub fn register_hook<F>(&self, func: F) -> Result<HookHandle, FerrotorchError>
where F: Fn(&Tensor<T>) -> Option<Tensor<T>> + Send + Sync + 'static,

Register a gradient hook on this tensor.

The hook is called during backward whenever a gradient is computed for this tensor. It receives the gradient and may return Some(new_grad) to replace it, or None to keep the original.

Returns a HookHandle that can be used to remove the hook later via remove_hook.
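The replace-or-keep contract of a gradient hook can be sketched without the crate's types. The alias and function below are hypothetical; in particular, running several hooks in registration order with each one seeing the previous result is an assumption, not something stated above.

```rust
/// Hypothetical stand-in for a gradient hook over a flat gradient buffer.
type GradHook = Box<dyn Fn(&[f32]) -> Option<Vec<f32>> + Send + Sync>;

/// Applies each hook in turn: Some(new_grad) replaces the gradient,
/// None keeps the gradient unchanged.
fn run_hooks_sketch(hooks: &[GradHook], grad: Vec<f32>) -> Vec<f32> {
    hooks.iter().fold(grad, |g, hook| hook(&g[..]).unwrap_or(g))
}
```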

Source

pub fn register_post_accumulate_grad_hook<F>( &self, func: F, ) -> Result<HookHandle, FerrotorchError>
where F: Fn(&Tensor<T>) + Send + Sync + 'static,

Register a post-accumulate-grad hook on this tensor.

The hook is called after gradient accumulation completes on a leaf tensor. It receives a reference to the tensor itself (so the hook can read .grad()). Cannot modify the gradient — use register_hook for that.

Source

pub fn remove_hook(&self, handle: HookHandle) -> Result<bool, FerrotorchError>

Remove a previously registered hook by its handle.

Returns true if the hook was found and removed.

Source

pub fn grad(&self) -> Result<Option<Tensor<T>>, FerrotorchError>

Read the accumulated gradient. Returns None if no gradient has been computed yet.

Source

pub fn set_grad(&self, grad: Option<Tensor<T>>) -> Result<(), FerrotorchError>

Set or replace the accumulated gradient.

Source

pub fn zero_grad(&self) -> Result<(), FerrotorchError>

Zero out the gradient of this tensor.

Equivalent to self.set_grad(None). Typically called before each training iteration to prevent gradient accumulation across steps.

Source

pub fn data(&self) -> Result<&[T], FerrotorchError>

Borrow the underlying data as a flat slice.

Returns Err(GpuTensorNotAccessible) if the tensor is on a GPU. Call .cpu() first to transfer it.

Returns Err if the tensor is not contiguous — the raw storage slice would not correspond to the logical element order. Use data_vec() or call .contiguous() first.

Source

pub fn data_ref(&self) -> Result<&[T], FerrotorchError>

Borrow the underlying data as a flat slice (CPU-only alias for data()).

Identical to data() — returns a zero-copy &[T] reference to the tensor’s storage. Returns Err(GpuTensorNotAccessible) if the tensor lives on a GPU; call .cpu() first to transfer.

This alias exists for call-site clarity: use data_ref() when you want to emphasise that no copy is made, vs data_vec() which always copies.

Source

pub fn data_vec(&self) -> Result<Vec<T>, FerrotorchError>

Get tensor data as an owned Vec<T>, transparently transferring from GPU if needed and correctly handling non-contiguous tensors.

For contiguous CPU tensors this copies the slice. For non-contiguous CPU tensors it gathers elements in logical (C-order) sequence. For GPU tensors it performs a device-to-host transfer.

Source

pub fn to(&self, device: Device) -> Result<Tensor<T>, FerrotorchError>

Move this tensor to a device, returning a new tensor.

If the tensor is already on the target device, returns a cheap clone (shared Arc storage).

Source

pub fn to_pinned(&self, device: Device) -> Result<Tensor<T>, FerrotorchError>

Like to, but uses pinned (page-locked) host memory for the CPU→CUDA transfer when applicable.

On CPU→CUDA, allocates a temporary pinned host buffer, copies the tensor data into it, and uses DMA to transfer to the device. This is roughly 2x faster than the regular to() path for large buffers because it avoids one extra page-locked staging copy inside the CUDA driver. For small buffers (< ~64KB) the pinning overhead may outweigh the gain — measure before defaulting to this path.

Behaves identically to to for CPU→CPU, CUDA→CPU, and cross-GPU paths (which all bypass pinned memory).

Used by ferrotorch_data::DataLoader when pin_memory(true) is set alongside a target device.

Source

pub fn cuda(&self) -> Result<Tensor<T>, FerrotorchError>

Move to CUDA device 0.

Source

pub fn cpu(&self) -> Result<Tensor<T>, FerrotorchError>

Move to CPU.

Source

pub fn to_dtype<U>(&self) -> Result<Tensor<U>, FerrotorchError>
where U: Float,

Cast this tensor to a different float dtype, preserving device + shape.

U: Float — any of f32 / f64 / bf16 / f16. PyTorch parity: tensor.to(dtype) / tensor.to(torch.float32).

Same dtype (T == U): zero-copy Arc-shared clone.
CPU: per-element cast via crate::numeric_cast::cast (fallible — returns Err(InvalidArgument) if a finite source value saturates to ±∞ in a narrower target, per issue #815).
GPU: dispatched through crate::gpu_dispatch::GpuBackend::cast_f_to_f; stays GPU-resident. Initial implementation covers bf16 ↔ f32 (issue #29); other float pairs return Err until the follow-up issue lands.
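The fallible CPU cast described above can be sketched for the f64 to f32 pair. The helper and its error string are hypothetical and do not reproduce crate::numeric_cast::cast; the sketch only shows the rule that a finite source must not become infinite in the narrower type.

```rust
/// Hypothetical helper: narrowing cast that rejects finite values which
/// would saturate to an infinity in f32.
fn cast_f64_to_f32_checked(x: f64) -> Result<f32, String> {
    let y = x as f32;
    if x.is_finite() && y.is_infinite() {
        return Err(format!("{x} does not fit in f32"));
    }
    Ok(y) // infinities and NaN in the source pass through unchanged in kind
}
```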

§Autograd

The returned tensor has requires_grad = false regardless of self. A CastBackward grad_fn that propagates gradients through the cast is follow-up work tracked alongside the remaining float-pair kernels.

Source

pub fn is_cpu(&self) -> bool

Returns true if this tensor is on CPU.

Source

pub fn is_meta(&self) -> bool

Returns true if this tensor is on the meta device (no backing data).

Source

pub fn meta_fill_value(&self) -> Option<&T>

Recorded fill value for a meta tensor, if it was constructed with one (e.g. via crate::creation::full_meta). Returns None for any non-meta tensor and for meta tensors created without a fill (e.g. via crate::creation::zeros_meta / crate::creation::ones_meta / crate::creation::meta_like).

Meta tensors carry no element-wise data, so the per-element fill cannot be read back — this is metadata only — but it lets callers distinguish a full_meta(shape, 2.5) tensor from a full_meta(shape, 0.0) tensor (or from a plain zeros_meta(shape)), which closes the “_value is silently ignored” gap.

Source

pub fn is_cuda(&self) -> bool

Returns true if this tensor is on a CUDA GPU.

Source

pub fn is_xpu(&self) -> bool

Returns true if this tensor is on an XPU (CubeCL / Intel GPU) device.

Source

pub fn gpu_handle(&self) -> Result<&GpuBufferHandle, FerrotorchError>

Get the GPU buffer handle. Returns Err for CPU tensors.

Source

pub fn masked_fill( &self, mask: &BoolTensor, value: T, ) -> Result<Tensor<T>, FerrotorchError>

masked_fill(mask, value) — out[i] = mask[i] ? value : self[i], returning a new tensor of the same shape (mask convention “true → fill”, matching torch.Tensor.masked_fill). mask must have the same numel as self and live on the same device.

When both self and mask are CUDA-resident, the fill runs on the GPU (real PTX kernel dispatched on self’s dtype) and the result stays GPU-resident — NO host crossing (crosslink #1185 Phase 3c). Otherwise it takes the CPU path. Carries a MaskedFillBackward grad_fn when grad is required.
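
The element-wise rule out[i] = mask[i] ? value : self[i] can be sketched on flat slices. The free function below is a hypothetical CPU-only illustration over &[f32] and &[bool]; it does not use Tensor or BoolTensor and attaches no grad_fn.

```rust
/// Slice-level sketch of the "true -> fill" convention described above.
/// Returns an error when the mask length differs from the data length.
fn masked_fill(data: &[f32], mask: &[bool], value: f32) -> Result<Vec<f32>, String> {
    if data.len() != mask.len() {
        return Err(format!(
            "mask has {} elements, expected {}",
            mask.len(),
            data.len()
        ));
    }
    // Keep the original element where the mask is false, write `value` where it is true.
    Ok(data
        .iter()
        .zip(mask)
        .map(|(&x, &m)| if m { value } else { x })
        .collect())
}
```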

Source

pub fn masked_select( &self, mask: &BoolTensor, ) -> Result<Tensor<T>, FerrotorchError>

masked_select(mask) — return a 1-D tensor of the elements of self where mask is true, in flat C-order (torch.Tensor.masked_select).

On CUDA (self + mask resident, same device) this runs a GPU stream compaction; the result stays GPU-resident. The single output-length integer crosses to the host to size the data-dependent output (the result shape, not a data round-trip — PyTorch parity).
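
As a rough CPU-side picture of the selection semantics, the sketch below filters a flat slice by a same-length mask and keeps the original order. The function is hypothetical and works on plain slices; the output length is only known after the mask has been scanned, which is why the output size is data-dependent.

```rust
/// Slice-level sketch of masked_select: keep the elements whose mask entry
/// is true, in flat order. The result length equals the number of true entries.
fn masked_select(data: &[f32], mask: &[bool]) -> Result<Vec<f32>, String> {
    if data.len() != mask.len() {
        return Err(format!(
            "mask has {} elements, expected {}",
            mask.len(),
            data.len()
        ));
    }
    Ok(data
        .iter()
        .zip(mask)
        .filter_map(|(&x, &m)| if m { Some(x) } else { None })
        .collect())
}
```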

Source

pub unsafe fn data_mut(&self) -> Result<&mut [T], FerrotorchError>

Borrow the underlying data as a mutable flat slice.

§Safety

The caller must ensure exclusive access to this tensor’s storage. No other references to this tensor’s data may exist concurrently. Optimizer step() methods satisfy this requirement: they run inside no_grad() (no graph is being built) and hold &mut self (exclusive access to the optimizer’s parameter copies).

Source

pub unsafe fn update_data(&self, new_data: &[T]) -> Result<(), FerrotorchError>

Write new_data into this tensor’s storage, preserving tensor identity.

CPU: copies data into the existing storage Vec.
GPU: uploads data to GPU and replaces the storage buffer.

This is the device-transparent alternative to data_mut() for optimizer step implementations.

§Safety

Same requirements as data_mut() — caller must ensure exclusive access. No concurrent reads or writes to this tensor’s storage may exist. Optimizer step() methods satisfy this by running inside no_grad() with &mut self.

Source

pub unsafe fn update_storage_and_shape( &self, new_storage: TensorStorage<T>, new_shape: Vec<usize>, ) -> Result<(), FerrotorchError>

Replace this tensor’s storage AND shape/strides in-place, matching PyTorch’s Tensor.resize_(new_shape) + storage swap.

This is the rare case where both the underlying buffer and the shape metadata in TensorInner need to change in lockstep — used by the out= write path of torch.add(a, b, *, out=out) when out.shape() != broadcast_shape (PyTorch silently resizes out, with a deprecation warning, in current versions). The new strides are computed as C-contiguous for new_shape.
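
For reference, C-contiguous strides for a shape can be derived by walking the dimensions from innermost to outermost. The helpers below are hypothetical and measure strides in elements; they sketch the computation and the numel check from the safety section, not the crate's internal code.

```rust
/// Row-major (C-order) strides, in elements, for `shape`.
/// The innermost dimension gets stride 1; each outer stride is the
/// product of all inner dimension sizes.
fn c_contiguous_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![0usize; shape.len()];
    let mut acc = 1usize;
    for (i, &dim) in shape.iter().enumerate().rev() {
        strides[i] = acc;
        acc *= dim;
    }
    strides
}

/// Check that a storage of `storage_len` elements matches `shape`.
fn check_numel(storage_len: usize, shape: &[usize]) -> Result<(), String> {
    let expected: usize = shape.iter().product();
    if storage_len == expected {
        Ok(())
    } else {
        Err(format!("storage has {storage_len} elements, shape needs {expected}"))
    }
}
```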

§Safety

Same as update_storage: caller must ensure exclusive access. The new storage’s numel must equal new_shape.iter().product(). The new storage must reside on the same device as the tensor.

The caller must also guarantee that no other Tensor clone is concurrently observing this tensor’s shape — Tensor is Arc<TensorInner>-shared, and a resize changes the observable shape for every clone. This is the same invariant update_storage already implicitly relies on for buffer-length changes; the out=-style call sites this method exists for own a unique &Tensor for the duration of the write.

Source

pub unsafe fn update_storage( &self, new_storage: TensorStorage<T>, ) -> Result<(), FerrotorchError>

Replace this tensor’s storage with a new TensorStorage in-place.

Used by GPU-native optimizer steps that compute the updated parameter entirely on-device and need to swap the underlying buffer without a CPU round-trip.

§Safety

Same as update_data: caller must ensure exclusive access. The new storage must have the same number of elements as the tensor and reside on the same device.

Source

pub fn with_gpu_handle_mut<R>( &self, f: impl FnOnce(&mut GpuBufferHandle) -> Result<R, FerrotorchError>, ) -> Result<R, FerrotorchError>

Run f with mutable access to this tensor’s underlying GpuBufferHandle, in-place.

This is a safe wrapper for the optimizer fast-path that fuses the parameter update directly into a GPU kernel: the kernel needs an &mut GpuBufferHandle aliased into the param tensor’s storage, but Tensor is Arc-shared so a naïve &self -> &mut Storage route requires unsafe at every call site. By centralizing the Arc::as_ptr -> *mut TensorStorage<T> cast inside this single method and returning Err(FerrotorchError::DeviceUnavailable) for non-GPU storage, callers do not need to write any unsafe of their own.

§Errors

Returns FerrotorchError::DeviceUnavailable when this tensor’s storage is not GPU-resident.

§Safety contract (encapsulated)

This method is safe because the caller cannot violate any invariant exposed through it: the closure receives a fresh &mut GpuBufferHandle whose lifetime is bounded by the body of this method, and no other reference to the storage can be created concurrently from within the closure (the closure is FnOnce). The only remaining hazard is concurrent access to the same Arc from another thread. Tensor is not Sync for storage mutation purposes, and the optimizer step that drives this method holds &mut self on the outer Optimizer for the whole step, so no other handle can be observing or mutating this storage during the call. This is the same exclusive-access guarantee that update_data and update_storage depend on; this method simply lets the optimizer mutate the GPU handle in place rather than swap the entire TensorStorage.
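
The closure-scoped access pattern can be sketched with standard-library types only. Everything below is hypothetical (SketchTensor, FakeGpuHandle, SketchError), and it deliberately swaps in RefCell with its runtime borrow check for the pointer cast the documentation describes, so it shows the shape of the API rather than the real mechanism.

```rust
use std::cell::RefCell;

#[derive(Debug, PartialEq)]
enum SketchError {
    DeviceUnavailable,
}

/// Stand-in for a device buffer handle.
struct FakeGpuHandle {
    bytes: Vec<u8>,
}

enum Storage {
    Cpu(Vec<f32>),
    Gpu(FakeGpuHandle),
}

struct SketchTensor {
    // RefCell gives interior mutability through `&self` in this sketch.
    storage: RefCell<Storage>,
}

impl SketchTensor {
    /// Run `f` with mutable access to the handle; the borrow cannot
    /// escape because it ends when this method returns.
    fn with_gpu_handle_mut<R>(
        &self,
        f: impl FnOnce(&mut FakeGpuHandle) -> Result<R, SketchError>,
    ) -> Result<R, SketchError> {
        match &mut *self.storage.borrow_mut() {
            Storage::Gpu(handle) => f(handle),
            Storage::Cpu(_) => Err(SketchError::DeviceUnavailable),
        }
    }
}
```

The design point carried over from the text is that the mutable borrow is handed to a closure instead of being returned, so callers never hold a long-lived &mut into shared storage.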

Source

pub fn detach(&self) -> Tensor<T>

Detach this tensor from the computation graph, returning a new tensor that shares storage but has no grad_fn.

Source

pub fn is_contiguous(&self) -> bool

Whether this tensor is contiguous in memory (C-order).

Dimensions with size 1 can have any stride without affecting contiguity, since they contribute no index offset.
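
A minimal sketch of such a check, assuming strides are measured in elements and that shape and strides are given as plain slices (the function name and signature are hypothetical):

```rust
/// Returns true when `strides` describe a C-order layout for `shape`.
/// Dimensions of size 1 are skipped: their stride never contributes
/// to an element offset, so any value is accepted.
fn is_c_contiguous(shape: &[usize], strides: &[usize]) -> bool {
    if shape.len() != strides.len() {
        return false;
    }
    let mut expected = 1usize;
    // Walk from the innermost dimension outward.
    for (&dim, &stride) in shape.iter().zip(strides).rev() {
        if dim != 1 && stride != expected {
            return false;
        }
        expected *= dim;
    }
    true
}
```

For example, shape [2, 1, 3] with strides [3, 99, 1] passes, because the middle dimension has size 1 and its stride is ignored.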

Source

pub fn is_contiguous_for(&self, format: MemoryFormat) -> bool

Check whether this tensor is contiguous in a specific memory format.

MemoryFormat::Contiguous — standard C-order (NCHW for 4D).
MemoryFormat::ChannelsLast — NHWC stride pattern for 4D tensors.
MemoryFormat::ChannelsLast3d — NDHWC stride pattern for 5D tensors.

Dimensions of size 1 are treated as matching any stride, consistent with PyTorch behaviour.

[CL-309] WU-05: channels-last memory format support
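
To make the NHWC stride pattern concrete, the sketch below computes channels-last strides for a 4D shape that is still indexed as [N, C, H, W]. The helpers are hypothetical and count strides in elements.

```rust
/// Channels-last (NHWC) strides for a tensor whose logical shape is [N, C, H, W].
/// The channel dimension becomes the fastest-varying one in memory.
fn channels_last_strides(shape: [usize; 4]) -> [usize; 4] {
    let [_n, c, h, w] = shape;
    [h * w * c, 1, w * c, c]
}

/// Flat element offset of a logical [n, c, h, w] index under `strides`.
fn offset(index: [usize; 4], strides: [usize; 4]) -> usize {
    index.iter().zip(strides.iter()).map(|(i, s)| i * s).sum()
}
```

With shape [2, 3, 4, 5] this gives strides [60, 1, 15, 3], so stepping along the channel axis moves one element while stepping along the width axis moves C = 3 elements.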

Source

pub fn to_memory_format( &self, format: MemoryFormat, ) -> Result<Tensor<T>, FerrotorchError>

Rearrange this tensor to the target memory format.

If the tensor is already contiguous in the target format, returns a cheap clone (shared storage). Otherwise, physically rearranges the data and returns a new tensor with the correct strides.

The shape is never changed — only the strides (and possibly the underlying data order) are altered.

[CL-309] WU-05: channels-last memory format support
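
The physical rearrangement can be pictured as a copy from NCHW order into NHWC order. This is a naive CPU sketch over a flat f32 slice with a hypothetical function name; it only illustrates the index mapping and says nothing about how the crate performs the copy.

```rust
/// Copy a C-contiguous NCHW buffer into NHWC element order.
/// `shape` is the logical [N, C, H, W] shape, which stays unchanged.
fn nchw_to_nhwc(src: &[f32], shape: [usize; 4]) -> Vec<f32> {
    let [n, c, h, w] = shape;
    assert_eq!(src.len(), n * c * h * w, "buffer length must match shape");
    let mut dst = vec![0.0f32; src.len()];
    for ni in 0..n {
        for ci in 0..c {
            for hi in 0..h {
                for wi in 0..w {
                    // Offset in the source (NCHW) layout.
                    let from = ((ni * c + ci) * h + hi) * w + wi;
                    // Offset in the destination (NHWC) layout.
                    let to = ((ni * h + hi) * w + wi) * c + ci;
                    dst[to] = src[from];
                }
            }
        }
    }
    dst
}
```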

Source

pub fn contiguous_in( &self, format: MemoryFormat, ) -> Result<Tensor<T>, FerrotorchError>

Return a tensor that is contiguous in the given memory format, materializing (copying) the data if necessary.

Equivalent to .to_memory_format(format). Both names are provided for API familiarity: contiguous_in() is the PyTorch-style entry point (compare Tensor.contiguous(memory_format=...)), while to_memory_format() is the explicit variant.

[CL-309] WU-05: channels-last memory format support

Source

pub fn is_scalar(&self) -> bool

Returns true if this is a scalar (0-dimensional) tensor.

Source

pub fn item(&self) -> Result<T, FerrotorchError>

For a scalar tensor, extract the single value.

Source

pub fn is_same(&self, other: &Tensor<T>) -> bool

Returns true if two tensors are the same object (same Arc).
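
Identity comparison of Arc-backed handles is typically done with Arc::ptr_eq, which compares allocations rather than contents. The Handle type below is a hypothetical stand-in for an Arc-shared tensor; whether is_same is implemented exactly this way is an assumption.

```rust
use std::sync::Arc;

/// Hypothetical Arc-backed handle: cloning shares the same allocation.
#[derive(Clone)]
struct Handle {
    inner: Arc<Vec<f32>>,
}

impl Handle {
    fn new(data: Vec<f32>) -> Self {
        Handle { inner: Arc::new(data) }
    }

    /// True only when both handles point at the same allocation,
    /// regardless of whether the contents compare equal.
    fn is_same(&self, other: &Handle) -> bool {
        Arc::ptr_eq(&self.inner, &other.inner)
    }
}
```

A clone of a handle is the same object, while a second handle built from equal data is not.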

Source

pub fn inner_storage_arc(&self) -> &Arc<TensorStorage<T>>

Get a reference to the inner storage Arc.

Exposed for optimizer kernels that need to modify the param’s GPU buffer in-place via unsafe pointer cast (same pattern as update_data).

Trait Implementations§

Source §

impl<T: Clone + Float> Clone for Buffer<T>

Source §

fn clone(&self) -> Buffer<T>

Returns a duplicate of the value.

1.0.0 (const: unstable) · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

Source §

impl<T: Debug + Float> Debug for Buffer<T>

Source §

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Source §

impl<T: Float> Deref for Buffer<T>

Source §

type Target = Tensor<T>

The resulting type after dereferencing.

Source §

fn deref(&self) -> &Self::Target

Dereferences the value.

Auto Trait Implementations§

§

impl<T> !RefUnwindSafe for Buffer<T>

§

impl<T> !UnwindSafe for Buffer<T>

§

impl<T> Freeze for Buffer<T>

§

impl<T> Send for Buffer<T>

§

impl<T> Sync for Buffer<T>

§

impl<T> Unpin for Buffer<T>

§

impl<T> UnsafeUnpin for Buffer<T>

Blanket Implementations§

Source §

impl<T> Any for T
where T: 'static + ?Sized,

Source §

fn type_id(&self) -> TypeId

Gets the TypeId of self.

Source §

impl<T> Borrow<T> for T
where T: ?Sized,

Source §

fn borrow(&self) -> &T

Immutably borrows from an owned value.

Source §

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source §

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

Source §

impl<T> ByRef<T> for T

Source §

fn by_ref(&self) -> &T

Source §

impl<T> CloneToUninit for T
where T: Clone,

Source §

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

Source §

impl<T> DistributionExt for T
where T: ?Sized,

Source §

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

Source §

impl<T> From<T> for T

Source §

fn from(t: T) -> T

Returns the argument unchanged.

Source §

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,

Source §

impl<T> Instrument for T

Source §

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

Source §

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

Source §

impl<T, U> Into<U> for T
where U: From<T>,

Source §

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source §

impl<T> IntoEither for T

Source §

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

Source §

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

Source §