pub struct Buffer<T: Float> { /* private fields */ }
A non-trainable tensor that is part of a module’s persistent state.
Like crate::Parameter, Buffer<T> derefs to Tensor<T> for all
tensor operations, and its clones share the same underlying Arc identity.
Unlike Parameter, requires_grad is always false.
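The wrapper pattern described above (force requires_grad off, deref to the inner tensor, share storage across clones) can be sketched in a few lines. This is a minimal illustration under stated assumptions: `ToyTensor` and `ToyBuffer` are hypothetical stand-ins, not ferrotorch's types, and the real Tensor<T> carries far more state.

```rust
use std::ops::Deref;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a tensor: shared storage plus an autograd flag.
#[derive(Clone)]
pub struct ToyTensor {
    pub storage: Arc<RwLock<Vec<f32>>>,
    pub requires_grad: bool,
}

/// Hypothetical buffer wrapper: forces requires_grad = false and derefs to the tensor.
#[derive(Clone)]
pub struct ToyBuffer(ToyTensor);

impl ToyBuffer {
    pub fn new(mut tensor: ToyTensor) -> Self {
        tensor.requires_grad = false; // buffers never take part in autograd
        ToyBuffer(tensor)
    }
}

impl Deref for ToyBuffer {
    type Target = ToyTensor;
    fn deref(&self) -> &ToyTensor {
        &self.0
    }
}
```

Because `#[derive(Clone)]` clones the `Arc`, not the data, a cloned buffer observes writes made through the original.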
Implementations§
impl<T: Float> Buffer<T>
pub fn new(tensor: Tensor<T>) -> Self
Wrap a tensor as a buffer. requires_grad is forced to false.
pub fn zeros(shape: &[usize]) -> FerrotorchResult<Self>
Create a zero-filled buffer with the given shape.
pub fn ones(shape: &[usize]) -> FerrotorchResult<Self>
Create a one-filled buffer with the given shape.
pub fn from_slice(data: &[T], shape: &[usize]) -> FerrotorchResult<Self>
Create a buffer from a flat data slice and a shape.
pub fn into_tensor(self) -> Tensor<T>
Consume and return the underlying tensor.
pub fn set_data(&mut self, tensor: Tensor<T>)
Replace the buffer’s data. The new tensor is set to
requires_grad = false regardless of its input state.
pub fn to(&self, device: Device) -> FerrotorchResult<Self>
Move this buffer to a device.
Methods from Deref<Target = Tensor<T>>§
pub fn backward(&self) -> Result<(), FerrotorchError>
Compute gradients of all leaf tensors that contribute to this tensor.
This tensor must be scalar (0-dim or single-element). After this call,
leaf tensors with requires_grad = true will have their .grad() set.
pub fn backward_with_gradient(&self, gradient: &Tensor<T>) -> Result<(), FerrotorchError>
Run backward with an external gradient.
This allows backward on non-scalar tensors by providing the initial gradient explicitly. The gradient shape must match this tensor’s shape. Used for multi-head outputs, Jacobian computation, and custom loss functions.
pub fn grad_wrt(&self, inputs: &[&Tensor<T>], retain_graph: bool, create_graph: bool) -> Result<Vec<Option<Tensor<T>>>, FerrotorchError>
Compute gradients of this tensor with respect to inputs, returning
the gradient tensors directly (without accumulating on leaves).
See grad for full documentation.
pub fn add_scalar_(&self, value: T) -> Result<&Tensor<T>, FerrotorchError>
Add a scalar to every element in-place: self += value.
Returns &Self for method chaining. Follows PyTorch’s Tensor.add_()
semantics — the trailing underscore denotes mutation.
§Errors
Returns an error if the tensor is part of the computation graph or is a
leaf with requires_grad = true.
pub fn mul_scalar_(&self, value: T) -> Result<&Tensor<T>, FerrotorchError>
Multiply every element by a scalar in-place: self *= value.
§Errors
Returns an error if the tensor is part of the computation graph or is a
leaf with requires_grad = true.
pub fn fill_(&self, value: T) -> Result<&Tensor<T>, FerrotorchError>
Fill every element with value in-place.
§Errors
Returns an error if the tensor is part of the computation graph or is a
leaf with requires_grad = true.
pub fn zero_(&self) -> Result<&Tensor<T>, FerrotorchError>
Zero all elements in-place: self = 0.
Equivalent to self.fill_(T::zero()).
§Errors
Returns an error if the tensor is part of the computation graph or is a
leaf with requires_grad = true.
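The in-place scalar methods above share two traits: they take `&self` (so the storage must be interiorly mutable) and they refuse to run on tensors autograd may still need. A minimal sketch of that guard-then-mutate shape, assuming a hypothetical `ToyTensor` with a `RefCell` and a single `in_graph` flag standing in for "has a grad_fn" (ferrotorch's actual checks may differ):

```rust
use std::cell::RefCell;

/// Hypothetical toy tensor; RefCell lets in-place methods take &self.
pub struct ToyTensor {
    data: RefCell<Vec<f64>>,
    requires_grad: bool,
    in_graph: bool, // stands in for "is part of the computation graph"
}

impl ToyTensor {
    /// Refuse to mutate anything autograd might need later.
    fn check_inplace(&self) -> Result<(), String> {
        if self.in_graph || self.requires_grad {
            return Err("in-place op on a tensor tracked by autograd".to_string());
        }
        Ok(())
    }

    pub fn add_scalar_(&self, value: f64) -> Result<&Self, String> {
        self.check_inplace()?;
        self.data.borrow_mut().iter_mut().for_each(|x| *x += value);
        Ok(self) // returning &Self enables chaining
    }

    pub fn mul_scalar_(&self, value: f64) -> Result<&Self, String> {
        self.check_inplace()?;
        self.data.borrow_mut().iter_mut().for_each(|x| *x *= value);
        Ok(self)
    }

    pub fn zero_(&self) -> Result<&Self, String> {
        self.check_inplace()?;
        self.data.borrow_mut().iter_mut().for_each(|x| *x = 0.0);
        Ok(self)
    }
}
```

Usage: `t.add_scalar_(1.0)?.mul_scalar_(2.0)?` chains two mutations on the same storage.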
pub fn add_(&self, other: &Tensor<T>) -> Result<&Tensor<T>, FerrotorchError>
Add another tensor elementwise in-place: self += other.
Equivalent to PyTorch’s Tensor.add_(other) — i.e. add_scaled_
with alpha = 1.0. other may be broadcast to self.shape() as
long as the broadcast result equals self.shape() (PyTorch
invariant for all in-place ops).
For GPU f32 tensors on the same-shape fast path, uses the GPU add kernel and swaps the storage (no CPU round-trip).
§Errors
Returns an error if other cannot be broadcast to self.shape()
(or if doing so would change self.shape()), or if the tensor is
part of the computation graph or is a leaf with requires_grad = true.
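The in-place broadcasting rule ("other may broadcast, but the result must equal self.shape()") is the same check for add_, sub_, mul_, div_ and their scaled variants. A minimal shape-only sketch, assuming NumPy/PyTorch trailing-dimension alignment; the function names are hypothetical, not ferrotorch API:

```rust
/// Broadcast two shapes under NumPy/PyTorch rules; None if incompatible.
pub fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let n = a.len().max(b.len());
    let (pad_a, pad_b) = (n - a.len(), n - b.len());
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // Align from the trailing dimension; missing leading dims count as 1.
        let da = if i < pad_a { 1 } else { a[i - pad_a] };
        let db = if i < pad_b { 1 } else { b[i - pad_b] };
        if da == db || db == 1 {
            out.push(da);
        } else if da == 1 {
            out.push(db);
        } else {
            return None;
        }
    }
    Some(out)
}

/// In-place ops also require the broadcast result to equal self's shape.
pub fn inplace_broadcast_ok(self_shape: &[usize], other_shape: &[usize]) -> bool {
    broadcast_shape(self_shape, other_shape).as_deref() == Some(self_shape)
}
```

For example, a `[3]` tensor can be added into a `[2, 3]` tensor, but not the reverse: the out-of-place result would be `[2, 3]`, which an in-place op cannot store in a `[3]` target.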
pub fn add_scaled_(&self, other: &Tensor<T>, alpha: f64) -> Result<&Tensor<T>, FerrotorchError>
In-place version of torch.add(input, other, *, alpha):
self = self + alpha * other.
other may be broadcast to self.shape() (PyTorch parity); the
broadcast result must equal self.shape() — an in-place op cannot
change the tensor’s shape. The fast same-shape, alpha == 1.0
path uses the GPU add kernel directly when applicable; broadcast
or scaled paths route through grad_fns::arithmetic::add_scaled
(which itself dispatches CPU/GPU + broadcasting) and swap the
resulting storage in.
§Errors
Returns an error if shapes are not broadcast-compatible, if the
broadcast result differs from self.shape(), or if the tensor is
part of the computation graph or is a leaf with requires_grad = true.
pub fn sub_scaled_(&self, other: &Tensor<T>, alpha: f64) -> Result<&Tensor<T>, FerrotorchError>
In-place version of torch.sub(input, other, *, alpha):
self = self - alpha * other.
Delegates to Tensor::add_scaled_ with -alpha. PyTorch’s own
sub_out at aten/src/ATen/native/BinaryOps.cpp:434-439 does the
same: add_stub(device_type(), *this, -alpha). This is the
in-place sibling of crate::grad_fns::arithmetic::sub_scaled
and the non-test production consumer of that out-of-place entry
point (it invokes add_scaled_, which routes through
arithmetic::add_scaled; sub_scaled is the symmetric forward
caller wired through the parity-sweep "sub" dispatch arm).
other may be broadcast to self.shape(); the broadcast result
must equal self.shape() — an in-place op cannot resize the
target tensor (PyTorch invariant for all _ ops).
§Errors
Returns an error if shapes are not broadcast-compatible, if the
broadcast result differs from self.shape(), or if the tensor is
part of the computation graph or is a leaf with requires_grad = true.
pub fn sub_(&self, other: &Tensor<T>) -> Result<&Tensor<T>, FerrotorchError>
Subtract another tensor elementwise in-place: self -= other.
Equivalent to PyTorch’s Tensor.sub_(other) — i.e. sub_scaled_
with alpha = 1.0. Mirrors upstream’s
aten/src/ATen/native/BinaryOps.cpp:434-439
TORCH_IMPL_FUNC(sub_out) { add_stub(device_type(), *this, -alpha); }
with alpha = 1.0, i.e. self += -1.0 * other == self -= other.
Delegating here gives sub_scaled_ a non-test production consumer
transitively for free (every caller of sub_ becomes a caller of
sub_scaled_), and brings sub_ to PyTorch parity with the
sub_(other, *, alpha=1) docstring at torch/_tensor_docs.py:5113
(broadcasting from add_scaled_ is inherited; in-place ops cannot
resize self).
§Errors
Returns an error if other cannot be broadcast to self.shape()
(or if doing so would change self.shape()), or if the tensor is
part of the computation graph or is a leaf with requires_grad = true.
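The delegation chain described for add_scaled_, sub_scaled_ and sub_ (sub_ calls sub_scaled_ with alpha = 1.0, which calls add_scaled_ with -alpha) can be sketched on plain same-shape `f64` slices. These free functions are hypothetical illustrations; broadcasting, GPU dispatch, and autograd checks are omitted.

```rust
/// Same-shape sketch of add_scaled_: dst[i] += alpha * other[i].
pub fn add_scaled_(dst: &mut [f64], other: &[f64], alpha: f64) -> Result<(), String> {
    if dst.len() != other.len() {
        return Err(format!("shape mismatch: {} vs {}", dst.len(), other.len()));
    }
    for (d, o) in dst.iter_mut().zip(other) {
        *d += alpha * o;
    }
    Ok(())
}

/// sub_scaled_ delegates with a negated alpha, like upstream sub_out.
pub fn sub_scaled_(dst: &mut [f64], other: &[f64], alpha: f64) -> Result<(), String> {
    add_scaled_(dst, other, -alpha)
}

/// sub_ is sub_scaled_ with alpha = 1.0.
pub fn sub_(dst: &mut [f64], other: &[f64]) -> Result<(), String> {
    sub_scaled_(dst, other, 1.0)
}
```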
pub fn mul_(&self, other: &Tensor<T>) -> Result<&Tensor<T>, FerrotorchError>
Multiply another tensor elementwise in-place: self *= other.
other may be broadcast to self.shape() (PyTorch parity for
Tensor.mul_(other) — aten/src/ATen/native/BinaryOps.cpp:441 TORCH_IMPL_FUNC(mul_out) inherits broadcasting via TensorIterator);
the broadcast result must equal self.shape() — an in-place op
cannot resize the target tensor.
The same-shape, both-on-CUDA, T == f32 path takes the GPU mul_f32
kernel and swaps the storage (no CPU round-trip). Anything else
(broadcasting or non-f32 or CPU) routes through
grad_fns::arithmetic::mul (which itself handles CPU + GPU broadcasting
via binary_broadcast / broadcast_mul_*) and swaps the resulting
storage in.
§Errors
Returns an error if shapes are not broadcast-compatible, if the
broadcast result differs from self.shape(), or if the tensor is
part of the computation graph or is a leaf with requires_grad = true.
pub fn div_(&self, other: &Tensor<T>) -> Result<&Tensor<T>, FerrotorchError>
Divide by another tensor elementwise in-place: self /= other.
other may be broadcast to self.shape() (PyTorch parity for
Tensor.div_(other) — aten/src/ATen/native/BinaryOps.cpp:447 TORCH_IMPL_FUNC(div_out) inherits broadcasting via TensorIterator);
the broadcast result must equal self.shape() — an in-place op
cannot resize the target tensor.
The same-shape, both-on-CUDA, T == f32 path takes the GPU div_f32
kernel and swaps the storage (no CPU round-trip). Anything else routes
through grad_fns::arithmetic::div.
True-division semantics (PyTorch parity, no rounding). For
floor / trunc rounding modes use Tensor::div_rounding_.
§Errors
Returns an error if shapes are not broadcast-compatible, if the
broadcast result differs from self.shape(), or if the tensor is
part of the computation graph or is a leaf with requires_grad = true.
pub fn div_rounding_(&self, other: &Tensor<T>, rounding_mode: &str) -> Result<&Tensor<T>, FerrotorchError>
In-place division with a rounding_mode kwarg, mirroring
torch.Tensor.div_(other, *, rounding_mode=...) per
torch/_tensor_docs.py:1746 and aten/src/ATen/native/BinaryOps.cpp:176
TORCH_META_FUNC2(div, Tensor_mode).
Accepted modes:
- "trunc": self = (self / other).trunc() (rounds toward zero).
- "floor": self = (self / other).floor() (rounds toward negative infinity).
For true division (no rounding), use Tensor::div_ directly. Any other
mode string returns InvalidArgument, matching the upstream message
"div expected rounding_mode to be one of None, 'trunc', or 'floor' but found '...'" (BinaryOps.cpp:186).
Broadcasting follows div_ semantics — other may broadcast to
self.shape() and the broadcast result must equal self.shape().
§Errors
Returns an error if rounding_mode is unrecognized, if shapes are not
broadcast-compatible, if the broadcast result differs from self.shape(),
or if the tensor is part of the computation graph or is a leaf with
requires_grad = true.
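The two rounding modes differ only for negative quotients, and any other string is rejected. A minimal sketch on same-shape `f64` slices; `div_rounding_` here is a hypothetical free function, not ferrotorch's method, and it reuses the upstream error text quoted above:

```rust
/// Sketch of the rounding-mode dispatch on plain f64 slices.
pub fn div_rounding_(dst: &mut [f64], other: &[f64], rounding_mode: &str) -> Result<(), String> {
    let round: fn(f64) -> f64 = match rounding_mode {
        "trunc" => f64::trunc, // toward zero
        "floor" => f64::floor, // toward negative infinity
        m => {
            return Err(format!(
                "div expected rounding_mode to be one of None, 'trunc', or 'floor' but found '{m}'"
            ))
        }
    };
    if dst.len() != other.len() {
        return Err("shape mismatch".to_string());
    }
    for (d, o) in dst.iter_mut().zip(other) {
        *d = round(*d / o);
    }
    Ok(())
}
```

For -7 / 2 = -3.5, "trunc" yields -3 while "floor" yields -4; for positive quotients both agree.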
pub fn clamp_(&self, min: T, max: T) -> Result<&Tensor<T>, FerrotorchError>
Clamp every element to [min, max] in-place.
Each element x is replaced with min.max(x.min(max)), matching
PyTorch’s Tensor.clamp_().
This is the both-bounds-required overload; for the
(Option<T>, Option<T>) overload that mirrors torch’s
clamp_(min=None, max=None) see Tensor::clamp_opt_.
§Errors
- Returns an error if min > max.
- Returns an error if the tensor is part of the computation graph or is a leaf with requires_grad = true.
pub fn clamp_opt_(&self, min: Option<T>, max: Option<T>) -> Result<&Tensor<T>, FerrotorchError>
Clamp with optional bounds — Tensor.clamp_(min=None, max=None) parity.
Mirrors torch.Tensor.clamp_(min=None, max=None) -> Tensor per
torch/_tensor_docs.py:1141 and the structured kernel
TORCH_IMPL_FUNC(clamp_out) at
aten/src/ATen/native/TensorCompare.cpp:831. Either bound may be
None:
- clamp_opt_(Some(lo), Some(hi)): equivalent to clamp_(lo, hi).
- clamp_opt_(Some(lo), None): clamp_min_ (lower bound only).
- clamp_opt_(None, Some(hi)): clamp_max_ (upper bound only).
- clamp_opt_(None, None): rejected with InvalidArgument, matching upstream "torch.clamp: At least one of 'min' or 'max' must not be None" (TensorCompare.cpp:106).
NaN-bound parity: if either supplied bound is NaN, the entire tensor
is filled with NaN (PyTorch’s at::fill_(result, NaN) branch at
TensorCompare.cpp:844, executed when min.isNan() || max.isNan()).
Per-element NaN inputs propagate (matching the kernel’s
std::min(std::max(a, min), max) semantics — when a is NaN, both
comparisons evaluate false in this implementation and a is left
unchanged, which propagates NaN through).
§Errors
- Returns an error if both min and max are None.
- Returns an error if min > max (when both are Some).
- Returns an error if the tensor is part of the computation graph or is a leaf with requires_grad = true.
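The clamp_/clamp_opt_ rules above (optional bounds, NaN bound fills everything with NaN, NaN elements pass through, min > max rejected) can be sketched on an `f64` slice. `clamp_opt_` here is a hypothetical free function illustrating the documented behavior, not ferrotorch's implementation:

```rust
/// Sketch of clamp_opt_ semantics on an f64 slice.
pub fn clamp_opt_(dst: &mut [f64], min: Option<f64>, max: Option<f64>) -> Result<(), String> {
    if min.is_none() && max.is_none() {
        return Err("torch.clamp: At least one of 'min' or 'max' must not be None".to_string());
    }
    // A NaN bound fills the whole tensor with NaN (the upstream fill_ branch).
    if min.map_or(false, f64::is_nan) || max.map_or(false, f64::is_nan) {
        dst.iter_mut().for_each(|x| *x = f64::NAN);
        return Ok(());
    }
    if let (Some(lo), Some(hi)) = (min, max) {
        if lo > hi {
            return Err(format!("clamp: min ({lo}) > max ({hi})"));
        }
    }
    for x in dst.iter_mut() {
        // Comparisons against a NaN element are false, so NaN elements pass through.
        if let Some(lo) = min {
            if *x < lo {
                *x = lo;
            }
        }
        if let Some(hi) = max {
            if *x > hi {
                *x = hi;
            }
        }
    }
    Ok(())
}
```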
pub fn add_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>
pub fn sub_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>
pub fn rsub_t(&self, other: &Tensor<T>, alpha: f64) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.rsub(other, *, alpha=1) — reverse subtract:
self - alpha * other is the sub_t semantic; rsub is the
operand-swapped variant returning other - alpha * self.
Per upstream aten/src/ATen/native/BinaryOps.cpp:1169 Tensor rsub( const Tensor& self, const Tensor& other, const Scalar& alpha) { return at::sub(other, self, alpha); } — a literal operand-swap
delegation. The non-test production consumer wiring for
arithmetic::rsub per R-DEFER-1: this method is the public,
chainable surface that closes the consumer requirement.
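Since rsub is a literal operand swap of sub, the whole relationship fits in two lines. A minimal same-shape sketch on slices; `sub_alpha` and `rsub_alpha` are hypothetical names chosen to avoid confusion with the real sub_t/rsub_t:

```rust
/// Out-of-place sub with alpha on same-shape slices: a - alpha * b.
pub fn sub_alpha(a: &[f64], b: &[f64], alpha: f64) -> Vec<f64> {
    a.iter().zip(b).map(|(x, y)| x - alpha * y).collect()
}

/// rsub(self, other, alpha) == sub(other, self, alpha): other - alpha * self.
pub fn rsub_alpha(this: &[f64], other: &[f64], alpha: f64) -> Vec<f64> {
    sub_alpha(other, this, alpha)
}
```

With self = [1, 2], other = [10, 10] and alpha = 2, rsub gives [10 - 2, 10 - 4] = [8, 6].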
pub fn mul_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>
pub fn div_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>
pub fn neg_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn pow_t(&self, exponent: f64) -> Result<Tensor<T>, FerrotorchError>
pub fn sqrt_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn rsqrt_t(&self) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.rsqrt() — reciprocal square root: 1 / sqrt(self).
Mirrors torch.rsqrt(input, *, out=None) per torch/_torch_docs.py:9656
and the upstream impl macro at
aten/src/ATen/native/UnaryOps.cpp:346 CREATE_UNARY_TORCH_IMPL_FUNC(rsqrt_out, rsqrt_stub). The non-test
production consumer wiring for arithmetic::rsqrt per R-DEFER-1:
this method is the public, chainable surface that closes the
consumer requirement.
pub fn reciprocal_t(&self) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.reciprocal() — elementwise reciprocal: 1 / self.
Mirrors torch.reciprocal(input, *, out=None) per
torch/_torch_docs.py:2584 and the upstream impl macro at
aten/src/ATen/native/UnaryOps.cpp:345 CREATE_UNARY_TORCH_IMPL_FUNC(reciprocal_out, reciprocal_stub). The
non-test production consumer wiring for arithmetic::reciprocal per
R-DEFER-1: this method is the public, chainable surface that closes
the consumer requirement.
pub fn abs_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn remainder_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.remainder(other) — elementwise remainder with the
sign of the divisor (Python % / NumPy semantics).
Mirrors torch.remainder(input, other, *, out=None) per
torch/_torch_docs.py:9453-9472 and the upstream C++ entry at
aten/src/ATen/native/BinaryOps.cpp:1184 Tensor remainder(const Tensor& self, const Scalar& other). The float-tensor CPU
implementation is at aten/src/ATen/native/cpu/BinaryOpsKernel.cpp:391-409 remainder_kernel. Registration at
torch/overrides.py:1100 torch.remainder: lambda input, other, out=None: -1.
Distinct from fmod_t (dividend-sign / C99 semantics, REQ-14 NOT-STARTED): for
remainder(-5, 3) ferrotorch returns 1 (sign
matches divisor +3); fmod(-5, 3) returns -2 (sign matches
dividend -5).
The non-test production consumer wiring for arithmetic::remainder
per R-DEFER-1: this method is the public, chainable surface that
closes the consumer requirement.
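The sign conventions of remainder and fmod can be made concrete with scalar `f64` helpers. This is a sketch, not ferrotorch code: it relies on Rust's float `%` having C99 fmod semantics, and derives the divisor-sign remainder by shifting a nonzero fmod result by the divisor when the signs disagree.

```rust
/// C99 fmod: the result takes the sign of the dividend (Rust's float `%`).
pub fn fmod(a: f64, b: f64) -> f64 {
    a % b
}

/// Python/NumPy-style remainder: the result takes the sign of the divisor.
pub fn remainder(a: f64, b: f64) -> f64 {
    let m = a % b;
    // Shift by one divisor when the fmod result and the divisor disagree in sign.
    if m != 0.0 && ((b < 0.0) != (m < 0.0)) {
        m + b
    } else {
        m
    }
}
```

This reproduces the examples above: remainder(-5, 3) = 1 and fmod(-5, 3) = -2; with a negative divisor, remainder(5, -3) = -1 while fmod(5, -3) = 2.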
pub fn fmod_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>
torch.fmod(input, other, *, out=None) — elementwise remainder
with the sign of the dividend (C99 std::fmod semantics).
Mirrors torch.Tensor.fmod via the same upstream registration
torch/overrides.py:666 torch.fmod: lambda input, other, out=None: -1.
Distinct from remainder_t (divisor-sign, REQ-13 SHIPPED): for
fmod(-5, 3) ferrotorch returns -2 (sign matches dividend
-5); remainder(-5, 3) returns 1 (sign matches divisor
+3). See arithmetic::fmod docs for the per-quadrant table.
The non-test production consumer wiring for arithmetic::fmod
per R-DEFER-1: this method is the public, chainable surface that
closes the consumer requirement.
pub fn floor_divide_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.floor_divide(other) — elementwise floor division
(true floor, toward -infinity).
Mirrors torch.floor_divide(input, other, *, out=None) per
torch/_torch_docs.py:4265-4296:
Computes :attr:`input` divided by :attr:`other`, elementwise, and floors the result.
.. math:: out_i = floor(input_i / other_i)
Upstream entry at aten/src/ATen/native/BinaryOps.cpp:979 Tensor floor_divide(const Tensor& self, const Tensor& other) dispatching
to div_floor_stub -> div_floor_kernel at
aten/src/ATen/native/cpu/BinaryOpsKernel.cpp:297-349 ->
c10::div_floor_floating at c10/util/generic_math.h:34-58.
Registration at torch/overrides.py:664 torch.floor_divide: lambda input, other: -1.
torch.floor_divide was historically broken (performed trunc, NOT
floor) and torch/_torch_docs.py:4267-4271 explicitly notes:
.. note:: Before PyTorch 1.13 :func:`torch.floor_divide` incorrectly performed truncation division. To restore the previous behavior use :func:`torch.div` with rounding_mode='trunc'.
As of PyTorch 1.13+ (and as of the upstream pin this ferrotorch is
translated against), torch.floor_divide performs TRUE FLOOR.
Verified live on 2026-05-25:
torch.floor_divide(-7.0, 3.0).item() == -3.0.
Distinct from remainder_t and fmod_t. The 3-way identity
a == floor_divide(a,b) * b + remainder(a,b) holds; the
fmod sibling is the trunc-division remainder. For a=-7, b=3:
- floor_divide(-7, 3) = -3 (true floor)
- remainder(-7, 3) = 2 (sign of divisor)
- fmod(-7, 3) = -1 (sign of dividend / trunc remainder)
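Both division/remainder pairings listed above satisfy a reconstruction identity: floor division pairs with remainder, truncation division pairs with fmod. A scalar sketch that checks both, under the assumption that Rust's float `%` (C99 fmod) is an acceptable stand-in for the tensor kernels; all names are hypothetical:

```rust
/// True floor division (PyTorch >= 1.13 floor_divide semantics).
pub fn floor_divide(a: f64, b: f64) -> f64 {
    (a / b).floor()
}

/// Pre-1.13 behavior: truncation toward zero.
pub fn trunc_divide(a: f64, b: f64) -> f64 {
    (a / b).trunc()
}

/// Checks a == floor_divide(a, b) * b + remainder(a, b)
/// and    a == trunc_divide(a, b) * b + fmod(a, b).
pub fn identities_hold(a: f64, b: f64) -> bool {
    let fmod = a % b; // sign of dividend
    let remainder = if fmod != 0.0 && ((b < 0.0) != (fmod < 0.0)) { fmod + b } else { fmod };
    floor_divide(a, b) * b + remainder == a && trunc_divide(a, b) * b + fmod == a
}
```

For a = -7, b = 3: -3 * 3 + 2 = -7 and -2 * 3 + (-1) = -7.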
Backward: torch.floor_divide has no derivative. Verified live: the
result has grad_fn=<NotImplemented object>, and backpropagating through it
raises "derivative for aten::floor_divide is not implemented". FloorDivideBackward
mirrors that by erroring on .backward().
The non-test production consumer wiring for
arithmetic::floor_divide per R-DEFER-1: this method is the
public, chainable surface that closes the consumer requirement.
pub fn addcmul_t(&self, tensor1: &Tensor<T>, tensor2: &Tensor<T>, value: f64) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.addcmul(tensor1, tensor2, *, value=1) — fused
self + value * tensor1 * tensor2 (receiver is input).
Mirrors torch.addcmul(input, tensor1, tensor2, *, value=1, out=None)
per torch/_torch_docs.py:510-544:
Performs the element-wise multiplication of :attr:`tensor1` by :attr:`tensor2`, multiplies the result by the scalar :attr:`value` and adds it to :attr:`input`.
.. math:: \text{out}_i = \text{input}_i + \text{value} \times \text{tensor1}_i \times \text{tensor2}_i
Upstream C++ entry at aten/src/ATen/native/PointwiseOps.cpp:57-64 TORCH_IMPL_FUNC(addcmul_out). Registration at
torch/overrides.py:462 torch.addcmul: lambda input, tensor1, tensor2, value=1, out=None: -1.
Broadcasting: the 3 input tensors (self, tensor1, tensor2) are
jointly broadcast to a common output shape. Backward: per
tools/autograd/derivatives.yaml, d_input = grad, d_tensor1 = grad * value * tensor2, d_tensor2 = grad * value * tensor1 (no
gradient with respect to the scalar value).
The non-test production consumer wiring for arithmetic::addcmul
per R-DEFER-1: this method is the public, chainable surface that
closes the consumer requirement.
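The forward formula and the three VJP terms listed above can be checked on same-shape slices. A minimal sketch without broadcasting; `addcmul` and `addcmul_backward` are hypothetical helpers, not ferrotorch API:

```rust
/// Same-shape addcmul forward: input + value * t1 * t2.
pub fn addcmul(input: &[f64], t1: &[f64], t2: &[f64], value: f64) -> Vec<f64> {
    input.iter().zip(t1).zip(t2).map(|((i, a), b)| i + value * a * b).collect()
}

/// Returns (d_input, d_tensor1, d_tensor2) for upstream gradient `grad`.
pub fn addcmul_backward(
    grad: &[f64],
    t1: &[f64],
    t2: &[f64],
    value: f64,
) -> (Vec<f64>, Vec<f64>, Vec<f64>) {
    let d_input = grad.to_vec(); // d/d input = 1
    let d_t1: Vec<f64> = grad.iter().zip(t2).map(|(g, b)| g * value * b).collect();
    let d_t2: Vec<f64> = grad.iter().zip(t1).map(|(g, a)| g * value * a).collect();
    (d_input, d_t1, d_t2)
}
```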
pub fn addcdiv_t(&self, tensor1: &Tensor<T>, tensor2: &Tensor<T>, value: f64) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.addcdiv(tensor1, tensor2, *, value=1) — fused
self + value * tensor1 / tensor2 (receiver is input).
Mirrors torch.addcdiv(input, tensor1, tensor2, *, value=1, out=None)
per torch/_torch_docs.py:461-473:
Performs the element-wise division of :attr:`tensor1` by :attr:`tensor2`, multiplies the result by the scalar :attr:`value` and adds it to :attr:`input`.
.. math:: \text{out}_i = \text{input}_i + \text{value} \times \frac{\text{tensor1}_i}{\text{tensor2}_i}
Upstream C++ entry at aten/src/ATen/native/PointwiseOps.cpp:66-73 TORCH_IMPL_FUNC(addcdiv_out). The integer-dtype deprecation block at
PointwiseOps.cpp:38-50 TORCH_META_FUNC(addcdiv) is unreachable for
the Tensor<T: Float> family.
Broadcasting: the 3 input tensors (self, tensor1, tensor2) are
jointly broadcast to a common output shape. Backward: per
tools/autograd/derivatives.yaml, d_input = grad, d_tensor1 = grad * value / tensor2, d_tensor2 = -grad * value * tensor1 / (tensor2 * tensor2) (no gradient with respect to the scalar
value). At tensor2=0 the d_tensor2 path produces NaN / ±Inf via
IEEE-754 — matches upstream (R-DEV-1).
The non-test production consumer wiring for arithmetic::addcdiv
per R-DEFER-1: this method is the public, chainable surface that
closes the consumer requirement.
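The addcdiv VJP, including the IEEE-754 behavior at tensor2 = 0, can be sketched the same way. The helpers below are hypothetical same-shape illustrations of the formulas above, without broadcasting:

```rust
/// Same-shape addcdiv forward: input + value * t1 / t2.
pub fn addcdiv(input: &[f64], t1: &[f64], t2: &[f64], value: f64) -> Vec<f64> {
    input.iter().zip(t1).zip(t2).map(|((i, a), b)| i + value * a / b).collect()
}

/// Returns (d_input, d_tensor1, d_tensor2) for upstream gradient `grad`.
pub fn addcdiv_backward(
    grad: &[f64],
    t1: &[f64],
    t2: &[f64],
    value: f64,
) -> (Vec<f64>, Vec<f64>, Vec<f64>) {
    let d_input = grad.to_vec();
    let d_t1: Vec<f64> = grad.iter().zip(t2).map(|(g, b)| g * value / b).collect();
    // At b == 0 this divides by zero and yields +/-Inf or NaN, as IEEE-754 dictates.
    let d_t2: Vec<f64> = grad
        .iter()
        .zip(t1)
        .zip(t2)
        .map(|((g, a), b)| -g * value * a / (b * b))
        .collect();
    (d_input, d_t1, d_t2)
}
```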
pub fn cumsum_t(&self, dim: i64) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.cumsum(dim) — cumulative sum along dim.
Mirrors torch.cumsum(input, dim, *, dtype=None, out=None) per
torch/_torch_docs.py:3429 cumsum(input, dim, *, dtype=None, out=None) -> Tensor and the torch.Tensor method docstring at
torch/_tensor_docs.py:1500-1506 add_docstr_all("cumsum", r""" cumsum(dim, dtype=None) -> Tensor [...] See :func:`torch.cumsum`.
Upstream C++ entry at aten/src/ATen/native/ReduceOps.cpp:511 TORCH_IMPL_FUNC(cumsum_out) dispatching cumsum_stub. Autograd
VJP per tools/autograd/derivatives.yaml:529-531 (name: cumsum( Tensor self, int dim, *, ScalarType? dtype=None) -> Tensor; self: cumsum_backward(grad.to(self.scalar_type()), dim)) which is the
reverse_cumsum (flip → cumsum → flip) upper-triangular
multiplication at ReduceOps.cpp:527-529 static Tensor reversed_cumsum(const Tensor& w, int64_t dim).
ferrotorch does NOT accept the dtype kwarg (the dtype-promotion
branch at ReduceOps.cpp:267 is unreachable for the Tensor<T: Float> family; see .design/ferrotorch-core/grad_fns/cumulative.md REQ-1).
The non-test production consumer wiring for
grad_fns::cumulative::cumsum per R-DEFER-1: this method is the
public, chainable surface that closes the consumer requirement
(blocker #1232).
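The cumsum VJP described above is a reverse cumulative sum (flip, cumsum, flip) of the upstream gradient. A 1-D sketch, assuming a single dimension for simplicity (the real method takes `dim`); these helpers are hypothetical:

```rust
/// 1-D cumulative sum.
pub fn cumsum(x: &[f64]) -> Vec<f64> {
    let mut acc = 0.0;
    x.iter()
        .map(|v| {
            acc += v;
            acc
        })
        .collect()
}

/// VJP of cumsum: flip -> cumsum -> flip of the upstream gradient.
pub fn cumsum_backward(grad: &[f64]) -> Vec<f64> {
    let flipped: Vec<f64> = grad.iter().rev().copied().collect();
    let mut out = cumsum(&flipped);
    out.reverse();
    out
}
```

Each input element contributes to every output at or after its position, so its gradient is the sum of all upstream gradients from that position onward.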
pub fn cumprod_t(&self, dim: i64) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.cumprod(dim) — cumulative product along dim.
Mirrors torch.cumprod(input, dim, *, dtype=None, out=None) per
torch/_torch_docs.py:3390 cumprod(input, dim, *, dtype=None, out=None) -> Tensor and the torch.Tensor method docstring at
torch/_tensor_docs.py:1482-1488 add_docstr_all("cumprod", r""" cumprod(dim, dtype=None) -> Tensor [...] See :func:`torch.cumprod`. Upstream C++ entry at aten/src/ATen/native/ReduceOps.cpp:519
TORCH_IMPL_FUNC(cumprod_out). Autograd VJP per tools/autograd/derivatives.yaml:525-527 (name: cumprod(Tensor
self, int dim, *, ScalarType? dtype=None) -> Tensor; self:
cumprod_backward(grad.to(self.scalar_type()), self, dim, result)), routing through cumprod_backward at ReduceOps.cpp:531-790
with the zeros-aware reverse-cumsum-divide algorithm.
ferrotorch does NOT accept the dtype kwarg; the zeros-present
path uses an O(n^3) brute-force backward rather than upstream’s
composite-compliance masked-fill (numerically identical, slower,
not second-order-differentiable — see
.design/ferrotorch-core/grad_fns/cumulative.md REQ-2).
The non-test production consumer wiring for
grad_fns::cumulative::cumprod per R-DEFER-1: this method is the
public, chainable surface that closes the consumer requirement
(blocker #1232).
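For inputs without zeros, the "reverse-cumsum-divide" backward mentioned above reduces to grad_in[i] = (sum over j >= i of grad[j] * out[j]) / x[i]. A 1-D sketch of that no-zero path only; it does not cover the zeros-present path, which ferrotorch handles with a separate brute-force backward. Function names are hypothetical:

```rust
/// 1-D cumulative product.
pub fn cumprod(x: &[f64]) -> Vec<f64> {
    let mut acc = 1.0;
    x.iter()
        .map(|v| {
            acc *= v;
            acc
        })
        .collect()
}

/// VJP assuming no zeros in `x`: reverse cumsum of grad * out, divided by x.
pub fn cumprod_backward_nonzero(x: &[f64], grad: &[f64]) -> Vec<f64> {
    let out = cumprod(x);
    let mut suffix = 0.0;
    let mut g_in = vec![0.0; x.len()];
    for i in (0..x.len()).rev() {
        suffix += grad[i] * out[i]; // sum_{j >= i} grad[j] * out[j]
        g_in[i] = suffix / x[i]; // undefined if x[i] == 0, hence the special path
    }
    g_in
}
```

For x = [2, 3, 4] and a ones gradient, this gives [1 + 3 + 12, 2 + 8, 6] = [16, 10, 6].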
pub fn logcumsumexp_t(&self, dim: i64) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.logcumsumexp(dim) — numerically stable
log(cumsum(exp(self))) along dim.
Mirrors torch.logcumsumexp(input, dim, *, out=None) per
torch/_torch_docs.py:3298 logcumsumexp(input, dim, *, out=None) -> Tensor and the torch.Tensor method docstring at
torch/_tensor_docs.py:1455-1462 add_docstr_all("logcumsumexp", r""" logcumsumexp(dim) -> Tensor [...] See :func:`torch.logcumsumexp`. Upstream C++ entry at
aten/src/ATen/native/ReduceOps.cpp:475 Tensor logcumsumexp(const Tensor& self, int64_t dim) dispatching _logcumsumexp_cpu at
:465-468 → logcumsumexp_stub at :471. Autograd VJP per
tools/autograd/derivatives.yaml:521-523 (name: logcumsumexp( Tensor self, int dim) -> Tensor; self: logcumsumexp_backward(grad, self, result, dim)) factors as grad_input[i] = exp(input[i]) * reverse_cumsum(grad_output * exp(-output)) (softmax-weighted
reverse cumsum).
The numerical-stability invariant (large inputs ~1000.0 stay
finite) is preserved by the two-pass max-rescaling forward
algorithm at ops/cumulative.rs:378-410. See
.design/ferrotorch-core/grad_fns/cumulative.md REQ-5.
The non-test production consumer wiring for
grad_fns::cumulative::logcumsumexp per R-DEFER-1: this method
is the public, chainable surface that closes the consumer
requirement (blocker #1232).
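The numerical-stability invariant (large inputs stay finite) can be illustrated with a 1-D sketch. Note that this uses a different technique from the two-pass max-rescaling the text attributes to ferrotorch: a streaming log-add-exp accumulator, whose exponent is always non-positive. The helper names are hypothetical.

```rust
/// log(exp(a) + exp(b)) computed without overflow.
fn log_add_exp(a: f64, b: f64) -> f64 {
    let m = a.max(b);
    if m == f64::NEG_INFINITY {
        return m; // both inputs are -inf: log(0 + 0) = -inf
    }
    // m + log(1 + exp(-|a - b|)); the exponent is always <= 0, so it cannot overflow.
    m + (-(a - b).abs()).exp().ln_1p()
}

/// 1-D logcumsumexp via a running log-add-exp accumulator.
pub fn logcumsumexp(x: &[f64]) -> Vec<f64> {
    let mut acc = f64::NEG_INFINITY; // log of an empty sum
    x.iter()
        .map(|&v| {
            acc = log_add_exp(acc, v);
            acc
        })
        .collect()
}
```

A naive `exp(1000.0)` overflows to infinity, whereas this version returns 1000 and 1000 + ln 2 for the input [1000, 1000].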
pub fn exp_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn log_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn sin_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn cos_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn clamp_t(&self, min: T, max: T) -> Result<Tensor<T>, FerrotorchError>
pub fn clip_t(&self, min: T, max: T) -> Result<Tensor<T>, FerrotorchError>
clip is a literal alias of clamp per upstream
aten/src/ATen/native/TensorCompare.cpp:918-930 Tensor clip(...)
(pass-through to at::clamp(self, min, max)).
pub fn tan_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn asin_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn acos_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn atan_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn sinh_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn cosh_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn asinh_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn acosh_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn atanh_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn exp2_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn expm1_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn log2_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn log10_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn log1p_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn ceil_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn floor_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn round_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn trunc_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn frac_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn sign_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn sinc_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn relu(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn sigmoid(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn tanh_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn gelu(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn gelu_with(&self, approximate: GeluApproximate) -> Result<Tensor<T>, FerrotorchError>
pub fn silu(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn softmax(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn log_softmax(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn threshold_t(&self, threshold: f64, value: f64) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.threshold(threshold, value) — replace each element below
(or equal to) threshold with value, leave the rest unchanged.
Mirrors torch.nn.functional.threshold(input, threshold, value) per
torch/nn/functional.py:1682-1700 and
TORCH_IMPL_FUNC(threshold_out) at
aten/src/ATen/native/Activation.cpp:688-690. The non-test production
consumer wiring for grad_fns::activation::threshold per R-DEFER-1:
this method is the public, chainable surface that closes the
consumer requirement (closes #1341 REQ-19).
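The threshold rule above ("replace elements at or below threshold with value") is a single elementwise comparison. A minimal slice-based sketch; `threshold` is a hypothetical free function, and the NaN pass-through noted in the comment describes this sketch only:

```rust
/// Elements <= thresh become `value`; all others pass through unchanged.
/// Written as `v <= thresh`, so a NaN element (comparison false) is kept as NaN.
pub fn threshold(x: &[f64], thresh: f64, value: f64) -> Vec<f64> {
    x.iter().map(|&v| if v <= thresh { value } else { v }).collect()
}
```

Note that an element exactly equal to the threshold is replaced.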
pub fn rrelu_t(&self, lower: f64, upper: f64, training: bool) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.rrelu(lower, upper, training) — randomized leaky ReLU.
Mirrors torch.nn.functional.rrelu(input, lower, upper, training, inplace) per torch/nn/functional.py:1962-1989 and
Tensor& rrelu_with_noise_out_cpu(...) at
aten/src/ATen/native/Activation.cpp:611-654. The non-test production
consumer wiring for grad_fns::activation::rrelu per R-DEFER-1:
this method is the public, chainable surface that closes the
consumer requirement (closes #1341 REQ-20).
Note: training=true falls back to the deterministic mean-slope
inference path (per the GradFn docs at activation.rs). The
RNG-stateful training-mode VJP is a separately-tracked follow-up.
pub fn celu_t(&self, alpha: f64) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.celu(alpha) —
celu(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1)).
Mirrors torch.nn.functional.celu(input, alpha=1.0) per
torch/nn/functional.py:1874-1894 and
Tensor celu(const Tensor& self, const Scalar& alpha) at
aten/src/ATen/native/Activation.cpp:540-545. The non-test production
consumer wiring for grad_fns::activation::celu per R-DEFER-1:
this method is the public, chainable surface that closes the
consumer requirement (closes #1341 REQ-21).
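The CELU formula above translates directly to a scalar function. This sketch assumes alpha != 0 and uses `exp_m1` for accuracy near zero; it is an illustration of the formula, not ferrotorch's kernel:

```rust
/// celu(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1)); assumes alpha != 0.
pub fn celu(x: f64, alpha: f64) -> f64 {
    x.max(0.0) + (alpha * (x / alpha).exp_m1()).min(0.0)
}
```

Positive inputs pass through unchanged, and large negative inputs saturate toward -alpha.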
pub fn softmin_t(&self) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.softmin() — softmin(x) = softmax(-x) along the last
axis (fused single-GradFn variant).
Mirrors torch.nn.functional.softmin(input, dim=None, dtype=None) per
torch/nn/functional.py:2095-2125. The non-test production consumer
wiring for grad_fns::activation::softmin per R-DEFER-1: this method
is the public, chainable surface that closes the consumer requirement
(closes #1341 REQ-22). The composition-route variant
(ferrotorch_nn::functional::softmin = neg -> softmax, two GradFn
nodes) remains available; this method routes through the fused VJP.
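The identity softmin(x) = softmax(-x) is easy to see in a 1-D sketch. This hypothetical helper subtracts the maximum of -x before exponentiating (the usual stability trick); it illustrates the math, not the fused GradFn:

```rust
/// 1-D softmin computed as softmax(-x) with max-subtraction for stability.
pub fn softmin(x: &[f64]) -> Vec<f64> {
    let neg: Vec<f64> = x.iter().map(|v| -v).collect();
    let m = neg.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = neg.iter().map(|v| (v - m).exp()).collect();
    let s: f64 = exps.iter().sum();
    exps.iter().map(|e| e / s).collect()
}
```

The smallest input receives the largest weight, and the weights sum to 1.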
pub fn sum_all(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn mean_all(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn prod_all(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn amin(&self) -> Result<Tensor<T>, FerrotorchError>
Global minimum across all elements. Mirrors torch.amin(self) with
no dim argument. Returns a 0-d tensor. On CUDA f32/f64, dispatches
to the native PTX reduce_min kernel; on CPU walks the buffer. (#627)
pub fn amax(&self) -> Result<Tensor<T>, FerrotorchError>
Global maximum across all elements. Mirrors torch.amax(self). (#627)
pub fn lu_factor(&self) -> Result<(Tensor<T>, Vec<i32>), FerrotorchError>
LU factorization in cuSOLVER’s packed form: returns
(LU_packed, pivots). Mirrors torch.linalg.lu_factor. On CUDA
f32/f64, runs natively via cuSOLVER getrf with no host bounce
for the matrix; pivots come back as a host Vec<i32> (O(n)). (#604)
pub fn matmul(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>
pub fn mm(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>
pub fn mm_bt(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>
Fused A @ B^T — avoids materializing the transpose of B. A: [M, K], B: [N, K] -> [M, N].
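No transpose is needed because, in row-major storage, row j of B is already column j of B^T. Each output element is therefore a dot product of two contiguous rows. The naive sketch below illustrates only that layout argument. mm_bt is a hypothetical free function, and the sketch says nothing about how ferrotorch implements the fused path.

```rust
/// Hypothetical helper: row-major A @ B^T without materialising B^T.
/// a: [m, k], b: [n, k], result: [m, n].
fn mm_bt(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), n * k);
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        let row_a = &a[i * k..(i + 1) * k];
        for j in 0..n {
            // Row j of B is column j of B^T, and it is contiguous in memory.
            let row_b = &b[j * k..(j + 1) * k];
            out[i * n + j] = row_a.iter().zip(row_b).map(|(x, y)| x * y).sum();
        }
    }
    out
}
```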
pub fn bmm(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>
pub fn mv_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>
pub fn dot_t(&self, other: &Tensor<T>) -> Result<Tensor<T>, FerrotorchError>
pub fn t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn einsum( &self, equation: &str, others: &[&Tensor<T>], ) -> Result<Tensor<T>, FerrotorchError>
Einstein summation with this tensor as the first operand.
others contains the remaining input tensors (if any). The equation
must include subscripts for self followed by the others.
```rust
// Matrix multiply: self @ other
let c = a.einsum("ij,jk->ik", &[&b])?;
// Trace of self
let t = a.einsum("ii->", &[])?;
```
pub fn sum_dim( &self, dim: i64, keepdim: bool, ) -> Result<Tensor<T>, FerrotorchError>
pub fn mean_dim( &self, dim: i64, keepdim: bool, ) -> Result<Tensor<T>, FerrotorchError>
pub fn logsumexp_t(&self) -> Result<Tensor<T>, FerrotorchError>
Differentiable full-reduction logsumexp. Mirrors
torch.logsumexp(self): a numerically stable log(sum(exp(self))) reduced
to a 0-D scalar. The backward pass computes grad * exp(self - result). Closes #1310.
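Below is a minimal sketch of the stable forward and the stated backward on an f64 slice. logsumexp and logsumexp_backward are hypothetical free functions, not the ferrotorch API. Shifting by the maximum keeps every exponent at or below zero, so nothing overflows.

```rust
/// Hypothetical helper: numerically stable log(sum(exp(x))).
fn logsumexp(xs: &[f64]) -> f64 {
    let m = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    if m.is_infinite() {
        // Empty input or all -inf gives -inf; any +inf gives +inf.
        return m;
    }
    m + xs.iter().map(|&x| (x - m).exp()).sum::<f64>().ln()
}

/// Hypothetical helper: backward grad * exp(x - result), i.e. grad * softmax(x).
fn logsumexp_backward(xs: &[f64], result: f64, grad: f64) -> Vec<f64> {
    xs.iter().map(|&x| grad * (x - result).exp()).collect()
}
```

Since exp(x - result) is exactly softmax(x), the per-element gradients always sum to the upstream grad.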
pub fn logsumexp_dim_t( &self, dim: i64, keepdim: bool, ) -> Result<Tensor<T>, FerrotorchError>
Differentiable dim-keyed logsumexp. Mirrors
torch.logsumexp(self, dim, keepdim).
pub fn argmax_t(&self) -> Result<IntTensor<i64>, FerrotorchError>
Non-differentiable global argmax. Mirrors torch.argmax(self).
Returns a 0-D IntTensor<i64>.
pub fn argmax_dim_t( &self, dim: i64, keepdim: bool, ) -> Result<IntTensor<i64>, FerrotorchError>
Non-differentiable dim-keyed argmax.
pub fn argmin_t(&self) -> Result<IntTensor<i64>, FerrotorchError>
Non-differentiable global argmin. Mirrors torch.argmin(self).
pub fn argmin_dim_t( &self, dim: i64, keepdim: bool, ) -> Result<IntTensor<i64>, FerrotorchError>
Non-differentiable dim-keyed argmin.
pub fn var_t(&self, unbiased: bool) -> Result<Tensor<T>, FerrotorchError>
Differentiable full-reduction variance with optional Bessel
correction. unbiased=true divides by n-1; false divides by
n. Closes #1301 (var).
pub fn std_t(&self, unbiased: bool) -> Result<Tensor<T>, FerrotorchError>
Differentiable full-reduction standard deviation. Closes #1301 (std).
pub fn var_with_correction_t( &self, correction: f64, ) -> Result<Tensor<T>, FerrotorchError>
Differentiable full-reduction variance with arbitrary Bessel
correction. Mirrors torch.var(input, correction=...) —
denom = max(0, n - correction). Closes #1346 (audit 7cef63f88
REQ-8 full-reduction correction-API gap).
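The correction semantics can be sketched on a plain slice. var_with_correction and std_with_correction below are hypothetical helpers. A correction of 1.0 corresponds to unbiased = true, and 0.0 corresponds to unbiased = false.

```rust
/// Hypothetical helper: sum((x - mean)^2) / max(0, n - correction).
fn var_with_correction(xs: &[f64], correction: f64) -> f64 {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let ss: f64 = xs.iter().map(|&x| (x - mean) * (x - mean)).sum();
    let denom = (n - correction).max(0.0);
    // A zero denominator yields NaN (0/0) or inf under IEEE division.
    ss / denom
}

/// Hypothetical helper: standard deviation as the square root of the above.
fn std_with_correction(xs: &[f64], correction: f64) -> f64 {
    var_with_correction(xs, correction).sqrt()
}
```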
pub fn std_with_correction_t( &self, correction: f64, ) -> Result<Tensor<T>, FerrotorchError>
Differentiable full-reduction standard deviation with arbitrary
correction. Mirrors torch.std(input, correction=...). Closes
#1346 (audit 7cef63f88 REQ-8 full-reduction correction-API gap).
pub fn any_t(&self) -> Result<BoolTensor, FerrotorchError>
Non-differentiable full-reduction any. Returns a 0-D BoolTensor
holding true iff any element is non-zero. Closes #1312 (any).
pub fn all_t(&self) -> Result<BoolTensor, FerrotorchError>
Non-differentiable full-reduction all. Closes #1312 (all).
pub fn count_nonzero_t(&self) -> Result<IntTensor<i64>, FerrotorchError>
Non-differentiable full-reduction count_nonzero. Returns a 0-D
IntTensor<i64>.
pub fn reshape_t(&self, shape: &[isize]) -> Result<Tensor<T>, FerrotorchError>
pub fn flatten_t(&self) -> Result<Tensor<T>, FerrotorchError>
pub fn squeeze_t(&self, axis: isize) -> Result<Tensor<T>, FerrotorchError>
pub fn unsqueeze_t(&self, axis: isize) -> Result<Tensor<T>, FerrotorchError>
pub fn permute(&self, dims: &[usize]) -> Result<Tensor<T>, FerrotorchError>
Permute tensor dimensions. Like PyTorch’s tensor.permute(dims).
Zero-copy: returns a view with permuted shape and strides.
dims must be a valid permutation of 0..ndim.
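Because permute only rewrites metadata, its effect can be shown on the shape and stride vectors alone. This sketch uses a hypothetical permute_meta helper and assumes strides are in element units.

```rust
/// Hypothetical helper: output dim i takes shape[dims[i]] and strides[dims[i]].
/// Returns None if dims is not a permutation of 0..ndim. No data moves.
fn permute_meta(
    shape: &[usize],
    strides: &[isize],
    dims: &[usize],
) -> Option<(Vec<usize>, Vec<isize>)> {
    let ndim = shape.len();
    if dims.len() != ndim || strides.len() != ndim {
        return None;
    }
    // Reject out-of-range or repeated dims.
    let mut seen = vec![false; ndim];
    for &d in dims {
        if d >= ndim || seen[d] {
            return None;
        }
        seen[d] = true;
    }
    let new_shape = dims.iter().map(|&d| shape[d]).collect();
    let new_strides = dims.iter().map(|&d| strides[d]).collect();
    Some((new_shape, new_strides))
}
```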
pub fn transpose( &self, dim0: usize, dim1: usize, ) -> Result<Tensor<T>, FerrotorchError>
Swap two dimensions. Like PyTorch’s tensor.transpose(dim0, dim1).
Zero-copy: returns a view with swapped strides.
pub fn swapaxes( &self, axis0: usize, axis1: usize, ) -> Result<Tensor<T>, FerrotorchError>
Swap two axes. Like PyTorch’s tensor.swapaxes(axis0, axis1) — a
literal alias of transpose per upstream
aten/src/ATen/native/TensorShape.cpp:4776.
pub fn swapdims( &self, dim0: usize, dim1: usize, ) -> Result<Tensor<T>, FerrotorchError>
Swap two dims. Like PyTorch’s tensor.swapdims(dim0, dim1) — a literal
alias of transpose per upstream
aten/src/ATen/native/TensorShape.cpp:4784.
pub fn unflatten_t( &self, dim: isize, sizes: &[isize], ) -> Result<Tensor<T>, FerrotorchError>
Reshape a single dimension dim into multiple sizes. Like PyTorch’s
tensor.unflatten(dim, sizes) per upstream
aten/src/ATen/native/TensorShape.cpp:4350. At most one -1
inference slot is allowed in sizes.
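The single -1 inference slot can be resolved as in the following sketch. resolve_unflatten_sizes is a hypothetical helper, and its error strings are illustrative rather than ferrotorch's actual FerrotorchError variants.

```rust
/// Hypothetical helper: resolve `sizes` for unflattening a dimension of
/// length `dim_len`. At most one entry may be -1; it is inferred so that
/// the product of the sizes equals `dim_len`.
fn resolve_unflatten_sizes(dim_len: usize, sizes: &[isize]) -> Result<Vec<usize>, String> {
    let mut infer_at: Option<usize> = None;
    let mut known: usize = 1; // product of the explicitly given sizes
    for (i, &s) in sizes.iter().enumerate() {
        if s == -1 {
            if infer_at.replace(i).is_some() {
                return Err("at most one -1 is allowed in sizes".to_string());
            }
        } else if s >= 0 {
            known *= s as usize;
        } else {
            return Err(format!("invalid size {s}"));
        }
    }
    // The -1 entry becomes 0 here and is overwritten below.
    let mut out: Vec<usize> = sizes.iter().map(|&s| s.max(0) as usize).collect();
    match infer_at {
        Some(i) if known != 0 && dim_len % known == 0 => out[i] = dim_len / known,
        Some(_) => return Err(format!("cannot infer -1 for length {dim_len} from product {known}")),
        None if known != dim_len => {
            return Err(format!("sizes multiply to {known}, expected {dim_len}"))
        }
        None => {}
    }
    Ok(out)
}
```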
pub fn expand_as_t( &self, other: &Tensor<T>, ) -> Result<Tensor<T>, FerrotorchError>
Broadcast this tensor to the shape of other. Like PyTorch’s
tensor.expand_as(other) per upstream
aten/src/ATen/native/TensorShape.cpp:1374.
pub fn flip_t(&self, dims: &[isize]) -> Result<Tensor<T>, FerrotorchError>
Reverse element order along each axis in dims. Like PyTorch’s
torch.flip(input, dims) per upstream
aten/src/ATen/native/TensorTransformations.cpp:36.
pub fn fliplr_t(&self) -> Result<Tensor<T>, FerrotorchError>
Flip left-to-right (along dim 1). Like PyTorch’s torch.fliplr per
upstream aten/src/ATen/native/TensorTransformations.cpp:180.
pub fn flipud_t(&self) -> Result<Tensor<T>, FerrotorchError>
Flip up-to-down (along dim 0). Like PyTorch’s torch.flipud per
upstream aten/src/ATen/native/TensorTransformations.cpp:186.
pub fn rot90_t( &self, k: i64, dims: &[isize], ) -> Result<Tensor<T>, FerrotorchError>
Rotate 90° k times in the plane spanned by dims. Like PyTorch’s
torch.rot90(input, k, dims) per upstream
aten/src/ATen/native/TensorTransformations.cpp:134.
pub fn movedim_t( &self, source: &[isize], destination: &[isize], ) -> Result<Tensor<T>, FerrotorchError>
Reposition dims from source to destination. Like PyTorch’s
torch.movedim(input, source, destination) per upstream
aten/src/ATen/native/TensorShape.cpp:4657.
pub fn moveaxis_t( &self, source: &[isize], destination: &[isize], ) -> Result<Tensor<T>, FerrotorchError>
Reposition axes from source to destination. Like PyTorch’s
torch.moveaxis (an alias of movedim) per upstream
aten/src/ATen/native/TensorShape.cpp:4768.
pub fn broadcast_to_t( &self, shape: &[usize], ) -> Result<Tensor<T>, FerrotorchError>
Broadcast this tensor to shape. Like PyTorch’s
torch.broadcast_to(input, shape) (an alias of expand) per upstream
aten/src/ATen/native/TensorShape.cpp:652.
pub fn repeat_t(&self, repeats: &[isize]) -> Result<Tensor<T>, FerrotorchError>
Tile this tensor repeats[i] times along each axis. Like PyTorch’s
tensor.repeat(*repeats) per upstream
aten/src/ATen/native/TensorShape.cpp:1909.
pub fn tile_t(&self, reps: &[isize]) -> Result<Tensor<T>, FerrotorchError>
NumPy-style tile. Like PyTorch’s torch.tile(input, reps) per upstream
aten/src/ATen/native/TensorShape.cpp:1971.
pub fn repeat_interleave_t( &self, repeats: usize, dim: isize, ) -> Result<Tensor<T>, FerrotorchError>
Repeat each element repeats times consecutively along dim. Like
PyTorch’s torch.repeat_interleave(input, repeats, dim).
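Here is a minimal sketch of the data movement on a contiguous row-major buffer. It assumes a non-negative dim, whereas the method itself takes an isize. repeat_interleave here is a hypothetical free function.

```rust
/// Hypothetical helper: each slice along `dim` is emitted `repeats` times in a row.
/// Returns the new data and the new shape (shape[dim] multiplied by `repeats`).
fn repeat_interleave(data: &[f64], shape: &[usize], repeats: usize, dim: usize) -> (Vec<f64>, Vec<usize>) {
    let outer: usize = shape[..dim].iter().product();
    let size = shape[dim];
    let inner: usize = shape[dim + 1..].iter().product();
    let mut out = Vec::with_capacity(data.len() * repeats);
    for o in 0..outer {
        for s in 0..size {
            // One contiguous block of `inner` elements per index along `dim`.
            let start = (o * size + s) * inner;
            let block = &data[start..start + inner];
            for _ in 0..repeats {
                out.extend_from_slice(block);
            }
        }
    }
    let mut new_shape = shape.to_vec();
    new_shape[dim] *= repeats;
    (out, new_shape)
}
```

Compare this with repeat_t, which tiles the whole tensor (1, 2, 1, 2) rather than repeating each element in place (1, 1, 2, 2).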
pub fn unbind_t(&self, dim: isize) -> Result<Vec<Tensor<T>>, FerrotorchError>
Split into size(dim) slices with dim removed. Like PyTorch’s
torch.unbind(input, dim) per upstream
aten/src/ATen/native/TensorShape.cpp:4367.
pub fn tensor_split_t( &self, indices: &[usize], dim: isize, ) -> Result<Vec<Tensor<T>>, FerrotorchError>
Split at the integer section boundaries indices along dim. Like
PyTorch’s torch.tensor_split(input, indices, dim) per upstream
aten/src/ATen/native/TensorShape.cpp:1167.
pub fn narrow( &self, dim: usize, start: usize, length: usize, ) -> Result<Tensor<T>, FerrotorchError>
Return a narrowed view along dim starting at start with length
elements. Like PyTorch’s tensor.narrow(dim, start, length).
Zero-copy: shares storage with the original tensor.
pub fn view(&self, shape: &[i64]) -> Result<Tensor<T>, FerrotorchError>
View tensor with new shape. Like PyTorch’s tensor.view(shape).
Exactly one dimension may be -1, in which case it is inferred.
Requires the tensor to be contiguous.
pub fn contiguous(&self) -> Result<Tensor<T>, FerrotorchError>
Make tensor contiguous — if already contiguous, returns a cheap clone. Otherwise materializes a new contiguous buffer.
pub fn chunk( &self, chunks: usize, dim: usize, ) -> Result<Vec<Tensor<T>>, FerrotorchError>
Split the tensor into `chunks` roughly equal pieces along dim.
pub fn split( &self, split_sizes: &[usize], dim: usize, ) -> Result<Vec<Tensor<T>>, FerrotorchError>
Split tensor into pieces of given sizes along dim.
pub fn fake_quantize_per_tensor_affine_t( &self, scale: f64, zero_point: i64, quant_min: i64, quant_max: i64, ) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.fake_quantize_per_tensor_affine(scale, zero_point, quant_min, quant_max) — per-tensor affine fake quantization with
autograd-tracked clipped STE backward.
Mirrors torch.fake_quantize_per_tensor_affine, registered at
torch/overrides.py:622 as torch.fake_quantize_per_tensor_affine: lambda input, scale, zero_point, quant_min, quant_max: -1, and the upstream
implementation Tensor fake_quantize_per_tensor_affine(const Tensor& self, double scale, int64_t zero_point, int64_t quant_min, int64_t quant_max) at aten/src/ATen/native/quantized/FakeQuantPerTensorAffine.cpp:31-40. The backward pass follows tools/autograd/derivatives.yaml:673-674: fake_quantize_per_tensor_affine_cachemask_backward(grad, mask) returns
dY * mask, where the mask is 1 iff
quant_min <= round_ties_even(input/scale) + zero_point <= quant_max.
Per R-DEFER-1, this method is the non-test production consumer of
grad_fns::quantize_grad::fake_quantize_per_tensor_affine: the public, chainable
surface that closes the consumer requirement for the per-tensor variant (blocker #1238).
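The forward pass and the clipped straight-through mask described above can be sketched on an f64 slice. The helpers below are hypothetical. round_ties_even is written out by hand so the sketch does not depend on a particular Rust version. The sketch also assumes dequantization as (clamp(q) - zero_point) * scale, the usual fake-quantize definition.

```rust
/// Hypothetical helper: round half to even (banker's rounding).
fn round_ties_even(x: f64) -> f64 {
    if (x - x.trunc()).abs() == 0.5 {
        // Exactly halfway: pick the even neighbour.
        2.0 * (x / 2.0).round()
    } else {
        x.round()
    }
}

/// Hypothetical helper: per-tensor affine fake quantization forward.
/// Returns (output, mask), where mask[i] is true iff the unclamped
/// quantized value lies in [quant_min, quant_max].
fn fake_quantize_per_tensor_affine(
    xs: &[f64],
    scale: f64,
    zero_point: i64,
    quant_min: i64,
    quant_max: i64,
) -> (Vec<f64>, Vec<bool>) {
    let zp = zero_point as f64;
    let (qmin, qmax) = (quant_min as f64, quant_max as f64);
    let mut out = Vec::with_capacity(xs.len());
    let mut mask = Vec::with_capacity(xs.len());
    for &x in xs {
        let q = round_ties_even(x / scale) + zp;
        mask.push(q >= qmin && q <= qmax);
        // Clamp into the integer range, then map back to floating point.
        out.push((q.clamp(qmin, qmax) - zp) * scale);
    }
    (out, mask)
}

/// Hypothetical helper: clipped straight-through estimator, dY * mask.
fn fake_quantize_backward(grad: &[f64], mask: &[bool]) -> Vec<f64> {
    grad.iter().zip(mask).map(|(&g, &m)| if m { g } else { 0.0 }).collect()
}
```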
pub fn fake_quantize_per_channel_affine_t( &self, scale: &Tensor<T>, zero_point: &IntTensor<i64>, axis: i64, quant_min: i64, quant_max: i64, ) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.fake_quantize_per_channel_affine(scale, zero_point, axis, quant_min, quant_max) — per-channel affine fake quantization with
autograd-tracked clipped STE backward.
Mirrors torch.fake_quantize_per_channel_affine, registered at
torch/overrides.py:621 as torch.fake_quantize_per_channel_affine: lambda input, scale, zero_point, axis, quant_min, quant_max: -1, and the
upstream implementation Tensor fake_quantize_per_channel_affine(const Tensor& self, const Tensor& scale, const Tensor& zero_point, int64_t axis, int64_t quant_min, int64_t quant_max) at aten/src/ATen/native/quantized/FakeQuantPerChannelAffine.cpp:32-42. The backward pass follows
tools/autograd/derivatives.yaml: fake_quantize_per_channel_affine_cachemask_backward(grad, mask) returns dY * mask, where the per-channel mask is 1
iff quant_min <= round_ties_even(input/scale[c]) + zero_point[c] <= quant_max for the channel c along axis.
Per R-DEFER-1, this method is the non-test production consumer of
grad_fns::quantize_grad::fake_quantize_per_channel_affine: the public, chainable
surface that closes the consumer requirement for the per-channel variant (blocker #1239).
pub fn index_fill_t( &self, dim: i64, index: &IntTensor<i64>, value: f64, ) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.index_fill(dim, index, value) — overwrite slices along
dim at index positions with the scalar value.
Mirrors torch.index_fill(input, dim, index, value) per the upstream
docstring at torch/_torch_docs.py:6563-6567 (index_fill(dim, index, value) -> Tensor [...] Out-of-place version of torch.Tensor.index_fill_)
and torch/_tensor_docs.py:2489-2509, which gives the canonical example:
```python
>>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float)
>>> index = torch.tensor([0, 2])
>>> x.index_fill_(1, index, -1)
tensor([[-1., 2., -1.],
        [-1., 5., -1.],
        [-1., 8., -1.]])
```
Upstream C++ entry at aten/src/ATen/native/TensorAdvancedIndexing.cpp:1979: Tensor index_fill(const Tensor& self, int64_t dim, const Tensor& index, const Scalar& source) { return self.clone(at::MemoryFormat::Preserve).index_fill_(dim, index, source); }. Registration at
torch/overrides.py:710: torch.index_fill: lambda input, dim, index, value: -1.
Backward per tools/autograd/derivatives.yaml:884-887:
```yaml
- name: index_fill.int_Scalar(Tensor self, int dim, Tensor index, Scalar value) -> Tensor
  self: grad.index_fill(dim, index, 0)
  index: non_differentiable
  result: self_t.index_fill(dim, index, 0)
```
The gradient is zeroed at every position the fill overwrote: those
positions were replaced by a constant and no longer depend on the
input.
dim follows PyTorch’s negative-wrapping convention (at::maybe_wrap_dim
at TensorAdvancedIndexing.cpp:1919). The index tensor must be 1-D
or scalar (upstream TORCH_CHECK(index.dim() <= 1) at :1920).
Negative index values are accepted and wrapped per upstream’s
index_fill_kernel at aten/src/ATen/native/cpu/IndexKernel.cpp:224-229 (TORCH_CHECK_INDEX(idx >= -self_dim_size && idx < self_dim_size, ...); if (idx < 0) { idx += self_dim_size; }). Indices
strictly outside [-dim_size, dim_size) raise IndexOutOfBounds
matching upstream’s TORCH_CHECK_INDEX. A 0-d input is accepted: the
implementation mirrors upstream’s self.unsqueeze(-1) at
TensorAdvancedIndexing.cpp:1917 by treating the scalar as a length-1
1-d tensor for the fill (only dim ∈ {-1, 0} and index ∈ {-1, 0}
are in range for that case).
Per R-DEFER-1, this method is the non-test production consumer of grad_fns::indexing::index_fill: the public, chainable
surface that closes the consumer requirement (blocker #1249).
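The fill can be sketched on a contiguous row-major buffer by splitting the shape into an outer block, the filled dimension, and an inner block. index_fill below is a hypothetical free function. It reproduces the Python example above and the negative-wrapping rules, but it omits the 0-d special case.

```rust
/// Hypothetical helper: overwrite every slice along `dim` whose coordinate
/// appears in `index` with `value`. Negative dim and index values wrap.
fn index_fill(data: &[f64], shape: &[usize], dim: i64, index: &[i64], value: f64) -> Result<Vec<f64>, String> {
    let ndim = shape.len() as i64;
    let d = if dim < 0 { dim + ndim } else { dim };
    if d < 0 || d >= ndim {
        return Err(format!("dim {dim} out of range for {ndim}-d input"));
    }
    let d = d as usize;
    let outer: usize = shape[..d].iter().product();
    let size = shape[d];
    let inner: usize = shape[d + 1..].iter().product();
    let s = size as i64;
    let mut out = data.to_vec();
    for &raw in index {
        // Accept [-size, size) and wrap negatives, as in the upstream kernel.
        if raw < -s || raw >= s {
            return Err(format!("index {raw} out of range for size {size}"));
        }
        let idx = (if raw < 0 { raw + s } else { raw }) as usize;
        for o in 0..outer {
            let start = (o * size + idx) * inner;
            out[start..start + inner].fill(value);
        }
    }
    Ok(out)
}
```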
pub fn scatter_reduce_t( &self, dim: i64, index: &[usize], index_shape: &[usize], src: &Tensor<T>, reduce: &str, include_self: bool, ) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.scatter_reduce(dim, index, src, reduce, *, include_self=True):
reduce-mode scatter onto a clone of self. Mirrors upstream
Tensor scatter_reduce(...) at aten/src/ATen/native/TensorAdvancedIndexing.cpp:2354 (TORCH_IMPL_FUNC(scatter_reduce_two)).
reduce ∈ {"sum", "prod", "amax", "amin"}, with "sum" marked as shipped. Backward
is implemented only for "sum", per tools/autograd/derivatives.yaml:3074-3077; the other
modes return a tensor with no grad attached, since the op_db characterization sweep emits only "sum".
Per R-DEFER-1, this method is the chainable, non-test production consumer of grad_fns::indexing::scatter_reduce.
Closes blocker #1245.
pub fn index_add_t( &self, dim: i64, index: &IntTensor<i64>, source: &Tensor<T>, alpha: f64, ) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.index_add(dim, index, source, *, alpha=1):
out = self.clone(); out[..., index[i], ...] += alpha * source[..., i, ...]
along dim. Mirrors upstream Tensor index_add(const Tensor& self, int64_t dim, const Tensor& index, const Tensor& source, const Scalar& alpha) at aten/src/ATen/native/TensorAdvancedIndexing.cpp:1153 (TORCH_IMPL_FUNC(index_add_cpu_out)). Backward per
tools/autograd/derivatives.yaml:862-869: self: grad; source: maybe_multiply(grad.index_select(dim, index).expand_as(source), alpha).
Per R-DEFER-1, this method is the chainable, non-test production consumer of grad_fns::indexing::index_add.
Closes blocker #1247.
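Below is a sketch of the forward pass on contiguous row-major buffers. index_add is a hypothetical free function, and the sketch assumes a non-negative dim. Repeated entries in index accumulate, because each source slice is added in turn.

```rust
/// Hypothetical helper: out = self.clone(); out[.., index[i], ..] += alpha * source[.., i, ..].
/// `source` has self's shape with shape[dim] replaced by index.len().
fn index_add(data: &[f64], shape: &[usize], dim: usize, index: &[usize], source: &[f64], alpha: f64) -> Vec<f64> {
    let outer: usize = shape[..dim].iter().product();
    let size = shape[dim];
    let inner: usize = shape[dim + 1..].iter().product();
    let n = index.len();
    assert_eq!(source.len(), outer * n * inner);
    let mut out = data.to_vec();
    for o in 0..outer {
        for (i, &idx) in index.iter().enumerate() {
            let dst = (o * size + idx) * inner; // slice idx of self
            let src = (o * n + i) * inner; // slice i of source
            for k in 0..inner {
                out[dst + k] += alpha * source[src + k];
            }
        }
    }
    out
}
```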
pub fn index_copy_t( &self, dim: i64, index: &IntTensor<i64>, source: &Tensor<T>, ) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.index_copy(dim, index, source): out = self.clone(); out[..., index[i], ...] = source[..., i, ...] along dim. Mirrors
upstream Tensor index_copy(...) at aten/src/ATen/native/TensorAdvancedIndexing.cpp:1082 (TORCH_IMPL_FUNC(index_copy_out)).
Backward per tools/autograd/derivatives.yaml:875-883: self: grad.index_fill(dim, index, 0); source: grad.index_select(dim, index).expand_as(source).
Per R-DEFER-1, this method is the chainable, non-test production consumer of grad_fns::indexing::index_copy.
Closes blocker #1248.
pub fn masked_scatter_t( &self, mask: &BoolTensor, source: &Tensor<T>, ) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.masked_scatter(mask, source) — copy elements from
source into a clone of self at positions where mask is true,
in C-order. Mirrors upstream Tensor masked_scatter(const Tensor& self, const Tensor& mask, const Tensor& source) at
aten/src/ATen/native/TensorAdvancedIndexing.cpp:2402-2409.
Backward per tools/autograd/derivatives.yaml:1105-1108 self: grad.masked_fill(mask, 0) / source: masked_scatter_backward(...).
Per R-DEFER-1, this method is the chainable, non-test production consumer of grad_fns::indexing::masked_scatter.
Closes blocker #1252.
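The C-order consumption of source can be sketched as follows. The sketch assumes mask and self have the same number of elements, with no broadcasting. masked_scatter here is a hypothetical free function.

```rust
/// Hypothetical helper: walk self in C order; wherever mask is true, take the
/// next unused element of `source`. Positions with a false mask keep self's value.
fn masked_scatter(data: &[f64], mask: &[bool], source: &[f64]) -> Result<Vec<f64>, String> {
    if mask.len() != data.len() {
        return Err("mask must have the same number of elements as self".to_string());
    }
    let mut src = source.iter(); // consumed strictly in order
    let mut out = data.to_vec();
    for (slot, &m) in out.iter_mut().zip(mask) {
        if m {
            *slot = *src.next().ok_or("source has fewer elements than true mask entries")?;
        }
    }
    Ok(out)
}
```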
pub fn take_t( &self, index: &IntTensor<i64>, ) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.take(index) — out[i] = self.view(-1)[index[i]], a
flat-index gather producing a tensor of shape index.shape().
Mirrors upstream Tensor take(const Tensor& self, const Tensor& index)
at aten/src/ATen/native/TensorAdvancedIndexing.cpp:1067-1071.
Backward per tools/autograd/derivatives.yaml:1766-1769 self: take_backward(grad, self, index) — scatter-add grad into a
zeros buffer at the flat index positions.
Per R-DEFER-1, this method is the chainable, non-test production consumer of grad_fns::indexing::take.
Closes blocker #1253.
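Here is a sketch of the forward gather and of take_backward's scatter-add on a flat f64 buffer. Both functions are hypothetical. For simplicity, this sketch rejects negative indices, which may differ from the real method.

```rust
/// Hypothetical helper: out[i] = data[index[i]] with `data` viewed as 1-D.
fn take(data: &[f64], index: &[i64]) -> Result<Vec<f64>, String> {
    index
        .iter()
        .map(|&i| {
            usize::try_from(i)
                .ok()
                .filter(|&j| j < data.len())
                .map(|j| data[j])
                .ok_or_else(|| format!("index {i} out of range for {} elements", data.len()))
        })
        .collect()
}

/// Hypothetical helper: scatter-add `grad` into a zero buffer of `numel`
/// elements, so repeated indices accumulate their gradients.
fn take_backward(grad: &[f64], numel: usize, index: &[i64]) -> Vec<f64> {
    let mut out = vec![0.0; numel];
    for (&g, &i) in grad.iter().zip(index) {
        out[i as usize] += g; // assumes indices were already validated by `take`
    }
    out
}
```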
pub fn put_t( &self, index: &IntTensor<i64>, source: &Tensor<T>, accumulate: bool, ) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.put(index, source, accumulate=False) — flat-index
scatter into a clone of self: out.view(-1)[index[i]] = source[i]
(or += source[i] when accumulate=true). Mirrors upstream
Tensor put(const Tensor& self, const Tensor& index, const Tensor& source, const bool accumulate) at aten/src/ATen/native/TensorAdvancedIndexing.cpp:928-934. Backward per
tools/autograd/derivatives.yaml:1421-1424.
Per R-DEFER-1, this method is the chainable, non-test production consumer of grad_fns::indexing::put.
Closes blocker #1254.
pub fn where_t( &self, condition: &[bool], other: &Tensor<T>, ) -> Result<Tensor<T>, FerrotorchError>
torch.where(condition, self, other) — pointwise ternary selection
taking a host &[bool] mask. Returns a tensor where each element is
self[i] if condition[i] is true, else other[i]. Differentiable
— a WhereBackward node is attached when grad tracking is enabled
on either input.
Mirrors torch.where(condition, input, other) per
torch/_torch_docs.py:13089 and the upstream impl macro at
aten/src/ATen/native/TensorCompare.cpp:646 TORCH_IMPL_FUNC(where_out) — the self-vs-other dispatch shape.
Per R-DEFER-1 (closes blocker #1295), this method is the non-test
production consumer of grad_fns::comparison::where_: the public, chainable
surface that closes the consumer requirement. The BoolTensor variant is where_bt_t.
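The selection and the gradient routing of a where-style backward can be sketched on plain slices. where_select and where_backward are hypothetical names, not the actual WhereBackward node. Each upstream gradient element goes to exactly one input, and the other input receives zero at that position.

```rust
/// Hypothetical helper: out[i] = if cond[i] { a[i] } else { b[i] }.
fn where_select(cond: &[bool], a: &[f64], b: &[f64]) -> Vec<f64> {
    cond.iter()
        .zip(a.iter().zip(b))
        .map(|(&c, (&x, &y))| if c { x } else { y })
        .collect()
}

/// Hypothetical helper: gradients for (a, b) given the upstream grad.
fn where_backward(cond: &[bool], grad: &[f64]) -> (Vec<f64>, Vec<f64>) {
    let ga: Vec<f64> = cond.iter().zip(grad).map(|(&c, &g)| if c { g } else { 0.0 }).collect();
    let gb: Vec<f64> = cond.iter().zip(grad).map(|(&c, &g)| if c { 0.0 } else { g }).collect();
    (ga, gb)
}
```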
pub fn where_bt_t( &self, condition: &BoolTensor, other: &Tensor<T>, ) -> Result<Tensor<T>, FerrotorchError>
torch.where(condition, self, other) — BoolTensor overload.
Pointwise ternary selection where condition is a first-class
BoolTensor. condition must have self.numel() elements, and
self.shape() must equal other.shape(). Delegates to
grad_fns::comparison::where_bt, which validates the shapes,
materialises the host mask, and dispatches to where_ for the
autograd-aware forward.
Mirrors torch.where(cond, x, y) for cond: BoolTensor per
torch/_torch_docs.py:13089.
Per R-DEFER-1 (closes blocker #1297), this method is the non-test
production consumer of grad_fns::comparison::where_bt: the public,
chainable surface that closes the consumer requirement.
pub fn scatter_value_t( &self, dim: i64, index: &[usize], index_shape: &[usize], value: T, ) -> Result<Tensor<T>, FerrotorchError>
torch.Tensor.scatter_(dim, index, value) (scalar-src overload) —
scatter a single scalar value into a clone of self at the
positions named by index along dim. Mirrors the upstream
scalar overload Tensor& scatter_(int64_t dim, const Tensor& index, const Scalar& value) at
aten/src/ATen/native/TensorAdvancedIndexing.cpp:2278 —
the scatter.value dispatch arm that op_db emits as a distinct
sample family alongside the tensor-src overload.
Equivalent to self.scatter_(dim, index, full_like(index, value))
but avoids the temporary src allocation. No autograd is attached
because the scalar value is not a differentiable input.
Per R-DEFER-1 (closes blocker #1258), this method is the non-test
production consumer of crate::ops::indexing::scatter_value: the public,
chainable surface that closes the consumer requirement.
pub fn size(&self) -> &[usize]
Alias for shape(). Returns the tensor dimensions like PyTorch’s Tensor.size().
pub fn dim(&self) -> usize
Alias for ndim(). Returns the number of dimensions like PyTorch’s Tensor.dim().
pub fn print(&self) -> &Tensor<T>
Log the tensor’s Display form and return self for chaining.
Emits a tracing::info! event on target ferrotorch::tensor. Behaviour
change vs. earlier versions: this no longer writes directly to stdout —
callers must install a tracing subscriber (e.g. tracing_subscriber)
to see the output. Library code should not write to stdout; downstream
consumers control logging policy.
pub fn argmax( &self, dim: Option<isize>, ) -> Result<IntTensor<i64>, FerrotorchError>
Index of the maximum value (PyTorch torch.argmax), as IntTensor<i64>.
dim = None flattens and returns a 0-d index. dim = Some(d) reduces
along d (negative indices allowed). Ties resolve to the FIRST (lowest)
index. GPU-resident result when self is on CUDA.
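The first-index tie rule falls out of using a strict comparison. The sketch below uses hypothetical helpers on contiguous row-major f64 data with a non-negative dim. It covers both the flattened (dim = None) and the dim-keyed cases, and it does not model NaN handling.

```rust
/// Hypothetical helper: flattened argmax; None for empty input.
fn argmax_flat(xs: &[f64]) -> Option<usize> {
    if xs.is_empty() {
        return None;
    }
    let mut best = 0;
    for i in 1..xs.len() {
        // Strict `>` keeps the earliest index when values tie.
        if xs[i] > xs[best] {
            best = i;
        }
    }
    Some(best)
}

/// Hypothetical helper: argmax along `dim`; the output drops that dimension.
fn argmax_dim(data: &[f64], shape: &[usize], dim: usize) -> Vec<i64> {
    let outer: usize = shape[..dim].iter().product();
    let size = shape[dim];
    assert!(size > 0, "argmax over an empty dimension");
    let inner: usize = shape[dim + 1..].iter().product();
    let mut out = Vec::with_capacity(outer * inner);
    for o in 0..outer {
        for k in 0..inner {
            let at = |s: usize| data[(o * size + s) * inner + k];
            let mut best = 0;
            for s in 1..size {
                if at(s) > at(best) {
                    best = s;
                }
            }
            out.push(best as i64);
        }
    }
    out
}
```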
pub fn argmin( &self, dim: Option<isize>, ) -> Result<IntTensor<i64>, FerrotorchError>
Index of the minimum value (PyTorch torch.argmin). See Self::argmax.
pub fn index_select<I>(
&self,
dim: isize,
indices: &IntTensor<I>,
) -> Result<Tensor<T>, FerrotorchError>
where
    I: IntElement,
index_select(dim, indices) (PyTorch torch.index_select) using a
GPU-resident-or-CPU IntTensor index. The indices tensor must be 1-D.
Output keeps self’s dtype; shape is self.shape with shape[dim]
replaced by indices.numel(). On CUDA, self and indices must be on
the same device; the result stays GPU-resident.
pub fn gather<I>(
&self,
dim: isize,
index: &IntTensor<I>,
) -> Result<Tensor<T>, FerrotorchError>
where
    I: IntElement,
gather(dim, index) (PyTorch torch.gather) using a GPU-resident-or-CPU
IntTensor index. index must have the same ndim as self; output has
index’s shape and self’s dtype. On CUDA the result stays resident.
pub fn to_int<I>(&self) -> Result<IntTensor<I>, FerrotorchError>
where
    I: IntElement,
Cast this float tensor to IntTensor<I> (PyTorch .to(int)):
truncate toward zero. GPU-resident result when self is on CUDA.
pub fn as_strided( &self, size: &[usize], stride: &[isize], storage_offset: Option<usize>, ) -> Result<Tensor<T>, FerrotorchError>
Build a zero-copy view with the given shape, strides (element units),
and storage offset. If storage_offset is None, the input’s
existing offset is used.
Equivalent to torch.Tensor.as_strided(size, stride, storage_offset).
Works on any device — no data movement.
Validates that every reachable offset stays inside the underlying storage. Does not reject overlapping views: those are useful for constructing Toeplitz matrices, sliding windows, broadcast views, etc. As in torch, in-place writes against an overlapping view have undefined behaviour.
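The bounds check does not need to enumerate every element. The smallest and largest reachable offsets come from setting each index to 0 or size[d] - 1, according to the sign of stride[d]. as_strided_in_bounds is a hypothetical helper, not ferrotorch's validation code. Overlapping views pass, as described above.

```rust
/// Hypothetical helper: true iff every offset reachable by
/// as_strided(size, stride, offset) lies in 0..storage_len.
fn as_strided_in_bounds(size: &[usize], stride: &[isize], offset: usize, storage_len: usize) -> bool {
    if size.len() != stride.len() {
        return false;
    }
    if size.contains(&0) {
        return true; // an empty view reads no elements (a choice made in this sketch)
    }
    let (mut lo, mut hi) = (offset as isize, offset as isize);
    for (&n, &st) in size.iter().zip(stride) {
        // Moving index d from 0 to n - 1 shifts the offset by (n - 1) * stride[d].
        let span = (n as isize - 1) * st;
        if span < 0 {
            lo += span;
        } else {
            hi += span;
        }
    }
    lo >= 0 && (hi as usize) < storage_len
}
```

A sliding-window view such as size [3, 3] with stride [1, 1] over a 5-element storage passes this check even though its rows overlap.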
pub fn as_strided_copy( &self, size: &[usize], stride: &[isize], storage_offset: Option<usize>, ) -> Result<Tensor<T>, FerrotorchError>
Materialised strided copy: returns a new contiguous tensor whose
values are the elements that as_strided(size, stride, offset) would
read.
On CUDA tensors this dispatches to the existing strided_copy_f32
/ strided_copy_f64 GPU kernels (no host bounce). On CPU it walks
the multi-index. On other devices (e.g. XPU) it returns
FerrotorchError::NotImplementedOnCuda — install a kernel before
using this on those devices.
pub fn as_strided_scatter( &self, src: &Tensor<T>, size: &[usize], stride: &[isize], storage_offset: Option<usize>, ) -> Result<Tensor<T>, FerrotorchError>
Inverse of as_strided: return a copy of self with src written
into the strided positions described by (size, stride, offset).
Positions outside that view retain self’s values.
Equivalent to torch.as_strided_scatter. The CUDA path
dispatches through the GPU backend (via the
strided_copy + strided_scatter kernels) — no host bounce.
pub fn view_reshape( &self, new_shape: Vec<usize>, ) -> Result<Tensor<T>, FerrotorchError>
Create a view of this tensor with a different shape, sharing the same underlying storage. For contiguous tensors this is zero-copy, with no data movement; non-contiguous tensors are materialized first, which requires a copy.
The new shape must have the same total number of elements.
pub fn view_operation(
    &self,
    new_shape: Vec<usize>,
    grad_fn: Arc<dyn GradFn<T>>,
) -> Result<Tensor<T>, FerrotorchError>
Create a view with a grad_fn attached. Used for shape ops (squeeze, unsqueeze, reshape, etc.) that don’t change data layout. For a contiguous source the view is zero-copy and shares the source tensor’s underlying storage.
Non-contiguous tensors are materialized first, which requires a copy.
pub fn stride_view(
    &self,
    new_shape: Vec<usize>,
    new_strides: Vec<isize>,
    new_offset: usize,
) -> Tensor<T>
Create a zero-copy view with explicit shape, strides, and offset.
This is the lowest-level view constructor — used by permute, transpose, narrow, and other operations that change the logical layout without copying data. The caller is responsible for ensuring that the given shape + strides + offset are valid for the underlying storage.
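To illustrate what changing the logical layout without copying means, the sketch below derives transpose and narrow views purely from metadata and reads elements at offset + sum(index * stride). transpose_meta, narrow_meta and read_at are hypothetical helpers over a flat slice, not ferrotorch's permute/transpose/narrow. Like stride_view itself, they do no validation beyond narrow's range assertion.

```rust
/// Swaps dims `a` and `b` in the metadata only; storage and offset are untouched.
fn transpose_meta(shape: &[usize], strides: &[isize], a: usize, b: usize) -> (Vec<usize>, Vec<isize>) {
    let (mut s, mut st) = (shape.to_vec(), strides.to_vec());
    s.swap(a, b);
    st.swap(a, b);
    (s, st)
}

/// Keeps `len` entries of `dim` starting at `start` by shrinking the shape
/// and advancing the storage offset; strides are unchanged.
fn narrow_meta(
    shape: &[usize],
    strides: &[isize],
    offset: usize,
    dim: usize,
    start: usize,
    len: usize,
) -> (Vec<usize>, Vec<isize>, usize) {
    assert!(start + len <= shape[dim], "narrow range out of bounds");
    let mut s = shape.to_vec();
    s[dim] = len;
    let new_offset = (offset as isize + start as isize * strides[dim]) as usize;
    (s, strides.to_vec(), new_offset)
}

/// Reads the element at multi-index `idx`: storage[offset + sum(idx[d] * strides[d])].
fn read_at<T: Copy>(storage: &[T], strides: &[isize], offset: usize, idx: &[usize]) -> T {
    let pos = offset as isize
        + idx.iter().zip(strides).map(|(&i, &s)| i as isize * s).sum::<isize>();
    storage[pos as usize]
}
```

For a row-major 2×3 storage holding 0..6 (strides [3, 1]), the transposed view has shape [3, 2] and strides [1, 3], and narrowing dim 1 to start 1, length 2 leaves the strides alone but sets the storage offset to 1.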
pub fn stride_view_operation(
    &self,
    new_shape: Vec<usize>,
    new_strides: Vec<isize>,
    new_offset: usize,
    grad_fn: Arc<dyn GradFn<T>>,
) -> Tensor<T>
Create a zero-copy view with explicit shape, strides, and offset, with an attached gradient function for autograd.
pub fn id(&self) -> TensorId
pub fn shape(&self) -> &[usize]
pub fn ndim(&self) -> usize
pub fn numel(&self) -> usize
pub fn strides(&self) -> &[isize]
pub fn storage_offset(&self) -> usize
Offset (in number of elements) into the underlying storage.
Non-zero for views created by narrow, select, or other subregion ops.
pub fn storage_len(&self) -> usize
Number of elements in the underlying storage buffer.
May be larger than numel() for views (transpose,
narrow, as_strided, etc.) that address only a subset of the
storage. Used by stride-manipulation ops (as_strided,
as_strided_copy) for bounds validation.
pub fn storage(&self) -> &TensorStorage<T>
Borrow the underlying TensorStorage. Used by ops that need
access to the GPU buffer handle or to share storage Arc-wise.
pub fn device(&self) -> Device
pub fn requires_grad(&self) -> bool
pub fn is_leaf(&self) -> bool
pub fn grad_fn(&self) -> Option<&Arc<dyn GradFn<T>>>
pub fn register_hook<F>(&self, func: F) -> Result<HookHandle, FerrotorchError>
Register a gradient hook on this tensor.
The hook is called during backward whenever a gradient is computed for
this tensor. It receives the gradient and may return Some(new_grad) to
replace it, or None to keep the original.
Returns a HookHandle that can
be used to remove the hook later via remove_hook.
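The replace-or-keep contract can be modelled with plain closures. In this sketch a gradient is a Vec<f32> and multiple hooks run in registration order, each seeing the output of the previous one; both are assumptions made for illustration (the order in which ferrotorch invokes several hooks is not stated above), and Grad, Hook and apply_hooks are hypothetical names.

```rust
/// Hypothetical stand-ins: a gradient is a flat Vec<f32>, and a hook maps it
/// to an optional replacement.
type Grad = Vec<f32>;
type Hook = Box<dyn Fn(&Grad) -> Option<Grad>>;

/// Runs `hooks` in order. `Some(g)` replaces the gradient seen by later hooks;
/// `None` keeps the current one.
fn apply_hooks(grad: Grad, hooks: &[Hook]) -> Grad {
    hooks.iter().fold(grad, |g, hook| hook(&g).unwrap_or(g))
}
```

A hook that doubles the gradient followed by a hook that only observes it (returning None) leaves the doubled gradient in place.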
pub fn register_post_accumulate_grad_hook<F>(
    &self,
    func: F,
) -> Result<HookHandle, FerrotorchError>
Register a post-accumulate-grad hook on this tensor.
The hook is called after gradient accumulation completes on a leaf
tensor. It receives a reference to the tensor itself (so the hook can
read .grad()). Cannot modify the gradient — use
register_hook for that.
pub fn remove_hook(&self, handle: HookHandle) -> Result<bool, FerrotorchError>
Remove a previously registered hook by its handle.
Returns true if the hook was found and removed.
pub fn grad(&self) -> Result<Option<Tensor<T>>, FerrotorchError>
Read the accumulated gradient. Returns None if no gradient has
been computed yet.
pub fn set_grad(&self, grad: Option<Tensor<T>>) -> Result<(), FerrotorchError>
Set or replace the accumulated gradient.
pub fn zero_grad(&self) -> Result<(), FerrotorchError>
Clear the gradient of this tensor.
Equivalent to self.set_grad(None): afterwards grad() returns None rather
than a zero-filled tensor. Typically called before each training iteration
to prevent gradient accumulation across steps.
pub fn data(&self) -> Result<&[T], FerrotorchError>
Borrow the underlying data as a flat slice.
Returns Err(GpuTensorNotAccessible) if the tensor is on a GPU.
Call .cpu() first to transfer it.
Returns Err if the tensor is not contiguous — the raw storage
slice would not correspond to the logical element order. Use
data_vec() or call .contiguous() first.
pub fn data_ref(&self) -> Result<&[T], FerrotorchError>
Borrow the underlying data as a flat slice (CPU-only alias for data()).
Identical to data(): returns a zero-copy &[T] reference
to the tensor’s storage. Returns Err(GpuTensorNotAccessible) if the
tensor lives on a GPU (call .cpu() first to transfer) and, like data(),
returns Err for non-contiguous tensors.
This alias exists for call-site clarity: use data_ref() when you want
to emphasise that no copy is made, vs data_vec() which always copies.
pub fn data_vec(&self) -> Result<Vec<T>, FerrotorchError>
Get tensor data as an owned Vec<T>, transparently transferring from
GPU if needed and correctly handling non-contiguous tensors.
For contiguous CPU tensors this copies the slice. For non-contiguous CPU tensors it gathers elements in logical (C-order) sequence. For GPU tensors it performs a device-to-host transfer.
pub fn to(&self, device: Device) -> Result<Tensor<T>, FerrotorchError>
Move this tensor to a device, returning a new tensor.
If the tensor is already on the target device, returns a cheap clone (shared Arc storage).
pub fn to_pinned(&self, device: Device) -> Result<Tensor<T>, FerrotorchError>
Like to, but uses pinned (page-locked) host memory for
the CPU→CUDA transfer when applicable.
On CPU→CUDA, allocates a temporary pinned host buffer, copies the
tensor data into it, and uses DMA to transfer to the device. This is
roughly 2x faster than the regular to() path for large buffers
because it avoids one extra page-locked staging copy inside the CUDA
driver. For small buffers (< ~64KB) the pinning overhead may
outweigh the gain — measure before defaulting to this path.
Behaves identically to to for CPU→CPU, CUDA→CPU, and
cross-GPU paths (which all bypass pinned memory).
Used by ferrotorch_data::DataLoader when pin_memory(true) is set
alongside a target device.
pub fn cuda(&self) -> Result<Tensor<T>, FerrotorchError>
Move to CUDA device 0.
pub fn cpu(&self) -> Result<Tensor<T>, FerrotorchError>
Move to CPU.
pub fn to_dtype<U>(&self) -> Result<Tensor<U>, FerrotorchError>
where
    U: Float,
Cast this tensor to a different float dtype, preserving device + shape.
U: Float — any of f32 / f64 / bf16 / f16. PyTorch parity:
tensor.to(dtype) / tensor.to(torch.float32).
- Same dtype (T == U): zero-copy Arc-shared clone.
- CPU: per-element cast via crate::numeric_cast::cast (fallible: returns Err(InvalidArgument) if a finite source value saturates to ±∞ in a narrower target, per issue #815).
- GPU: dispatched through crate::gpu_dispatch::GpuBackend::cast_f_to_f; stays GPU-resident. The initial implementation covers bf16 ↔ f32 (issue #29); other float pairs return Err until the follow-up issue lands.
§Autograd
The returned tensor has requires_grad = false regardless of self.
A CastBackward grad_fn that propagates gradients through the cast is
follow-up work tracked alongside the remaining float-pair kernels.
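The fallible CPU rule (reject finite values that overflow to ±∞ in the narrower type) can be sketched for the f64 → f32 case with Rust's as cast, which rounds to the nearest f32 and produces infinity on overflow. cast_f64_to_f32_checked is a hypothetical helper, not crate::numeric_cast::cast, and it returns a String error instead of FerrotorchError::InvalidArgument.

```rust
/// Hypothetical narrowing cast: finite inputs that overflow f32 are rejected,
/// while inputs that are already infinite or NaN pass through unchanged.
fn cast_f64_to_f32_checked(src: &[f64]) -> Result<Vec<f32>, String> {
    src.iter()
        .map(|&x| {
            let y = x as f32; // nearest f32; overflows to ±inf
            if x.is_finite() && y.is_infinite() {
                Err(format!("{x} overflows f32"))
            } else {
                Ok(y)
            }
        })
        .collect()
}
```

Under this rule 1e39 is rejected, whereas f64::INFINITY is not, because it was never finite to begin with.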
pub fn is_meta(&self) -> bool
Returns true if this tensor is on the meta device (no backing data).
pub fn meta_fill_value(&self) -> Option<&T>
Recorded fill value for a meta tensor, if it was constructed with one
(e.g. via crate::creation::full_meta). Returns None for any
non-meta tensor and for meta tensors created without a fill (e.g.
via crate::creation::zeros_meta / crate::creation::ones_meta
/ crate::creation::meta_like).
Meta tensors carry no element-wise data, so the per-element fill
cannot be read back; the fill value is metadata only. It does, however,
let callers distinguish a full_meta(shape, 2.5) tensor from a full_meta(shape, 0.0) tensor (or from a plain zeros_meta(shape)), so the value passed to
full_meta (its _value argument) is no longer silently ignored.
pub fn is_xpu(&self) -> bool
Returns true if this tensor is on an XPU (CubeCL / Intel GPU) device.
pub fn gpu_handle(&self) -> Result<&GpuBufferHandle, FerrotorchError>
Get the GPU buffer handle. Returns Err for CPU tensors.
pub fn masked_fill(
    &self,
    mask: &BoolTensor,
    value: T,
) -> Result<Tensor<T>, FerrotorchError>
masked_fill(mask, value) — out[i] = mask[i] ? value : self[i],
returning a new tensor of the same shape (mask convention “true → fill”,
matching torch.Tensor.masked_fill). mask must have the same numel as
self and live on the same device.
When both self and mask are CUDA-resident, the fill runs on the GPU
(real PTX kernel dispatched on self’s dtype) and the result stays
GPU-resident — NO host crossing (crosslink #1185 Phase 3c). Otherwise it
takes the CPU path. Carries a MaskedFillBackward grad_fn when grad is
required.
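On the CPU path the semantics reduce to an element-wise select. A minimal sketch over flat slices, assuming self and mask are both contiguous and laid out in the same order; masked_fill here is a hypothetical free function for illustration, not the method itself.

```rust
/// Hypothetical CPU sketch: out[i] = if mask[i] { value } else { data[i] }.
fn masked_fill<T: Copy>(data: &[T], mask: &[bool], value: T) -> Result<Vec<T>, String> {
    if data.len() != mask.len() {
        return Err(format!("mask has {} elements, tensor has {}", mask.len(), data.len()));
    }
    // "true -> fill": masked positions take `value`, the rest copy `data`.
    Ok(data.iter().zip(mask).map(|(&x, &m)| if m { value } else { x }).collect())
}
```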
pub fn masked_select(
    &self,
    mask: &BoolTensor,
) -> Result<Tensor<T>, FerrotorchError>
masked_select(mask) — return a 1-D tensor of the elements of self
where mask is true, in flat C-order (torch.Tensor.masked_select).
On CUDA (self + mask resident, same device) this runs a GPU stream compaction; the result stays GPU-resident. The single output-length integer crosses to the host to size the data-dependent output (the result shape, not a data round-trip — PyTorch parity).
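The CPU equivalent is a filter in flat C-order. This sketch (a hypothetical free function over flat slices, contiguous inputs assumed) also shows why the output length is data-dependent: it equals the number of true entries in the mask, which is only known after scanning it.

```rust
/// Hypothetical CPU sketch: keeps data[i] wherever mask[i] is true, in order.
fn masked_select<T: Copy>(data: &[T], mask: &[bool]) -> Result<Vec<T>, String> {
    if data.len() != mask.len() {
        return Err(format!("mask has {} elements, tensor has {}", mask.len(), data.len()));
    }
    Ok(data.iter().zip(mask).filter(|&(_, &m)| m).map(|(&x, _)| x).collect())
}
```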
pub unsafe fn data_mut(&self) -> Result<&mut [T], FerrotorchError>
Borrow the underlying data as a mutable flat slice.
§Safety
The caller must ensure exclusive access to this tensor’s storage.
No other references to this tensor’s data may exist concurrently.
Optimizer step() methods satisfy this requirement: they run inside
no_grad() (no graph is being built) and hold &mut self (exclusive
access to the optimizer’s parameter copies).
pub unsafe fn update_data(&self, new_data: &[T]) -> Result<(), FerrotorchError>
Write new_data into this tensor’s storage, preserving tensor identity.
- CPU: copies data into the existing storage Vec.
- GPU: uploads data to GPU and replaces the storage buffer.
This is the device-transparent alternative to data_mut() for
optimizer step implementations.
§Safety
Same requirements as data_mut() — caller must ensure exclusive
access. No concurrent reads or writes to this tensor’s storage may
exist. Optimizer step() methods satisfy this by running inside
no_grad() with &mut self.
pub unsafe fn update_storage_and_shape(
    &self,
    new_storage: TensorStorage<T>,
    new_shape: Vec<usize>,
) -> Result<(), FerrotorchError>
Replace this tensor’s storage AND shape/strides in-place, matching
PyTorch’s Tensor.resize_(new_shape) + storage swap.
This is the rare case where both the underlying buffer and the
shape metadata in TensorInner need to change in lockstep — used
by the out= write path of torch.add(a, b, *, out=out) when
out.shape() != broadcast_shape (PyTorch silently resizes out,
with a deprecation warning, in current versions). The new strides
are computed as C-contiguous for new_shape.
§Safety
Same as update_storage: caller must ensure exclusive access.
The new storage’s numel must equal new_shape.iter().product().
The new storage must reside on the same device as the tensor.
The caller must also guarantee that no other Tensor clone is
concurrently observing this tensor’s shape — Tensor is
Arc<TensorInner>-shared, and a resize changes the observable
shape for every clone. This is the same invariant update_storage
already implicitly relies on for buffer-length changes; the
out=-style call sites this method exists for own a unique
&Tensor for the duration of the write.
pub unsafe fn update_storage(
    &self,
    new_storage: TensorStorage<T>,
) -> Result<(), FerrotorchError>
Replace this tensor’s storage with a new TensorStorage in-place.
Used by GPU-native optimizer steps that compute the updated parameter entirely on-device and need to swap the underlying buffer without a CPU round-trip.
§Safety
Same as update_data: caller must ensure exclusive access. The new
storage must have the same number of elements as the tensor and reside
on the same device.
pub fn with_gpu_handle_mut<R>(
    &self,
    f: impl FnOnce(&mut GpuBufferHandle) -> Result<R, FerrotorchError>,
) -> Result<R, FerrotorchError>
Run f with in-place mutable access to this tensor’s underlying
GpuBufferHandle.
This is a safe wrapper for the optimizer fast-path that fuses the
parameter update directly into a GPU kernel: the kernel needs an
&mut GpuBufferHandle aliased into the param tensor’s storage, but
Tensor is Arc-shared so a naïve &self -> &mut Storage route
requires unsafe at every call site. By centralizing the
Arc::as_ptr -> *mut TensorStorage<T> cast inside this single
method and returning Err(FerrotorchError::DeviceUnavailable) for
non-GPU storage, callers do not need to write any unsafe of their
own.
§Errors
Returns FerrotorchError::DeviceUnavailable when this tensor’s
storage is not GPU-resident.
§Safety contract (encapsulated)
This method is safe because the caller cannot violate any
invariant exposed through it: the closure receives a fresh
&mut GpuBufferHandle whose lifetime is bounded by the body of
this method, and no other reference to the storage can be created
concurrently from within the closure (the closure is FnOnce).
The only remaining hazard is concurrent access to the same Arc
from another thread — Tensor is not Sync for storage mutation
purposes, and the optimizer step that drives this method holds
&mut self on the outer Optimizer for the whole step, so no
other handle can be observing or mutating this storage during the
call. This is the same exclusive-access guarantee that
update_data and update_storage depend on; this method
simply lets the optimizer mutate the GPU handle in place rather
than swap the entire TensorStorage.
pub fn detach(&self) -> Tensor<T>
Detach this tensor from the computation graph, returning a new tensor that shares storage but has no grad_fn.
pub fn is_contiguous(&self) -> bool
Whether this tensor is contiguous in memory (C-order).
Dimensions with size 1 can have any stride without affecting contiguity, since they contribute no index offset.
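The contiguity rule, including the size-1 exemption, can be sketched as a right-to-left stride check. is_c_contiguous is a hypothetical helper over raw shape and stride slices; how ferrotorch treats zero-sized dimensions is not covered by this sketch.

```rust
/// Hypothetical check: true if (shape, strides) describe C-order layout.
/// Size-1 dims are skipped because they never contribute an index offset.
fn is_c_contiguous(shape: &[usize], strides: &[isize]) -> bool {
    let mut expected: isize = 1;
    for (&size, &stride) in shape.iter().zip(strides).rev() {
        if size == 1 {
            continue;
        }
        if stride != expected {
            return false;
        }
        expected *= size as isize;
    }
    true
}
```

A transposed 2×3 tensor (shape [3, 2], strides [1, 3]) fails the check, while shape [2, 1, 3] with strides [3, 99, 1] passes because the odd stride belongs to a size-1 dimension.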
pub fn is_contiguous_for(&self, format: MemoryFormat) -> bool
Check whether this tensor is contiguous in a specific memory format.
- MemoryFormat::Contiguous: standard C-order (NCHW for 4D).
- MemoryFormat::ChannelsLast: NHWC stride pattern for 4D tensors.
- MemoryFormat::ChannelsLast3d: NDHWC stride pattern for 5D tensors.
Dimensions of size 1 are treated as matching any stride, consistent with PyTorch behaviour.
[CL-309] WU-05: channels-last memory format support
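For a 4-D tensor with logical shape [N, C, H, W], the channels-last (NHWC) stride pattern is [H*W*C, 1, W*C, C]. The sketch below computes it and applies the same size-1 rule; channels_last_strides and is_channels_last are hypothetical helpers, not ferrotorch functions.

```rust
/// Strides of an [n, c, h, w] tensor whose data is stored in NHWC order.
fn channels_last_strides(shape: [usize; 4]) -> [isize; 4] {
    let [_n, c, h, w] = shape.map(|d| d as isize);
    [h * w * c, 1, w * c, c]
}

/// Hypothetical check: strides match the NHWC pattern, ignoring size-1 dims.
fn is_channels_last(shape: [usize; 4], strides: &[isize; 4]) -> bool {
    let expected = channels_last_strides(shape);
    (0..4).all(|d| shape[d] == 1 || strides[d] == expected[d])
}
```

With this rule a single-channel tensor of shape [2, 1, 4, 5] and C-order strides [20, 20, 5, 1] also counts as channels-last, since its only mismatching stride belongs to the size-1 channel dimension.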
pub fn to_memory_format(
    &self,
    format: MemoryFormat,
) -> Result<Tensor<T>, FerrotorchError>
Rearrange this tensor to the target memory format.
If the tensor is already contiguous in the target format, returns a cheap clone (shared storage). Otherwise, physically rearranges the data and returns a new tensor with the correct strides.
The shape is never changed — only the strides (and possibly the underlying data order) are altered.
[CL-309] WU-05: channels-last memory format support
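When a physical rearrangement is needed, converting contiguous NCHW data to channels-last amounts to emitting elements in N, H, W, C loop order. A sketch over a flat slice; nchw_to_nhwc is a hypothetical helper. The logical shape stays [N, C, H, W], and only the strides would change, to [H*W*C, 1, W*C, C].

```rust
/// Hypothetical helper: copies contiguous NCHW data into NHWC order
/// (channels innermost). The logical shape [n, c, h, w] is unchanged.
fn nchw_to_nhwc<T: Copy>(data: &[T], shape: [usize; 4]) -> Vec<T> {
    let [n, c, h, w] = shape;
    assert_eq!(data.len(), n * c * h * w, "data length must match shape");
    let mut out = Vec::with_capacity(data.len());
    // Walk in N, H, W, C order and read each element from its NCHW position.
    for ni in 0..n {
        for hi in 0..h {
            for wi in 0..w {
                for ci in 0..c {
                    out.push(data[((ni * c + ci) * h + hi) * w + wi]);
                }
            }
        }
    }
    out
}
```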
pub fn contiguous_in(
    &self,
    format: MemoryFormat,
) -> Result<Tensor<T>, FerrotorchError>
Return a tensor that is contiguous in the given memory format, materializing (copying) the data if necessary.
Equivalent to .to_memory_format(format). Both names are provided for
API familiarity: contiguous_in() is the PyTorch-style entry point
(mirroring tensor.contiguous(memory_format=...)), while
to_memory_format() is the explicit variant.
[CL-309] WU-05: channels-last memory format support
pub fn item(&self) -> Result<T, FerrotorchError>
For a scalar tensor, extract the single value.
pub fn is_same(&self, other: &Tensor<T>) -> bool
Returns true if two tensors are the same object (same Arc).
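Identity here means pointer equality of the Arc, not equal contents. A minimal sketch with hypothetical stand-in types (Inner, Handle), using std::sync::Arc::ptr_eq:

```rust
use std::sync::Arc;

// Hypothetical stand-in for the shared tensor state.
struct Inner {
    data: Vec<f32>,
}

// Cloning a Handle clones the Arc, so clones share one Inner.
#[derive(Clone)]
struct Handle(Arc<Inner>);

impl Handle {
    /// True only when both handles point at the same allocation.
    fn is_same(&self, other: &Handle) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}
```

A clone is the same object as its source, whereas a separately allocated handle holding identical values is not.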
pub fn inner_storage_arc(&self) -> &Arc<TensorStorage<T>>
Get a reference to the inner storage Arc.
Exposed for optimizer kernels that need to modify the param’s GPU
buffer in-place via unsafe pointer cast (same pattern as
update_data).
Trait Implementations§
Auto Trait Implementations§
impl<T> !RefUnwindSafe for Buffer<T>
impl<T> !UnwindSafe for Buffer<T>
impl<T> Freeze for Buffer<T>
impl<T> Send for Buffer<T>
impl<T> Sync for Buffer<T>
impl<T> Unpin for Buffer<T>
impl<T> UnsafeUnpin for Buffer<T>
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> DistributionExt for T
where
    T: ?Sized,
impl<T, U> Imply<T> for U
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.