Crate ferrotorch

ferrotorch — PyTorch-shaped deep learning framework in Rust.

This crate is the umbrella re-export crate. The sub-crates own the actual implementation; this crate exists so that a single use ferrotorch::*; (or use ferrotorch::prelude::*;) brings the canonical public surface into scope.

§Examples

use ferrotorch::{FerrotorchResult, zeros};

fn main() -> FerrotorchResult<()> {
    let t = zeros::<f32>(&[2, 3])?;
    assert_eq!(t.shape(), &[2, 3]);
    Ok(())
}

See the prelude module for the items most users want, and the per-feature modules (nn, optim, data, vision, train, serialize, jit, jit_script, distributions, profiler, hub, tokenize, gpu, cubecl, mps, xpu, distributed, llama, ml) for sub-crate access.

Lint baseline mirrors the per-crate convention used across the workspace (ferrotorch-core, ferrotorch-jit, ferrotorch-cubecl, etc.). Workspace [lints] is intentionally not used — every crate carries its own #![warn/deny(...)] so the policy lives next to the code it governs.

§REQ status (per .design/ferrotorch/lib.md)

| REQ | Status | Evidence |
|---|---|---|
| REQ-1 | SHIPPED | impl: crate //! doc-comment at top of ferrotorch/src/lib.rs mirrors torch/__init__.py:1-9 module docstring; consumer: rustdoc no_run example block in the same docstring (compiled, but not executed, by cargo test -p ferrotorch --doc). |
| REQ-2 | SHIPPED | impl: pub use ferrotorch_core::*; at ferrotorch/src/lib.rs mirrors torch/__init__.py:68-141 flat __all__; consumer: doctest at the top of this file imports ferrotorch::{FerrotorchResult, zeros} directly. |
| REQ-3 | SHIPPED | impl: pub mod prelude { ... } in ferrotorch/src/lib.rs mirrors the from torch import nn, optim convention; consumer: the //! doc-comment promises use ferrotorch::prelude::*; as the canonical one-import entry point (published-crate API contract). |
| REQ-4 | SHIPPED | impl: always-on pub mod nn / pub mod optim / pub mod data / pub mod vision in ferrotorch/src/lib.rs mirror torch.nn / torch.optim / torch.utils.data / torchvision; consumer: ferrotorch/tests/public_surface.rs:22-25 compile-time pins each path (test harness for the public-API contract is the contract auditor); downstream-of-workspace: crates.io/ferrotorch users. |
| REQ-5 | SHIPPED | impl: 11 #[cfg(feature = "<flag>")] pub mod <name> blocks in ferrotorch/src/lib.rs mirror upstream optional namespaces; consumer: ferrotorch/Cargo.toml:15-43 enumerates each matching feature flag; the published-crate contract is the boundary consumer per goal.md S5. |
| REQ-6 | SHIPPED | impl: #[cfg(not(target_env = "msvc"))] #[global_allocator] static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc; in ferrotorch/src/lib.rs; consumer: every binary linking the ferrotorch crate (e.g. cargo run --example train_mnist -p ferrotorch via ferrotorch/examples/train_mnist.rs) picks up the allocator via the rustc #[global_allocator] mechanism. |
| REQ-7 | SHIPPED | impl: lint baseline block #![warn(clippy::all, clippy::pedantic)] #![deny(rust_2018_idioms, missing_debug_implementations)] #![allow(missing_docs)] in ferrotorch/src/lib.rs; consumer: cargo clippy -p ferrotorch --lib -- -D warnings gates every commit per goal.md Step 7. |
| REQ-8 | SHIPPED | impl: llama-cuda = ["llama", "gpu", "ferrotorch-llama/cuda"] at ferrotorch/Cargo.toml:42; consumer: the //! doc-comment in this file references llama-cuda as a documented feature combination; published-crate users are the boundary consumer. |

Closes #1346.

Modules§

autograd
REQ status (per .design/ferrotorch-core/autograd/mod.md)
bool_tensor
Boolean tensors for masks and logical operations. (#596)
complex_tensor
ComplexTensor<T> — first-class complex-valued tensors. (#618)
cpu_pool
CPU tensor buffer pool — caching allocator for host memory.
creation
Module doc — REQ status table follows.
data
Data loading, datasets, samplers, and transforms.
device
REQ status (per .design/ferrotorch-core/device.md)
dispatch
Multi-dispatch key system for composable tensor backends. CL-397.
distributions
Probability distributions for sampling and variational inference.
dtype
REQ status (per .design/ferrotorch-core/dtype.md)
dtype_dispatch
Dtype-generic GPU dispatch for Float tensors.
einops
Einops-style tensor rearrangement operations.
einsum
Einstein summation (einsum) for ferrotorch tensors.
error
REQ status (per .design/ferrotorch-core/error.md)
fft
FFT operations for tensors.
flex_attention
Flexible attention with customizable score modification.
gpu_dispatch
GPU backend dispatch layer.
grad_fns
Module-root dispatch for the autograd-tracking wrapper layer.
hub
Model hub for downloading and caching pretrained models.
int_tensor
Integer-typed tensors for indexing, embedding lookups, and any other workload that needs first-class non-float storage. (#596)
jit
JIT tracing, IR graph, optimization passes, and code generation.
jit_script
#[script] proc macro for source-based graph capture.
linalg
Advanced linear algebra operations bridging to ferray-linalg.
masked
Masked tensors — torch.masked.MaskedTensor analog.
meta_propagate
Helpers for propagating the meta device through tensor operations.
named_tensor
NamedTensor<T> — dim-name annotations on top of Tensor<T>. (#621)
nested
NestedTensor and PackedNestedTensor — ragged (jagged) tensors that mirror torch.nested.nested_tensor (aten/src/ATen/native/nested/) + the jagged-layout NJT (torch/nested/_internal/nested_tensor.py).
nn
Neural network modules and layers.
numeric_cast
Fallible numeric conversions used across the workspace.
ops
Kernel-layer op-module declarations. Mirrors aten/src/ATen/native/’s directory-as-namespace convention. Each declared sub-module is the forward-only (no autograd) op family for its area; the autograd wrappers live in ferrotorch-core/src/grad_fns/.
optim
Optimizers and learning rate schedulers.
prelude
Prelude module — import everything commonly needed.
profiler
Performance profiling and Chrome trace export.
profiler_hook
Thread-local profiler hook for auto-instrumented tensor ops.
pruning
REQ status (per .design/ferrotorch-core/pruning.md)
quantize
Post-training quantization (PTQ) for ferrotorch tensors.
rng
Thread-local seeded random number generator state, mirroring torch.manual_seed / torch.Generator.
serialize
Model serialization: ONNX export, PyTorch import, safetensors, GGUF.
shape
REQ status (per .design/ferrotorch-core/shape.md)
signal
Signal-processing utilities.
simd_reduce
Torch-matching f32 reduction primitives.
sparse
REQ status (per .design/ferrotorch-core/sparse.md)
special
Special mathematical functions (torch.special equivalent).
storage
REQ status (per .design/ferrotorch-core/storage.md)
stride_tricks
as_strided family — direct stride manipulation on tensors.
tensor
REQ status (per .design/ferrotorch-core/tensor.md)
tokenize
HuggingFace tokenizer wrapper (BPE, WordPiece, Unigram).
train
Training loop, Learner, callbacks, and metrics.
vision
Computer vision models, datasets, and transforms.
vmap
Vectorized map (vmap) — apply a function over a batch dimension.
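
The stride_tricks entry above describes direct stride manipulation. As a rough illustration of what a strided view means, the sketch below gathers elements from a flat buffer at position offset + sum(index[k] * stride[k]). It uses only the standard library; the free function as_strided_copy_sketch and its slice-based signature are hypothetical stand-ins, not ferrotorch's as_strided_copy.

```rust
/// Hypothetical helper (not the ferrotorch API): materialise a strided view
/// of `data` into a new Vec, visiting indices in row-major order.
fn as_strided_copy_sketch(
    data: &[f32],
    size: &[usize],
    stride: &[usize],
    offset: usize,
) -> Vec<f32> {
    assert_eq!(size.len(), stride.len(), "size and stride need one entry per axis");
    let numel: usize = size.iter().product();
    let mut out = Vec::with_capacity(numel);
    let mut index = vec![0usize; size.len()];
    for _ in 0..numel {
        // Flat position of the current multi-index.
        let pos = offset + index.iter().zip(stride).map(|(i, s)| i * s).sum::<usize>();
        out.push(data[pos]);
        // Advance the multi-index, last axis fastest.
        for axis in (0..size.len()).rev() {
            index[axis] += 1;
            if index[axis] < size[axis] {
                break;
            }
            index[axis] = 0;
        }
    }
    out
}
```

With data = [0, 1, 2, 3, 4, 5] read as a 2 x 3 matrix, size [3, 2] and stride [1, 3] yield its transpose, while size [3, 2] and stride [1, 1] yield overlapping length-2 windows. This sketch bounds-checks only through slice indexing; a real implementation would validate the view up front.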

Macros§

dispatch_floating_dtype
Dispatch to one of three closures based on the static type T, returning Err(FerrotorchError::NotImplementedOnCuda { op }) for any dtype other than f32, f64, or half::bf16.
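
To make the idea of dispatching on the static type T concrete, here is a minimal standard-library sketch that compares TypeId values. Everything in it is an assumption for illustration: the function dispatch_floating_sketch, the DispatchError enum, and the Bf16 placeholder (standing in for half::bf16) are hypothetical, and the real macro may be implemented quite differently.

```rust
use std::any::TypeId;

/// Hypothetical error type standing in for FerrotorchError::NotImplementedOnCuda.
#[derive(Debug, PartialEq)]
enum DispatchError {
    NotImplemented { op: &'static str },
}

/// Hypothetical placeholder for half::bf16 so the sketch needs no dependencies.
#[allow(dead_code)]
struct Bf16(u16);

/// Run exactly one of the three closures depending on the static type `T`;
/// any other type yields an error carrying the op name.
fn dispatch_floating_sketch<T: 'static, R>(
    op: &'static str,
    on_f32: &dyn Fn() -> R,
    on_f64: &dyn Fn() -> R,
    on_bf16: &dyn Fn() -> R,
) -> Result<R, DispatchError> {
    let id = TypeId::of::<T>();
    if id == TypeId::of::<f32>() {
        Ok(on_f32())
    } else if id == TypeId::of::<f64>() {
        Ok(on_f64())
    } else if id == TypeId::of::<Bf16>() {
        Ok(on_bf16())
    } else {
        Err(DispatchError::NotImplemented { op })
    }
}
```

A call such as dispatch_floating_sketch::<f64, _>("add", &|| 4, &|| 8, &|| 2) runs only the second closure, and an unsupported type such as i32 takes the error path.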

Structs§

AnomalyMode
Global anomaly detection mode.
AsStridedBackward
VJP for as_strided(input, size, stride, offset).
BoolTensor
Contiguous tensor of booleans, device-aware.
ComplexTensor
CPU-resident, contiguous, structure-of-arrays complex tensor.
CooTensor
A 2-D sparse tensor in COO (Coordinate List) format with separate row and column index arrays.
CscTensor
2-D sparse tensor in CSC (Compressed Sparse Column) format. Dual of CsrTensor: instead of storing row pointers + column indices, stores column pointers (col_ptrs, length ncols + 1) and row indices for each non-zero. Efficient for column slicing and A^T x style ops.
CsrTensor
A 2-D sparse tensor in CSR (Compressed Sparse Row) format.
CumExtremeResult
Result of cummax / cummin: values tensor and indices tensor.
DispatchKeySet
A set of active DispatchKeys, stored as a u16 bitmask for constant-time membership testing and iteration.
Dispatcher
A kernel registration table keyed by (op_name, dispatch_key). Looking up a kernel is a single HashMap probe.
DualTensor
A dual-number tensor: primal + epsilon * tangent.
FakeQuantize
Simulates quantization during training by quantizing and immediately dequantizing values, while allowing gradients to flow through via the straight-through estimator (STE).
ForwardBacktrace
A captured forward-pass backtrace, stored on tensors when anomaly mode is on.
Generator
Per-process / per-thread seeded RNG state, mirroring torch.Generator.
HistogramObserver
Histogram-based observer that collects a distribution of values.
HookHandle
An opaque handle returned by register_hook / register_post_accumulate_grad_hook.
IntTensor
Contiguous tensor of integers (i32 or i64), device-aware.
MaskedTensor
A tensor paired with a boolean mask.
MinMaxObserver
Tracks the running min/max of observed values.
NamedTensor
A Tensor<T> paired with one optional dim name per axis.
NestedTensor
A nested (ragged) tensor — a collection of tensors with differing sizes along one dimension (the “ragged” dimension).
PackedNestedTensor
A nested (jagged) tensor stored as one contiguous flat buffer with an offsets array marking the start of each component.
PerChannelMinMaxObserver
Tracks per-channel running min/max of observed values.
QParams
Computed quantization parameters (scale and zero_point).
QatLayer
A layer with associated FakeQuantize modules for QAT.
QatModel
Wraps a collection of named weight tensors for quantization-aware training.
QuantizedTensor
A tensor stored in quantized (integer) representation.
SemiStructuredSparseTensor
A tensor stored in the NVIDIA 2:4 structured sparsity format.
SparseGrad
A sparse gradient: a list of (index, value) pairs that an optimizer applies to a dense parameter tensor. Mirrors the coalesced form of torch.Tensor.is_sparse gradients used by nn.Embedding(sparse=True) and consumed by optim.SparseAdam / optim.SGD.
SparseTensor
A sparse tensor in COO (Coordinate List) format.
Tensor
The central type. A dynamically-shaped tensor with gradient tracking and device placement.
TensorId
A unique, monotonically increasing tensor identifier.
TensorStorage
The underlying data buffer for a tensor, tagged with its device.
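
The CscTensor entry above describes column pointers of length ncols + 1 plus one row index per non-zero. The sketch below builds that layout from coordinate triples and uses it for an A^T x product. It is a standalone illustration under assumptions: the CscSketch struct, its field names other than col_ptrs, and both functions are hypothetical and do not reflect ferrotorch's actual types.

```rust
/// Hypothetical minimal CSC container (not ferrotorch's CscTensor).
struct CscSketch {
    ncols: usize,
    col_ptrs: Vec<usize>,    // length ncols + 1
    row_indices: Vec<usize>, // one entry per non-zero
    values: Vec<f64>,        // one entry per non-zero
}

/// Build CSC storage from COO triples with a counting sort over columns.
/// Entries inside a column keep their input order (rows are not sorted).
fn coo_to_csc(ncols: usize, rows: &[usize], cols: &[usize], vals: &[f64]) -> CscSketch {
    assert!(rows.len() == cols.len() && cols.len() == vals.len());
    // Count the non-zeros in each column, shifted by one slot.
    let mut col_ptrs = vec![0usize; ncols + 1];
    for &c in cols {
        col_ptrs[c + 1] += 1;
    }
    // Prefix sum turns counts into start offsets.
    for c in 0..ncols {
        col_ptrs[c + 1] += col_ptrs[c];
    }
    // Scatter each triple into the next free slot of its column.
    let mut next = col_ptrs.clone();
    let mut row_indices = vec![0usize; vals.len()];
    let mut values = vec![0.0f64; vals.len()];
    for k in 0..vals.len() {
        let dst = next[cols[k]];
        row_indices[dst] = rows[k];
        values[dst] = vals[k];
        next[cols[k]] += 1;
    }
    CscSketch { ncols, col_ptrs, row_indices, values }
}

/// y = A^T x: each output element reads one contiguous column slice.
fn transpose_matvec(a: &CscSketch, x: &[f64]) -> Vec<f64> {
    (0..a.ncols)
        .map(|c| {
            (a.col_ptrs[c]..a.col_ptrs[c + 1])
                .map(|k| a.values[k] * x[a.row_indices[k]])
                .sum::<f64>()
        })
        .collect()
}
```

Column c occupies the contiguous range col_ptrs[c]..col_ptrs[c + 1], which is why column slicing and A^T x are cheap in this format.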

Enums§

AutocastCategory
Policy: which operations should be cast to reduced precision.
AutocastDtype
The reduced-precision dtype used during autocast regions.
DType
Runtime descriptor for the element type stored in an array.
Device
Device on which a tensor’s data resides.
DispatchKey
One of the 16 possible dispatch keys, ordered from lowest to highest priority. The u8 repr matches the bit position in DispatchKeySet’s internal u16 bitmask, so the priority ordering is both the enum declaration order and the numeric order of the discriminants.
EinopsReduction
Reduction operation for reduce.
FerrotorchError
Errors produced by ferrotorch operations.
FftNorm
Normalization mode for FFT operations, matching NumPy’s norm parameter.
GeluApproximate
Selects the GELU approximation method.
MemoryFormat
Describes the physical memory layout of a tensor.
MeshIndexing
Cartesian-indexing convention for meshgrid_indexing.
QuantDtype
Target integer dtype for quantized storage.
QuantScheme
Granularity of quantization parameters (scale / zero_point).
StorageBuffer
Device-specific data buffer.
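
The DispatchKey entry above says that each key's u8 discriminant is its bit position in a u16 bitmask and that a higher position means higher priority. A minimal sketch of that encoding follows. The key names, the KeySetSketch type, and its methods are hypothetical choices for illustration, not ferrotorch's DispatchKey or DispatchKeySet.

```rust
/// Hypothetical keys; the discriminant doubles as the bit position.
#[allow(dead_code)]
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum KeySketch {
    Cpu = 0,
    Cuda = 1,
    Autograd = 2,
    Autocast = 3,
}

/// Hypothetical key set stored as a u16 bitmask.
#[derive(Clone, Copy, Default)]
struct KeySetSketch(u16);

impl KeySetSketch {
    /// Return a copy of the set with `key` added.
    fn with(self, key: KeySketch) -> Self {
        KeySetSketch(self.0 | (1u16 << key as u8))
    }

    /// Constant-time membership test: a single AND.
    fn contains(self, key: KeySketch) -> bool {
        self.0 & (1u16 << key as u8) != 0
    }

    /// Bit position of the highest-priority active key, if any.
    fn highest(self) -> Option<u8> {
        if self.0 == 0 {
            None
        } else {
            Some(15 - self.0.leading_zeros() as u8)
        }
    }
}
```

Because priority order equals bit order, finding the highest-priority active key reduces to one leading_zeros call on the mask.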

Traits§

Element
Trait bound for types that can be stored in a ferray array.
Float
Marker trait for float element types that support autograd.
GradFn
The backward function trait for reverse-mode automatic differentiation.
IntElement
Element types supported by IntTensor.
Observer
Trait for quantization observers that collect data statistics.
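
As an illustration of what a quantization observer does, the sketch below tracks a running min/max and derives affine parameters for an unsigned 8-bit range. The trait ObserverSketch, the MinMaxSketch struct, the [0, 255] target range, and the widening of the range to include zero are all assumptions made for this example; ferrotorch's Observer trait and MinMaxObserver may differ in signature and policy.

```rust
/// Hypothetical observer trait (not ferrotorch's Observer).
trait ObserverSketch {
    /// Fold a batch of values into the running statistics.
    fn observe(&mut self, values: &[f32]);
    /// Return (scale, zero_point), or None if nothing was observed yet.
    fn qparams(&self) -> Option<(f32, i32)>;
}

/// Tracks the running minimum and maximum of everything observed.
struct MinMaxSketch {
    min: f32,
    max: f32,
}

impl MinMaxSketch {
    fn new() -> Self {
        MinMaxSketch { min: f32::INFINITY, max: f32::NEG_INFINITY }
    }
}

impl ObserverSketch for MinMaxSketch {
    fn observe(&mut self, values: &[f32]) {
        for &v in values {
            self.min = self.min.min(v);
            self.max = self.max.max(v);
        }
    }

    fn qparams(&self) -> Option<(f32, i32)> {
        if self.min > self.max {
            return None; // no data observed
        }
        // Assumption: widen the range to contain 0.0 so zero maps to an exact integer.
        let lo = self.min.min(0.0);
        let hi = self.max.max(0.0);
        let scale = (hi - lo) / 255.0; // assumed target range [0, 255]
        if scale == 0.0 {
            return Some((1.0, 0)); // all observed values were zero
        }
        let zero_point = (-lo / scale).round() as i32;
        Some((scale, zero_point))
    }
}
```

Under these assumptions a real value x maps to round(x / scale) + zero_point, clamped to [0, 255], so the observed minimum lands near 0 and the maximum near 255.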

Functions§

airy_ai
Airy function of the first kind Ai(x). Mirrors torch.special.airy_ai (torch/special/__init__.py:982-985); scalar evaluator ports the Cephes multi-region kernel from aten/src/ATen/native/cuda/Math.cuh:1280-1459. airy_ai(0) = 0.3550280538878172; oscillatory for x < -2.09, decaying for x > 0; airy_ai(+/-inf) = NaN (the isinf short-circuit at Math.cuh:1360-1362), airy_ai(x > 103.892) = 0.
apply_2_4_mask
Apply 2:4 structured sparsity mask.
arange
Create a 1-D tensor with values from start to end (exclusive) with step step.
as_strided
Zero-copy strided view; see Tensor::as_strided for full docs.
as_strided_copy
Materialised strided copy; see Tensor::as_strided_copy for full docs.
as_strided_scatter
Inverse of as_strided; see Tensor::as_strided_scatter for full docs.
atan2
Differentiable element-wise atan2(y, x). Forward mirrors aten/src/ATen/native/BinaryOps.cpp:795 TORCH_IMPL_FUNC(atan2_out). The argument order matches torch.atan2(input, other) per torch/_torch_docs.py, where input == y and other == x (the result is the angle whose tangent equals y/x, with quadrant-aware sign per IEEE-754 atan2).
autocast
Execute a closure with mixed-precision autocast enabled.
autocast_dtype
Returns the target dtype for autocast regions on this thread.
autocast_guard
Primary entry point for op implementations to query autocast policy.
backward
Compute gradients of all leaf tensors that contribute to root.
backward_with_grad
Run backward pass through the computation graph.
beta
Beta function B(a, b) = exp(lnB(a, b)) = Γ(a)Γ(b)/Γ(a + b), element-wise over a broadcast of a and b.
broadcast_shapes
Compute the broadcasted shape of two shapes, following NumPy/PyTorch rules.
broadcast_tensors
broadcast_tensors(tensors) — expand every input to their common broadcast shape.
broadcast_to
broadcast_to(input, shape) — broadcast input to shape; a literal alias of expand.
bucketize
Discretize input values into buckets defined by boundaries.
cat
Concatenate tensors along an axis.
cdist
Pairwise distance matrix between two sets of vectors.
check_gradient_anomaly
Check a gradient tensor for NaN or Inf values (anomaly check).
chunk_t
Split a tensor along dim into chunks pieces of roughly equal size.
clamp
Differentiable elementwise clamp: c[i] = x[i].clamp(min, max).
column_stack
column_stack(tensors) — stack 1-D/0-D tensors as columns of a 2-D matrix.
cond
Conditional subgraph execution.
contiguous_t
Make tensor contiguous (copy data if needed).
copysign
Differentiable element-wise copysign(magnitude, sign). Returns a tensor with the magnitude of magnitude and the sign of sign. Mirrors aten/src/ATen/native/BinaryOps.cpp:865 copysign_out. Backward: gradient flows to magnitude scaled by sign_factor = result / magnitude (zeroed where magnitude == 0); gradient to sign is identically zero.
cos
Differentiable elementwise cosine: c[i] = cos(x[i]).
cummax
Cumulative maximum along dim.
cummin
Cumulative minimum along dim.
cumprod
Differentiable cumulative product along dim.
cumsum
Differentiable cumulative sum along dim.
dequantize
Dequantize back to a floating-point tensor.
detect_anomaly
Execute a closure with anomaly detection enabled.
diag
Extract the diagonal of a 2-D tensor, or construct a 2-D diagonal matrix from a 1-D tensor.
diagflat
Construct a diagonal matrix from a 1-D tensor (flattened if needed).
digamma
Digamma function: psi(x) = d/dx ln(Gamma(x)).
dstack
dstack(tensors) — stack tensors depth-wise (along dim 2 after promoting each to ≥3-D).
dual_add
Forward rule for addition: d(a + b) = da + db.
dual_cos
Forward rule for cos: d(cos(a)) = -da * sin(a).
dual_div
Forward rule for division: d(a / b) = (da * b - a * db) / b^2.
dual_exp
Forward rule for exp: d(exp(a)) = da * exp(a).
dual_log
Forward rule for log: d(log(a)) = da / a.
dual_matmul
Forward rule for matrix multiplication: d(A @ B) = dA @ B + A @ dB.
dual_mul
Forward rule for multiplication: d(a * b) = a * db + da * b.
dual_neg
Forward rule for negation: d(-a) = -da.
dual_relu
Forward rule for ReLU: d(relu(a)) = da * (a > 0).
dual_sigmoid
Forward rule for sigmoid: d(sigmoid(a)) = da * sigmoid(a) * (1 - sigmoid(a)).
dual_sin
Forward rule for sin: d(sin(a)) = da * cos(a).
dual_sub
Forward rule for subtraction: d(a - b) = da - db.
dual_tanh
Forward rule for tanh: d(tanh(a)) = da * (1 - tanh(a)^2).
einsum
Einstein summation.
einsum_differentiable
Differentiable Einstein summation. If any input requires grad and grad is enabled, attaches EinsumBackward.
enable_grad
Re-enable gradient computation inside a no_grad block.
entr
Entropy entr(x): x > 0 -> -x*log(x), x == 0 -> 0, x < 0 -> -inf, NaN -> NaN. Mirrors torch.special.entr (torch/special/__init__.py:67; kernel aten/src/ATen/native/cuda/Math.cuh:463-480).
erf
Error function: erf(x) = (2/sqrt(pi)) * integral(0, x, exp(-t^2) dt).
erfc
Complementary error function: erfc(x) = 1 - erf(x).
erfinv
Inverse error function: erfinv(erf(x)) = x.
exp
Differentiable elementwise exponential: c[i] = exp(x[i]).
expand
Broadcast (expand) a tensor to new_shape.
expand_as
expand_as(input, other) — broadcast input to the shape of other.
expm1
exp(x) - 1 – numerically stable for small x.
eye
Create an identity matrix of size n x n.
fake_quantize_differentiable
Backward-compatible alias for fake_quantize_per_tensor_affine.
fft
1-D complex-to-complex FFT along the last dimension (default norm).
fft2
2-D FFT (complex-to-complex) along the last two spatial dimensions (default s/dim/norm). Thin wrapper over fft2_norm.
fft2_differentiable
Differentiable 2-D FFT (default s/dim/norm). Attaches Fft2Backward.
fft2_differentiable_norm
Differentiable 2-D FFT with explicit s / dim / norm (#1294).
fft2_norm
2-D FFT with explicit s / dim / norm (#1294).
fft_differentiable
Differentiable 1-D FFT (default dim/norm). Attaches FftBackward.
fft_differentiable_norm
Differentiable 1-D FFT with explicit dim / norm (#1294). Attaches a FftBackward that threads the adjoint norm/dim. Matches torch.fft.fft.
fft_norm
1-D complex-to-complex FFT with explicit dim and norm (#1294).
fftfreq
Discrete Fourier Transform sample frequencies.
fftn
N-dimensional complex-to-complex FFT.
fftn_differentiable
Differentiable N-D FFT (default norm). Attaches FftnBackward.
fftn_differentiable_norm
Differentiable N-D FFT with explicit norm (#1294). Matches torch.fft.fftn; axes is torch’s dim.
fftn_norm
N-dimensional complex-to-complex FFT with explicit norm (#1294).
fftshift
Shift the zero-frequency component to the center along the given axes.
fixed_point
Find a fixed point of f starting from x0, then compute its derivative w.r.t. params using the implicit function theorem.
flex_attention
Compute flexible multi-head attention with an optional score modification function.
flip
Reverse the order of elements along each axis in dims.
fliplr
fliplr(input) — flip a (≥2-D) tensor left-to-right (along dim 1).
flipud
flipud(input) — flip a (≥1-D) tensor up-to-down (along dim 0).
from_slice
Create a tensor from a slice, copying the data.
from_vec
Create a tensor from a Vec<T>, taking ownership.
full
Create a tensor filled with a given value.
full_like
Create a tensor filled with value with the same shape as other.
gammainc
Regularized lower incomplete gamma P(a, x), element-wise over a broadcast of input (the a argument) and other (the x argument).
gammaincc
Regularized upper incomplete gamma Q(a, x) = 1 - P(a, x), element-wise over a broadcast of input (the a argument) and other (the x argument).
gammaln_sign
Sign of the gamma function Γ(x) — the ±1 (or NaN at poles) factor that lgamma = ln|Γ| discards, element-wise over input.
gather
Gather values from input along dim using index.
gelu
Compute gelu(x) with the default exact (erf-based) approximation.
gelu_with
Compute gelu(x) with configurable approximation, attaching a backward node when gradients are enabled.
grad
Compute gradients of outputs with respect to inputs.
grad_norm
Compute the L2 norm of gradients of outputs with respect to inputs.
gradient_penalty
Compute the gradient penalty for WGAN-GP.
hessian
Compute the Hessian matrix of a scalar function at a point.
hfft
1-D FFT of a Hermitian-symmetric complex spectrum, returning real output.
hfft2
2-D FFT of a Hermitian-symmetric spectrum, returning real output (torch.fft.hfft2).
hfft2_norm
2-D Hermitian FFT with explicit norm (#1294).
hfft_differentiable
Differentiable Hermitian FFT (complex spectrum → real signal). Attaches HfftBackward when grad is needed.
hfft_norm
1-D Hermitian FFT with explicit dim and norm (#1294).
hfftn
N-D FFT of a Hermitian-symmetric spectrum, returning real output (torch.fft.hfftn). Generalizes hfft / hfft2 to arbitrary axes.
hfftn_norm
N-D Hermitian FFT with explicit norm (#1294).
histc
Histogram — count elements in equal-width bins.
hstack
hstack(tensors) — stack tensors column-wise.
hypot
Differentiable element-wise hypot(x, y) = sqrt(x^2 + y^2) with the overflow-safe accumulation provided by num_traits::Float::hypot (delegates to f32::hypot / f64::hypot). Mirrors aten/src/ATen/native/BinaryOps.cpp:548 hypot_out. Backward: grad_x = grad * x / result and grad_y = grad * y / result, following derivatives.yaml:814-817. Where result == 0 the gradient is masked to zero rather than the NaN that torch's literal IEEE 0/0 produces; this is the only divergence, it occurs only at (0, 0), and it is filed as documentation, not a parity blocker.
i0
Modified Bessel function of the first kind, order 0: i0(x). Even function; i0(0) = 1, i0(+/-inf) = +inf, i0(NaN) = NaN. Mirrors torch.special.i0 / torch.i0 (torch/special/__init__.py:522); the scalar evaluator ports the Cephes chbevl Chebyshev kernel from aten/src/ATen/native/cuda/Math.cuh:502-555.
i0e
Exponentially-scaled modified Bessel order 0: i0e(x) = exp(-|x|) I0(x). Even; i0e(0) = 1, i0e(+/-inf) = 0 (stays finite where i0 overflows), i0e(NaN) = NaN. Mirrors torch.special.i0e (torch/special/__init__.py:548); scalar evaluator ports calc_i0e (aten/src/ATen/native/Math.h:101-145) — same Chebyshev sets as i0 without the exp(x) factor.
i1
Modified Bessel function of the first kind, order 1: i1(x). Odd function (sign follows x); i1(0) = 0, i1(+inf) = +inf, i1(-inf) = -inf, i1(NaN) = NaN. Mirrors torch.special.i1 / torch.i1; scalar evaluator ports i1_string (aten/src/ATen/native/cuda/Math.cuh:575-622).
i1e
Exponentially-scaled modified Bessel order 1: i1e(x) = exp(-|x|) I1(x). Odd; i1e(0) = 0, i1e(+/-inf) = +/-0, i1e(NaN) = NaN. Mirrors torch.special.i1e (torch/special/__init__.py:598); scalar evaluator ports calc_i1e (aten/src/ATen/native/cuda/Math.cuh:647-696) — same Chebyshev sets as i1 without the exp(x) factor.
ifft
1-D inverse FFT along the last dimension (default norm).
ifft2
2-D inverse FFT (complex-to-complex) along the last two spatial dimensions (default s/dim/norm). Thin wrapper over ifft2_norm.
ifft2_differentiable
Differentiable 2-D inverse FFT (default s/dim/norm). Attaches Ifft2Backward.
ifft2_differentiable_norm
Differentiable 2-D inverse FFT with explicit s / dim / norm (#1294).
ifft2_norm
2-D inverse FFT with explicit s / dim / norm (#1294).
ifft_differentiable
Differentiable 1-D inverse FFT (default dim/norm). Attaches IfftBackward.
ifft_differentiable_norm
Differentiable 1-D inverse FFT with explicit dim / norm (#1294).
ifft_norm
1-D inverse FFT with explicit dim and norm (#1294).
ifftn
N-dimensional inverse complex FFT.
ifftn_differentiable
Differentiable N-D inverse FFT (default norm). Attaches IfftnBackward.
ifftn_differentiable_norm
Differentiable N-D inverse FFT with explicit norm (#1294).
ifftn_norm
N-dimensional inverse complex FFT with explicit norm (#1294).
ifftshift
Inverse of fftshift.
ihfft
1-D inverse FFT of a real signal, returning a Hermitian-symmetric spectrum.
ihfft2
2-D inverse FFT of a real signal, returning a Hermitian-symmetric spectrum (torch.fft.ihfft2).
ihfft2_norm
2-D inverse Hermitian FFT with explicit norm (#1294).
ihfft_differentiable
Differentiable inverse Hermitian FFT (real signal → Hermitian spectrum). Attaches IhfftBackward when grad is needed.
ihfft_norm
1-D inverse Hermitian FFT with explicit dim and norm (#1294).
ihfftn
N-D inverse FFT of a real signal, returning a Hermitian-symmetric spectrum (torch.fft.ihfftn). Generalizes ihfft / ihfft2 to arbitrary axes.
ihfftn_norm
N-D inverse Hermitian FFT with explicit norm (#1294).
irfft
1-D complex-to-real inverse FFT (default norm).
irfft2
2-D complex-to-real inverse FFT (torch.fft.irfft2).
irfft2_norm
2-D complex-to-real inverse FFT with explicit norm (#1294).
irfft_differentiable
Differentiable 1-D inverse real FFT (default dim/norm). Attaches IrfftBackward.
irfft_differentiable_norm
Differentiable 1-D inverse real FFT with explicit dim / norm (#1294).
irfft_norm
1-D complex-to-real inverse FFT with explicit dim and norm (#1294).
irfftn
N-dimensional complex-to-real inverse FFT.
irfftn_differentiable
Differentiable N-D inverse real FFT (default norm). Attaches IrfftnBackward.
irfftn_differentiable_norm
Differentiable N-D inverse real FFT with explicit norm (#1294). Matches torch.fft.irfftn.
irfftn_norm
N-dimensional complex-to-real inverse FFT with explicit norm (#1294).
is_autocast_debug
Returns true if autocast debug event recording is active on this thread.
is_autocast_enabled
Returns true if mixed-precision autocast is currently enabled on this thread.
is_grad_enabled
Returns true if gradient tracking is currently enabled on this thread.
jacfwd
Compute the full Jacobian matrix using forward-mode AD.
jacobian
Compute the Jacobian matrix of a function at a point.
jvp
Compute the Jacobian-vector product (JVP): J @ v.
jvp_exact
Compute the exact Jacobian-vector product using forward-mode AD.
lgamma
Log-gamma function: lgamma(x) = log(|Gamma(x)|).
linspace
Create a 1-D tensor of num evenly spaced values from start to end (inclusive).
log
Differentiable elementwise natural log: c[i] = ln(x[i]).
log1p
log(1 + x) – numerically stable for small x.
log_beta
Log-beta function lnB(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b), element-wise over a broadcast of a and b.
logcumsumexp
Differentiable log-cumulative-sum-exp along dim.
magnitude_prune
Unstructured magnitude pruning: zero out the smallest weights.
manual_seed
Set the current thread’s default RNG seed — mirrors torch.manual_seed at torch/random.py:46.
masked_count
Number of valid (unmasked) entries; returns a 0-d tensor with element type T.
masked_equal
Mask out entries equal to value. Matches numpy.ma.masked_equal.
masked_invalid
Mask out non-finite entries (NaN, ±∞). Matches numpy.ma.masked_invalid.
masked_max
Max of valid entries; returns a 0-d tensor (NaN if all masked).
masked_mean
Mean of valid entries; returns a 0-d tensor.
masked_min
Min of valid entries; returns a 0-d tensor (NaN if all masked).
masked_select
masked_select(input, mask) — return a 1-D tensor of the elements of input where mask is true, in flat C-order. Mirrors torch.masked_select. mask must have the same numel as input.
masked_sum
Sum of valid entries; returns a 0-d tensor.
masked_where
Wrap data with condition interpreted as “where condition is true, mask the value out”. Matches numpy.ma.masked_where. The resulting MaskedTensor has mask = !condition under the torch convention.
max_with_dim
Differentiable (values, indices) = max(input, dim, keepdim) with the PyTorch named-tuple return. Mirrors torch.max(input, dim, keepdim) at aten/src/ATen/native/ReduceOps.cpp max.dim overload. NaN propagation per SharedReduceOps.h:26-34. Backward scatters grad to the input positions identified by indices. Closes #1302 (max).
mean_dim
Mean along a specific dimension.
median_with_dim
Differentiable (values, indices) = median(input, dim, keepdim) with the PyTorch named-tuple return. Mirrors torch.median(input, dim, keepdim) at aten/src/ATen/native/Sorting.cpp:503 median_with_indices_impl (ignore_nan = false: a NaN in the slice poisons the result). Backward scatters grad to the input positions identified by indices via the shared MaxMinDimBackward. Closes #1306 (median).
meshgrid
Create coordinate grids from 1-D coordinate vectors.
meshgrid_indexing
Create coordinate grids from 1-D coordinate vectors with an explicit MeshIndexing convention.
min_with_dim
Differentiable (values, indices) = min(input, dim, keepdim) — symmetric to max_with_dim. Closes #1302 (min).
modified_bessel_k0
Modified Bessel function of the second kind, order 0: k0(x). Domain x > 0: k0(0) = +inf, k0(x < 0) = NaN, k0(NaN) = NaN. Decays to 0 for large x. Mirrors torch.special.modified_bessel_k0 (torch/special/__init__.py:1304-1341); scalar evaluator ports modified_bessel_k0_forward (aten/src/ATen/native/cuda/Math.cuh:2503-2577) over the shared chbevl Clenshaw evaluator and the batch-2 i0.
modified_bessel_k1
Modified Bessel function of the second kind, order 1: k1(x). Domain x > 0: k1(0) = +inf, k1(x < 0) = NaN, k1(NaN) = NaN. Mirrors torch.special.modified_bessel_k1 (torch/special/__init__.py:1321-1358); scalar evaluator ports modified_bessel_k1_forward (aten/src/ATen/native/cuda/Math.cuh:2661-2736) over chbevl and the batch-2 i1.
moveaxis
moveaxis(input, source, destination) — a literal alias of movedim.
movedim
movedim(input, source, destination) — reposition the dims listed in source to the indices listed in destination.
multigammaln
Multivariate log-gamma log Γ_p(a) with dimension p, element-wise over input.
mvlgamma
Alias for multigammaln — mirrors torch.mvlgamma(input, p) (torch/_torch_docs.py:7895, “Alias for torch.special.multigammaln”).
nanmedian_with_dim
Differentiable (values, indices) = nanmedian(input, dim, keepdim) — NaN-skipping counterpart of median_with_dim. Mirrors torch.nanmedian(input, dim, keepdim) (ignore_nan = true): NaNs are excluded from the median rank computation. Closes #1306 (nanmedian).
ndtr
Standard-normal CDF ndtr(x) = (1 + erf(x/sqrt(2))) / 2. Mirrors torch.special.ndtr (torch/special/__init__.py:624; kernel aten/src/ATen/native/UnaryOps.cpp:715-718). Composed over the shipped erf so ndtr(-inf) = 0, ndtr(0) = 0.5, ndtr(+inf) = 1, ndtr(NaN) = NaN.
ndtri
Inverse standard-normal CDF (quantile function) ndtri(p). Domain (0, 1): ndtri(0) = -inf, ndtri(1) = +inf, ndtri(p<0 || p>1) = NaN. Mirrors torch.special.ndtri (torch/special/__init__.py:649); the implementation ports the Cephes rational from aten/src/ATen/native/cuda/Math.cuh:48-173 (NOT sqrt(2)*erfinv(2p-1)) for ULP parity with torch.
nested_scaled_dot_product_attention
Scaled dot-product attention over nested tensors.
nextafter
Differentiable element-wise nextafter(a, b): the next representable floating-point value after a in the direction of b. Forward mirrors aten/src/ATen/native/BinaryOps.cpp:551 nextafter_out (CPU kernel std::nextafter). Backward per derivatives.yaml:1322-1324 routes grad to a where a != b (zero on the a == b tie); gradient to b is zero.
no_grad
Execute a closure with gradient tracking disabled.
norm_with_dim
Differentiable p-norm along a dimension: result = (sum(|x|^p, dim))^(1/p). Mirrors aten/src/ATen/native/ReduceOps.cpp linalg_vector_norm / the Tensor::norm(p, dim, keepdim) overload. Backward per tools/autograd/derivatives.yaml norm.ScalarOpt_dim. Closes #1308.
normalize_axis
Normalize a possibly-negative axis index to a positive one.
ones
Create a tensor filled with ones.
ones_like
Create a tensor of ones with the same shape as other.
permute_t
Permute tensor dimensions. Like PyTorch’s tensor.permute(dims).
prepare_qat
Prepare a set of named parameters for quantization-aware training.
quantize
Quantize a floating-point tensor.
quantize_named_tensors
Quantize every weight tensor in a module, returning a name -> QuantizedTensor map suitable for serialization or quantized inference.
quantized_matmul
Multiply two quantized 2-D matrices and return a quantized result.
rand
Create a tensor with random values uniformly distributed in [0, 1).
rand_like
Create a tensor of random values uniformly distributed in [0, 1) with the same shape as other.
rand_on_device
Device-aware uniform-[0, 1) random tensor creation.
randn
Create a tensor with random values from a standard normal distribution.
randn_like
Create a random normal tensor with the same shape as other.
randn_on_device
Device-aware standard-normal random tensor creation.
rearrange
Rearrange tensor dimensions using an einops-style pattern.
rearrange_with
Rearrange with explicit axis sizes for ambiguous splits.
reduce
Reduce along axes that appear on the left but not the right.
repeat
Repeat tensor elements along new or existing axes.
repeat_interleave
repeat_interleave(input, repeats, dim) — repeat each element repeats times consecutively along dim.
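The "repeat each element consecutively" behaviour can be illustrated on a 1-D slice. This is a sketch only; `repeat_interleave_1d` is a hypothetical name, and the `dim` argument of the real function is not modelled here.

```rust
/// Hypothetical helper: repeat every element `repeats` times in place,
/// so [a, b] with repeats = 2 becomes [a, a, b, b].
fn repeat_interleave_1d<T: Clone>(input: &[T], repeats: usize) -> Vec<T> {
    input
        .iter()
        .flat_map(|x| std::iter::repeat(x.clone()).take(repeats))
        .collect()
}
```

Note the contrast with a tile-style repeat, which would produce `[a, b, a, b]` instead.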
rfft
1-D real-to-complex FFT along the last dimension (default norm).
rfft2
2-D real-to-complex FFT (torch.fft.rfft2).
rfft2_norm
2-D real-to-complex FFT with explicit norm (#1294).
rfft_differentiable
Differentiable 1-D real FFT (default dim/norm). Attaches RfftBackward.
rfft_differentiable_norm
Differentiable 1-D real FFT with explicit dim / norm (#1294).
rfft_norm
1-D real-to-complex FFT with explicit dim and norm (#1294).
rfftfreq
Sample frequencies for rfft (non-negative half).
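As a rough illustration of "non-negative half", the sketch below follows the convention used by `torch.fft.rfftfreq`: for `n` samples with spacing `d`, the frequencies are `k / (d * n)` for `k = 0..=n/2`. The name `rfftfreq_sketch` and the `(n, d)` parameter list are assumptions, not the ferrotorch signature.

```rust
/// Hypothetical helper: frequencies of the n/2 + 1 bins produced by a
/// real FFT of length n with sample spacing d. Requires n > 0.
fn rfftfreq_sketch(n: usize, d: f64) -> Vec<f64> {
    (0..=n / 2).map(|k| k as f64 / (d * n as f64)).collect()
}
```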
rfftn
N-dimensional real-to-complex FFT.
rfftn_differentiable
Differentiable N-D real FFT (default norm). Attaches RfftnBackward.
rfftn_differentiable_norm
Differentiable N-D real FFT with explicit norm (#1294). Matches torch.fft.rfftn.
rfftn_norm
N-dimensional real-to-complex FFT with explicit norm (#1294).
roll
Roll (circular shift) a tensor along a dimension.
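A circular shift on a 1-D slice can be sketched with the standard library's `rotate_right`. The helper name `roll_1d` is hypothetical, and the sign convention (positive shifts move elements toward higher indices, as in `torch.roll`) is an assumption about the real function.

```rust
/// Hypothetical helper: circularly shift a 1-D slice by `shift` places.
/// Positive shifts move elements toward the end; negative toward the start.
fn roll_1d<T: Clone>(input: &[T], shift: isize) -> Vec<T> {
    let mut out = input.to_vec();
    if !out.is_empty() {
        // rem_euclid maps any shift, including negative ones, into [0, len).
        let k = shift.rem_euclid(out.len() as isize) as usize;
        out.rotate_right(k);
    }
    out
}
```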
rot90
rot90(input, k, dims) — rotate a tensor 90° k times in the plane spanned by dims.
scalar
Create a scalar (0-D) tensor.
scaled_modified_bessel_k0
Exponentially scaled modified Bessel function of the second kind, order 0: scaled_modified_bessel_k0(x) = exp(x) * k0(x). Same domain as modified_bessel_k0; stays finite (-> sqrt(pi/(2x))) where k0 underflows. Mirrors torch.special.scaled_modified_bessel_k0 (torch/special/__init__.py:1304-1341); ports scaled_modified_bessel_k0_forward (aten/src/ATen/native/cuda/Math.cuh:2582-2656).
scaled_modified_bessel_k1
Exponentially scaled modified Bessel function of the second kind, order 1: scaled_modified_bessel_k1(x) = exp(x) * k1(x). Same domain as modified_bessel_k1. Mirrors torch.special.scaled_modified_bessel_k1 (torch/special/__init__.py:1321-1358); ports scaled_modified_bessel_k1_forward (aten/src/ATen/native/cuda/Math.cuh:2740-2815).
scan
Sequential state accumulation (scan / fold with outputs).
scatter
Scatter src values into a clone of input along dim using index.
scatter_add
Scatter-add src values into a clone of input along dim.
scatter_add_segments
Segmented scatter-add of a [E, D] source into a [dim_size, D] output, indexed along dim 0 by index[e].
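The indexing rule can be read as `out[index[e]][d] += src[e][d]`. A sketch on nested `Vec`s, with `scatter_add_segments_sketch` as a hypothetical name and the argument order chosen only for illustration:

```rust
/// Hypothetical helper: accumulate each source row e into output row
/// index[e]. The output has dim_size rows and starts at zero.
/// Panics if any index is >= dim_size.
fn scatter_add_segments_sketch(
    src: &[Vec<f32>],
    index: &[usize],
    dim_size: usize,
) -> Vec<Vec<f32>> {
    let d = src.first().map_or(0, |row| row.len());
    let mut out = vec![vec![0.0f32; d]; dim_size];
    for (row, &seg) in src.iter().zip(index) {
        for (o, &v) in out[seg].iter_mut().zip(row) {
            *o += v;
        }
    }
    out
}
```

Rows that no index points at stay zero, which is why the output row count is passed explicitly rather than inferred from `index`.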
searchsorted
Find insertion indices for values in a sorted 1-D boundaries tensor.
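For a sorted boundary list, an insertion index can be found with a binary search. The sketch below assumes the left-side convention (the first position whose boundary is not less than the value, as in `torch.searchsorted` with `right=False`); the source does not state which side ferrotorch uses, and `searchsorted_sketch` is a hypothetical name.

```rust
/// Hypothetical helper: for each value, the leftmost index at which it
/// could be inserted into the sorted `boundaries` while keeping order.
fn searchsorted_sketch(boundaries: &[f64], values: &[f64]) -> Vec<usize> {
    values
        .iter()
        // partition_point binary-searches for the first boundary >= v.
        .map(|v| boundaries.partition_point(|b| b < v))
        .collect()
}
```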
select
Extract a single slice along dim at position index, removing the dimension.
set_autocast_debug
Enable or disable autocast event recording on this thread.
set_grad_enabled
Programmatically set whether gradients are enabled.
sigmoid
Compute sigmoid(x), attaching a backward node when gradients are enabled.
signbit
Non-differentiable element-wise signbit(x). Returns a BoolTensor where each element is true iff the corresponding input has its sign bit set (negative values, including -0.0), matching f32::is_sign_negative / f64::is_sign_negative. The Bool output is not differentiable; there is no derivatives.yaml entry.
sin
Differentiable elementwise sine: c[i] = sin(x[i]).
sinc
Normalized sinc function: sinc(x) = sin(pi*x) / (pi*x), with sinc(0) = 1.
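A scalar sketch of the normalized sinc, with the removable singularity at zero handled explicitly. `sinc_sketch` is a hypothetical name used only for illustration.

```rust
/// Hypothetical helper: normalized sinc, sin(pi*x) / (pi*x), with the
/// limit value 1 substituted at x = 0.
fn sinc_sketch(x: f64) -> f64 {
    if x == 0.0 {
        1.0
    } else {
        let px = std::f64::consts::PI * x;
        px.sin() / px
    }
}
```

With this normalization the zeros fall on the non-zero integers, up to floating-point rounding.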
sparse_matmul_24
Matrix multiply a @ b where b is stored in 2:4 semi-structured format. The last-dim strides of b’s original dense shape must be a multiple of 4 (guaranteed by SemiStructuredSparseTensor::compress).
sparsity_ratio
Compute the sparsity ratio of a tensor: fraction of exact zeros.
spherical_bessel_j0
Spherical Bessel function of the first kind, order 0: j0(x) = sin(x)/x, with j0(0) = 1 (the Taylor branch) and j0(+/-inf) = 0. j0(NaN) = NaN. Mirrors torch.special.spherical_bessel_j0 (torch/special/__init__.py:1444+); scalar evaluator ports spherical_bessel_j0_forward (aten/src/ATen/native/cuda/Math.cuh:3039-3052): |x| < 0.5 uses the explicit 6-term Taylor series, else sin(x)/x.
split_t
Split tensor into pieces of given sizes along dim.
stack
Stack a slice of tensors along a new dimension dim.
sum_dim
Sum along a specific dimension.
swapaxes
swapaxes(input, axis0, axis1) — swap two axes; a literal alias of transpose.
swapdims
swapdims(input, dim0, dim1) — swap two dims; a literal alias of transpose.
tanh
Compute tanh(x), attaching a backward node when gradients are enabled.
tensor
Create a 1-D tensor from a slice (shape inferred).
tensor_split
tensor_split(input, indices, dim) — split input at the given integer indices along dim (the indices form section boundaries).
tile
tile(input, reps) — NumPy-style tile.
topk
Return the k largest elements and their indices along the last dimension.
tril
Lower triangular part of a tensor with at least 2 dimensions.
triu
Upper triangular part of a tensor with at least 2 dimensions.
unbind
unbind(input, dim) — split input into size(dim) slices, removing dim from each.
unflatten
unflatten(input, dim, sizes) — reshape a single dimension dim into the multiple sizes sizes, leaving every other dimension untouched.
unique
Return the sorted unique elements of a 1-D tensor.
unique_consecutive
Remove consecutive duplicate elements from a 1-D tensor.
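The difference between unique and unique_consecutive shows up on unsorted input. A sketch on integer slices using only the standard library; both helper names are hypothetical.

```rust
/// Hypothetical helper: sorted unique elements (sort, then drop repeats).
fn unique_sorted_sketch(input: &[i64]) -> Vec<i64> {
    let mut v = input.to_vec();
    v.sort_unstable();
    v.dedup();
    v
}

/// Hypothetical helper: collapse runs of equal neighbours only,
/// leaving the original order and any non-adjacent repeats intact.
fn unique_consecutive_sketch(input: &[i64]) -> Vec<i64> {
    let mut v = input.to_vec();
    v.dedup();
    v
}
```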
validate_cond_branches
Validate that two sets of outputs have matching shapes.
view_t
View tensor with new shape. Like PyTorch’s tensor.view(shape).
vjp
Compute the vector-Jacobian product (VJP): v^T @ J.
vmap
Vectorize a function over a batch dimension.
vmap2
Vectorize a two-argument function over batch dimensions.
vstack
vstack(tensors) — stack tensors row-wise (along dim 0 after promoting each to ≥2-D).
where_cond
Ternary selection: output[i] = condition[i] ? x[i] : y[i].
where_cond_bt
Ternary selection taking a [BoolTensor] condition: output[i] = cond[i] ? x[i] : y[i]. Mirrors torch.where(cond, x, y).
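The selection rule `output[i] = cond[i] ? x[i] : y[i]` can be sketched on equal-length slices. `where_sketch` is a hypothetical name, and broadcasting is deliberately not modelled.

```rust
/// Hypothetical helper: pick x[i] where cond[i] is true, else y[i].
/// Panics if the three slices differ in length.
fn where_sketch<T: Copy>(cond: &[bool], x: &[T], y: &[T]) -> Vec<T> {
    assert!(cond.len() == x.len() && x.len() == y.len());
    cond.iter()
        .zip(x.iter().zip(y.iter()))
        .map(|(&c, (&a, &b))| if c { a } else { b })
        .collect()
}
```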
xlogy
x * log(y), with the convention that xlogy(0, y) = 0 for any y.
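The zero convention matters because a naive `0 * ln(0)` evaluates to `0 * -inf = NaN`. A scalar sketch under the rule as stated here; `xlogy_sketch` is a hypothetical name, and how the real function treats a NaN `y` is not specified in this description, so the sketch does not special-case it.

```rust
/// Hypothetical helper: x * ln(y), returning 0 whenever x == 0
/// so that 0 * ln(0) does not become NaN.
fn xlogy_sketch(x: f64, y: f64) -> f64 {
    if x == 0.0 {
        0.0
    } else {
        x * y.ln()
    }
}
```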
zeros
Create a tensor filled with zeros.
zeros_like
Create a tensor of zeros with the same shape as other.
zeta
Hurwitz zeta function zeta(x, q) = sum_{k=0}^inf (k + q)^{-x}, element-wise over a broadcast of input (the x exponent) and other (the q shift). Mirrors torch.special.zeta(input, other) (torch/special/__init__.py); scalar evaluator ports the Cephes Hurwitz-zeta kernel from aten/src/ATen/native/cuda/Math.cuh:299-383. Edge ladder: x == 1 -> +inf; x < 1 -> NaN; q <= 0 non-positive integer -> +inf; q <= 0 non-integer with non-integer x -> NaN. zeta(2, 1) == pi^2/6.
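The identity `zeta(2, 1) == pi^2/6` can be checked numerically with a short sketch. This is not the Cephes kernel the description refers to: it sums the first terms of the series directly and approximates the tail with the leading Euler-Maclaurin terms, and it only covers `x > 1`, `q > 0`. The name `hurwitz_zeta_sketch` and the cutoff `N` are assumptions for illustration.

```rust
/// Hypothetical helper: Hurwitz zeta for x > 1 and q > 0 only.
/// Sums the first N terms, then adds an integral tail estimate plus
/// half of the first omitted term (leading Euler-Maclaurin terms).
fn hurwitz_zeta_sketch(x: f64, q: f64) -> f64 {
    assert!(x > 1.0 && q > 0.0, "sketch only covers x > 1 and q > 0");
    const N: usize = 1000;
    let head: f64 = (0..N).map(|k| (k as f64 + q).powf(-x)).sum();
    let a = N as f64 + q;
    head + a.powf(1.0 - x) / (x - 1.0) + 0.5 * a.powf(-x)
}
```

None of the edge-ladder cases (`x == 1`, `x < 1`, `q <= 0`) are handled; the sketch rejects them with a panic rather than returning `+inf` or `NaN`.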

Type Aliases§

FerrotorchResult
Convenience alias for ferrotorch results.
Kernel
A dispatched kernel: takes the op’s input tensors, the currently-active keyset (after all higher-priority keys have been resolved), and a reference to the dispatcher so the kernel can redispatch to a lower-priority key.