ferrotorch — PyTorch-shaped deep learning framework in Rust.
This crate is the umbrella re-export crate. Sub-crates own the actual
implementation; this crate exists so users can use ferrotorch::*; (or
use ferrotorch::prelude::*;) and pick up the canonical public surface
in one import.
§Examples
    use ferrotorch::{FerrotorchResult, zeros};

    fn main() -> FerrotorchResult<()> {
        let t = zeros::<f32>(&[2, 3])?;
        assert_eq!(t.shape(), &[2, 3]);
        Ok(())
    }

See the prelude module for the items most users want, and the per-feature
modules (nn, optim, data, vision, train, serialize, jit,
jit_script, distributions, profiler, hub, tokenize, gpu,
cubecl, mps, xpu, distributed, llama, ml) for sub-crate access.
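
The umbrella layout described above (flat re-export, a prelude, and per-feature namespace modules) can be sketched with inline modules standing in for the sub-crates. This is a minimal illustration, not the ferrotorch source: `core_impl`, its functions, the `Vec<f32>` return type, and the body of `jit` are hypothetical stand-ins; only the shape of the pattern follows the text.

```rust
// Hypothetical stand-in for an implementation sub-crate such as ferrotorch-core.
mod core_impl {
    /// Returns `len` zeros (a plain Vec stands in for a tensor here).
    pub fn zeros(len: usize) -> Vec<f32> {
        vec![0.0; len]
    }

    /// Returns `len` ones.
    pub fn ones(len: usize) -> Vec<f32> {
        vec![1.0; len]
    }
}

// Flat re-export: the sub-crate's public items appear at the umbrella root.
pub use core_impl::*;

// Always-on namespace module, analogous to `nn` or `optim`.
pub mod nn {
    /// Rectified linear unit on a scalar.
    pub fn relu(x: f32) -> f32 {
        x.max(0.0)
    }
}

// Optional namespace: compiled only when the matching Cargo feature is on.
#[cfg(feature = "jit")]
pub mod jit {
    pub fn is_available() -> bool {
        true
    }
}

// Prelude: a curated subset meant for a single glob import.
pub mod prelude {
    pub use super::core_impl::{ones, zeros};
    pub use super::nn;
}
```

With this layout a caller can reach the same function as `zeros(3)` at the root or through `prelude::zeros`, while `jit` simply does not exist unless its feature is enabled.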
Lint baseline mirrors the per-crate convention used across the workspace
(ferrotorch-core, ferrotorch-jit, ferrotorch-cubecl, etc.). Workspace
[lints] is intentionally not used — every crate carries its own
#![warn/deny(...)] so the policy lives next to the code it governs.
§REQ status (per .design/ferrotorch/lib.md)
| REQ | Status | Evidence |
|---|---|---|
| REQ-1 | SHIPPED | impl: crate //! doc-comment at top of ferrotorch/src/lib.rs mirrors torch/__init__.py:1-9 module docstring; consumer: rustdoc no_run example block in the same docstring (run by cargo test -p ferrotorch --doc). |
| REQ-2 | SHIPPED | impl: pub use ferrotorch_core::*; at ferrotorch/src/lib.rs mirrors torch/__init__.py:68-141 flat __all__; consumer: doctest at the top of this file imports ferrotorch::{FerrotorchResult, zeros} directly. |
| REQ-3 | SHIPPED | impl: pub mod prelude { ... } in ferrotorch/src/lib.rs mirrors from torch import nn, optim convention; consumer: the //! doc-comment promises use ferrotorch::prelude::*; as the canonical one-import entry point — published-crate API contract. |
| REQ-4 | SHIPPED | impl: always-on pub mod nn / pub mod optim / pub mod data / pub mod vision in ferrotorch/src/lib.rs mirror torch.nn / torch.optim / torch.utils.data / torchvision; consumer: ferrotorch/tests/public_surface.rs:22-25 compile-time pins each path (test harness for the public-API contract is the contract auditor); downstream-of-workspace: crates.io/ferrotorch users. |
| REQ-5 | SHIPPED | impl: 11 #[cfg(feature = "<flag>")] pub mod <name> blocks in ferrotorch/src/lib.rs mirror upstream optional namespaces; consumer: ferrotorch/Cargo.toml:15-43 enumerates each matching feature flag; the published-crate contract is the boundary consumer per goal.md S5. |
| REQ-6 | SHIPPED | impl: #[cfg(not(target_env = "msvc"))] #[global_allocator] static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc; in ferrotorch/src/lib.rs; consumer: every binary linking the ferrotorch crate (e.g. cargo run --example train_mnist -p ferrotorch via ferrotorch/examples/train_mnist.rs) picks up the allocator via the rustc #[global_allocator] mechanism. |
| REQ-7 | SHIPPED | impl: lint baseline block #![warn(clippy::all, clippy::pedantic)] #![deny(rust_2018_idioms, missing_debug_implementations)] #![allow(missing_docs)] in ferrotorch/src/lib.rs; consumer: cargo clippy -p ferrotorch --lib -- -D warnings gates every commit per goal.md Step 7. |
| REQ-8 | SHIPPED | impl: llama-cuda = ["llama", "gpu", "ferrotorch-llama/cuda"] at ferrotorch/Cargo.toml:42; consumer: the //! doc-comment in this file references llama-cuda as a documented feature combination; published-crate users are the boundary consumer. |
Closes #1346.
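
REQ-6 relies on the standard `#[global_allocator]` attribute. The sketch below shows that mechanism in isolation. Because mimalloc is an external crate, a hypothetical counting wrapper around `std::alloc::System` stands in for `mimalloc::MiMalloc` so the block stays self-contained; `CountingAlloc`, `ALLOCATED_BYTES`, and `allocated_bytes` are illustrative names, not ferrotorch items.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Total bytes requested through the allocator so far.
static ALLOCATED_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `mimalloc::MiMalloc`: forwards to the system
/// allocator and records how many bytes were requested.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Same cfg gate as the REQ-6 row: the custom allocator is skipped on MSVC.
#[cfg(not(target_env = "msvc"))]
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Bytes requested so far; stays at 0 if the allocator was not installed.
pub fn allocated_bytes() -> usize {
    ALLOCATED_BYTES.load(Ordering::Relaxed)
}
```

Any binary that links a crate containing such a static picks up the allocator without further setup, which is why the table lists "every binary linking the ferrotorch crate" as the consumer.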
Modules§
- autograd
  REQ status (per .design/ferrotorch-core/autograd/mod.md)
- bool_tensor
  Boolean tensors for masks and logical operations. (#596)
- complex_tensor
  ComplexTensor<T>: first-class complex-valued tensors. (#618)
- cpu_pool
  CPU tensor buffer pool: caching allocator for host memory.
- creation
  Module doc: REQ status table follows.
- data
  Data loading, datasets, samplers, and transforms.
- device
  REQ status (per .design/ferrotorch-core/device.md)
- dispatch
  Multi-dispatch key system for composable tensor backends. CL-397.
- distributions
  Probability distributions for sampling and variational inference.
- dtype
  REQ status (per .design/ferrotorch-core/dtype.md)
- dtype_dispatch
  Dtype-generic GPU dispatch for Float tensors.
- einops
  Einops-style tensor rearrangement operations.
- einsum
  Einstein summation (einsum) for ferrotorch tensors.
- error
  REQ status (per .design/ferrotorch-core/error.md)
- fft
  FFT operations for tensors.
- flex_attention
  Flexible attention with customizable score modification.
- gpu_dispatch
  GPU backend dispatch layer.
- grad_fns
  Module-root dispatch for the autograd-tracking wrapper layer.
- hub
  Model hub for downloading and caching pretrained models.
- int_tensor
  Integer-typed tensors for indexing, embedding lookups, and any other workload that needs first-class non-float storage. (#596)
- jit
  JIT tracing, IR graph, optimization passes, and code generation.
- jit_script
  #[script] proc macro for source-based graph capture.
- linalg
  Advanced linear algebra operations bridging to ferray-linalg.
- masked
  Masked tensors: torch.masked.MaskedTensor analog.
- meta_propagate
  Helpers for propagating the meta device through tensor operations.
- named_tensor
  NamedTensor<T>: dim-name annotations on top of Tensor<T>. (#621)
- nested
  NestedTensor and PackedNestedTensor: ragged (jagged) tensors that mirror torch.nested.nested_tensor (aten/src/ATen/native/nested/) + the jagged-layout NJT (torch/nested/_internal/nested_tensor.py).
- nn
  Neural network modules and layers.
- numeric_cast
  Fallible numeric conversions used across the workspace.
- ops
  Kernel-layer op-module declarations. Mirrors aten/src/ATen/native/’s directory-as-namespace convention. Each declared sub-module is the forward-only (no autograd) op family for its area; the autograd wrappers live in ferrotorch-core/src/grad_fns/.
- optim
  Optimizers and learning rate schedulers.
- prelude
  Prelude module: import everything commonly needed.
- profiler
  Performance profiling and Chrome trace export.
- profiler_hook
  Thread-local profiler hook for auto-instrumented tensor ops.
- pruning
  REQ status (per .design/ferrotorch-core/pruning.md)
- quantize
  Post-training quantization (PTQ) for ferrotorch tensors.
- rng
  Thread-local seeded random number generator state, mirroring torch.manual_seed / torch.Generator.
- serialize
  Model serialization: ONNX export, PyTorch import, safetensors, GGUF.
- shape
  REQ status (per .design/ferrotorch-core/shape.md)
- signal
  Signal-processing utilities.
- simd_reduce
  Torch-matching f32 reduction primitives.
- sparse
  REQ status (per .design/ferrotorch-core/sparse.md)
- special
  Special mathematical functions (torch.special equivalent).
- storage
  REQ status (per .design/ferrotorch-core/storage.md)
- stride_tricks
  as_strided family: direct stride manipulation on tensors.
- tensor
  REQ status (per .design/ferrotorch-core/tensor.md)
- tokenize
  HuggingFace tokenizer wrapper (BPE, WordPiece, Unigram).
- train
  Training loop, Learner, callbacks, and metrics.
- vision
  Computer vision models, datasets, and transforms.
- vmap
  Vectorized map (vmap): apply a function over a batch dimension.
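
The rng module is described as thread-local seeded generator state. A minimal sketch of that idea follows, assuming a SplitMix64 step as the generator; the page does not say which algorithm ferrotorch uses, so SplitMix64 is a swapped-in technique and `STATE`, `next_u64`, and `next_f64` are hypothetical names. Only `manual_seed` borrows its name from the listing, and its signature here is an assumption.

```rust
use std::cell::Cell;

thread_local! {
    // Per-thread generator state; every thread starts from the same default seed.
    static STATE: Cell<u64> = Cell::new(0x9E37_79B9_7F4A_7C15);
}

/// Reseeds the current thread's generator; other threads are unaffected.
pub fn manual_seed(seed: u64) {
    STATE.with(|s| s.set(seed));
}

/// Draws the next u64 using the SplitMix64 step function.
pub fn next_u64() -> u64 {
    STATE.with(|s| {
        let state = s.get().wrapping_add(0x9E37_79B9_7F4A_7C15);
        s.set(state);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    })
}

/// Uniform sample in [0, 1) built from the top 53 bits of `next_u64`.
pub fn next_f64() -> f64 {
    (next_u64() >> 11) as f64 / (1u64 << 53) as f64
}
```

Reseeding with the same value replays the same sequence on that thread, and a seed set on one thread leaves the state of every other thread untouched.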
Macros§
- dispatch_floating_dtype
  Dispatch to one of three closures based on the static type T, returning Err(FerrotorchError::NotImplementedOnCuda { op }) for any dtype other than f32, f64, or half::bf16.
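
The behaviour described for dispatch_floating_dtype, picking one of three closures from the static type T, can be approximated as an ordinary function using `std::any::TypeId`. This is a sketch of the idea only, not the macro's expansion: `Bf16` (standing in for `half::bf16`), `DispatchError`, and `dispatch_floating` are hypothetical names.

```rust
use std::any::TypeId;

/// Hypothetical stand-in for `half::bf16` (raw 16-bit storage only).
#[allow(dead_code)]
pub struct Bf16(pub u16);

/// Hypothetical error type mirroring the `NotImplementedOnCuda { op }` shape.
#[derive(Debug, PartialEq)]
pub enum DispatchError {
    NotImplementedOnCuda { op: &'static str },
}

/// Runs exactly one of the three closures depending on `T`, or returns an
/// error naming `op` when `T` is none of f32, f64, or Bf16.
pub fn dispatch_floating<T: 'static, R>(
    op: &'static str,
    on_f32: impl FnOnce() -> R,
    on_f64: impl FnOnce() -> R,
    on_bf16: impl FnOnce() -> R,
) -> Result<R, DispatchError> {
    let id = TypeId::of::<T>();
    if id == TypeId::of::<f32>() {
        Ok(on_f32())
    } else if id == TypeId::of::<f64>() {
        Ok(on_f64())
    } else if id == TypeId::of::<Bf16>() {
        Ok(on_bf16())
    } else {
        Err(DispatchError::NotImplementedOnCuda { op })
    }
}
```

A call such as `dispatch_floating::<f64, _>("add", || 4, || 8, || 2)` runs only the second closure; an unsupported `T` runs none of them and reports the operation name instead.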
Structs§
- AnomalyMode
  Global anomaly detection mode.
- AsStridedBackward
  VJP for as_strided(input, size, stride, offset).
- BoolTensor
  Contiguous tensor of booleans, device-aware.
- ComplexTensor
  CPU-resident, contiguous, structure-of-arrays complex tensor.
- CooTensor
  A 2-D sparse tensor in COO (Coordinate List) format with separate row and column index arrays.
- CscTensor
  2-D sparse tensor in CSC (Compressed Sparse Column) format. Dual of CsrTensor: instead of storing row pointers + column indices, stores column pointers (col_ptrs, length ncols + 1) and row indices for each non-zero. Efficient for column slicing and A^T x style ops.
- CsrTensor
  A 2-D sparse tensor in CSR (Compressed Sparse Row) format.
- CumExtremeResult
  Result of cummax / cummin: values tensor and indices tensor.
- DispatchKeySet
  A set of active DispatchKeys, stored as a u16 bitmask for constant-time membership testing and iteration.
- Dispatcher
  A kernel registration table keyed by (op_name, dispatch_key). Looking up a kernel is a single HashMap probe.
- DualTensor
  A dual-number tensor: primal + epsilon * tangent.
- FakeQuantize
  Simulates quantization during training by quantizing and immediately dequantizing values, while allowing gradients to flow through via the straight-through estimator (STE).
- ForwardBacktrace
  A captured forward-pass backtrace, stored on tensors when anomaly mode is on.
- Generator
  Per-process / per-thread seeded RNG state, mirroring torch.Generator.
- HistogramObserver
  Histogram-based observer that collects a distribution of values.
- HookHandle
  An opaque handle returned by register_hook / register_post_accumulate_grad_hook.
- IntTensor
  Contiguous tensor of integers (i32 or i64), device-aware.
- MaskedTensor
  A tensor paired with a boolean mask.
- MinMaxObserver
  Tracks the running min/max of observed values.
- NamedTensor
  A Tensor<T> paired with one optional dim name per axis.
- NestedTensor
  A nested (ragged) tensor: a collection of tensors with differing sizes along one dimension (the “ragged” dimension).
- PackedNestedTensor
  A nested (jagged) tensor stored as one contiguous flat buffer with an offsets array marking the start of each component.
- PerChannelMinMaxObserver
  Tracks per-channel running min/max of observed values.
- QParams
  Computed quantization parameters (scale and zero_point).
- QatLayer
  A layer with associated FakeQuantize modules for QAT.
- QatModel
  Wraps a collection of named weight tensors for quantization-aware training.
- QuantizedTensor
  A tensor stored in quantized (integer) representation.
- SemiStructuredSparseTensor
  A tensor stored in the NVIDIA 2:4 structured sparsity format.
- SparseGrad
  A sparse gradient: a list of (index, value) pairs that an optimizer applies to a dense parameter tensor. Mirrors the coalesced form of torch.Tensor.is_sparse gradients used by nn.Embedding(sparse=True) and consumed by optim.SparseAdam / optim.SGD.
- SparseTensor
  A sparse tensor in COO (Coordinate List) format.
- Tensor
  The central type. A dynamically-shaped tensor with gradient tracking and device placement.
- TensorId
  A unique, monotonically increasing tensor identifier.
- TensorStorage
  The underlying data buffer for a tensor, tagged with its device.
Enums§
- AutocastCategory
  Policy: which operations should be cast to reduced precision.
- AutocastDtype
  The reduced-precision dtype used during autocast regions.
- DType
  Runtime descriptor for the element type stored in an array.
- Device
  Device on which a tensor’s data resides.
- DispatchKey
  One of the 16 possible dispatch keys, ordered from lowest to highest priority. The u8 repr matches the bit position in DispatchKeySet’s internal u16 bitmask, so the priority ordering is both the enum declaration order and the numeric order of the discriminants.
- EinopsReduction
  Reduction operation for reduce.
- FerrotorchError
  Errors produced by ferrotorch operations.
- FftNorm
  Normalization mode for FFT operations, matching NumPy’s norm parameter.
- GeluApproximate
  Selects the GELU approximation method.
- MemoryFormat
  Describes the physical memory layout of a tensor.
- MeshIndexing
  Cartesian-indexing convention for meshgrid_indexing.
- QuantDtype
  Target integer dtype for quantized storage.
- QuantScheme
  Granularity of quantization parameters (scale / zero_point).
- StorageBuffer
  Device-specific data buffer.
Traits§
- Element
  Trait bound for types that can be stored in a ferray array.
- Float
  Marker trait for float element types that support autograd.
- GradFn
  The backward function trait for reverse-mode automatic differentiation.
- IntElement
  Element types supported by IntTensor.
- Observer
  Trait for quantization observers that collect data statistics.
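
The Observer trait, MinMaxObserver, QParams, and FakeQuantize entries together describe a collect-statistics-then-quantize flow. A minimal sketch of that flow follows, assuming the common asymmetric affine scheme for unsigned 8-bit storage; the page does not give ferrotorch's formulas or method names, so `RangeObserver`, `MinMax`, `AffineParams`, and `fake_quantize` are hypothetical.

```rust
/// Hypothetical scale / zero-point pair for unsigned 8-bit affine quantization.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AffineParams {
    pub scale: f32,
    pub zero_point: i32,
}

/// Hypothetical observer interface: feed data, then ask for parameters.
pub trait RangeObserver {
    fn observe(&mut self, values: &[f32]);
    fn params(&self) -> AffineParams;
}

/// Tracks the running min/max of everything observed so far.
pub struct MinMax {
    min: f32,
    max: f32,
}

impl MinMax {
    pub fn new() -> Self {
        MinMax { min: f32::INFINITY, max: f32::NEG_INFINITY }
    }
}

impl RangeObserver for MinMax {
    fn observe(&mut self, values: &[f32]) {
        for &v in values {
            self.min = self.min.min(v);
            self.max = self.max.max(v);
        }
    }

    fn params(&self) -> AffineParams {
        // Widen the range to include 0 so that zero is exactly representable.
        let lo = self.min.min(0.0);
        let hi = self.max.max(0.0);
        // Fall back to a unit scale for an empty or all-zero range.
        let scale = if hi > lo { (hi - lo) / 255.0 } else { 1.0 };
        let zero_point = (-lo / scale).round() as i32;
        AffineParams { scale, zero_point }
    }
}

/// Quantize to [0, 255] and immediately dequantize, as a fake-quantize step would.
pub fn fake_quantize(x: f32, p: AffineParams) -> f32 {
    let q = ((x / p.scale).round() as i32 + p.zero_point).clamp(0, 255);
    (q - p.zero_point) as f32 * p.scale
}
```

Values inside the observed range come back within one quantization step of the input, while values outside it saturate at the range boundary.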
Functions§
- airy_ai
  Airy function of the first kind Ai(x). Mirrors torch.special.airy_ai (torch/special/__init__.py:982-985); scalar evaluator ports the Cephes multi-region kernel from aten/src/ATen/native/cuda/Math.cuh:1280-1459. airy_ai(0) = 0.3550280538878172; oscillatory for x < -2.09, decaying for x > 0; airy_ai(+/-inf) = NaN (the isinf short-circuit at Math.cuh:1360-1362), airy_ai(x > 103.892) = 0.
- apply_2_4_mask
  Apply 2:4 structured sparsity mask.
- arange
  Create a 1-D tensor with values from start to end (exclusive) with step step.
- as_strided
  Zero-copy strided view; see Tensor::as_strided for full docs.
- as_strided_copy
  Materialised strided copy; see Tensor::as_strided_copy for full docs.
- as_strided_scatter
  Inverse of as_strided; see Tensor::as_strided_scatter for full docs.
- atan2
  Differentiable element-wise atan2(y, x). Forward mirrors aten/src/ATen/native/BinaryOps.cpp:795 TORCH_IMPL_FUNC(atan2_out). The argument order matches torch.atan2(input, other) per torch/_torch_docs.py, where input == y and other == x (the result is the angle whose tangent equals y/x, with quadrant-aware sign per IEEE-754 atan2).
- autocast
  Execute a closure with mixed-precision autocast enabled.
- autocast_dtype
  Returns the target dtype for autocast regions on this thread.
- autocast_guard
  Primary entry point for op implementations to query autocast policy.
- backward
  Compute gradients of all leaf tensors that contribute to root.
- backward_with_grad
  Run backward pass through the computation graph.
- beta
  Beta function B(a, b) = exp(lnB(a, b)) = Γ(a)Γ(b)/Γ(a + b), element-wise over a broadcast of a and b.
- broadcast_shapes
  Compute the broadcasted shape of two shapes, following NumPy/PyTorch rules.
- broadcast_tensors
  broadcast_tensors(tensors): expand every input to their common broadcast shape.
- broadcast_to
  broadcast_to(input, shape): broadcast input to shape; a literal alias of expand.
- bucketize
  Discretize input values into buckets defined by boundaries.
- cat
  Concatenate tensors along an axis.
- cdist
  Pairwise distance matrix between two sets of vectors.
- check_gradient_anomaly
  Check a gradient tensor for NaN or Inf values (anomaly check).
- chunk_t
  Split tensor into chunks roughly equal pieces along dim.
- clamp
  Differentiable elementwise clamp: c[i] = x[i].clamp(min, max).
- column_stack
  column_stack(tensors): stack 1-D/0-D tensors as columns of a 2-D matrix.
- cond
  Conditional subgraph execution.
- contiguous_t
  Make tensor contiguous (copy data if needed).
- copysign
  Differentiable element-wise copysign(magnitude, sign). Returns a tensor with the magnitude of magnitude and the sign of sign. Mirrors aten/src/ATen/native/BinaryOps.cpp:865 copysign_out. Backward: gradient flows to magnitude scaled by sign_factor = result / magnitude (zeroed where magnitude == 0); gradient to sign is identically zero.
- cos
  Differentiable elementwise cosine: c[i] = cos(x[i]).
- cummax
  Cumulative maximum along dim.
- cummin
  Cumulative minimum along dim.
- cumprod
  Differentiable cumulative product along dim.
- cumsum
  Differentiable cumulative sum along dim.
- dequantize
  Dequantize back to a floating-point tensor.
- detect_anomaly
  Execute a closure with anomaly detection enabled.
- diag
  Extract the diagonal of a 2-D tensor, or construct a 2-D diagonal matrix from a 1-D tensor.
- diagflat
  Construct a diagonal matrix from a 1-D tensor (flattened if needed).
- digamma
  Digamma function: psi(x) = d/dx ln(Gamma(x)).
- dstack
  dstack(tensors): stack tensors depth-wise (along dim 2 after promoting each to ≥3-D).
- dual_add
  Forward rule for addition: d(a + b) = da + db.
- dual_cos
  Forward rule for cos: d(cos(a)) = -da * sin(a).
- dual_div
  Forward rule for division: d(a / b) = (da * b - a * db) / b^2.
- dual_exp
  Forward rule for exp: d(exp(a)) = da * exp(a).
- dual_log
  Forward rule for log: d(log(a)) = da / a.
- dual_matmul
  Forward rule for matrix multiplication: d(A @ B) = dA @ B + A @ dB.
- dual_mul
  Forward rule for multiplication: d(a * b) = a * db + da * b.
- dual_neg
  Forward rule for negation: d(-a) = -da.
- dual_relu
  Forward rule for ReLU: d(relu(a)) = da * (a > 0).
- dual_sigmoid
  Forward rule for sigmoid: d(sigmoid(a)) = da * sigmoid(a) * (1 - sigmoid(a)).
- dual_sin
  Forward rule for sin: d(sin(a)) = da * cos(a).
- dual_sub
  Forward rule for subtraction: d(a - b) = da - db.
- dual_tanh
  Forward rule for tanh: d(tanh(a)) = da * (1 - tanh(a)^2).
- einsum
  Einstein summation.
- einsum_differentiable
  Differentiable Einstein summation. If any input requires grad and grad is enabled, attaches EinsumBackward.
- enable_grad
  Re-enable gradient computation inside a no_grad block.
- entr
  Entropy entr(x): x > 0 -> -x*log(x), x == 0 -> 0, x < 0 -> -inf, NaN -> NaN. Mirrors torch.special.entr (torch/special/__init__.py:67; kernel aten/src/ATen/native/cuda/Math.cuh:463-480).
- erf
  Error function: erf(x) = (2/sqrt(pi)) * integral(0, x, exp(-t^2) dt).
- erfc
  Complementary error function: erfc(x) = 1 - erf(x).
- erfinv
  Inverse error function: erfinv(erf(x)) = x.
- exp
  Differentiable elementwise exponential: c[i] = exp(x[i]).
- expand
  Broadcast (expand) a tensor to new_shape.
- expand_as
  expand_as(input, other): broadcast input to the shape of other.
- expm1
  exp(x) - 1, numerically stable for small x.
- eye
  Create an identity matrix of size n x n.
- fake_quantize_differentiable
  Backward-compatible alias for fake_quantize_per_tensor_affine.
- fft
  1-D complex-to-complex FFT along the last dimension (default norm).
- fft2
  2-D FFT (complex-to-complex) along the last two spatial dimensions (default s/dim/norm). Thin wrapper over fft2_norm.
- fft2_differentiable
  Differentiable 2-D FFT (default s/dim/norm). Attaches Fft2Backward.
- fft2_differentiable_norm
  Differentiable 2-D FFT with explicit s/dim/norm (#1294).
- fft2_norm
  2-D FFT with explicit s/dim/norm (#1294).
- fft_differentiable
  Differentiable 1-D FFT (default dim/norm). Attaches FftBackward.
- fft_differentiable_norm
  Differentiable 1-D FFT with explicit dim/norm (#1294). Attaches a FftBackward that threads the adjoint norm/dim. Matches torch.fft.fft.
- fft_norm
  1-D complex-to-complex FFT with explicit dim and norm (#1294).
- fftfreq
  Discrete Fourier Transform sample frequencies.
- fftn
  N-dimensional complex-to-complex FFT.
- fftn_differentiable
  Differentiable N-D FFT (default norm). Attaches FftnBackward.
- fftn_differentiable_norm
  Differentiable N-D FFT with explicit norm (#1294). Matches torch.fft.fftn; axes is torch’s dim.
- fftn_norm
  N-dimensional complex-to-complex FFT with explicit norm (#1294).
- fftshift
  Shift the zero-frequency component to the center along the given axes.
- fixed_point
  Find a fixed point of f starting from x0, then compute its derivative w.r.t. params using the implicit function theorem.
- flex_attention
  Compute flexible multi-head attention with an optional score modification function.
- flip
  Reverse the order of elements along each axis in dims.
- fliplr
  fliplr(input): flip a (≥2-D) tensor left-to-right (along dim 1).
- flipud
  flipud(input): flip a (≥1-D) tensor up-to-down (along dim 0).
- from_slice
  Create a tensor from a slice, copying the data.
- from_vec
  Create a tensor from a Vec<T>, taking ownership.
- full
  Create a tensor filled with a given value.
- full_like
  Create a tensor filled with value with the same shape as other.
- gammainc
  Regularized lower incomplete gamma P(a, x), element-wise over a broadcast of input (the a argument) and other (the x argument).
- gammaincc
  Regularized upper incomplete gamma Q(a, x) = 1 - P(a, x), element-wise over a broadcast of input (the a argument) and other (the x argument).
- gammaln_sign
  Sign of the gamma function Γ(x): the ±1 (or NaN at poles) factor that lgamma = ln|Γ| discards, element-wise over input.
- gather
  Gather values from input along dim using index.
- gelu
  Compute gelu(x) with the default exact (erf-based) approximation.
- gelu_with
  Compute gelu(x) with configurable approximation, attaching a backward node when gradients are enabled.
- grad
  Compute gradients of outputs with respect to inputs.
- grad_norm
  Compute the L2 norm of gradients of outputs with respect to inputs.
- gradient_penalty
  Compute the gradient penalty for WGAN-GP.
- hessian
  Compute the Hessian matrix of a scalar function at a point.
- hfft
  1-D FFT of a Hermitian-symmetric complex spectrum, returning real output.
- hfft2
  2-D FFT of a Hermitian-symmetric spectrum, returning real output (torch.fft.hfft2).
- hfft2_norm
  2-D Hermitian FFT with explicit norm (#1294).
- hfft_differentiable
  Differentiable Hermitian FFT (complex spectrum → real signal). Attaches HfftBackward when grad is needed.
- hfft_norm
  1-D Hermitian FFT with explicit dim and norm (#1294).
- hfftn
  N-D FFT of a Hermitian-symmetric spectrum, returning real output (torch.fft.hfftn). Generalizes hfft / hfft2 to arbitrary axes.
- hfftn_norm
  N-D Hermitian FFT with explicit norm (#1294).
- histc
  Histogram: count elements in equal-width bins.
- hstack
  hstack(tensors): stack tensors column-wise.
- hypot
  Differentiable element-wise hypot(x, y) = sqrt(x^2 + y^2) with the overflow-safe accumulation provided by num_traits::Float::hypot (delegates to f32::hypot / f64::hypot). Mirrors aten/src/ATen/native/BinaryOps.cpp:548 hypot_out. Backward: grad_x = grad * x / result; grad_y = grad * y / result, with result == 0 -> 0 masking (matching the upstream behavior in derivatives.yaml:814-817 whose grad * self / result is implicitly degenerate at the origin: we mask to a safe zero rather than producing NaN, which differs from torch’s literal IEEE 0/0 output at the (0,0) tie only; the divergence is filed as documentation, not a parity blocker).
- i0
  Modified Bessel function of the first kind, order 0: i0(x). Even function; i0(0) = 1, i0(+/-inf) = +inf, i0(NaN) = NaN. Mirrors torch.special.i0 / torch.i0 (torch/special/__init__.py:522); the scalar evaluator ports the Cephes chbevl Chebyshev kernel from aten/src/ATen/native/cuda/Math.cuh:502-555.
- i0e
  Exponentially-scaled modified Bessel order 0: i0e(x) = exp(-|x|) I0(x). Even; i0e(0) = 1, i0e(+/-inf) = 0 (stays finite where i0 overflows), i0e(NaN) = NaN. Mirrors torch.special.i0e (torch/special/__init__.py:548); scalar evaluator ports calc_i0e (aten/src/ATen/native/Math.h:101-145), with the same Chebyshev sets as i0 without the exp(x) factor.
- i1
  Modified Bessel function of the first kind, order 1: i1(x). Odd function (sign follows x); i1(0) = 0, i1(+inf) = +inf, i1(-inf) = -inf, i1(NaN) = NaN. Mirrors torch.special.i1 / torch.i1; scalar evaluator ports i1_string (aten/src/ATen/native/cuda/Math.cuh:575-622).
- i1e
  Exponentially-scaled modified Bessel order 1: i1e(x) = exp(-|x|) I1(x). Odd; i1e(0) = 0, i1e(+/-inf) = +/-0, i1e(NaN) = NaN. Mirrors torch.special.i1e (torch/special/__init__.py:598); scalar evaluator ports calc_i1e (aten/src/ATen/native/cuda/Math.cuh:647-696), with the same Chebyshev sets as i1 without the exp(x) factor.
- ifft
  1-D inverse FFT along the last dimension (default norm).
- ifft2
  2-D inverse FFT (complex-to-complex) along the last two spatial dimensions (default s/dim/norm). Thin wrapper over ifft2_norm.
- ifft2_differentiable
  Differentiable 2-D inverse FFT (default s/dim/norm). Attaches Ifft2Backward.
- ifft2_differentiable_norm
  Differentiable 2-D inverse FFT with explicit s/dim/norm (#1294).
- ifft2_norm
  2-D inverse FFT with explicit s/dim/norm (#1294).
- ifft_differentiable
  Differentiable 1-D inverse FFT (default dim/norm). Attaches IfftBackward.
- ifft_differentiable_norm
  Differentiable 1-D inverse FFT with explicit dim/norm (#1294).
- ifft_norm
  1-D inverse FFT with explicit dim and norm (#1294).
- ifftn
  N-dimensional inverse complex FFT.
- ifftn_differentiable
  Differentiable N-D inverse FFT (default norm). Attaches IfftnBackward.
- ifftn_differentiable_norm
  Differentiable N-D inverse FFT with explicit norm (#1294).
- ifftn_norm
  N-dimensional inverse complex FFT with explicit norm (#1294).
- ifftshift
  Inverse of fftshift.
- ihfft
  1-D inverse FFT of a real signal, returning a Hermitian-symmetric spectrum.
- ihfft2
  2-D inverse FFT of a real signal, returning a Hermitian-symmetric spectrum (torch.fft.ihfft2).
- ihfft2_norm
  2-D inverse Hermitian FFT with explicit norm (#1294).
- ihfft_differentiable
  Differentiable inverse Hermitian FFT (real signal → Hermitian spectrum). Attaches IhfftBackward when grad is needed.
- ihfft_norm
  1-D inverse Hermitian FFT with explicit dim and norm (#1294).
- ihfftn
  N-D inverse FFT of a real signal, returning a Hermitian-symmetric spectrum (torch.fft.ihfftn). Generalizes ihfft / ihfft2 to arbitrary axes.
- ihfftn_norm
  N-D inverse Hermitian FFT with explicit norm (#1294).
- irfft
  1-D complex-to-real inverse FFT (default norm).
- irfft2
  2-D complex-to-real inverse FFT (torch.fft.irfft2).
- irfft2_norm
  2-D complex-to-real inverse FFT with explicit norm (#1294).
- irfft_differentiable
  Differentiable 1-D inverse real FFT (default dim/norm). Attaches IrfftBackward.
- irfft_differentiable_norm
  Differentiable 1-D inverse real FFT with explicit dim/norm (#1294).
- irfft_norm
  1-D complex-to-real inverse FFT with explicit dim and norm (#1294).
- irfftn
  N-dimensional complex-to-real inverse FFT.
- irfftn_differentiable
  Differentiable N-D inverse real FFT (default norm). Attaches IrfftnBackward.
- irfftn_differentiable_norm
  Differentiable N-D inverse real FFT with explicit norm (#1294). Matches torch.fft.irfftn.
- irfftn_norm
  N-dimensional complex-to-real inverse FFT with explicit norm (#1294).
- is_autocast_debug
  Returns true if autocast debug event recording is active on this thread.
- is_autocast_enabled
  Returns true if mixed-precision autocast is currently enabled on this thread.
- is_grad_enabled
  Returns true if gradient tracking is currently enabled on this thread.
- jacfwd
  Compute the full Jacobian matrix using forward-mode AD.
- jacobian
  Compute the Jacobian matrix of a function at a point.
- jvp
  Compute the Jacobian-vector product (JVP): J @ v.
- jvp_exact
  Compute the exact Jacobian-vector product using forward-mode AD.
- lgamma
  Log-gamma function: lgamma(x) = log(|Gamma(x)|).
- linspace
  Create a 1-D tensor of num evenly spaced values from start to end (inclusive).
- log
  Differentiable elementwise natural log: c[i] = ln(x[i]).
- log1p
  log(1 + x), numerically stable for small x.
- log_beta
  Log-beta function lnB(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b), element-wise over a broadcast of a and b.
- logcumsumexp
  Differentiable log-cumulative-sum-exp along dim.
- magnitude_prune
  Unstructured magnitude pruning: zero out the smallest weights.
- manual_seed
  Set the current thread’s default RNG seed; mirrors torch.manual_seed at torch/random.py:46.
- masked_count
  Number of valid (unmasked) entries; returns a 0-d tensor in T.
- masked_equal
  Mask out entries equal to value. Matches numpy.ma.masked_equal.
- masked_invalid
  Mask out non-finite entries (NaN, ±∞). Matches numpy.ma.masked_invalid.
- masked_max
  Max of valid entries; returns a 0-d tensor (NaN if all masked).
- masked_mean
  Mean of valid entries; returns a 0-d tensor.
- masked_min
  Min of valid entries; returns a 0-d tensor (NaN if all masked).
- masked_select
  masked_select(input, mask): return a 1-D tensor of the elements of input where mask is true, in flat C-order. Mirrors torch.masked_select. mask must have the same numel as input.
- masked_sum
  Sum of valid entries; returns a 0-d tensor.
- masked_where
  Wrap data with condition interpreted as “where condition is true, mask the value out”. Matches numpy.ma.masked_where. The resulting MaskedTensor has mask = !condition under the torch convention.
- max_with_dim
  Differentiable (values, indices) = max(input, dim, keepdim) with the PyTorch named-tuple return. Mirrors torch.max(input, dim, keepdim) at aten/src/ATen/native/ReduceOps.cpp max.dim overload. NaN propagation per SharedReduceOps.h:26-34. Backward scatters grad to the input positions identified by indices. Closes #1302 (max).
- mean_dim
  Mean along a specific dimension.
- median_with_dim
  Differentiable (values, indices) = median(input, dim, keepdim) with the PyTorch named-tuple return. Mirrors torch.median(input, dim, keepdim) at aten/src/ATen/native/Sorting.cpp:503 median_with_indices_impl (ignore_nan = false: a NaN in the slice poisons the result). Backward scatters grad to the input positions identified by indices via the shared MaxMinDimBackward. Closes #1306 (median).
- meshgrid
  Create coordinate grids from 1-D coordinate vectors.
- meshgrid_indexing
  Create coordinate grids from 1-D coordinate vectors with an explicit MeshIndexing convention.
- min_with_dim
  Differentiable (values, indices) = min(input, dim, keepdim), symmetric to max_with_dim. Closes #1302 (min).
- modified_bessel_k0
  Modified Bessel function of the second kind, order 0: k0(x). Domain x > 0: k0(0) = +inf, k0(x < 0) = NaN, k0(NaN) = NaN. Decays to 0 for large x. Mirrors torch.special.modified_bessel_k0 (torch/special/__init__.py:1304-1341); scalar evaluator ports modified_bessel_k0_forward (aten/src/ATen/native/cuda/Math.cuh:2503-2577) over the shared chbevl Clenshaw evaluator and the batch-2 i0.
- modified_bessel_k1
  Modified Bessel function of the second kind, order 1: k1(x). Domain x > 0: k1(0) = +inf, k1(x < 0) = NaN, k1(NaN) = NaN. Mirrors torch.special.modified_bessel_k1 (torch/special/__init__.py:1321-1358); scalar evaluator ports modified_bessel_k1_forward (aten/src/ATen/native/cuda/Math.cuh:2661-2736) over chbevl and the batch-2 i1.
- moveaxis
  moveaxis(input, source, destination): a literal alias of movedim.
- movedim
  movedim(input, source, destination): reposition the dims listed in source to the indices listed in destination.
- multigammaln
  Multivariate log-gamma log Γ_p(a) with dimension p, element-wise over input:
- mvlgamma
  Alias for multigammaln; mirrors torch.mvlgamma(input, p) (torch/_torch_docs.py:7895, “Alias for torch.special.multigammaln”).
- nanmedian_with_dim
  Differentiable (values, indices) = nanmedian(input, dim, keepdim), the NaN-skipping counterpart of median_with_dim. Mirrors torch.nanmedian(input, dim, keepdim) (ignore_nan = true): NaNs are excluded from the median rank computation. Closes #1306 (nanmedian).
- ndtr
  Standard-normal CDF ndtr(x) = (1 + erf(x/sqrt(2))) / 2. Mirrors torch.special.ndtr (torch/special/__init__.py:624; kernel aten/src/ATen/native/UnaryOps.cpp:715-718). Composed over the shipped erf so ndtr(-inf) = 0, ndtr(0) = 0.5, ndtr(+inf) = 1, ndtr(NaN) = NaN.
- ndtri
  Inverse standard-normal CDF (quantile function) ndtri(p). Domain (0, 1): ndtri(0) = -inf, ndtri(1) = +inf, ndtri(p<0 || p>1) = NaN. Mirrors torch.special.ndtri (torch/special/__init__.py:649); the implementation ports the Cephes rational from aten/src/ATen/native/cuda/Math.cuh:48-173 (NOT sqrt(2)*erfinv(2p-1)) for ULP parity with torch.
- nested_scaled_dot_product_attention
  Scaled dot-product attention over nested tensors.
- nextafter
  Differentiable element-wise nextafter(a, b): the next representable floating-point value after a in the direction of b. Forward mirrors aten/src/ATen/native/BinaryOps.cpp:551 nextafter_out (CPU kernel std::nextafter). Backward per derivatives.yaml:1322-1324 routes grad to a where a != b (zero on the a == b tie); gradient to b is zero.
- no_grad
  Execute a closure with gradient tracking disabled.
- norm_with_dim
  Differentiable p-norm along a dimension: result = (sum(|x|^p, dim))^(1/p). Mirrors aten/src/ATen/native/ReduceOps.cpp linalg_vector_norm / the Tensor::norm(p, dim, keepdim) overload. Backward per tools/autograd/derivatives.yaml norm.ScalarOpt_dim. Closes #1308.
- normalize_axis
  Normalize a possibly-negative axis index to a positive one.
- ones
  Create a tensor filled with ones.
- ones_like
  Create a tensor of ones with the same shape as other.
- permute_t
  Permute tensor dimensions. Like PyTorch’s tensor.permute(dims).
- prepare_qat
  Prepare a set of named parameters for quantization-aware training.
- quantize
  Quantize a floating-point tensor.
- quantize_named_tensors
  Quantize every weight tensor in a module, returning a name -> QuantizedTensor map suitable for serialization or quantized inference.
- quantized_matmul
  Multiply two quantized 2-D matrices and return a quantized result.
- rand
- Create a tensor with random values uniformly distributed in [0, 1).
- rand_
like - Create a tensor of uniform random values in [0, 1) with the same shape as
other. - rand_
on_ device - Device-aware uniform
[0, 1) random tensor creation. - randn
- Create a tensor with random values from a standard normal distribution.
- randn_
like - Create a random normal tensor with the same shape as
other. - randn_
on_ device - Device-aware standard-normal random tensor creation.
- rearrange
- Rearrange tensor dimensions using an einops-style pattern.
- rearrange_
with - Rearrange with explicit axis sizes for ambiguous splits.
- reduce
- Reduce along axes that appear on the left but not the right.
- repeat
- Repeat tensor elements along new or existing axes.
- repeat_
interleave repeat_interleave(input, repeats, dim): repeat each element repeats times consecutively along dim. - rfft
- 1-D real-to-complex FFT along the last dimension (default
norm). - rfft2
- 2-D real-to-complex FFT (
torch.fft.rfft2). - rfft2_
norm - 2-D real-to-complex FFT with explicit
norm (#1294). - rfft_
differentiable - Differentiable 1-D real FFT (default
dim/norm). Attaches RfftBackward. - rfft_
differentiable_ norm - Differentiable 1-D real FFT with explicit
dim/norm (#1294). - rfft_
norm - 1-D real-to-complex FFT with explicit
dim and norm (#1294). - rfftfreq
- Sample frequencies for
rfft (non-negative half). - rfftn
- N-dimensional real-to-complex FFT.
- rfftn_
differentiable - Differentiable N-D real FFT (default
norm). Attaches RfftnBackward. - rfftn_
differentiable_ norm - Differentiable N-D real FFT with explicit
norm (#1294). Matches torch.fft.rfftn. - rfftn_
norm - N-dimensional real-to-complex FFT with explicit
norm (#1294). - roll
- Roll (circular shift) a tensor along a dimension.
- rot90
rot90(input, k, dims): rotate a tensor by 90° k times in the plane spanned by dims. - scalar
- Create a scalar (0-D) tensor.
- scaled_
modified_ bessel_ k0 - Exponentially scaled modified Bessel function of the second kind, order 0:
scaled_modified_bessel_k0(x) = exp(x) * k0(x). Same domain as modified_bessel_k0; stays finite, behaving like sqrt(pi/(2x)) for large x, where k0 underflows. Mirrors torch.special.scaled_modified_bessel_k0 (torch/special/__init__.py:1304-1341); ports scaled_modified_bessel_k0_forward (aten/src/ATen/native/cuda/Math.cuh:2582-2656). - scaled_
modified_ bessel_ k1 - Exponentially scaled modified Bessel function of the second kind, order 1:
scaled_modified_bessel_k1(x) = exp(x) * k1(x). Same domain as modified_bessel_k1. Mirrors torch.special.scaled_modified_bessel_k1 (torch/special/__init__.py:1321-1358); ports scaled_modified_bessel_k1_forward (aten/src/ATen/native/cuda/Math.cuh:2740-2815). - scan
- Sequential state accumulation (scan / fold with outputs).
- scatter
- Scatter
src values into a clone of input along dim using index. - scatter_
add - Scatter-add
src values into a clone of input along dim. - scatter_
add_ segments - Segmented scatter-add of a
[E, D] source into a [dim_size, D] output, indexed along dim 0 by index[e]. - searchsorted
- Find insertion indices for
values in a sorted 1-D boundaries tensor. - select
- Extract a single slice along
dim at position index, removing the dimension. - set_
autocast_ debug - Enable or disable autocast event recording on this thread.
- set_
grad_ enabled - Programmatically set whether gradients are enabled.
- sigmoid
- Compute
sigmoid(x), attaching a backward node when gradients are enabled. - signbit
- Non-differentiable element-wise
signbit(x). Returns a BoolTensor where each element is true iff the corresponding input has its sign bit set (negative values, including -0.0), matching f32::is_sign_negative / f64::is_sign_negative. Bool output is not differentiable; there is no derivatives.yaml entry. - sin
- Differentiable elementwise sine:
c[i] = sin(x[i]). - sinc
- Normalized sinc function: sinc(x) = sin(pi*x) / (pi*x), with sinc(0) = 1.
- sparse_
matmul_ 24 - Matrix multiply
a @ b where b is stored in 2:4 semi-structured format. The last-dim strides of b’s original dense shape must be a multiple of 4 (guaranteed by SemiStructuredSparseTensor::compress). - sparsity_
ratio - Compute the sparsity ratio of a tensor: fraction of exact zeros.
- spherical_
bessel_ j0 - Spherical Bessel function of the first kind, order 0:
j0(x) = sin(x)/x, with j0(0) = 1 (the Taylor branch) and j0(+/-inf) = 0. j0(NaN) = NaN. Mirrors torch.special.spherical_bessel_j0 (torch/special/__init__.py:1444+); scalar evaluator ports spherical_bessel_j0_forward (aten/src/ATen/native/cuda/Math.cuh:3039-3052): |x| < 0.5 uses the explicit 6-term Taylor series, else sin(x)/x. - split_t
- Split tensor into pieces of given sizes along
dim. - stack
- Stack a slice of tensors along a new dimension
dim. - sum_dim
- Sum along a specific dimension.
- swapaxes
swapaxes(input, axis0, axis1): swap two axes; a literal alias of transpose. - swapdims
swapdims(input, dim0, dim1): swap two dims; a literal alias of transpose. - tanh
- Compute
tanh(x), attaching a backward node when gradients are enabled. - tensor
- Create a 1-D tensor from a slice (shape inferred).
- tensor_
split tensor_split(input, indices, dim): split input at the given integer indices along dim (the indices form section boundaries). - tile
tile(input, reps): NumPy-style tile. - topk
- Return the
k largest elements and their indices along the last dimension. - tril
- Lower triangular part of a tensor with at least 2 dimensions.
- triu
- Upper triangular part of a tensor with at least 2 dimensions.
- unbind
unbind(input, dim): split input into size(dim) slices, removing dim from each. - unflatten
unflatten(input, dim, sizes): reshape the single dimension dim into the dimensions given by sizes, leaving every other dimension untouched. - unique
- Return the sorted unique elements of a 1-D tensor.
- unique_
consecutive - Remove consecutive duplicate elements from a 1-D tensor.
- validate_
cond_ branches - Validate that two sets of outputs have matching shapes.
- view_t
- View tensor with new shape. Like PyTorch’s
tensor.view(shape). - vjp
- Compute the vector-Jacobian product (VJP):
v^T @ J. - vmap
- Vectorize a function over a batch dimension.
- vmap2
- Vectorize a two-argument function over batch dimensions.
- vstack
vstack(tensors): stack tensors row-wise (along dim 0 after promoting each to ≥2-D). - where_
cond - Ternary selection:
output[i] = condition[i] ? x[i] : y[i]. - where_
cond_ bt - Ternary selection taking a
BoolTensor condition: output[i] = cond[i] ? x[i] : y[i]. Mirrors torch.where(cond, x, y). - xlogy
- x * log(y), with the convention that xlogy(0, y) = 0 for any y.
- zeros
- Create a tensor filled with zeros.
- zeros_
like - Create a tensor of zeros with the same shape as
other. - zeta
- Hurwitz zeta function
zeta(x, q) = sum_{k=0}^inf (k + q)^{-x}, element-wise over a broadcast of input (the x exponent) and other (the q shift). Mirrors torch.special.zeta(input, other) (torch/special/__init__.py); scalar evaluator ports the Cephes Hurwitz-zeta kernel from aten/src/ATen/native/cuda/Math.cuh:299-383. Edge ladder: x == 1 -> +inf; x < 1 -> NaN; q <= 0 and an integer -> +inf; q <= 0 non-integer with non-integer x -> NaN. zeta(2, 1) == pi^2/6.
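Several entries in this list state their edge-case conventions only in prose (sinc, xlogy, spherical_bessel_j0). The following is a minimal scalar sketch of those conventions in plain Rust over f64. It does not use ferrotorch and is not the crate's implementation: the function signatures are hypothetical stand-ins, and the Taylor coefficients are the textbook series for sin(x)/x rather than a port of the ATen kernel.

```rust
use std::f64::consts::PI;

/// Normalized sinc: sin(pi*x) / (pi*x), with the removable
/// singularity at x = 0 filled in as 1.
fn sinc(x: f64) -> f64 {
    if x == 0.0 {
        1.0
    } else {
        let t = PI * x;
        t.sin() / t
    }
}

/// x * ln(y), following the listing's convention literally:
/// xlogy(0, y) = 0 for any y. How a NaN `y` interacts with x == 0
/// is not stated in the listing, so this sketch returns 0 there too.
fn xlogy(x: f64, y: f64) -> f64 {
    if x == 0.0 {
        0.0
    } else {
        x * y.ln()
    }
}

/// Spherical Bessel function of the first kind, order 0: sin(x)/x.
/// Near zero a 6-term Taylor series avoids the 0/0 form; the 0.5
/// threshold follows the listing, the coefficients are 1/(2n+1)!.
fn spherical_bessel_j0(x: f64) -> f64 {
    if x.is_infinite() {
        return 0.0;
    }
    if x.abs() < 0.5 {
        let x2 = x * x;
        // Horner form of 1 - x^2/3! + x^4/5! - x^6/7! + x^8/9! - x^10/11!
        return 1.0
            + x2 * (-1.0 / 6.0
                + x2 * (1.0 / 120.0
                    + x2 * (-1.0 / 5040.0
                        + x2 * (1.0 / 362_880.0 + x2 * (-1.0 / 39_916_800.0)))));
    }
    // NaN falls through both guards and propagates as NaN here.
    x.sin() / x
}
```

Calling sinc(0.0), xlogy(0.0, 0.0), and spherical_bessel_j0(0.0) each hits the special-case branch rather than evaluating a 0/0 or 0 * -inf expression, which is the point of the conventions described in the listing.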
Type Aliases§
- Ferrotorch
Result - Convenience alias for ferrotorch results.
- Kernel
- A dispatched kernel: takes the op’s input tensors, the currently-active keyset (after all higher-priority keys have been resolved), and a reference to the dispatcher so the kernel can redispatch to a lower-priority key.
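The redispatch pattern described for Kernel can be illustrated with a toy model. Everything in the sketch below is hypothetical: Key, KeySet, Dispatcher, and this Kernel signature are simplified stand-ins invented for illustration (operating on f32 slices instead of tensors), not ferrotorch's actual types. It shows only the control flow: a higher-priority kernel removes its own key from the keyset and asks the dispatcher to run whatever kernel is next in priority.

```rust
/// Hypothetical dispatch keys; a larger discriminant means higher priority.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Key {
    Cpu = 0,
    Autograd = 1,
}

/// Hypothetical bitset of active keys.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct KeySet(u8);

impl KeySet {
    fn with(self, key: Key) -> KeySet {
        KeySet(self.0 | (1u8 << (key as u8)))
    }
    fn without(self, key: Key) -> KeySet {
        KeySet(self.0 & !(1u8 << (key as u8)))
    }
    /// Highest-priority key still present, if any.
    fn highest(self) -> Option<Key> {
        if self.0 & (1u8 << (Key::Autograd as u8)) != 0 {
            Some(Key::Autograd)
        } else if self.0 & (1u8 << (Key::Cpu as u8)) != 0 {
            Some(Key::Cpu)
        } else {
            None
        }
    }
}

/// Toy kernel signature: inputs, the active keyset, and the dispatcher.
type Kernel = fn(&[f32], KeySet, &Dispatcher) -> Vec<f32>;

/// Toy dispatcher holding one kernel per key for a single op.
struct Dispatcher {
    autograd: Kernel,
    cpu: Kernel,
}

impl Dispatcher {
    /// Run the kernel registered for the highest-priority active key.
    fn call(&self, inputs: &[f32], keys: KeySet) -> Option<Vec<f32>> {
        let kernel = match keys.highest()? {
            Key::Autograd => self.autograd,
            Key::Cpu => self.cpu,
        };
        Some(kernel(inputs, keys, self))
    }
}

/// Backend kernel: does the actual work (here, doubling each element).
fn cpu_double(inputs: &[f32], _keys: KeySet, _d: &Dispatcher) -> Vec<f32> {
    inputs.iter().map(|x| x * 2.0).collect()
}

/// Wrapper kernel: a real one would record a backward node here; this toy
/// only strips its own key and redispatches to the next key down.
fn autograd_wrapper(inputs: &[f32], keys: KeySet, d: &Dispatcher) -> Vec<f32> {
    d.call(inputs, keys.without(Key::Autograd))
        .expect("a lower-priority kernel must be registered")
}
```

With both keys active, Dispatcher::call enters autograd_wrapper first, which redispatches to cpu_double; with an empty keyset it returns None because no kernel applies.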