ferrotorch — PyTorch-shaped deep learning framework in Rust.
This is the umbrella re-export crate. Sub-crates own the actual
implementation; this crate exists so users can write `use ferrotorch::*;` (or
`use ferrotorch::prelude::*;`) and pick up the canonical public surface
in one import.
Examples§

```rust
use ferrotorch::{FerrotorchResult, zeros};

fn main() -> FerrotorchResult<()> {
    let t = zeros::<f32>(&[2, 3])?;
    assert_eq!(t.shape(), &[2, 3]);
    Ok(())
}
```

See the `prelude` module for the items most users want, and the per-feature
modules (`nn`, `optim`, `data`, `vision`, `train`, `serialize`, `jit`,
`jit_script`, `distributions`, `profiler`, `hub`, `tokenize`, `gpu`,
`cubecl`, `mps`, `xpu`, `distributed`, `llama`, `ml`) for sub-crate access.
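The same program also goes through the prelude. A sketch assuming `zeros` and
`FerrotorchResult` are among the prelude's re-exports (the prelude summary
below says "import everything commonly needed" but does not enumerate its
items):

```rust
// Assumes the prelude re-exports `zeros` and `FerrotorchResult`;
// fall back to the explicit imports above if it does not.
use ferrotorch::prelude::*;

fn main() -> FerrotorchResult<()> {
    let t = zeros::<f32>(&[2, 3])?;
    assert_eq!(t.shape(), &[2, 3]);
    Ok(())
}
```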
The lint baseline mirrors the per-crate convention used across the workspace
(ferrotorch-core, ferrotorch-jit, ferrotorch-cubecl, etc.). Workspace
`[lints]` is intentionally not used — every crate carries its own
`#![warn(...)]`/`#![deny(...)]` so the policy lives next to the code it governs.
Modules§
- autograd
- bool_tensor - Boolean tensors for masks and logical operations. (#596)
- complex_tensor - `ComplexTensor<T>` — first-class complex-valued tensors. (#618)
- cpu_pool - CPU tensor buffer pool — caching allocator for host memory.
- creation
- data - Data loading, datasets, samplers, and transforms.
- device
- dispatch - Multi-dispatch key system for composable tensor backends. CL-397.
- distributions - Probability distributions for sampling and variational inference.
- dtype
- einops - Einops-style tensor rearrangement operations.
- einsum - Einstein summation (`einsum`) for ferrotorch tensors.
- error
- fft - FFT operations for tensors.
- flex_attention - Flexible attention with customizable score modification.
- gpu_dispatch - GPU backend dispatch layer.
- grad_fns
- hub - Model hub for downloading and caching pretrained models.
- int_tensor - Integer-typed tensors for indexing, embedding lookups, and any other workload that needs first-class non-float storage. (#596)
- jit - JIT tracing, IR graph, optimization passes, and code generation.
- jit_script - `#[script]` proc macro for source-based graph capture.
- linalg - Advanced linear algebra operations bridging to ferray-linalg.
- masked - Masked tensors — `torch.masked.MaskedTensor` analog.
- meta_propagate - Helpers for propagating the meta device through tensor operations.
- named_tensor - `NamedTensor<T>` — dim-name annotations on top of `Tensor<T>`. (#621)
- nested
- nn - Neural network modules and layers.
- numeric_cast - Fallible numeric conversions used across the workspace.
- ops
- optim - Optimizers and learning rate schedulers.
- prelude - Prelude module — import everything commonly needed.
- profiler - Performance profiling and Chrome trace export.
- profiler_hook - Thread-local profiler hook for auto-instrumented tensor ops.
- pruning
- quantize - Post-training quantization (PTQ) for ferrotorch tensors.
- serialize - Model serialization: ONNX export, PyTorch import, safetensors, GGUF.
- shape
- signal - Signal-processing utilities.
- sparse
- special - Special mathematical functions (`torch.special` equivalent).
- storage
- stride_tricks - `as_strided` family — direct stride manipulation on tensors.
- tensor
- tokenize - HuggingFace tokenizer wrapper (BPE, WordPiece, Unigram).
- train - Training loop, Learner, callbacks, and metrics.
- vision - Computer vision models, datasets, and transforms.
- vmap - Vectorized map (vmap) — apply a function over a batch dimension.
Structs§
- AnomalyMode - Global anomaly detection mode.
- AsStridedBackward - VJP for `as_strided(input, size, stride, offset)`.
- BoolTensor - CPU-resident tensor of booleans. Shape is metadata; storage is a flat `Arc<Vec<bool>>` for cheap clones.
- ComplexTensor - CPU-resident, contiguous, structure-of-arrays complex tensor.
- CooTensor - A 2-D sparse tensor in COO (Coordinate List) format with separate row and column index arrays.
- CscTensor - A 2-D sparse tensor in CSC (Compressed Sparse Column) format. Dual of `CsrTensor`: instead of storing row pointers + column indices, stores column pointers (`col_ptrs`, length `ncols + 1`) and row indices for each non-zero. Efficient for column slicing and `A^T x` style ops.
- CsrTensor - A 2-D sparse tensor in CSR (Compressed Sparse Row) format.
- CumExtremeResult - Result of `cummax`/`cummin`: values tensor and indices tensor.
- DispatchKeySet - A set of active `DispatchKey`s, stored as a `u16` bitmask for constant-time membership testing and iteration.
- Dispatcher - A kernel registration table keyed by `(op_name, dispatch_key)`. Looking up a kernel is a single HashMap probe.
- DualTensor - A dual-number tensor: `primal + epsilon * tangent` (see the scalar sketch after this list).
- FakeQuantize - Simulates quantization during training by quantizing and immediately dequantizing values, while allowing gradients to flow through via the straight-through estimator (STE).
- ForwardBacktrace - A captured forward-pass backtrace, stored on tensors when anomaly mode is on.
- HistogramObserver - Histogram-based observer that collects a distribution of values.
- HookHandle - An opaque handle returned by `register_hook`/`register_post_accumulate_grad_hook`.
- IntTensor - CPU-resident, contiguous tensor of integers. `Arc<Vec<I>>` storage so clones are cheap and shape views are trivial.
- MaskedTensor - A tensor paired with a boolean mask.
- MinMaxObserver - Tracks the running min/max of observed values.
- NamedTensor - A `Tensor<T>` paired with one optional dim name per axis.
- NestedTensor - A nested (ragged) tensor — a collection of tensors with differing sizes along one dimension (the “ragged” dimension).
- PackedNestedTensor - A nested (jagged) tensor stored as one contiguous flat buffer with an offsets array marking the start of each component.
- PerChannelMinMaxObserver - Tracks per-channel running min/max of observed values.
- QParams - Computed quantization parameters (scale and zero_point).
- QatLayer - A layer with associated FakeQuantize modules for QAT.
- QatModel - Wraps a collection of named weight tensors for quantization-aware training.
- QuantizedTensor - A tensor stored in quantized (integer) representation.
- SemiStructuredSparseTensor - A tensor stored in the NVIDIA 2:4 structured sparsity format.
- SparseGrad - A sparse gradient: a list of (index, value) pairs that an optimizer applies to a dense parameter tensor. Mirrors the coalesced form of `torch.Tensor.is_sparse` gradients used by `nn.Embedding(sparse=True)` and consumed by `optim.SparseAdam`/`optim.SGD`.
- SparseTensor - A sparse tensor in COO (Coordinate List) format.
- Tensor - The central type. A dynamically-shaped tensor with gradient tracking and device placement.
- TensorId - A unique, monotonically increasing tensor identifier.
- TensorStorage - The underlying data buffer for a tensor, tagged with its device.
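The `DualTensor` entry above is the forward-mode AD building block: carry a tangent alongside every primal and push both through each op. A minimal scalar sketch of the idea (the `Dual` type here is illustrative, not ferrotorch's; the multiplication rule is the one `dual_mul` documents below):

```rust
/// Scalar stand-in for the `primal + epsilon * tangent` pairing that
/// `DualTensor` applies tensor-wide. Illustrative only.
#[derive(Clone, Copy)]
struct Dual {
    primal: f64,
    tangent: f64,
}

/// Forward rule for multiplication: d(a * b) = a * db + da * b.
fn dual_mul(a: Dual, b: Dual) -> Dual {
    Dual {
        primal: a.primal * b.primal,
        tangent: a.primal * b.tangent + a.tangent * b.primal,
    }
}

fn main() {
    // Differentiate f(x) = x * x at x = 3 by seeding tangent = 1.
    let x = Dual { primal: 3.0, tangent: 1.0 };
    let y = dual_mul(x, x);
    assert_eq!(y.tangent, 6.0); // f'(3) = 2 * 3
}
```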
Enums§
- AutocastCategory - Policy: which operations should be cast to reduced precision.
- AutocastDtype - The reduced-precision dtype used during autocast regions.
- DType - Runtime descriptor for the element type stored in an array.
- Device - Device on which a tensor’s data resides.
- DispatchKey - One of the 16 possible dispatch keys, ordered from lowest to highest priority. The `u8` repr matches the bit position in `DispatchKeySet`’s internal `u16` bitmask, so the priority ordering is both the enum declaration order and the numeric order of the discriminants (see the sketch after this list).
- EinopsReduction - Reduction operation for `reduce`.
- FerrotorchError - Errors produced by ferrotorch operations.
- GeluApproximate - Selects the GELU approximation method.
- MemoryFormat - Describes the physical memory layout of a tensor.
- QuantDtype - Target integer dtype for quantized storage.
- QuantScheme - Granularity of quantization parameters (scale / zero_point).
- StorageBuffer - Device-specific data buffer.
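The `DispatchKey`/`DispatchKeySet` contract above (u8 discriminant = bit position in a `u16` bitmask, priority = numeric order) is compact enough to sketch standalone. `DemoKey` and `DemoKeySet` below are hypothetical names with made-up discriminants, not ferrotorch's 16 actual keys:

```rust
// Standalone sketch of the u16-bitmask keyset described above.
// Discriminants are illustrative; ferrotorch defines 16 real keys.
#[repr(u8)]
enum DemoKey {
    Cpu = 0,      // lowest priority
    Autograd = 1, // higher priority
}

#[derive(Clone, Copy, Default)]
struct DemoKeySet(u16);

impl DemoKeySet {
    fn insert(&mut self, k: DemoKey) {
        self.0 |= 1 << (k as u8); // the u8 repr is the bit position
    }
    fn contains(self, k: DemoKey) -> bool {
        self.0 & (1 << (k as u8)) != 0 // constant-time membership test
    }
    // Highest-priority active key = most significant set bit.
    fn highest(self) -> Option<u8> {
        (self.0 != 0).then(|| 15 - self.0.leading_zeros() as u8)
    }
}

fn main() {
    let mut keys = DemoKeySet::default();
    keys.insert(DemoKey::Cpu);
    keys.insert(DemoKey::Autograd);
    assert!(keys.contains(DemoKey::Autograd));
    assert_eq!(keys.highest(), Some(DemoKey::Autograd as u8));
}
```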
Traits§
- Element - Trait bound for types that can be stored in a ferray array.
- Float - Marker trait for float element types that support autograd.
- GradFn - The backward function trait for reverse-mode automatic differentiation.
- IntElement - Element types supported by `IntTensor`.
- Observer - Trait for quantization observers that collect data statistics (see the sketch after this list).
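`Observer` implementations feed the `QParams` struct listed above: collect min/max (or histogram) statistics, then solve for an affine scale and zero_point. A sketch of that arithmetic under the usual u8 affine scheme; `demo_qparams` is a hypothetical helper, not ferrotorch API, and the qmin/qmax choice is an assumption:

```rust
// Affine quantization: q = round(x / scale) + zero_point, clamped to [0, 255].
// Given observed [min, max], choose scale/zero_point so the range fits and
// real 0.0 lands exactly on an integer (standard PTQ practice).
fn demo_qparams(min: f32, max: f32) -> (f32, i32) {
    let (qmin, qmax) = (0i32, 255i32); // u8 target range
    let scale = (max - min) / (qmax - qmin) as f32;
    let zero_point = (qmin as f32 - min / scale).round() as i32;
    (scale, zero_point.clamp(qmin, qmax))
}

fn main() {
    // An observer that tracked values in [-1.0, 1.55]:
    let (scale, zp) = demo_qparams(-1.0, 1.55);
    assert!((scale - 0.01).abs() < 1e-6);
    assert_eq!(zp, 100); // real 0.0 quantizes to 100
}
```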
Functions§
- apply_2_4_mask - Apply a 2:4 structured sparsity mask.
- arange - Create a 1-D tensor with values from `start` to `end` (exclusive) with step `step`.
- as_strided - Zero-copy strided view; see `Tensor::as_strided` for full docs.
- as_strided_copy - Materialised strided copy; see `Tensor::as_strided_copy` for full docs.
- as_strided_scatter - Inverse of `as_strided`; see `Tensor::as_strided_scatter` for full docs.
- autocast - Execute a closure with mixed-precision autocast enabled.
- autocast_dtype - Returns the target dtype for autocast regions on this thread.
- autocast_guard - Primary entry point for op implementations to query autocast policy.
- backward - Compute gradients of all leaf tensors that contribute to `root`.
- backward_with_grad - Run a backward pass through the computation graph.
- broadcast_shapes - Compute the broadcasted shape of two shapes, following NumPy/PyTorch rules (see the sketch after this list).
- bucketize - Discretize `input` values into buckets defined by `boundaries`.
- cat - Concatenate tensors along an axis.
- cdist - Pairwise distance matrix between two sets of vectors.
- check_gradient_anomaly - Check a gradient tensor for NaN or Inf values (anomaly check).
- chunk_t - Split a tensor into `chunks` roughly equal pieces along `dim`.
- clamp - Differentiable elementwise clamp: `c[i] = x[i].clamp(min, max)`.
- cond - Conditional subgraph execution.
- contiguous_t - Make a tensor contiguous (copy data if needed).
- cos - Differentiable elementwise cosine: `c[i] = cos(x[i])`.
- cummax - Cumulative maximum along `dim`.
- cummin - Cumulative minimum along `dim`.
- cumprod - Differentiable cumulative product along `dim`.
- cumsum - Differentiable cumulative sum along `dim`.
- dequantize - Dequantize back to a floating-point tensor.
- detect_anomaly - Execute a closure with anomaly detection enabled.
- diag - Extract the diagonal of a 2-D tensor, or construct a 2-D diagonal matrix from a 1-D tensor.
- diagflat - Construct a diagonal matrix from a 1-D tensor (flattened if needed).
- digamma - Digamma function: psi(x) = d/dx ln(Gamma(x)).
- dual_add - Forward rule for addition: `d(a + b) = da + db`.
- dual_cos - Forward rule for cos: `d(cos(a)) = -da * sin(a)`.
- dual_div - Forward rule for division: `d(a / b) = (da * b - a * db) / b^2`.
- dual_exp - Forward rule for exp: `d(exp(a)) = da * exp(a)`.
- dual_log - Forward rule for log: `d(log(a)) = da / a`.
- dual_matmul - Forward rule for matrix multiplication: `d(A @ B) = dA @ B + A @ dB`.
- dual_mul - Forward rule for multiplication: `d(a * b) = a * db + da * b`.
- dual_neg - Forward rule for negation: `d(-a) = -da`.
- dual_relu - Forward rule for ReLU: `d(relu(a)) = da * (a > 0)`.
- dual_sigmoid - Forward rule for sigmoid: `d(sigmoid(a)) = da * sigmoid(a) * (1 - sigmoid(a))`.
- dual_sin - Forward rule for sin: `d(sin(a)) = da * cos(a)`.
- dual_sub - Forward rule for subtraction: `d(a - b) = da - db`.
- dual_tanh - Forward rule for tanh: `d(tanh(a)) = da * (1 - tanh(a)^2)`.
- einsum - Einstein summation.
- einsum_differentiable - Differentiable Einstein summation. If any input requires grad and grad is enabled, attaches `EinsumBackward`.
- enable_grad - Re-enable gradient computation inside a `no_grad` block.
- erf - Error function: erf(x) = (2/sqrt(pi)) * integral(0, x, exp(-t^2) dt).
- erfc - Complementary error function: erfc(x) = 1 - erf(x).
- erfinv - Inverse error function: erfinv(erf(x)) = x.
- exp - Differentiable elementwise exponential: `c[i] = exp(x[i])`.
- expand - Broadcast (expand) a tensor to `new_shape`.
- expm1 - exp(x) - 1, numerically stable for small x.
- eye - Create an identity matrix of size `n x n`.
- fake_quantize_differentiable - Differentiable fake quantize per-tensor (affine).
- fft - 1-D complex-to-complex FFT along the last dimension.
- fft2 - 2-D FFT (complex-to-complex) along the last two spatial dimensions.
- fft_differentiable - Differentiable 1-D FFT. Attaches `FftBackward` when grad is needed.
- fftfreq - Discrete Fourier Transform sample frequencies.
- fftn - N-dimensional complex-to-complex FFT.
- fftshift - Shift the zero-frequency component to the center along the given axes.
- fixed_point - Find a fixed point of `f` starting from `x0`, then compute its derivative w.r.t. `params` using the implicit function theorem.
- flex_attention - Compute flexible multi-head attention with an optional score modification function.
- from_slice - Create a tensor from a slice, copying the data.
- from_vec - Create a tensor from a `Vec<T>`, taking ownership.
- full - Create a tensor filled with a given value.
- full_like - Create a tensor filled with `value` with the same shape as `other`.
- gather - Gather values from `input` along `dim` using `index`.
- gelu - Compute `gelu(x)` with the default exact (erf-based) formulation.
- gelu_with - Compute `gelu(x)` with a configurable approximation, attaching a backward node when gradients are enabled.
- grad - Compute gradients of `outputs` with respect to `inputs`.
- grad_norm - Compute the L2 norm of gradients of `outputs` with respect to `inputs`.
- gradient_penalty - Compute the gradient penalty for WGAN-GP.
- hessian - Compute the Hessian matrix of a scalar function at a point.
- hfft - 1-D FFT of a Hermitian-symmetric complex spectrum, returning real output.
- histc - Histogram — count elements in equal-width bins.
- ifft - 1-D inverse FFT along the last dimension.
- ifft2 - 2-D inverse FFT (complex-to-complex) along the last two spatial dimensions.
- ifft_differentiable - Differentiable 1-D inverse FFT. Attaches `IfftBackward` when grad is needed.
- ifftn - N-dimensional inverse complex FFT.
- ifftshift - Inverse of `fftshift`.
- ihfft - 1-D inverse FFT of a real signal, returning a Hermitian-symmetric spectrum.
- irfft - 1-D complex-to-real inverse FFT.
- irfft_differentiable - Differentiable 1-D inverse real FFT. Attaches `IrfftBackward` when grad is needed.
- irfftn - N-dimensional complex-to-real inverse FFT.
- is_autocast_debug - Returns `true` if autocast debug event recording is active on this thread.
- is_autocast_enabled - Returns `true` if mixed-precision autocast is currently enabled on this thread.
- is_grad_enabled - Returns `true` if gradient tracking is currently enabled on this thread.
- jacfwd - Compute the full Jacobian matrix using forward-mode AD.
- jacobian - Compute the Jacobian matrix of a function at a point.
- jvp - Compute the Jacobian-vector product (JVP): `J @ v`.
- jvp_exact - Compute the exact Jacobian-vector product using forward-mode AD.
- lgamma - Log-gamma function: lgamma(x) = log(|Gamma(x)|).
- linspace - Create a 1-D tensor of `num` evenly spaced values from `start` to `end` (inclusive).
- log - Differentiable elementwise natural log: `c[i] = ln(x[i])`.
- log1p - log(1 + x), numerically stable for small x.
- logcumsumexp - Differentiable log-cumulative-sum-exp along `dim`.
- magnitude_prune - Unstructured magnitude pruning: zero out the smallest weights.
- masked_count - Number of valid (unmasked) entries; returns a 0-d tensor in `T`.
- masked_equal - Mask out entries equal to `value`. Matches `numpy.ma.masked_equal`.
- masked_invalid - Mask out non-finite entries (NaN, ±∞). Matches `numpy.ma.masked_invalid`.
- masked_max - Max of valid entries; returns a 0-d tensor (NaN if all masked).
- masked_mean - Mean of valid entries; returns a 0-d tensor.
- masked_min - Min of valid entries; returns a 0-d tensor (NaN if all masked).
- masked_sum - Sum of valid entries; returns a 0-d tensor.
- masked_where - Wrap `data` with `condition` interpreted as “where condition is true, mask the value out”. Matches `numpy.ma.masked_where`. The resulting `MaskedTensor` has `mask = !condition` under the torch convention.
- mean_dim - Mean along a specific dimension.
- meshgrid - Create coordinate grids from 1-D coordinate vectors.
- nested_scaled_dot_product_attention - Scaled dot-product attention over nested tensors.
- no_grad - Execute a closure with gradient tracking disabled.
- normalize_axis - Normalize a possibly-negative axis index to a positive one.
- ones - Create a tensor filled with ones.
- ones_like - Create a tensor of ones with the same shape as `other`.
- permute_t - Permute tensor dimensions. Like PyTorch’s `tensor.permute(dims)`.
- prepare_qat - Prepare a set of named parameters for quantization-aware training.
- quantize - Quantize a floating-point tensor.
- quantize_named_tensors - Quantize every weight tensor in a module, returning a name -> `QuantizedTensor` map suitable for serialization or quantized inference.
- quantized_matmul - Multiply two quantized 2-D matrices and return a quantized result.
- rand - Create a tensor with random values uniformly distributed in [0, 1).
- rand_like - Create a random tensor in [0, 1) with the same shape as `other`.
- randn - Create a tensor with random values from a standard normal distribution.
- randn_like - Create a random normal tensor with the same shape as `other`.
- rearrange - Rearrange tensor dimensions using an einops-style pattern.
- rearrange_with - Rearrange with explicit axis sizes for ambiguous splits.
- reduce - Reduce along axes that appear on the left but not the right.
- repeat - Repeat tensor elements along new or existing axes.
- rfft - 1-D real-to-complex FFT along the last dimension.
- rfft_differentiable - Differentiable 1-D real FFT. Attaches `RfftBackward` when grad is needed.
- rfftfreq - Sample frequencies for `rfft` (non-negative half).
- rfftn - N-dimensional real-to-complex FFT.
- roll - Roll (circular shift) a tensor along a dimension.
- scalar - Create a scalar (0-D) tensor.
- scan - Sequential state accumulation (scan / fold with outputs).
- scatter - Scatter `src` values into a clone of `input` along `dim` using `index`.
- scatter_add - Scatter-add `src` values into a clone of `input` along `dim`.
- scatter_add_segments - Segmented scatter-add of an `[E, D]` source into a `[dim_size, D]` output, indexed along dim 0 by `index[e]`.
- searchsorted - Find insertion indices for `values` in a sorted 1-D `boundaries` tensor.
- select - Extract a single slice along `dim` at position `index`, removing the dimension.
- set_autocast_debug - Enable or disable autocast event recording on this thread.
- set_grad_enabled - Programmatically set whether gradients are enabled.
- sigmoid - Compute `sigmoid(x)`, attaching a backward node when gradients are enabled.
- sin - Differentiable elementwise sine: `c[i] = sin(x[i])`.
- sinc - Normalized sinc function: sinc(x) = sin(πx) / (πx), with sinc(0) = 1.
- sparse_matmul_24 - Matrix multiply `a @ b` where `b` is stored in 2:4 semi-structured format. The last-dim stride of `b`’s original dense shape must be a multiple of 4 (guaranteed by `SemiStructuredSparseTensor::compress`).
- sparsity_ratio - Compute the sparsity ratio of a tensor: the fraction of exact zeros.
- split_t - Split a tensor into pieces of given sizes along `dim`.
- stack - Stack a slice of tensors along a new dimension `dim`.
- sum_dim - Sum along a specific dimension.
- tanh - Compute `tanh(x)`, attaching a backward node when gradients are enabled.
- tensor - Create a 1-D tensor from a slice (shape inferred).
- topk - Return the `k` largest elements and their indices along the last dimension.
- tril - Lower triangular part of a 2-D tensor.
- triu - Upper triangular part of a 2-D tensor.
- unique - Return the sorted unique elements of a 1-D tensor.
- unique_consecutive - Remove consecutive duplicate elements from a 1-D tensor.
- validate_cond_branches - Validate that two sets of outputs have matching shapes.
- view_t - View a tensor with a new shape. Like PyTorch’s `tensor.view(shape)`.
- vjp - Compute the vector-Jacobian product (VJP): `v^T @ J`.
- vmap - Vectorize a function over a batch dimension.
- vmap2 - Vectorize a two-argument function over batch dimensions.
- where_cond - Ternary selection: `output[i] = condition[i] ? x[i] : y[i]`.
- xlogy - x * log(y), with the convention that xlogy(0, y) = 0 for any y.
- zeros - Create a tensor filled with zeros.
- zeros_like - Create a tensor of zeros with the same shape as `other`.
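Of the shape utilities above, `broadcast_shapes` is the one whose rule rewards a worked example: align the two shapes at their trailing dimensions, require each pair to be equal or contain a 1, and take the maximum. A standalone sketch of that NumPy/PyTorch rule (`demo_broadcast` is a hypothetical name, independent of ferrotorch's real signature):

```rust
// NumPy/PyTorch broadcasting: align from the right, missing dims count
// as 1, each pair must match or one side must be 1.
fn demo_broadcast(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let n = a.len().max(b.len());
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // Dim i of the padded shapes (1 where the original shape is too short).
        let da = if i < n - a.len() { 1 } else { a[i - (n - a.len())] };
        let db = if i < n - b.len() { 1 } else { b[i - (n - b.len())] };
        match (da, db) {
            (x, y) if x == y => out.push(x),
            (1, y) => out.push(y),
            (x, 1) => out.push(x),
            _ => return None, // incompatible shapes
        }
    }
    Some(out)
}

fn main() {
    assert_eq!(demo_broadcast(&[2, 3], &[3]), Some(vec![2, 3]));
    assert_eq!(demo_broadcast(&[4, 1], &[1, 5]), Some(vec![4, 5]));
    assert_eq!(demo_broadcast(&[2], &[3]), None); // 2 vs 3: error
}
```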
Type Aliases§
- FerrotorchResult - Convenience alias for ferrotorch results.
- Kernel - A dispatched kernel: takes the op’s input tensors, the currently-active keyset (after all higher-priority keys have been resolved), and a reference to the dispatcher so the kernel can redispatch to a lower-priority key (see the sketch below).
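The `Kernel` alias above encodes the redispatch protocol: a kernel handles its own key, strips it from the keyset, and hands the remainder back to the dispatcher so a lower-priority kernel (typically a device backend) runs next. A self-contained sketch of that round trip; every name is hypothetical, and a bare `f32` stands in for the op's tensors:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the Dispatcher/Kernel pair described above.
type DemoKernel = fn(f32, u16, &DemoDispatcher) -> f32;

struct DemoDispatcher {
    // Keyed by (op_name, dispatch_key): lookup is a single HashMap probe.
    table: HashMap<(&'static str, u8), DemoKernel>,
}

impl DemoDispatcher {
    fn call(&self, op: &'static str, x: f32, keys: u16) -> f32 {
        // Resolve the highest-priority active key (most significant set
        // bit); `keys` must be non-empty here.
        let key = 15 - keys.leading_zeros() as u8;
        self.table[&(op, key)](x, keys, self)
    }
}

// "Autograd" kernel: do this key's bookkeeping, strip the key, redispatch.
fn autograd_double(x: f32, keys: u16, disp: &DemoDispatcher) -> f32 {
    // (real code would record a backward-graph node here)
    disp.call("double", x, keys & !(1 << 1))
}

// "CPU" backend kernel: actually computes the result.
fn cpu_double(x: f32, _keys: u16, _disp: &DemoDispatcher) -> f32 {
    2.0 * x
}

fn main() {
    let mut table: HashMap<(&'static str, u8), DemoKernel> = HashMap::new();
    table.insert(("double", 1), autograd_double); // bit 1: autograd (higher)
    table.insert(("double", 0), cpu_double);      // bit 0: cpu (lower)
    let disp = DemoDispatcher { table };

    // Both keys active: autograd resolves first, then redispatches to cpu.
    assert_eq!(disp.call("double", 3.0, 0b11), 6.0);
}
```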