Crate ferrotorch


ferrotorch — PyTorch-shaped deep learning framework in Rust.

This crate is the umbrella re-export crate. The sub-crates own the actual implementation; this crate exists so that users can write use ferrotorch::*; (or use ferrotorch::prelude::*;) and pick up the canonical public surface with a single import.
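
The umbrella re-export pattern can be sketched with inline modules standing in for sub-crates. This is a minimal illustration, not ferrotorch's source: the module names core_impl and nn_impl and the items numel and Linear are hypothetical placeholders. Only the pub use layering (root re-exports, a named module per sub-crate, and a prelude) is the point.

```rust
// Hypothetical stand-ins for sub-crates; in a real workspace these would be
// separate crates listed as dependencies of the umbrella crate.
mod core_impl {
    /// Hypothetical: number of elements a tensor of `shape` would hold.
    pub fn numel(shape: &[usize]) -> usize {
        shape.iter().product()
    }
}

mod nn_impl {
    /// Hypothetical layer type owned by the "nn" sub-crate.
    pub struct Linear {
        pub in_features: usize,
        pub out_features: usize,
    }
}

// The umbrella crate re-exports the core surface at its root...
pub use core_impl::numel;

// ...exposes each sub-crate under a named module...
pub mod nn {
    pub use crate::nn_impl::*;
}

// ...and gathers the most commonly used items into a prelude.
pub mod prelude {
    pub use crate::core_impl::numel;
    pub use crate::nn_impl::Linear;
}
```

With this layout, a single glob import of the prelude brings both numel and Linear into scope, while the nn module keeps the full sub-crate surface reachable by path.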

Examples

use ferrotorch::{FerrotorchResult, zeros};

fn main() -> FerrotorchResult<()> {
    let t = zeros::<f32>(&[2, 3])?;
    assert_eq!(t.shape(), &[2, 3]);
    Ok(())
}

See the prelude module for the items most users want, and the per-feature modules (nn, optim, data, vision, train, serialize, jit, jit_script, distributions, profiler, hub, tokenize, gpu, cubecl, mps, xpu, distributed, llama, ml) for sub-crate access.

The lint baseline mirrors the per-crate convention used across the workspace (ferrotorch-core, ferrotorch-jit, ferrotorch-cubecl, etc.). The workspace-level [lints] table is intentionally not used: every crate carries its own #![warn(...)] / #![deny(...)] attributes, so the policy lives next to the code it governs.

Modules

autograd
bool_tensor: Boolean tensors for masks and logical operations. (#596)
complex_tensor: ComplexTensor<T> — first-class complex-valued tensors. (#618)
cpu_pool: CPU tensor buffer pool — caching allocator for host memory.
creation
data: Data loading, datasets, samplers, and transforms.
device
dispatch: Multi-dispatch key system for composable tensor backends. CL-397.
distributions: Probability distributions for sampling and variational inference.
dtype
einops: Einops-style tensor rearrangement operations.
einsum: Einstein summation (einsum) for ferrotorch tensors.
error
fft: FFT operations for tensors.
flex_attention: Flexible attention with customizable score modification.
gpu_dispatch: GPU backend dispatch layer.
grad_fns
hub: Model hub for downloading and caching pretrained models.
int_tensor: Integer-typed tensors for indexing, embedding lookups, and any other workload that needs first-class non-float storage. (#596)
jit: JIT tracing, IR graph, optimization passes, and code generation.
jit_script: #[script] proc macro for source-based graph capture.
linalg: Advanced linear algebra operations bridging to ferray-linalg.
masked: Masked tensors — torch.masked.MaskedTensor analog.
meta_propagate: Helpers for propagating the meta device through tensor operations.
named_tensor: NamedTensor<T> — dim-name annotations on top of Tensor<T>. (#621)
nested
nn: Neural network modules and layers.
numeric_cast: Fallible numeric conversions used across the workspace.
ops
optim: Optimizers and learning rate schedulers.
prelude: Prelude module — import everything commonly needed.
profiler: Performance profiling and Chrome trace export.
profiler_hook: Thread-local profiler hook for auto-instrumented tensor ops.
pruning
quantize: Post-training quantization (PTQ) for ferrotorch tensors.
serialize: Model serialization: ONNX export, PyTorch import, safetensors, GGUF.
shape
signal: Signal-processing utilities.
sparse
special: Special mathematical functions (torch.special equivalent).
storage
stride_tricks: as_strided family — direct stride manipulation on tensors.
tensor
tokenize: HuggingFace tokenizer wrapper (BPE, WordPiece, Unigram).
train: Training loop, Learner, callbacks, and metrics.
vision: Computer vision models, datasets, and transforms.
vmap: Vectorized map (vmap) — apply a function over a batch dimension.
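
The stride arithmetic behind the stride_tricks (as_strided) family can be sketched on a plain slice: element (i_0, ..., i_{n-1}) of the view reads storage position offset + sum_k i_k * stride_k. The function strided_copy below is a hypothetical name, and it copies rather than views, which makes it closer to as_strided_copy. It assumes strides are counted in elements, as in PyTorch, and reports an out-of-range read as None; how the real functions report that case is not covered here.

```rust
/// Materialise a strided view of `storage`, assuming element-unit strides.
/// Returns None if any addressed element falls outside `storage`.
fn strided_copy(
    storage: &[f32],
    size: &[usize],
    stride: &[usize],
    offset: usize,
) -> Option<Vec<f32>> {
    assert_eq!(size.len(), stride.len(), "size and stride must have equal rank");
    let numel: usize = size.iter().product();
    let mut out = Vec::with_capacity(numel);
    // Odometer-style multi-index over the logical shape.
    let mut idx = vec![0usize; size.len()];
    for _ in 0..numel {
        // Storage position = offset + sum_k idx[k] * stride[k].
        let pos = offset + idx.iter().zip(stride).map(|(i, s)| i * s).sum::<usize>();
        out.push(*storage.get(pos)?);
        // Advance the multi-index, last dimension fastest.
        for d in (0..size.len()).rev() {
            idx[d] += 1;
            if idx[d] < size[d] {
                break;
            }
            idx[d] = 0;
        }
    }
    Some(out)
}
```

Over a contiguous 2x3 buffer, size [3, 2] with stride [1, 3] reads the data as its transpose, and overlapping strides such as [1, 1] produce sliding windows without any data movement in a true view.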

Structs

AnomalyMode: Global anomaly detection mode.
AsStridedBackward: VJP for as_strided(input, size, stride, offset).
BoolTensor: CPU-resident tensor of booleans. Shape is metadata; storage is a flat Arc<Vec<bool>> for cheap clones.
ComplexTensor: CPU-resident, contiguous, structure-of-arrays complex tensor.
CooTensor: A 2-D sparse tensor in COO (Coordinate List) format with separate row and column index arrays.
CscTensor: 2-D sparse tensor in CSC (Compressed Sparse Column) format. Dual of CsrTensor: instead of storing row pointers + column indices, stores column pointers (col_ptrs, length ncols + 1) and row indices for each non-zero. Efficient for column slicing and A^T x style ops.
CsrTensor: A 2-D sparse tensor in CSR (Compressed Sparse Row) format.
CumExtremeResult: Result of cummax / cummin: values tensor and indices tensor.
DispatchKeySet: A set of active DispatchKeys, stored as a u16 bitmask for constant-time membership testing and iteration.
Dispatcher: A kernel registration table keyed by (op_name, dispatch_key). Looking up a kernel is a single HashMap probe.
DualTensor: A dual-number tensor: primal + epsilon * tangent.
FakeQuantize: Simulates quantization during training by quantizing and immediately dequantizing values, while allowing gradients to flow through via the straight-through estimator (STE).
ForwardBacktrace: A captured forward-pass backtrace, stored on tensors when anomaly mode is on.
HistogramObserver: Histogram-based observer that collects a distribution of values.
HookHandle: An opaque handle returned by register_hook / register_post_accumulate_grad_hook.
IntTensor: CPU-resident, contiguous tensor of integers. Arc<Vec<I>> storage so clones are cheap and shape views are trivial.
MaskedTensor: A tensor paired with a boolean mask.
MinMaxObserver: Tracks the running min/max of observed values.
NamedTensor: A Tensor<T> paired with one optional dim name per axis.
NestedTensor: A nested (ragged) tensor — a collection of tensors with differing sizes along one dimension (the “ragged” dimension).
PackedNestedTensor: A nested (jagged) tensor stored as one contiguous flat buffer with an offsets array marking the start of each component.
PerChannelMinMaxObserver: Tracks per-channel running min/max of observed values.
QParams: Computed quantization parameters (scale and zero_point).
QatLayer: A layer with associated FakeQuantize modules for QAT.
QatModel: Wraps a collection of named weight tensors for quantization-aware training.
QuantizedTensor: A tensor stored in quantized (integer) representation.
SemiStructuredSparseTensor: A tensor stored in the NVIDIA 2:4 structured sparsity format.
SparseGrad: A sparse gradient: a list of (index, value) pairs that an optimizer applies to a dense parameter tensor. Mirrors the coalesced form of torch.Tensor.is_sparse gradients used by nn.Embedding(sparse=True) and consumed by optim.SparseAdam / optim.SGD.
SparseTensor: A sparse tensor in COO (Coordinate List) format.
Tensor: The central type. A dynamically-shaped tensor with gradient tracking and device placement.
TensorId: A unique, monotonically increasing tensor identifier.
TensorStorage: The underlying data buffer for a tensor, tagged with its device.
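
The CSR layout described for CsrTensor (and, with rows and columns swapped, for CscTensor) can be sketched as follows. The Csr struct and its field names are hypothetical and use f64 values; the sketch only shows how the row pointers (length nrows + 1) bracket each row's entries and how a sparse matrix-vector product walks them.

```rust
/// Minimal CSR matrix; field names are illustrative, not CsrTensor's API.
#[derive(Debug)]
struct Csr {
    nrows: usize,
    ncols: usize,
    row_ptrs: Vec<usize>, // length nrows + 1
    col_indices: Vec<usize>,
    values: Vec<f64>,
}

impl Csr {
    /// Build from a dense row-major matrix, keeping only non-zero entries.
    fn from_dense(dense: &[f64], nrows: usize, ncols: usize) -> Csr {
        assert_eq!(dense.len(), nrows * ncols);
        let mut row_ptrs = Vec::with_capacity(nrows + 1);
        let mut col_indices = Vec::new();
        let mut values = Vec::new();
        row_ptrs.push(0);
        for r in 0..nrows {
            for c in 0..ncols {
                let v = dense[r * ncols + c];
                if v != 0.0 {
                    col_indices.push(c);
                    values.push(v);
                }
            }
            // Row r's entries occupy row_ptrs[r]..row_ptrs[r + 1].
            row_ptrs.push(values.len());
        }
        Csr { nrows, ncols, row_ptrs, col_indices, values }
    }

    /// Sparse matrix-vector product y = A x, touching only stored entries.
    fn matvec(&self, x: &[f64]) -> Vec<f64> {
        assert_eq!(x.len(), self.ncols);
        (0..self.nrows)
            .map(|r| {
                (self.row_ptrs[r]..self.row_ptrs[r + 1])
                    .map(|k| self.values[k] * x[self.col_indices[k]])
                    .sum::<f64>()
            })
            .collect()
    }
}
```

Row slicing is cheap in this layout because each row is a contiguous range; the column-major CSC dual makes column slicing and A^T x style products cheap instead.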

Enums

AutocastCategory: Policy: which operations should be cast to reduced precision.
AutocastDtype: The reduced-precision dtype used during autocast regions.
DType: Runtime descriptor for the element type stored in an array.
Device: Device on which a tensor’s data resides.
DispatchKey: One of the 16 possible dispatch keys, ordered from lowest to highest priority. The u8 repr matches the bit position in DispatchKeySet’s internal u16 bitmask, so the priority ordering is both the enum declaration order and the numeric order of the discriminants.
EinopsReduction: Reduction operation for reduce.
FerrotorchError: Errors produced by ferrotorch operations.
GeluApproximate: Selects the GELU approximation method.
MemoryFormat: Describes the physical memory layout of a tensor.
QuantDtype: Target integer dtype for quantized storage.
QuantScheme: Granularity of quantization parameters (scale / zero_point).
StorageBuffer: Device-specific data buffer.
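
The bitmask scheme described for DispatchKey and DispatchKeySet can be sketched with a u16. The variant names below (Cpu, Cuda, Autograd, Profiler) and their bit positions are hypothetical, since the real key names are not listed on this page. What carries over is that a key's u8 discriminant is its bit position, so membership is a single AND and the highest-priority active key is the highest set bit.

```rust
/// Hypothetical subset of dispatch keys; discriminant = bit position = priority.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum Key {
    Cpu = 0,
    Cuda = 1,
    Autograd = 8,
    Profiler = 15,
}

impl Key {
    fn from_bit(bit: u8) -> Option<Key> {
        match bit {
            0 => Some(Key::Cpu),
            1 => Some(Key::Cuda),
            8 => Some(Key::Autograd),
            15 => Some(Key::Profiler),
            _ => None,
        }
    }
}

/// A set of keys packed into a u16, one bit per key.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct KeySet(u16);

impl KeySet {
    fn insert(self, k: Key) -> KeySet {
        KeySet(self.0 | (1u16 << k as u8))
    }
    fn remove(self, k: Key) -> KeySet {
        KeySet(self.0 & !(1u16 << k as u8))
    }
    /// Constant-time membership test.
    fn contains(self, k: Key) -> bool {
        (self.0 & (1u16 << k as u8)) != 0
    }
    /// Highest-priority active key = highest set bit.
    fn highest(self) -> Option<Key> {
        if self.0 == 0 {
            return None;
        }
        Key::from_bit(15 - self.0.leading_zeros() as u8)
    }
}
```

Removing the key that was just handled and asking for the highest remaining bit is how a layered dispatcher falls through to the next, lower-priority layer.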

Traits

Element: Trait bound for types that can be stored in a ferray array.
Float: Marker trait for float element types that support autograd.
GradFn: The backward function trait for reverse-mode automatic differentiation.
IntElement: Element types supported by IntTensor.
Observer: Trait for quantization observers that collect data statistics.
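
How an observer's statistics can become QParams is sketched below with the standard asymmetric (affine) scheme for an unsigned 8-bit target. The Observer trait, its methods, and the MinMax type are hypothetical simplifications, not the signatures of ferrotorch's Observer or MinMaxObserver; widening the range to include zero and the epsilon guard follow common practice rather than anything stated on this page.

```rust
/// Hypothetical observer interface; method names are illustrative.
trait Observer {
    fn observe(&mut self, values: &[f32]);
    /// Returns (scale, zero_point).
    fn qparams(&self) -> (f32, i32);
}

/// Running min/max observer targeting unsigned 8-bit (0..=255).
struct MinMax {
    min: f32,
    max: f32,
}

impl MinMax {
    fn new() -> Self {
        MinMax { min: f32::INFINITY, max: f32::NEG_INFINITY }
    }
}

impl Observer for MinMax {
    fn observe(&mut self, values: &[f32]) {
        for &v in values {
            self.min = self.min.min(v);
            self.max = self.max.max(v);
        }
    }

    fn qparams(&self) -> (f32, i32) {
        const QMIN: f32 = 0.0;
        const QMAX: f32 = 255.0;
        // Widen the range to include 0 so that 0.0 is exactly representable.
        let lo = self.min.min(0.0);
        let hi = self.max.max(0.0);
        // Guard against a degenerate (empty or all-zero) range.
        let scale = ((hi - lo) / (QMAX - QMIN)).max(f32::EPSILON);
        let zero_point = (QMIN - lo / scale).round().clamp(QMIN, QMAX) as i32;
        (scale, zero_point)
    }
}

/// Affine quantize: q = clamp(round(x / scale) + zero_point, 0, 255).
fn quantize(x: f32, scale: f32, zp: i32) -> u8 {
    ((x / scale).round() + zp as f32).clamp(0.0, 255.0) as u8
}

/// Affine dequantize: x = (q - zero_point) * scale.
fn dequantize(q: u8, scale: f32, zp: i32) -> f32 {
    (q as i32 - zp) as f32 * scale
}
```

Observing [-1.0, 0.5] and then [3.0] gives scale = 4/255 and zero_point = 64, so 0.0 round-trips exactly and every in-range value round-trips to within half a quantization step.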

Functions

apply_2_4_mask: Apply 2:4 structured sparsity mask.
arange: Create a 1-D tensor with values from start to end (exclusive) with step step.
as_strided: Zero-copy strided view; see Tensor::as_strided for full docs.
as_strided_copy: Materialised strided copy; see Tensor::as_strided_copy for full docs.
as_strided_scatter: Inverse of as_strided; see Tensor::as_strided_scatter for full docs.
autocast: Execute a closure with mixed-precision autocast enabled.
autocast_dtype: Returns the target dtype for autocast regions on this thread.
autocast_guard: Primary entry point for op implementations to query autocast policy.
backward: Compute gradients of all leaf tensors that contribute to root.
backward_with_grad: Run backward pass through the computation graph.
broadcast_shapes: Compute the broadcasted shape of two shapes, following NumPy/PyTorch rules.
bucketize: Discretize input values into buckets defined by boundaries.
cat: Concatenate tensors along an axis.
cdist: Pairwise distance matrix between two sets of vectors.
check_gradient_anomaly: Check a gradient tensor for NaN or Inf values (anomaly check).
chunk_t: Split tensor into the requested number (chunks) of roughly equal pieces along dim.
clamp: Differentiable elementwise clamp: c[i] = x[i].clamp(min, max).
cond: Conditional subgraph execution.
contiguous_t: Make tensor contiguous (copy data if needed).
cos: Differentiable elementwise cosine: c[i] = cos(x[i]).
cummax: Cumulative maximum along dim.
cummin: Cumulative minimum along dim.
cumprod: Differentiable cumulative product along dim.
cumsum: Differentiable cumulative sum along dim.
dequantize: Dequantize back to a floating-point tensor.
detect_anomaly: Execute a closure with anomaly detection enabled.
diag: Extract the diagonal of a 2-D tensor, or construct a 2-D diagonal matrix from a 1-D tensor.
diagflat: Construct a diagonal matrix from a 1-D tensor (flattened if needed).
digamma: Digamma function: psi(x) = d/dx ln(Gamma(x)).
dual_add: Forward rule for addition: d(a + b) = da + db.
dual_cos: Forward rule for cos: d(cos(a)) = -da * sin(a).
dual_div: Forward rule for division: d(a / b) = (da * b - a * db) / b^2.
dual_exp: Forward rule for exp: d(exp(a)) = da * exp(a).
dual_log: Forward rule for log: d(log(a)) = da / a.
dual_matmul: Forward rule for matrix multiplication: d(A @ B) = dA @ B + A @ dB.
dual_mul: Forward rule for multiplication: d(a * b) = a * db + da * b.
dual_neg: Forward rule for negation: d(-a) = -da.
dual_relu: Forward rule for ReLU: d(relu(a)) = da * (a > 0).
dual_sigmoid: Forward rule for sigmoid: d(sigmoid(a)) = da * sigmoid(a) * (1 - sigmoid(a)).
dual_sin: Forward rule for sin: d(sin(a)) = da * cos(a).
dual_sub: Forward rule for subtraction: d(a - b) = da - db.
dual_tanh: Forward rule for tanh: d(tanh(a)) = da * (1 - tanh(a)^2).
einsum: Einstein summation.
einsum_differentiable: Differentiable Einstein summation. If any input requires grad and grad is enabled, attaches EinsumBackward.
enable_grad: Re-enable gradient computation inside a no_grad block.
erf: Error function: erf(x) = (2/sqrt(pi)) * integral(0, x, exp(-t^2) dt).
erfc: Complementary error function: erfc(x) = 1 - erf(x).
erfinv: Inverse error function: erfinv(erf(x)) = x.
exp: Differentiable elementwise exponential: c[i] = exp(x[i]).
expand: Broadcast (expand) a tensor to new_shape.
expm1: exp(x) - 1 – numerically stable for small x.
eye: Create an identity matrix of size n x n.
fake_quantize_differentiable: Differentiable fake quantize per-tensor (affine).
fft: 1-D complex-to-complex FFT along the last dimension.
fft2: 2-D FFT (complex-to-complex) along the last two spatial dimensions.
fft_differentiable: Differentiable 1-D FFT. Attaches FftBackward when grad is needed.
fftfreq: Discrete Fourier Transform sample frequencies.
fftn: N-dimensional complex-to-complex FFT.
fftshift: Shift the zero-frequency component to the center along the given axes.
fixed_point: Find a fixed point of f starting from x0, then compute its derivative w.r.t. params using the implicit function theorem.
flex_attention: Compute flexible multi-head attention with an optional score modification function.
from_slice: Create a tensor from a slice, copying the data.
from_vec: Create a tensor from a Vec<T>, taking ownership.
full: Create a tensor filled with a given value.
full_like: Create a tensor filled with value with the same shape as other.
gather: Gather values from input along dim using index.
gelu: Compute gelu(x) with the default exact (erf-based) approximation.
gelu_with: Compute gelu(x) with configurable approximation, attaching a backward node when gradients are enabled.
grad: Compute gradients of outputs with respect to inputs.
grad_norm: Compute the L2 norm of gradients of outputs with respect to inputs.
gradient_penalty: Compute the gradient penalty for WGAN-GP.
hessian: Compute the Hessian matrix of a scalar function at a point.
hfft: 1-D FFT of a Hermitian-symmetric complex spectrum, returning real output.
histc: Histogram — count elements in equal-width bins.
ifft: 1-D inverse FFT along the last dimension.
ifft2: 2-D inverse FFT (complex-to-complex) along the last two spatial dimensions.
ifft_differentiable: Differentiable 1-D inverse FFT. Attaches IfftBackward when grad is needed.
ifftn: N-dimensional inverse complex FFT.
ifftshift: Inverse of fftshift.
ihfft: 1-D inverse FFT of a real signal, returning a Hermitian-symmetric spectrum.
irfft: 1-D complex-to-real inverse FFT.
irfft_differentiable: Differentiable 1-D inverse real FFT. Attaches IrfftBackward when grad is needed.
irfftn: N-dimensional complex-to-real inverse FFT.
is_autocast_debug: Returns true if autocast debug event recording is active on this thread.
is_autocast_enabled: Returns true if mixed-precision autocast is currently enabled on this thread.
is_grad_enabled: Returns true if gradient tracking is currently enabled on this thread.
jacfwd: Compute the full Jacobian matrix using forward-mode AD.
jacobian: Compute the Jacobian matrix of a function at a point.
jvp: Compute the Jacobian-vector product (JVP): J @ v.
jvp_exact: Compute the exact Jacobian-vector product using forward-mode AD.
lgamma: Log-gamma function: lgamma(x) = log(|Gamma(x)|).
linspace: Create a 1-D tensor of num evenly spaced values from start to end (inclusive).
log: Differentiable elementwise natural log: c[i] = ln(x[i]).
log1p: log(1 + x) – numerically stable for small x.
logcumsumexp: Differentiable log-cumulative-sum-exp along dim.
magnitude_prune: Unstructured magnitude pruning: zero out the smallest weights.
masked_count: Number of valid (unmasked) entries; returns a 0-d tensor in T.
masked_equal: Mask out entries equal to value. Matches numpy.ma.masked_equal.
masked_invalid: Mask out non-finite entries (NaN, ±∞). Matches numpy.ma.masked_invalid.
masked_max: Max of valid entries; returns a 0-d tensor (NaN if all masked).
masked_mean: Mean of valid entries; returns a 0-d tensor.
masked_min: Min of valid entries; returns a 0-d tensor (NaN if all masked).
masked_sum: Sum of valid entries; returns a 0-d tensor.
masked_where: Wrap data with condition interpreted as “where condition is true, mask the value out”. Matches numpy.ma.masked_where. The resulting MaskedTensor has mask = !condition under the torch convention.
mean_dim: Mean along a specific dimension.
meshgrid: Create coordinate grids from 1-D coordinate vectors.
nested_scaled_dot_product_attention: Scaled dot-product attention over nested tensors.
no_grad: Execute a closure with gradient tracking disabled.
normalize_axis: Normalize a possibly-negative axis index to a positive one.
ones: Create a tensor filled with ones.
ones_like: Create a tensor of ones with the same shape as other.
permute_t: Permute tensor dimensions. Like PyTorch’s tensor.permute(dims).
prepare_qat: Prepare a set of named parameters for quantization-aware training.
quantize: Quantize a floating-point tensor.
quantize_named_tensors: Quantize every weight tensor in a module, returning a name -> QuantizedTensor map suitable for serialization or quantized inference.
quantized_matmul: Multiply two quantized 2-D matrices and return a quantized result.
rand: Create a tensor with random values uniformly distributed in [0, 1).
rand_like: Create a random tensor [0,1) with the same shape as other.
randn: Create a tensor with random values from a standard normal distribution.
randn_like: Create a random normal tensor with the same shape as other.
rearrange: Rearrange tensor dimensions using an einops-style pattern.
rearrange_with: Rearrange with explicit axis sizes for ambiguous splits.
reduce: Reduce along axes that appear on the left but not the right.
repeat: Repeat tensor elements along new or existing axes.
rfft: 1-D real-to-complex FFT along the last dimension.
rfft_differentiable: Differentiable 1-D real FFT. Attaches RfftBackward when grad is needed.
rfftfreq: Sample frequencies for rfft (non-negative half).
rfftn: N-dimensional real-to-complex FFT.
roll: Roll (circular shift) a tensor along a dimension.
scalar: Create a scalar (0-D) tensor.
scan: Sequential state accumulation (scan / fold with outputs).
scatter: Scatter src values into a clone of input along dim using index.
scatter_add: Scatter-add src values into a clone of input along dim.
scatter_add_segments: Segmented scatter-add of a [E, D] source into an [dim_size, D] output, indexed along dim 0 by index[e].
searchsorted: Find insertion indices for values in a sorted 1-D boundaries tensor.
select: Extract a single slice along dim at position index, removing the dimension.
set_autocast_debug: Enable or disable autocast event recording on this thread.
set_grad_enabled: Programmatically set whether gradients are enabled.
sigmoid: Compute sigmoid(x), attaching a backward node when gradients are enabled.
sin: Differentiable elementwise sine: c[i] = sin(x[i]).
sinc: Normalized sinc function: sinc(x) = sin(pi*x) / (pi*x), with sinc(0) = 1.
sparse_matmul_24: Matrix multiply a @ b where b is stored in 2:4 semi-structured format. The last-dim strides of b’s original dense shape must be a multiple of 4 (guaranteed by SemiStructuredSparseTensor::compress).
sparsity_ratio: Compute the sparsity ratio of a tensor: fraction of exact zeros.
split_t: Split tensor into pieces of given sizes along dim.
stack: Stack a slice of tensors along a new dimension dim.
sum_dim: Sum along a specific dimension.
tanh: Compute tanh(x), attaching a backward node when gradients are enabled.
tensor: Create a 1-D tensor from a slice (shape inferred).
topk: Return the k largest elements and their indices along the last dimension.
tril: Lower triangular part of a 2-D tensor.
triu: Upper triangular part of a 2-D tensor.
unique: Return the sorted unique elements of a 1-D tensor.
unique_consecutive: Remove consecutive duplicate elements from a 1-D tensor.
validate_cond_branches: Validate that two sets of outputs have matching shapes.
view_t: View tensor with new shape. Like PyTorch’s tensor.view(shape).
vjp: Compute the vector-Jacobian product (VJP): v^T @ J.
vmap: Vectorize a function over a batch dimension.
vmap2: Vectorize a two-argument function over batch dimensions.
where_cond: Ternary selection: output[i] = condition[i] ? x[i] : y[i].
xlogy: x * log(y), with the convention that xlogy(0, y) = 0 for any y.
zeros: Create a tensor filled with zeros.
zeros_like: Create a tensor of zeros with the same shape as other.
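
The dual_* forward rules listed above can be checked on scalars. The Dual type below is a hypothetical scalar stand-in for DualTensor (primal + epsilon * tangent, with epsilon^2 = 0); each operator implements the corresponding rule from the list, and the tangent is seeded to 1 for the input being differentiated.

```rust
use std::ops::{Add, Div, Mul};

/// Scalar dual number; a hypothetical stand-in for DualTensor.
#[derive(Clone, Copy, Debug)]
struct Dual {
    primal: f64,
    tangent: f64,
}

impl Dual {
    /// Input being differentiated: tangent seeded to 1.
    fn var(x: f64) -> Dual {
        Dual { primal: x, tangent: 1.0 }
    }
    /// Constant: zero tangent.
    fn constant(c: f64) -> Dual {
        Dual { primal: c, tangent: 0.0 }
    }
    /// d(sin(a)) = da * cos(a)
    fn sin(self) -> Dual {
        Dual { primal: self.primal.sin(), tangent: self.tangent * self.primal.cos() }
    }
    /// d(exp(a)) = da * exp(a)
    fn exp(self) -> Dual {
        let e = self.primal.exp();
        Dual { primal: e, tangent: self.tangent * e }
    }
    /// d(tanh(a)) = da * (1 - tanh(a)^2)
    fn tanh(self) -> Dual {
        let t = self.primal.tanh();
        Dual { primal: t, tangent: self.tangent * (1.0 - t * t) }
    }
}

impl Add for Dual {
    type Output = Dual;
    /// d(a + b) = da + db
    fn add(self, b: Dual) -> Dual {
        Dual { primal: self.primal + b.primal, tangent: self.tangent + b.tangent }
    }
}

impl Mul for Dual {
    type Output = Dual;
    /// d(a * b) = a * db + da * b
    fn mul(self, b: Dual) -> Dual {
        Dual {
            primal: self.primal * b.primal,
            tangent: self.primal * b.tangent + self.tangent * b.primal,
        }
    }
}

impl Div for Dual {
    type Output = Dual;
    /// d(a / b) = (da * b - a * db) / b^2
    fn div(self, b: Dual) -> Dual {
        Dual {
            primal: self.primal / b.primal,
            tangent: (self.tangent * b.primal - self.primal * b.tangent) / (b.primal * b.primal),
        }
    }
}
```

One forward pass yields the value and the directional derivative together. Seeding one input's tangent to 1 and the others to 0 gives one column of the Jacobian per pass, which is the idea behind forward-mode helpers such as jvp_exact and jacfwd.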

Type Aliases

FerrotorchResult: Convenience alias for ferrotorch results.
Kernel: A dispatched kernel: takes the op’s input tensors, the currently-active keyset (after all higher-priority keys have been resolved), and a reference to the dispatcher so the kernel can redispatch to a lower-priority key.
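
The Kernel signature and the Dispatcher's (op_name, dispatch_key) table suggest a redispatch pattern that can be sketched as below. Everything here is hypothetical: tensors are replaced by f64 values, keys by bit positions in a u16 (a higher bit means a higher priority), and the AUTOCAST layer merely rounds its inputs to f32 before handing the call to the next key down.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: a "tensor" is an f64 and a key is a bit position.
type KeySet = u16;
type Kernel = fn(&[f64], KeySet, &Dispatcher) -> Result<f64, String>;

const CPU: u8 = 0;
const AUTOCAST: u8 = 10;

struct Dispatcher {
    table: HashMap<(&'static str, u8), Kernel>,
}

impl Dispatcher {
    fn new() -> Self {
        Dispatcher { table: HashMap::new() }
    }

    fn register(&mut self, op: &'static str, key: u8, kernel: Kernel) {
        self.table.insert((op, key), kernel);
    }

    /// Run the kernel for the highest-priority key in `keys`, handing it the
    /// keyset with that key already removed so it can redispatch downward.
    fn call(&self, op: &'static str, inputs: &[f64], keys: KeySet) -> Result<f64, String> {
        if keys == 0 {
            return Err(format!("{op}: no dispatch keys left"));
        }
        let key = (15 - keys.leading_zeros()) as u8;
        let remaining = keys & !(1u16 << key);
        // One HashMap probe per dispatch step.
        match self.table.get(&(op, key)) {
            Some(&kernel) => kernel(inputs, remaining, self),
            None => Err(format!("{op}: no kernel registered for key {key}")),
        }
    }
}

/// Lowest-priority backend kernel: actually computes the sum.
fn add_cpu(inputs: &[f64], _keys: KeySet, _d: &Dispatcher) -> Result<f64, String> {
    Ok(inputs.iter().sum::<f64>())
}

/// Higher-priority layer: lowers input precision, then redispatches.
fn add_autocast(inputs: &[f64], keys: KeySet, d: &Dispatcher) -> Result<f64, String> {
    let lowered: Vec<f64> = inputs.iter().map(|&x| x as f32 as f64).collect();
    d.call("add", &lowered, keys)
}
```

Registering add_cpu under CPU and add_autocast under AUTOCAST, then calling "add" with both bits set, runs the autocast layer first and lets it redispatch to the CPU kernel with the already-handled key cleared.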