
Crate ferrotorch

Modules§

autograd
Automatic differentiation: gradient tracking, backward passes, and anomaly detection.
cpu_pool
CPU tensor buffer pool — caching allocator for host memory.
creation
Tensor creation routines (zeros, ones, arange, random, and friends).
data
Data loading, datasets, samplers, and transforms.
device
Devices on which tensor data can reside.
dispatch
Multi-dispatch key system for composable tensor backends. CL-397.
distributions
Probability distributions for sampling and variational inference.
dtype
Runtime element-type (DType) descriptors.
einops
Einops-style tensor rearrangement operations.
einsum
Einstein summation (einsum) for ferrotorch tensors.
error
Error types for ferrotorch operations.
fft
FFT operations for tensors, powered by rustfft.
flex_attention
Flexible attention with customizable score modification.
gpu_dispatch
GPU backend dispatch layer.
grad_fns
Backward-function (GradFn) implementations for built-in operations.
hub
Model hub for downloading and caching pretrained models.
jit
JIT tracing, IR graph, optimization passes, and code generation.
linalg
Advanced linear algebra operations bridging to ferray-linalg.
meta_propagate
Helpers for propagating the meta device through tensor operations.
nested
Nested (ragged) tensor support.
nn
Neural network modules and layers.
ops
optim
Optimizers and learning rate schedulers.
prelude
Prelude module — import everything commonly needed.
profiler
Performance profiling and Chrome trace export.
profiler_hook
Thread-local profiler hook for auto-instrumented tensor ops.
pruning
Weight pruning utilities.
quantize
Post-training quantization (PTQ) for ferrotorch tensors.
serialize
Model serialization: ONNX export, PyTorch import, safetensors, GGUF.
shape
Shape manipulation and broadcasting utilities.
sparse
Sparse tensor formats: COO, CSR, and 2:4 semi-structured.
special
Special mathematical functions (torch.special equivalent).
storage
Underlying tensor storage buffers.
tensor
The core Tensor type and its methods.
train
Training loop, Learner, callbacks, and metrics.
vision
Computer vision models, datasets, and transforms.
vmap
Vectorized map (vmap) — apply a function over a batch dimension.
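
The vmap module above revolves around mapping a function over a batch dimension. A minimal, self-contained sketch of the idea, using plain Vec<f64> rows in place of real tensors (vmap_rows is a hypothetical name for illustration, not part of the crate):

```rust
/// Apply `f` independently to each row of a batched input, then
/// restack the results -- the core idea behind `vmap`, sketched
/// here over `Vec<Vec<f64>>` instead of tensors.
fn vmap_rows<F>(batch: &[Vec<f64>], f: F) -> Vec<Vec<f64>>
where
    F: Fn(&[f64]) -> Vec<f64>,
{
    batch.iter().map(|row| f(row.as_slice())).collect()
}

fn main() {
    let batch = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    // Vectorize "double every element" over the leading batch dim.
    let out = vmap_rows(&batch, |row| row.iter().map(|x| x * 2.0).collect());
    assert_eq!(out, vec![vec![2.0, 4.0], vec![6.0, 8.0]]);
    println!("{:?}", out);
}
```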

Structs§

AnomalyMode
Global anomaly detection mode.
CooTensor
A 2-D sparse tensor in COO (Coordinate List) format with separate row and column index arrays.
CsrTensor
A 2-D sparse tensor in CSR (Compressed Sparse Row) format.
CumExtremeResult
Result of cummax / cummin: values tensor and indices tensor.
DispatchKeySet
A set of active DispatchKeys, stored as a u16 bitmask for constant-time membership testing and iteration.
Dispatcher
A kernel registration table keyed by (op_name, dispatch_key). Looking up a kernel is a single HashMap probe.
DualTensor
A dual-number tensor: primal + epsilon * tangent.
FakeQuantize
Simulates quantization during training by quantizing and immediately dequantizing values, while allowing gradients to flow through via the straight-through estimator (STE).
ForwardBacktrace
A captured forward-pass backtrace, stored on tensors when anomaly mode is on.
HistogramObserver
Histogram-based observer that collects a distribution of values.
HookHandle
An opaque handle returned by register_hook / register_post_accumulate_grad_hook.
MinMaxObserver
Tracks the running min/max of observed values.
NestedTensor
A nested (ragged) tensor — a collection of tensors with differing sizes along one dimension (the “ragged” dimension).
PackedNestedTensor
A nested (jagged) tensor stored as one contiguous flat buffer with an offsets array marking the start of each component.
PerChannelMinMaxObserver
Tracks per-channel running min/max of observed values.
QParams
Computed quantization parameters (scale and zero_point).
QatLayer
A layer with associated FakeQuantize modules for QAT.
QatModel
Wraps a collection of named weight tensors for quantization-aware training.
QuantizedTensor
A tensor stored in quantized (integer) representation.
SemiStructuredSparseTensor
A tensor stored in the NVIDIA 2:4 structured sparsity format.
SparseTensor
A sparse tensor in COO (Coordinate List) format.
Tensor
The central type. A dynamically-shaped tensor with gradient tracking and device placement.
TensorId
A unique, monotonically increasing tensor identifier.
TensorStorage
The underlying data buffer for a tensor, tagged with its device.
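
DispatchKeySet above is documented as a u16 bitmask with constant-time membership testing, where the bit position doubles as the priority. A rough, self-contained sketch of that representation (the key names and bit assignments here are illustrative, not the crate's actual ones):

```rust
/// Illustrative dispatch keys; the u8 discriminant is the bit position,
/// so higher bits mean higher priority.
#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(u8)]
enum Key { Cpu = 0, Sparse = 3, Autograd = 7 }

/// A set of keys packed into a u16, one bit per key.
#[derive(Clone, Copy, Default)]
struct KeySet(u16);

impl KeySet {
    fn insert(&mut self, k: Key) { self.0 |= 1 << (k as u8); }
    /// Constant-time membership test: a single mask-and-compare.
    fn contains(&self, k: Key) -> bool { self.0 & (1 << (k as u8)) != 0 }
    /// Highest-priority key = highest set bit.
    fn highest(&self) -> Option<u8> {
        if self.0 == 0 { None } else { Some(15 - self.0.leading_zeros() as u8) }
    }
}

fn main() {
    let mut ks = KeySet::default();
    ks.insert(Key::Cpu);
    ks.insert(Key::Autograd);
    assert!(ks.contains(Key::Autograd));
    assert!(!ks.contains(Key::Sparse));
    // Autograd (bit 7) outranks Cpu (bit 0), so it would dispatch first.
    assert_eq!(ks.highest(), Some(7));
}
```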

Enums§

AutocastCategory
Policy: which operations should be cast to reduced precision.
AutocastDtype
The reduced-precision dtype used during autocast regions.
DType
Runtime descriptor for the element type stored in an array.
Device
Device on which a tensor’s data resides.
DispatchKey
One of the 16 possible dispatch keys, ordered from lowest to highest priority. The u8 repr matches the bit position in DispatchKeySet's internal u16 bitmask, so the priority ordering is both the enum declaration order and the numeric order of the discriminants.
EinopsReduction
Reduction operation for reduce.
FerrotorchError
Errors produced by ferrotorch operations.
GeluApproximate
Selects the GELU approximation method.
MemoryFormat
Describes the physical memory layout of a tensor.
QuantDtype
Target integer dtype for quantized storage.
QuantScheme
Granularity of quantization parameters (scale / zero_point).
StorageBuffer
Device-specific data buffer.
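
QuantDtype and QuantScheme above select the target integer type and the granularity of quantization parameters; QParams pairs a scale with a zero_point. The sketch below computes affine i8 parameters from an observed range using the standard textbook formula; ferrotorch's exact rounding and clamping policy may differ:

```rust
/// Compute affine quantization parameters (scale, zero_point) for i8
/// storage from an observed [min, max] range -- the standard formula,
/// not necessarily the crate's exact policy.
fn qparams_i8(min: f32, max: f32) -> (f32, i8) {
    // Widen the range so 0.0 is always exactly representable.
    let (min, max) = (min.min(0.0), max.max(0.0));
    let scale = (max - min) / 255.0;
    let zero_point = (-128.0 - min / scale).round().clamp(-128.0, 127.0) as i8;
    (scale, zero_point)
}

fn quantize(x: f32, scale: f32, zp: i8) -> i8 {
    ((x / scale).round() + zp as f32).clamp(-128.0, 127.0) as i8
}

fn dequantize(q: i8, scale: f32, zp: i8) -> f32 {
    (q as i32 - zp as i32) as f32 * scale
}

fn main() {
    let (scale, zp) = qparams_i8(0.0, 2.55);
    // min = 0 maps to the lowest code, -128.
    assert_eq!(zp, -128);
    let q = quantize(1.0, scale, zp);
    let x = dequantize(q, scale, zp);
    assert!((x - 1.0).abs() < scale); // round-trips within one step
}
```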

Traits§

Element
Trait bound for types that can be stored in a ferray array.
Float
Marker trait for float element types that support autograd.
GradFn
The backward function trait for reverse-mode automatic differentiation.
Observer
Trait for quantization observers that collect data statistics.
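
The Observer trait above collects data statistics for quantization calibration; MinMaxObserver is described as tracking a running min/max. A self-contained sketch of that pattern, with an illustrative trait shape that may not match the crate's actual signatures:

```rust
/// Sketch of a calibration observer: it ingests batches of values and
/// later reports the range used to derive quantization parameters.
/// The trait shape is illustrative, not the crate's real API.
trait Observer {
    fn observe(&mut self, data: &[f32]);
    fn min_max(&self) -> (f32, f32);
}

#[derive(Default)]
struct MinMax { min: f32, max: f32, seen: bool }

impl Observer for MinMax {
    fn observe(&mut self, data: &[f32]) {
        for &x in data {
            if !self.seen { self.min = x; self.max = x; self.seen = true; }
            self.min = self.min.min(x);
            self.max = self.max.max(x);
        }
    }
    fn min_max(&self) -> (f32, f32) { (self.min, self.max) }
}

fn main() {
    let mut obs = MinMax::default();
    obs.observe(&[0.5, -1.25, 3.0]); // running range grows batch by batch
    obs.observe(&[2.0]);
    assert_eq!(obs.min_max(), (-1.25, 3.0));
}
```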

Functions§

apply_2_4_mask
Apply 2:4 structured sparsity mask.
arange
Create a 1-D tensor with values from start to end (exclusive) with step step.
autocast
Execute a closure with mixed-precision autocast enabled.
autocast_dtype
Returns the target dtype for autocast regions on this thread.
autocast_guard
Primary entry point for op implementations to query autocast policy.
backward
Compute gradients of all leaf tensors that contribute to root.
backward_with_grad
Run backward pass through the computation graph.
broadcast_shapes
Compute the broadcasted shape of two shapes, following NumPy/PyTorch rules.
bucketize
Discretize input values into buckets defined by boundaries.
cat
Concatenate tensors along an axis.
cdist
Pairwise distance matrix between two sets of vectors.
check_gradient_anomaly
Check a gradient tensor for NaN or Inf values (anomaly check).
chunk_t
Split tensor into roughly equal chunks along dim.
clamp
Differentiable elementwise clamp: c[i] = x[i].clamp(min, max).
cond
Differentiable conditional: execute true_fn or false_fn based on predicate, with autograd support.
contiguous_t
Make tensor contiguous (copy data if needed).
cos
Differentiable elementwise cosine: c[i] = cos(x[i]).
cummax
Cumulative maximum along dim.
cummin
Cumulative minimum along dim.
cumprod
Differentiable cumulative product along dim.
cumsum
Differentiable cumulative sum along dim.
dequantize
Dequantize back to a floating-point tensor.
detect_anomaly
Execute a closure with anomaly detection enabled.
diag
Extract the diagonal of a 2-D tensor, or construct a 2-D diagonal matrix from a 1-D tensor.
diagflat
Construct a diagonal matrix from a 1-D tensor (flattened if needed).
digamma
Digamma function: psi(x) = d/dx ln(Gamma(x)).
dual_add
Forward rule for addition: d(a + b) = da + db.
dual_cos
Forward rule for cos: d(cos(a)) = -da * sin(a).
dual_div
Forward rule for division: d(a / b) = (da * b - a * db) / b^2.
dual_exp
Forward rule for exp: d(exp(a)) = da * exp(a).
dual_log
Forward rule for log: d(log(a)) = da / a.
dual_matmul
Forward rule for matrix multiplication: d(A @ B) = dA @ B + A @ dB.
dual_mul
Forward rule for multiplication: d(a * b) = a * db + da * b.
dual_neg
Forward rule for negation: d(-a) = -da.
dual_relu
Forward rule for ReLU: d(relu(a)) = da * (a > 0).
dual_sigmoid
Forward rule for sigmoid: d(sigmoid(a)) = da * sigmoid(a) * (1 - sigmoid(a)).
dual_sin
Forward rule for sin: d(sin(a)) = da * cos(a).
dual_sub
Forward rule for subtraction: d(a - b) = da - db.
dual_tanh
Forward rule for tanh: d(tanh(a)) = da * (1 - tanh(a)^2).
einsum
Einstein summation.
einsum_differentiable
Differentiable Einstein summation. If any input requires grad and grad is enabled, attaches EinsumBackward.
enable_grad
Re-enable gradient computation inside a no_grad block.
erf
Error function: erf(x) = (2/sqrt(pi)) * integral(0, x, exp(-t^2) dt).
erfc
Complementary error function: erfc(x) = 1 - erf(x).
erfinv
Inverse error function: erfinv(erf(x)) = x.
exp
Differentiable elementwise exponential: c[i] = exp(x[i]).
expm1
exp(x) - 1, numerically stable for small x.
eye
Create an identity matrix of size n x n.
fake_quantize_differentiable
Differentiable fake quantize per-tensor (affine).
fft
1-D complex-to-complex FFT along the last dimension.
fft2
2-D FFT (complex-to-complex) along the last two spatial dimensions.
fft_differentiable
Differentiable 1-D FFT. Attaches FftBackward when grad is needed.
fixed_point
Find a fixed point of f starting from x0, then compute its derivative w.r.t. params using the implicit function theorem.
flex_attention
Compute flexible multi-head attention with an optional score modification function.
from_slice
Create a tensor from a slice, copying the data.
from_vec
Create a tensor from a Vec<T>, taking ownership.
full
Create a tensor filled with a given value.
full_like
Create a tensor filled with value with the same shape as other.
gather
Gather values from input along dim using index.
gelu
Compute gelu(x) with the default exact (erf-based) approximation.
gelu_with
Compute gelu(x) with configurable approximation, attaching a backward node when gradients are enabled.
grad
Compute gradients of outputs with respect to inputs.
grad_norm
Compute the L2 norm of gradients of outputs with respect to inputs.
gradient_penalty
Compute the gradient penalty for WGAN-GP.
hessian
Compute the Hessian matrix of a scalar function at a point.
histc
Histogram — count elements in equal-width bins.
ifft
1-D inverse FFT along the last dimension.
ifft2
2-D inverse FFT (complex-to-complex) along the last two spatial dimensions.
ifft_differentiable
Differentiable 1-D inverse FFT. Attaches IfftBackward when grad is needed.
irfft
1-D complex-to-real inverse FFT.
irfft_differentiable
Differentiable 1-D inverse real FFT. Attaches IrfftBackward when grad is needed.
is_autocast_debug
Returns true if autocast debug event recording is active on this thread.
is_autocast_enabled
Returns true if mixed-precision autocast is currently enabled on this thread.
is_grad_enabled
Returns true if gradient tracking is currently enabled on this thread.
jacfwd
Compute the full Jacobian matrix using forward-mode AD.
jacobian
Compute the Jacobian matrix of a function at a point.
jvp
Compute the Jacobian-vector product (JVP): J @ v.
jvp_exact
Compute the exact Jacobian-vector product using forward-mode AD.
lgamma
Log-gamma function: lgamma(x) = log(|Gamma(x)|).
linspace
Create a 1-D tensor of num evenly spaced values from start to end (inclusive).
log
Differentiable elementwise natural log: c[i] = ln(x[i]).
log1p
log(1 + x), numerically stable for small x.
logcumsumexp
Differentiable log-cumulative-sum-exp along dim.
magnitude_prune
Unstructured magnitude pruning: zero out the smallest weights.
mean_dim
Mean along a specific dimension.
meshgrid
Create coordinate grids from 1-D coordinate vectors.
nested_scaled_dot_product_attention
Scaled dot-product attention over nested tensors.
no_grad
Execute a closure with gradient tracking disabled.
normalize_axis
Normalize a possibly-negative axis index to a positive one.
ones
Create a tensor filled with ones.
ones_like
Create a tensor of ones with the same shape as other.
permute_t
Permute tensor dimensions. Like PyTorch’s tensor.permute(dims).
prepare_qat
Prepare a set of named parameters for quantization-aware training.
quantize
Quantize a floating-point tensor.
quantize_named_tensors
Quantize every weight tensor in a module, returning a name -> QuantizedTensor map suitable for serialization or quantized inference.
quantized_matmul
Multiply two quantized 2-D matrices and return a quantized result.
rand
Create a tensor with random values uniformly distributed in [0, 1).
rand_like
Create a random tensor in [0, 1) with the same shape as other.
randn
Create a tensor with random values from a standard normal distribution.
randn_like
Create a random normal tensor with the same shape as other.
rearrange
Rearrange tensor dimensions using an einops-style pattern.
rearrange_with
Rearrange with explicit axis sizes for ambiguous splits.
reduce
Reduce along axes that appear on the left but not the right.
repeat
Repeat tensor elements along new or existing axes.
rfft
1-D real-to-complex FFT along the last dimension.
rfft_differentiable
Differentiable 1-D real FFT. Attaches RfftBackward when grad is needed.
roll
Roll (circular shift) a tensor along a dimension.
scalar
Create a scalar (0-D) tensor.
scan
Differentiable sequential scan over a sequence of tensors.
scatter
Scatter src values into a clone of input along dim using index.
scatter_add
Scatter-add src values into a clone of input along dim.
searchsorted
Find insertion indices for values in a sorted 1-D boundaries tensor.
select
Extract a single slice along dim at position index, removing the dimension.
set_autocast_debug
Enable or disable autocast event recording on this thread.
set_grad_enabled
Programmatically set whether gradients are enabled.
sigmoid
Compute sigmoid(x), attaching a backward node when gradients are enabled.
sin
Differentiable elementwise sine: c[i] = sin(x[i]).
sinc
Normalized sinc function: sinc(x) = sin(pi*x) / (pi*x), with sinc(0) = 1.
sparse_matmul_24
Matrix multiply a @ b where b is stored in 2:4 semi-structured format. The last-dim strides of b's original dense shape must be a multiple of 4 (guaranteed by SemiStructuredSparseTensor::compress).
sparsity_ratio
Compute the sparsity ratio of a tensor: fraction of exact zeros.
split_t
Split tensor into pieces of given sizes along dim.
stack
Stack a slice of tensors along a new dimension dim.
sum_dim
Sum along a specific dimension.
tanh
Compute tanh(x), attaching a backward node when gradients are enabled.
tensor
Create a 1-D tensor from a slice (shape inferred).
topk
Return the k largest elements and their indices along the last dimension.
tril
Lower triangular part of a 2-D tensor.
triu
Upper triangular part of a 2-D tensor.
unique
Return the sorted unique elements of a 1-D tensor.
unique_consecutive
Remove consecutive duplicate elements from a 1-D tensor.
validate_cond_branches
Validate that two branch functions produce outputs with matching shapes and counts, using the given operands for a test evaluation.
view_t
View tensor with new shape. Like PyTorch’s tensor.view(shape).
vjp
Compute the vector-Jacobian product (VJP): v^T @ J.
vmap
Vectorize a function over a batch dimension.
vmap2
Vectorize a two-argument function over batch dimensions.
where_cond
Ternary selection: output[i] = condition[i] ? x[i] : y[i].
xlogy
x * log(y), with the convention that xlogy(0, y) = 0 for any y.
zeros
Create a tensor filled with zeros.
zeros_like
Create a tensor of zeros with the same shape as other.
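
broadcast_shapes above follows NumPy/PyTorch broadcasting rules: align the two shapes from the right; each pair of dimensions must either match or contain a 1, which stretches to the other size. A self-contained sketch of those rules:

```rust
/// NumPy/PyTorch-style shape broadcasting: align from the right,
/// treat missing leading dims as 1, and let size-1 dims stretch.
/// Returns None when the shapes are incompatible.
fn broadcast_shapes(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let n = a.len().max(b.len());
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // Missing leading dimensions behave as size 1.
        let da = if i < n - a.len() { 1 } else { a[i - (n - a.len())] };
        let db = if i < n - b.len() { 1 } else { b[i - (n - b.len())] };
        out.push(match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None, // mismatched, non-1 sizes
        });
    }
    Some(out)
}

fn main() {
    // The classic NumPy example: (8, 1, 6) with (7, 1) -> (8, 7, 6).
    assert_eq!(broadcast_shapes(&[8, 1, 6], &[7, 1]), Some(vec![8, 7, 6]));
    // 3 vs 4 cannot broadcast.
    assert_eq!(broadcast_shapes(&[3], &[4]), None);
}
```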

Type Aliases§

FerrotorchResult
Convenience alias for ferrotorch results.
Kernel
A dispatched kernel: takes the op’s input tensors, the currently-active keyset (after all higher-priority keys have been resolved), and a reference to the dispatcher so the kernel can redispatch to a lower-priority key.
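
The Kernel alias above describes kernels that receive the remaining keyset and a dispatcher reference so they can redispatch to a lower-priority key. A toy, self-contained version of that pattern (op names, key numbering, and the scalar payload are illustrative stand-ins for real tensors):

```rust
use std::collections::HashMap;

/// A toy kernel: takes the op input (a scalar here), the remaining
/// keyset, and the dispatcher so it can redispatch downward.
type Kernel = fn(f64, u16, &Dispatcher) -> f64;

/// Registration table keyed by (op_name, dispatch_key); lookup is a
/// single HashMap probe, as the Kernel docs above describe.
struct Dispatcher { table: HashMap<(&'static str, u8), Kernel> }

impl Dispatcher {
    fn dispatch(&self, op: &'static str, x: f64, keyset: u16) -> f64 {
        assert!(keyset != 0, "no dispatch keys left");
        // Highest set bit = highest-priority key.
        let key = (15 - keyset.leading_zeros()) as u8;
        let kernel = self.table[&(op, key)];
        // Redispatch with this key stripped from the active set.
        kernel(x, keyset & !(1 << key), self)
    }
}

// "Autograd" wrapper kernel (bit 1): would record a backward node,
// then falls through to the lower-priority kernel.
fn add_one_autograd(x: f64, rest: u16, d: &Dispatcher) -> f64 {
    d.dispatch("add_one", x, rest)
}
// "CPU" base kernel (bit 0): actually computes.
fn add_one_cpu(x: f64, _rest: u16, _d: &Dispatcher) -> f64 { x + 1.0 }

fn main() {
    let mut table: HashMap<(&'static str, u8), Kernel> = HashMap::new();
    table.insert(("add_one", 1), add_one_autograd);
    table.insert(("add_one", 0), add_one_cpu);
    let d = Dispatcher { table };
    // Both keys active: autograd runs first, then redispatches to CPU.
    assert_eq!(d.dispatch("add_one", 41.0, 0b11), 42.0);
}
```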