Crate ferrotorch

Modules

autograd
cpu_pool: CPU tensor buffer pool — caching allocator for host memory.
creation
data: Data loading, datasets, samplers, and transforms.
device
dispatch: Multi-dispatch key system for composable tensor backends. CL-397.
distributions: Probability distributions for sampling and variational inference.
dtype
einops: Einops-style tensor rearrangement operations.
einsum: Einstein summation (einsum) for ferrotorch tensors.
error
fft: FFT operations for tensors, powered by rustfft.
flex_attention: Flexible attention with customizable score modification.
gpu_dispatch: GPU backend dispatch layer.
grad_fns
hub: Model hub for downloading and caching pretrained models.
jit: JIT tracing, IR graph, optimization passes, and code generation.
linalg: Advanced linear algebra operations bridging to ferray-linalg.
meta_propagate: Helpers for propagating the meta device through tensor operations.
nested
nn: Neural network modules and layers.
ops
optim: Optimizers and learning rate schedulers.
prelude: Prelude module — import everything commonly needed.
profiler: Performance profiling and Chrome trace export.
profiler_hook: Thread-local profiler hook for auto-instrumented tensor ops.
pruning
quantize: Post-training quantization (PTQ) for ferrotorch tensors.
serialize: Model serialization: ONNX export, PyTorch import, safetensors, GGUF.
shape
sparse
special: Special mathematical functions (torch.special equivalent).
storage
tensor
train: Training loop, Learner, callbacks, and metrics.
vision: Computer vision models, datasets, and transforms.
vmap: Vectorized map (vmap) — apply a function over a batch dimension.

Structs

AnomalyMode: Global anomaly detection mode.
CooTensor: A 2-D sparse tensor in COO (Coordinate List) format with separate row and column index arrays.
CsrTensor: A 2-D sparse tensor in CSR (Compressed Sparse Row) format.
CumExtremeResult: Result of cummax / cummin: values tensor and indices tensor.
DispatchKeySet: A set of active DispatchKeys, stored as a u16 bitmask for constant-time membership testing and iteration.
Dispatcher: A kernel registration table keyed by (op_name, dispatch_key).
Looking up a kernel is a single HashMap probe.
DualTensor: A dual-number tensor: primal + epsilon * tangent.
FakeQuantize: Simulates quantization during training by quantizing and immediately dequantizing values, while allowing gradients to flow through via the straight-through estimator (STE).
ForwardBacktrace: A captured forward-pass backtrace, stored on tensors when anomaly mode is on.
HistogramObserver: Histogram-based observer that collects a distribution of values.
HookHandle: An opaque handle returned by register_hook / register_post_accumulate_grad_hook.
MinMaxObserver: Tracks the running min/max of observed values.
NestedTensor: A nested (ragged) tensor — a collection of tensors with differing sizes along one dimension (the “ragged” dimension).
PackedNestedTensor: A nested (jagged) tensor stored as one contiguous flat buffer with an offsets array marking the start of each component.
PerChannelMinMaxObserver: Tracks per-channel running min/max of observed values.
QParams: Computed quantization parameters (scale and zero_point).
QatLayer: A layer with associated FakeQuantize modules for QAT.
QatModel: Wraps a collection of named weight tensors for quantization-aware training.
QuantizedTensor: A tensor stored in quantized (integer) representation.
SemiStructuredSparseTensor: A tensor stored in the NVIDIA 2:4 structured sparsity format.
SparseTensor: A sparse tensor in COO (Coordinate List) format.
Tensor: The central type. A dynamically-shaped tensor with gradient tracking and device placement.
TensorId: A unique, monotonically increasing tensor identifier.
TensorStorage: The underlying data buffer for a tensor, tagged with its device.

Enums

AutocastCategory: Policy: which operations should be cast to reduced precision.
AutocastDtype: The reduced-precision dtype used during autocast regions.
DType: Runtime descriptor for the element type stored in an array.
Device: Device on which a tensor’s data resides.
DispatchKey: One of the 16 possible dispatch keys, ordered from lowest to highest priority. The u8 repr matches the bit position in DispatchKeySet’s internal u16 bitmask, so the priority ordering is both the enum declaration order and the numeric order of the discriminants.
EinopsReduction: Reduction operation for reduce.
FerrotorchError: Errors produced by ferrotorch operations.
GeluApproximate: Selects the GELU approximation method.
MemoryFormat: Describes the physical memory layout of a tensor.
QuantDtype: Target integer dtype for quantized storage.
QuantScheme: Granularity of quantization parameters (scale / zero_point).
StorageBuffer: Device-specific data buffer.

Traits

Element: Trait bound for types that can be stored in a ferray array.
Float: Marker trait for float element types that support autograd.
GradFn: The backward function trait for reverse-mode automatic differentiation.
Observer: Trait for quantization observers that collect data statistics.

Functions

apply_2_4_mask: Apply a 2:4 structured sparsity mask.
arange: Create a 1-D tensor with values from start to end (exclusive) with step step.
autocast: Execute a closure with mixed-precision autocast enabled.
autocast_dtype: Returns the target dtype for autocast regions on this thread.
autocast_guard: Primary entry point for op implementations to query autocast policy.
backward: Compute gradients of all leaf tensors that contribute to root.
backward_with_grad: Run backward pass through the computation graph.
broadcast_shapes: Compute the broadcasted shape of two shapes, following NumPy/PyTorch rules.
bucketize: Discretize input values into buckets defined by boundaries.
cat: Concatenate tensors along an axis.
cdist: Pairwise distance matrix between two sets of vectors.
check_gradient_anomaly: Check a gradient tensor for NaN or Inf values (anomaly check).
chunk_t: Split tensor into roughly equal chunks along dim.
clamp: Differentiable elementwise clamp: c[i] = x[i].clamp(min, max).
cond: Differentiable conditional: execute true_fn or false_fn based on predicate, with autograd support.
contiguous_t: Make tensor contiguous (copy data if needed).
cos: Differentiable elementwise cosine: c[i] = cos(x[i]).
cummax: Cumulative maximum along dim.
cummin: Cumulative minimum along dim.
cumprod: Differentiable cumulative product along dim.
cumsum: Differentiable cumulative sum along dim.
dequantize: Dequantize back to a floating-point tensor.
detect_anomaly: Execute a closure with anomaly detection enabled.
diag: Extract the diagonal of a 2-D tensor, or construct a 2-D diagonal matrix from a 1-D tensor.
diagflat: Construct a diagonal matrix from a 1-D tensor (flattened if needed).
digamma: Digamma function: psi(x) = d/dx ln(Gamma(x)).
dual_add: Forward rule for addition: d(a + b) = da + db.
dual_cos: Forward rule for cos: d(cos(a)) = -da * sin(a).
dual_div: Forward rule for division: d(a / b) = (da * b - a * db) / b^2.
dual_exp: Forward rule for exp: d(exp(a)) = da * exp(a).
dual_log: Forward rule for log: d(log(a)) = da / a.
dual_matmul: Forward rule for matrix multiplication: d(A @ B) = dA @ B + A @ dB.
dual_mul: Forward rule for multiplication: d(a * b) = a * db + da * b.
dual_neg: Forward rule for negation: d(-a) = -da.
dual_relu: Forward rule for ReLU: d(relu(a)) = da * (a > 0).
dual_sigmoid: Forward rule for sigmoid: d(sigmoid(a)) = da * sigmoid(a) * (1 - sigmoid(a)).
dual_sin: Forward rule for sin: d(sin(a)) = da * cos(a).
dual_sub: Forward rule for subtraction: d(a - b) = da - db.
dual_tanh: Forward rule for tanh: d(tanh(a)) = da * (1 - tanh(a)^2).
einsum: Einstein summation.
einsum_differentiable: Differentiable Einstein summation. If any input requires grad and grad is enabled, attaches EinsumBackward.
enable_grad: Re-enable gradient computation inside a no_grad block.
erf: Error function: erf(x) = (2/sqrt(pi)) * integral(0, x, exp(-t^2) dt).
erfc: Complementary error function: erfc(x) = 1 - erf(x).
erfinv: Inverse error function: erfinv(erf(x)) = x.
exp: Differentiable elementwise exponential: c[i] = exp(x[i]).
expm1: exp(x) - 1, numerically stable for small x.
eye: Create an identity matrix of size n x n.
fake_quantize_differentiable: Differentiable fake quantize per-tensor (affine).
fft: 1-D complex-to-complex FFT along the last dimension.
fft2: 2-D FFT (complex-to-complex) along the last two spatial dimensions.
fft_differentiable: Differentiable 1-D FFT. Attaches FftBackward when grad is needed.
fixed_point: Find a fixed point of f starting from x0, then compute its derivative w.r.t. params using the implicit function theorem.
flex_attention: Compute flexible multi-head attention with an optional score modification function.
from_slice: Create a tensor from a slice, copying the data.
from_vec: Create a tensor from a Vec<T>, taking ownership.
full: Create a tensor filled with a given value.
full_like: Create a tensor filled with value with the same shape as other.
gather: Gather values from input along dim using index.
gelu: Compute gelu(x) with the default exact (erf-based) approximation.
gelu_with: Compute gelu(x) with configurable approximation, attaching a backward node when gradients are enabled.
grad: Compute gradients of outputs with respect to inputs.
grad_norm: Compute the L2 norm of gradients of outputs with respect to inputs.
gradient_penalty: Compute the gradient penalty for WGAN-GP.
hessian: Compute the Hessian matrix of a scalar function at a point.
histc: Histogram — count elements in equal-width bins.
ifft: 1-D inverse FFT along the last dimension.
ifft2: 2-D inverse FFT (complex-to-complex) along the last two spatial dimensions.
ifft_differentiable: Differentiable 1-D inverse FFT. Attaches IfftBackward when grad is needed.
irfft: 1-D complex-to-real inverse FFT.
irfft_differentiable: Differentiable 1-D inverse real FFT. Attaches IrfftBackward when grad is needed.
is_autocast_debug: Returns true if autocast debug event recording is active on this thread.
is_autocast_enabled: Returns true if mixed-precision autocast is currently enabled on this thread.
is_grad_enabled: Returns true if gradient tracking is currently enabled on this thread.
jacfwd: Compute the full Jacobian matrix using forward-mode AD.
jacobian: Compute the Jacobian matrix of a function at a point.
jvp: Compute the Jacobian-vector product (JVP): J @ v.
jvp_exact: Compute the exact Jacobian-vector product using forward-mode AD.
lgamma: Log-gamma function: lgamma(x) = log(|Gamma(x)|).
linspace: Create a 1-D tensor of num evenly spaced values from start to end (inclusive).
log: Differentiable elementwise natural log: c[i] = ln(x[i]).
log1p: log(1 + x), numerically stable for small x.
logcumsumexp: Differentiable log-cumulative-sum-exp along dim.
magnitude_prune: Unstructured magnitude pruning: zero out the smallest weights.
mean_dim: Mean along a specific dimension.
meshgrid: Create coordinate grids from 1-D coordinate vectors.
nested_scaled_dot_product_attention: Scaled dot-product attention over nested tensors.
no_grad: Execute a closure with gradient tracking disabled.
normalize_axis: Normalize a possibly-negative axis index to a positive one.
ones: Create a tensor filled with ones.
ones_like: Create a tensor of ones with the same shape as other.
permute_t: Permute tensor dimensions. Like PyTorch’s tensor.permute(dims).
prepare_qat: Prepare a set of named parameters for quantization-aware training.
quantize: Quantize a floating-point tensor.
quantize_named_tensors: Quantize every weight tensor in a module, returning a name -> QuantizedTensor map suitable for serialization or quantized inference.
quantized_matmul: Multiply two quantized 2-D matrices and return a quantized result.
rand: Create a tensor with random values uniformly distributed in [0, 1).
rand_like: Create a random tensor in [0, 1) with the same shape as other.
randn: Create a tensor with random values from a standard normal distribution.
randn_like: Create a random normal tensor with the same shape as other.
rearrange: Rearrange tensor dimensions using an einops-style pattern.
rearrange_with: Rearrange with explicit axis sizes for ambiguous splits.
reduce: Reduce along axes that appear on the left but not the right.
repeat: Repeat tensor elements along new or existing axes.
rfft: 1-D real-to-complex FFT along the last dimension.
rfft_differentiable: Differentiable 1-D real FFT. Attaches RfftBackward when grad is needed.
roll: Roll (circular shift) a tensor along a dimension.
scalar: Create a scalar (0-D) tensor.
scan: Differentiable sequential scan over a sequence of tensors.
scatter: Scatter src values into a clone of input along dim using index.
scatter_add: Scatter-add src values into a clone of input along dim.
searchsorted: Find insertion indices for values in a sorted 1-D boundaries tensor.
select: Extract a single slice along dim at position index, removing the dimension.
set_autocast_debug: Enable or disable autocast event recording on this thread.
set_grad_enabled: Programmatically set whether gradients are enabled.
sigmoid: Compute sigmoid(x), attaching a backward node when gradients are enabled.
sin: Differentiable elementwise sine: c[i] = sin(x[i]).
sinc: Normalized sinc function: sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1.
sparse_matmul_24: Matrix multiply a @ b where b is stored in 2:4 semi-structured format. The last-dim strides of b’s original dense shape must be a multiple of 4 (guaranteed by SemiStructuredSparseTensor::compress).
sparsity_ratio: Compute the sparsity ratio of a tensor: fraction of exact zeros.
split_t: Split tensor into pieces of given sizes along dim.
stack: Stack a slice of tensors along a new dimension dim.
sum_dim: Sum along a specific dimension.
tanh: Compute tanh(x), attaching a backward node when gradients are enabled.
tensor: Create a 1-D tensor from a slice (shape inferred).
topk: Return the k largest elements and their indices along the last dimension.
tril: Lower triangular part of a 2-D tensor.
triu: Upper triangular part of a 2-D tensor.
unique: Return the sorted unique elements of a 1-D tensor.
unique_consecutive: Remove consecutive duplicate elements from a 1-D tensor.
validate_cond_branches: Validate that two branch functions produce outputs with matching shapes and counts, using the given operands for a test evaluation.
view_t: View tensor with new shape. Like PyTorch’s tensor.view(shape).
vjp: Compute the vector-Jacobian product (VJP): v^T @ J.
vmap: Vectorize a function over a batch dimension.
vmap2: Vectorize a two-argument function over batch dimensions.
where_cond: Ternary selection: output[i] = condition[i] ? x[i] : y[i].
xlogy: x * log(y), with the convention that xlogy(0, y) = 0 for any y.
zeros: Create a tensor filled with zeros.
zeros_like: Create a tensor of zeros with the same shape as other.

Type Aliases

FerrotorchResult: Convenience alias for ferrotorch results.
Kernel: A dispatched kernel: takes the op’s input tensors, the currently-active keyset (after all higher-priority keys have been resolved), and a reference to the dispatcher so the kernel can redispatch to a lower-priority key.
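The DispatchKey, DispatchKeySet, Dispatcher, and Kernel entries together describe a priority-ordered bitmask dispatch scheme: each key occupies one bit of a u16, the highest set bit wins, and a kernel strips its own key before redispatching downward. The following is a minimal standalone sketch of that scheme, not the crate’s actual API; the key names and discriminants (Cpu = 0, Autograd = 5, Autocast = 9) are hypothetical values chosen for illustration.

```rust
/// Hypothetical dispatch keys. The u8 discriminant is the bit position in the
/// u16 bitmask, so a larger discriminant means a higher dispatch priority.
#[derive(Clone, Copy, Debug)]
#[repr(u8)]
enum Key {
    Cpu = 0,      // backend kernel, lowest priority
    Autograd = 5, // records the backward graph, then redispatches
    Autocast = 9, // casts inputs to reduced precision, then redispatches
}

/// A set of active keys stored as a u16 bitmask, mirroring the DispatchKeySet idea.
#[derive(Clone, Copy, Debug)]
struct KeySet(u16);

impl KeySet {
    fn insert(self, k: Key) -> KeySet {
        KeySet(self.0 | (1u16 << k as u8))
    }
    fn remove(self, k: Key) -> KeySet {
        KeySet(self.0 & !(1u16 << k as u8))
    }
    /// Highest-priority active key: the highest set bit, found in constant time.
    fn highest(self) -> Option<u8> {
        if self.0 == 0 {
            None
        } else {
            Some(15 - self.0.leading_zeros() as u8)
        }
    }
}

fn main() {
    // An op call starts with every relevant key active; each kernel handles
    // its own concern, strips its key, and redispatches to the next one down.
    let mut active = KeySet(0)
        .insert(Key::Cpu)
        .insert(Key::Autograd)
        .insert(Key::Autocast);
    while let Some(bit) = active.highest() {
        println!("dispatching to key at bit {bit}");
        active = KeySet(active.0 & !(1u16 << bit)); // redispatch below this key
    }
}
```

This is why the Kernel alias receives the currently-active keyset and a dispatcher reference: stripping the highest bit and looking up again is all a redispatch has to do.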
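The dual_* entries above are ordinary dual-number arithmetic: a value carries a tangent alongside its primal, and each op propagates both. A scalar sketch of two of those rules (using f64 where DualTensor would apply the same formulas elementwise; the Dual type here is illustrative, not the crate’s):

```rust
/// Scalar dual number: primal value plus a tangent (the "epsilon" coefficient).
#[derive(Clone, Copy, Debug)]
struct Dual {
    primal: f64,
    tangent: f64,
}

impl Dual {
    fn new(primal: f64, tangent: f64) -> Self {
        Dual { primal, tangent }
    }
    /// Product rule, as in dual_mul: d(a * b) = a * db + da * b.
    fn mul(self, b: Dual) -> Dual {
        Dual::new(
            self.primal * b.primal,
            self.primal * b.tangent + self.tangent * b.primal,
        )
    }
    /// Chain rule for sin, as in dual_sin: d(sin(a)) = da * cos(a).
    fn sin(self) -> Dual {
        Dual::new(self.primal.sin(), self.tangent * self.primal.cos())
    }
}

fn main() {
    // Differentiate f(x) = x * sin(x) at x = 2 by seeding the tangent with 1.
    let x = Dual::new(2.0, 1.0);
    let y = x.mul(x.sin());
    // y.tangent holds f'(2) = sin(2) + 2 * cos(2), with no separate backward pass.
    println!("f(2) = {}, f'(2) = {}", y.primal, y.tangent);
}
```

Seeding the tangent with a full vector v instead of 1 is exactly how jvp_exact and jacfwd get Jacobian-vector products out of a single forward pass.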
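The quantization entries (QParams, MinMaxObserver, FakeQuantize, quantize/dequantize) describe standard affine quantization: real ≈ scale * (q - zero_point). A standalone sketch of that math for i8 storage follows; the field names mirror the QParams description, but the exact signatures and the qmin/qmax choice are assumptions, not the crate’s API.

```rust
/// Affine quantization parameters: real_value ≈ scale * (q - zero_point).
#[derive(Debug)]
struct QParams {
    scale: f64,
    zero_point: i32,
}

/// Derive scale and zero_point from an observed [min, max] range for i8
/// storage, the way a MinMaxObserver-driven calibration pass would.
fn qparams_from_min_max(min: f64, max: f64) -> QParams {
    // Widen the range to include 0.0 so that zero quantizes exactly.
    let (min, max) = (min.min(0.0), max.max(0.0));
    let (qmin, qmax) = (-128.0_f64, 127.0_f64);
    let scale = (max - min) / (qmax - qmin);
    let zero_point = (qmin - min / scale).round() as i32;
    QParams { scale, zero_point }
}

/// Quantize then immediately dequantize, as FakeQuantize does in the forward
/// pass (the straight-through estimator only matters in backward, omitted here).
fn fake_quantize(x: f64, qp: &QParams) -> f64 {
    let q = (x / qp.scale + qp.zero_point as f64).round().clamp(-128.0, 127.0);
    qp.scale * (q - qp.zero_point as f64)
}

fn main() {
    let qp = qparams_from_min_max(-1.0, 2.0);
    // In-range values round-trip with error at most scale / 2.
    println!("{:?}, fq(0.7) = {}", qp, fake_quantize(0.7, &qp));
}
```

Per-channel schemes (PerChannelMinMaxObserver, QuantScheme) apply the same formula with one (scale, zero_point) pair per output channel instead of one per tensor.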