ferrotorch-core 0.3.0

Core tensor and autograd engine for ferrotorch — PyTorch in Rust
Documentation

ferrotorch-core

Core tensor and autograd engine for ferrotorch — PyTorch in Rust.

What it provides

Tensor

  • Tensor<T> — N-dimensional tensor parameterized by element type (f32, f64, with f16/bf16 storage support). Reference-counted via Arc<TensorInner>, with shape, strides, offset, and an optional grad_fn for autograd.
  • Device abstractionDevice::Cpu, Device::Cuda(ordinal). Move tensors with .to(device), .cuda(), .cpu(), and the pinned-memory variant .to_pinned(device) for fast CPU→GPU transfer.
  • StorageTensorStorage over StorageBuffer::Cpu(Vec<T>) or StorageBuffer::Gpu(GpuBufferHandle) with on_device and on_device_pinned constructors.

Autograd

  • Reverse-mode autodiff with backward(), topological-sort backward pass, gradient accumulation, broadcast gradient reduction.
  • no_grad, enable_grad, set_grad_enabled for fine-grained autograd control.
  • Autocastautocast(dtype, || ...) mixed-precision regions with current_autocast_snapshot / with_autocast_state helpers (used by gradient checkpointing to preserve mixed-precision state across recomputation).
  • Gradient checkpointingcheckpoint, checkpoint_multi save GPU RNG and autocast state, recompute the forward pass during backward.
  • Higher-order autograd, anomaly mode, hooks (register_hook, register_post_accumulate_grad_hook), gradcheck.
  • Saved tensors, fixed-point derivatives for DEQ networks, forward-mode AD via DualNumber.

Operations

  • Creationzeros, ones, full, tensor, from_slice, from_vec, scalar, eye, arange, linspace, rand, randn, *_like variants
  • Arithmetic (differentiable) — add, sub, mul, div, neg, pow, sqrt, abs, with broadcasting and operator overloading
  • Transcendentalexp, log, log2, log10, log1p, sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, expm1, erf, erfc
  • Reductionssum, mean, prod, sum_dim, mean_dim, nansum, nanmean, logsumexp, logsumexp_dim
  • Cumulative scans (GPU-native PTX) — cumsum, cumprod, cummax, cummin, logcumsumexp
  • Linear algebramm, bmm, matmul, dot, cholesky, inv, lstsq, qr, svd, eig, solve, with cuBLAS GPU dispatch
  • Shape ops (zero-copy views) — reshape, view, view_reshape, permute, transpose, narrow, squeeze, unsqueeze, flatten, expand, chunk, split, cat, stack
  • Indexingindex_select, gather, scatter, scatter_add, masked_fill, masked_select, where_, nonzero
  • Searchsearchsorted, bucketize, unique, topk, meshgrid
  • Einopsrearrange, repeat, reduce with readable string patterns and zero-copy fast paths for identity permutations
  • Einsum — differentiable Einstein summation
  • Activations (differentiable) — relu, gelu, silu, elu, mish, sigmoid, tanh, softmax, log_softmax
  • FFTfft, ifft, rfft, irfft, fft2, ifft2
  • SparseSparseTensor (COO format) with sparse arithmetic
  • Quantization — INT8/INT4 per-tensor and per-channel
  • Flexible attentionflex_attention with score-mod callbacks, composed from bmm + softmax + cat for full GPU dispatch
  • Pruning — magnitude, structured, random pruning utilities
  • Vmap — vectorized map (in-development)

Quick start

use ferrotorch_core::{tensor, Tensor, FerrotorchResult};

fn main() -> FerrotorchResult<()> {
    let x: Tensor<f32> = tensor(&[1.0, 2.0, 3.0])?.requires_grad_(true);
    let y = (&x * &x)?.sum_t()?;
    y.backward()?;
    println!("grad of sum(x*x) wrt x: {:?}", x.grad()?);
    Ok(())
}

Part of ferrotorch

This crate is one component of the ferrotorch workspace. See the workspace README for full documentation.

License

MIT OR Apache-2.0