# ferrotorch-core

Core tensor and autograd engine for ferrotorch — PyTorch in Rust.
## What it provides

### Tensor

- `Tensor<T>` — N-dimensional tensor parameterized by element type (`f32`, `f64`, with `f16`/`bf16` storage support). Reference-counted via `Arc<TensorInner>`, with shape, strides, offset, and an optional `grad_fn` for autograd.
- Device abstraction — `Device::Cpu`, `Device::Cuda(ordinal)`. Move tensors with `.to(device)`, `.cuda()`, `.cpu()`, and the pinned-memory variant `.to_pinned(device)` for fast CPU→GPU transfer.
- Storage — `TensorStorage` over `StorageBuffer::Cpu(Vec<T>)` or `StorageBuffer::Gpu(GpuBufferHandle)`, with `on_device` and `on_device_pinned` constructors.
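The shape/strides/offset layout described above can be illustrated in plain Rust. This is a sketch of the general strided-tensor addressing scheme, not ferrotorch's actual `TensorInner` internals; `contiguous_strides` and `flat_index` are hypothetical helper names introduced for the example:

```rust
/// Row-major (C-contiguous) strides for a shape, measured in elements.
fn contiguous_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Flat buffer index for a multi-dimensional index, given strides and a base offset.
fn flat_index(index: &[usize], strides: &[usize], offset: usize) -> usize {
    offset + index.iter().zip(strides).map(|(i, s)| i * s).sum::<usize>()
}

fn main() {
    // A 2x3 tensor over a flat buffer: element [1, 2] lives at 1*3 + 2*1 = 5.
    let strides = contiguous_strides(&[2, 3]);
    assert_eq!(strides, vec![3, 1]);
    assert_eq!(flat_index(&[1, 2], &strides, 0), 5);
    // A transpose is just swapped strides over the same buffer: zero-copy.
    let t_strides = vec![1, 3]; // view of the 3x2 transpose
    assert_eq!(flat_index(&[2, 1], &t_strides, 0), 5);
}
```

The same idea underlies the zero-copy view ops listed below: `transpose`, `permute`, and `narrow` only rewrite shape, strides, and offset while sharing the backing buffer.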
### Autograd

- Reverse-mode autodiff with `backward()`: topological-sort backward pass, gradient accumulation, and broadcast gradient reduction. `no_grad`, `enable_grad`, and `set_grad_enabled` give fine-grained autograd control.
- Autocast — `autocast(dtype, || ...)` mixed-precision regions with `current_autocast_snapshot`/`with_autocast_state` helpers (used by gradient checkpointing to preserve mixed-precision state across recomputation).
- Gradient checkpointing — `checkpoint` and `checkpoint_multi` save GPU RNG and autocast state, then recompute the forward pass during backward.
- Higher-order autograd, anomaly mode, hooks (`register_hook`, `register_post_accumulate_grad_hook`), and gradcheck.
- Saved tensors, fixed-point derivatives for DEQ networks, and forward-mode AD via `DualNumber`.
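The core of a reverse-mode engine like the one described above can be sketched with a minimal scalar version in plain Rust. This illustrates the topological-sort backward pass and gradient accumulation in general; the `Var`/`Node` types are hypothetical and ferrotorch's tensor-level implementation is far more involved:

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    value: f64,
    grad: f64,
    // (parent, local gradient d(self)/d(parent))
    parents: Vec<(Var, f64)>,
}

#[derive(Clone)]
struct Var(Rc<RefCell<Node>>);

impl Var {
    fn new(value: f64) -> Self {
        Var(Rc::new(RefCell::new(Node { value, grad: 0.0, parents: vec![] })))
    }
    fn value(&self) -> f64 { self.0.borrow().value }
    fn grad(&self) -> f64 { self.0.borrow().grad }

    fn add(&self, other: &Var) -> Var {
        let out = Var::new(self.value() + other.value());
        out.0.borrow_mut().parents = vec![(self.clone(), 1.0), (other.clone(), 1.0)];
        out
    }
    fn mul(&self, other: &Var) -> Var {
        let out = Var::new(self.value() * other.value());
        out.0.borrow_mut().parents =
            vec![(self.clone(), other.value()), (other.clone(), self.value())];
        out
    }

    /// Backward pass: topologically order the graph, then push gradients
    /// from output to inputs, accumulating at nodes used more than once.
    fn backward(&self) {
        let mut order: Vec<Var> = vec![];
        let mut seen: Vec<*const RefCell<Node>> = vec![];
        fn visit(v: &Var, seen: &mut Vec<*const RefCell<Node>>, order: &mut Vec<Var>) {
            let ptr = Rc::as_ptr(&v.0);
            if seen.contains(&ptr) { return; }
            seen.push(ptr);
            let parents = v.0.borrow().parents.clone();
            for (p, _) in parents {
                visit(&p, seen, order);
            }
            order.push(v.clone());
        }
        visit(self, &mut seen, &mut order);
        self.0.borrow_mut().grad = 1.0;
        for v in order.iter().rev() {
            let (g, parents) = {
                let n = v.0.borrow();
                (n.grad, n.parents.clone())
            };
            for (p, local) in parents {
                p.0.borrow_mut().grad += g * local; // gradient accumulation
            }
        }
    }
}

fn main() {
    // y = x * x + x  =>  dy/dx = 2x + 1 = 7 at x = 3
    let x = Var::new(3.0);
    let y = x.mul(&x).add(&x);
    y.backward();
    assert_eq!(y.value(), 12.0);
    assert_eq!(x.grad(), 7.0);
}
```

Note how `x` appears twice as a parent of the `mul` node: its gradient contributions are summed, which is exactly the accumulation behavior the backward pass needs for shared subexpressions.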
### Operations

- Creation — `zeros`, `ones`, `full`, `tensor`, `from_slice`, `from_vec`, `scalar`, `eye`, `arange`, `linspace`, `rand`, `randn`, and `*_like` variants
- Arithmetic (differentiable) — `add`, `sub`, `mul`, `div`, `neg`, `pow`, `sqrt`, `abs`, with broadcasting and operator overloading
- Transcendental — `exp`, `log`, `log2`, `log10`, `log1p`, `sin`, `cos`, `tan`, `asin`, `acos`, `atan`, `sinh`, `cosh`, `tanh`, `expm1`, `erf`, `erfc`
- Reductions — `sum`, `mean`, `prod`, `sum_dim`, `mean_dim`, `nansum`, `nanmean`, `logsumexp`, `logsumexp_dim`
- Cumulative scans (GPU-native PTX) — `cumsum`, `cumprod`, `cummax`, `cummin`, `logcumsumexp`
- Linear algebra — `mm`, `bmm`, `matmul`, `dot`, `cholesky`, `inv`, `lstsq`, `qr`, `svd`, `eig`, `solve`, with cuBLAS GPU dispatch
- Shape ops (zero-copy views) — `reshape`, `view`, `view_reshape`, `permute`, `transpose`, `narrow`, `squeeze`, `unsqueeze`, `flatten`, `expand`, `chunk`, `split`, `cat`, `stack`
- Indexing — `index_select`, `gather`, `scatter`, `scatter_add`, `masked_fill`, `masked_select`, `where_`, `nonzero`
- Search — `searchsorted`, `bucketize`, `unique`, `topk`, `meshgrid`
- Einops — `rearrange`, `repeat`, `reduce` with readable string patterns and zero-copy fast paths for identity permutations
- Einsum — differentiable Einstein summation
- Activations (differentiable) — `relu`, `gelu`, `silu`, `elu`, `mish`, `sigmoid`, `tanh`, `softmax`, `log_softmax`
- FFT — `fft`, `ifft`, `rfft`, `irfft`, `fft2`, `ifft2`
- Sparse — `SparseTensor` (COO format) with sparse arithmetic
- Quantization — INT8/INT4 per-tensor and per-channel
- Flexible attention — `flex_attention` with score-mod callbacks, composed from `bmm + softmax + cat` for full GPU dispatch
- Pruning — magnitude, structured, and random pruning utilities
- Vmap — vectorized map (in development)
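The broadcasting rule used by the arithmetic ops (and undone during broadcast gradient reduction) follows the usual NumPy/PyTorch convention: align shapes from the right, and each dimension pair must either match or have one side equal to 1. A self-contained sketch of the rule in plain Rust, with a hypothetical `broadcast_shape` helper (not ferrotorch's actual function):

```rust
/// NumPy/PyTorch-style broadcast of two shapes, or None if incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let n = a.len().max(b.len());
    let mut out = vec![0; n];
    for i in 0..n {
        // Read dims right-aligned; missing leading dims count as 1.
        let da = if i < n - a.len() { 1 } else { a[i - (n - a.len())] };
        let db = if i < n - b.len() { 1 } else { b[i - (n - b.len())] };
        out[i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None, // incompatible pair, e.g. 3 vs 4
        };
    }
    Some(out)
}

fn main() {
    assert_eq!(broadcast_shape(&[2, 3], &[3]), Some(vec![2, 3]));
    assert_eq!(broadcast_shape(&[4, 1, 5], &[3, 1]), Some(vec![4, 3, 5]));
    assert_eq!(broadcast_shape(&[2, 3], &[4]), None);
}
```

During the backward pass, gradients flowing into a broadcast input are summed over the dimensions that were expanded, so the gradient's shape matches the original operand.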
## Quick start

```rust
use ;
```
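The snippet above is truncated in this README. As a rough illustration only, usage might look like the following, built from the API names listed in this document; the import path, constructor signatures, and gradient-access method are assumptions, not the crate's confirmed API:

```rust
// Hypothetical sketch: names taken from the feature list above,
// import path and exact signatures unverified.
let x = Tensor::<f32>::randn(&[2, 3]);
let w = Tensor::<f32>::randn(&[3, 4]);
let y = x.matmul(&w).relu().sum();
y.backward();        // reverse-mode autodiff
let grad = x.grad(); // accumulated gradient w.r.t. x
```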
## Part of ferrotorch

This crate is one component of the ferrotorch workspace. See the workspace README for full documentation.
## License

MIT OR Apache-2.0