baracuda-cutensor
Safe Rust wrappers for NVIDIA cuTENSOR — high-performance tensor primitives with arbitrary index permutations. Useful when you need tensor operations beyond what cuBLAS / cuDNN expose (e.g. arbitrary contractions, reductions over non-contiguous axes).
Coverage
Comprehensive:
- Handle: cuTENSOR context with stream binding.
- Descriptors:
TensorDescriptor— extents, strides, dtype.OperationDescriptor— what op to perform (Contraction, Reduction, Elementwise, ...).ComputeDescriptor— accumulator dtype.PlanPreference— heuristic / search policy.Plan— finalized plan ready for execution.
- Op catalog:
Contraction— generalized matmul over arbitrary index sets.Reduction— per-axis reductions with arbitrary output layout.ElementwiseBinary/ElementwiseTrinary— fused elementwise ops.Permutation— pure index permutation (transpose generalization).BlockSparseContraction/TrinaryContraction— specialized variants.
- Plan-cache I/O: serialize / deserialize plan caches across runs.
Stack-size note
cuTENSOR's planner can blow a 1 MiB Windows stack during
cutensorCreatePlan. If you hit that, run plan creation on a thread
with a larger stack:
let result = new
.stack_size
.spawn
.unwrap
.join;
(This is what the workspace's cutensor_matmul example does.)
Pairs with baracuda-cutensor-sys for the raw FFI surface.
Part of the baracuda workspace.
License
Dual MIT / Apache-2.0.