baracuda-cutensor

Safe Rust wrappers for NVIDIA cuTENSOR — high-performance tensor primitives with arbitrary index permutations. Useful when you need tensor operations beyond what cuBLAS / cuDNN expose (e.g. arbitrary contractions, reductions over non-contiguous axes).

Coverage

Comprehensive:

Handle: cuTENSOR context with stream binding.
Descriptors:
- TensorDescriptor — extents, strides, dtype.
- OperationDescriptor — what op to perform (Contraction, Reduction, Elementwise, ...).
- ComputeDescriptor — accumulator dtype.
- PlanPreference — heuristic / search policy.
- Plan — finalized plan ready for execution.
Op catalog:
- Contraction — generalized matmul over arbitrary index sets.
- Reduction — per-axis reductions with arbitrary output layout.
- ElementwiseBinary / ElementwiseTrinary — fused elementwise ops.
- Permutation — pure index permutation (transpose generalization).
- BlockSparseContraction / TrinaryContraction — specialized variants.
Plan-cache I/O: serialize / deserialize plan caches across runs.

Stack-size note

cuTENSOR's planner can blow a 1 MiB Windows stack during cutensorCreatePlan. If you hit that, run plan creation on a thread with a larger stack:

let result = std::thread::Builder::new()
    .stack_size(32 * 1024 * 1024)
    .spawn(|| /* plan creation here */)
    .unwrap()
    .join();

(This is what the workspace's cutensor_matmul example does.)

Pairs with baracuda-cutensor-sys for the raw FFI surface.

Part of the baracuda workspace.

License

Dual MIT / Apache-2.0.

baracuda-cutensor 0.0.1-alpha.68

baracuda-cutensor

Coverage

Stack-size note

License