baracuda-cutensor 0.0.1-alpha.68

Safe Rust wrappers for NVIDIA cuTENSOR. Scaffolding at v0.1.
Documentation

baracuda-cutensor

Safe Rust wrappers for NVIDIA cuTENSOR — high-performance tensor primitives with arbitrary index permutations. Useful when you need tensor operations beyond what cuBLAS / cuDNN expose (e.g. arbitrary contractions, reductions over non-contiguous axes).

Coverage

Comprehensive:

  • Handle: cuTENSOR context with stream binding.
  • Descriptors:
    • TensorDescriptor — extents, strides, dtype.
    • OperationDescriptor — what op to perform (Contraction, Reduction, Elementwise, ...).
    • ComputeDescriptor — accumulator dtype.
    • PlanPreference — heuristic / search policy.
    • Plan — finalized plan ready for execution.
  • Op catalog:
    • Contraction — generalized matmul over arbitrary index sets.
    • Reduction — per-axis reductions with arbitrary output layout.
    • ElementwiseBinary / ElementwiseTrinary — fused elementwise ops.
    • Permutation — pure index permutation (transpose generalization).
    • BlockSparseContraction / TrinaryContraction — specialized variants.
  • Plan-cache I/O: serialize / deserialize plan caches across runs.

Stack-size note

cuTENSOR's planner can blow a 1 MiB Windows stack during cutensorCreatePlan. If you hit that, run plan creation on a thread with a larger stack:

let result = std::thread::Builder::new()
    .stack_size(32 * 1024 * 1024)
    .spawn(|| /* plan creation here */)
    .unwrap()
    .join();

(This is what the workspace's cutensor_matmul example does.)

Pairs with baracuda-cutensor-sys for the raw FFI surface.

Part of the baracuda workspace.

License

Dual MIT / Apache-2.0.