Safe Rust wrappers for NVIDIA cuTENSOR (v2 API).

cuTENSOR is NVIDIA's high-performance tensor-primitive library — einsum-style contractions, element-wise ops, reductions, and permutations. This crate wraps the full v2 host API surface.

Concepts

[Handle] — per-process library handle; owns the plan cache.
[TensorDescriptor] — shape + strides + dtype for one tensor.
[OperationDescriptor] — an un-compiled op (contraction, reduction, elementwise binary/trinary, permutation). Created via [Contraction::new], [Reduction::new], [ElementwiseBinary::new], [ElementwiseTrinary::new], or [Permutation::new].
[PlanPreference] — algorithm selection + JIT mode.
[Plan] — compiled op, bound to a workspace size.
[Plan::contract] / [Plan::reduce] / etc. — execute the plan.
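
To make the "shape + strides" part of [TensorDescriptor] concrete, here is a minimal plain-Rust sketch of the strides for a packed, generalized column-major layout, which is what cuTENSOR itself assumes when no strides are supplied. The helper `packed_strides` is hypothetical and not part of this crate. That passing `None` for the strides selects this layout is an assumption about the wrapper, not something stated above.

```rust
/// Hypothetical helper (not part of this crate): strides, in elements, for a
/// packed generalized column-major layout. The first mode is contiguous and
/// each later mode steps over the full extent of all earlier modes.
fn packed_strides(extents: &[i64]) -> Vec<i64> {
    let mut strides = Vec::with_capacity(extents.len());
    let mut step = 1i64;
    for &extent in extents {
        strides.push(step);
        step *= extent;
    }
    strides
}
```

For a tensor with extents `[m, k]`, the sketch yields strides `[1, m]`: element `(i, p)` sits at offset `i + p * m`.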

Example — `D = α * A ⊗ B + β * C` (matmul via contraction)

use baracuda_cutensor::*;
let handle = Handle::new()?;
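// Problem sizes: A is m×k, B is k×n, C and D are m×n.
let (m, n, k) = (64i64, 64i64, 32i64);
let a = TensorDescriptor::new(&handle, &[m, k], None, DataType::F32, 128)?;
let b = TensorDescriptor::new(&handle, &[k, n], None, DataType::F32, 128)?;
let c = TensorDescriptor::new(&handle, &[m, n], None, DataType::F32, 128)?;
// Integer mode labels: 0 = m, 1 = n, 2 = k. Mode 2 appears in A and B but not
// in the output, so it is the mode that gets contracted (summed over).
let modes_a = &[0i32, 2]; // [m, k]
let modes_b = &[2i32, 1]; // [k, n]
let modes_c = &[0i32, 1]; // [m, n]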
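// `c` is passed twice: once for the C operand and once for the output D,
// which share the same shape and mode labels in this example.
let op = unsafe {
    Contraction::new(&handle, &a, modes_a, &b, modes_b, &c, modes_c, &c, modes_c,
        core::ptr::null())
}?;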
let pref = PlanPreference::default_for(&handle)?;
let ws = op.estimate_workspace(&pref, WorkspaceKind::Default)?;
let plan = Plan::new(&op, &pref, ws)?;
# Result::<(), Error>::Ok(())
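
For checking results from a contraction like the one above, a CPU reference can be handy. The following is a minimal sketch in plain Rust that calls neither cuTENSOR nor this crate. The function name `contract_reference` is hypothetical, and the buffers are assumed to be packed column-major (element `(i, j)` of a matrix with `r` rows at offset `i + j * r`); adjust the indexing if your descriptors use explicit strides.

```rust
/// Hypothetical CPU reference (not part of this crate) for
/// D[m, n] = alpha * sum_k A[m, k] * B[k, n] + beta * C[m, n].
/// All buffers are assumed packed column-major.
fn contract_reference(
    m: usize,
    n: usize,
    k: usize,
    alpha: f32,
    a: &[f32],
    b: &[f32],
    beta: f32,
    c: &[f32],
) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    let mut d = vec![0.0f32; m * n];
    for j in 0..n {
        for i in 0..m {
            // Sum over the contracted mode k.
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i + p * m] * b[p + j * k];
            }
            d[i + j * m] = alpha * acc + beta * c[i + j * m];
        }
    }
    d
}
```

Comparing a device result against this reference with a small tolerance is a simple way to validate mode labels and layout choices on small problem sizes before scaling up.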

baracuda-cutensor 0.0.1-alpha.2
