Safe Rust wrappers for NVIDIA cuTENSOR (v2 API).
cuTENSOR is NVIDIA’s high-performance tensor-primitive library — einsum-style contractions, element-wise ops, reductions, and permutations. This crate wraps the full v2 host API surface.
§Concepts
- Handle: per-process library handle; owns the plan cache.
- TensorDescriptor: shape + strides + dtype for one tensor.
- OperationDescriptor: an un-compiled op (contraction, reduction, elementwise binary/trinary, permutation). Created via Contraction::new, Reduction::new, ElementwiseBinary::new, ElementwiseTrinary::new, or Permutation::new.
- PlanPreference: algorithm selection + JIT mode.
- Plan: compiled op, bound to a workspace size.
- Plan::contract / Plan::reduce / etc.: execute the plan.
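The cuTENSOR documentation for `cutensorCreateTensorDescriptor` says that a null stride array means a packed, generalized column-major layout. In that layout the first mode is contiguous and strides grow from left to right. This sketch assumes that `TensorDescriptor::new` forwards `None` as that null pointer, which is not confirmed here. Under that assumption, it computes the strides the default layout implies. It also computes the row-major strides that a C-ordered Rust buffer of the same shape would need. Both helper names are hypothetical and are not part of this crate.

```rust
/// Hypothetical helper: packed generalized column-major strides, in elements.
/// Mode 0 is contiguous; each later stride is the product of all earlier extents.
fn packed_col_major_strides(extents: &[i64]) -> Vec<i64> {
    let mut strides = Vec::with_capacity(extents.len());
    let mut running = 1i64;
    for &e in extents {
        strides.push(running);
        running *= e;
    }
    strides
}

/// Hypothetical helper: packed row-major (C-order) strides, in elements.
/// The last mode is contiguous; strides grow from right to left.
fn packed_row_major_strides(extents: &[i64]) -> Vec<i64> {
    let mut strides = vec![0i64; extents.len()];
    let mut running = 1i64;
    // Walk modes from last to first, accumulating the product of later extents.
    for (s, &e) in strides.iter_mut().zip(extents).rev() {
        *s = running;
        running *= e;
    }
    strides
}
```

Take a tensor with extents [m, k] = [64, 32], like A in the matmul example below. Its column-major strides are [1, 64]. A row-major buffer of the same shape would need [32, 1].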
§Example — D = α · A ⊗ B + β · C (matmul via contraction)
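Einstein notation (the repeated index k is summed): D[m,n] = α · A[m,k] · B[k,n] + β · C[m,n]. Mode IDs are arbitrary integer labels, one per distinct index. Giving A's second mode and B's first mode the same ID marks k as the shared, contracted index.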
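This plain-Rust CPU reference pins down what the contraction computes, for example when checking a GPU result on small inputs. It is a hypothetical helper, not part of this crate. It takes explicit element strides so it can follow whichever layout the descriptors use.

```rust
/// Rank-2 strides in elements: entry (i, j) lives at i * s[0] + j * s[1].
type Strides2 = [usize; 2];

fn at(s: Strides2, i: usize, j: usize) -> usize {
    i * s[0] + j * s[1]
}

/// Hypothetical CPU reference for
/// D[m,n] = alpha * sum_k A[m,k] * B[k,n] + beta * C[m,n].
/// D is returned in C's layout, mirroring the example where C and D share
/// one descriptor.
#[allow(clippy::too_many_arguments)]
fn contract_ref(
    (m, n, k): (usize, usize, usize),
    alpha: f32,
    a: &[f32],
    sa: Strides2,
    b: &[f32],
    sb: Strides2,
    beta: f32,
    c: &[f32],
    sc: Strides2,
) -> Vec<f32> {
    let mut d = c.to_vec();
    for i in 0..m {
        for j in 0..n {
            // k is shared by A and B but absent from D, so it is summed away.
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[at(sa, i, p)] * b[at(sb, p, j)];
            }
            d[at(sc, i, j)] = alpha * acc + beta * c[at(sc, i, j)];
        }
    }
    d
}
```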
```rust
use baracuda_cutensor::*;

let handle = Handle::new()?;
let (m, n, k) = (64i64, 64i64, 32i64);

// Arguments: extents, optional strides (None here), dtype, and (presumably)
// the alignment requirement in bytes.
let a = TensorDescriptor::new(&handle, &[m, k], None, DataType::F32, 128)?;
let b = TensorDescriptor::new(&handle, &[k, n], None, DataType::F32, 128)?;
let c = TensorDescriptor::new(&handle, &[m, n], None, DataType::F32, 128)?;

// Mode IDs: 0 = m, 1 = n, 2 = k. Mode 2 appears in A and B but not in C/D,
// so it is the contracted index.
let modes_a = &[0i32, 2]; // [m, k]
let modes_b = &[2, 1]; // [k, n]
let modes_c = &[0, 1]; // [m, n]

// C and D share one descriptor and mode list.
let op = unsafe {
    Contraction::new(
        &handle, &a, modes_a, &b, modes_b, &c, modes_c, &c, modes_c,
        core::ptr::null(),
    )
}?;

// Default preferences, workspace estimate, then compile the plan.
let pref = PlanPreference::default_for(&handle)?;
let ws = op.estimate_workspace(&pref, WorkspaceKind::Default)?;
let _plan = Plan::new(&op, &pref, ws)?;
```
§Example — reduce along an axis (sum over k)
D[m] = Σ_k A[m, k]. Modes present in A but absent from D are
reduced with the chosen BinaryOp (Add for sum).
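The fold described above can be sketched on the CPU. This is a hypothetical reference, not part of the crate. Any reduce op here is a binary function plus its identity value. Summing with `+` from 0 mirrors BinaryOp::Add, and a max reduction would fold with `f32::max` starting from negative infinity.

```rust
/// Hypothetical CPU reference for D[m] = reduce_k A[m, k]. Mode k is absent
/// from D, so for each fixed i every A[i, k] is folded into D[i] with `op`,
/// starting from `identity`. Strides are in elements: A[i, p] lives at
/// i * s[0] + p * s[1].
fn reduce_k<F>(a: &[f32], (m, k): (usize, usize), s: [usize; 2], identity: f32, op: F) -> Vec<f32>
where
    F: Fn(f32, f32) -> f32,
{
    (0..m)
        .map(|i| (0..k).fold(identity, |acc, p| op(acc, a[i * s[0] + p * s[1]])))
        .collect()
}
```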
```rust
use baracuda_cutensor::*;

let handle = Handle::new()?;
let (m, k) = (128i64, 64i64);
let a = TensorDescriptor::new(&handle, &[m, k], None, DataType::F32, 128)?;
let d = TensorDescriptor::new(&handle, &[m], None, DataType::F32, 128)?;
let modes_a = &[0i32, 1]; // [m, k]
let modes_d = &[0i32]; // [m]

// Mode 1 (k) is absent from D, so it is folded with BinaryOp::Add.
// `d` fills both the C and D operand slots.
let op = unsafe {
    Reduction::new(
        &handle, &a, modes_a, &d, modes_d, &d, modes_d,
        BinaryOp::Add, core::ptr::null(),
    )
}?;
let pref = PlanPreference::default_for(&handle)?;
let ws = op.estimate_workspace(&pref, WorkspaceKind::Default)?;
let _plan = Plan::new(&op, &pref, ws)?;
```
§Example — element-wise D = A + C via ElementwiseBinary
Every operand uses the same modes, so nothing is contracted or reduced. The result is a fused per-element op, with an optional unary op applied to each input.
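With UnaryOp::Identity on both inputs, BinaryOp::Add, and unit scaling, the general ElementwiseBinary formula D = (α * op_a(A)) op_ac (γ * op_c(C)) reduces to D = A + C. This CPU sketch evaluates that formula in the same order: unary op first, then scaling, then the combining op. It is a hypothetical reference in which closures stand in for UnaryOp and BinaryOp. It is not the crate's API.

```rust
/// Hypothetical CPU reference for
/// D[i] = (alpha * op_a(A[i])) op_ac (gamma * op_c(C[i])),
/// with all operands sharing the same modes and the same packed layout.
fn elementwise_binary_ref(
    alpha: f32,
    a: &[f32],
    op_a: impl Fn(f32) -> f32,
    gamma: f32,
    c: &[f32],
    op_c: impl Fn(f32) -> f32,
    op_ac: impl Fn(f32, f32) -> f32,
) -> Vec<f32> {
    assert_eq!(a.len(), c.len(), "operands must have identical extents");
    a.iter()
        .zip(c)
        .map(|(&x, &y)| op_ac(alpha * op_a(x), gamma * op_c(y)))
        .collect()
}
```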
```rust
use baracuda_cutensor::*;

let handle = Handle::new()?;
let n = 1024i64;
let a = TensorDescriptor::new(&handle, &[n], None, DataType::F32, 128)?;
let c = TensorDescriptor::new(&handle, &[n], None, DataType::F32, 128)?;
let d = TensorDescriptor::new(&handle, &[n], None, DataType::F32, 128)?;
// All three operands share the single mode 0.
let modes = &[0i32];
let op = unsafe {
    ElementwiseBinary::new(
        &handle,
        &a, modes, UnaryOp::Identity, // A, no pre-op
        &c, modes, UnaryOp::Identity, // C, no pre-op
        &d, modes,                    // output D
        BinaryOp::Add,                // op_ac: D = A + C
        core::ptr::null(),
    )
}?;
let pref = PlanPreference::default_for(&handle)?;
// This example compiles the plan with no workspace.
let _plan = Plan::new(&op, &pref, /* workspace */ 0)?;
```
§Structs
- BlockSparseContraction - Block-sparse contraction: the A operand is block-sparse; B, C, and D are dense.
- BlockSparseTensorDescriptor - A block-sparse tensor descriptor (cuTENSOR 2.x), used for the A operand of a BlockSparseContraction.
- ComputeDescriptor - A custom compute descriptor. Prefer the predefined ones (Handle::compute_desc_32f, …) unless you need to customize its attributes.
- Contraction - A contraction op: D[mD] = α * op_a(A[mA]) * op_b(B[mB]) + β * op_c(C[mC]).
- ElementwiseBinary - Elementwise binary op: D[mD] = (α * op_a(A[mA])) op_ac (γ * op_c(C[mC])).
- ElementwiseTrinary - Elementwise trinary op: D[mD] = ((α * op_a(A) op_ab β * op_b(B)) op_abc γ * op_c(C)).
- Handle - cuTENSOR library handle.
- OperationDescriptor - An un-compiled operation descriptor. Users typically create these through the constructors on Contraction, Reduction, ElementwiseBinary, ElementwiseTrinary, or Permutation.
- Permutation - Tensor permutation (axis shuffle + optional unary op): B[mB] = α * op_a(A[mA]).
- Plan - A compiled operation plan. Call the execute method that matches the kind of op that built it.
- PlanPreference - Plan preferences: algorithm selection + JIT mode.
- Reduction - A reduction op: D[mD] = reduce(A[mA]) with a user-chosen reduce op.
- TensorDescriptor - A tensor descriptor: number of modes + extents + dtype + stride layout. Mode labels are supplied per operand when an op is built.
- TrinaryContraction - A ternary contraction op: E[mE] = α·op_a(A)·op_b(B)·op_c(C) + β·op_d(D).
§Enums
- BinaryOp - Binary combining operator (used between operands in elementwise and reduction ops).
- DataType - Element dtype for tensor descriptors.
- UnaryOp - Per-operand unary operator (applied to A/B/C before the main op).
- WorkspaceKind - Workspace-size preference tier.
§Functions
- cudart_version - cuTENSOR's view of the CUDART version it was built against.
- force_disable_logging - Force-disable all cuTENSOR logging for the rest of the run.
- open_log_file - Direct cuTENSOR log output to the file at the given path.
- probe - Verify that cuTENSOR is loadable on this host.
- set_log_level - Set the cuTENSOR logger verbosity (0 = off, 1 = errors, 2 = performance trace; cuTENSOR defines further levels above 2).
- set_log_mask - Set the bitmask of enabled log categories (API calls, hints, traces, …). The full list of values is in the cuTENSOR headers.
- version - Encoded integer version from cutensorGetVersion. Decode as major = v / 10000, minor = (v / 100) % 100, patch = v % 100.
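The decoding rule for version is plain integer arithmetic. The helper below is hypothetical and takes a u64. The actual integer type returned by version() is not shown on this page, so a cast may be needed.

```rust
/// Hypothetical helper: split a cuTENSOR-style encoded version
/// (major * 10000 + minor * 100 + patch) into (major, minor, patch).
fn decode_version(v: u64) -> (u64, u64, u64) {
    (v / 10_000, (v / 100) % 100, v % 100)
}
```

For example, 20301 decodes to (2, 3, 1), that is, version 2.3.1.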