Crate baracuda_cutensor

Safe Rust wrappers for NVIDIA cuTENSOR (v2 API).

cuTENSOR is NVIDIA’s high-performance tensor-primitive library — einsum-style contractions, element-wise ops, reductions, and permutations. This crate wraps the full v2 host API surface.

§Concepts

Handle — per-process library handle; owns the plan cache.
TensorDescriptor — shape + strides + dtype for one tensor.
OperationDescriptor — an un-compiled op (contraction, reduction, elementwise binary/trinary, permutation). Created via Contraction::new, Reduction::new, ElementwiseBinary::new, ElementwiseTrinary::new, or Permutation::new.
PlanPreference — algorithm selection + JIT mode.
Plan — compiled op, bound to a workspace size.
Plan::contract / Plan::reduce / etc. — execute the plan.

§Example — `D = α · A ⊗ B + β · C` (matmul via contraction)

In Einstein notation: D[m,n] = α · Σ_k A[m,k] · B[k,n] + β · C[m,n]. Each mode ID labels one index; a mode that appears in both A and B but not in the output (here k) is summed over. Pick any distinct integer per mode.

use baracuda_cutensor::*;

let handle = Handle::new()?;
let m = 64i64; let n = 64i64; let k = 32i64;
let a = TensorDescriptor::new(&handle, &[m, k], None, DataType::F32, 128)?;
let b = TensorDescriptor::new(&handle, &[k, n], None, DataType::F32, 128)?;
let c = TensorDescriptor::new(&handle, &[m, n], None, DataType::F32, 128)?;
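// Mode IDs: m = 0, n = 1, k = 2. k appears in A and B only, so it is contracted.
let modes_a = &[0i32, 2]; // [m, k]
let modes_b = &[2, 1];     // [k, n]
let modes_c = &[0, 1];     // [m, n]
// `c` is passed twice: as the C input and as the D output (same shape and modes).
let op = unsafe {
    Contraction::new(&handle, &a, modes_a, &b, modes_b, &c, modes_c, &c, modes_c,
        core::ptr::null())
}?;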
let pref = PlanPreference::default_for(&handle)?;
let ws = op.estimate_workspace(&pref, WorkspaceKind::Default)?;
let plan = Plan::new(&op, &pref, ws)?;
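
The role of the mode IDs in the example above can be illustrated on the host, without touching the GPU. The following is a minimal sketch, not part of this crate: `contracted_modes` is a hypothetical helper that lists the modes shared by A and B but missing from the output, which are the ones a contraction sums over.

```rust
/// Hypothetical helper (not part of baracuda_cutensor): returns the mode IDs
/// that a contraction sums over, i.e. modes present in both A and B but
/// absent from the output C. Modes present in all three are batch modes and
/// are therefore not returned.
fn contracted_modes(modes_a: &[i32], modes_b: &[i32], modes_c: &[i32]) -> Vec<i32> {
    modes_a
        .iter()
        .copied()
        .filter(|m| modes_b.contains(m) && !modes_c.contains(m))
        .collect()
}
```

With the mode lists from the matmul example (`[0, 2]`, `[2, 1]`, `[0, 1]`), only mode `2` (the `k` index) qualifies.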

§Example — reduce along an axis (sum over `k`)

D[m] = Σ_k A[m, k]. Modes present in A but absent from D are reduced with the chosen BinaryOp (Add for sum).

use baracuda_cutensor::*;

let handle = Handle::new()?;
let m = 128i64; let k = 64i64;
let a = TensorDescriptor::new(&handle, &[m, k], None, DataType::F32, 128)?;
let d = TensorDescriptor::new(&handle, &[m],    None, DataType::F32, 128)?;

let modes_a = &[0i32, 1]; // [m, k]
let modes_d = &[0i32];     // [m]
let op = unsafe {
    Reduction::new(&handle, &a, modes_a, &d, modes_d, &d, modes_d,
        BinaryOp::Add, core::ptr::null())
}?;
let pref = PlanPreference::default_for(&handle)?;
let ws = op.estimate_workspace(&pref, WorkspaceKind::Default)?;
let _plan = Plan::new(&op, &pref, ws)?;
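
To check a reduction result against a known-good answer, a host-side reference can help. The sketch below is an illustration only and does not use this crate: `reduce_sum_over_k` is a hypothetical function that computes D[m] = Σ_k A[m, k] on a plain slice, with the element strides of A passed explicitly so that it works for either memory layout.

```rust
/// Hypothetical host-side reference (not part of baracuda_cutensor) for
/// `D[m] = sum over k of A[m, k]`.
/// `stride_m` and `stride_k` are the element strides of A along each mode.
fn reduce_sum_over_k(
    a: &[f32],
    m: usize,
    k: usize,
    stride_m: usize,
    stride_k: usize,
) -> Vec<f32> {
    (0..m)
        .map(|i| {
            (0..k)
                .map(|j| a[i * stride_m + j * stride_k])
                .sum::<f32>()
        })
        .collect()
}
```

For a 2 × 3 tensor stored row-major, the strides would be `(3, 1)`; stored column-major, they would be `(1, 2)`.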

§Example — element-wise `D = A + C` via `ElementwiseBinary`

Same modes on every operand, no contraction or reduction — just a fused per-element op with optional unary pre-ops on each input.

use baracuda_cutensor::*;

let handle = Handle::new()?;
let n = 1024i64;
let a = TensorDescriptor::new(&handle, &[n], None, DataType::F32, 128)?;
let c = TensorDescriptor::new(&handle, &[n], None, DataType::F32, 128)?;
let d = TensorDescriptor::new(&handle, &[n], None, DataType::F32, 128)?;

let modes = &[0i32];
let op = unsafe {
    ElementwiseBinary::new(
        &handle,
        &a, modes, UnaryOp::Identity,
        &c, modes, UnaryOp::Identity,
        &d, modes,
        BinaryOp::Add,
        core::ptr::null(),
    )
}?;
let pref = PlanPreference::default_for(&handle)?;
let _plan = Plan::new(&op, &pref, /* workspace */ 0)?;
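
The element-wise formula documented for ElementwiseBinary, D = (α * op_a(A)) op_ac (γ * op_c(C)), can be written out on the host as a reference. This is a minimal sketch under the assumption that all operands share the same extent; `elementwise_binary_ref` and its closure parameters are hypothetical and stand in for the crate's UnaryOp and BinaryOp values.

```rust
/// Hypothetical host-side reference (not part of baracuda_cutensor) for
/// `D = (alpha * op_a(A)) op_ac (gamma * op_c(C))`, applied per element.
fn elementwise_binary_ref(
    alpha: f32,
    a: &[f32],
    op_a: impl Fn(f32) -> f32,
    gamma: f32,
    c: &[f32],
    op_c: impl Fn(f32) -> f32,
    op_ac: impl Fn(f32, f32) -> f32,
) -> Vec<f32> {
    assert_eq!(a.len(), c.len(), "A and C must have the same extent");
    a.iter()
        .zip(c)
        .map(|(&x, &y)| op_ac(alpha * op_a(x), gamma * op_c(y)))
        .collect()
}
```

Passing identity closures for both unary ops and addition for `op_ac` corresponds to the `D = A + C` case when α = γ = 1.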

Structs§

BlockSparseContraction: Block-sparse contraction: the A operand is block-sparse, B/C/D dense.
BlockSparseTensorDescriptor: A block-sparse tensor descriptor (cuTENSOR 2.x). Used on the A operand of a BlockSparseContraction.
ComputeDescriptor: A custom compute descriptor. Prefer the pre-defined ones (Handle::compute_desc_32f, …) unless you need attribute customization.
Contraction: A contraction op: D[mD] = α * op_a(A[mA]) * op_b(B[mB]) + β * op_c(C[mC]).
ElementwiseBinary: Elementwise binary op: D[mD] = (α * op_a(A[mA])) op_ac (γ * op_c(C[mC])).
ElementwiseTrinary: Elementwise trinary op: D[mD] = ((α * op_a(A[mA])) op_ab (β * op_b(B[mB]))) op_abc (γ * op_c(C[mC])).
Handle: cuTENSOR library handle.
OperationDescriptor: An un-compiled operation descriptor. Users typically create these through constructors on Contraction, Reduction, ElementwiseBinary, ElementwiseTrinary, or Permutation.
Permutation: Tensor permutation (axis shuffle + optional unary op): B[mB] = α * op_a(A[mA]).
Plan: A compiled operation plan. Dispatch to the matching execute method based on the op kind that built it.
PlanPreference: Plan preferences — algorithm selection + JIT mode.
Reduction: A reduction op: D[mD] = reduce(A[mA]) with user-chosen reduce op.
TensorDescriptor: A tensor descriptor: modes + extents + dtype + stride layout.
TrinaryContraction: A ternary contraction op: E[mE] = α·op_a(A)·op_b(B)·op_c(C) + β·op_d(D).
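
The examples on this page pass `None` for the strides of every TensorDescriptor. cuTENSOR's C API documents a packed, generalized column-major layout when strides are omitted; whether `None` in this crate maps to that default is an assumption here. Under that assumption, the implied strides could be computed as in the sketch below, where `packed_col_major_strides` is a hypothetical helper and not part of the crate.

```rust
/// Hypothetical helper (not part of baracuda_cutensor): element strides of a
/// packed, generalized column-major tensor. The first mode is contiguous and
/// each later stride is the product of all earlier extents.
fn packed_col_major_strides(extents: &[i64]) -> Vec<i64> {
    let mut strides = Vec::with_capacity(extents.len());
    let mut running = 1i64;
    for &extent in extents {
        strides.push(running);
        running *= extent;
    }
    strides
}
```

For the `[m, k] = [64, 32]` operand in the matmul example, this gives strides `[1, 64]`.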

Enums§

BinaryOp: Binary combining operator (used between operands in elementwise / reduction ops).
DataType: Element dtype for tensor descriptors.
UnaryOp: Per-operand unary operator (applied to A/B/C before the main op).
WorkspaceKind: Workspace-size preference tier.

Functions§

cudart_version: cuTENSOR’s view of the CUDART version it was built against.
force_disable_logging: Force-disable all cuTENSOR logging (the strictest quiet setting).
open_log_file: Open a log file at the given path for cuTENSOR output.
probe: Verify cuTENSOR is loadable on this host.
set_log_level: Set the cuTENSOR logger verbosity (0 = off, 1 = errors, 2 = performance trace; cuTENSOR defines further levels up to 5 = API trace).
set_log_mask: Bitmask of log categories (API calls, hints, traces, …). Full value list in cuTENSOR headers.
version: Encoded integer version from cutensorGetVersion. Decode as major = v / 10000, minor = (v / 100) % 100, patch = v % 100.
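
The decoding rule given for `version` can be written as a small helper. This is a sketch only: `decode_version` is a hypothetical function, and it assumes the encoded value is available as a `u32` (the actual return type of `version` may differ).

```rust
/// Hypothetical helper (not part of baracuda_cutensor): splits an encoded
/// cuTENSOR version `major * 10000 + minor * 100 + patch` into its parts.
fn decode_version(v: u32) -> (u32, u32, u32) {
    let major = v / 10000;
    let minor = (v / 100) % 100;
    let patch = v % 100;
    (major, minor, patch)
}
```

For instance, an encoded value of `20301` decodes to major 2, minor 3, patch 1.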

Type Aliases§

Error: Error type for cuTENSOR operations.
Result: Result alias.