Crate baracuda_cudnn

Safe Rust wrappers for NVIDIA cuDNN.

Layered on top of baracuda-cudnn-sys. Use this crate directly for typed, RAII-managed cuDNN handles + descriptors; reach for -sys only when adding a function the safe layer doesn’t expose yet.
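
To illustrate what "RAII-managed" means in practice, here is a minimal sketch of the pattern. It is not this crate's code: `mock_create`, `mock_destroy`, `LIVE_HANDLES`, and `MockHandle` are hypothetical stand-ins for a raw create/destroy pair in a -sys layer, and a real wrapper would hold an opaque cuDNN pointer rather than an integer.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for the raw FFI layer: tracks how many handles are live.
static LIVE_HANDLES: AtomicUsize = AtomicUsize::new(0);

// Toy "create" call: bumps the live count and returns it as a toy id.
fn mock_create() -> usize {
    LIVE_HANDLES.fetch_add(1, Ordering::SeqCst) + 1
}

// Toy "destroy" call: releases one handle.
fn mock_destroy(_id: usize) {
    LIVE_HANDLES.fetch_sub(1, Ordering::SeqCst);
}

/// Hypothetical owned handle: acquired in `new`, released in `Drop`.
pub struct MockHandle {
    id: usize,
}

impl MockHandle {
    pub fn new() -> Self {
        MockHandle { id: mock_create() }
    }

    pub fn id(&self) -> usize {
        self.id
    }
}

impl Drop for MockHandle {
    // Runs when the value goes out of scope, so callers never
    // have to pair create/destroy calls by hand.
    fn drop(&mut self) {
        mock_destroy(self.id);
    }
}

/// Number of handles currently alive (for demonstration only).
pub fn live_handles() -> usize {
    LIVE_HANDLES.load(Ordering::SeqCst)
}
```

Tying the destroy call to `Drop` means an early return or a `?` in the middle of descriptor setup cannot leak the underlying resource.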

§Scope

Covers the cuDNN classic API surface that baracuda-kernels’s Phase 7+ Conv2d / Pool2d / CTCLoss / BatchNorm / GroupNorm plans and the Phase 11 Conv1d/3d/Transpose/depthwise + Pool1d/3d/Adaptive fanout dispatch through. Concretely:

Handle management + stream binding.
Tensor / filter / convolution / pooling / activation / batch-norm / RNN / dropout / op-tensor / reduce-tensor / LRN / SpatialTransform / Attn descriptors.
Conv2d / Conv1d / Conv3d (FW + BW data + BW weight) with all algo enums.
Pool2d / Pool1d / Pool3d (Avg + Max, deterministic + non-det).
BatchNorm FW training/inference + BW + persistent mode.
LRN, Softmax (classic — modern softmax is bespoke in baracuda-kernels).
CTC loss FW + BW (the cuDNN path; bespoke baracuda-kernels::CtcLossPlan covers the non-cuDNN path).
Op-tensor + reduce-tensor (gluing primitives for fused ops).
RNN classic API (cells, sequences, persistent).
DropoutDescriptor + state management.
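
For the convolution and pooling entries above, the output extent along each spatial axis follows the standard formula `(in + 2*pad - dilation*(kernel-1) - 1) / stride + 1`, with dilation fixed at 1 for pooling. The helper below is a plain-Rust sketch of that arithmetic for sizing buffers. `out_extent` is a hypothetical name, not a function of this crate, and the values cuDNN derives from its own descriptors remain authoritative.

```rust
/// Output extent along one spatial axis.
/// Returns `None` if a parameter is zero where it must be positive,
/// or if the (dilated) window does not fit in the padded input.
pub fn out_extent(
    input: usize,
    kernel: usize,
    pad: usize,
    stride: usize,
    dilation: usize,
) -> Option<usize> {
    if stride == 0 || kernel == 0 || dilation == 0 {
        return None;
    }
    // Span covered by the kernel once dilation gaps are included.
    let effective = dilation * (kernel - 1) + 1;
    let padded = input + 2 * pad;
    if padded < effective {
        return None;
    }
    // Integer division floors, matching the usual convention.
    Some((padded - effective) / stride + 1)
}
```

For example, a 7-wide kernel with padding 3 and stride 2 over a 224-wide input gives 112, and a 2-wide stride-2 pooling window over 112 gives 56.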

The cuDNN backend / graph API (the modern fusion API) is NOT wrapped here beyond the thin BackendDescriptor listed under Structs. baracuda-kernels builds bespoke fused kernels directly via baracuda-kernels-sys for the ops where graph-API fusion would be the win, so the maintenance cost of a second, duplicate fusion path through the graph API hasn’t been justified yet.

§Build requirement

cuDNN is a separate NVIDIA download and is not bundled with the stock CUDA toolkit. The baracuda-kernels-sys build script auto-discovers it via the CUDNN_PATH / CUDNN_ROOT / CUDNN_HOME env vars or the standard Windows / Linux install paths; see the workspace README.md “Building” section for the full probe order.
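
As a rough sketch of the env-var part of that discovery, the helper below returns the first non-empty value among the three variables. It assumes they are probed in the order the text lists them, which the README may refine, and `cudnn_root_from_env` and `CUDNN_ENV_VARS` are hypothetical names rather than the build script's own. The lookup is passed in as a closure so the logic can be exercised without touching the real environment.

```rust
use std::path::PathBuf;

/// Env vars named in the text, in the order listed there (assumed probe order).
pub const CUDNN_ENV_VARS: [&str; 3] = ["CUDNN_PATH", "CUDNN_ROOT", "CUDNN_HOME"];

/// Returns the first variable that is set to a non-empty value,
/// together with its value as a path.
pub fn cudnn_root_from_env<F>(lookup: F) -> Option<(&'static str, PathBuf)>
where
    F: Fn(&str) -> Option<String>,
{
    CUDNN_ENV_VARS.iter().find_map(|&name| {
        lookup(name)
            // Treat an empty or whitespace-only value as "not set".
            .filter(|value| !value.trim().is_empty())
            .map(|value| (name, PathBuf::from(value)))
    })
}
```

In a real build script the closure would typically be `|k| std::env::var(k).ok()`, with the standard install paths tried only when this returns `None`.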

Structs§

ActivationDescriptor: An activation descriptor.
AttnDescriptor: Multi-head attention descriptor.
BackendDescriptor: Thin wrapper over a cudnnBackendDescriptor_t. Used to build Graph-API operation graphs and execution plans. Callers set attributes with BackendDescriptor::set_attribute_raw using the constants in baracuda_cudnn_sys::cudnnBackendAttributeName_t / baracuda_cudnn_sys::cudnnBackendAttributeType_t.
BwdDataAlgoPerf: Per-algorithm performance record returned by the backward-data convolution finders. Mirrors cudnnConvolutionBwdDataAlgoPerf_t.
BwdFilterAlgoPerf: Per-algorithm performance record returned by the backward-filter convolution finders. Mirrors cudnnConvolutionBwdFilterAlgoPerf_t.
ConvolutionDescriptor: Convolution descriptor: padding, stride, dilation, and compute dtype.
CtcLossDescriptor: CTC (Connectionist Temporal Classification) loss descriptor.
DropoutDescriptor: A dropout descriptor: dropout probability + RNG state buffer.
FilterDescriptor: N × C × H × W 4-D filter.
FwdAlgoPerf: Per-algorithm performance record (one result row) returned by the forward-convolution finders, cudnnFindConvolutionForwardAlgorithm / cudnnGetConvolutionForwardAlgorithm_v7.
Handle: cuDNN context handle.
LrnDescriptor: Local Response Normalization descriptor: window size + α / β / k coefficients.
OpTensorDescriptor: An op-tensor descriptor: binary element-wise op + compute dtype.
PoolingDescriptor: A pooling descriptor: pooling mode, window extent, padding, and stride.
ReduceTensorDescriptor: A reduce-tensor descriptor: reduction op + compute dtype.
RnnDataDescriptor: Owned RNN-data descriptor used by the v8 RNN forward / backward path.
RnnDescriptor: Owned RNN descriptor.
SeqDataDescriptor: Sequence-data descriptor used by multi-head attention.
SpatialTransformerDescriptor: Spatial-transformer descriptor: sampler kind + output shape.
TensorDescriptor: A 4-D tensor descriptor.

Enums§

ActivationMode: Activation function kind.
BackendAttrName: Re-export of the enum mirroring cudnnBackendAttributeName_t, so callers don’t have to reach into the sys crate.
BackendAttrType: Re-export of the enum mirroring cudnnBackendAttributeType_t, so callers don’t have to reach into the sys crate.
BackendDescType: Re-export of the enum mirroring cudnnBackendDescriptorType_t, so callers don’t have to reach into the sys crate.
BatchNormMode: Batch-normalization parameter sharing pattern.
BnOp: Optional fused op for the *Ex BatchNorm variants.
BwdDataAlgo: Backward-data convolution algorithm selector.
BwdFilterAlgo: Backward-filter convolution algorithm selector.
ConvMode: Convolution mathematical mode.
DType: Element dtype for a tensor.
FwdAlgo: Forward-convolution algorithm selector. Gemm is the most broadly supported; ImplicitPrecompGemm / Winograd are faster where applicable.
MathType: Math-type selector for ConvolutionDescriptor::set_math_type — controls tensor-core eligibility.
NormAlgo: Generic-normalization kernel selector.
NormMode: Generic-normalization parameter sharing pattern (cuDNN 8+).
NormOp: Optional fused op for the generic-normalization API.
OpTensorOp: Element-wise op for OpTensorDescriptor / op_tensor.
PoolingMode: Pooling reduction kind.
RawMathType: Re-export for callers that want raw type access: the math type for a convolution descriptor, which controls tensor-core usage.
RawReorderType: Re-export for callers that want raw type access: the filter / bias reorder selector for INT8 quantized inference.
ReduceOp: Reduction op for ReduceTensorDescriptor / reduce_tensor.
ReorderType: Filter / bias reorder selector for INT8 quantized inference paths.
SoftmaxAlgo: Numerical softmax algorithm.
SoftmaxMode: Axis the softmax normalizes over.
TensorFormat: Memory layout for a 4-D tensor.

Traits§

CudnnDataType: Trait mapping Rust element types to their cuDNN DType tag.
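
The idea of mapping Rust element types to a dtype tag can be sketched with an associated constant. Everything below is hypothetical and only illustrates the pattern: `Tag`, `ElemType`, and `tag_of` are invented names, and the real `CudnnDataType` trait and `DType` enum may differ in their items and supported types.

```rust
/// Hypothetical dtype tag; the crate's real `DType` has its own variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    F32,
    F64,
    I32,
    I8,
}

/// Hypothetical trait: each element type names its tag as an associated const.
pub trait ElemType: Copy {
    const TAG: Tag;
}

impl ElemType for f32 {
    const TAG: Tag = Tag::F32;
}
impl ElemType for f64 {
    const TAG: Tag = Tag::F64;
}
impl ElemType for i32 {
    const TAG: Tag = Tag::I32;
}
impl ElemType for i8 {
    const TAG: Tag = Tag::I8;
}

/// Derives the tag from the slice's element type, so a caller
/// cannot pass a buffer together with a mismatched dtype value.
pub fn tag_of<T: ElemType>(_data: &[T]) -> Tag {
    T::TAG
}
```

With this shape, a generic function can take `&[T]` and obtain the matching tag at compile time instead of trusting a separately supplied enum value.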

Functions§

activation_backward: dx = alpha * activation_backward(y, dy, x) + beta * dx.
activation_forward: Compute y = alpha * activation(x) + beta * y element-wise.
add_tensor: C = alpha * A + beta * C with broadcast. Useful for adding a per-channel bias to a feature map.
batch_normalization_backward: BN backward — matched with batch_normalization_forward_training.
batch_normalization_backward_ex: BN backward matching batch_normalization_forward_training_ex.
batch_normalization_backward_ex_workspace_size: Workspace bytes for batch_normalization_backward_ex.
batch_normalization_forward_inference: Inference-time BN forward: uses pre-computed running statistics (no state update). Use after model training is complete.
batch_normalization_forward_training: Training-time BN forward: updates running statistics and returns saved mean / inv_variance for use by batch_normalization_backward.
batch_normalization_forward_training_ex: BN training forward with optional fused activation / residual add.
batch_normalization_forward_training_ex_workspace_size: Workspace bytes for batch_normalization_forward_training_ex.
batch_normalization_training_ex_reserve_space_size: Reserve-space bytes for the *Ex BatchNorm pair.
build_rnn_dynamic: Finalize an RNN descriptor for a specific minibatch size.
convolution_backward_bias: Compute the bias gradient: the sum of dY over the batch and spatial dims, giving one value per channel.
convolution_backward_data: dX = alpha * conv_bwd_data(W, dY) + beta * dX.
convolution_backward_data_workspace_size: Workspace bytes required to run convolution_backward_data with the given algo and descriptors.
convolution_backward_filter: dW = alpha * conv_bwd_filter(X, dY) + beta * dW.
convolution_backward_filter_workspace_size: Workspace bytes required to run convolution_backward_filter with the given algo and descriptors.
convolution_bias_activation_forward: Fused convolution + bias + activation forward: Y = activation(alpha1 * conv(X, W) + alpha2 * Z + bias). Z may alias Y for in-place residual add.
convolution_forward: Y = alpha * conv(X, W) + beta * Y (forward pass).
convolution_forward_workspace_size: Query the minimum workspace (bytes) required to run algo with the given tensor / filter / conv descriptors.
ctc_loss: CTC (Connectionist Temporal Classification) loss.
ctc_loss_workspace_size: Bytes of scratch workspace needed for ctc_loss.
dropout_backward: Backward dropout: replays the mask saved in reserve to produce dx from dy. reserve must be the exact buffer populated by the matching dropout_forward call.
dropout_forward: Apply dropout to x, writing scaled survivors to y and the keep/drop mask into reserve for the matching backward call.
dropout_reserve_size: Size in bytes of the reserve buffer required for dropout on x.
dropout_states_size: Size in bytes of the state buffer required for a dropout RNG.
find_convolution_forward_algorithm: Run all candidate forward-convolution algorithms and return measured runtimes.
get_convolution_backward_data_algorithm: Heuristic-pick backward-data convolution algorithms.
get_convolution_backward_filter_algorithm: Heuristic-pick backward-filter convolution algorithms.
get_convolution_forward_algorithm: Heuristic-pick the top-N forward-convolution algorithms (cheap; doesn’t run them).
get_multi_head_attn_weights^⚠: Look up the descriptor of one of the (Q/K/V/O) weight matrices inside the packed weights buffer.
lrn_cross_channel_backward: Cross-channel LRN backward.
multi_head_attn_backward_data^⚠: Multi-head attention backward — data path (gradients w.r.t. Q/K/V).
multi_head_attn_backward_weights^⚠: Multi-head attention backward — weights path (gradient w.r.t. Q/K/V/O projection weights). Pass add_grad = true to accumulate into dweights (typical for multi-step training).
multi_head_attn_buffers: Buffer requirements (weights, work_space, reserve_space).
multi_head_attn_forward^⚠: Forward multi-head attention. The huge parameter list mirrors cuDNN’s cudnnMultiHeadAttnForward exactly; see the cuDNN reference for the meaning of each window / sequence-length array.
normalization_backward: Backward generic normalization.
normalization_backward_workspace_size: Workspace bytes for normalization_backward.
normalization_forward_inference: Inference-time generic normalization.
normalization_forward_training: Training-time forward generic normalization.
normalization_forward_training_workspace_size: Workspace bytes for normalization_forward_training.
normalization_training_reserve_space_size: Reserve-space bytes for the training fwd/bwd pair.
op_tensor: C = op(alpha1 * A, alpha2 * B) + beta * C element-wise.
pooling_backward: dX = alpha * pool_backward(Y, dY, X) + beta * dX.
pooling_forward: Y = alpha * pool(X) + beta * Y (forward pass).
reduce_tensor: C = alpha * reduce(A) + beta * C, reducing over each axis where C’s extent is 1 and keeping the axes where C’s extent matches A’s.
reduction_indices_size: Bytes of indices buffer required for index-returning reductions.
reorder_filter_and_bias^⚠: Pre-process filter / bias buffers for INT8 inference.
rnn_backward_data_v8^⚠: RNN backward — data path (gradients w.r.t. inputs and initial states).
rnn_backward_weights_v8^⚠: RNN backward — weights path (gradients w.r.t. the weight space). add_grad = true accumulates into dweight_space (typical for multi-step training); false overwrites.
rnn_forward^⚠: Forward pass of an RNN built via RnnDescriptor::set_v8 / build_rnn_dynamic. Pass fwd_mode = 0 for inference (no reserve space writes), 1 for training.
rnn_temp_space_sizes: Returns (work_space_size, reserve_space_size). fwd_mode = 0 for inference, 1 for training.
rnn_weight_space_size: Bytes the RNN’s weight space needs.
softmax_backward: dX = alpha * softmax_backward(Y, dY) + beta * dX.
softmax_forward: Y = alpha * softmax(X, algo, mode) + beta * Y.
spatial_tf_grid_generator: Compute the sampling grid from the affine transform theta.
spatial_tf_sampler: Bilinearly sample x at grid points to produce y.
version: cuDNN library version as a packed integer (e.g. 90106 for 9.1.6; releases before cuDNN 9 use the shorter packing, e.g. 8907 for 8.9.7).
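
Most of the compute functions above share one scaling convention, `out = alpha * f(in) + beta * out`. The CPU reference below sketches it for an element-wise op. `blend_map` and `relu` are hypothetical helpers, not part of this crate. The beta == 0 branch follows cuDNN's documented behaviour of not reading the prior output in that case; whether every wrapper here preserves it is an assumption.

```rust
/// Element-wise reference for `out[i] = alpha * f(input[i]) + beta * out[i]`.
pub fn blend_map(
    alpha: f32,
    input: &[f32],
    beta: f32,
    out: &mut [f32],
    f: impl Fn(f32) -> f32,
) {
    assert_eq!(input.len(), out.len(), "input and output lengths must match");
    for (o, &x) in out.iter_mut().zip(input) {
        if beta == 0.0 {
            // Prior contents are ignored, so stale NaNs cannot leak through.
            *o = alpha * f(x);
        } else {
            *o = alpha * f(x) + beta * *o;
        }
    }
}

/// ReLU, used here as a sample activation.
pub fn relu(x: f32) -> f32 {
    x.max(0.0)
}
```

Passing `alpha = 1, beta = 0` overwrites the output, while `beta = 1` accumulates into it, which is how gradient accumulation across several calls is usually expressed.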
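
A packed version integer can be split back into its parts. The sketch below assumes the value is what cuDNN itself reports, where NVIDIA's headers use `major * 1000 + minor * 100 + patch` up to cuDNN 8 and `major * 10000 + minor * 100 + patch` from cuDNN 9 on. `decode_cudnn_version` is a hypothetical helper, not a function of this crate.

```rust
/// Splits a packed cuDNN version into (major, minor, patch).
/// Assumption: values of 90000 and above use the cuDNN 9+ packing,
/// smaller values use the older packing.
pub fn decode_cudnn_version(packed: u32) -> (u32, u32, u32) {
    if packed >= 90_000 {
        // cuDNN 9+: major * 10000 + minor * 100 + patch
        (packed / 10_000, (packed / 100) % 100, packed % 100)
    } else {
        // cuDNN 8 and earlier: major * 1000 + minor * 100 + patch
        (packed / 1_000, (packed / 100) % 10, packed % 100)
    }
}
```

A caller could use the major component to gate code paths that only exist in newer cuDNN releases.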

Type Aliases§

Error: Error type for cuDNN operations.
Result: Result alias.