Safe Rust wrappers for NVIDIA cuDNN.
Layered on top of baracuda-cudnn-sys.
Use this crate directly for typed, RAII-managed cuDNN handles +
descriptors; reach for -sys only when adding a function the safe
layer doesn’t expose yet.
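As a rough illustration of what "RAII-managed" means here, the sketch below wraps a raw handle in an owning type whose `Drop` impl releases it. Everything in it is hypothetical: `MockHandle`, `mock_create`, and `mock_destroy` are stand-ins invented for this example, not the crate's `Handle` or the real baracuda-cudnn-sys bindings.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-ins for a raw FFI create/destroy pair. They only
// count live handles; no cuDNN call is made anywhere in this sketch.
static LIVE_HANDLES: AtomicUsize = AtomicUsize::new(0);

fn mock_create() -> usize {
    // Bump the live count and return a fake, non-zero "pointer" value.
    LIVE_HANDLES.fetch_add(1, Ordering::SeqCst) + 1
}

fn mock_destroy(_raw: usize) {
    LIVE_HANDLES.fetch_sub(1, Ordering::SeqCst);
}

/// Owning wrapper: the raw handle is released exactly once, when the
/// wrapper goes out of scope.
pub struct MockHandle {
    raw: usize,
}

impl MockHandle {
    pub fn new() -> Self {
        MockHandle { raw: mock_create() }
    }

    /// Escape hatch for code that still needs the raw value.
    pub fn as_raw(&self) -> usize {
        self.raw
    }
}

impl Drop for MockHandle {
    fn drop(&mut self) {
        mock_destroy(self.raw);
    }
}

/// Number of mock handles currently alive.
pub fn live_handles() -> usize {
    LIVE_HANDLES.load(Ordering::SeqCst)
}
```

With this shape, a caller never writes an explicit destroy call: creating a `MockHandle` inside a block and leaving the block returns `live_handles()` to its previous value.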
§Scope
Covers the classic cuDNN API surface that baracuda-kernels
dispatches through: the Phase 7+ Conv2d / Pool2d / CTCLoss /
BatchNorm / GroupNorm plans and the Phase 11
Conv1d/3d/Transpose/depthwise + Pool1d/3d/Adaptive fanout. Concretely:
- Handle management + stream binding.
- Tensor / filter / convolution / pooling / activation / batch-norm / RNN / dropout / op-tensor / reduce-tensor / LRN / SpatialTransform / Attn descriptors.
- Conv2d / Conv1d / Conv3d (FW + BW data + BW weight) with all algo enums.
- Pool2d / Pool1d / Pool3d (Avg + Max, deterministic + non-det).
- BatchNorm FW training/inference + BW + persistent mode.
- LRN, Softmax (classic; modern softmax is bespoke in baracuda-kernels).
- CTC loss FW + BW (the cuDNN path; bespoke baracuda-kernels::CtcLossPlan covers the non-cuDNN path).
- Op-tensor + reduce-tensor (gluing primitives for fused ops).
- RNN classic API (cells, sequences, persistent).
- DropoutDescriptor + state management.
The cuDNN backend / graph API (the modern fusion API) is NOT
wrapped here. baracuda-kernels builds bespoke fused kernels
directly via baracuda-kernels-sys for the ops where graph-API
fusion would be the win, so the maintenance cost of also wrapping
the graph API hasn’t been justified yet.
§Build requirement
cuDNN is a separate NVIDIA download, not bundled with the stock
CUDA toolkit. The baracuda-kernels-sys build script auto-discovers
it via the CUDNN_PATH / CUDNN_ROOT / CUDNN_HOME env vars or the
standard Windows / Linux install paths; see the “Building” section
of the workspace README.md for the full probe order.
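The env-var half of that discovery could look roughly like the sketch below. It is not the actual build script: `probe_cudnn_root` is a hypothetical helper, and trying the variables in the order they are named above is an assumption (the README holds the real probe order). Fallback to standard install paths is left out.

```rust
use std::path::PathBuf;

/// Env vars named in the section above. Trying them in this order is an
/// assumption made for the sketch, not the documented probe order.
const CUDNN_ENV_VARS: [&str; 3] = ["CUDNN_PATH", "CUDNN_ROOT", "CUDNN_HOME"];

/// Return the first non-empty value among the cuDNN env vars.
/// `lookup` abstracts over `std::env::var` so the logic can be exercised
/// without touching the process environment.
pub fn probe_cudnn_root<F>(lookup: F) -> Option<PathBuf>
where
    F: Fn(&str) -> Option<String>,
{
    CUDNN_ENV_VARS
        .iter()
        .filter_map(|name| lookup(*name))
        .find(|value| !value.trim().is_empty())
        .map(PathBuf::from)
}

/// Convenience wrapper that reads the real process environment.
pub fn probe_cudnn_root_from_env() -> Option<PathBuf> {
    probe_cudnn_root(|name| std::env::var(name).ok())
}
```

Passing the lookup as a closure keeps the precedence rule testable: a closure that answers only for CUDNN_ROOT and CUDNN_HOME yields the CUDNN_ROOT value, and an empty CUDNN_PATH is skipped rather than treated as a hit.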
Structs§
- ActivationDescriptor - An activation descriptor.
- AttnDescriptor - Multi-head attention descriptor.
- BackendDescriptor - Thin wrapper over a cudnnBackendDescriptor_t. Used to build Graph-API operation graphs and execution plans. Callers set attributes with BackendDescriptor::set_attribute_raw using the constants in baracuda_cudnn_sys::cudnnBackendAttributeName_t / baracuda_cudnn_sys::cudnnBackendAttributeType_t.
- BwdDataAlgoPerf - Per-algorithm performance record returned by the backward-data convolution finders. Mirrors cudnnConvolutionBwdDataAlgoPerf_t.
- BwdFilterAlgoPerf - Per-algorithm performance record returned by the backward-filter convolution finders. Mirrors cudnnConvolutionBwdFilterAlgoPerf_t.
- ConvolutionDescriptor - Convolution descriptor: padding, stride, dilation, and compute dtype.
- CtcLossDescriptor - CTC (Connectionist Temporal Classification) loss descriptor.
- DropoutDescriptor - A dropout descriptor: dropout probability + RNG state buffer.
- FilterDescriptor - N × C × H × W 4-D filter.
- FwdAlgoPerf - Per-algorithm performance record returned by the forward-convolution finders. Result row from cudnnFindConvolutionForwardAlgorithm / cudnnGetConvolutionForwardAlgorithm_v7.
- Handle - cuDNN context handle.
- LrnDescriptor - Local Response Normalization descriptor: window size + α / β / k coefficients.
- OpTensorDescriptor - An op-tensor descriptor: binary element-wise op + compute dtype.
- PoolingDescriptor - A pooling descriptor: pooling mode, window extent, padding, and stride.
- ReduceTensorDescriptor - A reduce-tensor descriptor: reduction op + compute dtype.
- RnnDataDescriptor - Owned RNN-data descriptor used by the v8 RNN forward / backward path.
- RnnDescriptor - Owned RNN descriptor.
- SeqDataDescriptor - Sequence-data descriptor used by multi-head attention.
- SpatialTransformerDescriptor - Spatial-transformer descriptor: sampler kind + output shape.
- TensorDescriptor - A 4-D tensor descriptor.
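The padding, stride, and dilation stored in a convolution descriptor determine the output extent along each spatial axis. The sketch below works through the standard formula that cuDNN documents for this, out = 1 + (in + 2·pad − (dilation·(kernel − 1) + 1)) / stride with integer division. `conv_output_dim` is a hypothetical free function written for this example; it is not part of this crate.

```rust
/// Output extent along one spatial axis of a convolution.
///
/// Returns `None` when the arguments are degenerate (zero stride, dilation,
/// or kernel) or when the dilated kernel does not fit in the padded input.
pub fn conv_output_dim(
    input: usize,
    kernel: usize,
    pad: usize,
    stride: usize,
    dilation: usize,
) -> Option<usize> {
    if stride == 0 || dilation == 0 || kernel == 0 {
        return None;
    }
    // A dilated kernel spans dilation * (kernel - 1) + 1 input elements.
    let effective_kernel = dilation * (kernel - 1) + 1;
    let padded = input + 2 * pad;
    if padded < effective_kernel {
        return None;
    }
    // Integer division floors, matching the usual convolution arithmetic.
    Some((padded - effective_kernel) / stride + 1)
}
```

For instance, a 7-wide kernel with padding 3 and stride 2 maps an extent of 224 to 112, and a 3-wide kernel with padding 1 and stride 1 preserves the input extent.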
Enums§
- ActivationMode - Activation function kind.
- BackendAttrName - Re-exported so callers don’t have to reach into the sys crate. Enum mirroring cudnnBackendAttributeName_t.
- BackendAttrType - Re-exported so callers don’t have to reach into the sys crate. Enum mirroring cudnnBackendAttributeType_t.
- BackendDescType - Re-exported so callers don’t have to reach into the sys crate. Enum mirroring cudnnBackendDescriptorType_t.
- BatchNormMode - Batch-normalization parameter sharing pattern.
- BnOp - Optional fused op for the *Ex BatchNorm variants.
- BwdDataAlgo - Backward-data convolution algorithm selector.
- BwdFilterAlgo - Backward-filter convolution algorithm selector.
- ConvMode - Convolution mathematical mode.
- DType - Element dtype for a tensor.
- FwdAlgo - Forward-convolution algorithm selector. Gemm is the most broadly supported; ImplicitPrecompGemm / Winograd are faster where applicable.
- MathType - Math-type selector for ConvolutionDescriptor::set_math_type; controls tensor-core eligibility.
- NormAlgo - Generic-normalization kernel selector.
- NormMode - Generic-normalization parameter sharing pattern (cuDNN 8+).
- NormOp - Optional fused op for the generic-normalization API.
- OpTensorOp - Element-wise op for OpTensorDescriptor / op_tensor.
- PoolingMode - Pooling reduction kind.
- RawMathType - Re-export for callers that want raw type access. Math type for a convolution descriptor; controls tensor-core usage.
- RawReorderType - Re-export for callers that want raw type access. Filter / bias reorder selector for INT8 quantized inference.
- ReduceOp - Reduction op for ReduceTensorDescriptor / reduce_tensor.
- ReorderType - Filter / bias reorder selector for INT8 quantized inference paths.
- SoftmaxAlgo - Numerical softmax algorithm.
- SoftmaxMode - Axis the softmax normalizes over.
- TensorFormat - Memory layout for a 4-D tensor.
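To make "memory layout for a 4-D tensor" concrete, the sketch below computes element offsets for the two most common layouts, channels-first (NCHW) and channels-last (NHWC). The `Layout` enum is a hypothetical stand-in defined for this example; the variant names of the crate's TensorFormat are not shown on this page, so none are assumed.

```rust
/// Hypothetical stand-in for a tensor-format selector.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Layout {
    /// Channels-first: W varies fastest, then H, then C, then N.
    Nchw,
    /// Channels-last: C varies fastest, then W, then H, then N.
    Nhwc,
}

/// Strides (in elements) for logical dims given as [n, c, h, w].
/// The returned strides are also in logical [n, c, h, w] order.
pub fn strides(layout: Layout, dims: [usize; 4]) -> [usize; 4] {
    let [_n, c, h, w] = dims;
    match layout {
        Layout::Nchw => [c * h * w, h * w, w, 1],
        Layout::Nhwc => [h * w * c, 1, w * c, c],
    }
}

/// Linear element offset of the logical index [n, c, h, w].
pub fn offset(strides: [usize; 4], index: [usize; 4]) -> usize {
    strides.iter().zip(index.iter()).map(|(s, i)| s * i).sum()
}
```

The logical index stays in [n, c, h, w] order for both layouts; only the strides change. Stepping one channel moves 20 elements in NCHW for a 2 × 3 × 4 × 5 tensor but only 1 element in NHWC.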
Traits§
- CudnnDataType - Trait mapping Rust element types to their cuDNN DType tag.
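A trait of this kind is commonly written with an associated constant, so generic code can recover the dtype tag from the element type alone. The sketch below shows that pattern under stated assumptions: `ElemTag`, `HasElemTag`, `tag_of`, and `byte_len` are hypothetical names for illustration, and the real trait's items and the real DType variants may differ.

```rust
/// Hypothetical dtype tag, standing in for the crate's DType enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ElemTag {
    F32,
    F64,
    I8,
    I32,
}

/// Hypothetical trait mapping a Rust element type to its tag.
pub trait HasElemTag {
    const TAG: ElemTag;
}

impl HasElemTag for f32 {
    const TAG: ElemTag = ElemTag::F32;
}
impl HasElemTag for f64 {
    const TAG: ElemTag = ElemTag::F64;
}
impl HasElemTag for i8 {
    const TAG: ElemTag = ElemTag::I8;
}
impl HasElemTag for i32 {
    const TAG: ElemTag = ElemTag::I32;
}

/// Recover the tag from a typed buffer without inspecting its contents.
pub fn tag_of<T: HasElemTag>(_buffer: &[T]) -> ElemTag {
    T::TAG
}

/// Size of the buffer in bytes, as a descriptor-style API would need it.
pub fn byte_len<T: HasElemTag>(buffer: &[T]) -> usize {
    std::mem::size_of_val(buffer)
}
```

The benefit of the pattern is that a mismatch between a buffer's element type and the dtype recorded in a descriptor can be ruled out at compile time, because the tag is derived from the type and never passed separately.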
Functions§
- activation_backward - dx = alpha * activation_backward(y, dy, x) + beta * dx.
- activation_forward - Compute y = alpha * activation(x) + beta * y element-wise.
- add_tensor - C = alpha * A + beta * C with broadcast. Useful for adding a per-channel bias to a feature map.
- batch_normalization_backward - BN backward, matched with batch_normalization_forward_training.
- batch_normalization_backward_ex - BN backward matching batch_normalization_forward_training_ex.
- batch_normalization_backward_ex_workspace_size - Workspace bytes for batch_normalization_backward_ex.
- batch_normalization_forward_inference - Inference-time BN forward: uses pre-computed running statistics (no state update). Use after model training is complete.
- batch_normalization_forward_training - Training-time BN forward: updates running statistics and returns saved mean / inv_variance for use by batch_normalization_backward.
- batch_normalization_forward_training_ex - BN training forward with optional fused activation / residual add.
- batch_normalization_forward_training_ex_workspace_size - Workspace bytes for batch_normalization_forward_training_ex.
- batch_normalization_training_ex_reserve_space_size - Reserve-space bytes for the *Ex BatchNorm pair.
- build_rnn_dynamic - Finalize an RNN descriptor for a specific minibatch size.
- convolution_backward_bias - Add the bias gradient: sum over spatial dims of dY.
- convolution_backward_data - dX = alpha * conv_bwd_data(W, dY) + beta * dX.
- convolution_backward_data_workspace_size - Workspace bytes required to run convolution_backward_data with the given algo and descriptors.
- convolution_backward_filter - dW = alpha * conv_bwd_filter(X, dY) + beta * dW.
- convolution_backward_filter_workspace_size - Workspace bytes required to run convolution_backward_filter with the given algo and descriptors.
- convolution_bias_activation_forward - Fused convolution + bias + activation forward: Y = activation(alpha1 * conv(X, W) + alpha2 * Z + bias). Z may alias Y for in-place residual add.
- convolution_forward - Y = alpha * conv(X, W) + beta * Y (forward pass).
- convolution_forward_workspace_size - Query the minimum workspace (bytes) required to run algo with the given tensor / filter / conv descriptors.
- ctc_loss - CTC (Connectionist Temporal Classification) loss.
- ctc_loss_workspace_size - Bytes of scratch workspace needed for ctc_loss.
- dropout_backward - Backward dropout: replays the mask saved in reserve to produce dx from dy. reserve must be the exact buffer populated by the matching dropout_forward call.
- dropout_forward - Apply dropout to x, writing scaled survivors to y and the keep/drop mask into reserve for the matching backward call.
- dropout_reserve_size - Size in bytes of the reserve buffer required for dropout on x.
- dropout_states_size - Size in bytes of the state buffer required for a dropout RNG.
- find_convolution_forward_algorithm - Run all candidate forward-convolution algorithms and return measured runtimes.
- get_convolution_backward_data_algorithm - Heuristic-pick backward-data convolution algorithms.
- get_convolution_backward_filter_algorithm - Heuristic-pick backward-filter convolution algorithms.
- get_convolution_forward_algorithm - Heuristic-pick the top-N forward-convolution algorithms (cheap; doesn’t run them).
- get_multi_head_attn_weights ⚠ - Look up the descriptor of one of the (Q/K/V/O) weight matrices inside the packed weights buffer.
- lrn_cross_channel_backward - Cross-channel LRN backward.
- multi_head_attn_backward_data ⚠ - Multi-head attention backward, data path (gradients w.r.t. Q/K/V).
- multi_head_attn_backward_weights ⚠ - Multi-head attention backward, weights path (gradient w.r.t. Q/K/V/O projection weights). Pass add_grad = true to accumulate into dweights (typical for multi-step training).
- multi_head_attn_buffers - Buffer requirements (weights, work_space, reserve_space).
- multi_head_attn_forward ⚠ - Forward multi-head attention. The huge parameter list mirrors cuDNN’s cudnnMultiHeadAttnForward exactly; see the cuDNN reference for the meaning of each window / sequence-length array.
- normalization_backward - Backward generic normalization.
- normalization_backward_workspace_size - Workspace bytes for normalization_backward.
- normalization_forward_inference - Inference-time generic normalization.
- normalization_forward_training - Training-time forward generic normalization.
- normalization_forward_training_workspace_size - Workspace bytes for normalization_forward_training.
- normalization_training_reserve_space_size - Reserve-space bytes for the training fwd/bwd pair.
- op_tensor - C = op(alpha1 * A, alpha2 * B) + beta * C element-wise.
- pooling_backward - dX = alpha * pool_backward(Y, dY, X) + beta * dX.
- pooling_forward - Y = alpha * pool(X) + beta * Y (forward pass).
- reduce_tensor - C = alpha * reduce(A) + beta * C, reducing over the axes where C’s extent is 1; A’s extent is preserved on the remaining axes.
- reduction_indices_size - Bytes of indices buffer required for index-returning reductions.
- reorder_filter_and_bias ⚠ - Pre-process filter / bias buffers for INT8 inference.
- rnn_backward_data_v8 ⚠ - RNN backward, data path (gradients w.r.t. inputs and initial states).
- rnn_backward_weights_v8 ⚠ - RNN backward, weights path (gradients w.r.t. the weight space). add_grad = true accumulates into dweight_space (typical for multi-step training); false overwrites.
- rnn_forward ⚠ - Forward pass of an RNN built via RnnDescriptor::set_v8 / build_rnn_dynamic. Pass fwd_mode = 0 for inference (no reserve space writes), 1 for training.
- rnn_temp_space_sizes - Returns (work_space_size, reserve_space_size). fwd_mode = 0 for inference, 1 for training.
- rnn_weight_space_size - Bytes the RNN’s weight space needs.
- softmax_backward - dX = alpha * softmax_backward(Y, dY) + beta * dX.
- softmax_forward - Y = alpha * softmax(X, algo, mode) + beta * Y.
- spatial_tf_grid_generator - Compute the sampling grid from the affine transform theta.
- spatial_tf_sampler - Bilinearly sample x at grid points to produce y.
- version - cuDNN library version as a packed integer (e.g. 8902 for 8.9.2; cuDNN 9 and later pack it as major * 10000 + minor * 100 + patch, so 9.1.0 is 90100).
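Many of the functions listed above share the same `alpha` / `beta` scaling shape, out = alpha * result + beta * out. The CPU reference below sketches that convention for the activation_forward case, using ReLU as the activation. It is an illustration only: `blend_into`, `relu`, and `activation_forward_ref` are hypothetical helpers, not this crate's functions, and skipping the read of the old output when beta is zero follows cuDNN's documented scaling-parameter behaviour as understood here.

```rust
/// out[i] = alpha * result[i] + beta * out[i].
/// When beta is exactly zero the previous contents of `out` are not read,
/// so stale NaN values in the output buffer cannot leak into the result.
pub fn blend_into(alpha: f32, result: &[f32], beta: f32, out: &mut [f32]) {
    assert_eq!(result.len(), out.len(), "result and output lengths must match");
    for (o, r) in out.iter_mut().zip(result.iter()) {
        *o = if beta == 0.0 {
            alpha * *r
        } else {
            alpha * *r + beta * *o
        };
    }
}

/// Element-wise ReLU, used here as the example activation.
pub fn relu(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| v.max(0.0)).collect()
}

/// CPU reference for y = alpha * activation(x) + beta * y.
pub fn activation_forward_ref(alpha: f32, x: &[f32], beta: f32, y: &mut [f32]) {
    let activated = relu(x);
    blend_into(alpha, &activated, beta, y);
}
```

With alpha = 1 and beta = 0 the call simply overwrites `y`; with a non-zero beta it accumulates into whatever `y` already holds, which is how a residual or gradient sum can be folded into the same call.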