Crate baracuda_cudnn

Safe Rust wrappers for NVIDIA cuDNN.

Layered on top of baracuda-cudnn-sys. Use this crate directly for typed, RAII-managed cuDNN handles + descriptors; reach for -sys only when adding a function the safe layer doesn’t expose yet.
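
The RAII management mentioned above can be illustrated with a minimal, self-contained sketch. It does not use this crate's real types: `MockDescriptor`, `mock_create`, and `mock_destroy` are hypothetical stand-ins for a safe wrapper and the raw create/destroy pair a -sys crate would expose.

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for the library's count of live objects; purely illustrative.
    static LIVE: Cell<usize> = Cell::new(0);
}

/// Hypothetical stand-in for a raw "create" call in a -sys crate.
fn mock_create() -> usize {
    LIVE.with(|c| c.set(c.get() + 1));
    0xC0DE // pretend opaque handle value
}

/// Hypothetical stand-in for the matching raw "destroy" call.
fn mock_destroy(_raw: usize) {
    LIVE.with(|c| c.set(c.get() - 1));
}

/// Owning wrapper: the resource is created in `new` and released in `Drop`,
/// so callers cannot forget the destroy call or issue it twice.
pub struct MockDescriptor {
    raw: usize,
}

impl MockDescriptor {
    pub fn new() -> Self {
        Self { raw: mock_create() }
    }

    /// Borrow the raw value for passing to lower-level calls.
    pub fn as_raw(&self) -> usize {
        self.raw
    }
}

impl Drop for MockDescriptor {
    fn drop(&mut self) {
        mock_destroy(self.raw);
    }
}

/// Number of mock resources currently alive on this thread.
pub fn live_count() -> usize {
    LIVE.with(|c| c.get())
}
```

Because the wrapper is not `Clone` or `Copy`, ownership of the underlying resource stays unique, and the destroy call runs exactly once when the value goes out of scope.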

§Scope

Covers the cuDNN classic API surface that baracuda-kernels’s Phase 7+ Conv2d / Pool2d / CTCLoss / BatchNorm / GroupNorm plans and the Phase 11 Conv1d/3d/Transpose/depthwise + Pool1d/3d/Adaptive fanout dispatch through. Concretely:

  • Handle management + stream binding.
  • Tensor / filter / convolution / pooling / activation / batch-norm / RNN / dropout / op-tensor / reduce-tensor / LRN / SpatialTransform / Attn descriptors.
  • Conv2d / Conv1d / Conv3d (FW + BW data + BW weight) with all algo enums.
  • Pool2d / Pool1d / Pool3d (Avg + Max, deterministic + non-det).
  • BatchNorm FW training/inference + BW + persistent mode.
  • LRN, Softmax (classic — modern softmax is bespoke in baracuda-kernels).
  • CTC loss FW + BW (the cuDNN path; bespoke baracuda-kernels::CtcLossPlan covers the non-cuDNN path).
  • Op-tensor + reduce-tensor (gluing primitives for fused ops).
  • RNN classic API (cells, sequences, persistent).
  • DropoutDescriptor + state management.

The cuDNN backend / graph API (the modern fusion API) has no typed, high-level wrapper here; only the thin BackendDescriptor wrapper and the re-exported backend enums are provided. baracuda-kernels builds bespoke fused kernels directly via baracuda-kernels-sys for the ops where graph-API fusion would be the win, so the maintenance cost of a full graph-API wrapper hasn’t been justified yet.

§Build requirement

cuDNN is a separate NVIDIA download not bundled with the stock CUDA toolkit. The baracuda-kernels-sys build script auto-discovers it via CUDNN_PATH / CUDNN_ROOT / CUDNN_HOME env vars or the standard Windows / Linux install paths — see the workspace README.md “Building” section for the full probe order.
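
As a rough illustration of env-var discovery, the sketch below returns the first non-empty hint among the three variables named above. It is not the build script's code: the function names are hypothetical, the variables are tried in the order they are listed in this section (the authoritative probe order is in the README), and the fallback to standard install paths is omitted.

```rust
use std::path::PathBuf;

/// Env vars named in the text, tried in the order they are listed there.
/// Assumption: the real probe order may differ; see the workspace README.
const CUDNN_ENV_VARS: [&str; 3] = ["CUDNN_PATH", "CUDNN_ROOT", "CUDNN_HOME"];

/// Return the first variable that is set to a non-empty value, together
/// with its value as a path. `lookup` abstracts over `std::env::var` so the
/// selection logic can be exercised without touching the real environment.
pub fn first_cudnn_hint<F>(lookup: F) -> Option<(&'static str, PathBuf)>
where
    F: Fn(&str) -> Option<String>,
{
    CUDNN_ENV_VARS.iter().find_map(|&name| {
        lookup(name)
            .filter(|value| !value.is_empty())
            .map(|value| (name, PathBuf::from(value)))
    })
}

/// Convenience wrapper that reads the process environment.
pub fn cudnn_hint_from_env() -> Option<(&'static str, PathBuf)> {
    first_cudnn_hint(|name| std::env::var(name).ok())
}
```

Passing the lookup as a closure keeps the precedence rule separate from the environment access, which makes the order easy to test or change.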

Structs§

ActivationDescriptor
An activation descriptor.
AttnDescriptor
Multi-head attention descriptor.
BackendDescriptor
Thin wrapper over a cudnnBackendDescriptor_t. Used to build Graph-API operation graphs and execution plans. Callers set attributes with BackendDescriptor::set_attribute_raw using the constants in baracuda_cudnn_sys::cudnnBackendAttributeName_t / baracuda_cudnn_sys::cudnnBackendAttributeType_t.
BwdDataAlgoPerf
Per-algorithm performance record returned by the backward-data convolution finders. Mirrors cudnnConvolutionBwdDataAlgoPerf_t.
BwdFilterAlgoPerf
Per-algorithm performance record returned by the backward-filter convolution finders. Mirrors cudnnConvolutionBwdFilterAlgoPerf_t.
ConvolutionDescriptor
Convolution descriptor: padding, stride, dilation, and compute dtype.
CtcLossDescriptor
CTC (Connectionist Temporal Classification) loss descriptor.
DropoutDescriptor
A dropout descriptor: dropout probability + RNG state buffer.
FilterDescriptor
N × C × H × W 4-D filter.
FwdAlgoPerf
Per-algorithm performance record returned by the forward-convolution finders: one result row from cudnnFindConvolutionForwardAlgorithm / cudnnGetConvolutionForwardAlgorithm_v7.
Handle
cuDNN context handle.
LrnDescriptor
Local Response Normalization descriptor: window size + α / β / k coefficients.
OpTensorDescriptor
An op-tensor descriptor: binary element-wise op + compute dtype.
PoolingDescriptor
A pooling descriptor: pooling mode, window extent, padding, and stride.
ReduceTensorDescriptor
A reduce-tensor descriptor: reduction op + compute dtype.
RnnDataDescriptor
Owned RNN-data descriptor used by the v8 RNN forward / backward path.
RnnDescriptor
Owned RNN descriptor.
SeqDataDescriptor
Sequence-data descriptor used by multi-head attention.
SpatialTransformerDescriptor
Spatial-transformer descriptor: sampler kind + output shape.
TensorDescriptor
A 4-D tensor descriptor.
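
The padding, stride, and dilation stored in a ConvolutionDescriptor determine the output tensor's spatial extents. The sketch below applies the formula given in the cuDNN reference, out = 1 + (in + 2 * pad - ((k - 1) * dilation + 1)) / stride, to one axis. `conv_out_extent` is a hypothetical helper for sizing buffers on the host, not a function of this crate.

```rust
/// Output extent along one spatial axis of a convolution.
///
/// Returns `None` when stride, dilation, or kernel size is zero, or when the
/// dilated kernel does not fit inside the padded input.
pub fn conv_out_extent(
    input: usize,
    kernel: usize,
    pad: usize,
    stride: usize,
    dilation: usize,
) -> Option<usize> {
    if stride == 0 || dilation == 0 || kernel == 0 {
        return None;
    }
    // Span covered by the kernel once dilation gaps are included.
    let effective_kernel = dilation * (kernel - 1) + 1;
    let padded = input + 2 * pad;
    if padded < effective_kernel {
        return None;
    }
    // Integer division floors, matching the reference formula.
    Some((padded - effective_kernel) / stride + 1)
}
```

For example, a 3-wide kernel with pad 1, stride 1, and dilation 1 preserves the input extent, while a 7-wide kernel with pad 3 and stride 2 halves an even extent.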

Enums§

ActivationMode
Activation function kind.
BackendAttrName
Enum mirroring cudnnBackendAttributeName_t, re-exported so callers don’t have to reach into the sys crate.
BackendAttrType
Enum mirroring cudnnBackendAttributeType_t, re-exported so callers don’t have to reach into the sys crate.
BackendDescType
Enum mirroring cudnnBackendDescriptorType_t, re-exported so callers don’t have to reach into the sys crate.
BatchNormMode
Batch-normalization parameter sharing pattern.
BnOp
Optional fused op for the *Ex BatchNorm variants.
BwdDataAlgo
Backward-data convolution algorithm selector.
BwdFilterAlgo
Backward-filter convolution algorithm selector.
ConvMode
Convolution mathematical mode.
DType
Element dtype for a tensor.
FwdAlgo
Forward-convolution algorithm selector. Gemm is the most broadly supported; ImplicitPrecompGemm / Winograd are faster where applicable.
MathType
Math-type selector for ConvolutionDescriptor::set_math_type — controls tensor-core eligibility.
NormAlgo
Generic-normalization kernel selector.
NormMode
Generic-normalization parameter sharing pattern (cuDNN 8+).
NormOp
Optional fused op for the generic-normalization API.
OpTensorOp
Element-wise op for OpTensorDescriptor / op_tensor.
PoolingMode
Pooling reduction kind.
RawMathType
Raw re-export of the math type for a convolution descriptor (controls tensor-core usage), for callers that want raw type access.
RawReorderType
Raw re-export of the filter / bias reorder selector for INT8 quantized inference, for callers that want raw type access.
ReduceOp
Reduction op for ReduceTensorDescriptor / reduce_tensor.
ReorderType
Filter / bias reorder selector for INT8 quantized inference paths.
SoftmaxAlgo
Numerical softmax algorithm.
SoftmaxMode
Axis the softmax normalizes over.
TensorFormat
Memory layout for a 4-D tensor.

Traits§

CudnnDataType
Trait mapping Rust element types to their cuDNN DType tag.
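
A trait of this kind is commonly built around an associated constant, so generic code can recover the dtype tag from the element type at compile time. The sketch below shows that pattern only: `SketchDType`, `SketchDataType`, and `dtype_of` are hypothetical names, and the variants and implementing types are assumptions rather than the actual definitions of DType or CudnnDataType.

```rust
/// Illustrative dtype tag; the variant set is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchDType {
    Float,
    Double,
    Int8,
    Int32,
}

/// Maps a Rust element type to its dtype tag via an associated constant.
pub trait SketchDataType: Copy {
    const DTYPE: SketchDType;
}

impl SketchDataType for f32 {
    const DTYPE: SketchDType = SketchDType::Float;
}
impl SketchDataType for f64 {
    const DTYPE: SketchDType = SketchDType::Double;
}
impl SketchDataType for i8 {
    const DTYPE: SketchDType = SketchDType::Int8;
}
impl SketchDataType for i32 {
    const DTYPE: SketchDType = SketchDType::Int32;
}

/// Generic code reads the tag from the type parameter; the slice contents
/// are never inspected.
pub fn dtype_of<T: SketchDataType>(_data: &[T]) -> SketchDType {
    T::DTYPE
}
```

With this shape, a function that builds a descriptor for a `&[T]` buffer can fill in the dtype without an extra argument, and unsupported element types fail at compile time.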

Functions§

activation_backward
dx = alpha * activation_backward(y, dy, x) + beta * dx.
activation_forward
Compute y = alpha * activation(x) + beta * y element-wise.
add_tensor
C = alpha * A + beta * C with broadcast. Useful for adding a per-channel bias to a feature map.
batch_normalization_backward
BN backward — matched with batch_normalization_forward_training.
batch_normalization_backward_ex
BN backward matching batch_normalization_forward_training_ex.
batch_normalization_backward_ex_workspace_size
Workspace bytes for batch_normalization_backward_ex.
batch_normalization_forward_inference
Inference-time BN forward: uses pre-computed running statistics (no state update). Use after model training is complete.
batch_normalization_forward_training
Training-time BN forward: updates running statistics and returns saved mean / inv_variance for use by batch_normalization_backward.
batch_normalization_forward_training_ex
BN training forward with optional fused activation / residual add.
batch_normalization_forward_training_ex_workspace_size
Workspace bytes for batch_normalization_forward_training_ex.
batch_normalization_training_ex_reserve_space_size
Reserve-space bytes for the *Ex BatchNorm pair.
build_rnn_dynamic
Finalize an RNN descriptor for a specific minibatch size.
convolution_backward_bias
Compute the bias gradient: sum dY over the batch and spatial dims, leaving one value per output channel.
convolution_backward_data
dX = alpha * conv_bwd_data(W, dY) + beta * dX.
convolution_backward_data_workspace_size
Workspace bytes required to run convolution_backward_data with the given algo and descriptors.
convolution_backward_filter
dW = alpha * conv_bwd_filter(X, dY) + beta * dW.
convolution_backward_filter_workspace_size
Workspace bytes required to run convolution_backward_filter with the given algo and descriptors.
convolution_bias_activation_forward
Fused convolution + bias + activation forward: Y = activation(alpha1 * conv(X, W) + alpha2 * Z + bias). Z may alias Y for in-place residual add.
convolution_forward
Y = alpha * conv(X, W) + beta * Y (forward pass).
convolution_forward_workspace_size
Query the minimum workspace (bytes) required to run algo with the given tensor / filter / conv descriptors.
ctc_loss
CTC (Connectionist Temporal Classification) loss.
ctc_loss_workspace_size
Bytes of scratch workspace needed for ctc_loss.
dropout_backward
Backward dropout: replays the mask saved in reserve to produce dx from dy. reserve must be the exact buffer populated by the matching dropout_forward call.
dropout_forward
Apply dropout to x, writing scaled survivors to y and the keep/drop mask into reserve for the matching backward call.
dropout_reserve_size
Size in bytes of the reserve buffer required for dropout on x.
dropout_states_size
Size in bytes of the state buffer required for a dropout RNG.
find_convolution_forward_algorithm
Run all candidate forward-convolution algorithms and return measured runtimes.
get_convolution_backward_data_algorithm
Heuristic-pick backward-data convolution algorithms.
get_convolution_backward_filter_algorithm
Heuristic-pick backward-filter convolution algorithms.
get_convolution_forward_algorithm
Heuristic-pick the top-N forward-convolution algorithms (cheap; doesn’t run them).
get_multi_head_attn_weights
Look up the descriptor of one of the (Q/K/V/O) weight matrices inside the packed weights buffer.
lrn_cross_channel_backward
Cross-channel LRN backward.
multi_head_attn_backward_data
Multi-head attention backward — data path (gradients w.r.t. Q/K/V).
multi_head_attn_backward_weights
Multi-head attention backward — weights path (gradient w.r.t. Q/K/V/O projection weights). Pass add_grad = true to accumulate into dweights (typical for multi-step training).
multi_head_attn_buffers
Buffer requirements (weights, work_space, reserve_space).
multi_head_attn_forward
Forward multi-head attention. The huge parameter list mirrors cuDNN’s cudnnMultiHeadAttnForward exactly; see the cuDNN reference for the meaning of each window / sequence-length array.
normalization_backward
Backward generic normalization.
normalization_backward_workspace_size
Workspace bytes for normalization_backward.
normalization_forward_inference
Inference-time generic normalization.
normalization_forward_training
Training-time forward generic normalization.
normalization_forward_training_workspace_size
Workspace bytes for normalization_forward_training.
normalization_training_reserve_space_size
Reserve-space bytes for the training fwd/bwd pair.
op_tensor
C = op(alpha1 * A, alpha2 * B) + beta * C element-wise.
pooling_backward
dX = alpha * pool_backward(Y, dY, X) + beta * dX.
pooling_forward
Y = alpha * pool(X) + beta * Y (forward pass).
reduce_tensor
C = alpha * reduce(A) + beta * C, reducing over the axes where C’s extent is 1 and A’s is larger.
reduction_indices_size
Bytes of indices buffer required for index-returning reductions.
reorder_filter_and_bias
Pre-process filter / bias buffers for INT8 inference.
rnn_backward_data_v8
RNN backward — data path (gradients w.r.t. inputs and initial states).
rnn_backward_weights_v8
RNN backward — weights path (gradients w.r.t. the weight space). add_grad = true accumulates into dweight_space (typical for multi-step training); false overwrites.
rnn_forward
Forward pass of an RNN built via RnnDescriptor::set_v8 / build_rnn_dynamic. Pass fwd_mode = 0 for inference (no reserve space writes), 1 for training.
rnn_temp_space_sizes
Returns (work_space_size, reserve_space_size). fwd_mode = 0 for inference, 1 for training.
rnn_weight_space_size
Bytes the RNN’s weight space needs.
softmax_backward
dX = alpha * softmax_backward(Y, dY) + beta * dX.
softmax_forward
Y = alpha * softmax(X, algo, mode) + beta * Y.
spatial_tf_grid_generator
Compute the sampling grid from the affine transform theta.
spatial_tf_sampler
Bilinearly sample x at grid points to produce y.
version
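cuDNN library version as a packed integer. cuDNN 8 and earlier pack it as major * 1000 + minor * 100 + patch (e.g. 8907 for 8.9.7); cuDNN 9 and later use major * 10000 + minor * 100 + patch (e.g. 90100 for 9.1.0).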
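
The packed value returned by version can be split into components with a small helper. The sketch below assumes the encodings used by the cuDNN headers: major * 1000 + minor * 100 + patch up to cuDNN 8, and major * 10000 + minor * 100 + patch from cuDNN 9 on. `decode_cudnn_version` is a hypothetical helper, not part of this crate, and it treats any value of 10000 or more as the newer encoding.

```rust
/// Split a packed cuDNN version into (major, minor, patch).
///
/// Assumption: values below 10000 use the older `major * 1000` encoding and
/// values of 10000 or more use the newer `major * 10000` encoding.
pub fn decode_cudnn_version(packed: u64) -> (u64, u64, u64) {
    let major_scale = if packed >= 10_000 { 10_000 } else { 1_000 };
    let major = packed / major_scale;
    let rest = packed % major_scale;
    // Minor and patch occupy the last digits in both encodings.
    (major, rest / 100, rest % 100)
}

/// Example gate: true when the decoded version is at least `min`.
pub fn at_least(packed: u64, min: (u64, u64, u64)) -> bool {
    // Tuples compare lexicographically: major first, then minor, then patch.
    decode_cudnn_version(packed) >= min
}
```

Comparing decoded tuples rather than raw integers avoids ordering mistakes across the two encodings, although here the raw values also happen to sort correctly because every cuDNN 9 value exceeds every older one.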

Type Aliases§

Error
Error type for cuDNN operations.
Result
Result alias.