hodu_cuda_kernels 0.2.4

hodu cuda kernels

High-performance CUDA kernels for tensor operations on NVIDIA GPUs.

cuBLAS Integration

Supported Operations

  • matmul: Batched matrix multiplication with GEMM
  • dot: 2D matrix multiplication with GEMM
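To make the two operations concrete, here is a CPU reference sketch of what they compute for contiguous row-major f32 data. The function names and signatures are hypothetical and are not the crate's API; the crate itself dispatches this work to GEMM on the GPU.

```rust
/// CPU reference for a batched row-major matmul (hypothetical helper, not the crate's API).
/// For each batch index i: C[i] (m x n) = A[i] (m x k) * B[i] (k x n).
fn batched_matmul_ref(
    a: &[f32],
    b: &[f32],
    batch: usize,
    m: usize,
    k: usize,
    n: usize,
) -> Vec<f32> {
    assert_eq!(a.len(), batch * m * k);
    assert_eq!(b.len(), batch * k * n);
    let mut c = vec![0.0f32; batch * m * n];
    for i in 0..batch {
        // Each batch is a contiguous block, so the batch stride is the matrix size.
        let a_i = &a[i * m * k..(i + 1) * m * k];
        let b_i = &b[i * k * n..(i + 1) * k * n];
        let c_i = &mut c[i * m * n..(i + 1) * m * n];
        for row in 0..m {
            for col in 0..n {
                let mut acc = 0.0f32;
                for p in 0..k {
                    acc += a_i[row * k + p] * b_i[p * n + col];
                }
                c_i[row * n + col] = acc;
            }
        }
    }
    c
}

/// A 2D matrix product is, semantically, the single-batch case.
fn dot_ref(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    batched_matmul_ref(a, b, 1, m, k, n)
}
```

A reference like this is mainly useful for checking GPU results on small inputs.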

Supported Data Types

  • bf16: BFloat16 (compute in FP32, I/O in BF16)
  • f16: Float16/Half (compute in FP32, I/O in FP16)
  • f32: Float32 (native precision)
  • f64: Float64 (native precision)
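The "compute in FP32, I/O in BF16" pattern can be illustrated on the CPU. The sketch below stores BF16 values as raw `u16` bits, widens them to f32 for the multiply-accumulate, and rounds once at the end. All function names are hypothetical, and the rounding mode (round-to-nearest-even) is an assumption; the source does not say how the kernels round.

```rust
/// Widen BF16 bits to f32. BF16 is the upper 16 bits of an IEEE-754 f32.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Narrow f32 to BF16 bits using round-to-nearest-even (assumed rounding mode).
fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Force a mantissa bit so the truncated value stays a NaN.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7FFF plus the lowest kept bit, so exact ties round to an even result.
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(rounding_bias) >> 16) as u16
}

/// Dot product with BF16 inputs and output but an f32 accumulator.
fn dot_bf16(a: &[u16], b: &[u16]) -> u16 {
    assert_eq!(a.len(), b.len());
    let acc: f32 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| bf16_to_f32(x) * bf16_to_f32(y))
        .sum();
    // Only the final result is rounded back to BF16.
    f32_to_bf16(acc)
}
```

Accumulating in f32 avoids compounding the rounding error of an 8-bit mantissa across the whole reduction. In real Rust code, the `half` crate's `bf16` and `f16` types would normally replace the hand-rolled conversions.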

Features

  • Automatic fallback to custom CUDA kernels for unsupported types or non-contiguous matrices
  • Handles non-contiguous matrices via leading dimension parameters
  • Transparent row-major to column-major layout conversion
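One common way to get a row-major product out of a column-major GEMM without copying is to compute the transposed product with the operands swapped, passing row strides as leading dimensions. The CPU sketch below shows the idea. It assumes this is roughly what the layout conversion does, which the source does not confirm, and every name in it is hypothetical.

```rust
/// Reference column-major GEMM, C = A * B, with BLAS-style leading dimensions:
/// element (i, j) of a column-major matrix lives at index i + j * ld.
fn gemm_col_major(
    m: usize, n: usize, k: usize,
    a: &[f32], lda: usize,
    b: &[f32], ldb: usize,
    c: &mut [f32], ldc: usize,
) {
    for j in 0..n {
        for i in 0..m {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i + p * lda] * b[p + j * ldb];
            }
            c[i + j * ldc] = acc;
        }
    }
}

/// Row-major C (m x n) = A (m x k) * B (k x n), with row strides lda, ldb, ldc.
/// A row-major matrix with row stride ld has the same memory layout as its
/// transpose stored column-major with leading dimension ld. So we compute
/// C^T = B^T * A^T by swapping the operands and the m/n dimensions.
fn gemm_row_major(
    m: usize, n: usize, k: usize,
    a: &[f32], lda: usize,
    b: &[f32], ldb: usize,
    c: &mut [f32], ldc: usize,
) {
    gemm_col_major(n, m, k, b, ldb, a, lda, c, ldc);
}
```

Because a leading dimension can be larger than the logical row length, the same call handles a matrix that is a sub-view of a wider buffer, for example a 2 x 2 view with `lda = 3` inside a 2 x 3 allocation. Layouts that cannot be expressed this way would need a different path.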