Reference CUDA kernels for iro-cuda-ffi.
This crate provides sample kernels that demonstrate proper iro-cuda-ffi usage patterns and serve as integration tests for the iro-cuda-ffi core crate.
§Available Kernels
- vector_add_f32: Element-wise vector addition
- fma_chain_f32: Deep compute chain (FMA)
- saxpy_f32: Single-precision A*X + Y
- daxpy_f64: Double-precision A*X + Y
- scale_f32: Vector scaling
- reduce_sum_f32: Parallel sum reduction
- reduce_max_f32: Parallel max reduction
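When checking GPU output from these kernels, it can help to have plain CPU references to compare against. The following is a minimal sketch in ordinary Rust; the function names `vector_add_ref`, `saxpy_ref`, and `scale_ref` are hypothetical helpers written for illustration and are not part of this crate.

```rust
/// CPU reference for element-wise addition: out[i] = a[i] + b[i].
/// (Hypothetical helper, not part of iro-cuda-ffi.)
fn vector_add_ref(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

/// CPU reference for SAXPY: y[i] = a * x[i] + y[i], updated in place.
/// (Hypothetical helper, not part of iro-cuda-ffi.)
fn saxpy_ref(a: f32, x: &[f32], y: &mut [f32]) {
    assert_eq!(x.len(), y.len(), "inputs must have equal length");
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi = a * *xi + *yi;
    }
}

/// CPU reference for scaling: out[i] = a * x[i].
/// (Hypothetical helper, not part of iro-cuda-ffi.)
fn scale_ref(a: f32, x: &[f32]) -> Vec<f32> {
    x.iter().map(|xi| a * xi).collect()
}
```

For example, `vector_add_ref(&[1.0, 2.0], &[3.0, 4.0])` yields `[4.0, 6.0]`. A GPU kernel that uses fused multiply-add may round differently from `saxpy_ref`, so comparisons against device results are usually better done with a small tolerance than with exact equality.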
§Example
use iro_cuda_ffi::prelude::*;
use iro_cuda_ffi_kernels::vector_add_f32;
let stream = Stream::new()?;
let a = DeviceBuffer::from_slice_sync(&stream, &[1.0f32, 2.0, 3.0, 4.0])?;
let b = DeviceBuffer::from_slice_sync(&stream, &[5.0f32, 6.0, 7.0, 8.0])?;
let mut c = DeviceBuffer::<f32>::zeros(4)?;
vector_add_f32(&stream, &a, &b, &mut c)?;
let result = c.to_vec(&stream)?;
assert_eq!(result, vec![6.0, 8.0, 10.0, 12.0]);
§Device-to-device copy
use iro_cuda_ffi::prelude::*;
let stream = Stream::new()?;
let src = DeviceBuffer::from_slice_sync(&stream, &[1.0f32, 2.0, 3.0])?;
let mut dst = DeviceBuffer::<f32>::alloc(src.len())?;
dst.copy_from_device_sync(&stream, &src)?;
let result = dst.to_vec(&stream)?;
assert_eq!(result, vec![1.0, 2.0, 3.0]);
Constants§
- MAX_GRID_X - Maximum grid_x dimension per CUDA spec (2^31 - 1).
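A limit like this typically comes into play when sizing a one-dimensional launch. The sketch below shows one way to compute a block count and reject sizes that would exceed the limit. It defines its own local `MAX_GRID_X` stand-in (the type of the crate's constant is not shown here, so `u32` is an assumption), and `grid_x_for` is a hypothetical helper, not a function from this crate.

```rust
/// Local stand-in for the documented limit (2^31 - 1).
/// Assumption: the crate's own constant may use a different integer type.
const MAX_GRID_X: u32 = (1u32 << 31) - 1;

/// Hypothetical helper: number of blocks needed to cover `n` elements with
/// `block_size` threads per block. Returns `None` if the count would exceed
/// MAX_GRID_X. A result of `Some(0)` means there is nothing to launch.
fn grid_x_for(n: usize, block_size: u32) -> Option<u32> {
    assert!(block_size > 0, "block_size must be non-zero");
    let bs = block_size as usize;
    // Ceiling division written to avoid overflow in `n + bs - 1`.
    let blocks = n / bs + usize::from(n % bs != 0);
    if blocks <= MAX_GRID_X as usize {
        Some(blocks as u32)
    } else {
        None
    }
}
```

With 256 threads per block, 1000 elements need 4 blocks and 1025 elements need 5. When the helper returns `None`, a caller would have to fall back to another strategy, such as a grid-stride loop or a multi-dimensional grid.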
Functions§
- daxpy_f64 - DAXPY operation: y = a * x + y (in-place, double precision)
- fma_chain_f32 - Deep compute chain: out = fma_chain(a, b, iters)
- reduce_max_f32 - Parallel max reduction (first pass).
- reduce_max_full - Computes the maximum of all elements in the input vector.
- reduce_sum_f32 - Parallel sum reduction (first pass).
- reduce_sum_full - Computes the sum of all elements in the input vector.
- reduction_output_size - Returns the number of output elements needed for reduction.
- saxpy_f32 - SAXPY operation: y = a * x + y (in-place)
- scale_f32 - Scale vector: out = a * x
- vector_add_f32 - Element-wise vector addition: out = a + b
- verify_abi_linked - Ensures the ABI asserts translation unit is linked.
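The split between the "first pass" reductions and their `_full` counterparts suggests a multi-pass scheme: each block reduces one chunk of the input to a single partial result, and the partials are reduced again until one value remains. The following CPU sketch illustrates that idea only. The chunk size of 256 and the names `partial_count`, `sum_first_pass`, and `sum_full` are assumptions made for illustration; the crate's kernels and `reduction_output_size` may use a different block size or formula.

```rust
/// Assumed number of elements reduced per block; the real kernels may differ.
const CHUNK: usize = 256;

/// Hypothetical analogue of an output-size helper: one partial per chunk.
fn partial_count(n: usize) -> usize {
    n / CHUNK + usize::from(n % CHUNK != 0)
}

/// First pass: reduce each chunk of the input to a single partial sum.
fn sum_first_pass(input: &[f32]) -> Vec<f32> {
    input.chunks(CHUNK).map(|c| c.iter().sum::<f32>()).collect()
}

/// Full reduction: repeat the first pass until at most one value remains.
/// An empty input sums to 0.0.
fn sum_full(input: &[f32]) -> f32 {
    let mut data = input.to_vec();
    while data.len() > 1 {
        data = sum_first_pass(&data);
    }
    data.first().copied().unwrap_or(0.0)
}
```

A max reduction follows the same shape with the per-chunk sum replaced by a per-chunk maximum. Because floating-point addition is not associative, a chunked sum can differ slightly from a sequential one for general inputs.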