Crate iro_cuda_ffi_kernels


Reference CUDA kernels for iro-cuda-ffi.

This crate provides sample kernels that demonstrate proper iro-cuda-ffi usage patterns and serve as integration tests for the iro-cuda-ffi core crate.

§Available Kernels

  • vector_add_f32: Element-wise vector addition
  • fma_chain_f32: Deep compute chain of fused multiply-adds (FMA)
  • saxpy_f32: Single-precision A*X + Y
  • daxpy_f64: Double-precision A*X + Y
  • scale_f32: Vector scaling
  • reduce_sum_f32: Parallel sum reduction
  • reduce_max_f32: Parallel max reduction
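When checking kernel output, a host-side reference can be handy. The sketch below implements the element-wise formulas listed on this page (out = a + b, y = a * x + y, out = a * x) in plain Rust. The function names `vector_add_ref`, `saxpy_ref`, and `scale_ref` are hypothetical and are not part of this crate.

```rust
/// Host-side reference for `vector_add_f32`: out = a + b.
/// (Hypothetical helper, not part of iro_cuda_ffi_kernels.)
fn vector_add_ref(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "input lengths must match");
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

/// Host-side reference for `saxpy_f32`: y = a * x + y, updated in place.
/// (Hypothetical helper, not part of iro_cuda_ffi_kernels.)
fn saxpy_ref(a: f32, x: &[f32], y: &mut [f32]) {
    assert_eq!(x.len(), y.len(), "input lengths must match");
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi = a * *xi + *yi;
    }
}

/// Host-side reference for `scale_f32`: out = a * x.
/// (Hypothetical helper, not part of iro_cuda_ffi_kernels.)
fn scale_ref(a: f32, x: &[f32]) -> Vec<f32> {
    x.iter().map(|xi| a * xi).collect()
}
```

A device kernel may fuse the multiply and add into a single FMA, so comparing against `saxpy_ref` with a small tolerance is usually safer than exact equality on arbitrary inputs.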

§Example

use iro_cuda_ffi::prelude::*;
use iro_cuda_ffi_kernels::vector_add_f32;

let stream = Stream::new()?;

let a = DeviceBuffer::from_slice_sync(&stream, &[1.0f32, 2.0, 3.0, 4.0])?;
let b = DeviceBuffer::from_slice_sync(&stream, &[5.0f32, 6.0, 7.0, 8.0])?;
let mut c = DeviceBuffer::<f32>::zeros(4)?;

vector_add_f32(&stream, &a, &b, &mut c)?;

let result = c.to_vec(&stream)?;
assert_eq!(result, vec![6.0, 8.0, 10.0, 12.0]);

§Device-to-device copy

use iro_cuda_ffi::prelude::*;

let stream = Stream::new()?;
let src = DeviceBuffer::from_slice_sync(&stream, &[1.0f32, 2.0, 3.0])?;
let mut dst = DeviceBuffer::<f32>::alloc(src.len())?;

dst.copy_from_device_sync(&stream, &src)?;
let result = dst.to_vec(&stream)?;
assert_eq!(result, vec![1.0, 2.0, 3.0]);

Constants§

MAX_GRID_X
Maximum grid_x dimension per CUDA spec (2^31 - 1).
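As an illustration of why such a limit matters, the sketch below computes a one-dimensional grid size for `n` elements and rejects launches that would exceed it. The constant is redefined locally so the example is self-contained; its `u32` type and the helper `grid_x_for` are assumptions, not this crate's API.

```rust
/// Local stand-in for the crate's `MAX_GRID_X` (2^31 - 1).
/// The `u32` type is an assumption made for this sketch.
const MAX_GRID_X: u32 = i32::MAX as u32;

/// Hypothetical helper: number of blocks needed to cover `n` elements
/// with `block_size` threads per block, or `None` if no valid grid exists.
fn grid_x_for(n: usize, block_size: u32) -> Option<u32> {
    if block_size == 0 {
        return None; // a block must contain at least one thread
    }
    // Round up so a trailing partial block is still covered.
    let blocks = n.div_ceil(block_size as usize);
    if blocks == 0 || blocks > MAX_GRID_X as usize {
        // Either there is nothing to launch or the grid would exceed the limit.
        None
    } else {
        Some(blocks as u32)
    }
}
```

For example, 1000 elements with 256 threads per block need 4 blocks, while an input too large for a single grid yields `None` and would have to be split or handled with a grid-stride loop.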

Functions§

daxpy_f64
DAXPY operation: y = a * x + y (in-place, double precision)
fma_chain_f32
Deep compute chain: out = fma_chain(a, b, iters)
reduce_max_f32
Parallel max reduction (first pass).
reduce_max_full
Computes the maximum of all elements in the input vector.
reduce_sum_f32
Parallel sum reduction (first pass).
reduce_sum_full
Computes the sum of all elements in the input vector.
reduction_output_size
Returns the number of output elements needed for reduction.
saxpy_f32
SAXPY operation: y = a * x + y (in-place)
scale_f32
Scale vector: out = a * x
vector_add_f32
Element-wise vector addition: out = a + b
verify_abi_linked
Ensures the translation unit containing the ABI assertions is linked.
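The "(first pass)" wording on the reduction functions suggests a multi-pass scheme: a first pass produces one partial result per block, and later passes combine the partials. The host-side sketch below models that general pattern in plain Rust. It is an illustration only; the one-partial-per-block layout, the `block_size` parameter, and all function names here are assumptions and may not match how `reduce_sum_f32`, `reduction_output_size`, or `reduce_sum_full` are actually implemented.

```rust
/// Hypothetical: number of partial results a pass produces,
/// assuming one partial per block of `block_size` elements.
fn partial_count(n: usize, block_size: usize) -> usize {
    assert!(block_size > 0, "block_size must be non-zero");
    n.div_ceil(block_size)
}

/// One pass: each chunk (standing in for a thread block) is reduced
/// to a single partial sum.
fn reduce_sum_pass(input: &[f32], block_size: usize) -> Vec<f32> {
    assert!(block_size > 0, "block_size must be non-zero");
    input
        .chunks(block_size)
        .map(|chunk| chunk.iter().sum::<f32>())
        .collect()
}

/// Full reduction: repeat passes until at most one value remains.
/// Returns 0.0 for an empty input.
fn reduce_sum_all(input: &[f32], block_size: usize) -> f32 {
    // With block_size == 1 a pass would not shrink the data.
    assert!(block_size > 1, "block_size must be at least 2");
    let mut partials = reduce_sum_pass(input, block_size);
    while partials.len() > 1 {
        partials = reduce_sum_pass(&partials, block_size);
    }
    partials.first().copied().unwrap_or(0.0)
}
```

Because floating-point addition is not associative, a blocked reduction like this can differ slightly from a sequential sum on general inputs, which is worth keeping in mind when comparing device results against a host reference.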