pub struct KaioDevice { /* private fields */ }
A KAIO GPU device — wraps a CUDA context and its default stream.
Created via KaioDevice::new with a device ordinal (0 for the first GPU).
All allocation and transfer operations go through the default stream.
§Example
let device = KaioDevice::new(0)?;
let buf = device.alloc_from(&[1.0f32, 2.0, 3.0])?;
let host = buf.to_host(&device)?;

§Implementations
impl KaioDevice
pub fn new(ordinal: usize) -> Result<Self>
Create a new device targeting the GPU at the given ordinal.
Ordinal 0 is the first GPU. Returns an error if no GPU exists at that ordinal or if the CUDA driver fails to initialize.
pub fn info(&self) -> Result<DeviceInfo>
Query basic information about this device.
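A minimal sketch of querying device information. The exact fields of DeviceInfo are not documented here, so the sketch assumes only that the type implements Debug:

```rust
// Sketch: print whatever DeviceInfo exposes (assumes it derives Debug).
let device = KaioDevice::new(0)?;
let info = device.info()?;
println!("{info:?}");
```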
pub fn ordinal(&self) -> usize
CUDA device ordinal (0-indexed) this device wraps.
Used by bridge crates (e.g. kaio-candle) to cross-check that a
host-framework device and a KaioDevice refer to the same GPU.
pub fn alloc_from<T: DeviceRepr>(&self, data: &[T]) -> Result<GpuBuffer<T>>
Allocate device memory and copy data from a host slice.
pub fn alloc_zeros<T: DeviceRepr + ValidAsZeroBits>(
    &self,
    len: usize,
) -> Result<GpuBuffer<T>>
Allocate zero-initialized device memory.
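A short sketch combining alloc_zeros with the to_host round-trip from the example above; the ValidAsZeroBits bound guarantees the all-zero bit pattern is a valid value of T:

```rust
// Zero-initialized scratch buffer of 1024 f32 values on the device.
let device = KaioDevice::new(0)?;
let buf: GpuBuffer<f32> = device.alloc_zeros(1024)?;

// Copy back and confirm the contents are zeroed.
let host = buf.to_host(&device)?;
assert!(host.iter().all(|&x| x == 0.0));
</imports>
```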
pub fn stream(&self) -> &Arc<CudaStream>
Access the underlying CUDA stream for kernel launch operations.
Used with cudarc’s launch_builder to launch kernels. In Phase 2,
the proc macro will generate typed wrappers that hide this.
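A rough sketch of driving the raw stream through cudarc's builder-style launch API. The kernel handle and buffer argument below are illustrative, and the exact builder method names depend on the cudarc version in use:

```rust
use cudarc::driver::LaunchConfig;

// Illustrative only: `kernel` stands for a CudaFunction obtained from a
// loaded module, and `buf` for a GpuBuffer previously allocated on `device`.
let cfg = LaunchConfig::for_num_elems(1024);
let mut launch = device.stream().launch_builder(&kernel);
launch.arg(&buf); // argument-passing API varies across cudarc versions
unsafe { launch.launch(cfg)? };
```

Once the Phase 2 proc macro lands, generated typed wrappers should make this manual builder dance unnecessary.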
pub fn load_ptx(&self, ptx_text: &str) -> Result<KaioModule>
Deprecated since 0.2.1: use load_module(&PtxModule) — runs PtxModule::validate() for readable SM-mismatch errors
Load a PTX module from source text and return a crate::module::KaioModule.
The PTX text is passed to the CUDA driver’s cuModuleLoadData —
no NVRTC compilation occurs. The driver JIT-compiles the PTX for
the current GPU.
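A sketch of the raw-PTX use case this function remains public for (the file path is illustrative):

```rust
// Load a hand-written PTX file from disk, intentionally bypassing
// PtxModule::validate. The driver JIT-compiles the PTX for this GPU.
let ptx_text = std::fs::read_to_string("kernels/reduce.ptx")?;
#[allow(deprecated)]
let module = device.load_ptx(&ptx_text)?;
```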
§Deprecated — prefer load_module
The load_module path runs PtxModule::validate before the driver sees the PTX, catching SM mismatches (e.g. mma.sync on sub-Ampere targets) with readable KaioError::Validation errors instead of cryptic ptxas failures deep in the driver.
This function remains public for raw-PTX use cases (external PTX files, hand-written PTX for research, bypassing validation intentionally). It is not scheduled for removal in the 0.2.x line.
§Migration
Before:
let ptx_text: String = build_my_ptx();
let module = device.load_ptx(&ptx_text)?;

After:
use kaio_core::ir::PtxModule;
let ptx_module: PtxModule = build_my_module("sm_80");
let module = device.load_module(&ptx_module)?;

pub fn load_module(&self, module: &PtxModule) -> Result<KaioModule>
Validate, emit, and load a kaio_core::ir::PtxModule on the device.
This is the preferred entrypoint when the caller has an in-memory PtxModule (as opposed to raw PTX text). Before the PTX text is handed to the driver, kaio_core::ir::PtxModule::validate checks that the module’s target SM supports every feature used by its kernels — raising KaioError::Validation if, e.g., an mma.sync op is present but the target is sm_70.
Surfacing the error at this layer gives the user a readable message (“mma.sync.m16n8k16 requires sm_80+, target is sm_70”) instead of a cryptic ptxas error from deep in the driver.
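A sketch of handling the validation failure explicitly rather than propagating it; the exact payload of the KaioError::Validation variant is assumed here to be a displayable message:

```rust
// Assumes KaioError::Validation(msg) carries the human-readable message.
match device.load_module(&ptx_module) {
    Ok(module) => { /* launch kernels via `module` */ }
    Err(KaioError::Validation(msg)) => {
        eprintln!("PTX unsupported on this GPU: {msg}");
    }
    Err(other) => return Err(other),
}
```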