pub struct KaioDevice { /* private fields */ }
A KAIO GPU device — wraps a CUDA context and its default stream.
Created via KaioDevice::new with a device ordinal (0 for the first GPU).
All allocation and transfer operations go through the default stream.
§Example
let device = KaioDevice::new(0)?;
let buf = device.alloc_from(&[1.0f32, 2.0, 3.0])?;
let host = buf.to_host(&device)?;

§Implementations
impl KaioDevice
pub fn new(ordinal: usize) -> Result<Self>
Create a new device targeting the GPU at the given ordinal.
Ordinal 0 is the first GPU. Returns an error if no GPU exists at that ordinal or if the CUDA driver fails to initialize.
pub fn info(&self) -> Result<DeviceInfo>
Query basic information about this device.
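A minimal sketch of querying device information. The exact fields of DeviceInfo are not documented here, so the sketch assumes only that the type implements Debug:

```rust
// Sketch: print whatever DeviceInfo exposes (assumes it derives Debug).
let device = KaioDevice::new(0)?;
let info = device.info()?;
println!("{info:?}");
```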
pub fn ordinal(&self) -> usize
CUDA device ordinal (0-indexed) this device wraps.
Used by bridge crates (e.g. kaio-candle) to cross-check that a
host-framework device and a KaioDevice refer to the same GPU.
pub fn alloc_from<T: DeviceRepr>(&self, data: &[T]) -> Result<GpuBuffer<T>>
Allocate device memory and copy data from a host slice.
pub fn alloc_zeros<T: DeviceRepr + ValidAsZeroBits>(
    &self,
    len: usize,
) -> Result<GpuBuffer<T>>
Allocate zero-initialized device memory.
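A short sketch combining alloc_zeros with the to_host round-trip from the example above; the ValidAsZeroBits bound guarantees the all-zero bit pattern is a valid value of T:

```rust
// Zero-initialized scratch buffer of 1024 f32 values on the device.
let device = KaioDevice::new(0)?;
let buf: GpuBuffer<f32> = device.alloc_zeros(1024)?;

// Copy back and confirm the contents are zeroed.
let host = buf.to_host(&device)?;
assert!(host.iter().all(|&x| x == 0.0));
</imports>
```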
pub fn stream(&self) -> &Arc<CudaStream>
Access the underlying CUDA stream for kernel launch operations.
Used with cudarc’s launch_builder to launch kernels. In Phase 2,
the proc macro will generate typed wrappers that hide this.
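A rough sketch of driving the raw stream through cudarc's builder-style launch API. The kernel handle and buffer argument below are illustrative, and the exact builder method names depend on the cudarc version in use:

```rust
use cudarc::driver::LaunchConfig;

// Illustrative only: `kernel` stands for a CudaFunction obtained from a
// loaded module, and `buf` for a GpuBuffer previously allocated on `device`.
let cfg = LaunchConfig::for_num_elems(1024);
let mut launch = device.stream().launch_builder(&kernel);
launch.arg(&buf); // argument-passing API varies across cudarc versions
unsafe { launch.launch(cfg)? };
```

Once the Phase 2 proc macro lands, generated typed wrappers should make this manual builder dance unnecessary.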
pub fn load_ptx(&self, ptx_text: &str) -> Result<KaioModule>
Deprecated since 0.2.1: use load_module(&PtxModule) — runs PtxModule::validate() for readable SM-mismatch errors
Load a PTX module from source text and return a crate::module::KaioModule.
The PTX text is passed to the CUDA driver’s cuModuleLoadData —
no NVRTC compilation occurs. The driver JIT-compiles the PTX for
the current GPU.
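A sketch of the raw-PTX use case this function remains public for (the file path is illustrative):

```rust
// Load a hand-written PTX file from disk, intentionally bypassing
// PtxModule::validate. The driver JIT-compiles the PTX for this GPU.
let ptx_text = std::fs::read_to_string("kernels/reduce.ptx")?;
#[allow(deprecated)]
let module = device.load_ptx(&ptx_text)?;
```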
§Deprecated — prefer load_module
The load_module path runs PtxModule::validate before the driver sees the PTX, catching SM mismatches (e.g. mma.sync on sub-Ampere targets) with readable KaioError::Validation errors instead of cryptic ptxas failures deep in the driver.
This function remains public for raw-PTX use cases (external PTX files, hand-written PTX for research, bypassing validation intentionally). It is not scheduled for removal in the 0.2.x line.
§Migration
Before:
let ptx_text: String = build_my_ptx();
let module = device.load_ptx(&ptx_text)?;

After:
use kaio_core::ir::PtxModule;
let ptx_module: PtxModule = build_my_module("sm_80");
let module = device.load_module(&ptx_module)?;

pub fn load_module(&self, module: &PtxModule) -> Result<KaioModule>
Validate, emit, and load a kaio_core::ir::PtxModule on the device.
This is the preferred entrypoint when the caller has an in-memory PtxModule (as opposed to raw PTX text). Before the PTX text is handed to the driver, kaio_core::ir::PtxModule::validate checks that the module’s target SM supports every feature used by its kernels — raising KaioError::Validation if, e.g., an mma.sync op is present but the target is sm_70.
Surfacing the error at this layer gives the user a readable message (“mma.sync.m16n8k16 requires sm_80+, target is sm_70”) instead of a cryptic ptxas error from deep in the driver.
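A sketch of handling the validation failure explicitly rather than propagating it; the exact payload of the KaioError::Validation variant is assumed here to be a displayable message:

```rust
// Assumes KaioError::Validation(msg) carries the human-readable message.
match device.load_module(&ptx_module) {
    Ok(module) => { /* launch kernels via `module` */ }
    Err(KaioError::Validation(msg)) => {
        eprintln!("PTX unsupported on this GPU: {msg}");
    }
    Err(other) => return Err(other),
}
```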