pub struct KaioDevice { /* private fields */ }
A KAIO GPU device — wraps a CUDA context and its default stream.
Created via KaioDevice::new with a device ordinal (0 for the first GPU).
All allocation and transfer operations go through the default stream.
§Example
let device = KaioDevice::new(0)?;
let buf = device.alloc_from(&[1.0f32, 2.0, 3.0])?;
let host = buf.to_host(&device)?;
Implementations§
impl KaioDevice
pub fn new(ordinal: usize) -> Result<Self>
Create a new device targeting the GPU at the given ordinal.
Ordinal 0 is the first GPU. Returns an error if no GPU exists at that ordinal or if the CUDA driver fails to initialize.
pub fn info(&self) -> Result<DeviceInfo>
Query basic information about this device.
pub fn alloc_from<T: DeviceRepr>(&self, data: &[T]) -> Result<GpuBuffer<T>>
Allocate device memory and copy data from a host slice.
pub fn alloc_zeros<T: DeviceRepr + ValidAsZeroBits>(&self, len: usize) -> Result<GpuBuffer<T>>
Allocate zero-initialized device memory.
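The extra ValidAsZeroBits bound on alloc_zeros is presumably there because zero-filling memory is only sound for types whose all-zero bit pattern is a valid value. The following is a minimal host-side sketch of that idea, not KAIO's or cudarc's implementation: the trait ZeroBitsOk and the function host_zeros are hypothetical stand-ins, and a Vec takes the place of a GpuBuffer.

```rust
/// Hypothetical host-side analogue of the `ValidAsZeroBits` bound.
/// This trait is an illustration only; it is not part of KAIO or cudarc.
///
/// # Safety
/// Implement only for types where all-zero bytes form a valid value.
pub unsafe trait ZeroBitsOk: Copy {}

// Plain numeric types are valid when every bit is zero.
unsafe impl ZeroBitsOk for f32 {}
unsafe impl ZeroBitsOk for u32 {}
unsafe impl ZeroBitsOk for i64 {}

/// Hypothetical helper: build a zero-filled host vector, standing in for
/// a zero-initialized device allocation of `len` elements.
pub fn host_zeros<T: ZeroBitsOk>(len: usize) -> Vec<T> {
    // SAFETY: `ZeroBitsOk` promises that all-zero bytes are a valid `T`.
    vec![unsafe { std::mem::zeroed::<T>() }; len]
}
```

Calling host_zeros::<f32>(3) yields three 0.0 values, while a type such as a reference, which has no valid all-zero representation, would not implement the marker trait and would be rejected at compile time.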
pub fn stream(&self) -> &Arc<CudaStream>
Access the underlying CUDA stream for kernel launch operations.
Used with cudarc’s launch_builder to launch kernels. In Phase 2,
the proc macro will generate typed wrappers that hide this.
pub fn load_ptx(&self, ptx_text: &str) -> Result<KaioModule>
Load a PTX module from source text and return a crate::module::KaioModule.
The PTX text is passed to the CUDA driver’s cuModuleLoadData —
no NVRTC compilation occurs. The driver JIT-compiles the PTX for
the current GPU.
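To make "PTX source text" concrete, here is a sketch of what a string passed to load_ptx might contain. The kernel body, its parameter names (a_ptr, b_ptr, out_ptr, n), the constant VECTOR_ADD_PTX, and the helper declares_entry are all assumptions written for illustration; they are not taken from KAIO, and the PTX has not been run against the crate.

```rust
/// Hypothetical hand-written PTX for an element-wise f32 add.
/// Parameter names and the sm_70 target are illustrative assumptions.
pub const VECTOR_ADD_PTX: &str = r#"
.version 7.0
.target sm_70
.address_size 64

.visible .entry vector_add(
    .param .u64 a_ptr,
    .param .u64 b_ptr,
    .param .u64 out_ptr,
    .param .u32 n
)
{
    .reg .pred %p<2>;
    .reg .f32 %f<4>;
    .reg .b32 %r<6>;
    .reg .b64 %rd<11>;

    ld.param.u64 %rd1, [a_ptr];
    ld.param.u64 %rd2, [b_ptr];
    ld.param.u64 %rd3, [out_ptr];
    ld.param.u32 %r1, [n];

    // Global thread index = ctaid.x * ntid.x + tid.x
    mov.u32 %r2, %ctaid.x;
    mov.u32 %r3, %ntid.x;
    mov.u32 %r4, %tid.x;
    mad.lo.s32 %r5, %r2, %r3, %r4;

    // Threads past the end of the arrays do nothing.
    setp.ge.u32 %p1, %r5, %r1;
    @%p1 bra DONE;

    cvta.to.global.u64 %rd4, %rd1;
    cvta.to.global.u64 %rd5, %rd2;
    cvta.to.global.u64 %rd6, %rd3;
    mul.wide.u32 %rd7, %r5, 4;
    add.s64 %rd8, %rd4, %rd7;
    add.s64 %rd9, %rd5, %rd7;
    add.s64 %rd10, %rd6, %rd7;

    ld.global.f32 %f1, [%rd8];
    ld.global.f32 %f2, [%rd9];
    add.f32 %f3, %f1, %f2;
    st.global.f32 [%rd10], %f3;

DONE:
    ret;
}
"#;

/// Hypothetical host-side sanity check: does the PTX text declare a
/// visible entry point with the given kernel name?
pub fn declares_entry(ptx: &str, kernel: &str) -> bool {
    ptx.lines().any(|line| {
        let line = line.trim_start();
        line.starts_with(".visible .entry") && line.contains(kernel)
    })
}
```

A check like declares_entry is only a cheap guard before handing the text to the driver; whether the PTX is actually valid for the current GPU is decided by the driver's JIT when the module is loaded.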
§Example
let module = device.load_ptx(&ptx_text)?;
let func = module.function("vector_add")?;
pub fn load_module(&self, module: &PtxModule) -> Result<KaioModule>
Validate, emit, and load a kaio_core::ir::PtxModule on the device.
This is the preferred entrypoint when the caller has an in-memory
PtxModule (as opposed to raw PTX text). Before the PTX text is
handed to the driver, kaio_core::ir::PtxModule::validate
checks that the module’s target SM supports every feature used by
its kernels, returning KaioError::Validation if, for example,
an mma.sync op is present but the target is sm_70.
Surfacing the error at this layer gives the user a readable
message (“mma.sync.m16n8k16 requires sm_80+, target is sm_70”)
instead of a cryptic ptxas error from deep in the driver.
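The kind of check described above can be sketched as follows. This is an illustration under stated assumptions, not the code in kaio_core: the Feature enum, the ValidationError struct, and the free function validate are hypothetical stand-ins for PtxModule::validate and KaioError::Validation. The minimum SM values follow the PTX ISA requirements for these instructions.

```rust
use std::fmt;

/// Hypothetical PTX features a kernel might use (not KAIO's real IR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Feature {
    ShflSync,
    CpAsync,
    MmaSyncM16N8K16,
}

impl Feature {
    /// Instruction name as it would appear in an error message.
    pub fn name(self) -> &'static str {
        match self {
            Feature::ShflSync => "shfl.sync",
            Feature::CpAsync => "cp.async",
            Feature::MmaSyncM16N8K16 => "mma.sync.m16n8k16",
        }
    }

    /// Lowest SM version that supports the feature.
    pub fn min_sm(self) -> u32 {
        match self {
            Feature::ShflSync => 30,
            Feature::CpAsync => 80,
            Feature::MmaSyncM16N8K16 => 80,
        }
    }
}

/// Hypothetical stand-in for the validation error variant.
#[derive(Debug, PartialEq, Eq)]
pub struct ValidationError {
    pub feature: &'static str,
    pub required_sm: u32,
    pub target_sm: u32,
}

impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{} requires sm_{}+, target is sm_{}",
            self.feature, self.required_sm, self.target_sm
        )
    }
}

/// Report the first feature the target SM cannot run, before any PTX
/// text would be emitted or handed to the driver.
pub fn validate(target_sm: u32, features: &[Feature]) -> Result<(), ValidationError> {
    for &feature in features {
        if target_sm < feature.min_sm() {
            return Err(ValidationError {
                feature: feature.name(),
                required_sm: feature.min_sm(),
                target_sm,
            });
        }
    }
    Ok(())
}
```

With a target of 70 and a feature list containing MmaSyncM16N8K16, validate returns an error whose Display text matches the message format quoted above; with a target of 80 the same list passes.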