morok-device

Device abstraction with lazy buffer allocation, zero-copy views, and LRU caching.

Example

use morok_device::{Buffer, BufferOptions, registry};
use morok_dtype::DType;

// CPU buffer (lazy allocation)
let cpu = registry::cpu();
let buf = Buffer::new(cpu, DType::float32(), &[1024], BufferOptions::default());

// CUDA buffer with unified memory (CPU-accessible)
let cuda = registry::cuda(0);
let opts = BufferOptions { cpu_accessible: true, ..Default::default() };
let unified = Buffer::allocate(cuda, DType::float32(), &[1024], opts)?;

// Zero-copy view
let view = buf.view(0, 512);

// Device-to-device copy
dst.copy_from(&src)?;

Features

Supported:

Lazy buffer allocation via OnceLock
Zero-copy buffer views with offset tracking
LRU allocation cache (per-size pooling)
CPU allocator
CUDA allocator (feature cuda)

CUDA Buffer Management (feature cuda; CUDA code generation is not yet available — CPU backends only):

Feature	Implementation	Notes
Unified memory	`cudaMallocManaged`	`cpu_accessible: true`
Device memory	`cuMemAlloc`	Faster GPU access
D2D copy	`memcpy_dtod`	Direct device-to-device
H2D/D2H copy	`memcpy_htod/dtoh`	Host transfers
Zero-init	`memset_zeros`	Stream-based

Planned:

Metal allocator
WebGPU allocator
Multi-GPU peer access
Custom stream management
Pinned host memory

Copy Matrix

All combinations supported:

	CPU	CudaDevice	CudaUnified
CPU	slice	H2D	slice
CudaDevice	D2H	D2D	D2D
CudaUnified	slice	D2D	slice

Device Registry

registry::cpu()              // CPU allocator
registry::cuda(0)            // CUDA device 0
registry::get_device("CUDA:1")  // Parse string

DeviceSpec::parse("cuda:0")  // Case-insensitive parsing
spec.canonicalize()          // → "CUDA:0"

Testing