morok-device
Device abstraction with lazy buffer allocation, zero-copy views, and LRU caching.
Example
```rust
use ;
use DType;

// CPU buffer (lazy allocation)
let cpu = cpu;
let buf = new;

// CUDA buffer with unified memory (CPU-accessible)
let cuda = cuda;
let opts = BufferOptions ;
let unified = allocate?;

// Zero-copy view
let view = buf.view;

// Device-to-device copy
dst.copy_from?;
```
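The two core ideas in this example, lazy allocation and zero-copy views, can be illustrated with a minimal std-only sketch. `Storage`, `Buffer`, `view(offset, len)` and `as_slice` are hypothetical names chosen for this sketch, not morok-device's API; a host `Vec<u8>` stands in for a device allocation, and `std::sync::OnceLock` defers creating it until the first access.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical backing storage: the Vec is only created on first access.
struct Storage {
    size: usize,
    data: OnceLock<Vec<u8>>,
}

impl Storage {
    fn new(size: usize) -> Self {
        Storage { size, data: OnceLock::new() }
    }

    /// Allocates on the first call; later calls return the same allocation.
    fn get(&self) -> &[u8] {
        self.data.get_or_init(|| vec![0u8; self.size])
    }

    fn is_allocated(&self) -> bool {
        self.data.get().is_some()
    }
}

/// Hypothetical buffer handle: shared storage plus an offset and length.
#[derive(Clone)]
struct Buffer {
    storage: Arc<Storage>,
    offset: usize,
    len: usize,
}

impl Buffer {
    fn new(size: usize) -> Self {
        Buffer { storage: Arc::new(Storage::new(size)), offset: 0, len: size }
    }

    /// Zero-copy view: shares the storage; offsets compose with the parent's.
    fn view(&self, offset: usize, len: usize) -> Option<Buffer> {
        if offset.checked_add(len)? > self.len {
            return None; // view would run past the parent's end
        }
        Some(Buffer {
            storage: Arc::clone(&self.storage),
            offset: self.offset + offset,
            len,
        })
    }

    /// Reading the data is what finally triggers the allocation.
    fn as_slice(&self) -> &[u8] {
        &self.storage.get()[self.offset..self.offset + self.len]
    }
}
```

Creating a view only clones the `Arc` and adds to the offset, so it never copies data and never forces the allocation.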
Features
Supported:
- Lazy buffer allocation via `OnceLock`
- Zero-copy buffer views with offset tracking
- LRU allocation cache (per-size pooling)
- CPU allocator
- CUDA allocator (feature `cuda`)
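Per-size pooling with LRU eviction can be sketched as follows. This is an illustration under assumptions, not the crate's allocator: `AllocCache`, the byte `capacity` bound and the exact-size reuse policy are hypothetical, and host `Vec<u8>` values stand in for device allocations. Freed buffers are queued per size; a request for the same size reuses the most recently freed one, and when the cached total exceeds the capacity, the least recently freed buffer across all sizes is dropped.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical size-keyed pool with LRU eviction of freed buffers.
struct AllocCache {
    // size -> freed buffers tagged with a "freed at" stamp, oldest first
    pools: HashMap<usize, VecDeque<(u64, Vec<u8>)>>,
    cached_bytes: usize,
    capacity: usize,
    clock: u64,
}

impl AllocCache {
    fn new(capacity: usize) -> Self {
        AllocCache { pools: HashMap::new(), cached_bytes: 0, capacity, clock: 0 }
    }

    /// Reuse a pooled buffer of exactly `size` bytes if available, else allocate.
    fn alloc(&mut self, size: usize) -> Vec<u8> {
        if let Some(pool) = self.pools.get_mut(&size) {
            if let Some((_, buf)) = pool.pop_back() {
                // most recently freed buffer of this size
                self.cached_bytes -= size;
                return buf;
            }
        }
        vec![0u8; size]
    }

    /// Return a buffer to its size pool, then evict LRU buffers while over capacity.
    fn free(&mut self, buf: Vec<u8>) {
        let size = buf.len();
        self.clock += 1;
        self.pools.entry(size).or_default().push_back((self.clock, buf));
        self.cached_bytes += size;
        while self.cached_bytes > self.capacity {
            // The pool whose front entry has the smallest stamp holds the LRU buffer.
            let lru_size = self
                .pools
                .iter()
                .filter_map(|(&s, q)| q.front().map(|entry| (entry.0, s)))
                .min()
                .map(|(_, s)| s);
            match lru_size {
                Some(s) => {
                    self.pools.get_mut(&s).unwrap().pop_front();
                    self.cached_bytes -= s;
                }
                None => break,
            }
        }
    }
}
```

Reusing only exact sizes keeps the bookkeeping trivial at the cost of some fragmentation; a real allocator might also release the pool when an allocation fails.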
CUDA Buffer Management (feature `cuda`; CUDA code generation is not yet available, so only CPU backends are supported for now):
| Feature | Implementation | Notes |
|---|---|---|
| Unified memory | `cudaMallocManaged` | `cpu_accessible: true` |
| Device memory | `cuMemAlloc` | Faster GPU access |
| D2D copy | `memcpy_dtod` | Direct device-to-device |
| H2D/D2H copy | `memcpy_htod` / `memcpy_dtoh` | Host transfers |
| Zero-init | `memset_zeros` | Stream-based |
Planned:
- Metal allocator
- WebGPU allocator
- Multi-GPU peer access
- Custom stream management
- Pinned host memory
Copy Matrix
All source/destination combinations are supported (rows are the source, columns the destination):
| From \ To | CPU | CudaDevice | CudaUnified |
|---|---|---|---|
| CPU | slice | H2D | slice |
| CudaDevice | D2H | D2D | D2D |
| CudaUnified | slice | D2D | slice |
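The copy matrix reduces to a small dispatch function. The sketch below encodes exactly that table; `Mem`, `CopyKind` and `copy_kind` are hypothetical names, and it only selects the method rather than performing the copy. The rule it captures: any pair of CPU-accessible memories (CPU or unified) uses a plain slice copy, while every pair involving non-unified device memory goes through the H2D, D2H or D2D paths.

```rust
/// Memory kinds from the copy matrix.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mem {
    Cpu,
    CudaDevice,
    CudaUnified,
}

/// How a copy between two memory kinds is carried out.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CopyKind {
    Slice,
    HostToDevice,
    DeviceToHost,
    DeviceToDevice,
}

/// Hypothetical dispatch over (source, destination).
fn copy_kind(src: Mem, dst: Mem) -> CopyKind {
    use Mem::*;
    // CPU memory and unified memory are both directly addressable from the host.
    let host_visible = |m: Mem| matches!(m, Cpu | CudaUnified);
    match (src, dst) {
        (Cpu, CudaDevice) => CopyKind::HostToDevice,
        (CudaDevice, Cpu) => CopyKind::DeviceToHost,
        (s, d) if host_visible(s) && host_visible(d) => CopyKind::Slice,
        // Remaining pairs all involve device memory on the CUDA side.
        _ => CopyKind::DeviceToDevice,
    }
}
```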
Device Registry
```rust
cpu               // CPU allocator
cuda              // CUDA device 0
get_device        // Parse string
parse             // Case-insensitive parsing
spec.canonicalize // → "CUDA:0"
```
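Case-insensitive parsing and canonicalization can be sketched as below, assuming a hypothetical `DeviceSpec` enum with `Cpu` and `Cuda(index)` variants and that a bare `cuda` means device 0, as the `cuda` entry above suggests. The type name and the accepted string forms are assumptions; the crate may accept other spellings.

```rust
/// Hypothetical device spec; the crate's real type may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DeviceSpec {
    Cpu,
    Cuda(usize),
}

impl DeviceSpec {
    /// Case-insensitive: accepts "cpu", "CUDA", "cuda:1", ...; bare "cuda" means device 0.
    fn parse(s: &str) -> Result<Self, String> {
        let lower = s.trim().to_ascii_lowercase();
        let (name, index) = match lower.split_once(':') {
            Some((n, i)) => (n, Some(i)),
            None => (lower.as_str(), None),
        };
        match (name, index) {
            ("cpu", None) => Ok(DeviceSpec::Cpu),
            ("cuda", None) => Ok(DeviceSpec::Cuda(0)),
            ("cuda", Some(i)) => i
                .parse::<usize>()
                .map(DeviceSpec::Cuda)
                .map_err(|e| format!("bad CUDA index {i:?}: {e}")),
            _ => Err(format!("unknown device {s:?}")),
        }
    }

    /// Canonical form: upper-case name, explicit index for CUDA.
    fn canonicalize(&self) -> String {
        match self {
            DeviceSpec::Cpu => "CPU".to_string(),
            DeviceSpec::Cuda(i) => format!("CUDA:{i}"),
        }
    }
}
```

Normalizing to one canonical string gives a stable key for a device registry, so `cuda`, `Cuda:0` and `CUDA:0` all resolve to the same entry.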
Testing