# axonml-core

## Overview
axonml-core is the foundational layer of the AxonML machine learning framework. It provides the Device abstraction, the Scalar/Numeric/Float trait hierarchy, reference-counted Storage<T> with pooled GPU allocations, and five compute backends (CPU, CUDA, Vulkan, Metal, WebGPU) that underpin every tensor operation in the framework.
## Features

- **Device Abstraction** - `Device` enum (Cpu, Cuda, Vulkan, Metal, Wgpu) with per-variant device index, runtime availability checks, and a `best_available_backend()` selector (CUDA > Metal > Vulkan > WebGPU > CPU).
- **Type-Safe Data Types** - `DType` runtime enum covering F16, F32, F64, I8, I16, I32, I64, U8, U32, U64, Bool with `size_of`/`is_float`/`is_signed`/`is_integer` queries. Compile-time `Scalar`/`Numeric`/`Float` trait hierarchy for zero-cost generic dispatch.
- **Reference-Counted Storage** - `Storage<T>` wraps either a host `Vec<T>` or a `PooledCudaSlice` behind `Arc<RwLock<...>>`. Supports zero-copy views via offset+len slicing, `to_device()` for CPU<->GPU transfer, deep copy, and RAII `as_slice()`/`as_slice_mut()` guards.
- **Five Compute Backends** - CPU (rayon-parallel, matrixmultiply GEMM/GEMV, always available), CUDA (cuBLAS + 15+ custom PTX kernel modules), Vulkan (ash + gpu-allocator, SPIR-V compute), Metal (Apple Silicon, compute pipelines), WebGPU (wgpu for browser/cross-platform).
- **GPU Memory Pool** - `cuda_pool` returns freed CUDA allocations to a size-bucketed free list instead of calling `cudaFree`, amortising allocator cost across training steps.
- **Device Capabilities** - `DeviceCapabilities` exposes name, total/available memory, f16/f64 support, max threads per block, and CUDA compute capability.
- **Allocator Trait** - `Allocator` extension point with a `DefaultAllocator` that performs 64-byte-aligned host allocations and reports system memory via sysinfo.
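The size-bucketed free list behind the GPU memory pool can be sketched in plain Rust. This is an illustrative standalone sketch, not the crate's API: host `Vec<u8>` buffers stand in for CUDA allocations, and `BucketPool` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

/// Illustrative size-bucketed pool: freed buffers go to a per-size free
/// list instead of being released, so a later allocation of the same
/// size can reuse them without touching the underlying allocator.
struct BucketPool {
    free: HashMap<usize, Vec<Vec<u8>>>, // size -> freed buffers of that size
}

impl BucketPool {
    fn new() -> Self {
        Self { free: HashMap::new() }
    }

    /// Pop a recycled buffer of the requested size if one exists,
    /// otherwise fall back to a fresh allocation.
    fn alloc(&mut self, size: usize) -> Vec<u8> {
        match self.free.get_mut(&size).and_then(|bucket| bucket.pop()) {
            Some(buf) => buf,
            None => vec![0u8; size],
        }
    }

    /// Return a buffer to its size bucket instead of dropping it.
    fn dealloc(&mut self, buf: Vec<u8>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }
}
```

Training loops tend to request the same handful of buffer sizes every step, so after a warm-up pass nearly every `alloc` is a free-list pop rather than a device allocation.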
## Modules

| Module | Description |
|---|---|
| `device` | `Device` enum (Cpu, Cuda, Vulkan, Metal, Wgpu) + `DeviceCapabilities` with availability and capability queries |
| `dtype` | `DType` runtime enum and `Scalar` / `Numeric` / `Float` trait hierarchy; `F16Wrapper` and `BoolWrapper` adapters |
| `storage` | Reference-counted `Storage<T>` with zero-copy views, device transfer, and pooled GPU slices |
| `allocator` | `Allocator` trait and `DefaultAllocator` (64-byte-aligned CPU allocator) |
| `backends` | `Backend` trait, `BackendType`, `GpuMemory`, `GpuStream`, plus CPU/CUDA/Vulkan/Metal/WGPU implementations |
| `error` | `Error` / `Result` types for shape mismatches, device errors, and allocation failures |
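The `dtype` trait hierarchy enables static dispatch: a kernel generic over `Numeric` compiles once per concrete type with no runtime branching. The sketch below is self-contained and illustrative; the trait names mirror the crate, but the specific bounds and methods shown are assumptions.

```rust
use std::ops::{Add, Mul};

/// Illustrative Scalar/Numeric/Float-style hierarchy (bounds are assumed).
trait Scalar: Copy + 'static {
    const DTYPE_NAME: &'static str;
}

trait Numeric: Scalar + Add<Output = Self> + Mul<Output = Self> {
    fn zero() -> Self;
}

trait Float: Numeric {
    fn sqrt(self) -> Self;
}

impl Scalar for f32 { const DTYPE_NAME: &'static str = "F32"; }
impl Numeric for f32 { fn zero() -> Self { 0.0 } }
impl Float for f32 { fn sqrt(self) -> Self { f32::sqrt(self) } }

impl Scalar for i32 { const DTYPE_NAME: &'static str = "I32"; }
impl Numeric for i32 { fn zero() -> Self { 0 } }
// No Float impl for i32: float-only kernels are rejected at compile time.

/// Works for any Numeric type (f32, i32, ...).
fn sum<T: Numeric>(xs: &[T]) -> T {
    xs.iter().copied().fold(T::zero(), |a, b| a + b)
}

/// Only callable for Float types.
fn l2_norm<T: Float>(xs: &[T]) -> T {
    let sq: Vec<T> = xs.iter().map(|&x| x * x).collect();
    sum(&sq).sqrt()
}
```

Calling `l2_norm(&[1i32, 2])` fails to compile, which is the zero-cost equivalent of the runtime `DType::is_float()` check.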
## Backends (under `backends/`)

| Backend | File | Status |
|---|---|---|
| CPU | `cpu.rs` | Always compiled; rayon-parallel ops, matrixmultiply GEMM |
| CUDA | `cuda.rs` + `cuda_kernels/` + `cuda_pool.rs` | Feature `cuda`; cuBLAS + PTX kernels for elementwise, activations, attention, Q4_K/Q6_K dequant-in-shader matmul, softmax, layernorm, RMSNorm, transpose, embedding gather |
| cuDNN | `cudnn_ops.rs` | Feature `cudnn`; conv2d forward/backward via cuDNN |
| Vulkan | `vulkan.rs` | Feature `vulkan`; ash + gpu-allocator, full buffer/pipeline/dispatch (~982 lines) |
| Metal | `metal.rs` | Feature `metal`; full buffer/pipeline/dispatch on Apple Silicon (~769 lines) |
| WebGPU | `wgpu_backend.rs` | Feature `wgpu`; full buffer/pipeline/dispatch via wgpu (~1710 lines) |
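The CUDA > Metal > Vulkan > WebGPU > CPU priority behind `best_available_backend()` amounts to probing backends in descending order and falling back to CPU, which is always available. A minimal standalone sketch (availability checks are stubbed as a closure; the real crate probes drivers at runtime):

```rust
/// Backend kinds, mirroring the crate's five backends.
#[derive(Debug, PartialEq, Clone, Copy)]
enum BackendType {
    Cuda,
    Metal,
    Vulkan,
    Wgpu,
    Cpu,
}

/// Probe backends in descending priority; CPU is the unconditional fallback.
fn best_available(is_available: impl Fn(BackendType) -> bool) -> BackendType {
    [
        BackendType::Cuda,
        BackendType::Metal,
        BackendType::Vulkan,
        BackendType::Wgpu,
    ]
    .into_iter()
    .find(|&b| is_available(b))
    .unwrap_or(BackendType::Cpu)
}
```

Because the probe order is fixed, a machine with both Metal and Vulkan drivers deterministically selects Metal.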
## Cargo Features

| Feature | Pulls In | Purpose |
|---|---|---|
| `std` (default) | — | Standard library support |
| `cuda` | `cudarc` | NVIDIA CUDA backend |
| `cudnn` | `cuda` + cudarc cuDNN | cuDNN conv ops |
| `vulkan` | `ash`, `gpu-allocator` | Vulkan compute backend |
| `metal` | `metal`, `objc` (macOS only) | Apple Metal backend |
| `wgpu` | `wgpu`, `pollster` | WebGPU / cross-platform backend |
## Usage

Add this to your `Cargo.toml`:

```toml
[dependencies]
axonml-core = "0.6.1"
```
### Basic Example

```rust
use axonml_core::{Device, Storage};

// NOTE: argument shapes below are indicative; see the crate docs for exact signatures.

// Check device availability
let device = Device::Cpu;
assert!(device.is_available());

// Create storage on CPU
let storage = Storage::<f32>::zeros(4, device);
assert_eq!(storage.len(), 4);

// Create storage from data
let data = vec![1.0f32, 2.0, 3.0, 4.0];
let storage = Storage::from_vec(data, device);

// Create a view (zero-copy slice)
let view = storage.slice(1, 2).unwrap();
assert_eq!(view.len(), 2);
```
### Device Capabilities

```rust
use axonml_core::Device;

// Field names are indicative of what DeviceCapabilities exposes.
let device = Device::Cpu;
let caps = device.capabilities();
println!("name: {}", caps.name);
println!("total memory: {} bytes", caps.total_memory);
println!("f16 support: {}", caps.supports_f16);
println!("max threads per block: {}", caps.max_threads_per_block);
```
### Data Types

```rust
use axonml_core::{DType, Float};

// Query dtype properties
assert!(DType::F32.is_float());
assert_eq!(DType::F32.size_of(), 4);

// Use type traits for zero-cost generic dispatch
// (assumes the Float trait provides sqrt()):
fn sqrt_all<T: Float>(xs: &mut [T]) {
    for x in xs.iter_mut() {
        *x = x.sqrt();
    }
}
```
### Picking a Backend

```rust
use axonml_core::{best_available_backend, BackendType};

// Selection order: CUDA > Metal > Vulkan > WebGPU > CPU.
let backend = best_available_backend();
match backend {
    BackendType::Cuda => println!("Using CUDA"),
    BackendType::Metal => println!("Using Metal"),
    BackendType::Vulkan => println!("Using Vulkan"),
    BackendType::Wgpu => println!("Using WebGPU"),
    BackendType::Cpu => println!("Falling back to CPU"),
}
```
## Tests

Run the test suite:

```sh
cargo test
```

GPU backends are feature-gated, so enable them explicitly (e.g. `cargo test --features cuda`) to exercise those paths.
## License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Last updated: 2026-04-16 (v0.6.1)