# rust-ai-core
Foundation layer for the rust-ai ecosystem, providing unified abstractions for device selection, error handling, configuration validation, and CubeCL interop.
rust-ai-core is the shared foundation that enables a future AI framework built on transparency, traceability, performance, ease of use, repeatability, and customization depth.
## Design Philosophy
CUDA-first: All operations prefer GPU execution. CPU is a fallback that emits warnings, not a silent alternative. This ensures users are aware when they're not getting optimal performance.
Ecosystem Integration: rust-ai-core serves as the foundation for all rust-ai crates, ensuring consistent behavior, unified error handling, and seamless interoperability across the entire stack.
## Features
- **Unified Device Selection**: CUDA-first with environment variable overrides
- **Common Error Types**: `CoreError` hierarchy shared across all crates
- **Trait Interfaces**: `ValidatableConfig`, `Quantize`, `Dequantize`, and `GpuDispatchable`
- **CubeCL Interop**: Candle ↔ CubeCL tensor conversion utilities
## Installation

### Rust

```toml
[dependencies]
rust-ai-core = "0.2"

# Or, with CUDA support:
rust-ai-core = { version = "0.2", features = ["cuda"] }
```
### Python
The Python package provides bindings for memory estimation, device detection, and dtype utilities.
## Quick Start

### Rust

```rust
use rust_ai_core::{get_device, warn_if_cpu, DeviceConfig};
```
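A minimal end-to-end sketch using those items (the crate-name string is illustrative, and the `Debug` print assumes Candle's `Device` type):

```rust
use rust_ai_core::{get_device, warn_if_cpu, DeviceConfig, Result};

fn main() -> Result<()> {
    // CUDA-first: prefers the GPU, falls back to CPU with a one-time warning.
    let config = DeviceConfig::new().with_crate_name("my-app");
    let device = get_device(&config)?;
    warn_if_cpu(&device, "my-app");
    println!("selected device: {device:?}");
    Ok(())
}
```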
### Python
```python
# Names are indicative; see the Python API Reference below.
from rust_ai_core import (MemoryTracker, dtype_size_bytes,
                          estimate_attention_memory, estimate_tensor_memory,
                          get_device_info)

# Memory estimation for AI training planning
batch_size, num_heads, seq_len, head_dim = 1, 32, 4096, 128
attn_bytes = estimate_attention_memory(batch_size, num_heads, seq_len, head_dim, "f16")

# Tensor memory estimation
shape = [batch_size, seq_len, num_heads * head_dim]
tensor_bytes = estimate_tensor_memory(shape, "f16")

# Memory tracking for GPU budget management
tracker = MemoryTracker(limit_bytes=8 * 1024**3)  # 8 GB limit

# Device detection
info = get_device_info()

# Data type utilities
assert dtype_size_bytes("bf16") == 2
```
## Environment Variables

| Variable | Description |
|---|---|
| `RUST_AI_FORCE_CPU` | Set to `1` or `true` to force CPU execution |
| `RUST_AI_CUDA_DEVICE` | CUDA device ordinal (default: `0`) |
Legacy variables from individual crates are also supported: `AXOLOTL_FORCE_CPU`, `AXOLOTL_CUDA_DEVICE`, `VSA_OPTIM_FORCE_CPU`, and `VSA_OPTIM_CUDA_DEVICE`.
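The force-CPU override also has a programmatic equivalent via the config builder. A sketch (the assertion assumes Candle's `Device::is_cpu` accessor):

```rust
use rust_ai_core::{get_device, DeviceConfig, Result};

fn pick_device() -> Result<()> {
    // Programmatic equivalent of launching with RUST_AI_FORCE_CPU=1.
    let device = get_device(&DeviceConfig::new().with_force_cpu(true))?;
    assert!(device.is_cpu());
    Ok(())
}
```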
## Modules

### device
CUDA-first device selection with fallback warnings.
```rust
use rust_ai_core::{get_device, warn_if_cpu, DeviceConfig};

// Explicit configuration
let config = DeviceConfig::new()
    .with_cuda_device(0)
    .with_force_cpu(false)
    .with_crate_name("my-crate");

let device = get_device(&config)?;

// In hot paths, warn if on CPU
warn_if_cpu(&device, "my-crate");
```
### error
Common error types shared across the ecosystem.
```rust
use rust_ai_core::{CoreError, Result};
```
### traits
Common trait interfaces for configuration and GPU dispatch.
```rust
use rust_ai_core::{Dequantize, GpuDispatchable, Quantize, ValidatableConfig};
```
### cubecl (feature: `cuda`)
CubeCL ↔ Candle tensor interop.
```rust
use rust_ai_core::cubecl::{candle_to_cubecl_handle, cubecl_to_candle_tensor, has_cubecl_cuda_support};

if has_cubecl_cuda_support() {
    // Safe to dispatch CubeCL CUDA kernels here.
}
```
## Crate Integration
All rust-ai crates depend on rust-ai-core as their foundation:
```text
rust-ai-core (Foundation Layer)
│
├── trit-vsa        - Ternary Vector Symbolic Architectures
├── bitnet-quantize - 1.58-bit quantization
├── peft-rs         - LoRA, DoRA, AdaLoRA adapters
├── qlora-rs        - 4-bit quantization + QLoRA
├── unsloth-rs      - GPU-optimized transformer kernels
├── vsa-optim-rs    - VSA optimizers and operations
├── axolotl-rs      - High-level fine-tuning orchestration
└── tritter-accel   - Ternary GPU acceleration
```
Each crate uses rust-ai-core's:
- **Device selection**: Consistent CUDA-first device logic
- **Error types**: Shared `CoreError` hierarchy with domain-specific extensions
- **Traits**: Common interfaces (`ValidatableConfig`, `Quantize`, etc.)
- **CubeCL interop**: Unified Candle ↔ CubeCL tensor conversion
## Public API Reference

### Core Types
- `DeviceConfig` - Configuration builder for device selection
- `CoreError` - Unified error type with domain-specific variants
- `Result<T>` - Type alias for `std::result::Result<T, CoreError>`
- `TensorBuffer` - Intermediate representation for Candle ↔ CubeCL conversion
### Traits

- `ValidatableConfig` - Configuration validation interface (see the sketch below)
- `Quantize<Q>` - Tensor quantization (full precision → quantized)
- `Dequantize<Q>` - Tensor dequantization (quantized → full precision)
- `GpuDispatchable` - GPU/CPU dispatch pattern for operations with both implementations
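For illustration, a configuration type might implement `ValidatableConfig` along these lines. This is a sketch: the single `validate` method and the `CoreError::InvalidConfig` variant are assumptions about the trait's shape, not the crate's confirmed API.

```rust
use rust_ai_core::{CoreError, Result, ValidatableConfig};

struct LoraConfig {
    rank: usize,
    alpha: f32,
}

impl ValidatableConfig for LoraConfig {
    // Assumed trait shape: one fallible validation hook.
    fn validate(&self) -> Result<()> {
        if self.rank == 0 {
            // Variant name is illustrative.
            return Err(CoreError::InvalidConfig("rank must be > 0".into()));
        }
        if self.alpha <= 0.0 {
            return Err(CoreError::InvalidConfig("alpha must be positive".into()));
        }
        Ok(())
    }
}
```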
### Device Selection Functions

- `get_device(config: &DeviceConfig) -> Result<Device>` - Get device with CUDA-first fallback
- `warn_if_cpu(device: &Device, crate_name: &str)` - Emit one-time CPU warning (safe in hot loops; see the sketch below)
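Because the warning is emitted only once per crate, `warn_if_cpu` can sit directly inside a hot path. A sketch (the loop body is illustrative):

```rust
use candle_core::Device;
use rust_ai_core::warn_if_cpu;

fn hot_path(device: &Device, inputs: &[f32]) {
    // Emits at most one warning even when called repeatedly.
    warn_if_cpu(device, "my-crate");
    for _x in inputs {
        // ... kernel dispatch ...
    }
}
```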
### CubeCL Interop (feature: `cuda`)

- `has_cubecl_cuda_support() -> bool` - Check if CubeCL CUDA runtime is available
- `candle_to_cubecl_handle(tensor: &Tensor) -> Result<TensorBuffer>` - Convert Candle tensor to CubeCL buffer
- `cubecl_to_candle_tensor(buffer: &TensorBuffer, device: &Device) -> Result<Tensor>` - Convert CubeCL buffer to Candle tensor
- `allocate_output_buffer(shape: &[usize], dtype: DType) -> Result<TensorBuffer>` - Pre-allocate CubeCL output buffer
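Taken together, the conversion functions support a round trip like the sketch below (imports assume the `candle-core` crate; the actual kernel launch is elided):

```rust
use candle_core::Tensor;
use rust_ai_core::cubecl::{candle_to_cubecl_handle, cubecl_to_candle_tensor};
use rust_ai_core::Result;

/// Move a Candle tensor through CubeCL and back.
fn round_trip(input: &Tensor) -> Result<Tensor> {
    // Hand the tensor's storage to CubeCL.
    let buffer = candle_to_cubecl_handle(input)?;

    // ... launch a CubeCL kernel against `buffer` here ...

    // Rewrap the buffer as a Candle tensor on the original device.
    cubecl_to_candle_tensor(&buffer, input.device())
}
```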
## Error Handling Philosophy
rust-ai-core provides a structured error hierarchy that balances specificity with ergonomics:
```rust
use rust_ai_core::{CoreError, Result};
```

Crates should extend `CoreError` with domain-specific variants, as in the sketch below (the `thiserror` derive and variant names are illustrative, not prescribed):
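```rust
use rust_ai_core::CoreError;
use thiserror::Error;

#[derive(Debug, Error)]
pub enum QuantizeError {
    // Core failures pass through unchanged.
    #[error(transparent)]
    Core(#[from] CoreError),

    // Domain-specific failure mode (illustrative).
    #[error("unsupported bit width: {0}")]
    UnsupportedBitWidth(u32),
}
```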
## Future Framework Goals
rust-ai-core is designed to enable a future AI framework with these principles:
- Transparency - Clear, understandable operations at every level
- Traceability - Track what happens at each step with detailed logging
- Performance - GPU-accelerated, optimized for production workloads
- Ease of use - Simple high-level API with sensible defaults
- Repeatability - Deterministic, reproducible results
- Customization depth - Users can go as deep as they want, from high-level APIs to custom kernels
See ARCHITECTURE.md for design decisions and extension points.
## Python API Reference

The `rust-ai-core-bindings` package exposes functions along the following lines (signatures shown are indicative sketches):
### Memory Estimation

```python
# Estimate tensor memory
estimate_tensor_memory(shape: list[int], dtype: str) -> int

# Estimate attention layer memory (Q, K, V + attention weights + output)
estimate_attention_memory(
    batch_size: int, num_heads: int, seq_len: int, head_dim: int, dtype: str
) -> int
```
### Memory Tracking

```python
# Create a memory tracker with an optional limit
MemoryTracker(limit_bytes: int | None = None)

# Record an allocation (raises if it would exceed the limit)
tracker.allocate(num_bytes: int)

# Record a deallocation
tracker.deallocate(num_bytes: int)

# Query tracker state
tracker.current_bytes() -> int
tracker.peak_bytes() -> int
tracker.limit() -> int | None

# Reset the tracker to its initial state
tracker.reset()
```
### Device Detection

```python
# Check CUDA availability
cuda_available() -> bool

# Get device information (returns dict with type, ordinal, name)
get_device_info() -> dict
```
### Data Type Utilities

```python
# Get bytes per element for a dtype
dtype_size_bytes(dtype: str) -> int

# Check if a dtype is floating point
dtype_is_float(dtype: str) -> bool

# Get the accumulator dtype for mixed precision
accumulator_dtype(dtype: str) -> str
```
### Supported Data Types

- `"f32"` - 32-bit float
- `"f16"` - 16-bit float
- `"bf16"` - Brain float 16
- `"f64"` - 64-bit float
- `"i64"` - 64-bit integer
- `"u32"` - 32-bit unsigned integer
- `"u8"` - 8-bit unsigned integer
### Logging

```python
# Initialize logging (call once at startup)
init_logging(level: str = "info")  # debug, info, warn, error
```
## License
MIT License - see LICENSE-MIT
## Contributing
Contributions welcome! Please ensure:
- All public items have documentation
- Tests pass: `cargo test`
- Lints pass: `cargo clippy --all-targets --all-features`
- Code is formatted: `cargo fmt`