# Hodu

Hodu is a user-friendly ML framework built in Rust. Hodu (호두) is a Korean word meaning "walnut".
## About Hodu
Hodu is a machine learning library built with user convenience at its core, designed for both rapid prototyping and seamless production deployment—including embedded environments.
### Core Differentiators
Built on Rust's foundation of memory safety and zero-cost abstractions, Hodu offers unique advantages:
- Hybrid Execution Model: Seamlessly switch between dynamic execution for rapid prototyping and static computation graphs for optimized production deployment
- Memory Safety by Design: Leverage Rust's ownership system to eliminate common ML deployment issues like memory leaks and data races
- Embedded-First Architecture: Full
no_stdsupport enables ML inference on microcontrollers and resource-constrained devices - Zero-Cost Abstractions: High-level APIs that compile down to efficient machine code without runtime overhead
### Execution Modes and Compilers
**Dynamic Execution**: Immediate tensor operations for rapid prototyping
- CPU operations
- Metal GPU support for macOS (with the `metal` feature)
- CUDA GPU acceleration (with the `cuda` feature)
**Static Execution**: Compiled computation graphs with two compiler backends
- **HODU Compiler**: In-house implementation with `no_std` support
  - Optimized constant caching eliminates repeated device transfers
  - CPU, Metal, and CUDA device support
  - Embedded-friendly for resource-constrained environments
- **XLA Compiler**: JIT compilation via OpenXLA/PJRT (requires `std`)
  - Advanced graph-level optimizations with compilation caching
  - Production-grade performance comparable to JAX
  - CPU and CUDA device support
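To illustrate the static-execution idea in miniature, the sketch below is conceptual Rust and not Hodu's actual API: a static graph records operations once and can then be replayed for many inputs, which is what lets a compiler backend optimize the whole graph ahead of time.

```rust
// Conceptual sketch of a static computation graph (NOT Hodu's API):
// operations are recorded once, then executed repeatedly.
enum Op {
    Add(f32),
    Mul(f32),
}

struct Graph {
    ops: Vec<Op>,
}

impl Graph {
    // Replay the recorded ops on a scalar input.
    fn run(&self, input: f32) -> f32 {
        self.ops.iter().fold(input, |x, op| match op {
            Op::Add(c) => x + c,
            Op::Mul(c) => x * c,
        })
    }
}

fn main() {
    // Build once: f(x) = (x * 2.0) + 1.0
    let graph = Graph { ops: vec![Op::Mul(2.0), Op::Add(1.0)] };
    // Execute many times without rebuilding.
    println!("{}", graph.run(3.0)); // 7
    println!("{}", graph.run(10.0)); // 21
}
```

A compiler backend (HODU or XLA) operates on exactly this kind of recorded structure, which is why constant caching and graph-level optimization are possible in static mode but not in eager, dynamic mode.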
> [!WARNING]
> This is a personal learning and development project. As such:
>
> - The framework is under active development
> - Features may be experimental or incomplete
> - Functionality is not guaranteed for production use
>
> It is recommended to use the latest version.
## Get started

### Requirements

#### Required

- Rust 1.90.0 or later (latest stable version recommended)
#### Optional

- **OpenBLAS 0.3.30+** (recommended) - For optimized linear algebra operations on CPU
  - macOS: `brew install openblas gfortran`
  - Linux: `sudo apt install libopenblas-dev pkg-config gfortran`
  - Windows: Install via vcpkg or MinGW
- **LLVM/Clang** - Required when building with the `xla` feature
  - macOS: `brew install llvm`
  - Linux: `sudo apt install llvm clang`
  - Windows: Install from LLVM releases
- **CUDA Toolkit** - Required when using the `cuda` feature
  - Download from NVIDIA CUDA Toolkit
- **Xcode Command Line Tools** - Required when using the `metal` feature on macOS
  - `xcode-select --install`
## Examples
Here are some examples that demonstrate matrix multiplication using both dynamic execution and static computation graphs.
### Dynamic Execution

This example shows direct tensor operations that are executed immediately:

```rust
use hodu::prelude::*;
```
With the `cuda` feature enabled, you can run dynamic execution on CUDA by changing the runtime device:

```diff
- set_runtime_device(Device::CPU);
+ set_runtime_device(Device::CUDA(0));
```
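For reference, the arithmetic that a matrix-multiplication example computes can be sketched in plain Rust, with no Hodu types involved. This is only meant to show what the tensor op does, not how the framework implements it:

```rust
// Naive matrix multiplication over row-major nested vectors,
// matching the result a tensor matmul produces.
fn matmul(a: &[Vec<f32>], b: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let (m, k, n) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0f32; n]; m];
    for i in 0..m {
        for p in 0..k {
            for j in 0..n {
                out[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    out
}

fn main() {
    let a = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let b = vec![vec![5.0, 6.0], vec![7.0, 8.0]];
    let c = matmul(&a, &b);
    println!("{:?}", c); // [[19.0, 22.0], [43.0, 50.0]]
}
```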
### Static Computation Graphs

For more complex workflows, or when you need reusable computation graphs, you can use the Builder pattern:

```rust
use hodu::prelude::*;
```
With the `cuda` feature enabled, you can run static computation graphs on CUDA by setting the script's device:

```diff
  let mut script = builder.build()?;
+ script.set_device(Device::CUDA(0));
```
With the `xla` feature enabled, you can compile static computation graphs with XLA by setting the script's compiler:

```diff
  let mut script = builder.build()?;
+ script.set_compiler(Compiler::XLA);
```
## Features

### Default Features
| Feature | Description | Dependencies |
|---|---|---|
| `std` | Standard library support | - |
| `serde` | Serialization/deserialization support | - |
| `rayon` | Parallel processing support | `std` |
### Optional Features
| Feature | Description | Dependencies | Required Features |
|---|---|---|---|
| `cuda` | NVIDIA CUDA GPU support | CUDA toolkit | - |
| `metal` | Apple Metal GPU support | Metal framework | `std` |
| `xla` | Google XLA compiler backend | XLA libraries | `std` |
### XLA Feature Requirements

Building with the `xla` feature requires:

- LLVM and Clang installed on your system
- RAM: 8 GB+ free memory
- Disk space: 20 GB+ free storage
### Optional Data Type Features

By default, Hodu supports these data types: `bool`, `f8e4m3`, `bf16`, `f16`, `f32`, `u8`, `u32`, `i8`, `i32`.

Additional data types can be enabled with feature flags; they are disabled by default to keep compilation fast:
| Feature | Description |
|---|---|
| `f8e5m2` | Enable 8-bit floating point (E5M2) support |
| `f64` | Enable 64-bit floating point support |
| `u16` | Enable unsigned 16-bit integer support |
| `u64` | Enable unsigned 64-bit integer support |
| `i16` | Enable signed 16-bit integer support |
| `i64` | Enable signed 64-bit integer support |
**Compilation Performance**: Leaving unused data types disabled can reduce compilation time by up to 30-40%. If you don't need these specific data types, build without the corresponding features.
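As a sketch, feature selection happens in `Cargo.toml`. The crate name and version placeholder below are assumptions; the feature names come from the tables above:

```toml
[dependencies]
# Keep the default features (std, serde, rayon) and additionally enable f64.
hodu = { version = "*", features = ["f64"] }

# Or, for an embedded (no_std) build: drop the defaults entirely.
# hodu = { version = "*", default-features = false }
```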
## Supported Platforms

### Standard Environments
| Target Triple | Backend | Device | Features | Status |
|---|---|---|---|---|
| x86_64-unknown-linux-gnu | HODU | CPU | `std` | ✅ Stable |
| | HODU | CUDA | `std`, `cuda` | ✅ Stable |
| | XLA | CPU | `std`, `xla` | ✅ Stable |
| | XLA | CUDA | `std`, `xla`, `cuda` | 🚧 In Development |
| aarch64-unknown-linux-gnu | HODU | CPU | `std` | ✅ Stable |
| | XLA | CPU | `std`, `xla` | ✅ Stable |
| x86_64-apple-darwin | HODU | CPU | `std` | 🧪 Experimental |
| | XLA | CPU | `std`, `xla` | 🚧 In Development |
| aarch64-apple-darwin | HODU | CPU | `std` | ✅ Stable |
| | HODU | Metal | `std`, `metal` | 🧪 Experimental |
| | XLA | CPU | `std`, `xla` | ✅ Stable |
| x86_64-pc-windows-msvc | HODU | CPU | `std` | 🧪 Experimental |
| | HODU | CUDA | `std`, `cuda` | 🧪 Experimental |
| | XLA | CPU | `std`, `xla` | 🚧 In Development |
| | XLA | CUDA | `std`, `xla`, `cuda` | 🚧 In Development |
### Embedded Environments
🧪 **Experimental**: Embedded platforms are supported but experimental and not extensively tested in production environments.

**Note**: Development should be done in a standard (`std`) host environment. Cross-compilation for embedded targets is supported.
| Target Triple | Backend | Device | Features | Status |
|---|---|---|---|---|
| thumbv7em-none-eabihf | HODU | CPU | (no default) | 🧪 Experimental |
| aarch64-unknown-none | HODU | CPU | (no default) | 🧪 Experimental |
| | HODU | CUDA | `cuda` | 🧪 Experimental (Jetson) |
| armv7a-none-eabi | HODU | CPU | (no default) | 🧪 Experimental |
For bare-metal and RTOS environments on ARM processors. For example (build flags are illustrative and may vary by platform):

```bash
# ARM Cortex-M (microcontrollers)
cargo build --target thumbv7em-none-eabihf --no-default-features --release

# ARM Cortex-A 32-bit (application processors)
cargo build --target armv7a-none-eabi --no-default-features --release
```
#### With OpenBLAS (Optional)

For better performance, you can cross-compile OpenBLAS. Here's an example for ARM Cortex-M (the exact make flags depend on your platform and toolchain):

```bash
# Install ARM cross-compiler
# macOS: brew install arm-none-eabi-gcc
# Linux: sudo apt install gcc-arm-none-eabi

# Clone and build OpenBLAS (example for ARMV7)
git clone https://github.com/OpenMathLib/OpenBLAS.git
cd OpenBLAS
make TARGET=ARMV7 CC=arm-none-eabi-gcc HOSTCC=gcc NOFORTRAN=1 libs

# Build Hodu with OpenBLAS (point OPENBLAS_DIR at the build output)
cd ..
OPENBLAS_DIR=$(pwd)/OpenBLAS cargo build --target thumbv7em-none-eabihf --no-default-features --release
```

Note: Adjust the compiler, target, and build flags according to your specific ARM platform.
#### NVIDIA Jetson Series

Jetson devices (Nano, Xavier NX, AGX Xavier, Orin series) are ARM Cortex-A based systems with integrated NVIDIA GPUs.

Note: The `cuda` feature works without the standard library (`no_std`) for embedded deployment.

```bash
# Build with CUDA support for Jetson (illustrative)
cargo build --target aarch64-unknown-none --no-default-features --features cuda --release
```

Requirements:

- CUDA toolkit (from JetPack SDK or standalone)
- CUDA Compute Capability 5.3+ (Jetson Nano and newer)
### Environment Variables

Common environment variables for cross-compilation:

- `OPENBLAS_DIR`, `OPENBLAS_INCLUDE_DIR`, `OPENBLAS_LIB_DIR` - OpenBLAS paths for cross-compilation
- `HODU_DISABLE_BLAS` - Force-disable OpenBLAS
- `HODU_DISABLE_NATIVE` - Disable native CPU optimizations
- `HODU_DISABLE_SIMD` - Disable SIMD auto-detection
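As a sketch, a cross-compilation invocation using these variables might look like the following; the paths and target triple here are illustrative assumptions, not required values:

```sh
# Point the build at a prebuilt, cross-compiled OpenBLAS
export OPENBLAS_DIR="$HOME/cross/openblas"
# Disable SIMD auto-detection for targets that lack it
export HODU_DISABLE_SIMD=1
cargo build --target armv7a-none-eabi --no-default-features --release
```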
## Docs

- CHANGELOG - Project changelog and version history
- TODOS - Planned features and improvements
- CONTRIBUTING - Contribution guide
### Guide
- Tensor Creation Guide (Korean)
- Tensor Creation Guide (English)
- Tensor Data Type Guide
- Tensor Operations Guide (English only)
- Neural Network Modules Guide (Korean)
- Neural Network Modules Guide (English)
- Tensor Utils Guide (Korean) - DataLoader, Dataset, Sampler
- Tensor Utils Guide (English) - DataLoader, Dataset, Sampler
- Builder/Script Guide (Korean)
- Builder/Script Guide (English)
- Gradient Tape Management Guide (Korean)
- Gradient Tape Management Guide (English)
## Related Projects
Here are some other Rust ML frameworks you might find interesting:
- maidenx - The predecessor project to Hodu
- cetana - An advanced machine learning library empowering developers to build intelligent applications with ease.
## Inspired by
Hodu draws inspiration from the following amazing projects:
- maidenx - The predecessor project to Hodu
- candle - Minimalist ML framework for Rust
- GoMlx - An Accelerated Machine Learning Framework For Go
## Credits
Hodu Character Design: Created by Eira