hodu 0.2.1


Hodu, a user-friendly ML framework built in Rust.

Crates.io Doc.rs License

Hodu (호두) is a Korean word meaning "walnut".

About Hodu

Hodu is a machine learning library built with user convenience at its core, designed for both rapid prototyping and seamless production deployment—including embedded environments.

Core Differentiators

Built on Rust's foundation of memory safety and zero-cost abstractions, Hodu offers unique advantages:

  • Hybrid Execution Model: Seamlessly switch between dynamic execution for rapid prototyping and static computation graphs for optimized production deployment
  • Memory Safety by Design: Leverage Rust's ownership system to eliminate common ML deployment issues like memory leaks and data races
  • Embedded-First Architecture: Full no_std support enables ML inference on microcontrollers and resource-constrained devices
  • Zero-Cost Abstractions: High-level APIs that compile down to efficient machine code without runtime overhead
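As an illustration of the zero-cost abstraction point (plain Rust, not Hodu-specific): a high-level iterator pipeline such as the dot product below typically compiles to the same machine code as a hand-written index loop.

```rust
// Dot product written with high-level iterator adapters.
// rustc compiles this chain down to a plain loop (often vectorized),
// with no heap allocation and no runtime dispatch.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 5.0, 6.0];
    // 1*4 + 2*5 + 3*6 = 32
    println!("{}", dot(&a, &b));
}
```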

Execution Modes and Compilers

Dynamic Execution: Immediate tensor operations for rapid prototyping

  • CPU operations
  • Metal GPU support for macOS (with metal feature)
  • CUDA GPU acceleration (with cuda feature)

Static Execution: Compiled computation graphs with two compiler backends

  • HODU Compiler: In-house implementation with no_std support

    • Optimized constant caching eliminates repeated device transfers
    • CPU, Metal, and CUDA device support
    • Embedded-friendly for resource-constrained environments
  • XLA Compiler: JIT compilation via OpenXLA/PJRT (requires std)

    • Advanced graph-level optimizations with compilation caching
    • Production-grade performance comparable to JAX
    • CPU and CUDA device support

[!WARNING]

This is a personal learning and development project. As such:

  • The framework is under active development
  • Features may be experimental or incomplete
  • Functionality is not guaranteed for production use

It is recommended to use the latest version.

[!CAUTION]

Current Development Status:

  • CUDA GPU support is not yet fully implemented and is under active development

Get started

Requirements

Required

  • Rust 1.90.0 or later (latest stable version recommended)

Optional

  • OpenBLAS 0.3.30+ (recommended) - For optimized linear algebra operations on CPU

    • macOS: brew install openblas
    • Linux: sudo apt install libopenblas-dev
    • Windows: Install via vcpkg or MinGW
  • LLVM/Clang - Required when building with the xla feature

    • macOS: brew install llvm
    • Linux: sudo apt install llvm clang
    • Windows: Install from LLVM releases
  • CUDA Toolkit - Required when using the cuda feature

  • Xcode Command Line Tools - Required when using the metal feature on macOS

    • xcode-select --install

Examples

Here are some examples that demonstrate matrix multiplication using both dynamic execution and static computation graphs.

Dynamic Execution

This example shows direct tensor operations that are executed immediately:

use hodu::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Set the runtime device (CPU, CUDA, Metal)
    set_runtime_device(Device::CPU);

    // Create random tensors
    let a = Tensor::randn(&[2, 3], 0f32, 1.)?;
    let b = Tensor::randn(&[3, 4], 0f32, 1.)?;

    // Matrix multiplication
    let c = a.matmul(&b)?;

    println!("{}", c);
    println!("{:?}", c);

    Ok(())
}

With the cuda feature enabled, you can use CUDA in dynamic execution with the following setting:

- set_runtime_device(Device::CPU);
+ set_runtime_device(Device::CUDA(0));
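To keep a single code path across CPU and CUDA builds, the usual Rust pattern is a cfg-gated default. Below is a self-contained sketch of that pattern using a stand-in Device enum (not Hodu's actual type):

```rust
// Stand-in Device enum for illustration; in Hodu you would use hodu's Device.
#[derive(Debug, PartialEq)]
enum Device {
    CPU,
    #[allow(dead_code)]
    CUDA(usize),
}

// Picks CUDA device 0 when the crate is built with `--features cuda`,
// and falls back to the CPU otherwise.
fn default_device() -> Device {
    #[cfg(feature = "cuda")]
    return Device::CUDA(0);
    #[cfg(not(feature = "cuda"))]
    Device::CPU
}

fn main() {
    println!("{:?}", default_device());
}
```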

Static Computation Graphs

For more complex workflows or when you need reusable computation graphs, you can use the Builder pattern:

use hodu::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a new computation graph
    let builder = Builder::new("matmul_example".to_string());
    builder.start()?;

    // Define input placeholders
    let a = Tensor::input("a", &[2, 3])?;
    let b = Tensor::input("b", &[3, 4])?;

    // Define computation
    let c = a.matmul(&b)?;

    // Mark output
    builder.add_output("result", c)?;
    builder.end()?;

    // Build and execute script
    let mut script = builder.build()?;

    // Provide actual data
    let a_data = Tensor::randn(&[2, 3], 0f32, 1.)?;
    let b_data = Tensor::randn(&[3, 4], 0f32, 1.)?;
    script.set_input("a", a_data);
    script.set_input("b", b_data);

    // Execute and get results
    let output = script.run()?;
    println!("{}", output["result"]);
    println!("{:?}", output["result"]);

    Ok(())
}

With the cuda feature enabled, you can use CUDA in static computation graphs with the following setting:

let mut script = builder.build()?;
+ script.set_device(Device::CUDA(0));

With the xla feature enabled, you can use XLA in static computation graphs with the following setting:

let mut script = builder.build()?;
+ script.set_compiler(Compiler::XLA);

Features

Default Features

| Feature | Description | Dependencies |
|---------|-------------|--------------|
| std | Standard library support | - |
| serde | Serialization/deserialization support | - |
| rayon | Parallel processing support | std |

Optional Features

| Feature | Description | Dependencies | Required Features |
|---------|-------------|--------------|-------------------|
| cuda | NVIDIA CUDA GPU support | CUDA toolkit | - |
| metal | Apple Metal GPU support | Metal framework | std |
| xla | Google XLA compiler backend | XLA libraries | std |
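For example, a Cargo.toml dependency entry enabling one of these features might look like the following (a sketch; the version number is taken from the header above):

```toml
[dependencies]
hodu = { version = "0.2.1", features = ["metal"] }
```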

XLA Feature Requirements

Building with the xla feature requires:

  • LLVM and Clang installed on your system
  • RAM: 8GB+ free memory
  • Disk Space: 20GB+ free storage

Optional Data Type Features

By default, Hodu supports these data types: bool, f8e4m3, bf16, f16, f32, u8, u32, i8, i32.

Additional data types are kept behind feature flags so that unused types do not add to compilation time; enable only the ones you need:

| Feature | Description |
|---------|-------------|
| f8e5m2 | Enable 8-bit floating point (E5M2) support |
| f64 | Enable 64-bit floating point support |
| u16 | Enable unsigned 16-bit integer support |
| u64 | Enable unsigned 64-bit integer support |
| i16 | Enable signed 16-bit integer support |
| i64 | Enable signed 64-bit integer support |

Compilation Performance: Leaving unused data types disabled can reduce compilation time by 30-40%. If you don't need these data types, build without the corresponding features.

Supported Platforms

Standard Environments

| Target Triple | Backend | Device | Features | Status |
|---------------|---------|--------|----------|--------|
| x86_64-unknown-linux-gnu | HODU | CPU | std | ✅ Stable |
| | HODU | CUDA | std, cuda | 🚧 In Development |
| | XLA | CPU | std, xla | ✅ Stable |
| | XLA | CUDA | std, xla, cuda | 🚧 In Development |
| aarch64-unknown-linux-gnu | HODU | CPU | std | ✅ Stable |
| | XLA | CPU | std, xla | ✅ Stable |
| x86_64-apple-darwin | HODU | CPU | std | 🧪 Experimental |
| | XLA | CPU | std, xla | 🚧 In Development |
| aarch64-apple-darwin | HODU | CPU | std | ✅ Stable |
| | HODU | Metal | std, metal | 🧪 Experimental |
| | XLA | CPU | std, xla | ✅ Stable |
| x86_64-pc-windows-msvc | HODU | CPU | std | ✅ Stable |
| | HODU | CUDA | std, cuda | 🚧 In Development |
| | XLA | CPU | std, xla | 🚧 In Development |
| | XLA | CUDA | std, xla, cuda | 🚧 In Development |

Embedded Environments

🧪 Experimental: Embedded platforms (ARM Cortex-M, RISC-V, Embedded Linux) are supported through no_std builds (--no-default-features), but they are experimental and not extensively tested in production environments.

Note: Development should be done in a standard (std) host environment. Cross-compilation for embedded targets is supported.

ARM Cortex-M

Basic Build

rustup target add thumbv7em-none-eabihf
cargo build --target thumbv7em-none-eabihf --no-default-features

With OpenBLAS (Optional)

For better performance, you can cross-compile OpenBLAS for ARM on your host machine:

  1. Build OpenBLAS for ARM on host (e.g., macOS/Linux):
# Install ARM cross-compiler
# macOS: brew install arm-none-eabi-gcc
# Linux: sudo apt install gcc-arm-none-eabi

# Clone and build OpenBLAS
git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS
make CC=arm-none-eabi-gcc TARGET=ARMV7 NO_SHARED=1 NO_LAPACK=1
make install PREFIX=/opt/arm-cortex-m-openblas
  2. Build Hodu with the cross-compiled OpenBLAS:
# The OpenBLAS binaries are on host filesystem but built for ARM
export OPENBLAS_DIR=/opt/arm-cortex-m-openblas
cargo build --target thumbv7em-none-eabihf --no-default-features

Note: The build script runs on the host machine and accesses OpenBLAS from the host filesystem, even though the resulting binaries are for the target ARM architecture.

Environment Variables

  • OPENBLAS_DIR, OPENBLAS_INCLUDE_DIR, OPENBLAS_LIB_DIR - OpenBLAS paths for cross-compilation
  • HODU_DISABLE_BLAS - Force disable OpenBLAS
  • HODU_DISABLE_NATIVE - Disable native CPU optimizations
  • HODU_DISABLE_SIMD - Disable SIMD auto-detection
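For instance, a minimal embedded configuration might disable BLAS and SIMD auto-detection before invoking the build (a sketch using the variable names listed above):

```shell
# Force a portable, dependency-free build: no OpenBLAS, no SIMD auto-detection.
export HODU_DISABLE_BLAS=1
export HODU_DISABLE_SIMD=1

# Then cross-compile, e.g.:
#   cargo build --target thumbv7em-none-eabihf --no-default-features

# Confirm the configuration that the build script will see.
echo "BLAS disabled: $HODU_DISABLE_BLAS, SIMD disabled: $HODU_DISABLE_SIMD"
```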

Docs

CHANGELOG - Project changelog and version history

TODOS - Planned features and improvements

CONTRIBUTING - Contribution guide

Guide

Related Projects

Here are some other Rust ML frameworks you might find interesting:

  • maidenx - The predecessor project to Hodu
  • cetana - An advanced machine learning library empowering developers to build intelligent applications with ease.

Inspired by

Hodu draws inspiration from the following amazing projects:

  • maidenx - The predecessor project to Hodu
  • candle - Minimalist ML framework for Rust
  • GoMlx - An Accelerated Machine Learning Framework For Go

Credits

Hodu Character Design: Created by Eira