aha
Lightweight AI Inference Engine – All-in-one Solution for Text, Vision, Speech, and OCR
aha is a high-performance, cross-platform AI inference engine built with Rust and the Candle framework. It brings state-of-the-art AI models to your local machine – no API keys, no cloud dependencies, just pure, fast AI running directly on your hardware.
Changelog
v0.2.4 (2026-03-23)
- Added LFM2.5-1.2B-Instruct
- Added LFM2-1.2B
v0.2.3 (2026-03-18)
- Added DeepSeek-OCR-2
2026-03-17
- Added PaddleOCR-VL1.5 model
- Fixed a Qwen3.5 position_ids creation bug
- Added CLI parameters:
  - gguf_path: local GGUF model weight path (required when loading GGUF models)
  - mmproj_path: local path to mmproj GGUF weights (required for multimodal GGUF loading)
- Added qwen3.5-gguf to WhichModel
2026-03-16
- Added Qwen3.5 mmproj
2026-03-14
- Updated the Rust version
- Added Qwen3.5 GGUF support; the 4B model still has issues to be resolved
v0.2.2 (2026-03-07)
- Added GLM-OCR model
v0.2.1 (2026-03-05)
- Added Qwen3.5 model
Quick Start
Installation
Optional Features:
```shell
# CUDA (NVIDIA GPU acceleration)
# Metal (Apple GPU acceleration for macOS)
# Flash Attention (faster inference)
# FFmpeg (multimedia processing)
```
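If the crate publishes its CLI as a Cargo binary and follows Candle-style feature naming, enabling the optional features might look like the sketch below. The feature flag names (`cuda`, `metal`, `flash-attn`, `ffmpeg`) are assumptions; check the project's Cargo.toml for the exact names.

```shell
# Base install (CPU only)
cargo install aha

# Hypothetical feature flags, assuming Candle-style naming:
cargo install aha --features cuda        # NVIDIA GPU acceleration
cargo install aha --features metal       # Apple GPU acceleration (macOS)
cargo install aha --features flash-attn  # Flash Attention
cargo install aha --features ffmpeg      # multimedia processing
```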
CLI Quick Reference
```shell
# List all supported models
# Download model only
# Download model and start service
# Run inference directly (without starting service)
# Start service only (model already downloaded)
```
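The concrete commands were lost from the block above. A sketch of what the workflow could look like, assuming subcommand names such as `list`, `download`, `serve`, and `run` (these names are illustrative guesses; run `aha --help` for the real interface):

```shell
# Illustrative only -- verify subcommand names with `aha --help`.
aha list                          # list all supported models
aha download qwen3                # download a model
aha serve qwen3                   # download (if needed) and start the service
aha run qwen3 --prompt "Hello"    # run inference directly, no service
```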
Chat
Then use the unified (OpenAI-compatible) API:
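Since the API is OpenAI-compatible, a standard `/v1/chat/completions` request should work against the running service. The port and model name below are assumptions for illustration:

```shell
# Port and model name are assumptions; adjust to your deployment.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3",
    "messages": [{"role": "user", "content": "Hello, aha!"}]
  }'
```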
Supported Models
| Category | Models |
|---|---|
| Text | Qwen3, MiniCPM4, LFM2-1.2B, LFM2.5-1.2B-Instruct |
| Vision | Qwen2.5-VL, Qwen3-VL, Qwen3.5 |
| OCR | DeepSeek-OCR, DeepSeek-OCR-2, Hunyuan-OCR, PaddleOCR-VL, PaddleOCR-VL1.5 |
| ASR | GLM-ASR-Nano, Fun-ASR-Nano, Qwen3-ASR |
| Audio | VoxCPM, VoxCPM1.5 |
| Image | RMBG-2.0 (background removal) |
Documentation
| Document | Description |
|---|---|
| Getting Started | First steps with aha |
| Installation | Detailed installation guide |
| CLI Reference | Command-line interface |
| API Documentation | Library & REST API |
| Supported Models | Available AI models |
| Concepts | Architecture & design |
| Development | Contributing guide |
| Changelog | Version history |
Why aha?
- High-Performance Inference – Powered by the Candle framework for efficient tensor computation and model inference
- Unified Interface – One tool for text, vision, speech, and OCR
- Local-First – All processing runs locally; no data leaves your machine
- Cross-Platform – Works on Linux, macOS, and Windows
- GPU Accelerated – Optional CUDA support for faster inference
- Memory Safe – Built with Rust for reliability
- Attention Optimization – Optional Flash Attention support for optimized long-sequence processing
Development
Using aha as a Library
```shell
cargo add aha
```
```rust
// VoxCPM example -- the import paths below assume crate-root
// re-exports; check the crate documentation for exact module paths.
use aha::VoxCPMGenerate;
use aha::save_wav;
use aha::Result;
```
Extending New Models
- Create a new model file in src/models/
- Export it in src/models/mod.rs
- Add CLI inference support in src/exec/
- Add tests and examples in tests/
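As a rough sketch of steps 1 and 3, the pattern looks like the following. All struct, enum, and function names here are hypothetical placeholders, not aha's actual internal API:

```rust
// Step 1: a new model type (would live in src/models/my_model.rs).
// "MyModel" is a hypothetical placeholder.
pub struct MyModel;

impl MyModel {
    // A real model would load weights and run inference; this stub
    // just echoes the prompt to keep the sketch self-contained.
    pub fn generate(&self, prompt: &str) -> String {
        format!("echo: {prompt}")
    }
}

// Step 3: wire the model into CLI selection (would live in src/exec/).
pub enum WhichModel {
    MyModel,
}

pub fn run(which: WhichModel, prompt: &str) -> String {
    match which {
        WhichModel::MyModel => MyModel.generate(prompt),
    }
}

fn main() {
    println!("{}", run(WhichModel::MyModel, "hi"));
}
```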
Features
- High-performance inference via Candle framework
- Multi-modal model support (vision, language, speech)
- Clean, easy-to-use API design
- Minimal dependencies, compact binaries
- Flash Attention support for long sequences
- FFmpeg support for multimedia processing
License
Apache-2.0 – See LICENSE for details.
Acknowledgments
- Candle - Excellent Rust ML framework
- All model authors and contributors
