aha
Lightweight AI Inference Engine — All-in-one Solution for Text, Vision, Speech, and OCR
aha is a high-performance, cross-platform AI inference engine built with Rust and the Candle framework. It brings state-of-the-art AI models to your local machine—no API keys, no cloud dependencies, just pure, fast AI running directly on your hardware.
Supported Models
| Category | Models |
|---|---|
| Text | Qwen3, MiniCPM4, MiniCPM5, LFM2, LFM2.5 |
| Vision | Qwen2.5-VL, Qwen3-VL, Qwen3.5, LFM2.5-VL, LFM2-VL |
| OCR | DeepSeek-OCR, DeepSeek-OCR-2, PaddleOCR-VL, PaddleOCR-VL1.5, Hunyuan-OCR, GLM-OCR |
| ASR | GLM-ASR-Nano, Fun-ASR-Nano, Qwen3-ASR |
| TTS | VoxCPM, VoxCPM1.5, VoxCPM2, Moss-TTS-Nano |
| Image | RMBG-2.0 (background removal) |
| Embedding | Qwen3-Embedding, all-MiniLM-L6-v2 |
| Reranker | Qwen3-Reranker |
Changelog
2026-05-29
- finish refactoring the generate code
2026-05-28
- refactor the generate code (progress 1/3)
2026-05-27
- add MiniCPM5
2026-05-24
- update doc
2026-05-11
- add Moss-TTS-Nano; it performs worse than the original Python version
2026-05-09
- merge pr/eastgold15/46, add aha-ui
Why aha?
- 🚀 High-Performance Inference - Powered by the Candle framework for efficient tensor computation and model inference
- 🔧 Unified Interface — One tool for text, vision, speech, and OCR
- 📦 Local-First — All processing runs locally, no data leaves your machine
- 🎯 Cross-Platform — Works on Linux, macOS, and Windows
- ⚡ GPU Accelerated — Optional CUDA support for faster inference
- 🛡️ Memory Safe — Built with Rust for reliability
- 🧠 Attention Optimization - Optional Flash Attention support for faster long-sequence processing
Quick Start
Installation
Optional Features:
# CUDA (NVIDIA GPU acceleration)
# Metal (Apple GPU acceleration for macOS)
# Flash Attention (faster inference)
# FFmpeg (multimedia processing)
CLI Quick Reference
# List all supported models
# Download model only
# Download model and start service
# Run inference directly (without starting service)
# Run local all-MiniLM-L6-v2 embedding (native safetensors)
# Start service only (model already downloaded)
Chat
Then use the unified (OpenAI-compatible) API:
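As a rough illustration of what an OpenAI-style chat request looks like on the wire, the sketch below builds the JSON body and a plain HTTP POST using only the Rust standard library. The host and port (`127.0.0.1:10100`), the path (`/chat/completions`), and the model name (`qwen3-0.6b`) are assumptions made for this example, not values confirmed by the aha documentation; replace them with whatever your running service reports. The helper names (`json_escape`, `chat_body`, `http_post`) are hypothetical and not part of aha.

```rust
// Sketch only: host, port, path and model name are assumptions,
// not values taken from the aha documentation.

/// Escape a string so it can be embedded in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the JSON body of an OpenAI-style chat completion request
/// with a single user message and streaming disabled.
fn chat_body(model: &str, user_message: &str) -> String {
    format!(
        "{{\"model\":\"{}\",\"messages\":[{{\"role\":\"user\",\"content\":\"{}\"}}],\"stream\":false}}",
        json_escape(model),
        json_escape(user_message)
    )
}

/// Wrap a JSON body in a plain HTTP/1.1 POST request.
fn http_post(host: &str, path: &str, body: &str) -> String {
    format!(
        "POST {path} HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\nContent-Length: {len}\r\nConnection: close\r\n\r\n{body}",
        path = path,
        host = host,
        len = body.len(), // length in bytes, as HTTP requires
        body = body
    )
}

fn main() {
    // Assumed model name and endpoint; adjust to your local setup.
    let body = chat_body("qwen3-0.6b", "Hello, who are you?");
    let request = http_post("127.0.0.1:10100", "/chat/completions", &body);
    println!("{}", request);
}
```

The program only prints the request, so it runs without a server. In practice any HTTP client or OpenAI-compatible SDK pointed at the local service should be able to send the same body.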
aha-ui
Using npm
Install npm
Refer to https://nodejs.org/en/download
# Download and install nvm:
# in lieu of restarting the shell
# Download and install Node.js:
# Verify the Node.js version:
# Verify npm version:
npm run aha-ui
# Make sure you are in the aha-ui directory
# and that aha has already been compiled
npm build & install & run
# Build artifacts are written to:
# -- aha-ui/src-tauri/target/release/bundle/deb/aha-ui_0.1.0_amd64.deb
# -- aha-ui/src-tauri/target/release/bundle/rpm/aha-ui-0.1.0-1.x86_64.rpm
# -- aha-ui/src-tauri/target/release/bundle/appimage/aha-ui_0.1.0_amd64.AppImage
Using pnpm
Install pnpm
pnpm run aha-ui
# Make sure you are in the aha-ui directory
# and that aha has already been compiled
pnpm build & install & run
# Build artifacts are written to:
# -- aha-ui/src-tauri/target/release/bundle/deb/aha-ui_0.1.0_amd64.deb
# -- aha-ui/src-tauri/target/release/bundle/rpm/aha-ui-0.1.0-1.x86_64.rpm
# -- aha-ui/src-tauri/target/release/bundle/appimage/aha-ui_0.1.0_amd64.AppImage
Documentation
| Document | Description |
|---|---|
| Getting Started | First steps with aha |
| Installation | Detailed installation guide |
| CLI Reference | Command-line interface |
| API Documentation | Library & REST API |
| Supported Models | Available AI models |
| Concepts | Architecture & design |
| Development | Contributing guide |
| Changelog | Version history |
Development
Using aha as a Library
cargo add aha
// VoxCPM example
use VoxCPMGenerate;
use save_wav_mono;
use Result;
Extending New Models
- Create a new model file in src/models/
- Export it in src/models/mod.rs
- Add CLI inference support for the model in src/exec/
- Add tests and examples in tests/
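The steps above can be pictured with a small, self-contained sketch. Everything in it is hypothetical: the `Model` trait, `ModelError`, `EchoModel`, and `load_model` are illustrative names invented for this example and are not taken from the aha codebase, whose real traits and module layout may differ. It only shows the general shape of the work: one type per model, plus a single dispatch point that the CLI layer can call by model name.

```rust
// Hypothetical sketch: none of these names come from the aha codebase.
use std::fmt;

/// Errors the toy interface can return.
#[derive(Debug, PartialEq)]
enum ModelError {
    UnknownModel(String),
    EmptyPrompt,
}

impl fmt::Display for ModelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ModelError::UnknownModel(id) => write!(f, "unknown model: {}", id),
            ModelError::EmptyPrompt => write!(f, "prompt must not be empty"),
        }
    }
}

impl std::error::Error for ModelError {}

/// Hypothetical common interface that each model file could implement.
trait Model {
    fn name(&self) -> &'static str;
    fn generate(&mut self, prompt: &str) -> Result<String, ModelError>;
}

/// Step 1: the new model, which would live in its own file under src/models/.
struct EchoModel;

impl Model for EchoModel {
    fn name(&self) -> &'static str {
        "echo"
    }

    fn generate(&mut self, prompt: &str) -> Result<String, ModelError> {
        if prompt.is_empty() {
            return Err(ModelError::EmptyPrompt);
        }
        // A real model would run inference here; this one echoes the prompt.
        Ok(format!("echo: {}", prompt))
    }
}

/// Steps 2 and 3: one dispatch function stands in for the module export
/// and for the CLI wiring that maps a model name to an implementation.
fn load_model(id: &str) -> Result<Box<dyn Model>, ModelError> {
    match id {
        "echo" => Ok(Box::new(EchoModel)),
        other => Err(ModelError::UnknownModel(other.to_string())),
    }
}

fn main() -> Result<(), ModelError> {
    let mut model = load_model("echo")?;
    println!("{}", model.generate("hello")?);
    Ok(())
}
```

Step 4 would then cover both the success path and the error paths, for example an unknown model name and an empty prompt.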
Features
- High-performance inference via the Candle framework
- Multi-modal model support (vision, language, speech)
- Clean, easy-to-use API design
- Minimal dependencies, compact binaries
- Flash Attention support for long sequences
- FFmpeg support for multimedia processing
License
Apache-2.0 — See LICENSE for details.
Acknowledgments
- Candle - Excellent Rust ML framework
- All model authors and contributors

