aha 0.2.5

aha is a model inference library. It currently supports Qwen (2.5-VL / 3 / 3-VL / 3.5 / 3-ASR / 3-Embedding / 3-Reranker), MiniCPM4, VoxCPM / VoxCPM1.5, DeepSeek-OCR / DeepSeek-OCR-2, Hunyuan-OCR, PaddleOCR-VL / PaddleOCR-VL-1.5, RMBG-2.0, GLM-ASR-Nano-2512, GLM-OCR, Fun-ASR-Nano-2512, and LFM (2 / 2.5 / 2-VL / 2.5-VL).
# Installation Guide

This guide covers installing and setting up aha on your system.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Installation Methods](#installation-methods)
- [Platform-Specific Instructions](#platform-specific-instructions)
- [Feature Flags](#feature-flags)
- [Verification](#verification)
- [Troubleshooting](#troubleshooting)

## Prerequisites

### Required

- **Rust toolchain**: Rust 1.85 or later (edition 2024)
  ```bash
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  ```

- **Git**: For cloning the repository
  ```bash
  # Ubuntu/Debian
  sudo apt-get install git
  
  # macOS
  brew install git
  
  # Windows
  # Download from https://git-scm.com/download/win
  ```
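The 1.85 minimum can be checked automatically. The sketch below is ours, not part of aha: the `ver_ge` helper compares two versions with `sort -V` and reports whether the installed `rustc` meets the floor.

```shell
# ver_ge A B: succeeds if version A >= version B (numeric, via sort -V).
ver_ge() { [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]; }

# Compare the installed rustc against the 1.85 minimum required by aha.
current=$(rustc --version 2>/dev/null | awk '{print $2}')
if [ -n "$current" ] && ver_ge "$current" "1.85.0"; then
  echo "rustc $current satisfies the 1.85 minimum"
else
  echo "rustc missing or too old; run: rustup update stable"
fi
```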

### Optional (for FFmpeg feature)

- **FFmpeg development libraries**: Required for audio/video processing

## Installation Methods

### Method 1: Build from Source

Clone the repository and build:

```bash
git clone https://github.com/jhqxxx/aha.git
cd aha

# Build release version
cargo build --release

# The binary will be at target/release/aha
```

### Method 2: Install from Crates.io (when available)

```bash
cargo install aha
```

### Method 3: Install with Features

Build with specific features enabled:

```bash
# With CUDA support (NVIDIA GPUs)
cargo build --release --features cuda

# With Metal support (Apple Silicon)
cargo build --release --features metal

# With Flash Attention
cargo build --release --features cuda,flash-attn

# With FFmpeg support
cargo build --release --features ffmpeg
```

## Platform-Specific Instructions

### Linux

#### Ubuntu/Debian

```bash
# Install build dependencies
sudo apt-get update
sudo apt-get install -y build-essential pkg-config git clang cmake

# For FFmpeg feature
sudo apt-get install -y ffmpeg libavutil-dev libavcodec-dev \
    libavformat-dev libavfilter-dev libavdevice-dev \
    libswresample-dev libswscale-dev

# For CUDA support, install CUDA toolkit
# See https://developer.nvidia.com/cuda-downloads
```
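Before enabling the `ffmpeg` feature, it can save a failed build to confirm that `pkg-config` actually sees the FFmpeg development libraries installed above. A quick sanity check:

```shell
# Can pkg-config resolve the FFmpeg dev libraries installed above?
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists libavcodec; then
  echo "libavcodec $(pkg-config --modversion libavcodec) found"
else
  echo "FFmpeg dev libraries not visible to pkg-config; an ffmpeg-feature build will fail"
fi
```

If the libraries are installed in a non-standard prefix, point `PKG_CONFIG_PATH` at the directory containing the `.pc` files.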

#### Fedora/RHEL

```bash
# Install build dependencies
sudo dnf install gcc gcc-c++ make git clang pkg-config

# For FFmpeg feature
sudo dnf install ffmpeg-devel

# For CUDA support
sudo dnf install cuda-devel
```

### macOS

#### Apple Silicon (M1/M2/M3/M4)

```bash
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install command line tools
xcode-select --install

# For FFmpeg feature
brew install ffmpeg

# Build with Metal support for GPU acceleration
cargo build --release --features metal
```

#### Intel Mac

```bash
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install command line tools
xcode-select --install

# For FFmpeg feature
brew install ffmpeg

# Note: NVIDIA no longer ships CUDA for macOS, so GPU acceleration is
# generally unavailable on Intel Macs; build the default CPU version
cargo build --release
```

### Windows

#### Using MSVC

```bash
# Install Rust from https://rustup.rs/
# Install Visual Studio Build Tools from https://visualstudio.microsoft.com/downloads/

# For FFmpeg feature
# Download FFmpeg from https://ffmpeg.org/download.html
# Set FFMPEG_DIR environment variable to your FFmpeg installation

# Build
cargo build --release
```
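In a Git Bash / MSYS2 shell, the `FFMPEG_DIR` setup might look like the sketch below. The extraction path is purely illustrative; substitute wherever you unpacked the FFmpeg "shared" build, and keep its `bin/` on `PATH` so the DLLs resolve at runtime.

```shell
export FFMPEG_DIR="/c/ffmpeg-shared"   # assumed extraction path, adjust to yours
export PATH="$FFMPEG_DIR/bin:$PATH"    # DLLs must be findable at runtime
echo "FFMPEG_DIR=$FFMPEG_DIR"
```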

#### Using WSL2 (Recommended)

```bash
# Follow Linux instructions inside WSL2
wsl
sudo apt-get update
sudo apt-get install -y build-essential pkg-config git clang cmake
```

## Feature Flags

aha supports several optional features:

### cuda

Enables CUDA support for NVIDIA GPU acceleration.

```bash
cargo build --release --features cuda
```

**Requirements**:
- NVIDIA GPU
- CUDA Toolkit 11.0 or later
- cuDNN library

**Benefits**:
- 10-50x faster inference
- Support for larger models
- Lower CPU usage

### metal

Enables Metal support for Apple Silicon GPU acceleration.

```bash
cargo build --release --features metal
```

**Requirements**:
- Apple Silicon (M1/M2/M3/M4)
- macOS 11.0 or later

**Benefits**:
- 5-20x faster inference
- Lower power consumption
- Support for larger models

### flash-attn

Enables Flash Attention for optimized long-sequence processing.

```bash
cargo build --release --features cuda,flash-attn
```

**Requirements**:
- CUDA feature enabled
- Supported GPU architecture (compute capability 7.0+)

**Benefits**:
- Reduced memory usage
- Faster inference for long sequences
- Especially beneficial for vision models

**Note**: Must be used with `cuda` feature.

### ffmpeg

Enables FFmpeg support for audio/video processing.

```bash
cargo build --release --features ffmpeg
```

**Requirements**:
- FFmpeg development libraries
- Platform-specific (see above)

**Benefits**:
- Extended audio format support (MP3, AAC, etc.)
- Video processing capabilities
- Better audio resampling

### Combining Features

You can combine multiple features:

```bash
# Maximum performance on NVIDIA GPU
cargo build --release --features cuda,flash-attn

# Apple Silicon with audio support
cargo build --release --features metal,ffmpeg

# Everything enabled
cargo build --release --features cuda,flash-attn,ffmpeg
```

## Verification

After installation, verify that aha is working:

```bash
# Check version
./target/release/aha --version

# List supported models
./target/release/aha list

# (Or if installed to PATH)
aha --version
aha list
```

Expected output for `aha list`:

```shell
Available models:

Model ID                                 Owner                type       Download  
--------------------------------------------------------------------------------
LiquidAI/LFM2-1.2B                       LiquidAI             llm          ✔       
LiquidAI/LFM2.5-1.2B-Instruct            LiquidAI             llm          ✔       
LiquidAI/LFM2.5-VL-1.6B                  LiquidAI             vlm          ✔       
LiquidAI/LFM2-VL-1.6B                    LiquidAI             vlm          ✔       
OpenBMB/MiniCPM4-0.5B                    OpenBMB              llm          ✔       
Qwen/Qwen2.5-VL-3B-Instruct              Qwen                 vlm          ✔       
Qwen/Qwen2.5-VL-7B-Instruct              Qwen                 vlm                  
Qwen/Qwen3-0.6B                          Qwen                 llm          ✔  
Qwen/Qwen3-1.7B                          Qwen                 llm               
Qwen/Qwen3-4B                            Qwen                 llm            
Qwen/Qwen3.5-0.8B                        Qwen                 vlm          ✔       
Qwen/Qwen3.5-2B                          Qwen                 vlm                  
Qwen/Qwen3.5-4B                          Qwen                 vlm                  
Qwen/Qwen3.5-9B                          Qwen                 vlm                  
qwen3.5-gguf                             none                 vlm                  
Qwen/Qwen3-ASR-0.6B                      Qwen                 asr          ✔       
Qwen/Qwen3-ASR-1.7B                      Qwen                 asr                  
Qwen/Qwen3-VL-2B-Instruct                Qwen                 vlm          ✔       
Qwen/Qwen3-VL-4B-Instruct                Qwen                 vlm                  
Qwen/Qwen3-VL-8B-Instruct                Qwen                 vlm                  
Qwen/Qwen3-VL-32B-Instruct               Qwen                 vlm                  
deepseek-ai/DeepSeek-OCR                 deepseek-ai          ocr          ✔       
deepseek-ai/DeepSeek-OCR-2               deepseek-ai          ocr                  
Tencent-Hunyuan/HunyuanOCR               Tencent-Hunyuan      ocr          ✔       
PaddlePaddle/PaddleOCR-VL                PaddlePaddle         ocr          ✔       
PaddlePaddle/PaddleOCR-VL-1.5            PaddlePaddle         ocr                  
AI-ModelScope/RMBG-2.0                   AI-ModelScope        image        ✔       
OpenBMB/VoxCPM-0.5B                      OpenBMB              tts          ✔       
OpenBMB/VoxCPM1.5                        OpenBMB              tts          ✔       
ZhipuAI/GLM-ASR-Nano-2512                ZhipuAI              asr          ✔       
FunAudioLLM/Fun-ASR-Nano-2512            FunAudioLLM          asr          ✔       
ZhipuAI/GLM-OCR                          ZhipuAI              ocr          ✔

```

## Troubleshooting

### Build Errors

#### "error: linking with cc failed"

This usually indicates missing system dependencies.

**Solution**: Install required build tools for your platform (see Platform-Specific Instructions).

#### "error: CUDA not found"

CUDA feature is enabled but CUDA toolkit is not installed.

**Solution**: 
- Install CUDA toolkit from https://developer.nvidia.com/cuda-downloads
- Or build without CUDA: `cargo build --release`

#### "error: Metal not available"

Metal feature is enabled but not on supported hardware.

**Solution**:
- Ensure you're on Apple Silicon
- Or build without Metal: `cargo build --release`

### Runtime Errors

#### "error while loading shared libraries"

Missing runtime libraries.

**Solution**: Install required libraries (see Platform-Specific Instructions).

#### "Out of memory"

Model is too large for available RAM/VRAM.

**Solution**:
- Use a smaller model
- Close other applications
- Enable GPU acceleration so model weights are held in VRAM instead of system RAM

#### "Model download failed"

Network issue or insufficient disk space.

**Solution**:
- Check internet connection
- Ensure sufficient disk space in `~/.aha/`
- Try again: download will resume if interrupted
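To see at a glance whether disk space is the culprit, check the filesystem holding the model cache (`~/.aha/` per the note above) and how much the cache already occupies:

```shell
# Free space on the filesystem holding the model cache, and current cache size.
mkdir -p "$HOME/.aha"
df -h "$HOME/.aha" | tail -n 1 | awk '{print "free space:", $4}'
du -sh "$HOME/.aha"
```

Compare the free space against the sizes in the Model Sizes table below before retrying a large download.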

### Performance Issues

#### Slow inference

**Solutions**:
1. Enable GPU acceleration: `--features cuda` or `--features metal`
2. Enable Flash Attention: `--features "cuda,flash-attn"`
3. Use a smaller model
4. Check if GPU is being used (should see GPU usage in monitoring tools)
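For step 4, a quick way to confirm the GPU is actually in use on an NVIDIA system is `nvidia-smi`; the guard below keeps the snippet harmless on machines without it. On Apple Silicon, use Activity Monitor's GPU history instead.

```shell
# Is an NVIDIA GPU present and busy? (Run while inference is in progress.)
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
else
  echo "nvidia-smi not found (no NVIDIA GPU or driver on this system)"
fi
```

Near-zero GPU utilization during inference usually means the binary was built without `cuda`/`metal` and is running on the CPU.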

#### High CPU usage

**Solutions**:
1. Enable GPU acceleration
2. Reduce batch size
3. Use model with lower precision

## System Requirements
*Hardware and software requirements vary by model; the figures below are for reference only.*

### Minimum Requirements

- **CPU**: x86_64 or ARM64
- **RAM**: 8 GB (16 GB recommended)
- **Disk**: 10 GB for models (varies by model)
- **OS**: Linux, macOS, or Windows

### Recommended Requirements

- **CPU**: Modern multi-core processor
- **RAM**: 32 GB or more
- **GPU**: NVIDIA GPU (with CUDA) or Apple Silicon
- **Disk**: SSD with 50+ GB free space
- **OS**: Linux (Ubuntu 22.04+) or macOS (Monterey+)

## Model Sizes

Approximate download sizes for popular models:

| Model | Size | RAM Usage |
|-------|------|-----------|
| Qwen/Qwen3-0.6B | ~1.2 GB | ~2 GB |
| Qwen/Qwen3-VL-2B-Instruct | ~4 GB | ~6 GB |
| Qwen/Qwen3-VL-8B-Instruct | ~16 GB | ~20 GB |
| Qwen/Qwen3-VL-32B-Instruct | ~64 GB | ~70 GB |

## Next Steps

After successful installation:

1. Read the [Getting Started Guide](./getting-started.md)
2. Download your first model: `aha download -m Qwen/Qwen3-0.6B`
3. Start the service: `aha cli -m Qwen/Qwen3-0.6B`
4. Explore the [API Reference](./api.md)

## See Also

- [Getting Started](./getting-started.md) - Quick start guide
- [CLI Reference](./cli.md) - Command-line usage
- [API Reference](./api.md) - REST API documentation
- [Development](./development.md) - Contributing guide