aha 0.2.5

aha is a model inference library. It currently supports Qwen (2.5VL/3/3VL/3.5/ASR/3Embedding/3Reranker), MiniCPM4, VoxCPM/1.5, DeepSeek-OCR/2, Hunyuan-OCR, PaddleOCR-VL/1.5, RMBG2.0, GLM (ASR-Nano-2512/OCR), Fun-ASR-Nano-2512, and LFM (2/2.5/2VL/2.5VL).
# CLI Reference

Complete command-line interface reference for aha.

AHA is a high-performance model inference library based on the Candle framework, supporting various multimodal models including vision, language, and audio models.

```bash
aha [COMMAND] [OPTIONS]
```

## Global Options

| Option | Description | Default |
|--------|-------------|---------|
| `-a, --address <ADDRESS>` | Service listen address | 127.0.0.1 |
| `-p, --port <PORT>` | Service listen port | 10100 |
| `-m, --model <MODEL>` | Model type (required) | - |
| `--weight-path <WEIGHT_PATH>` | Local model weight path | - |
| `--save-dir <SAVE_DIR>` | Model download save directory | ~/.aha/ |
| `--download-retries <DOWNLOAD_RETRIES>` | Download retry count | 3 |
| `--gguf-path <GGUF_PATH>` | Local GGUF weight path (required when using GGUF models) | - |
| `--mmproj-path <MMPROJ_PATH>` | Local mmproj GGUF weight path | - |
| `--onnx-path <ONNX_PATH>` | Local ONNX weight path (required when using ONNX models) | - |
| `--config-path <CONFIG_PATH>` | Extra config path for GGUF/ONNX models | - |
| `-h, --help` | Display help information | - |
| `-V, --version` | Display version number | - |

## Commands

### cli - Download model and start service

Download the specified model and start an HTTP service.
Downloading only supports models in the safetensors format; for GGUF/ONNX models, you must specify a local file path.

**Syntax:**
```bash
aha cli [OPTIONS] --model <MODEL>
```

**Options:**

| Option | Description | Default |
|--------|-------------|---------|
| `-a, --address <ADDRESS>` | Service listen address | 127.0.0.1 |
| `-p, --port <PORT>` | Service listen port | 10100 |
| `-m, --model <MODEL>` | Model type (required) | - |
| `--weight-path <WEIGHT_PATH>` | Local model weight path (skip download if specified) | - |
| `--save-dir <SAVE_DIR>` | Model download save directory | ~/.aha/ |
| `--download-retries <DOWNLOAD_RETRIES>` | Download retry count | 3 |
| `--gguf-path <GGUF_PATH>` | Local GGUF weight path (required when using GGUF models) | - |
| `--mmproj-path <MMPROJ_PATH>` | Local mmproj GGUF weight path | - |
| `--onnx-path <ONNX_PATH>` | Local ONNX weight path (required when using ONNX models) | - |
| `--config-path <CONFIG_PATH>` | Extra config path for GGUF/ONNX models | - |

**Examples:**

```bash
# Download model and start service (default port 10100)
aha cli -m Qwen/Qwen3-VL-2B-Instruct

# Specify port and save directory
aha cli -m Qwen/Qwen3-VL-2B-Instruct -p 8080 --save-dir /data/models

# Use local model (skip download)
aha cli -m Qwen/Qwen3-VL-2B-Instruct --weight-path /path/to/model

# Use gguf-path and mmproj-path
aha cli -m qwen3.5-gguf --gguf-path /path/to/xxx.gguf --mmproj-path /path/to/mmproj-xxx.gguf

```

### run - Direct model inference

Run model inference directly without starting an HTTP service. Suitable for one-time inference tasks or batch processing.

**Syntax:**
```bash
aha run [OPTIONS] --model <MODEL> --input <INPUT> [--input <INPUT2>] [--weight-path <WEIGHT_PATH>] [--gguf-path <GGUF_PATH>] [--mmproj-path <MMPROJ_PATH>] [--onnx-path <ONNX_PATH>] [--config-path <CONFIG_PATH>]
```

**Options:**

| Option | Description | Default |
|--------|-------------|---------|
| `-m, --model <MODEL>` | Model type (required) | - |
| `-i, --input <INPUT>` | Input text or file path (model-specific interpretation). Accepts one or two values: the first is the prompt text, the second is a file path | - |
| `-o, --output <OUTPUT>` | Output file path (optional, auto-generated if not specified) | - |
| `--weight-path <WEIGHT_PATH>` | Local model weight path (required when using safetensors models) | - |
| `--gguf-path <GGUF_PATH>` | Local GGUF model weight path (required when using GGUF models) | - |
| `--mmproj-path <MMPROJ_PATH>` | Local mmproj GGUF weight path (optional; if not specified, the module is not loaded) | - |
| `--onnx-path <ONNX_PATH>` | Local ONNX weight path (required when using ONNX models) | - |
| `--config-path <CONFIG_PATH>` | Extra config path for GGUF/ONNX models | - |

**Examples:**

```bash
# VoxCPM1.5 text-to-speech (single input)
aha run -m OpenBMB/VoxCPM1.5 -i "太阳当空照" -o output.wav --weight-path /path/to/model

# VoxCPM1.5 read input from file (single input)
aha run -m OpenBMB/VoxCPM1.5 -i "file://./input.txt" --weight-path /path/to/model

# MiniCPM4 text generation (single input)
aha run -m OpenBMB/MiniCPM4-0.5B -i "你好" --weight-path /path/to/model

# DeepSeek OCR image recognition (single input)
aha run -m deepseek-ai/DeepSeek-OCR -i "image.jpg" --weight-path /path/to/model

# RMBG2.0 background removal (single input)
aha run -m AI-ModelScope/RMBG-2.0 -i "photo.png" -o "no_bg.png" --weight-path /path/to/model

# GLM-ASR speech recognition (two inputs: prompt text + audio file)
aha run -m ZhipuAI/GLM-ASR-Nano-2512 -i "请转写这段音频" -i "audio.wav" --weight-path /path/to/model

# Fun-ASR speech recognition (two inputs: prompt text + audio file)
aha run -m FunAudioLLM/Fun-ASR-Nano-2512 -i "语音转写:" -i "audio.wav" --weight-path /path/to/model

# Qwen3 text generation (single input)
aha run -m Qwen/Qwen3-0.6B -i "你好" --weight-path /path/to/model

# Qwen3 GGUF text generation (single input)
aha run -m qwen3-0.6b -i "hello" --artifact-format gguf --gguf-path /path/to/Qwen3-0.6B-Q8_0.gguf

# Qwen2.5-VL image understanding (two inputs: prompt text + image file)
aha run -m Qwen/Qwen2.5-VL-3B-Instruct -i "请分析图片并提取所有可见文本内容,按从左到右、从上到下的布局,返回纯文本" -i "image.jpg" --weight-path /path/to/model

# Qwen3-ASR speech recognition (single input: audio file)
aha run -m Qwen/Qwen3-ASR-0.6B -i "audio.wav" --weight-path /path/to/model

# Qwen3.5-GGUF without mmproj (single input: prompt text)
aha run -m qwen3.5-gguf -i "你如何看待AI" --gguf-path /path/to/xxx.gguf

# Qwen3.5-GGUF with mmproj (two inputs: prompt text + file)
aha run -m qwen3.5-gguf -i "提取图片中的文本" -i https://ai.bdstatic.com/file/C56CC9B274CF460CA3363E59ECD94423 --gguf-path /path/to/xxx.gguf --mmproj-path /path/to/mmproj-xxx.gguf
```

### serv - Start service

Start an HTTP service with a model.
Safetensors models: `--weight-path` is optional; if not specified, it defaults to `~/.aha/{model_id}`.
GGUF/ONNX models: `--gguf-path` / `--onnx-path` must be specified.

**Syntax:**
```bash
aha serv [OPTIONS] --model <MODEL> [--weight-path <WEIGHT_PATH>] [--gguf-path <GGUF_PATH>] [--mmproj-path <MMPROJ_PATH>] [--onnx-path <ONNX_PATH>] [--config-path <CONFIG_PATH>]
```

**Options:**

| Option | Description | Default |
|--------|-------------|---------|
| `-a, --address <ADDRESS>` | Service listen address | 127.0.0.1 |
| `-p, --port <PORT>` | Service listen port | 10100 |
| `-m, --model <MODEL>` | Model type (required) | - |
| `--weight-path <WEIGHT_PATH>` | Local model weight path (optional) | ~/.aha/{model_id} |
| `--allow-remote-shutdown` | Allow remote shutdown requests (not recommended) | false |
| `--gguf-path <GGUF_PATH>` | Local GGUF model weight path (required when using GGUF models) | - |
| `--mmproj-path <MMPROJ_PATH>` | Local mmproj GGUF weight path (optional; if not specified, the module is not loaded) | - |
| `--onnx-path <ONNX_PATH>` | Local ONNX model directory/file path (required when using ONNX models) | - |
| `--config-path <CONFIG_PATH>` | Extra config path for GGUF/ONNX models | - |

**Examples:**

```bash
# Start service with default model path (~/.aha/{model_id})
aha serv -m Qwen/Qwen3-VL-2B-Instruct

# Start service with local model
aha serv -m Qwen/Qwen3-VL-2B-Instruct --weight-path /path/to/model

# Start with specified port
aha serv -m Qwen/Qwen3-VL-2B-Instruct -p 8080

# Specify listen address
aha serv -m Qwen/Qwen3-VL-2B-Instruct -a 0.0.0.0

# Enable remote shutdown (not recommended for production)
aha serv -m Qwen/Qwen3-VL-2B-Instruct --allow-remote-shutdown
```

### ps - List running services

List all currently running AHA services with their process IDs, ports, and status.

**Syntax:**
```bash
aha ps [OPTIONS]
```

**Options:**

| Option | Description | Default |
|--------|-------------|---------|
| `-c, --compact` | Compact output format (show service IDs only) | false |

**Examples:**

```bash
# List all running services (table format)
aha ps

# Compact output (service IDs only)
aha ps -c
```

**Output Format:**

```
Service ID           PID        Model                Port       Address         Status
-------------------------------------------------------------------------------------
56860@10100          56860      N/A                  10100      127.0.0.1       Running
```

**Fields:**
- `Service ID`: Unique identifier in format `pid@port`
- `PID`: Process ID
- `Model`: Model name (N/A if not detected)
- `Port`: Service port number
- `Address`: Service listen address
- `Status`: Service status (Running, Stopping, Unknown)
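
Because each service ID is `pid@port`, scripts can split it with plain shell parameter expansion — a minimal sketch, using the sample ID from the output above:

```shell
# Split a service ID of the form pid@port (value taken from the sample output above)
service_id="56860@10100"
pid="${service_id%@*}"    # part before '@'
port="${service_id#*@}"   # part after '@'
echo "pid=$pid port=$port"   # prints "pid=56860 port=10100"
```

Combined with `aha ps -c`, this makes it easy to act on each running service in a loop.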

### download - Download model

Download the specified model only, without starting the service.

**Syntax:**
```bash
aha download [OPTIONS] --model <MODEL>
```

**Options:**

| Option | Description | Default |
|--------|-------------|---------|
| `-m, --model <MODEL>` | Model type (required) | - |
| `-s, --save-dir <SAVE_DIR>` | Model download save directory | ~/.aha/ |
| `--download-retries <DOWNLOAD_RETRIES>` | Download retry count | 3 |

**Examples:**

```bash
# Download model to default directory
aha download -m Qwen/Qwen3-VL-2B-Instruct

# Specify save directory
aha download -m Qwen/Qwen3-VL-2B-Instruct -s /data/models

# Specify download retry count
aha download -m Qwen/Qwen3-VL-2B-Instruct --download-retries 5

# Download MiniCPM4-0.5B model
aha download -m OpenBMB/MiniCPM4-0.5B -s models
```

### delete - Delete downloaded model

Delete a downloaded model from the default location (`~/.aha/{model_id}`).

**Syntax:**
```bash
aha delete [OPTIONS] --model <MODEL>
```

**Options:**

| Option | Description | Default |
|--------|-------------|---------|
| `-m, --model <MODEL>` | Model type (required) | - |

**Examples:**

```bash
# Delete RMBG2.0 model from default location
aha delete -m AI-ModelScope/RMBG-2.0

# Delete Qwen3-VL-2B model
aha delete --model Qwen/Qwen3-VL-2B-Instruct
```

**Behavior:**
- Displays model information (ID, location, size) before deletion
- Requires confirmation (y/N) before proceeding
- Shows "Model not found" message if the model directory doesn't exist
- Shows "Model deleted successfully" message after completion

### list - List all supported models

List all supported models with their ModelScope IDs.

**Syntax:**
```bash
aha list [OPTIONS]
```

**Options:**

| Option | Description | Default |
|--------|-------------|---------|
| `-j, --json` | Output in JSON format (includes name, model_id, and type fields) | false |

**Examples:**

```bash
# List models in table format (default)
aha list

# List models in JSON format
aha list --json

# Short form
aha list -j
```

**JSON Output Format:**

When using `--json`, the output includes:
- `name`: Model identifier used with `-m` flag
- `model_id`: Full ModelScope model ID
- `type`: Model category (`llm`, `ocr`, `asr`, `image`, or `tts`)

Example:
```json
[
  {
    "name": "Qwen/Qwen3-VL-2B-Instruct",
    "model_id": "Qwen/Qwen3-VL-2B-Instruct",
    "type": "llm"
  },
  {
    "name": "deepseek-ai/DeepSeek-OCR",
    "model_id": "deepseek-ai/DeepSeek-OCR",
    "type": "ocr"
  }
]
```
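
The JSON output is convenient for scripting. As a sketch (assuming `aha` is on PATH; `python3` is used so no extra tools are required), this prints the names of all OCR models:

```shell
# Filter the JSON model list down to OCR models
aha list -j | python3 -c '
import json, sys
for m in json.load(sys.stdin):
    if m["type"] == "ocr":
        print(m["name"])
'
```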

**Model Types:**
- `llm`: Language models (text generation, chat, etc.)
- `ocr`: Optical Character Recognition models
- `asr`: Automatic Speech Recognition models
- `image`: Image processing models
- `tts`: Text-to-Speech models

## Common Use Cases

### Scenario 1: Quick start inference service

```bash
# One command to download and start service
aha cli -m Qwen/Qwen3-VL-2B-Instruct
```

### Scenario 2: Start service with existing model

```bash
# Assuming model is downloaded to /data/models/Qwen/Qwen3-VL-2B-Instruct
aha serv -m Qwen/Qwen3-VL-2B-Instruct --weight-path /data/models/Qwen/Qwen3-VL-2B-Instruct
```

### Scenario 3: Pre-download model

```bash
# Download model to specified directory for later use
aha download -m Qwen/Qwen3-VL-2B-Instruct -s /data/models

# Later start with local model
aha serv -m Qwen/Qwen3-VL-2B-Instruct --weight-path /data/models/Qwen/Qwen3-VL-2B-Instruct
```

### Scenario 4: Custom service port and address

```bash
# Start service on 0.0.0.0:8080, allow external access
aha cli -m Qwen/Qwen3-VL-2B-Instruct -a 0.0.0.0 -p 8080
```

## API Endpoints

After the service starts, the following API endpoints are available:

### Chat Completion Endpoint
- **Endpoint**: `POST /chat/completions`
- **Function**: Multimodal chat and text generation
- **Supported Models**: Qwen2.5VL, Qwen3, Qwen3VL, DeepSeek-OCR, GLM-ASR-Nano-2512, Fun-ASR-Nano-2512, etc.
- **Format**: OpenAI Chat Completion format
- **Streaming Support**: Yes
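
Assuming a service has already been started (e.g. with `aha serv -m Qwen/Qwen3-VL-2B-Instruct` on the default address and port), the endpoint can be exercised with curl. This is a minimal sketch in the OpenAI Chat Completion format; the exact set of accepted fields may vary by model:

```shell
# Minimal chat completion request against a locally running service
curl -s http://127.0.0.1:10100/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-VL-2B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
```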

### Image Processing Endpoint
- **Endpoint**: `POST /images/remove_background`
- **Function**: Image background removal
- **Supported Models**: RMBG-2.0
- **Format**: OpenAI Chat Completion format
- **Streaming Support**: No

### Audio Generation Endpoint
- **Endpoint**: `POST /audio/speech`
- **Function**: Speech synthesis and generation
- **Supported Models**: VoxCPM, VoxCPM1.5
- **Format**: OpenAI Chat Completion format
- **Streaming Support**: No
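
A hedged sketch of a speech request, assuming an OpenAI-style `/audio/speech` body (`model` and `input` fields) and a VoxCPM1.5 service already running on the default port:

```shell
# Text-to-speech request; the response body is audio, so save it to a file
curl -s http://127.0.0.1:10100/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "OpenBMB/VoxCPM1.5", "input": "Hello, world"}' \
  -o output.wav
```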

### Embeddings Endpoint
- **Endpoint**: `POST /embeddings` or `POST /v1/embeddings`
- **Function**: Text embedding generation
- **Supported Models**: Qwen3-Embedding family
- **Format**: OpenAI embeddings format
- **Streaming Support**: No
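
A minimal request in the OpenAI embeddings format, assuming a Qwen3-Embedding service is already running (the model name below is an example):

```shell
# Embed two texts in one request
curl -s http://127.0.0.1:10100/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-Embedding-0.6B", "input": ["hello", "world"]}'
```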

### Rerank Endpoint
- **Endpoint**: `POST /rerank` or `POST /v1/rerank`
- **Function**: Query-document reranking
- **Supported Models**: Qwen3-Reranker family
- **Format**: Rerank JSON response (`results[index,relevance_score,document]`)
- **Streaming Support**: No
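
A sketch of a rerank request, assuming the common rerank body shape (`model`, `query`, `documents`) that matches the `results[index, relevance_score, document]` response format above; the model name is an example:

```shell
# Rank candidate documents against a query
curl -s http://127.0.0.1:10100/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Reranker-0.6B",
    "query": "what is aha?",
    "documents": ["aha is a model inference library", "bananas are yellow"]
  }'
```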

### Shutdown Endpoint
- **Endpoint**: `POST /shutdown`
- **Function**: Gracefully shut down the server
- **Security**: Localhost only by default, use `--allow-remote-shutdown` flag to enable remote access (not recommended)
- **Format**: JSON response
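
From the machine running the service, a graceful shutdown can be triggered with a plain POST (remote requests are rejected unless `--allow-remote-shutdown` was set):

```shell
# Ask the local service to shut down gracefully
curl -s -X POST http://127.0.0.1:10100/shutdown
```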


## Getting Help

```bash
# View main help
aha --help

# View subcommand help
aha cli --help
aha serv --help
aha download --help

# View version information
aha --version
```

## See Also

- [Getting Started](./getting-started.md) - Quick start guide
- [API Documentation](./api.md) - REST API reference
- [Supported Models](./supported-tools.md) - Available models