aha 0.2.5 - Docs.rs

# API Reference

Complete reference for the AHA REST API.

## Overview

AHA provides an OpenAI-compatible REST API for running AI model inference. All endpoints follow standard HTTP conventions and return JSON responses.

### Base URL

By default, the API server runs on:
```
http://127.0.0.1:10100
```

You can customize this when starting the service:
```bash
aha cli -m Qwen/Qwen3-0.6B -a 0.0.0.0 -p 8080
```

### Authentication

Currently, AHA does not require authentication. All endpoints are publicly accessible on the configured address/port.

**Security Note**: If you expose the API to external networks, consider implementing authentication through a reverse proxy (e.g., nginx, traefik).

### Content Types

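All requests with a JSON body should use the header below (the Audio Transcriptions endpoint takes `multipart/form-data` instead):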
```
Content-Type: application/json
```

### Response Format

Success responses generally follow this structure; the exact fields vary by endpoint and are shown in each endpoint's Response section:
```json
{
  "data": { ... },
  "model": "model-name",
  "usage": {
    "total_tokens": 30
  }
}
```

Error responses:
```json
{
  "error": {
    "message": "Error description",
    "type": "error_type",
    "code": "error_code"
  }
}
```
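
The endpoints in this reference return errors in two shapes: a structured `error` object (as above) and, on some endpoints such as `/models` and `/embeddings`, a bare string under `error`. A client may want to normalize both. The sketch below is one way to do that; `extract_error` is a hypothetical helper name, not part of AHA.

```python
import json
from typing import Optional


def extract_error(body: str) -> Optional[dict]:
    """Return a normalized error dict, or None if the body holds no error."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON at all: surface the raw text as the message.
        return {"message": body, "type": None, "code": None}
    if not isinstance(payload, dict) or "error" not in payload:
        return None
    err = payload["error"]
    if isinstance(err, dict):
        # Structured form: {"error": {"message": ..., "type": ..., "code": ...}}
        return {
            "message": err.get("message"),
            "type": err.get("type"),
            "code": err.get("code"),
        }
    # Bare-string form, e.g. {"error": "model not initialized"}.
    return {"message": str(err), "type": None, "code": None}
```

Callers can then branch on `code` when it is present and fall back to `message` otherwise.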

## Endpoints

### Health Check

Check the service health status. This endpoint is useful for container orchestration (Kubernetes), load balancers, and monitoring systems.

#### Endpoint
```
GET /health
```

#### Response

**Healthy (HTTP 200):**

```json
{
  "status": "ok"
}
```

**Unhealthy (HTTP 503):**

```json
{
  "status": "unhealthy",
  "error": "model not initialized"
}
```

#### Example

```bash
curl http://127.0.0.1:10100/health
```
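
For scripts that start the server and then need to wait for the model to load, the health endpoint can be polled. The following is a minimal sketch using only the Python standard library; the function names, attempt count, and delay are illustrative assumptions rather than anything AHA prescribes.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def probe_health(base_url: str, timeout: float = 2.0) -> bool:
    """Return True only when GET /health answers 200 with status "ok"."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            return (
                resp.status == 200
                and isinstance(body, dict)
                and body.get("status") == "ok"
            )
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP 503 (HTTPError is a URLError), refused connections,
        # timeouts, and bodies that are not valid JSON.
        return False


def wait_until_healthy(
    base_url: str,
    attempts: int = 30,
    delay: float = 1.0,
    probe: Callable[[str], bool] = probe_health,
) -> bool:
    """Poll the health endpoint until it reports ok or attempts run out."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy("http://127.0.0.1:10100")` returns `True` once the service reports `"status": "ok"`. The `probe` parameter exists so the loop can be exercised without a running server.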

### Models

Get information about the currently loaded model (OpenAI API compatible format).

#### Endpoint
```
GET /models
```

#### Response

**Success (HTTP 200):**

```json
{
  "object": "list",
  "data": [
    {
      "id": "Qwen/Qwen3-0.6B",
      "object": "model",
      "created": null,
      "owned_by": "Qwen"
    }
  ]
}
```

**Not Initialized (HTTP 503):**

```json
{
  "error": "model not initialized"
}
```

#### Fields

| Field | Type | Description |
|-------|------|-------------|
| `object` | string | Fixed value: "list" |
| `data` | array | Array of model objects (currently contains one loaded model) |
| `data[].id` | string | Model identifier in `owner/name` form (e.g., "Qwen/Qwen3-0.6B") |
| `data[].object` | string | Fixed value: "model" |
| `data[].created` | integer\|null | Unix timestamp (currently null) |
| `data[].owned_by` | string | Model owner/organization name |

#### Example

```bash
curl http://127.0.0.1:10100/models
```

### Chat Completions

Generate chat completions or text responses.

#### Endpoint
```
POST /chat/completions
```

#### Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model` | string | Yes | Model identifier (e.g., "Qwen/Qwen3-0.6B") |
| `messages` | array | Yes | Array of message objects |
| `temperature` | number | No | Sampling temperature (0-2, default: 1) |
| `top_p` | number | No | Nucleus sampling (0-1, default: 1) |
| `max_tokens` | integer | No | Maximum tokens to generate |
| `stream` | boolean | No | Enable streaming (default: false) |

#### Message Object

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `role` | string | Yes | "system", "user", or "assistant" |
| `content` | string/array | Yes | Message content (string or multimodal array) |
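
The request and message tables above can be turned into a small client-side builder that checks the documented ranges before anything is sent. This is a sketch, not AHA code: `build_chat_payload` is a hypothetical helper, and the checks for a non-empty `messages` list and a positive `max_tokens` are assumptions that the tables do not state.

```python
from typing import Optional

ALLOWED_ROLES = {"system", "user", "assistant"}


def build_chat_payload(
    model: str,
    messages: list,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
) -> dict:
    """Build a /chat/completions request body, validating documented ranges."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
        if not isinstance(message.get("content"), (str, list)):
            raise ValueError("content must be a string or a multimodal array")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")

    payload = {"model": model, "messages": messages}
    # Optional fields are left out when unset so the server applies its defaults.
    if temperature is not None:
        payload["temperature"] = temperature
    if top_p is not None:
        payload["top_p"] = top_p
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if stream:
        payload["stream"] = True
    return payload
```

The returned dict can be serialized with `json.dumps` and posted with `Content-Type: application/json`.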

#### Multimodal Content

For vision/audio models, content can be an array:

```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {"type": "image", "image_url": {"url": "file:///path/to/image.jpg"}}
  ]
}
```

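Supported content types (the `type` value, followed by the field that carries the payload):
- `text` - Text content, in a `text` field
- `image` - Image, in an `image_url` object (file://, base64://, https:// or http://)
- `audio` - Audio, in an `audio_url` object (file://, base64://, https:// or http://)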
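
Building these content arrays by hand is error-prone, so a few tiny constructors can help. The sketch below mirrors the shapes used in this document's examples (`"type": "image"` with an `image_url` object, `"type": "audio"` with an `audio_url` object). The helper names are hypothetical, and the use of standard padded base64 after the `base64://` prefix is an assumption based on the `base64 -w 0` usage elsewhere in this reference.

```python
import base64


def text_part(text: str) -> dict:
    """A plain-text content part."""
    return {"type": "text", "text": text}


def image_part(url: str) -> dict:
    """An image content part; url may be file://, base64://, http:// or https://."""
    return {"type": "image", "image_url": {"url": url}}


def audio_part(url: str) -> dict:
    """An audio content part; url may be file://, base64://, http:// or https://."""
    return {"type": "audio", "audio_url": {"url": url}}


def base64_url(data: bytes) -> str:
    """Encode raw bytes as a base64:// URL (standard alphabet, with padding)."""
    return "base64://" + base64.b64encode(data).decode("ascii")


def user_message(*parts: dict) -> dict:
    """Wrap content parts in a user message."""
    return {"role": "user", "content": list(parts)}
```

For example, `user_message(text_part("Describe this image"), image_part("file:///path/to/image.jpg"))` reproduces the message object shown above.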

#### Examples

**Simple Chat:**

```bash
curl http://127.0.0.1:10100/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

**With System Message:**

```bash
curl http://127.0.0.1:10100/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain Rust in one sentence."}
    ],
    "max_tokens": 50,
    "temperature": 0.7
  }'
```

**Vision Understanding:**

```bash
curl http://127.0.0.1:10100/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-VL-2B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image", "image_url": {"url": "file:///path/to/image.jpg"}}
        ]
      }
    ]
  }'
```

**OCR (Text Extraction):**

```bash
curl http://127.0.0.1:10100/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-OCR",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Extract all text"},
          {"type": "image", "image_url": {"url": "file:///path/to/document.png"}}
        ]
      }
    ]
  }'
```

**ASR (Speech Recognition):**

```bash
curl http://127.0.0.1:10100/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ZhipuAI/GLM-ASR-Nano-2512",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Transcribe this audio"},
          {"type": "audio", "audio_url": {"url": "file:///path/to/audio.wav"}}
        ]
      }
    ]
  }'
```

> **Note:** For OpenAI-standard audio transcription with `multipart/form-data` file upload,
> see the [Audio Transcriptions](#audio-transcriptions) endpoint.

**Streaming Response:**

```bash
curl http://127.0.0.1:10100/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [
      {"role": "user", "content": "Tell me a story"}
    ],
    "stream": true
  }'
```

Streaming responses are sent as Server-Sent Events (SSE):
```
data: {"id": "1", "choices": [{"delta": {"content": "Once"}}]}

data: {"id": "1", "choices": [{"delta": {"content": " upon"}}]}

data: [DONE]
```
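
On the client side, the stream can be consumed line by line. The following sketch assumes only the event shape shown above (`data:` lines carrying JSON chunks with `choices[].delta.content`, terminated by `data: [DONE]`); `iter_deltas` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by a chat completion SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, or other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"id": "1", "choices": [{"delta": {"content": "Once"}}]}',
    "",
    'data: {"id": "1", "choices": [{"delta": {"content": " upon"}}]}',
    "",
    "data: [DONE]",
]
story = "".join(iter_deltas(sample))
```

With a real connection, the lines would come from the HTTP response body decoded as UTF-8 instead of the `sample` list.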

#### Response

**Non-streaming:**

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "Qwen/Qwen3-0.6B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "total_tokens": 19
  }
}
```
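
For clients that prefer not to depend on an SDK, a non-streaming call can be made with the standard library alone. This is a minimal sketch: `chat_once` and `reply_text` are hypothetical names, and the `opener` parameter is there only so the function can be tested without a running server.

```python
import json
import urllib.request
from typing import Callable


def chat_once(
    base_url: str,
    payload: dict,
    opener: Callable = urllib.request.urlopen,
) -> dict:
    """POST a chat completion request and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


def reply_text(completion: dict) -> str:
    """Return the assistant message text from the first choice."""
    return completion["choices"][0]["message"]["content"]
```

A typical call would be `reply_text(chat_once("http://127.0.0.1:10100", {"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hello!"}]}))`. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so error bodies need a separate `except` branch.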

### Audio Speech

Generate speech from text (Text-to-Speech).

#### Endpoint
```
POST /audio/speech
```

#### Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model` | string | Yes | Model identifier (e.g., "OpenBMB/VoxCPM-1.5") |
| `messages` | array | Yes | Array of message objects |

#### Example

```bash
curl http://127.0.0.1:10100/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "OpenBMB/VoxCPM-1.5",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Hello, this is AHA speaking."},
          {"type": "audio", "audio_url": {"url": "https://package-release.coderbox.cn/aiway/test/other/%E5%93%AA%E5%90%92.wav"}}
        ]
      }
    ]
  }'
```

#### Response

Returns audio data in base64 WAV format.
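
This reference does not specify where in the response the base64 string sits, so the sketch below starts from the base64 string itself and shows one way to decode and sanity-check it. The helper names are hypothetical. Python's `wave` module reads PCM-encoded WAV files, so `wav_duration_seconds` may fail on other encodings.

```python
import base64
import io
import wave


def decode_wav(b64_audio: str) -> bytes:
    """Decode base64 text and verify it looks like a RIFF/WAVE file."""
    audio = base64.b64decode(b64_audio, validate=True)
    # A WAV file is a RIFF container whose form type is "WAVE".
    if audio[:4] != b"RIFF" or audio[8:12] != b"WAVE":
        raise ValueError("decoded data is not a RIFF/WAVE file")
    return audio


def wav_duration_seconds(audio: bytes) -> float:
    """Return the duration of PCM WAV data in seconds."""
    with wave.open(io.BytesIO(audio), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

The bytes returned by `decode_wav` can be written to disk with `open(path, "wb")` and played by any WAV-capable player.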

### Audio Transcriptions

Transcribe audio files to text (Automatic Speech Recognition).

This endpoint provides OpenAI-compatible audio transcription using `multipart/form-data` format.

#### Endpoints

```
POST /audio/transcriptions
POST /v1/audio/transcriptions
```

Both endpoints use the same handler and return identical responses. The `/v1/audio/transcriptions` path follows OpenAI's standard API convention.

#### Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file` | file | Yes | The audio file to transcribe (wav, mp3, m4a, etc.) |
| `model` | string | No | Model identifier (optional, ignored - uses loaded model) |
| `language` | string | No | Language code (e.g., "zh", "en", "yue") |
| `prompt` | string | No | Optional text to guide transcription (not implemented, ignored) |
| `response_format` | string | No | Response format, only "json" or "text" supported (default: "json") |
| `temperature` | number | No | Sampling temperature (0.0 to 1.0, default: 0.0) |

#### Supported Languages

| Code | Language | Code | Language |
|------|----------|------|----------|
| `zh` | Chinese | `en` | English |
| `yue` | Cantonese | `ar` | Arabic |
| `de` | German | `fr` | French |
| `es` | Spanish | `pt` | Portuguese |
| `id` | Indonesian | `it` | Italian |
| `ko` | Korean | `ru` | Russian |
| `th` | Thai | `vi` | Vietnamese |
| `ja` | Japanese | `tr` | Turkish |
| `hi` | Hindi | `ms` | Malay |
| `nl` | Dutch | `sv` | Swedish |
| `da` | Danish | `fi` | Finnish |
| `pl` | Polish | `cs` | Czech |
| `fil` | Filipino | `fa` | Persian |
| `el` | Greek | `ro` | Romanian |
| `hu` | Hungarian | `mk` | Macedonian |

#### Examples

**Basic transcription:**

```bash
curl -X POST http://127.0.0.1:10100/audio/transcriptions \
  -H "Authorization: Bearer NO_NEED" \
  -F file="@./audio.wav" \
  -F model="Qwen/Qwen3-ASR-0.6B"
```

**With language specification:**

```bash
curl -X POST http://127.0.0.1:10100/v1/audio/transcriptions \
  -H "Authorization: Bearer NO_NEED" \
  -F file="@./chinese_audio.wav" \
  -F model="Qwen/Qwen3-ASR-0.6B" \
  -F language="zh"
```

**With temperature:**

```bash
curl -X POST http://127.0.0.1:10100/v1/audio/transcriptions \
  -H "Authorization: Bearer NO_NEED" \
  -F file="@./audio.wav" \
  -F model="Qwen/Qwen3-ASR-0.6B" \
  -F temperature="0.0"
```

#### Response

**Success (HTTP 200):**

```json
{
  "text": "Transcribed text from the audio file"
}
```

**Error (HTTP 400):**

```json
{
  "error": {
    "message": "Audio file is required",
    "type": "invalid_request_error",
    "code": "missing_file"
  }
}
```

**Error (HTTP 503):**

```json
{
  "error": {
    "message": "Model not initialized",
    "type": "service_unavailable",
    "code": "model_not_loaded"
  }
}
```

#### File Upload Limit

Maximum audio file size: 100 MB
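
Without `curl -F`, the multipart body has to be assembled by the client. The sketch below builds one with the standard library and rejects oversized files before upload. `encode_transcription_form` is a hypothetical helper, the `application/octet-stream` part type is an assumption, and the limit is interpreted here as 100 MiB since the text does not say which unit is meant.

```python
import uuid
from typing import Optional, Tuple

# Documented limit of 100 MB, read here as MiB (an assumption).
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def encode_transcription_form(
    filename: str,
    audio: bytes,
    fields: Optional[dict] = None,
) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data upload."""
    if len(audio) > MAX_UPLOAD_BYTES:
        raise ValueError("audio file exceeds the 100 MB upload limit")
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields such as language, response_format, temperature.
    for name, value in (fields or {}).items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The required "file" part carrying the raw audio bytes.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and content type can be passed to `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")` against either transcription path.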

### Images Remove Background

Remove background from images.

#### Endpoint
```
POST /images/remove_background
```

#### Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model` | string | Yes | Model identifier (e.g., "AI-ModelScope/RMBG-2.0") |
| `messages` | array | Yes | Array of message objects |

#### Example

**From File:**

```bash
curl http://127.0.0.1:10100/images/remove_background \
  -H "Content-Type: application/json" \
  -d '{
    "model": "AI-ModelScope/RMBG-2.0",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "image", "image_url": {"url": "file:///path/to/document.jpg"}}
        ]
      }
    ]
  }'
```

**From Base64:**

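```bash
# Command substitution does not run inside single quotes, so encode the
# image first and splice the variable in between the quoted segments.
# Note: `-w 0` (no line wrapping) is a GNU coreutils option.
IMG_B64=$(base64 -w 0 photo.png)

curl http://127.0.0.1:10100/images/remove_background \
  -H "Content-Type: application/json" \
  -d '{
    "model": "AI-ModelScope/RMBG-2.0",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "image", "image_url": {"url": "base64://'"$IMG_B64"'"}}
        ]
      }
    ]
  }'
```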

#### Response

Returns the processed image in base64 PNG format.

### Embeddings
Generate text embeddings.

#### Endpoints
```
POST /embeddings
POST /v1/embeddings
```

#### Request Body
| Parameter | Type | Required | Description |
|------|------|------|------|
| `model` | string | No | Model identifier (optional, ignored - uses loaded model)  |
| `input` | string or array | Yes | Text or array of texts to embed |

#### Examples
Single text:
```bash
curl http://127.0.0.1:10100/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello world"
  }'
```

Multiple texts:
```bash
curl http://127.0.0.1:10100/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Hello world", "How are you?", "Goodbye"]
  }'
```

#### Response
**Success (HTTP 200):**
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.1, 0.2, 0.3, ...]
    }
  ],
  "model": "model-name"
}
```

**Error (HTTP 400):**
```json
{
  "error": "embedding input must be a string or an array of strings"
}
```
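
A common next step is comparing the returned vectors. The sketch below pulls the vectors out of a response shaped like the one above and computes cosine similarity. The helper names are hypothetical, and it assumes that each `index` matches the position of the corresponding text in `input`, following the OpenAI convention.

```python
import math
from typing import List


def embedding_vectors(response: dict) -> List[List[float]]:
    """Return the embedding vectors ordered by their `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values near 1 indicate vectors pointing in nearly the same direction, which for text embeddings is usually read as semantic closeness.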

### Rerank
Re-rank a list of documents according to a query.

#### Endpoints
```
POST /rerank
POST /v1/rerank
```

#### Request Body
| Parameter | Type | Required | Description |
|------|------|------|------|
| `model` | string | No | Model identifier |
| `query` | string | Yes | Query text |
| `documents` | array | Yes | Array of document texts to re-rank |
| `top_n` | integer | No | Return only the top N results |

#### Example
Basic re-ranking:
```bash
curl http://127.0.0.1:10100/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "artificial intelligence",
    "documents": [
      "Machine learning is a form of artificial intelligence",
      "Apple is a fruit",
      "Deep learning belongs to the field of artificial intelligence"
    ]
  }'
```

Limit return count:
```bash
curl http://127.0.0.1:10100/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "artificial intelligence",
    "documents": [
      "Machine learning is a form of artificial intelligence",
      "Apple is a fruit", 
      "Deep learning belongs to the field of artificial intelligence"
    ],
    "top_n": 2
  }'
```

#### Response
**Success (HTTP 200):**
```json
{
  "object": "list",
  "model": "model-name",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.95,
      "document": "Machine learning is a form of artificial intelligence"
    },
    {
      "index": 2,
      "relevance_score": 0.87,
      "document": "Deep learning belongs to the field of artificial intelligence"
    }
  ]
}
```

**Error (HTTP 400):**
```json
{
  "error": "rerank query cannot be empty"
}
```

#### Response Fields
| Field | Type | Description |
|------|------|-----|
| `model` | string | Model identifier |
| `object` | string | Fixed value: "list" |
| `results` | array | Re-ranked results array |
| `results[].index` | integer | Index of the document in the request's `documents` array |
| `results[].relevance_score` | number | Relevance score (higher is more relevant) |
| `results[].document` | string | Original document text |
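
Because each result carries the original position in `index`, a client can map scores back onto its own document list and apply its own cut-off. The sketch below shows one way to do this; `ranked_documents` and the `min_score` threshold are illustrative assumptions, not part of the API.

```python
from typing import List, Tuple


def ranked_documents(
    documents: List[str],
    response: dict,
    min_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """Pair each result with its source document, best score first."""
    ranked = []
    for result in response["results"]:
        score = result["relevance_score"]
        if score >= min_score:
            # `index` points back into the list that was sent in the request.
            ranked.append((documents[result["index"]], score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


documents = [
    "Machine learning is a form of artificial intelligence",
    "Apple is a fruit",
    "Deep learning belongs to the field of artificial intelligence",
]
response = {
    "object": "list",
    "model": "model-name",
    "results": [
        {"index": 0, "relevance_score": 0.95, "document": documents[0]},
        {"index": 2, "relevance_score": 0.87, "document": documents[2]},
    ],
}
best = ranked_documents(documents, response, min_score=0.9)
```

Looking documents up by `index` rather than relying on the echoed `document` text keeps the mapping valid when the client holds richer records (IDs, metadata) alongside each text.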

### Graceful Shutdown

Gracefully shut down the AHA server. This endpoint initiates a graceful shutdown process that:
1. Stops accepting new connections
2. Waits for existing requests to complete (up to 1 second)
3. Cleans up PID files
4. Exits the process

#### Endpoint
```
POST /shutdown
```

#### Request Body

None (empty request)

#### Response

**Success (HTTP 200):**

```json
{
  "message": "Shutting down..."
}
```

**Forbidden (HTTP 403):**

When remote shutdown is not allowed:

```json
{
  "error": "Remote shutdown not allowed. Use --allow-remote-shutdown flag to enable (not recommended)."
}
```

#### Security

By default, the shutdown endpoint only allows requests from localhost (127.0.0.1). To enable remote shutdown, start the server with the `--allow-remote-shutdown` flag:

```bash
aha serv -m Qwen/Qwen3-0.6B --allow-remote-shutdown
```

**Warning:** Enabling remote shutdown is not recommended for production use unless properly secured.

#### Example

```bash
curl -X POST http://127.0.0.1:10100/shutdown
```

#### Logging

All shutdown requests are logged to stderr with the format:

```
[SHUTDOWN] Shutdown requested (remote_allowed: false)
```

## Error Handling

### Error Codes

| HTTP Status | Description |
|------|-------------|
| 400 | Bad Request - Invalid parameters |
| 403 | Forbidden - Remote shutdown not allowed |
| 404 | Not Found - Model or endpoint not found |
| 500 | Internal Server Error - Model inference error |
| 503 | Service Unavailable - Model not loaded |
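
Of these statuses, 503 is the one most likely to be transient, since it is returned while the model is not yet loaded. A client might therefore retry it with exponential backoff and fail fast on the others. The sketch below is one such policy; the choice to retry only 503, the delay values, and the function names are all assumptions.

```python
import time
from typing import Callable, List, Tuple

# Only "model not loaded" is treated as transient in this sketch.
RETRYABLE_STATUSES = {503}


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential delays: base, 2*base, 4*base, ... limited to cap."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry while it returns a retryable status."""
    status, body = send()
    for delay in backoff_delays(retries):
        if status not in RETRYABLE_STATUSES:
            break
        sleep(delay)
        status, body = send()
    return status, body
```

Here `send` is any zero-argument callable that performs the request and returns `(status, body)`. Client errors such as 400 or 404 come back immediately because repeating the same request would not change the outcome.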

### Error Response Format

```json
{
  "error": {
    "message": "Model 'unknown-model' not found",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
```

## Rate Limiting

Currently, AHA does not implement rate limiting. The server can handle concurrent requests limited only by system resources.

## File Upload Limits

- String data: 5 MB
- File uploads: 100 MB

## OpenAI Compatibility

AHA's text generation API is designed to be compatible with OpenAI's API format, so the official OpenAI client libraries can be pointed at an AHA server. Multimodal APIs are derived from the text generation API with minimal changes.

### Python Example

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:10100",
    api_key="dummy"  # Not used but required by library
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-0.6B",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
```

### JavaScript Example

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://127.0.0.1:10100',
  apiKey: 'dummy'  // Not used but required
});

const response = await client.chat.completions.create({
  model: 'Qwen/Qwen3-0.6B',
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(response.choices[0].message.content);
```

## Best Practices

### 1. Use Streaming for Long Responses

For long text generation, use streaming to get responses incrementally:

```bash
curl ... -d '{"stream": true, ...}'
```

### 2. Set Appropriate Token Limits

Prevent excessively long responses:

```json
{
  "max_tokens": 500
}
```

### 3. Adjust Temperature

Control response creativity (the default is 1.0):
- `0.0-0.3`: Deterministic, focused
- `0.4-0.7`: Balanced
- `0.8-2.0`: Creative, varied

### 4. Use System Messages

Set behavior with system messages:

```json
{
  "messages": [
    {"role": "system", "content": "You are a technical writer."},
    {"role": "user", "content": "..."}
  ]
}
```

## See Also

- [Getting Started](./getting-started.md) - Quick start guide
- [CLI Reference](./cli.md) - Command-line usage
- [Installation](./installation.md) - Installation guide
- [Development](./development.md) - Contributing guide