velaclaw 0.3.0

Protocol-driven autonomous AI agent runtime with intelligent model selection and multi-model negotiation.
# VelaClaw Providers Reference

This document maps provider IDs, aliases, and credential environment variables.

Last verified: **February 20, 2026**.

## How to List Providers

```bash
velaclaw providers
```

## Credential Resolution Order

Runtime resolution order is:

1. Explicit credential from config/CLI
2. Provider-specific env var(s)
3. Generic fallback env vars: `VELACLAW_API_KEY` then `API_KEY`

For resilient fallback chains (`reliability.fallback_providers`), each fallback
provider resolves credentials independently. The primary provider's explicit
credential is not reused for fallback providers.

## Provider Catalog

| Canonical ID | Aliases | Local | Provider-specific env var(s) |
|---|---|---:|---|
| `openrouter` || No | `OPENROUTER_API_KEY` |
| `anthropic` || No | `ANTHROPIC_OAUTH_TOKEN`, `ANTHROPIC_API_KEY` |
| `openai` || No | `OPENAI_API_KEY` |
| `ollama` || Yes | `OLLAMA_API_KEY` (optional) |
| `gemini` | `google`, `google-gemini` | No | `GEMINI_API_KEY`, `GOOGLE_API_KEY` |
| `venice` || No | `VENICE_API_KEY` |
| `vercel` | `vercel-ai` | No | `VERCEL_API_KEY` |
| `cloudflare` | `cloudflare-ai` | No | `CLOUDFLARE_API_KEY` |
| `moonshot` | `kimi` | No | `MOONSHOT_API_KEY` |
| `kimi-code` | `kimi_coding`, `kimi_for_coding` | No | `KIMI_CODE_API_KEY`, `MOONSHOT_API_KEY` |
| `synthetic` || No | `SYNTHETIC_API_KEY` |
| `opencode` | `opencode-zen` | No | `OPENCODE_API_KEY` |
| `zai` | `z.ai` | No | `ZAI_API_KEY` |
| `glm` | `zhipu` | No | `GLM_API_KEY` |
| `minimax` | `minimax-intl`, `minimax-io`, `minimax-global`, `minimax-cn`, `minimaxi`, `minimax-oauth`, `minimax-oauth-cn`, `minimax-portal`, `minimax-portal-cn` | No | `MINIMAX_OAUTH_TOKEN`, `MINIMAX_API_KEY` |
| `bedrock` | `aws-bedrock` | No | `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` (optional: `AWS_REGION`) |
| `qianfan` | `baidu` | No | `QIANFAN_API_KEY` |
| `doubao` | `volcengine`, `ark`, `doubao-cn` | No | `ARK_API_KEY`, `DOUBAO_API_KEY` |
| `qwen` | `dashscope`, `qwen-intl`, `dashscope-intl`, `qwen-us`, `dashscope-us`, `qwen-code`, `qwen-oauth`, `qwen_oauth` | No | `QWEN_OAUTH_TOKEN`, `DASHSCOPE_API_KEY` |
| `groq` || No | `GROQ_API_KEY` |
| `mistral` || No | `MISTRAL_API_KEY` |
| `xai` | `grok` | No | `XAI_API_KEY` |
| `deepseek` || No | `DEEPSEEK_API_KEY` |
| `together` | `together-ai` | No | `TOGETHER_API_KEY` |
| `fireworks` | `fireworks-ai` | No | `FIREWORKS_API_KEY` |
| `perplexity` || No | `PERPLEXITY_API_KEY` |
| `cohere` || No | `COHERE_API_KEY` |
| `copilot` | `github-copilot` | No | (use config/`API_KEY` fallback with GitHub token) |
| `lmstudio` | `lm-studio` | Yes | (optional; local by default) |
| `llamacpp` | `llama.cpp` | Yes | `LLAMACPP_API_KEY` (optional; only if server auth is enabled) |
| `nvidia` | `nvidia-nim`, `build.nvidia.com` | No | `NVIDIA_API_KEY` |

### Gemini Notes

- Provider ID: `gemini` (aliases: `google`, `google-gemini`)
- Auth can come from `GEMINI_API_KEY`, `GOOGLE_API_KEY`, or Gemini CLI OAuth cache (`~/.gemini/oauth_creds.json`)
- API key requests use `generativelanguage.googleapis.com/v1beta`
- Gemini CLI OAuth requests use `cloudcode-pa.googleapis.com/v1internal` with Code Assist request envelope semantics

### Ollama Vision Notes

- Provider ID: `ollama`
- Vision input is supported through user message image markers: ``[IMAGE:<source>]``.
- After multimodal normalization, VelaClaw sends image payloads through Ollama's native `messages[].images` field.
- If a non-vision provider is selected, VelaClaw returns a structured capability error instead of silently ignoring images.

### Ollama Cloud Routing Notes

- Use `:cloud` model suffix only with a remote Ollama endpoint.
- Remote endpoint should be set in `api_url` (example: `https://ollama.com`).
- VelaClaw normalizes a trailing `/api` in `api_url` automatically.
- If `default_model` ends with `:cloud` while `api_url` is local or unset, config validation fails early with an actionable error.
- Local Ollama model discovery intentionally excludes `:cloud` entries to avoid selecting cloud-only models in local mode.

### llama.cpp Server Notes

- Provider ID: `llamacpp` (alias: `llama.cpp`)
- Default endpoint: `http://localhost:8080/v1`
- API key is optional by default; set `LLAMACPP_API_KEY` only when `llama-server` is started with `--api-key`.
- Model discovery: `velaclaw models refresh --provider llamacpp`

### Bedrock Notes

- Provider ID: `bedrock` (alias: `aws-bedrock`)
- API: [Converse API]https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html
- Authentication: AWS AKSK (not a single API key). Set `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` environment variables.
- Optional: `AWS_SESSION_TOKEN` for temporary/STS credentials, `AWS_REGION` or `AWS_DEFAULT_REGION` (default: `us-east-1`).
- Default onboarding model: `anthropic.claude-sonnet-4-5-20250929-v1:0`
- Supports native tool calling and prompt caching (`cachePoint`).
- Cross-region inference profiles supported (e.g., `us.anthropic.claude-*`).
- Model IDs use Bedrock format: `anthropic.claude-sonnet-4-6`, `anthropic.claude-opus-4-6-v1`, etc.

### Ollama Reasoning Toggle

You can control Ollama reasoning/thinking behavior from `config.toml`:

```toml
[runtime]
reasoning_enabled = false
```

Behavior:

- `false`: sends `think: false` to Ollama `/api/chat` requests.
- `true`: sends `think: true`.
- Unset: omits `think` and keeps Ollama/model defaults.

### Kimi Code Notes

- Provider ID: `kimi-code`
- Endpoint: `https://api.kimi.com/coding/v1`
- Default onboarding model: `kimi-for-coding` (alternative: `kimi-k2.5`)
- Runtime auto-adds `User-Agent: KimiCLI/0.77` for compatibility.

### NVIDIA NIM Notes

- Canonical provider ID: `nvidia`
- Aliases: `nvidia-nim`, `build.nvidia.com`
- Base API URL: `https://integrate.api.nvidia.com/v1`
- Model discovery: `velaclaw models refresh --provider nvidia`

Recommended starter model IDs (verified against NVIDIA API catalog on February 18, 2026):

- `meta/llama-3.3-70b-instruct`
- `deepseek-ai/deepseek-v3.2`
- `nvidia/llama-3.3-nemotron-super-49b-v1.5`
- `nvidia/llama-3.1-nemotron-ultra-253b-v1`

## Custom Chat Endpoints

Custom chat endpoints are manifest-backed after ZS-ML-015. Add the endpoint to
an ai-protocol provider manifest, set `AI_PROTOCOL_DIR`, and reference the
logical model id:

```toml
default_provider = "local-gateway/my-model"
default_model = "local-gateway/my-model"
```

## MiniMax OAuth Setup (config.toml)

Set the MiniMax provider and OAuth placeholder in config:

```toml
default_provider = "minimax-oauth"
api_key = "minimax-oauth"
```

Then provide one of the following credentials via environment variables:

- `MINIMAX_OAUTH_TOKEN` (preferred, direct access token)
- `MINIMAX_API_KEY` (legacy/static token)
- `MINIMAX_OAUTH_REFRESH_TOKEN` (auto-refreshes access token at startup)

Optional:

- `MINIMAX_OAUTH_REGION=global` or `cn` (defaults by provider alias)
- `MINIMAX_OAUTH_CLIENT_ID` to override the default OAuth client id

Channel compatibility note:

- For MiniMax-backed channel conversations, runtime history is normalized to keep valid `user`/`assistant` turn order.
- Channel-specific delivery guidance (for example Telegram attachment markers) is merged into the leading system prompt instead of being appended as a trailing `system` turn.

## Qwen Code OAuth Setup (config.toml)

Set Qwen Code OAuth mode in config:

```toml
default_provider = "qwen-code"
api_key = "qwen-oauth"
```

Credential resolution for `qwen-code`:

1. Explicit `api_key` value (if not the placeholder `qwen-oauth`)
2. `QWEN_OAUTH_TOKEN`
3. `~/.qwen/oauth_creds.json` (reuses Qwen Code cached OAuth credentials)
4. Optional refresh via `QWEN_OAUTH_REFRESH_TOKEN` (or cached refresh token)
5. If no OAuth placeholder is used, `DASHSCOPE_API_KEY` can still be used as fallback

Optional endpoint override:

- `QWEN_OAUTH_RESOURCE_URL` (normalized to `https://.../v1` if needed)
- If unset, `resource_url` from cached OAuth credentials is used when available

## Model Routing (`hint:<name>`)

You can route model calls by hint using `[[model_routes]]`:

```toml
[[model_routes]]
hint = "reasoning"
provider = "openai/gpt-5.2"
model = "openai/gpt-5.2"

[[model_routes]]
hint = "fast"
provider = "groq/llama-3.3-70b-versatile"
model = "groq/llama-3.3-70b-versatile"
```

Then call with a hint model name (for example from tool or integration paths):

```text
hint:reasoning
```

## Embedding Routing (`hint:<name>`)

You can route embedding calls with the same hint pattern using `[[embedding_routes]]`.
Set `[memory].embedding_model` to a `hint:<name>` value to activate routing.

```toml
[memory]
embedding_model = "hint:semantic"

[[embedding_routes]]
hint = "semantic"
provider = "openai"
model = "text-embedding-3-small"
dimensions = 1536

[[embedding_routes]]
hint = "archive"
provider = "custom:https://embed.example.com/v1"
model = "your-embedding-model-id"
dimensions = 1024
```

Supported embedding providers:

- `none`
- `openai`
- `custom:<url>` (OpenAI-compatible embeddings endpoint)

Optional per-route key override:

```toml
[[embedding_routes]]
hint = "semantic"
provider = "openai"
model = "text-embedding-3-small"
api_key = "sk-route-specific"
```

## Upgrading Models Safely

Use stable hints and update only route targets when providers deprecate model IDs.

Recommended workflow:

1. Keep call sites stable (`hint:reasoning`, `hint:semantic`).
2. Change only the target model under `[[model_routes]]` or `[[embedding_routes]]`.
3. Run:
   - `velaclaw doctor`
   - `velaclaw status`
4. Smoke test one representative flow (chat + memory retrieval) before rollout.

This minimizes breakage because integrations and prompts do not need to change when model IDs are upgraded.