Manage your Onde Inference account, fine-tune local models, and export them to GGUF, all from the terminal.
## Install
Install onde-cli with your favorite tool.
### npm

```sh
npm install -g onde-cli
```

### Homebrew

```sh
brew install onde-cli
```

### pip / uv / uvx

```sh
pip install onde-cli
# or
uv tool install onde-cli
# or with uvx, no install needed
uvx onde-cli
```
### Pre-built binary

Download a release from GitHub Releases:

```sh
# macOS Apple Silicon
curl -LO <release-url>/onde-macos-arm64 && chmod +x onde-macos-arm64
```
| Platform | File |
|---|---|
| macOS Apple Silicon | onde-macos-arm64 |
| macOS Intel | onde-macos-amd64 |
| Linux x64 | onde-linux-amd64 |
| Linux arm64 | onde-linux-arm64 |
| Windows x64 | onde-win-amd64.exe |
| Windows arm64 | onde-win-arm64.exe |
## Usage

```sh
onde
```

This opens the TUI. You can sign up or sign in right there.
| Key | What it does |
|---|---|
| Tab | Move between fields |
| Enter | Submit or sign out |
| Ctrl+L | Go to the sign-in screen |
| Ctrl+N | Go to the new account screen |
| Ctrl+C | Quit |
## Fine-tuning
onde includes a LoRA fine-tuning pipeline for Qwen2, Qwen2.5, and Qwen3 models. It runs locally: Metal on Apple Silicon, CPU elsewhere. No cloud setup. No Python environment.
The flow is straightforward: download a safetensors base model, fine-tune it with LoRA, merge the adapter back into the base weights, then export to GGUF for use in the Onde SDK.
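As a mental model, the LoRA update can be sketched in a few lines of numpy. This is illustrative only, not onde's actual implementation: the base weight stays frozen, and two small matrices are trained in its place.

```python
import numpy as np

# Illustrative LoRA math, not onde's implementation.
d_out, d_in, r, alpha = 64, 64, 8, 16   # r corresponds to the "LoRA rank" field

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                # trainable, zero init: adapter starts as a no-op

def lora_forward(x):
    # y = Wx + (alpha/r) * B(Ax); only A and B receive gradients during training
    return W @ x + (alpha / r) * (B @ (A @ x))

# Merging (the m key after training) folds the adapter into the base weights:
W_merged = W + (alpha / r) * (B @ A)
```

Because only A and B are trained, the adapter stays tiny, which is why a rank-8 adapter comes out around 1.5 MB.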
### Training data format
Each line should be one complete conversation in Qwen's chat template:
{"text": "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is LoRA?<|im_end|>\n<|im_start|>assistant\nLoRA adds small trainable matrices to frozen layers, letting you fine-tune large models without updating all the weights.<|im_end|>"}
Save the file wherever you want. The TUI lets you point to it directly.
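If you generate the file programmatically, a minimal Python sketch (standard library only; the helper name and example content are hypothetical, and the output path matches the TUI default) could look like this:

```python
import json
from pathlib import Path

# Hypothetical helper: wrap (system, user, assistant) turns in Qwen's chat template.
def to_qwen_text(system, user, assistant):
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>"
    )

examples = [
    ("You are a helpful assistant.", "What is LoRA?",
     "LoRA adds small trainable matrices to frozen layers."),
]

out = Path.home() / ".onde" / "finetune" / "train.jsonl"   # the TUI default
out.parent.mkdir(parents=True, exist_ok=True)
with out.open("w") as f:
    for system, user, assistant in examples:
        f.write(json.dumps({"text": to_qwen_text(system, user, assistant)}) + "\n")
```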
### Running it

```
onde
→ Models tab (Tab from Apps)
→ Select a safetensors model (↑↓, Enter)
→ Press f
```
Only safetensors models can be fine-tuned. GGUF models are already quantized, so their weights are not differentiable.
Configure the run:
| Field | Default | Notes |
|---|---|---|
| Training data | ~/.onde/finetune/train.jsonl | Path to your JSONL file |
| LoRA rank | 8 | Higher means more capacity and more memory use |
| Epochs | 3 | Full passes over the dataset |
| Learning rate | 0.0001 | AdamW default |
Press Enter to start. In a healthy run, loss usually starts dropping by epoch 2. If it stays flat, try raising the learning rate to 0.0003.
After training
For rank 8 on a 0.6B model, the adapter is about 1.5 MB. From the fine-tune complete screen:
- Press m to merge the adapter into the base model
- Press g to export the merged model to GGUF
The resulting GGUF loads directly in the Onde SDK for on-device inference.
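If you want to sanity-check an exported file outside the SDK first, any GGUF runtime works; for example, with llama-cpp-python (an assumption for illustration, not an onde dependency; the filename is hypothetical):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Load the exported GGUF and run a quick smoke-test prompt.
llm = Llama(model_path="merged-model.gguf")  # hypothetical output filename
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is LoRA?"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```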
## Supported base models
| Model | Size | Notes |
|---|---|---|
| Qwen/Qwen3-0.6B | ~1.2 GB | Smallest and quickest to train |
| Qwen/Qwen2.5-1.5B-Instruct | ~3.0 GB | Good default for instruction tuning |
| Qwen/Qwen3-1.7B | ~3.4 GB | Newer small Qwen3 model |
| Qwen/Qwen3-4B | ~8.0 GB | Best quality, better suited to macOS |
You can search for any of these from the Models tab with /.
## Debug
Logs are written to ~/.cache/onde/debug.log.
## License

Dual-licensed under MIT and Apache 2.0.

## Copyright

© 2026 Onde Inference (Splitfire AB).