onde-cli 0.3.0

Terminal UI for signing up, signing in, and managing your Onde Inference account.

Manage your Onde Inference account, fine-tune local models, and export them to GGUF, all from the terminal.

Install

Install onde-cli with your favorite tool.

npm

npm install -g @ondeinference/cli

Homebrew

brew tap ondeinference/homebrew-tap
brew install onde

pip / uv / uvx

pip install onde-cli
# or
uv tool install onde-cli
uv run onde
# or run without installing
uvx --from onde-cli onde

Pre-built binary

Download a release from GitHub Releases:

# macOS Apple Silicon
curl -Lo onde https://github.com/ondeinference/onde-cli/releases/latest/download/onde-macos-arm64
chmod +x onde && mv onde /usr/local/bin/onde

Platform             File
macOS Apple Silicon  onde-macos-arm64
macOS Intel          onde-macos-amd64
Linux x64            onde-linux-amd64
Linux arm64          onde-linux-arm64
Windows x64          onde-win-amd64.exe
Windows arm64        onde-win-arm64.exe

Usage

onde

This opens the TUI. You can sign up or sign in right there.

Key     What it does
Tab     Move between fields
Enter   Submit or sign out
Ctrl+L  Go to the sign-in screen
Ctrl+N  Go to the new account screen
Ctrl+C  Quit

Fine-tuning

onde includes a LoRA fine-tuning pipeline for Qwen2, Qwen2.5, and Qwen3 models. It runs locally: Metal on Apple Silicon, CPU elsewhere. No cloud setup. No Python environment.

The flow is straightforward: download a safetensors base model, fine-tune it with LoRA, merge the adapter back into the base weights, then export to GGUF for use in the Onde SDK.

Training data format

Each line should be one complete conversation in Qwen's chat template:

{"text": "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is LoRA?<|im_end|>\n<|im_start|>assistant\nLoRA adds small trainable matrices to frozen layers, letting you fine-tune large models without updating all the weights.<|im_end|>"}

Save the file wherever you want. The TUI lets you point to it directly.
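
If you are generating the file from your own data, a small script helps keep the template tokens exact. A minimal Python sketch — the helper name and the example conversation are made up; only the {"text": ...} shape and the <|im_start|>/<|im_end|> wrapping come from the format above:

```python
import json

# Wrap one (system, user, assistant) triple in Qwen's chat template.
def to_qwen_chat(system, user, assistant):
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>"
    )

# Illustrative examples only — substitute your own data.
examples = [
    ("You are a helpful assistant.", "What is LoRA?",
     "LoRA adds small trainable matrices to frozen layers."),
]

with open("train.jsonl", "w") as f:
    for system, user, assistant in examples:
        f.write(json.dumps({"text": to_qwen_chat(system, user, assistant)}) + "\n")
```

Using json.dumps guarantees the embedded newlines are escaped correctly, which is easy to get wrong when writing JSONL by hand.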

Running it

onde
  → Models tab (Tab from Apps)
  → Select a safetensors model (↑↓, Enter)
  → Press f

Only safetensors models can be fine-tuned. GGUF models are already quantized, so their weights can't be trained directly; gradient updates need the full-precision base weights.

Configure the run:

Field          Default                       Notes
Training data  ~/.onde/finetune/train.jsonl  Path to your JSONL file
LoRA rank      8                             Higher means more capacity and more memory use
Epochs         3                             Full passes over the dataset
Learning rate  0.0001                        Default for the AdamW optimizer

Press Enter to start. In a healthy run, loss usually starts dropping by epoch 2. If it stays flat, try raising the learning rate to 0.0003.
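
To get a feel for how rank translates into adapter size, here is a rough back-of-envelope in Python. Every number below is an assumption for illustration (hidden size, layer count, which projections get LoRA) — the pipeline does not document its internals:

```python
# Rough LoRA adapter size estimate. All dimensions are ASSUMED
# illustrative values, not the pipeline's actual configuration.
hidden = 1024    # assumed hidden size of a ~0.6B model
layers = 28      # assumed number of transformer layers
rank = 8         # the default rank from the table above
targeted = 2     # assume LoRA on two projections per layer (e.g. q and v)

# Each LoRA pair adds A (rank x hidden) plus B (hidden x rank) parameters.
params = layers * targeted * rank * (hidden + hidden)
size_mb = params * 2 / 1e6   # 2 bytes per parameter in fp16
print(params, round(size_mb, 2))
```

Under these assumptions the adapter comes out a little under 2 MB — the same ballpark as the ~1.5 MB figure quoted in the next section; the exact number depends on which layers and projections are targeted.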

After training

For rank 8 on a 0.6B model, the adapter is about 1.5 MB. From the fine-tune complete screen:

  • m to merge the adapter into the base model
  • g to export the merged model to GGUF
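
The merge step conceptually folds the low-rank update back into the frozen weights: W' = W + (alpha / r) * B A. A toy pure-Python sketch — the dimensions and scaling factor are illustrative assumptions, not the pipeline's actual values:

```python
# Toy LoRA merge: W' = W + (alpha / r) * (B @ A).
# All dimensions here are tiny illustrative values.
d, r, alpha = 4, 2, 16
W = [[0.0] * d for _ in range(d)]   # frozen base weight (d x d)
A = [[0.1] * d for _ in range(r)]   # LoRA down-projection (r x d)
B = [[0.1] * r for _ in range(d)]   # LoRA up-projection   (d x r)

scale = alpha / r
W_merged = [
    [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d)]
    for i in range(d)
]
```

After the merge, the adapter is no longer needed at inference time: the model is a single set of weights again, which is what makes the subsequent GGUF export possible.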

The resulting GGUF loads directly in the Onde SDK for on-device inference.

Supported base models

Model                       Size     Notes
Qwen/Qwen3-0.6B             ~1.2 GB  Smallest and quickest to train
Qwen/Qwen2.5-1.5B-Instruct  ~3.0 GB  Good default for instruction tuning
Qwen/Qwen3-1.7B             ~3.4 GB  Newer small Qwen3 model
Qwen/Qwen3-4B               ~8.0 GB  Best quality, better suited to macOS

You can search for any of these from the Models tab with /.


Debug

Logs are written to ~/.cache/onde/debug.log.


License

Dual-licensed under MIT and Apache 2.0.

Copyright

© 2026 Onde Inference (Splitfire AB).