Manage your Onde Inference account, fine-tune local models, and export them to GGUF, all from the terminal.
## Install
Install onde-cli with your favorite tool.
### npm

```sh
npm install -g onde-cli
```

### Homebrew

```sh
brew install onde-cli
```

### pip / uv / uvx

```sh
pip install onde-cli
# or
uv tool install onde-cli
# or with uvx, no install needed
uvx onde-cli
```
### Pre-built binary

Download a release from GitHub Releases:

```sh
# macOS Apple Silicon
curl -LO <release-url>/onde-macos-arm64 && chmod +x onde-macos-arm64
```
| Platform | File |
|---|---|
| macOS Apple Silicon | onde-macos-arm64 |
| macOS Intel | onde-macos-amd64 |
| Linux x64 | onde-linux-amd64 |
| Linux arm64 | onde-linux-arm64 |
| Windows x64 | onde-win-amd64.exe |
| Windows arm64 | onde-win-arm64.exe |
## Usage

```sh
onde
```

This opens the TUI. You can sign up or sign in right there.
| Key | What it does |
|---|---|
| Tab | Move between fields |
| Enter | Submit or sign out |
| Ctrl+L | Go to the sign-in screen |
| Ctrl+N | Go to the new account screen |
| Ctrl+C | Quit |
## Fine-tuning
onde includes a LoRA fine-tuning pipeline for Qwen2, Qwen2.5, and Qwen3 models. It runs locally: Metal on Apple Silicon, CPU elsewhere. No cloud setup. No Python environment.
The flow is straightforward: download a safetensors base model, fine-tune it with LoRA, merge the adapter back into the base weights, then export to GGUF for use in the Onde SDK.
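As a mental model, the LoRA update can be sketched in a few lines of numpy. This is illustrative only, not onde's actual implementation: the base weight stays frozen, and two small matrices are trained in its place.

```python
import numpy as np

# Illustrative LoRA math, not onde's implementation.
d_out, d_in, r, alpha = 64, 64, 8, 16   # r corresponds to the "LoRA rank" field

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                # trainable, zero init: adapter starts as a no-op

def lora_forward(x):
    # y = Wx + (alpha/r) * B(Ax); only A and B receive gradients during training
    return W @ x + (alpha / r) * (B @ (A @ x))

# Merging (the m key after training) folds the adapter into the base weights:
W_merged = W + (alpha / r) * (B @ A)
```

Because only A and B are trained, the adapter stays tiny, which is why a rank-8 adapter comes out around 1.5 MB.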
### Training data format
Each line should be one complete conversation in Qwen's chat template:
{"text": "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is LoRA?<|im_end|>\n<|im_start|>assistant\nLoRA adds small trainable matrices to frozen layers, letting you fine-tune large models without updating all the weights.<|im_end|>"}
Save the file wherever you want. The TUI lets you point to it directly.
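If you generate the file programmatically, a minimal Python sketch (standard library only; the helper name and example content are hypothetical, and the output path matches the TUI default) could look like this:

```python
import json
from pathlib import Path

# Hypothetical helper: wrap (system, user, assistant) turns in Qwen's chat template.
def to_qwen_text(system, user, assistant):
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>"
    )

examples = [
    ("You are a helpful assistant.", "What is LoRA?",
     "LoRA adds small trainable matrices to frozen layers."),
]

out = Path.home() / ".onde" / "finetune" / "train.jsonl"   # the TUI default
out.parent.mkdir(parents=True, exist_ok=True)
with out.open("w") as f:
    for system, user, assistant in examples:
        f.write(json.dumps({"text": to_qwen_text(system, user, assistant)}) + "\n")
```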
### Running it

```
onde
→ Models tab (Tab from Apps)
→ Select a safetensors model (↑↓, Enter)
→ Press f
```
Only safetensors models can be fine-tuned. GGUF models are already quantized, so their weights are not differentiable.
Configure the run:
| Field | Default | Notes |
|---|---|---|
| Training data | ~/.onde/finetune/train.jsonl | Path to your JSONL file |
| LoRA rank | 8 | Higher means more capacity and more memory use |
| Epochs | 3 | Full passes over the dataset |
| Learning rate | 0.0001 | AdamW default |
Press Enter to start. In a healthy run, loss usually starts dropping by epoch 2. If it stays flat, try raising the learning rate to 0.0003.
After training
For rank 8 on a 0.6B model, the adapter is about 1.5 MB. From the fine-tune complete screen:
- Press m to merge the adapter into the base model
- Press g to export the merged model to GGUF
The resulting GGUF loads directly in the Onde SDK for on-device inference.
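If you want to sanity-check an exported file outside the SDK first, any GGUF runtime works; for example, with llama-cpp-python (an assumption for illustration, not an onde dependency; the filename is hypothetical):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Load the exported GGUF and run a quick smoke-test prompt.
llm = Llama(model_path="merged-model.gguf")  # hypothetical output filename
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is LoRA?"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```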
## Supported base models
| Model | Size | Notes |
|---|---|---|
| Qwen/Qwen3-0.6B | ~1.2 GB | Smallest and quickest to train |
| Qwen/Qwen2.5-1.5B-Instruct | ~3.0 GB | Good default for instruction tuning |
| Qwen/Qwen3-1.7B | ~3.4 GB | Newer small Qwen3 model |
| Qwen/Qwen3-4B | ~8.0 GB | Best quality, better suited to macOS |
You can search for any of these from the Models tab with /.
## Debug
Logs are written to ~/.cache/onde/debug.log.
## License

Dual-licensed under MIT and Apache 2.0.

## Copyright

© 2026 Onde Inference (Splitfire AB).