# mold
Generate images from text on your own GPU. No cloud, no Python, no fuss.
Documentation | Getting Started | Models | API
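A minimal first run (the `mold run "<prompt>"` form shown here is the one used throughout this README):

```sh
# Generate an image from a text prompt on the local GPU.
mold run "a cat"
```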
That's it. Mold auto-downloads the model on first run and saves the image to your current directory.
## Install
The install script downloads the latest pre-built binary to `~/.local/bin/mold`. On Linux, it auto-detects your NVIDIA GPU and picks the right binary (RTX 40-series or RTX 50-series); macOS builds include Metal support.
### Nix
### From source
Add `preview`, `expand`, `discord`, or `tui` to the features list as needed.
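A from-source build might look like the following. This is a sketch assuming a standard Cargo project checkout; the feature names come from the list above, and `--path`/`--features` are standard `cargo install` flags:

```sh
# Build and install from a checkout of the repository,
# enabling optional features as needed.
cargo install --path . --features preview,tui
```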
### Manual download
Pre-built binaries are available on the releases page.
## Usage
### Inline preview
Display generated images directly in the terminal (requires the `preview` feature).
### Piping
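Generation can be chained through stdin/stdout. For example, using the `viu` terminal image viewer (the same pipeline noted in the Features list below):

```sh
# Read the prompt from stdin, write the PNG to stdout,
# and hand it to a terminal image viewer.
echo "a cat" | mold run | viu -
```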
### Terminal UI (beta)
### Model management
### Remote rendering
```sh
# On your GPU server
mold serve

# From your laptop
MOLD_HOST=http://gpu-server:7680 mold run "a cat"
```
See the full CLI reference, configuration guide, and model catalog in the documentation.
## Models
Supports 8 model families with 60+ variants:
| Family | Models | Highlights |
|---|---|---|
| FLUX.1 | schnell, dev, + fine-tunes | Best quality, 4-25 steps, LoRA support |
| Flux.2 Klein | 4B and 9B | Fast 4-step, low VRAM, default model |
| SDXL | base, turbo, + fine-tunes | Fast, flexible, negative prompts |
| SD 1.5 | base + fine-tunes | Lightweight, ControlNet support |
| SD 3.5 | large, medium, turbo | Triple encoder, high quality |
| Z-Image | turbo | Fast 9-step, Qwen3 encoder |
| Qwen-Image | 2512 | High resolution, CFG guidance |
| Wuerstchen | v2 | 42x latent compression |
Bare names auto-resolve: `mold run flux-schnell "a cat"` picks the best available variant.
See the full model catalog for sizes, VRAM requirements, and recommended settings.
## Features
- txt2img, img2img, inpainting — full generation pipeline
- LoRA adapters — FLUX BF16 and GGUF quantized
- ControlNet — canny, depth, openpose (SD1.5)
- Prompt expansion — local LLM (Qwen3-1.7B) enriches short prompts
- Negative prompts — CFG-based models (SD1.5, SDXL, SD3, Wuerstchen)
- Pipe-friendly — `echo "a cat" | mold run | viu -`
- PNG metadata — embedded prompt, seed, model info
- Terminal preview — Kitty, Sixel, iTerm2, halfblock
- Smart VRAM — quantized encoders, block offloading, drop-and-reload
- Shell completions — bash, zsh, fish, elvish, powershell
- REST API — `mold serve` with SSE streaming, auth, rate limiting
- Discord bot — slash commands with role permissions and quotas
- Interactive TUI — generate, gallery, models, settings
## Deployment
| Method | Guide |
|---|---|
| NixOS module | Deployment: NixOS |
| Docker / RunPod | Deployment: Docker |
| Systemd | Deployment: Overview |
## How it works
Single Rust binary built on candle — pure Rust ML, no Python, no libtorch.
```
mold run "a cat"
 │
 ├─ Server running? → send request over HTTP
 │
 └─ No server?      → load model locally on GPU
      ├─ Encode prompt (T5/CLIP text encoders)
      ├─ Denoise latent (transformer/UNet)
      ├─ Decode pixels (VAE)
      └─ Save PNG
```
## Requirements
- NVIDIA GPU with CUDA or Apple Silicon with Metal
- Models auto-download on first use (~2-30GB depending on model)