# mold
Generate images and short video clips on your own GPU. No cloud, no Python, no fuss.
Documentation | Getting Started | Models | API
Run `mold run "a cat"` and that's it: mold auto-downloads the model on first run and saves the image to your current directory.
## Install
This downloads the latest pre-built binary to `~/.local/bin/mold`. On Linux, the installer auto-detects your NVIDIA GPU and picks the right binary (RTX 40-series or RTX 50-series). macOS builds include Metal support.
### Nix
### From source
Add `preview`, `expand`, `discord`, or `tui` to the features list as needed.
### Manual download
Pre-built binaries on the releases page.
## Usage
### Inline preview
Display generated images directly in the terminal (requires the `preview` feature).
### Piping

mold is pipe-friendly: `echo "a cat" | mold run | viu -` sends the generated image straight to a terminal viewer like viu.
### Terminal UI (beta)
### Model management
### Remote rendering

```sh
# On your GPU server
mold serve

# From your laptop
MOLD_HOST=http://gpu-server:7680 mold run "a cat"
```
See the full CLI reference, configuration guide, and model catalog in the documentation.
## Models
Supports 9 model families with 80+ variants:
| Family | Models | Highlights |
|---|---|---|
| FLUX.1 | schnell, dev, + fine-tunes | Best quality, 4-25 steps, LoRA support |
| FLUX.2 Klein | 4B and 9B | Fast 4-step, low VRAM, default model |
| SDXL | base, turbo, + fine-tunes | Fast, flexible, negative prompts |
| SD 1.5 | base + fine-tunes | Lightweight, ControlNet support |
| SD 3.5 | large, medium, turbo | Triple encoder, high quality |
| Z-Image | turbo | Fast 9-step, Qwen3 encoder |
| Qwen-Image | base + 2512 | High resolution, CFG guidance, GGUF quant support |
| Wuerstchen | v2 | 42x latent compression |
| LTX Video | 0.9.6, 0.9.8 | Text-to-video with APNG/GIF/WebP/MP4 output |
Bare names auto-resolve: `mold run flux-schnell "a cat"` picks the best available variant.
See the full model catalog for sizes, VRAM requirements, and recommended settings.
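Bare-name resolution can be pictured as a preference-ordered lookup over the installed variants. The sketch below is illustrative only: the precision tags and their ranking are assumptions, not mold's actual rule.

```python
# Illustrative sketch of bare-name resolution; mold's real ranking may differ.
PRECISION_RANK = ["bf16", "q8_0", "q4_k"]  # assumed preference order, best first

def resolve(bare: str, available: list[str]) -> str:
    """Resolve a bare model name (e.g. 'flux-schnell') to the best available variant."""
    candidates = [m for m in available if m.split(":", 1)[0] == bare]
    if not candidates:
        raise KeyError(f"no variant of {bare!r} available")

    def rank(model: str) -> int:
        # Unknown tags sort last rather than erroring out.
        tag = model.split(":", 1)[1]
        return PRECISION_RANK.index(tag) if tag in PRECISION_RANK else len(PRECISION_RANK)

    return min(candidates, key=rank)

print(resolve("flux-schnell", ["flux-schnell:q4_k", "flux-schnell:bf16"]))
# → flux-schnell:bf16
```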
### LTX Video

Currently supported LTX checkpoints:

- `ltx-video-0.9.6:bf16`
- `ltx-video-0.9.6-distilled:bf16`
- `ltx-video-0.9.8-2b-distilled:bf16`
- `ltx-video-0.9.8-13b-dev:bf16`
- `ltx-video-0.9.8-13b-distilled:bf16`
Recommended default today: `ltx-video-0.9.6-distilled:bf16`.
The 0.9.8 models pull the required spatial-upscaler asset automatically and now run the full multiscale refinement path. mold keeps the shared T5 assets under `shared/flux/...`, stores the 0.9.8 spatial upscaler under `shared/LTX-Video/...`, and intentionally continues using the compatible LTX-Video-0.9.5 VAE source until the newer VAE layout is ported.
## Features
- txt2img, img2img, inpainting — full generation pipeline
- Image upscaling — Real-ESRGAN super-resolution (2x/4x) via `mold upscale`, server API, or TUI
- LoRA adapters — FLUX BF16 and GGUF quantized
- ControlNet — canny, depth, openpose (SD1.5)
- Prompt expansion — local LLM (Qwen3-1.7B) enriches short prompts
- Negative prompts — CFG-based models (SD1.5, SDXL, SD3, Wuerstchen)
- Pipe-friendly — `echo "a cat" | mold run | viu -`
- PNG metadata — embedded prompt, seed, model info
- Terminal preview — Kitty, Sixel, iTerm2, halfblock
- Smart VRAM — quantized encoders, block offloading, drop-and-reload
- Qwen-Image encoder control — selectable Qwen2.5-VL GGUF text encoders with Metal-safe defaults
- Shell completions — bash, zsh, fish, elvish, powershell
- REST API — `mold serve` with SSE streaming, auth, rate limiting
- Discord bot — slash commands with role permissions and quotas
- Interactive TUI — generate, gallery, models, settings
## Deployment
| Method | Guide |
|---|---|
| NixOS module | Deployment: NixOS |
| Docker / RunPod | Deployment: Docker |
| Systemd | Deployment: Overview |
## How it works
Single Rust binary built on candle — pure Rust ML, no Python, no libtorch.
```
mold run "a cat"
│
├─ Server running? → send request over HTTP
│
└─ No server? → load model locally on GPU
   ├─ Encode prompt (T5/CLIP text encoders)
   ├─ Denoise latent (transformer/UNet)
   ├─ Decode pixels (VAE)
   └─ Save PNG
```
## Requirements
- NVIDIA GPU with CUDA or Apple Silicon with Metal
- Models auto-download on first use (~2-30GB depending on model)