mold
Generate images and short video clips on your own GPU. No cloud, no Python, no fuss.
Documentation | Getting Started | Models | API
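A minimal first run (assuming the default model, which downloads automatically):

```sh
mold run "a cat"
```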
That's it. Mold auto-downloads the model on first run and saves the image to your current directory.
Install
The install script downloads the latest pre-built binary to `~/.local/bin/mold`. On Linux, the installer auto-detects your NVIDIA GPU and picks the right binary (RTX 40-series or RTX 50-series). macOS builds include Metal support.
Nix
From source
Add `preview`, `expand`, `discord`, or `tui` to the features list as needed.
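Assuming a standard Cargo layout in the repository (an assumption on my part), a from-source install with optional features might look like:

```sh
# From a checkout of the repository; feature names come from the list above
cargo install --path . --features "preview,tui"
```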
Manual download
Pre-built binaries on the releases page.
Usage
Inline preview
Display generated images directly in the terminal (requires the `preview` feature).
Piping
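mold is pipe-friendly: it reads a prompt from stdin and writes image bytes to stdout, so you can chain it into a terminal image viewer such as viu:

```sh
echo "a cat" | mold run | viu -
```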
Terminal UI (beta)
Model management
Remote rendering
```sh
# On your GPU server
mold serve

# From your laptop
MOLD_HOST=http://gpu-server:7680 mold run "a cat"
```
See the full CLI reference, configuration guide, and model catalog in the documentation.
Models
Supports 11 model families with 80+ variants:
| Family | Models | Highlights |
|---|---|---|
| FLUX.1 | schnell, dev, + fine-tunes | Best quality, 4-25 steps, LoRA support |
| Flux.2 Klein | 4B and 9B | Fast 4-step, low VRAM, default model |
| SDXL | base, turbo, + fine-tunes | Fast, flexible, negative prompts |
| SD 1.5 | base + fine-tunes | Lightweight, ControlNet support |
| SD 3.5 | large, medium, turbo | Triple encoder, high quality |
| Z-Image | turbo | Fast 9-step, Qwen3 encoder |
| Qwen-Image | base + 2512 | High resolution, CFG guidance, GGUF quant support |
| Qwen-Image-Edit | 2511 | Multimodal image editing, repeatable --image, negative prompts |
| Wuerstchen | v2 | 42x latent compression |
| LTX-2 / LTX-2.3 | 19B, 22B | Joint audio-video generation, MP4-first workflows |
| LTX Video | 0.9.6, 0.9.8 | Text-to-video with APNG/GIF/WebP/MP4 output |
Bare names auto-resolve: `mold run flux-schnell "a cat"` picks the best available variant.
See the full model catalog for sizes, VRAM requirements, and recommended settings.
LTX Video
Currently supported LTX Video checkpoints:

- `ltx-video-0.9.6:bf16`
- `ltx-video-0.9.6-distilled:bf16`
- `ltx-video-0.9.8-2b-distilled:bf16`
- `ltx-video-0.9.8-13b-dev:bf16`
- `ltx-video-0.9.8-13b-distilled:bf16`
Recommended default today: `ltx-video-0.9.6-distilled:bf16`.
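A text-to-video run with the recommended checkpoint follows the bare `mold run <model> <prompt>` pattern from the Models section (output-format flags are omitted here because their exact names aren't shown in this README):

```sh
mold run ltx-video-0.9.6-distilled:bf16 "a cat"
```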
The 0.9.8 models pull the required spatial-upscaler asset automatically and
now run the full multiscale refinement path. mold keeps the shared T5 assets
under `shared/flux/...`, stores the 0.9.8 spatial upscaler under
`shared/LTX-Video/...`, and intentionally continues using the compatible
LTX-Video-0.9.5 VAE source until the newer VAE layout is ported.
LTX-2 / LTX-2.3
Currently supported LTX-2 checkpoints:

- `ltx-2-19b-dev:fp8`
- `ltx-2-19b-distilled:fp8`
- `ltx-2.3-22b-dev:fp8`
- `ltx-2.3-22b-distilled:fp8`
Recommended default today: `ltx-2-19b-distilled:fp8`.
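The same run pattern applies here; since this family defaults to MP4 with synchronized audio, a plain invocation (assuming video options are left at their defaults) looks like:

```sh
mold run ltx-2-19b-distilled:fp8 "a cat"
```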
This family is separate from `ltx-video`: it defaults to MP4 and supports
synchronized audio, audio-to-video, keyframe interpolation, retake workflows,
stacked LoRAs, and camera-control LoRAs. The implementation is native Rust in
`mold-inference`, with no Python bridge or upstream checkout requirement. CUDA
is the supported backend for real local generation, CPU is a correctness-only
fallback, and Metal is explicitly unsupported for this family.

On 24 GB Ada GPUs such as the RTX 4090, mold uses native staged loading, layer
streaming, and the compatible fp8-cast path for local FP8 runs rather than the
Hopper-only fp8-scaled-mm. The native CUDA acceptance matrix is validated
across 19B and 22B text+audio-video, image-to-video, audio-to-video, keyframe,
retake, public IC-LoRA, spatial upscaling (x1.5 / x2 where published), and
temporal upscaling (x2).

The shared Gemma text assets are gated on Hugging Face, so `mold pull`
requires approved access to `google/gemma-3-12b-it-qat-q4_0-unquantized`.
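Because the Gemma assets are gated, you first need to request access to the repository on huggingface.co and authenticate. Assuming mold reads the standard Hugging Face token store (an assumption; the README does not document mold's auth mechanism), setup could look like:

```sh
# After your access request for the gated repo is approved:
huggingface-cli login            # stores a token with access to the gated repo
mold pull ltx-2-19b-distilled:fp8   # argument form is an assumption
```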
When you send source media through `mold serve`, the built-in request body
limit is 64 MiB, which is enough for common retake and audio-to-video
requests without changing server config.
Features
- txt2img, img2img, multimodal edit, inpainting — full generation pipeline
- Image upscaling — Real-ESRGAN super-resolution (2x/4x) via `mold upscale`, server API, or TUI
- LoRA adapters — FLUX BF16 and GGUF quantized
- ControlNet — canny, depth, openpose (SD1.5)
- Prompt expansion — local LLM (Qwen3-1.7B) enriches short prompts
- Negative prompts — CFG-based models (SD1.5, SDXL, SD3, Wuerstchen)
- Pipe-friendly — `echo "a cat" | mold run | viu -`
- PNG metadata — embedded prompt, seed, model info
- Terminal preview — Kitty, Sixel, iTerm2, halfblock
- Smart VRAM — quantized encoders, block offloading, drop-and-reload
- Qwen family encoder control — selectable Qwen2.5-VL variants for Qwen-Image and Qwen-Image-Edit, with quantized auto-fallback when BF16 would be too heavy
- Shell completions — bash, zsh, fish, elvish, powershell
- REST API — `mold serve` with SSE streaming, auth, rate limiting
- Discord bot — slash commands with role permissions and quotas
- Interactive TUI — generate, gallery, models, settings
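Completions are typically wired up by sourcing a generated script. Assuming the subcommand is named `completions` (the exact name and output path are assumptions not documented in this README):

```sh
mold completions bash > ~/.local/share/bash-completion/completions/mold
```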
Deployment
| Method | Guide |
|---|---|
| NixOS module | Deployment: NixOS |
| Docker / RunPod | Deployment: Docker |
| Systemd | Deployment: Overview |
How it works
Single Rust binary built on candle for the in-tree model families. LTX-2 now runs through the native Rust stack in mold-inference, so the full model surface stays in Rust with no libtorch dependency.
```
mold run "a cat"
│
├─ Server running? → send request over HTTP
│
└─ No server? → load model locally on GPU
   ├─ Encode prompt (T5/CLIP text encoders)
   ├─ Denoise latent (transformer/UNet)
   ├─ Decode pixels (VAE)
   └─ Save PNG
```
Requirements
- NVIDIA GPU with CUDA or Apple Silicon with Metal
- Models auto-download on first use (~2–30 GB depending on the model)