llmfit
A terminal tool that right-sizes LLM models to your system's RAM, CPU, and GPU. Detects your hardware, compares it against a database of 48 popular models, and tells you which ones will actually run on your machine.
Ships with an interactive TUI (default) and a classic CLI mode.
Quick install
Downloads the latest release binary from GitHub and installs it to /usr/local/bin (or ~/.local/bin)

Example of a medium performance home laptop

Install
Homebrew (macOS / Linux)
From source
# binary is at target/release/llmfit
Usage
TUI (default)
Launches the interactive terminal UI. Your system specs are shown at the top. Models are listed in a scrollable table sorted by compatibility.
| Key | Action |
|---|---|
| Up / Down or j / k | Navigate models |
| / | Enter search mode (partial match on name, provider, params, use case) |
| Esc or Enter | Exit search mode |
| Ctrl-U | Clear search |
| f | Cycle fit filter: All, Runnable, Perfect, Good, Marginal |
| 1-9 | Toggle provider visibility |
| Enter | Toggle detail view for selected model |
| PgUp / PgDn | Scroll by 10 |
| g / G | Jump to top / bottom |
| q | Quit |
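The fit filter bound to f can be pictured as a small state machine. The following is a minimal sketch, not the code in tui_app.rs: the FitFilter name and the wrap-around from Marginal back to All are assumptions based on the word "cycle" in the table above.

```rust
/// Fit filter states, in the order the `f` key is described as cycling.
/// `FitFilter` is a hypothetical name used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FitFilter {
    All,
    Runnable,
    Perfect,
    Good,
    Marginal,
}

impl FitFilter {
    /// Advance to the next filter. Wrapping from Marginal back to All
    /// is an assumption, not something the README states explicitly.
    pub fn next(self) -> Self {
        match self {
            FitFilter::All => FitFilter::Runnable,
            FitFilter::Runnable => FitFilter::Perfect,
            FitFilter::Perfect => FitFilter::Good,
            FitFilter::Good => FitFilter::Marginal,
            FitFilter::Marginal => FitFilter::All,
        }
    }
}
```

Under these assumptions, pressing f five times from All returns to All.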
CLI mode
Use --cli or any subcommand to get classic table output:
# Table of all models ranked by fit
# Only perfectly fitting models, top 5
# Show detected system specs
# List all models in the database
# Search by name, provider, or size
# Detailed view of a single model
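Search by name, provider, or size can be thought of as a substring match over a few text fields. The sketch below is illustrative only: ModelEntry, its field names, and the case-insensitive comparison are assumptions, since this README does not show the real types in models.rs.

```rust
/// Hypothetical model record; the real field names are not given here.
pub struct ModelEntry {
    pub name: String,
    pub provider: String,
    pub params: String, // size label such as "8B" (assumed format)
    pub use_case: String,
}

/// Return true if `query` occurs in any searchable field.
/// Case-insensitive matching is an assumption of this sketch.
pub fn matches_query(model: &ModelEntry, query: &str) -> bool {
    let q = query.to_lowercase();
    [&model.name, &model.provider, &model.params, &model.use_case]
        .iter()
        .any(|field| field.to_lowercase().contains(&q))
}
```

With this definition an empty query matches every model, which is one plausible behavior for a cleared search box.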
How it works
- Hardware detection -- Reads total/available RAM via sysinfo, counts CPU cores, and probes for NVIDIA (nvidia-smi) or AMD (rocm-smi) GPUs.
- Model database -- 48 models sourced from the HuggingFace API, stored in data/hf_models.json and embedded at compile time. Memory requirements are computed from parameter counts using Q4_K_M quantization (0.5 bytes/param). VRAM is the primary constraint for GPU inference; system RAM is the fallback for CPU-only execution.
- Fit analysis -- Each model is scored against available memory with awareness of GPU vs CPU execution:
Run modes:
- GPU -- Model fits in VRAM. Fast inference.
- CPU+GPU -- VRAM insufficient, model spills to system RAM with partial GPU offload.
- CPU -- No GPU detected. Model loaded entirely into system RAM. Slow.
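The memory estimate and run-mode choice can be sketched as follows. This is a rough illustration, not the logic in fit.rs: the function names, the decimal-GB unit, and the omission of KV cache and runtime overhead are all assumptions. Only the 0.5 bytes/param figure and the three run modes come from the description above.

```rust
/// Bytes per parameter under Q4_K_M, as stated in this README.
const BYTES_PER_PARAM: f64 = 0.5;
/// Decimal gigabytes; the unit the real tool uses is an assumption.
const BYTES_PER_GB: f64 = 1_000_000_000.0;

#[derive(Debug, PartialEq)]
pub enum RunMode {
    Gpu,
    CpuGpu,
    Cpu,
}

/// Estimated weight memory in GB for a model with `params` parameters.
/// Ignores KV cache and runtime overhead (a simplification of this sketch).
pub fn estimate_weights_gb(params: u64) -> f64 {
    params as f64 * BYTES_PER_PARAM / BYTES_PER_GB
}

/// Pick a run mode from the estimate and detected VRAM (`None` = no GPU).
pub fn pick_run_mode(required_gb: f64, vram_gb: Option<f64>) -> RunMode {
    match vram_gb {
        Some(vram) if required_gb <= vram => RunMode::Gpu,
        Some(_) => RunMode::CpuGpu, // VRAM too small: spill to system RAM
        None => RunMode::Cpu,
    }
}
```

For example, an 8-billion-parameter model comes out at 4 GB of weights under these assumptions, which would select GPU mode on an 8 GB card and CPU+GPU mode on a 2 GB card.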
Fit levels:
- Perfect -- Recommended memory met on GPU (VRAM). Requires GPU acceleration.
- Good -- Fits with headroom. Best achievable for CPU+GPU offload.
- Marginal -- Tight fit, or CPU-only (CPU-only always caps here).
- Too Tight -- Not enough VRAM or system RAM anywhere.
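One way to read the fit levels above is as a function of run mode and available memory. The sketch below is hypothetical: the classify signature, the split into minimum and recommended memory, and the 20% headroom factor are assumptions chosen for illustration, and the real thresholds in fit.rs may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RunMode {
    Gpu,
    CpuGpu,
    Cpu,
}

#[derive(Debug, PartialEq)]
pub enum FitLevel {
    Perfect,
    Good,
    Marginal,
    TooTight,
}

/// Hypothetical headroom factor: "fits with headroom" is taken to mean
/// at least 20% more memory than the minimum requirement.
const HEADROOM: f64 = 1.2;

/// Classify a model given its memory needs and the memory pool
/// available to the chosen run mode (all values in GB).
pub fn classify(mode: RunMode, min_gb: f64, recommended_gb: f64, available_gb: f64) -> FitLevel {
    if available_gb < min_gb {
        return FitLevel::TooTight;
    }
    match mode {
        // CPU-only always caps at Marginal, per the list above.
        RunMode::Cpu => FitLevel::Marginal,
        // Perfect requires the recommended memory to be met on the GPU.
        RunMode::Gpu if available_gb >= recommended_gb => FitLevel::Perfect,
        // Good is the best level CPU+GPU offload can reach.
        _ if available_gb >= min_gb * HEADROOM => FitLevel::Good,
        _ => FitLevel::Marginal,
    }
}
```

The ordering of the match arms encodes the caps: CPU-only never rises above Marginal, and only GPU mode can reach Perfect.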
Model database
The model list is generated by scripts/scrape_hf_models.py, a standalone Python script (stdlib only, no pip dependencies) that queries the HuggingFace REST API. Models include families from Meta Llama, Mistral, Qwen, Google Gemma, Microsoft Phi, DeepSeek, Cohere, BigCode, 01.ai, and more.
See MODELS.md for the full list of 48 included models across 16 providers with parameters, quantization, context length, and use case.
To refresh the model database:
# Automated update (recommended)
# Or run the script directly
# Or manually
The scraper writes data/hf_models.json, which is baked into the binary via include_str!. The automated update script backs up existing data, validates JSON output, and rebuilds the binary.
Project structure
src/
  main.rs -- CLI argument parsing, entrypoint, TUI launch
  hardware.rs -- System RAM/CPU/GPU detection
  models.rs -- Model database loaded from embedded JSON
  fit.rs -- Compatibility analysis (FitLevel scoring)
  display.rs -- Classic CLI table rendering (tabled crate)
  tui_app.rs -- TUI application state, filters, navigation
  tui_ui.rs -- TUI rendering (ratatui)
  tui_events.rs -- TUI keyboard event handling (crossterm)
data/
  hf_models.json -- Model database (48 models)
scripts/
  scrape_hf_models.py -- HuggingFace API scraper
  update_models.sh -- Automated database update script
Makefile -- Build and maintenance commands
Publishing to crates.io
The Cargo.toml already includes the required metadata (description, license, repository). To publish:
# Dry run first to catch issues
# Publish for real (requires a crates.io API token)
Before publishing, make sure:
- The version in Cargo.toml is correct (bump with each release).
- A LICENSE file exists in the repo root. Create one if missing:
# For MIT license:
# Or write your own. The Cargo.toml declares license = "MIT".
- data/hf_models.json is committed. It is embedded at compile time and must be present in the published crate.
- The exclude list in Cargo.toml keeps target/, scripts/, and demo.gif out of the published crate to keep the download small.
To publish updates:
# Bump version
# Edit Cargo.toml: version = "0.2.0"
Dependencies
| Crate | Purpose |
|---|---|
| clap | CLI argument parsing with derive macros |
| sysinfo | Cross-platform RAM and CPU detection |
| serde / serde_json | JSON deserialization for model database |
| tabled | CLI table formatting |
| colored | CLI colored output |
| ratatui | Terminal UI framework |
| crossterm | Terminal input/output backend for ratatui |
Platform support
- Linux -- Full support. GPU detection via nvidia-smi (NVIDIA), rocm-smi (AMD), and sysfs/lspci (Intel Arc).
- macOS (Apple Silicon) -- Full support. Detects unified memory via system_profiler. VRAM = system RAM (shared pool). Models run via Metal GPU acceleration.
- macOS (Intel) -- RAM and CPU detection works. Discrete GPU detection if nvidia-smi is available.
- Windows -- RAM and CPU detection works. NVIDIA GPU detection via nvidia-smi if installed.
GPU support
| Vendor | Detection method | VRAM reporting |
|---|---|---|
| NVIDIA | nvidia-smi | Exact dedicated VRAM |
| AMD | rocm-smi | Detected (VRAM may be unknown) |
| Intel Arc (discrete) | sysfs (mem_info_vram_total) | Exact dedicated VRAM |
| Intel Arc (integrated) | lspci | Shared system memory |
| Apple Silicon | system_profiler | Unified memory (= system RAM) |
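As an illustration of the NVIDIA row, the sketch below shells out to nvidia-smi and parses total VRAM per GPU. It is not the code in hardware.rs: the function names are hypothetical, and the exact query llmfit issues is not stated in this README. The --query-gpu=memory.total and --format=csv,noheader,nounits options are standard nvidia-smi flags that print one MiB value per GPU.

```rust
use std::process::Command;

/// Parse nvidia-smi CSV output: one MiB value per line, one line per GPU.
/// Lines that are not plain integers are skipped.
pub fn parse_vram_mib(output: &str) -> Vec<u64> {
    output
        .lines()
        .filter_map(|line| line.trim().parse::<u64>().ok())
        .collect()
}

/// Query total VRAM (MiB) for each NVIDIA GPU.
/// Returns None if nvidia-smi is missing, fails, or reports no GPUs.
pub fn detect_nvidia_vram_mib() -> Option<Vec<u64>> {
    let out = Command::new("nvidia-smi")
        .args(&["--query-gpu=memory.total", "--format=csv,noheader,nounits"])
        .output()
        .ok()?; // binary not found or not executable
    if !out.status.success() {
        return None;
    }
    let gpus = parse_vram_mib(&String::from_utf8_lossy(&out.stdout));
    if gpus.is_empty() {
        None
    } else {
        Some(gpus)
    }
}
```

Keeping the parser separate from the process call makes it testable on machines without a GPU.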
Contributing
Contributions are welcome, especially new models.
Adding a model
- Add the model's HuggingFace repo ID (e.g., meta-llama/Llama-3.1-8B) to the TARGET_MODELS list in scripts/scrape_hf_models.py.
- If the model is gated (requires HuggingFace authentication to access metadata), add a fallback entry to the FALLBACKS list in the same script with the parameter count and context length.
- Run the automated update script:
# or: ./scripts/update_models.sh
- Verify the updated model list:
./target/release/llmfit list
- Update MODELS.md by running:
python3 << 'EOF' < scripts/...(see commit history for the generator script)
- Open a pull request.
See MODELS.md for the current list and AGENTS.md for architecture details.
License
MIT