# AGENTS.md

Instructions for AI agents contributing to this codebase.

---

## Project overview

`llmfit` is a Rust CLI/TUI tool that matches LLM models against local system hardware (RAM, CPU, GPU). It detects system specs, loads a model database from embedded JSON, scores each model's fit, and presents results in an interactive terminal UI or classic table output.

## Language and toolchain

- Rust, edition 2024.
- Build with `cargo build`. Run with `cargo run`.
- No nightly features required. Stable toolchain only.
- Minimum supported Rust version: 1.85 (the release that stabilized edition 2024).

## Architecture

```
main.rs          Entrypoint. Parses CLI args via clap. Launches TUI by default,
                 falls back to CLI subcommands (system, list, fit, search, info)
                 or --cli flag for classic table output.

hardware.rs      SystemSpecs::detect() reads RAM/CPU via sysinfo crate.
                 detect_gpu() shells out to nvidia-smi / rocm-smi, and
                 detects Apple Silicon via system_profiler.
                 On unified memory (Apple Silicon), VRAM = system RAM.
                 No async. No unsafe.

models.rs        LlmModel struct. ModelDatabase loads from data/hf_models.json
                 embedded via include_str!() at compile time. No runtime file I/O.

fit.rs           FitLevel enum (Perfect, Good, Marginal, TooTight).
                 RunMode enum (Gpu, CpuOffload, CpuOnly).
                 ModelFit::analyze() compares a model against SystemSpecs,
                 selecting the best available execution path (GPU > CPU offload > CPU).
                 rank_models_by_fit() sorts by fit level, then run mode, then utilization.

display.rs       CLI-mode table rendering using the tabled crate.
                 Only used when the --cli flag or a subcommand is invoked.

tui_app.rs       TUI application state. Holds all models, filters (search text,
                 provider toggles, fit filter), selection index.
                 All filtering logic is here -- apply_filters() recomputes
                 filtered_fits indices whenever inputs change.

tui_ui.rs        Rendering with ratatui. Four layout regions: system bar,
                 search/filter bar, model table (or detail pane), status bar.
                 Stateless rendering -- reads from App, writes to Frame.

tui_events.rs    Keyboard event handling with crossterm. Two modes: Normal
                 (navigation, filter toggling, quit) and Search (text input).
```

## Data flow

1. `App::new()` calls `SystemSpecs::detect()` and `ModelDatabase::new()`.
2. Every model is analyzed into a `ModelFit` via `ModelFit::analyze()`.
3. Results are sorted by `rank_models_by_fit()`.
4. `apply_filters()` produces `filtered_fits: Vec<usize>` (indices into `all_fits`).
5. The TUI render loop reads `App` state and draws via `tui_ui::draw()`.
6. `tui_events::handle_events()` mutates `App` state, triggering re-render.
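The filtered-indices step (4) can be sketched as follows. This is a minimal stdlib-only sketch with simplified stand-in types; the real `App` in `tui_app.rs` carries more filter state (provider toggles, fit filter) than the single search field assumed here.

```rust
#[derive(Debug)]
struct Fit {
    name: String,
}

struct App {
    all_fits: Vec<Fit>,
    search: String,            // lowercase search text
    filtered_fits: Vec<usize>, // indices into all_fits
}

impl App {
    // Recompute filtered_fits from scratch whenever any filter input
    // changes, mirroring the apply_filters() pattern in tui_app.rs.
    fn apply_filters(&mut self) {
        self.filtered_fits = self
            .all_fits
            .iter()
            .enumerate()
            .filter(|(_, f)| {
                self.search.is_empty()
                    || f.name.to_lowercase().contains(&self.search)
            })
            .map(|(i, _)| i)
            .collect();
    }
}

fn main() {
    let mut app = App {
        all_fits: vec![
            Fit { name: "Llama-3-8B".into() },
            Fit { name: "Qwen2-7B".into() },
        ],
        search: "llama".into(),
        filtered_fits: vec![],
    };
    app.apply_filters();
    println!("{:?}", app.filtered_fits); // indices of matching models
}
```

Storing indices rather than cloned fits keeps `all_fits` as the single source of truth; the render loop resolves each index back into a `Fit` on draw.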

## Model database

- Source: `data/hf_models.json` (33 models).
- Generated by `scripts/scrape_hf_models.py` (Python, stdlib only, no pip deps).
- Embedded at compile time via `include_str!("../data/hf_models.json")`.
- Schema per entry: name, provider, parameter_count, min_ram_gb, recommended_ram_gb, min_vram_gb, quantization, context_length, use_case.
- `min_vram_gb` is VRAM needed for GPU inference. `min_ram_gb` is system RAM needed for CPU inference. Both are derived from the same parameter count.
- RAM formula: `params * 0.5 bytes (Q4_K_M) / 1024^3 * 1.2` (CPU inference overhead).
- VRAM formula: `params * 0.5 bytes (Q4_K_M) / 1024^3 * 1.1` (GPU activation overhead).
- Recommended RAM: `model_size_gb * 2.0`, where `model_size_gb = params * 0.5 / 1024^3`.
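For reference, the formulas above can be sketched in Rust (the authoritative implementation lives in `scripts/scrape_hf_models.py`):

```rust
const GIB: f64 = 1024.0 * 1024.0 * 1024.0;

// Q4_K_M quantization stores roughly 0.5 bytes per parameter.
fn model_size_gb(params: f64) -> f64 {
    params * 0.5 / GIB
}

fn min_ram_gb(params: f64) -> f64 {
    model_size_gb(params) * 1.2 // CPU inference overhead
}

fn min_vram_gb(params: f64) -> f64 {
    model_size_gb(params) * 1.1 // GPU activation overhead
}

fn recommended_ram_gb(params: f64) -> f64 {
    model_size_gb(params) * 2.0
}

fn main() {
    let p = 7.0e9; // a 7B-parameter model
    println!("min_ram:  {:.2} GB", min_ram_gb(p));         // ~3.91
    println!("min_vram: {:.2} GB", min_vram_gb(p));        // ~3.59
    println!("rec_ram:  {:.2} GB", recommended_ram_gb(p)); // ~6.52
}
```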

Do not manually edit `hf_models.json`. Regenerate it by running the scraper:

```sh
python3 scripts/scrape_hf_models.py
```

The scraper has hardcoded fallback entries for gated models that require authentication.

## Conventions

- No `unsafe` code.
- No `.unwrap()` on user-facing paths. Use proper error handling or `expect()` with a descriptive message for internal invariants only.
- Fit levels are ordered: Perfect > Good > Marginal > TooTight. Do not add levels without updating `rank_models_by_fit()` sort logic.
- Fit is VRAM-first. GPU inference with sufficient VRAM is the ideal path. CPU inference via system RAM is a fallback. The `RunMode` enum tracks which memory pool is being used (Gpu, CpuOffload, CpuOnly).
- `min_vram_gb` is the VRAM needed to load model weights on GPU. `min_ram_gb` is the system RAM needed for CPU-only inference (same weights, loaded into RAM instead). They represent the same workload on different hardware paths.
- On Apple Silicon (unified memory), VRAM = system RAM. The `CpuOffload` path is skipped because there is no separate RAM pool to spill to. `SystemSpecs::unified_memory` tracks this.
- TUI rendering is stateless. `tui_ui::draw()` must not mutate `App`. Pass `&mut App` only for `TableState` widget requirements -- do not use it to change application state.
- Event handling in `tui_events.rs` is the sole place that mutates `App` in the TUI loop.
- Keep `display.rs` and `tui_*.rs` independent. The CLI path must work without initializing any TUI state.
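One way to keep the Perfect > Good > Marginal > TooTight ordering honest is to lean on variant declaration order, sketched below with stand-in enums (the real definitions are in `fit.rs` and may differ):

```rust
// With #[derive(Ord)], earlier variants compare as "less", so sorting
// ascending puts Perfect first. Adding a variant out of order would
// silently break ranking -- hence the convention above.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
enum FitLevel {
    Perfect,
    Good,
    Marginal,
    TooTight,
}

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
enum RunMode {
    Gpu,
    CpuOffload,
    CpuOnly,
}

fn main() {
    let mut levels = vec![
        FitLevel::TooTight,
        FitLevel::Perfect,
        FitLevel::Marginal,
        FitLevel::Good,
    ];
    // rank_models_by_fit() sorts by (level, mode, utilization); this
    // sketch only shows the level component.
    levels.sort();
    println!("{:?}", levels); // Perfect first
    assert!(RunMode::Gpu < RunMode::CpuOnly);
}
```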

## Adding a new model to the database

1. Add the model's HuggingFace repo ID to `TARGET_MODELS` in `scripts/scrape_hf_models.py`.
2. If the model is gated (requires HF auth), add a fallback entry to the `FALLBACK` dict in the same script.
3. Run `python3 scripts/scrape_hf_models.py`.
4. Verify the output in `data/hf_models.json`.
5. Run `cargo build` to verify compilation.

## Adding a new filter

1. Add the filter state to `App` in `tui_app.rs`.
2. Add filtering logic inside `apply_filters()`.
3. Add the keybinding in `tui_events.rs` (Normal mode handler).
4. Add the UI widget in `tui_ui.rs` (`draw_search_and_filters()` function).
5. Update the status bar help text in `draw_status_bar()`.

## Adding a new CLI subcommand

1. Add a variant to the `Commands` enum in `main.rs`.
2. Add the match arm in the `main()` function's command dispatch.
3. Use `display.rs` functions for output, or add new ones as needed.
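The dispatch shape looks roughly like the stdlib-only sketch below. The real `Commands` enum in `main.rs` is derived with clap; `Quant` here is a hypothetical new subcommand, not an existing one:

```rust
// Simplified stand-in for the clap-derived Commands enum in main.rs.
enum Commands {
    System,
    List,
    Quant { bits: u8 }, // hypothetical new variant
}

fn dispatch(cmd: Commands) -> String {
    match cmd {
        Commands::System => "system specs table".to_string(),
        Commands::List => "model list table".to_string(),
        // Each new subcommand gets its own arm, rendering output
        // via display.rs helpers in the real code.
        Commands::Quant { bits } => format!("models quantized to {bits}-bit"),
    }
}

fn main() {
    println!("{}", dispatch(Commands::Quant { bits: 4 }));
}
```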

## Testing

There are no tests yet. When adding tests:

- Unit tests for `fit.rs` logic (given known SystemSpecs and LlmModel values, assert correct FitLevel).
- Unit tests for `models.rs` (verify JSON parsing, search matching).
- Integration tests for CLI subcommands via the `assert_cmd` crate.
- TUI is difficult to unit test. Keep rendering stateless and test the state mutations in `tui_app.rs` directly.
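A sketch of the first kind of test, with simplified stand-in types (real tests would use the actual `SystemSpecs`, `LlmModel`, and `FitLevel`; the 1.5x headroom threshold here is an illustrative assumption, not the real cutoff):

```rust
#[derive(Debug, PartialEq)]
enum FitLevel {
    Perfect,
    TooTight,
}

struct Specs { vram_gb: f64 }
struct Model { min_vram_gb: f64 }

// Toy analyze(): plenty of VRAM headroom is Perfect, otherwise TooTight.
fn analyze(specs: &Specs, model: &Model) -> FitLevel {
    if specs.vram_gb >= model.min_vram_gb * 1.5 {
        FitLevel::Perfect
    } else {
        FitLevel::TooTight
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn roomy_gpu_is_perfect() {
        let specs = Specs { vram_gb: 24.0 };
        let model = Model { min_vram_gb: 4.0 };
        assert_eq!(analyze(&specs, &model), FitLevel::Perfect);
    }
}

fn main() {
    // cargo test runs the module above; main lets the sketch compile standalone.
    let cramped = analyze(&Specs { vram_gb: 2.0 }, &Model { min_vram_gb: 4.0 });
    assert_eq!(cramped, FitLevel::TooTight);
}
```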

## Dependencies policy

- Prefer crates that are well-maintained and have minimal transitive dependencies.
- `sysinfo` is the system detection crate. Do not replace it with raw platform calls.
- `ratatui` + `crossterm` is the TUI stack. Do not mix in `termion` or `ncurses`.
- `clap` with derive feature for CLI parsing. Do not use manual arg parsing.
- The Python scraper uses only stdlib (`urllib`, `json`). Do not add pip dependencies.

## Common tasks

```sh
# Build
cargo build

# Run TUI
cargo run

# Run CLI mode
cargo run -- --cli

# Run specific subcommand
cargo run -- system
cargo run -- fit --perfect -n 5
cargo run -- search "llama"

# Refresh model database
python3 scripts/scrape_hf_models.py && cargo build

# Check for compilation issues
cargo check

# Format code
cargo fmt

# Lint
cargo clippy
```

## Platform notes

- GPU detection shells out to `nvidia-smi` (NVIDIA) and `rocm-smi` (AMD). These are best-effort and fail silently if unavailable.
- Apple Silicon detection uses `system_profiler SPDisplaysDataType`. On unified memory Macs, VRAM is reported as available system RAM (same pool).
- `sysinfo` handles cross-platform RAM/CPU. No conditional compilation needed.
- The TUI uses crossterm which works on Linux, macOS, and Windows terminals.
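The best-effort, fail-silent pattern for GPU detection can be sketched like this; the real logic lives in `hardware.rs` and this sketch only covers the NVIDIA path:

```rust
use std::process::Command;

// nvidia-smi with these flags prints a bare MiB number like "8192".
fn detect_nvidia_vram_gb() -> Option<f64> {
    let out = Command::new("nvidia-smi")
        .args(["--query-gpu=memory.total", "--format=csv,noheader,nounits"])
        .output()
        .ok()?; // fail silently if nvidia-smi is absent
    if !out.status.success() {
        return None;
    }
    parse_mib_as_gb(&String::from_utf8_lossy(&out.stdout))
}

// Parse the first line of nvidia-smi output (MiB) into GB.
fn parse_mib_as_gb(s: &str) -> Option<f64> {
    let mib: f64 = s.trim().lines().next()?.trim().parse().ok()?;
    Some(mib / 1024.0)
}

fn main() {
    match detect_nvidia_vram_gb() {
        Some(gb) => println!("NVIDIA VRAM: {gb:.1} GB"),
        None => println!("No NVIDIA GPU detected"),
    }
}
```

Every failure mode (missing binary, non-zero exit, unparseable output) collapses to `None`, which is what "fail silently" means here: the caller just treats the machine as having no discrete GPU.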