hf-fetch-model 0.10.2

Download, inspect, and compare HuggingFace models from Rust. Multi-connection parallel downloads plus safetensors header inspection via HTTP Range. No weight data downloaded.
# CLI Reference

hf-fetch-model installs two binaries: `hf-fetch-model` (explicit) and `hf-fm` (short alias).

```sh
cargo install hf-fetch-model --features cli
```

## Table of contents

- [Subcommands](#subcommands)
- [Download examples](#download-examples)
- [Dry-run example](#dry-run-example)
- [List-files examples](#list-files-examples)
- [Search examples](#search-examples)
- [Info examples](#info-examples)
- [Inspect examples](#inspect-examples)
- [Diff examples](#diff-examples)
- [Disk usage examples](#disk-usage-examples)
- [Du flags](#du-flags)
- [Other commands](#other-commands)
- [Cache commands](#cache-commands)
- [Cache clean-partial flags](#cache-clean-partial-flags)
- [Cache delete flags](#cache-delete-flags)
- [Cache gc flags](#cache-gc-flags)
- [Cache verify flags](#cache-verify-flags)
- [Diff flags](#diff-flags)
- [Download flags](#download-flags)
- [List-files flags](#list-files-flags)
- [Search flags](#search-flags)
- [Info flags](#info-flags)
- [Inspect flags](#inspect-flags)
- [General flags](#general-flags)

## Subcommands

| Command | Description |
|---------|-------------|
| *(default)* | Download a model: `hf-fm <REPO_ID>` |
| `diff <REPO_A> <REPO_B>` | Compare tensor layouts between two models |
| `discover` | Find new model families on the Hub not yet cached locally |
| `download-file <REPO_ID> <FILENAME>` | Download a single file (or glob pattern) and print its cache path |
| `du [REPO_ID\|N]` | Show cache disk usage — per-repo breakdown (by name or `#` index), or cache-wide summary |
| `info <REPO_ID>` | Show model card metadata and README text |
| `cache clean-partial [REPO_ID\|N]` | Remove `.chunked.part` files from interrupted downloads |
| `cache delete <REPO_ID\|N>` | Delete a cached model (entire `models--org--name/` directory) |
| `cache gc` | Evict cached repos by age (`--older-than`) and/or size budget (`--max-size`) |
| `cache path <REPO_ID\|N>` | Print the snapshot directory path for scripting |
| `cache verify <REPO_ID\|N>` | Re-verify SHA256 digests of cached files against HuggingFace LFS metadata |
| `inspect <REPO_ID> [FILENAME]` | Inspect safetensors file headers (tensor names, shapes, dtypes); auto-detects PEFT adapter config |
| `list-families` | List model families (`model_type`) in local cache |
| `list-files <REPO_ID>` | List files in a remote repo (filenames, sizes, SHA256) without downloading |
| `search <QUERY>` | Search the HuggingFace Hub for models (by downloads) |
| `status [REPO_ID]` | Show download status — per-repo detail, or cache-wide summary |

`<ARG>` = required, `[ARG]` = optional.

## Download examples

```sh
# Download all files
hf-fm google/gemma-2-2b-it

# Download safetensors + config only
hf-fm google/gemma-2-2b-it --preset safetensors

# Custom filters
hf-fm google/gemma-2-2b-it --filter "*.safetensors" --filter "*.json"

# Download to a specific directory
hf-fm google/gemma-2-2b-it --output-dir ./models

# Download a single file
hf-fm download-file mntss/clt-gemma-2-2b-426k W_dec_0.safetensors

# Download sharded PyTorch files by glob pattern
hf-fm download-file org/model "pytorch_model-*.bin"

# Download to flat layout (files directly in target directory)
hf-fm google/gemma-2-2b-it --preset safetensors --flat --output-dir ./models

# Download a single file to flat layout
hf-fm download-file org/model config.json --flat --output-dir ./configs

# Download with diagnostics
hf-fm google/gemma-2-2b-it -v
```

After a successful download, a summary line shows total size, elapsed time, and throughput:

```
Downloaded to: ~/.cache/huggingface/hub/models--google--gemma-2-2b-it/snapshots/...
  4.89 GiB in 114.9s (43.5 MiB/s)
```
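The throughput figure in that summary is plain arithmetic over binary units — a sketch (function name hypothetical, not the crate's API):

```rust
// Throughput as displayed in the download summary: bytes transferred
// divided by elapsed seconds, expressed in MiB/s (binary units).
fn throughput_mib_s(bytes: u64, secs: f64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0) / secs
}
```

For the summary above, 4.89 GiB over 114.9 s works out to roughly 43.5 MiB/s, matching the printed figure.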

In non-TTY contexts (pipes, CI), periodic progress lines are emitted to stderr instead of progress bars:

```
[hf-fm] model-00002-of-00002.safetensors: 22.96 MiB/229.54 MiB (10%)
[hf-fm] model-00001-of-00002.safetensors: 475.71 MiB/4.65 GiB (10%)
```

A warning is emitted when `--filter` duplicates a pattern already included by `--preset`:

```
warning: --filter "*.safetensors" is redundant with --preset safetensors
```
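Conceptually, the redundancy check is set overlap between the preset's patterns and the explicit filters — a sketch (preset contents assumed for illustration, not taken from the crate):

```rust
// Return the --filter patterns that a --preset already covers, so the
// CLI can warn about them instead of silently doubling up.
fn redundant_filters<'a>(preset: &[&str], filters: &'a [&'a str]) -> Vec<&'a str> {
    filters
        .iter()
        .copied()
        .filter(|f| preset.contains(f))
        .collect()
}
```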

## Dry-run example

Preview what would be downloaded before committing:

```sh
hf-fm google/gemma-2-2b-it --preset safetensors --dry-run
```

Output shows per-file status (cached / to download), total and download sizes, and a recommended config based on the file size distribution.

## List-files examples

```sh
# List all files in a repo
hf-fm list-files google/gemma-2-2b-it

# List only safetensors-related files
hf-fm list-files google/gemma-2-2b-it --preset safetensors

# Custom filter
hf-fm list-files google/gemma-2-2b-it --filter "*.safetensors"

# Hide SHA256 column
hf-fm list-files google/gemma-2-2b-it --no-checksum

# Show which files are already in local cache
hf-fm list-files google/gemma-2-2b-it --show-cached
```

## Search examples

See [Search](search.md) for the full feature set.

```sh
# Basic search
hf-fm search RWKV-7

# Multi-term filtering
hf-fm search mistral,3B,instruct

# Exact match with model card
hf-fm search mistralai/Ministral-3-3B-Instruct-2512 --exact

# Filter by library
hf-fm search llama --library peft

# Filter by pipeline task
hf-fm search mistral --pipeline text-generation

# Filter by tag (useful for GGUF models without a library_name)
hf-fm search llama --tag gguf
```

Common quantization synonyms are normalized automatically: `8bit`, `8-bit`, `int8`, and `INT8` all produce the same results. Same for `4bit`/`4-bit`/`int4` and `fp8`/`float8`.
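The normalization amounts to case-folding, hyphen-stripping, and an alias map — a minimal sketch, not the crate's actual implementation (function name hypothetical):

```rust
// Map quantization-term synonyms onto one canonical spelling:
// lowercase, drop hyphens, then collapse known aliases.
fn normalize_quant_term(term: &str) -> String {
    let t = term.to_lowercase().replace('-', "");
    match t.as_str() {
        "8bit" | "int8" => "8bit".into(),
        "4bit" | "int4" => "4bit".into(),
        "fp8" | "float8" => "fp8".into(),
        _ => t,
    }
}
```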

## Info examples

```sh
# Show metadata and first 40 lines of README
hf-fm info mistralai/Ministral-3-3B-Instruct-2512

# Show full README
hf-fm info mistralai/Ministral-3-3B-Instruct-2512 --lines 0

# JSON output
hf-fm info mistralai/Ministral-3-3B-Instruct-2512 --json

# Specific revision
hf-fm info mistralai/Ministral-3-3B-Instruct-2512 --revision v1.0
```

## Inspect examples

For a narrative walkthrough using a real 4-shard model, see the [Inspect tutorial](tutorials/inspect-before-downloading.md).

```sh
# Inspect a single safetensors file (cache-first, falls back to HTTP Range requests)
hf-fm inspect google/gemma-2-2b-it model-00001-of-00002.safetensors

# Inspect from cache only (no network)
hf-fm inspect google/gemma-2-2b-it model-00001-of-00002.safetensors --cached

# JSON output for programmatic consumption
hf-fm inspect google/gemma-2-2b-it model-00001-of-00002.safetensors --json

# Inspect all safetensors in a repo (uses shard index fast path when available)
hf-fm inspect google/gemma-2-2b-it

# Suppress metadata line
hf-fm inspect google/gemma-2-2b-it model-00001-of-00002.safetensors --no-metadata

# Per-dtype summary (tensor count, params, size per dtype)
hf-fm inspect google/gemma-2-2b-it model-00001-of-00002.safetensors --dtypes

# Dtype summary for a subset of tensors
hf-fm inspect google/gemma-2-2b-it model-00001-of-00002.safetensors --dtypes --filter "layers.0"

# Dtype summary as JSON (for scripting / cross-model aggregation)
hf-fm inspect google/gemma-2-2b-it model-00001-of-00002.safetensors --dtypes --json

# Hierarchical tree view (numeric sibling groups auto-collapsed to [0..N])
hf-fm inspect google/gemma-2-2b-it model-00001-of-00002.safetensors --tree

# Tree view of a subset of tensors
hf-fm inspect google/gemma-2-2b-it model-00001-of-00002.safetensors --tree --filter "embed"

# Tree as JSON (tagged enum: leaf / branch / ranged)
hf-fm inspect google/gemma-2-2b-it model-00001-of-00002.safetensors --tree --json

# Show only the first 10 tensors (useful when you just want to peek)
hf-fm inspect google/gemma-2-2b-it model-00001-of-00002.safetensors --limit 10

# First 5 tensors matching a filter (JSON adds a `truncated` field so consumers detect the cap)
hf-fm inspect google/gemma-2-2b-it model-00001-of-00002.safetensors --filter "layers.0" --limit 5 --json

# Inspect a PEFT adapter repo (auto-detects adapter_config.json)
hf-fm inspect some-user/llama-2-7b-lora-adapter

# Will it fit on my GPU? (device 0 by default; pass --check-gpu N to pick another)
hf-fm inspect meta-llama/Llama-3.2-1B --cached --check-gpu

# Multi-GPU box: check device 1 instead of 0
hf-fm inspect meta-llama/Llama-3.2-1B --cached --check-gpu 1

# JSON composition: gpu_check rides alongside the existing header schema
hf-fm inspect meta-llama/Llama-3.2-1B --cached --check-gpu --json
```
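The header-only trick behind `inspect` relies on the safetensors layout: an 8-byte little-endian length prefix, then the JSON header, then tensor data. A minimal sketch of the parse (buffer-based here; over the wire, two HTTP Range requests fetch the same bytes):

```rust
// Extract the safetensors JSON header from the start of a file.
// Bytes 0..8 hold a little-endian u64 with the header's byte length;
// the header follows immediately. Reading only this region is why
// inspection never downloads weight data.
fn header_json(buf: &[u8]) -> Option<String> {
    let len_bytes: [u8; 8] = buf.get(..8)?.try_into().ok()?;
    let len = u64::from_le_bytes(len_bytes) as usize;
    let json = buf.get(8..8 + len)?;
    String::from_utf8(json.to_vec()).ok()
}
```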

## Diff examples

```sh
# Compare tensor layouts between two model variants (cache-first)
hf-fm diff RedHatAI/Llama-3.2-1B-Instruct-FP8 casperhansen/llama-3.2-1b-instruct-awq

# Cache-only (no network)
hf-fm diff RedHatAI/Llama-3.2-1B-Instruct-FP8 casperhansen/llama-3.2-1b-instruct-awq --cached

# Filter to specific layers
hf-fm diff RedHatAI/Llama-3.2-1B-Instruct-FP8 casperhansen/llama-3.2-1b-instruct-awq --filter "layers.0"

# Quick summary (counts only, no tensor listing)
hf-fm diff RedHatAI/Llama-3.2-1B-Instruct-FP8 casperhansen/llama-3.2-1b-instruct-awq --cached --summary

# Per-dtype histograms side-by-side, with Δ Size column (ideal for scaled-sibling pairs)
hf-fm diff openai/gpt-oss-20b openai/gpt-oss-120b --dtypes

# JSON output for programmatic consumption (includes byte_count on every tensor entry)
hf-fm diff RedHatAI/Llama-3.2-1B-Instruct-FP8 casperhansen/llama-3.2-1b-instruct-awq --cached --json
```

## Disk usage examples

```sh
# Show all cached repos sorted by size (numbered)
hf-fm du

# Drill into the 2nd largest repo by index
hf-fm du 2

# Show per-file breakdown for a specific repo
hf-fm du google/gemma-2-2b-it

# Show last-modified age column
hf-fm du --age

# Hierarchical tree of every cached repo + its files
hf-fm du --tree

# Tree view with last-modified column on each repo branch
hf-fm du --tree --age
```

## Du flags

| Flag | Description | Default |
|------|-------------|---------|
| `--age` | Show a last-modified age column (e.g., `2 days ago`, `3 months ago`) | off |
| `--tree` | Hierarchical tree view: repos as branches, files as leaves, using box-drawing connectors. Composes with `--age`; conflicts with the positional repo argument (the per-repo view is already covered by `du <REPO_ID>`). | off |

## Other commands

```sh
# Check download status (per-repo or entire cache)
hf-fm status RWKV/RWKV7-Goose-World3-1.5B-HF
hf-fm status

# List model families in local cache
hf-fm list-families

# Discover new families from HuggingFace Hub
hf-fm discover
```

## Cache commands

```sh
# Remove all partial downloads (interactive prompt)
hf-fm cache clean-partial

# Remove partials for a specific repo (by name or index)
hf-fm cache clean-partial meta-llama/Llama-3.2-1B
hf-fm cache clean-partial 29

# Preview what would be removed
hf-fm cache clean-partial --dry-run

# Skip confirmation prompt
hf-fm cache clean-partial --yes
```

## Cache clean-partial flags

| Flag | Description | Default |
|------|-------------|---------|
| `--dry-run` | Preview what would be removed without deleting | off |
| `--yes` | Skip confirmation prompt | off |

```sh
# Delete a cached model (interactive prompt)
hf-fm cache delete EleutherAI/pythia-1.4b

# Delete by numeric index from du output
hf-fm cache delete 3

# Skip confirmation prompt
hf-fm cache delete 3 --yes
```

## Cache delete flags

| Flag | Description | Default |
|------|-------------|---------|
| `--yes` | Skip confirmation prompt | off |

```sh
# Evict every repo last touched more than 30 days ago
hf-fm cache gc --older-than 30

# Trim the cache to fit under a budget (oldest-first)
hf-fm cache gc --max-size 20GiB

# Combined: age first, then trim further if still over budget
hf-fm cache gc --older-than 30 --max-size 20GiB

# Protect specific repos from eviction (repeatable)
hf-fm cache gc --max-size 20GiB --except google/gemma-2-2b-it

# Preview without deleting; show every kept repo for transparency
hf-fm cache gc --older-than 30 --dry-run --list-kept

# Skip the confirmation prompt
hf-fm cache gc --older-than 30 --yes
```

## Cache gc flags

| Flag | Description | Default |
|------|-------------|---------|
| `--older-than DAYS` | Evict repos with mtime older than this many days | unset |
| `--max-size SIZE` | Hard cap on total cache size (`B`, `KiB`, `MiB`, `GiB`, `TiB`) | unset |
| `--except REPO_ID` | Repository to protect from eviction (repeatable) | none |
| `--dry-run` | Preview the eviction plan without deleting anything | off |
| `--yes` | Skip the confirmation prompt | off |
| `--list-kept` | List every kept repo in the preview (default: hidden for terseness) | off |

At least one of `--older-than` or `--max-size` is required. When both are set, age eviction runs first; if the cache is still over budget, oldest non-protected repos are evicted next, oldest first. Repos with active partial downloads (mtime within the last hour) are skipped to avoid racing with `hf-fm download`; run `cache clean-partial` first to clear stale partials.
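The two-pass eviction order can be sketched as follows (types and names hypothetical; the active-partial skip is omitted for brevity):

```rust
struct Repo {
    name: String,
    bytes: u64,
    age_days: u64, // days since last mtime
}

// Build the list of repos gc would evict: age-based eviction first,
// then oldest-first trimming until the size budget is met. Protected
// repos are never evicted.
fn eviction_plan(
    mut repos: Vec<Repo>,
    older_than: Option<u64>,
    max_size: Option<u64>,
    protected: &[&str],
) -> Vec<String> {
    let mut evicted = Vec::new();
    // Pass 1: evict anything older than the cutoff.
    if let Some(days) = older_than {
        repos.retain(|r| {
            if r.age_days > days && !protected.contains(&r.name.as_str()) {
                evicted.push(r.name.clone());
                false
            } else {
                true
            }
        });
    }
    // Pass 2: if still over budget, evict oldest-first until it fits.
    if let Some(budget) = max_size {
        repos.sort_by(|a, b| b.age_days.cmp(&a.age_days)); // oldest first
        let mut total: u64 = repos.iter().map(|r| r.bytes).sum();
        for r in &repos {
            if total <= budget {
                break;
            }
            if protected.contains(&r.name.as_str()) {
                continue;
            }
            total -= r.bytes;
            evicted.push(r.name.clone());
        }
    }
    evicted
}
```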

Decimal-prefixed size suffixes (`KB`, `MB`, `GB`, `TB`) are rejected — `hf-fm` reports sizes in binary units everywhere else and silent reinterpretation would mislead. Use `KiB`, `MiB`, `GiB`, `TiB`.
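Strict binary-unit parsing can be sketched like this (function name hypothetical; not the crate's parser):

```rust
// Parse a size string such as "20GiB" into bytes. Decimal suffixes
// (KB, MB, GB, TB) are rejected outright rather than silently
// reinterpreted as their binary cousins.
fn parse_size(s: &str) -> Result<u64, String> {
    for bad in ["KB", "MB", "GB", "TB"] {
        if s.ends_with(bad) {
            return Err(format!("decimal suffix {bad} rejected; use {}iB", &bad[..1]));
        }
    }
    let units: [(&str, u64); 5] =
        [("TiB", 1 << 40), ("GiB", 1 << 30), ("MiB", 1 << 20), ("KiB", 1 << 10), ("B", 1)];
    for (suffix, mult) in units {
        if let Some(num) = s.strip_suffix(suffix) {
            let n: u64 = num.trim().parse().map_err(|e| format!("bad number: {e}"))?;
            return Ok(n * mult);
        }
    }
    Err(format!("unrecognized size: {s}"))
}
```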

```sh
# Print snapshot path for shell substitution
hf-fm cache path google/gemma-2-2b-it

# By numeric index from du output
hf-fm cache path 2

# Use in shell scripts
cd $(hf-fm cache path google/gemma-2-2b-it)
```

```sh
# Re-verify SHA256 digests of cached files (requires network)
hf-fm cache verify google/gemma-2-2b-it

# By numeric index from du output
hf-fm cache verify 2

# Verify a specific revision
hf-fm cache verify google/gemma-2-2b-it --revision v1.0
```

## Cache verify flags

| Flag | Description | Default |
|------|-------------|---------|
| `--revision` | Git revision (branch, tag, SHA) | main |
| `--token` | Auth token (or set `HF_TOKEN` env var) | — |

`cache verify` fetches the expected SHA256 digests from the HuggingFace API and recomputes each cached file's digest locally. Per-file outcomes:

- `SHA256 OK` — the cached file matches the expected digest.
- `SHA256 MISMATCH` — the cached file's digest differs (corruption); both expected and actual hashes are printed for forensics.
- `no LFS hash` — the file has no LFS metadata (small git-stored files such as `config.json`); verification is skipped.
- `MISSING` — the file is listed remotely but not present in the local snapshot.

Exit code is non-zero only when at least one file mismatched; `skipped` and `missing` alone are non-failures (a partial cache is a legitimate state). This makes the command safe to compose into CI / cron-style integrity checks.
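The exit-code policy reduces to one rule — a sketch (types hypothetical): only a digest mismatch fails; skipped and missing files do not.

```rust
#[derive(PartialEq)]
enum Outcome {
    Ok,        // SHA256 OK
    Mismatch,  // SHA256 MISMATCH (corruption)
    NoLfsHash, // small git-stored file — verification skipped
    Missing,   // listed remotely, absent from the local snapshot
}

// Non-zero exit only when at least one file's digest mismatched;
// a partial cache (skipped/missing files) is a legitimate state.
fn verify_exit_code(outcomes: &[Outcome]) -> i32 {
    if outcomes.contains(&Outcome::Mismatch) { 1 } else { 0 }
}
```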

## Diff flags

| Flag | Description | Default |
|------|-------------|---------|
| `--cached` | Cache-only mode: fail if files are not cached locally | off |
| `--dtypes` | Show side-by-side per-dtype histograms instead of the per-tensor body (conflicts with `--summary`) | off |
| `--filter` | Show only tensors whose name contains this substring | — |
| `--json` | Output the full diff as JSON (per-tensor entries include `byte_count`; `--dtypes` adds a `dtype_histograms` field) | off |
| `--revision-a` | Git revision for model A | main |
| `--revision-b` | Git revision for model B | main |
| `--summary` | Show only the summary line (counts per category; conflicts with `--dtypes`) | off |
| `--token` | Auth token (or set `HF_TOKEN` env var) | — |

## Download flags

These flags apply to the default download command (`hf-fm <REPO_ID>`). `download-file` shares the performance flags and `--flat` but not `--dry-run`, `--filter`, or `--preset`. `download-file` also accepts glob patterns (e.g., `"pytorch_model-*.bin"`) as the filename argument.

| Flag | Description | Default |
|------|-------------|---------|
| `-v`, `--verbose` | Enable download diagnostics (plan, per-file decisions, throughput) | off |
| `--dry-run` | Preview what would be downloaded (no actual download) | off |
| `--chunk-threshold-mib` | Min file size (MiB) for multi-connection download | auto-tuned |
| `--concurrency` | Parallel file downloads | auto-tuned |
| `--connections-per-file` | Parallel HTTP connections per large file | auto-tuned |
| `--exclude` | Exclude glob pattern (repeatable) | none |
| `--filter` | Include glob pattern (repeatable) | all files |
| `--flat` | Copy files to flat layout: files land directly in `{output-dir}/` with no snapshot subdirectories | off |
| `--output-dir` | Custom output directory (or flat copy target with `--flat`) | HF cache |
| `--preset` | Filter preset: `safetensors`, `gguf`, `pth`, `config-only` | — |
| `--revision` | Git revision (branch, tag, SHA) | main |
| `--token` | Auth token (or set `HF_TOKEN` env var) | — |
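The glob patterns accepted by `download-file` (and by `--filter` / `--exclude`) behave like simple wildcard matching. A minimal `*`-only matcher illustrates how `"pytorch_model-*.bin"` selects shard files — the real CLI may support fuller glob syntax:

```rust
// Match a pattern containing `*` wildcards against a filename.
// `*` matches any run of characters, including an empty one.
fn glob_match(pattern: &str, name: &str) -> bool {
    fn go(p: &[u8], n: &[u8]) -> bool {
        if p.is_empty() {
            return n.is_empty();
        }
        if p[0] == b'*' {
            // `*` matches zero characters (advance the pattern) or one
            // more character of the name (advance the name, keep `*`).
            return go(&p[1..], n) || (!n.is_empty() && go(p, &n[1..]));
        }
        !n.is_empty() && p[0] == n[0] && go(&p[1..], &n[1..])
    }
    go(pattern.as_bytes(), name.as_bytes())
}
```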

## List-files flags

| Flag | Description | Default |
|------|-------------|---------|
| `--exclude` | Exclude glob pattern (repeatable) | none |
| `--filter` | Include glob pattern (repeatable) | all files |
| `--no-checksum` | Suppress the SHA256 column | off |
| `--preset` | Filter preset: `safetensors`, `gguf`, `pth`, `config-only` | — |
| `--revision` | Git revision (branch, tag, SHA) | main |
| `--show-cached` | Show cache status: complete (✓), partial, or missing (✗) | off |
| `--token` | Auth token (or set `HF_TOKEN` env var) | — |

## Search flags

| Flag | Description | Default |
|------|-------------|---------|
| `--exact` | Match a full repository ID exactly and show its metadata card | off |
| `--library` | Filter by library framework (e.g., `transformers`, `peft`, `vllm`) | — |
| `--limit` | Maximum number of results | 20 |
| `--pipeline` | Filter by pipeline task (e.g., `text-generation`, `text-classification`) | — |
| `--tag` | Filter by model tag (e.g., `gguf`, `conversational`, `imatrix`) | — |

## Info flags

| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Output metadata and README as JSON | off |
| `--lines` | Maximum lines of README to display (0 = all) | 40 |
| `--revision` | Git revision (branch, tag, SHA) | main |
| `--token` | Auth token (or set `HF_TOKEN` env var) | — |

## Inspect flags

| Flag | Description | Default |
|------|-------------|---------|
| `--cached` | Cache-only mode: fail if the file is not cached locally | off |
| `--check-gpu [N]` | Append a one-line GPU-fit verdict comparing model weight bytes against free VRAM on device `N` (default `0`). Reads device info via [`hypomnesis`](https://crates.io/crates/hypomnesis) (NVML on Linux/Windows, DXGI on Windows; falls back to `nvidia-smi`). On systems with no NVIDIA GPU detected, prints `GPU N: unavailable — <reason>` and skips the verdict (exit code stays `0` — the command is informational, not a gate). Uses the **unfiltered** model totals (so `--filter` / `--limit` affect only the printed table). Composes with `--json`: a `gpu_check` object is added to the per-file schema, the `--tree --json` schema, and the `--dtypes --json` schema; the repo-level plain `--json` schema becomes `{"files": [...], "gpu_check": {...}}` when `--check-gpu` is passed (the array schema is preserved when it is absent). At the whole-repo level, forces shard aggregation so the verdict reflects the total weight bytes across every shard. Conflicts with `--list` (no headers are read in `--list` mode). | off |
| `--dtypes` | Show a per-dtype summary (tensor count, params, size) instead of individual tensors. Composes with `--json` to emit `{ dtypes: [...], total_tensors, total_params }`. | off |
| `--filter` | Show only tensors whose name contains this substring | — |
| `--json` | Output the full header as JSON instead of a human-readable table | off |
| `--limit` | Show only the first N tensors (applied after `--filter`). JSON output gains a `truncated` field when the cap is reached. | — |
| `--no-metadata` | Suppress the `Metadata:` line in human-readable output | off |
| `--tree` | Show a hierarchical tree view grouped by dotted namespace prefix; numeric sibling groups with identical sub-structure collapse to `[0..N]`. Composes with `--filter` and `--json`. Conflicts with `--dtypes` and `--limit`. | off |
| `--revision` | Git revision (branch, tag, SHA) | main |
| `--token` | Auth token (or set `HF_TOKEN` env var) | — |

## General flags

| Flag | Description |
|------|-------------|
| `-h`, `--help` | Print help |
| `-V`, `--version` | Print version |

Subcommands accept their own flags. Run `hf-fm <command> --help` for details.