# hf-fetch-model
Download HuggingFace models at full speed with a single function call.
## Features

- Single-file download — download one file by name, get its cache path
- Repo-level download — give it a model ID, get all files
- Maximum throughput — multi-connection parallel Range downloads for large files (≥100 MiB, 8 connections by default) for all download functions, plus hf-hub's `.high()` mode
- Download diagnostics — structured `tracing` events at `debug` level report the download plan, per-file chunked/single decisions, throughput, and a completion summary
- File filtering — glob patterns (`*.safetensors`) and presets (`safetensors`, `gguf`, `config-only`)
- HF cache compatible — files stored in `~/.cache/huggingface/hub/`
- Progress reporting — per-file callbacks, optional `indicatif` progress bars
- Checksum verification — SHA256 against HuggingFace LFS metadata
- Retry with backoff — exponential backoff + jitter for flaky connections
- Timeout control — per-file and overall time limits
- Cache diagnostics — `status` command shows per-file download state (complete / partial / missing)
- Model search — `search` command queries the HuggingFace Hub by keyword
- CLI included — `hf-fetch-model` / `hf-fm` binary for command-line use
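The retry behavior listed above (exponential backoff plus jitter) can be sketched as an exponential delay capped at a maximum, with "full jitter" applied on each attempt. This is an illustrative, self-contained sketch, not the crate's implementation; `backoff_delay` is a hypothetical helper, and a tiny LCG stands in for a real RNG to keep the example dependency-free:

```rust
use std::time::Duration;

// Illustrative sketch of exponential backoff + full jitter (not the crate's code).
fn backoff_delay(attempt: u32, base_ms: u64, cap_ms: u64, seed: &mut u64) -> Duration {
    // exponential bound: base * 2^attempt, capped at cap_ms
    let exp = base_ms.saturating_mul(1u64 << attempt.min(16)).min(cap_ms);
    // tiny LCG in place of a real RNG, to stay self-contained
    *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    // "full jitter": draw the actual wait uniformly from [0, exp)
    let jittered = (*seed >> 33) % exp.max(1);
    Duration::from_millis(jittered)
}

fn main() {
    let mut seed = 42u64;
    for attempt in 0..4 {
        let d = backoff_delay(attempt, 100, 10_000, &mut seed);
        // each delay stays below the (capped) exponential bound
        assert!(d < Duration::from_millis((100u64 << attempt).min(10_000)));
        println!("attempt {attempt}: wait {d:?}");
    }
}
```

With full jitter, each wait is drawn uniformly between zero and the exponential cap, which spreads retries out when many clients hit the same flaky endpoint.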
## Installation
### Library
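Assuming the crate is published on crates.io under its repository name, add it as a dependency with Cargo:

```shell
cargo add hf-fetch-model
```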
### CLI
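Assuming the same crates.io package name, the CLI can be installed with:

```shell
cargo install hf-fetch-model
```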
This installs two binaries: `hf-fetch-model` (explicit) and `hf-fm` (short alias).
## Quick Start (Library)

```rust
use hf_fetch_model::{download, download_with_config, FetchConfig};

// Minimal — download everything
// (repo id and argument shapes below are illustrative)
let path = download("allenai/OLMo-1B-hf").await?;

// Configured — filter + progress
let config = FetchConfig::safetensors()
    .on_progress(|p| println!("{p:?}"))
    .build()?;
let path = download_with_config("allenai/OLMo-1B-hf", config).await?;
```

Blocking wrappers (`download_blocking()`, `download_with_config_blocking()`) are available for non-async callers.
## Single-file download

```rust
use hf_fetch_model::FetchConfig;

let config = FetchConfig::builder()
    .on_progress(|p| println!("{p:?}")) // closure and argument shapes are illustrative
    .build()?;

// Download one file — returns the local cache path
let path = download_file("allenai/OLMo-1B-hf", "config.json", &config).await?;

// Blocking variant for non-async callers
let path = download_file_blocking("allenai/OLMo-1B-hf", "config.json", &config)?;
```
## CLI Usage

```bash
# Download all files
hf-fm allenai/OLMo-1B-hf

# Download safetensors + config only
hf-fm allenai/OLMo-1B-hf --preset safetensors

# Custom filters
hf-fm allenai/OLMo-1B-hf --filter '*.safetensors' --exclude '*.bin'

# Download to a specific directory
hf-fm allenai/OLMo-1B-hf --output-dir ./models

# Download a single file
hf-fm download-file allenai/OLMo-1B-hf config.json

# Search for models on HuggingFace Hub
hf-fm search olmo

# Check download status (per-repo or entire cache)
hf-fm status allenai/OLMo-1B-hf
hf-fm status

# List model families in local cache
hf-fm list-families

# Discover new families from HuggingFace Hub
hf-fm discover

# Download with diagnostics (chunked/single decisions, throughput)
hf-fm allenai/OLMo-1B-hf --verbose
```
## Subcommands

| Command | Description |
|---|---|
| *(default)* | Download a model: `hf-fm <REPO_ID>` |
| `download-file <REPO_ID> <FILENAME>` | Download a single file and print its cache path |
| `search <QUERY>` | Search the HuggingFace Hub for models (by downloads) |
| `status [REPO_ID]` | Show download status — per-repo detail, or cache-wide summary |
| `list-families` | List model families (`model_type`) in local cache |
| `discover` | Find new model families on the Hub not yet cached locally |

`<ARG>` = required, `[ARG]` = optional.
## Download Flags

These flags apply to the default download command (`hf-fm <REPO_ID>`) and `download-file`.

| Flag | Description | Default |
|---|---|---|
| `-v, --verbose` | Enable download diagnostics (plan, per-file decisions, throughput) | off |
| `--chunk-threshold-mib` | Min file size (MiB) for multi-connection download | 100 |
| `--concurrency` | Parallel file downloads | 4 |
| `--connections-per-file` | Parallel HTTP connections per large file | 8 |
| `--exclude` | Exclude glob pattern (repeatable) | none |
| `--filter` | Include glob pattern (repeatable) | all files |
| `--output-dir` | Custom output directory | HF cache |
| `--preset` | Filter preset: `safetensors`, `gguf`, `config-only` | — |
| `--revision` | Git revision (branch, tag, SHA) | `main` |
| `--token` | Auth token (or set `HF_TOKEN` env var) | — |
## General Flags

| Flag | Description |
|---|---|
| `-h, --help` | Print help |
| `-V, --version` | Print version |

Subcommands accept their own flags (e.g., `--limit` for `search` and `discover`). Run `hf-fm <command> --help` for details.
## Download Diagnostics

hf-fetch-model emits structured `tracing` events at `debug` level to help diagnose download performance. In the CLI, use the `--verbose` / `-v` flag. For library users, initialize a `tracing-subscriber` at debug level (e.g., `RUST_LOG=hf_fetch_model=debug`):

```bash
# CLI — verbose flag (prints diagnostics to stderr)
hf-fm allenai/OLMo-1B-hf --verbose
```
Example output:

```text
DEBUG hf_fetch_model: listing repository files repo_id="allenai/OLMo-1B-hf"
DEBUG hf_fetch_model: metadata fetch succeeded files_with_size=8 total_files=8
DEBUG hf_fetch_model: download plan total_files=8 concurrency=4 connections_per_file=8 chunk_threshold_mib=100 chunked_enabled=true
DEBUG hf_fetch_model: chunked download (multi-connection) filename="model.safetensors" size_mib=2475 connections=8
DEBUG hf_fetch_model: single-connection download (below chunk threshold) filename="config.json" size_mib=0
DEBUG hf_fetch_model: download complete filename="model.safetensors" elapsed_secs="23.1" throughput_mbps="857.2"
DEBUG hf_fetch_model: download complete files_downloaded=8 files_failed=0 total_elapsed_secs="24.3"
```
Key diagnostics:

- `"metadata fetch failed"` (warning): file sizes are unknown, so chunked downloads are disabled — all files use single-connection download.
- `"single-connection download"` with reason `"file size unknown"`: metadata was not available for this file.
- `"chunked download"`: the file exceeds `chunk_threshold` and is being downloaded with multiple parallel HTTP Range connections.
- `throughput_mbps`: actual per-file throughput, useful for comparing single vs chunked performance.
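The arithmetic behind these diagnostics can be sketched in a few lines: how a file above the threshold splits into per-connection byte ranges, and how a `throughput_mbps`-style figure relates file size to elapsed time (MiB converted to megabits per second). This is an illustrative sketch under those assumptions, not the crate's actual code:

```rust
// Split [0, size) into `connections` contiguous byte ranges (inclusive ends),
// as an HTTP Range download would; the last range absorbs any remainder.
fn plan_ranges(size: u64, connections: u64) -> Vec<(u64, u64)> {
    let chunk = size / connections;
    (0..connections)
        .map(|i| {
            let start = i * chunk;
            let end = if i == connections - 1 { size - 1 } else { (i + 1) * chunk - 1 };
            (start, end)
        })
        .collect()
}

// MiB per elapsed second, expressed in megabits/s (x8 for bytes -> bits).
fn throughput_mbps(size_mib: f64, elapsed_secs: f64) -> f64 {
    size_mib * 8.0 / elapsed_secs
}

fn main() {
    // the 2475 MiB model.safetensors over 8 connections, as in the example output
    let size = 2475u64 * 1024 * 1024;
    let ranges = plan_ranges(size, 8);
    assert_eq!(ranges.len(), 8);
    assert_eq!(ranges[0].0, 0);
    assert_eq!(ranges[7].1, size - 1);
    // 2475 MiB in 23.1 s is roughly 857 Mb/s, consistent with the log line above
    let t = throughput_mbps(2475.0, 23.1);
    assert!((t - 857.1).abs() < 1.0);
    println!("ranges: {ranges:?}\nthroughput: {t:.1} Mb/s");
}
```

The even-split-with-remainder strategy keeps every connection busy for roughly the same time, which is what makes the per-file throughput comparison in the diagnostics meaningful.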
## Architecture

```text
candle-mi
  download_model() convenience fn
        │ optional dep (feature = "fast-download")
        ▼
hf-fetch-model
  • repo file listing
  • file filtering (glob patterns)
  • parallel file orchestration
  • multi-connection Range downloads (large files)
  • progress callbacks
  • checksum verification
  • resume / retry
  • cache diagnostics & model search
        │ dep
        ▼
hf-hub (tokio, .high())
  • single-connection download (.high() mode)
  • HF cache layout compatibility
  • auth token handling
```
## Configuration

| Builder method | Description | Default |
|---|---|---|
| `.revision(rev)` | Git revision | `"main"` |
| `.token(tok)` | Auth token | — |
| `.token_from_env()` | Read `HF_TOKEN` env var | — |
| `.filter(glob)` | Include pattern (repeatable) | all files |
| `.exclude(glob)` | Exclude pattern (repeatable) | none |
| `.concurrency(n)` | Parallel downloads | 4 |
| `.output_dir(path)` | Custom cache directory | HF default |
| `.timeout_per_file(dur)` | Per-file timeout | 300s |
| `.timeout_total(dur)` | Overall timeout | unlimited |
| `.max_retries(n)` | Retries per file | 3 |
| `.verify_checksums(bool)` | SHA256 verification | true |
| `.chunk_threshold(bytes)` | Min file size for multi-connection download | 100 MiB |
| `.connections_per_file(n)` | Parallel connections per large file | 8 |
| `.on_progress(closure)` | Progress callback | — |
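Putting several builder methods together; a sketch that assumes the obvious argument types (e.g., that timeouts take `std::time::Duration`), not a verified signature list:

```rust
use std::time::Duration;
use hf_fetch_model::FetchConfig;

let config = FetchConfig::builder()
    .revision("main")
    .token_from_env()
    .filter("*.safetensors")
    .concurrency(4)
    .timeout_per_file(Duration::from_secs(300))
    .max_retries(3)
    .verify_checksums(true)
    .chunk_threshold(100 * 1024 * 1024)
    .connections_per_file(8)
    .build()?;
```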
## Used by

- candle-mi — mechanistic interpretability toolkit for transformer models; uses hf-fetch-model for fast model downloads (optional `fast-download` feature).
## License

Licensed under either of the Apache License, Version 2.0 or the MIT License, at your option.