🔍 Hunch
A fast, offline media filename parser for Rust: extract title, year, season, episode, codec, language, and more (49 properties in total) from messy filenames.
Hunch is a Rust rewrite of Python's guessit. All 49 guessit properties are implemented. The engine uses a tokenizer-first, two-pass, TOML-driven architecture with linear-time regex only (ReDoS-immune).
Install
Homebrew (macOS / Linux)
Cargo (from source)
Pre-built binaries
Download from GitHub Releases.
Also supports cargo-binstall:
As a library
Usage
CLI
{
}
Multiple files at once:
Options:
-t, --type <TYPE> Hint media type: "movie" or "episode"
-n, --name-only Treat input as name only (no path separators)
-j, --json Output compact JSON (default is pretty-printed)
-v, --verbose Enable debug logging (see Logging below)
Library
use hunch;
Parsing with Options
use ;
// Hint the media type:
let result = hunch_with;
// Parse a bare release name (no path handling):
let result = hunch_with;
Logging
Hunch uses the log crate for structured diagnostic
output. This is invaluable for debugging misparses — you can see exactly
which pipeline stage matched, dropped, or promoted each property.
# CLI: --verbose enables debug-level logging
# Fine-grained control via RUST_LOG
RUST_LOG=hunch=trace
| Level | What it shows |
|---|---|
| debug | Pipeline stage transitions, match counts, title/release group decisions |
| trace | Every individual match span, conflict resolution evictions, zone rule filtering |
In library usage, install any log-compatible logger (e.g., env_logger,
tracing-log). When no logger is installed, log calls fall back to a no-op
implementation, so the overhead is limited to a cheap level check per call.
API Documentation
Full API docs are available on docs.rs/hunch.
All 49 Property variants are documented with example values. The HunchResult type provides typed accessors plus generic first()/all() methods.
guessit Compatibility
Hunch validates against guessit's own 1,309-case YAML test suite. All 49 guessit properties are implemented (3 intentionally diverged).
| | guessit (Python) | hunch (Rust) |
|---|---|---|
| Overall pass rate | 100% | 81.7% (1,069 / 1,309) |
| Properties implemented | 49 | 49 (3 intentionally diverged) |
| Properties at 95%+ | 49 | 22 |
| Properties at 100% | 49 | 16 |
96–100% accurate: year, video_codec, container, source, screen_size, audio_codec, crc32, color_depth, streaming_service, edition, frame_rate, aspect_ratio, size, version, date, proper_count, and more.
90%+ accurate: title, release_group, episode, season, audio_channels, type, website, film_title.
For per-property breakdowns see COMPATIBILITY.md.
Intentional divergences
| Property | Reason |
|---|---|
| audio_bit_rate / video_bit_rate | Hunch uses a single bit_rate |
| mimetype | Trivially derived from container; redundant |
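If you still need a MIME type, it can be derived from the container value on the caller's side. The sketch below is an illustration only: `mimetype_for_container` is a hypothetical helper, not part of Hunch's API, and the mapping covers just a handful of common containers.

```rust
/// Hypothetical helper (not part of Hunch): map a container/extension
/// string to a commonly used MIME type. Returns None for unknown containers.
fn mimetype_for_container(container: &str) -> Option<&'static str> {
    match container.to_ascii_lowercase().as_str() {
        "mkv" => Some("video/x-matroska"),
        "mp4" => Some("video/mp4"),
        "avi" => Some("video/x-msvideo"),
        "webm" => Some("video/webm"),
        "mov" => Some("video/quicktime"),
        _ => None,
    }
}
```

Calling `mimetype_for_container("MKV")` yields `Some("video/x-matroska")`; unknown containers yield `None`, so the caller decides on a fallback.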
Design
Hunch does not port guessit's rebulk engine. Instead:
- Tokenize — split on separators, extract extension, detect brackets
- Zone map — detect anchors (SxxExx, 720p, x264) to establish title zone vs tech zone boundaries
- Pass 1: Match & Resolve — 20 TOML rule files (embedded at compile time) + algorithmic matchers. Conflict resolution by priority then length.
- Pass 2: Extract — release group, title, and episode title run with access to resolved match positions from Pass 1.
- Result — HunchResult with 49 typed property accessors
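To make the first two steps concrete, here is a minimal sketch of tokenizing a filename and locating the title/tech boundary. It is not Hunch's actual code: the `Token` type, the separator set, the extension list, and the anchor checks are all assumptions chosen for illustration, and the filename in the usage note is made up.

```rust
/// One piece of the file name, plus whether it sat inside square brackets.
#[derive(Debug, PartialEq)]
struct Token {
    text: String,
    bracketed: bool,
}

/// Illustrative extension list (an assumption, not Hunch's real set).
fn is_known_extension(ext: &str) -> bool {
    matches!(ext.to_ascii_lowercase().as_str(), "mkv" | "mp4" | "avi" | "srt")
}

/// Move the pending text (if any) into the token list.
fn push_token(tokens: &mut Vec<Token>, current: &mut String, bracketed: bool) {
    if !current.is_empty() {
        tokens.push(Token { text: std::mem::take(current), bracketed });
    }
}

/// Split a file name into tokens and an optional lowercase extension.
/// Separators assumed here: '.', '_', '-', and space.
fn tokenize(name: &str) -> (Vec<Token>, Option<String>) {
    let (stem, ext) = match name.rsplit_once('.') {
        Some((stem, ext)) if is_known_extension(ext) => (stem, Some(ext.to_ascii_lowercase())),
        _ => (name, None),
    };
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut in_brackets = false;
    for c in stem.chars() {
        match c {
            // Brackets end the pending token and toggle the bracket flag.
            '[' | ']' => {
                push_token(&mut tokens, &mut current, in_brackets);
                in_brackets = c == '[';
            }
            '.' | '_' | '-' | ' ' => push_token(&mut tokens, &mut current, in_brackets),
            _ => current.push(c),
        }
    }
    push_token(&mut tokens, &mut current, in_brackets);
    (tokens, ext)
}

/// Simplified anchor test: SxxExx, a resolution like 720p, or a codec name.
fn is_anchor(token: &str) -> bool {
    let t = token.to_ascii_lowercase();
    let b = t.as_bytes();
    let is_sxxexx = b.len() == 6
        && b[0] == b's'
        && b[3] == b'e'
        && [b[1], b[2], b[4], b[5]].iter().all(|d| d.is_ascii_digit());
    let is_resolution = t
        .strip_suffix('p')
        .map_or(false, |digits| digits.len() >= 3 && digits.bytes().all(|d| d.is_ascii_digit()));
    let is_codec = matches!(t.as_str(), "x264" | "x265" | "h264" | "h265");
    is_sxxexx || is_resolution || is_codec
}

/// Index of the first anchor token; tokens before it form the title zone.
fn title_zone_end(tokens: &[Token]) -> usize {
    tokens
        .iter()
        .position(|t| is_anchor(&t.text))
        .unwrap_or(tokens.len())
}
```

For an input such as `The.Walking.Dead.S05E03.720p.BluRay.x264-DEMAND.mkv`, `tokenize` returns eight tokens and the extension `mkv`, and `title_zone_end` returns 3, so the title zone is `The Walking Dead`. A real zone map would need many more anchor kinds than the three checked here.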
See ARCHITECTURE.md for the full design and decision log.
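As a rough illustration of "conflict resolution by priority then length" in Pass 1, the sketch below keeps the strongest non-overlapping matches. The `Match` struct, the "higher priority wins" convention, and the greedy strategy are assumptions made for this example; the actual resolver may differ.

```rust
/// A candidate match over a byte range of the input (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Match {
    property: &'static str,
    start: usize,
    end: usize,    // exclusive
    priority: i32, // assumption: a higher value wins
}

impl Match {
    fn span_len(&self) -> usize {
        self.end - self.start
    }

    fn overlaps(&self, other: &Match) -> bool {
        self.start < other.end && other.start < self.end
    }
}

/// Greedy resolution: sort by priority (descending), then by span length
/// (descending), and keep each match only if it overlaps nothing kept so far.
fn resolve(mut candidates: Vec<Match>) -> Vec<Match> {
    candidates.sort_by(|a, b| {
        b.priority
            .cmp(&a.priority)
            .then(b.span_len().cmp(&a.span_len()))
    });
    let mut kept: Vec<Match> = Vec::new();
    for m in candidates {
        if kept.iter().all(|k| !k.overlaps(&m)) {
            kept.push(m);
        }
    }
    // Return survivors in input order for later passes.
    kept.sort_by_key(|m| m.start);
    kept
}
```

With this ordering, a short high-priority match evicts a longer low-priority one that overlaps it, and among equal priorities the longer span survives.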
Project Structure
src/
├── lib.rs # Public API: hunch(), hunch_with()
├── main.rs # CLI binary (clap)
├── hunch_result.rs # HunchResult type + JSON serialization
├── options.rs # Configuration
├── zone_map.rs # Structural zone analysis
├── tokenizer.rs # Input → TokenStream
├── pipeline/ # Two-pass orchestration + logging
├── matcher/ # Conflict resolution + TOML rule engine
└── properties/ # 31 property matcher modules
rules/ # 20 TOML data files (compile-time embedded)
tests/ # Integration tests + guessit regression suite
Contributing
See CONTRIBUTING.md for more details.