🔍 Hunch

A fast, offline media filename parser for Rust — extract title, year, season, episode, codec, language, and 49 properties from messy filenames.

Hunch is a Rust rewrite of Python's guessit. All 49 guessit properties are implemented. The engine uses a tokenizer-first, two-pass, TOML-driven architecture with linear-time regex only (ReDoS-immune).

Install

Homebrew (macOS / Linux)

brew install lijunzh/hunch/hunch

Cargo (from source)

cargo install hunch

Pre-built binaries

Download from GitHub Releases. Also supports cargo-binstall:

cargo binstall hunch

As a library

cargo add hunch

Usage

CLI

$ hunch "The.Walking.Dead.S05E03.720p.BluRay.x264-DEMAND.mkv"
{
  "container": "mkv",
  "episode": 3,
  "release_group": "DEMAND",
  "screen_size": "720p",
  "season": 5,
  "source": "Blu-ray",
  "title": "The Walking Dead",
  "type": "episode",
  "video_codec": "H.264"
}

Multiple files at once:

hunch "Movie.2024.1080p.mkv" "Show.S01E01.mkv"

Options:

-t, --type <TYPE>    Hint media type: "movie" or "episode"
-n, --name-only      Treat input as name only (no path separators)
-j, --json           Output compact JSON (default is pretty-printed)
-v, --verbose        Enable debug logging (see Logging below)

Library

use hunch::hunch;

fn main() {
    let result = hunch("The.Walking.Dead.S05E03.720p.BluRay.x264-DEMAND.mkv");
    assert_eq!(result.title(), Some("The Walking Dead"));
    assert_eq!(result.season(), Some(5));
    assert_eq!(result.episode(), Some(3));
    assert_eq!(result.source(), Some("Blu-ray"));
    assert_eq!(result.video_codec(), Some("H.264"));
    assert_eq!(result.release_group(), Some("DEMAND"));
    assert_eq!(result.container(), Some("mkv"));
}

Parsing with Options

use hunch::{Options, hunch_with};

// Hint the media type:
let result = hunch_with(
    "Show.S01E01.720p.HDTV.x264-LOL.mkv",
    Options::new().with_type("episode"),
);

// Parse a bare release name (no path handling):
let result = hunch_with(
    "Movie.2024.1080p.BluRay.x264-GROUP",
    Options::new().name_only(),
);

Logging

Hunch uses the log crate for structured diagnostic output. This is invaluable for debugging misparses — you can see exactly which pipeline stage matched, dropped, or promoted each property.

# CLI: --verbose enables debug-level logging
hunch -v "Movie.2024.1080p.BluRay.x264-GROUP.mkv"

# Fine-grained control via RUST_LOG
RUST_LOG=hunch=trace hunch "Movie.2024.1080p.mkv"

Level	What it shows
`debug`	Pipeline stage transitions, match counts, title/release group decisions
`trace`	Every individual match span, conflict resolution evictions, zone rule filtering

In library usage, attach any log-compatible subscriber (e.g., env_logger, tracing-log). When no subscriber is attached, all log calls compile to no-ops — zero runtime cost.

API Documentation

Full API docs are available on docs.rs/hunch.

All 49 Property variants are documented with example values. The HunchResult type provides typed accessors plus generic first()/all() methods.

guessit Compatibility

Hunch validates against guessit's own 1,309-case YAML test suite. All 49 guessit properties are implemented (3 intentionally diverged).

	guessit (Python)	hunch (Rust)
Overall pass rate	100%	81.7% (1,069 / 1,309)
Properties implemented	49	49 (3 intentionally diverged)
Properties at 95%+	49	22
Properties at 100%	49	16

96–100% accurate: year, video_codec, container, source, screen_size, audio_codec, crc32, color_depth, streaming_service, edition, frame_rate, aspect_ratio, size, version, date, proper_count, and more.

90%+ accurate: title, release_group, episode, season, audio_channels, type, website, film_title.

For per-property breakdowns see COMPATIBILITY.md.

Intentional divergences

Property	Reason
`audio_bit_rate` / `video_bit_rate`	Hunch uses a single `bit_rate`
`mimetype`	Trivially derived from `container`; redundant

Design

Hunch does not port guessit's rebulk engine. Instead:

Tokenize — split on separators, extract extension, detect brackets
Zone map — detect anchors (SxxExx, 720p, x264) to establish title zone vs tech zone boundaries
Pass 1: Match & Resolve — 20 TOML rule files (embedded at compile time) + algorithmic matchers. Conflict resolution by priority then length.
Pass 2: Extract — release group, title, and episode title run with access to resolved match positions from Pass 1.
Result — HunchResult with 49 typed property accessors

See ARCHITECTURE.md for the full design and decision log.

Project Structure

src/
├── lib.rs              # Public API: hunch(), hunch_with()
├── main.rs             # CLI binary (clap)
├── hunch_result.rs     # HunchResult type + JSON serialization
├── options.rs          # Configuration
├── zone_map.rs         # Structural zone analysis
├── tokenizer.rs        # Input → TokenStream
├── pipeline/           # Two-pass orchestration + logging
├── matcher/            # Conflict resolution + TOML rule engine
└── properties/         # 31 property matcher modules

rules/                  # 20 TOML data files (compile-time embedded)
tests/                  # Integration tests + guessit regression suite

Contributing

cargo test              # Run all tests (295 tests)
cargo test -- --ignored # Run guessit compatibility report
cargo bench             # Run benchmarks
cargo doc --open        # Build and browse API docs locally

See CONTRIBUTING.md for more details.

License

MIT

hunch 1.1.2