hunch 1.0.0

A Rust port of guessit — extract media metadata from filenames
Documentation

🔍 Hunch

A fast, offline media filename parser for Rust — extract title, year, season, episode, codec, language, and 40+ other properties from messy filenames.

Hunch is a Rust rewrite of Python's guessit. All 49 guessit properties are implemented. The engine uses a tokenizer-first, two-pass, TOML-driven architecture with linear-time regex only (ReDoS-immune).

Install

Homebrew (macOS / Linux)

brew install lijunzh/hunch/hunch

Cargo (from source)

cargo install hunch

Pre-built binaries

Download from GitHub Releases. Also supports cargo-binstall:

cargo binstall hunch

As a library

cargo add hunch

Usage

CLI

$ hunch "The.Walking.Dead.S05E03.720p.BluRay.x264-DEMAND.mkv"
{
  "container": "mkv",
  "episode": 3,
  "release_group": "DEMAND",
  "screen_size": "720p",
  "season": 5,
  "source": "Blu-ray",
  "title": "The Walking Dead",
  "type": "episode",
  "video_codec": "H.264"
}

Multiple files at once:

hunch "Movie.2024.1080p.mkv" "Show.S01E01.mkv"

Options:

-t, --type <TYPE>    Hint media type: "movie" or "episode"
-n, --name-only      Treat input as name only (no path separators)
-j, --json           Output compact JSON (default is pretty-printed)

Library

use hunch::hunch;

fn main() {
    let result = hunch("The.Walking.Dead.S05E03.720p.BluRay.x264-DEMAND.mkv");
    println!("{:#?}", result);
    // HunchResult {
    //   title: Some("The Walking Dead"),
    //   season: Some(5),
    //   episode: Some(3),
    //   screen_size: Some("720p"),
    //   source: Some("Blu-ray"),
    //   video_codec: Some("H.264"),
    //   release_group: Some("DEMAND"),
    //   container: Some("mkv"),
    //   media_type: Episode,
    //   ...
    // }
}

guessit Compatibility

Hunch validates against guessit's own 1,309-case YAML test suite. All 49 guessit properties are implemented (3 intentionally diverged).

guessit (Python) hunch (Rust)
Overall pass rate 100% 79.1% (1,036 / 1,309)
Properties implemented 49 49 (3 intentionally diverged)
Properties at 95%+ 49 22
Properties at 100% 49 16

96–100% accurate: year, video_codec, container, source, screen_size, audio_codec, crc32, color_depth, streaming_service, edition, frame_rate, aspect_ratio, size, version, date, proper_count, and more.

90%+ accurate: title, release_group, episode, season, audio_channels, type, website, film_title.

For per-property breakdowns see COMPATIBILITY.md.

Intentional divergences

Property Reason
audio_bit_rate / video_bit_rate Hunch uses a single bit_rate
mimetype Trivially derived from container; redundant

Design

Hunch does not port guessit's rebulk engine. Instead:

  1. Tokenize — split on separators, extract extension, detect brackets
  2. Zone map — detect anchors (SxxExx, 720p, x264) to establish title zone vs tech zone boundaries
  3. Pass 1: Match & Resolve — 20 TOML rule files (embedded at compile time) + algorithmic matchers. Conflict resolution by priority then length.
  4. Pass 2: Extract — release group, title, and episode title run with access to resolved match positions from Pass 1.
  5. ResultHunchResult with 49 typed property accessors

See ARCHITECTURE.md for the full design and decision log.

Project Structure

src/
├── lib.rs              # Public API: hunch(), hunch_with()
├── main.rs             # CLI binary (clap)
├── hunch_result.rs     # HunchResult type + JSON serialization
├── options.rs          # Configuration
├── zone_map.rs         # Structural zone analysis
├── tokenizer.rs        # Input → TokenStream
├── pipeline/           # Two-pass orchestration
├── matcher/            # Conflict resolution + TOML rule engine
└── properties/         # 31 property matcher modules

rules/                  # 20 TOML data files (compile-time embedded)
tests/                  # Integration tests + guessit regression suite

Contributing

cargo test              # Run all tests
cargo test -- --ignored # Run guessit compatibility report
cargo bench             # Run benchmarks

License

MIT