hunch 1.1.7

A Rust port of guessit β€” extract media metadata from filenames
Documentation

πŸ” Hunch

A fast, offline media filename parser for Rust β€” extract title, year, season, episode, codec, language, and 49 properties from messy filenames.

Hunch is a Rust rewrite of Python's guessit. Pure, deterministic, single-binary, linear-time regex only (ReDoS-immune).

Quick Start

# Install
brew install lijunzh/hunch/hunch   # macOS/Linux
cargo install hunch                 # from source
cargo binstall hunch                # pre-built binary

CLI

$ hunch "The.Walking.Dead.S05E03.720p.BluRay.x264-DEMAND.mkv"
{
  "container": "mkv",
  "episode": 3,
  "release_group": "DEMAND",
  "screen_size": "720p",
  "season": 5,
  "source": "Blu-ray",
  "title": "The Walking Dead",
  "type": "episode",
  "video_codec": "H.264"
}

Library

use hunch::hunch;

fn main() {
    let result = hunch("The.Walking.Dead.S05E03.720p.BluRay.x264-DEMAND.mkv");
    assert_eq!(result.title(), Some("The Walking Dead"));
    assert_eq!(result.season(), Some(5));
    assert_eq!(result.episode(), Some(3));
}

Cross-file context

For CJK, anime, or ambiguous filenames, hunch uses sibling files and directory names as context for better title extraction and type detection.

# Single file with sibling context:
hunch --context ./Season1/ "(BD)εδΊŒε›½θ¨˜ 第13θ©±(1440x1080 x264-10bpp flac).mkv"

# Batch-parse a single directory:
hunch --batch ./Season1/ --json

# Batch-parse an entire media library (recommended):
hunch --batch /path/to/tv/ -r -j

πŸ’‘ Tip: Always use --batch -r from your library root (e.g., tv/, movies/) rather than running --batch on each leaf directory individually. The -r flag preserves full relative paths like tv/Anime/Show/Extra/Menu.mkv, giving the parser critical context from directory names (tv/, Anime/, Season 1/) for accurate type detection. Without -r, files in deep subdirectories lose their path context and bonus content (SP, OVA, NCED) may be misclassified.

Documentation

Document Audience Content
User Manual Users Install, CLI, library API, all 49 properties, FAQ
Design Contributors Principles, architecture, key decisions
Compatibility Everyone guessit test suite pass rates by property
API Reference Developers Full Rust API docs
Changelog Everyone Version history

guessit Compatibility

All 49 guessit properties implemented. Validated against guessit's 1,309-case test suite.

Metric Value
Pass rate 81.8% (1,071 / 1,309)
Properties at 95%+ 22
Properties at 100% 16

See docs/compatibility.md for per-property breakdowns.

Known Limitations

In one real-world library audit of 7,838 files, hunch achieved 99.8% accuracy across a mixed Anime / English / Japanese / Kids collection. The remaining failures fall into a small number of edge-case categories that are difficult to solve reliably with a deterministic, offline filename parser.

These examples illustrate the main categories of remaining failures rather than an exhaustive list of every individual filename.

Bonus content without episode numbers

Files in bonus directories such as Bonus/ or η‰Ήε…Έζ˜ εƒ/ that contain no numeric episode marker may still be classified as episode with no episode number. Hunch recognizes these directory names for title cleanup but does not currently infer type=extra from directory names alone.

tv/Anime/.../η‰Ήε…Έζ˜ εƒ/[DBD-Raws][Natsume Yuujinchou Shichi][ε£°ε„ͺγƒˆγƒΌγ‚―γ‚·γƒ§γƒΌ][1080P][BDRip][HEVC-10bit][FLAC].mkv
  β†’ type=episode, episode=None  (expected: type=extra)

tv/English/Power Rangers/17 - Power Rangers RPM/Bonus/Power Rangers RPM - Stuntman Behind The Scenes (Japanese).mp4
  β†’ type=episode, episode=None  (expected: type=extra)

Why this remains difficult: directory names are useful context, but using them alone to infer type=extra would require an open-ended set of library-specific rules (Extras/, Featurettes/, Behind the Scenes/, Making Of/, etc.), increasing regression risk across other collections.

Sample / preview clips

Verification clips such as Sample1.mkv inside Samples/ directories may have their digits interpreted as episode numbers.

movie/.../Samples/Sample1.mkv
  β†’ type=episode, episode=1  (expected: not real media content)

Why this is low priority: sample files are typically release artifacts rather than meaningful library entries. Reliable detection would require special-casing many filename and directory conventions that vary across release groups.

Ambiguous special / episode cross-references

Some filenames contain both special markers (SP) and episode markers (EP), where the episode number refers to a related TV episode rather than the file itself.

movie/.../[Detective Conan][Tokuten BD][SP02][TV Series EP1080][BDRIP][1080P][H264_FLAC].mkv
  β†’ type=episode, episode=1080  (EP1080 is a cross-reference, not this file's episode)

Why this remains difficult: distinguishing "this file is episode 1080" from "this file references episode 1080" requires semantic understanding beyond hunch's current deterministic filename heuristics.

Malformed filenames

Genuinely malformed inputs such as 1.The.mkv.mkv can still produce poor results.

Why this is not prioritized: hunch assumes filenames contain at least some recoverable structure. Severely malformed input is treated as garbage-in, garbage-out.

Contributing

See CONTRIBUTING.md. The easiest contribution is reporting a failed parse.

cargo test              # 295 tests
cargo test -- --ignored # guessit compatibility report
cargo bench             # benchmarks

License

MIT