🔍 Hunch
🤖 This repository is entirely AI-generated with human guidance. All code, tests, and documentation were produced by AI under the direction of a human collaborator.
A Rust port of Python's guessit for extracting media metadata from filenames.
⚠️ Work in progress. Hunch currently passes 77.3% of guessit's own 1,309-case test suite (1,012 / 1,309). Core properties like video codec, screen size, source, audio codec, and year are 95–99% accurate. Title, episode, language, and 40+ other properties are steadily improving.
fancy_regex has been fully removed: all regex is now linear-time (ReDoS-immune). See COMPATIBILITY.md for the full breakdown.
Hunch extracts title, year, season, episode, resolution, codec, language, and 40+ other properties from messy media filenames — the same job guessit does, rewritten from scratch for Rust.
Quick Start
As a library
use hunch;
As a CLI tool
{
}
guessit Compatibility
Hunch is a port of guessit. All 49 of guessit's properties are implemented. We validate against guessit's own YAML test suite:
| | guessit (Python) | hunch (Rust) |
|---|---|---|
| Overall pass rate | 100% (by definition) | 75.1% (983 / 1,309) |
| Properties implemented | 49 | 43 |
| Properties at 90%+ | 49 | 24 |
| Properties at 100% | 49 | 12 |
Where hunch matches guessit (96–100% accuracy): year, video_codec, container, source, screen_size, crc32, color_depth, streaming_service, bonus, film, aspect_ratio, size, edition, episode_details, version, frame_rate, episode_count, season_count.
Where hunch diverges (<70% accuracy): episode_title (62%), language (67%), video_profile (57%), bonus_title (62%), alternative_title (0%).
For per-property breakdowns, per-file results, and known gaps, see COMPATIBILITY.md.
Design
Hunch does not port guessit's rebulk engine. Instead it uses a
simpler span-based architecture:
- Match: 29 property matcher functions scan the input independently and produce MatchSpans (start, end, property, value) with priorities.
- Resolve: Overlapping spans are resolved by priority, then by length (longer matches win ties).
- Extract: Title is inferred from the largest unclaimed region before the first technical property. Media type and proper count are computed as derived values and set directly on the result.
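The Resolve step can be illustrated with a minimal sketch. This is not hunch's actual code: the `MatchSpan` fields, the `resolve` function name, and the greedy strategy (sort by priority, then length, keep whatever does not overlap an already-kept span) are assumptions chosen to match the description above.

```rust
/// Hypothetical stand-in for hunch's MatchSpan; the field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct MatchSpan {
    pub start: usize,
    pub end: usize, // exclusive
    pub property: &'static str,
    pub value: String,
    pub priority: i32,
}

impl MatchSpan {
    fn len(&self) -> usize {
        self.end - self.start
    }

    /// Two half-open ranges overlap when each starts before the other ends.
    fn overlaps(&self, other: &MatchSpan) -> bool {
        self.start < other.end && other.start < self.end
    }
}

/// Greedy conflict resolution: higher priority first, longer span wins ties.
/// A span is kept only if it overlaps nothing that was already kept.
pub fn resolve(mut spans: Vec<MatchSpan>) -> Vec<MatchSpan> {
    spans.sort_by(|a, b| {
        b.priority
            .cmp(&a.priority)
            .then(b.len().cmp(&a.len()))
            .then(a.start.cmp(&b.start))
    });

    let mut kept: Vec<MatchSpan> = Vec::new();
    for span in spans {
        if kept.iter().all(|k| !k.overlaps(&span)) {
            kept.push(span);
        }
    }

    // Return survivors in input order so later steps can walk left to right.
    kept.sort_by_key(|s| s.start);
    kept
}
```

Because each matcher runs independently, nothing in this scheme needs to know about any other matcher; all arbitration happens in one place after matching is done.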
Input: "The.Walking.Dead.S05E03.720p.BluRay.x264-DEMAND.mkv"
│
├─ 1. Run 29 matcher functions → Vec<MatchSpan>
├─ 2. Resolve conflicts (priority, then length)
├─ 3. Extract title from unclaimed leading region
├─ 4. Set computed properties (media type, proper count)
└─ 5. Build HunchResult (BTreeMap<Property, Vec<String>>)
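Step 3 above (title extraction from the unclaimed leading region) might look roughly like the following. This is a sketch under stated assumptions, not the code in title.rs: the function names, the `(start, end)` byte-range representation, the `limit` parameter (the start offset of the first technical property), and the separator cleanup (dots and underscores become spaces) are all hypothetical.

```rust
/// Returns the longest gap in `0..limit` that no claimed range covers.
/// `claimed` holds (start, end) byte ranges with exclusive ends.
/// On ties, the earliest gap wins.
pub fn largest_unclaimed(claimed: &[(usize, usize)], limit: usize) -> Option<(usize, usize)> {
    let mut ranges: Vec<(usize, usize)> = claimed
        .iter()
        .copied()
        .filter(|&(start, _)| start < limit)
        .collect();
    ranges.sort();

    let mut gaps = Vec::new();
    let mut cursor = 0;
    for (start, end) in ranges {
        if start > cursor {
            gaps.push((cursor, start));
        }
        cursor = cursor.max(end);
    }
    if cursor < limit {
        gaps.push((cursor, limit));
    }

    // max_by_key returns the last maximum, so reverse to prefer the earliest gap.
    gaps.into_iter().rev().max_by_key(|&(s, e)| e - s)
}

/// Cleans the chosen region into a title candidate.
/// Assumes all offsets fall on UTF-8 character boundaries.
pub fn extract_title(input: &str, claimed: &[(usize, usize)], limit: usize) -> Option<String> {
    let (start, end) = largest_unclaimed(claimed, limit.min(input.len()))?;
    let cleaned = input[start..end].replace(|c: char| c == '.' || c == '_', " ");
    let title = cleaned.trim();
    if title.is_empty() {
        None
    } else {
        Some(title.to_string())
    }
}
```

For the example input in the diagram, if the episode marker `S05E03` is claimed at bytes 17..23 and treated as the first technical property, the only unclaimed region before it is `The.Walking.Dead.`, which cleans up to `The Walking Dead`.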
Project Structure
src/
├── lib.rs # Public API: hunch(), hunch_with()
├── main.rs # CLI binary (clap)
├── hunch_result.rs # HunchResult type + JSON serialization
├── options.rs # Configuration (media type hint, name-only mode)
├── pipeline.rs # Orchestration: matchers → resolve → extract
├── matcher/
│ ├── span.rs # MatchSpan + Property enum (42 variants)
│ ├── engine.rs # Conflict resolution (free function)
│ └── regex_utils.rs # ValuePattern helper for fancy_regex
└── properties/ # 29 matcher functions (one module each)
  ├── title.rs # Title extraction + media type inference
  ├── episodes.rs # Season/episode detection (S01E02, 1x03, etc.)
  ├── year.rs, source.rs, video_codec.rs, ...
  └── mod.rs # Module re-exports
tests/
├── integration.rs # 27 hand-written end-to-end tests
├── guessit_regression.rs # 22 regression suites + compatibility report
├── helpers/mod.rs # Custom YAML fixture parser
└── fixtures/ # Copied from guessit (self-contained)
benches/
└── parse.rs # Criterion benchmarks
License
MIT