dialog_detective 1.1.0


DialogDetective

Automatically identify and rename unknown TV series video files by letting AI listen to their dialogue.

Why I Built This

I sometimes rip TV series from my Blu-ray/DVD collection to have them available for easier binge-watching. Unfortunately, the structure of those disc releases is often completely non-linear: you get files like TITLE_01.mkv, TITLE_03.mkv, TITLE_07.mkv with no clear indication of which episode is which.

I didn't want to manually map these weird title IDs to actual season and episode numbers. That would require me to watch a bit of each file and guess based on episode summaries from TV databases. I thought modern LLMs should be able to do this for me. A quick prototype later, it turned out they can.

So I created DialogDetective to do this work automatically. If you have the same problem, this tool might help you too.

How It Works

DialogDetective extracts audio from your video files, transcribes the dialogue using Whisper (with GPU acceleration), fetches episode metadata from TVMaze, and uses an LLM (Gemini or Claude) to match the transcript to the correct episode. Then it renames or copies the files with proper episode information.
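Conceptually, the audio-extraction step corresponds to an FFmpeg invocation like the one below. This is a sketch with assumed flags, not necessarily the tool's exact command; Whisper models expect 16 kHz mono PCM audio, which these options produce.

```shell
# Extract a 16 kHz mono WAV from a ripped title for transcription.
# -vn drops the video stream; -ac 1 downmixes to mono; -ar 16000
# resamples to the rate Whisper expects. Filenames are illustrative.
ffmpeg -i TITLE_01.mkv -vn -ac 1 -ar 16000 TITLE_01.wav
```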

Installation

cargo install --path .

Pre-built Binaries

Pre-built binaries are available on the GitHub Releases page:

  • macOS (Apple Silicon & Intel): Built with Metal GPU acceleration
  • Linux (x86_64 & aarch64): Built with CPU-only Whisper
  • Windows (x86_64): Built with CPU-only Whisper

GPU Acceleration

DialogDetective uses whisper-rs for speech-to-text, which supports various GPU backends for faster transcription.

Default builds:

  • macOS: Metal (Apple GPU) - enabled automatically
  • Linux/Windows: CPU-only

Building with GPU support (Linux/Windows):

If you have the required GPU frameworks installed, you can build with GPU acceleration:

# NVIDIA CUDA (requires CUDA toolkit)
cargo build --release --features cuda

# Vulkan (requires Vulkan SDK)
cargo build --release --features vulkan

# AMD ROCm/hipBLAS (requires ROCm)
cargo build --release --features hipblas

See the whisper-rs documentation for detailed requirements for each GPU backend.

Prerequisites

  • Rust toolchain (install from rustup.rs)
  • FFmpeg - Must be installed and available in your PATH for audio extraction
    • macOS: brew install ffmpeg
    • Ubuntu/Debian: apt install ffmpeg
    • Windows: Download from ffmpeg.org
  • AI CLI: Gemini CLI (default) or Claude Code
    • Must be installed and authenticated before use

Whisper models are downloaded automatically on first run.

Quick Start

# Dry run - see what would happen (recommended first step)
# Limiting processing to specific seasons is strongly recommended; see Season Filtering below.
dialog_detective ./videos "The Flash" -s 1

# Rename files in place
dialog_detective ./videos "The Flash" --mode rename -s 1

# Copy files to organized directory
dialog_detective ./videos "The Flash" --mode copy -o ./organized -s 1

# Select different Whisper model (default: base)
dialog_detective ./videos "The Flash" --model large-v3-turbo -s 1

# See all available options
dialog_detective --help

Season Filtering (Highly Encouraged!)

DialogDetective can work on full series without season filtering, but using -s or --season is highly encouraged for several important reasons:

  • Reduces LLM context size - Only sends relevant episodes to the AI instead of the entire series
  • Improves matching accuracy - Fewer episodes means less confusion and better identification
  • Saves tokens - Significantly reduces API costs, especially for long-running series
  • Faster processing - Less data to send and analyze

Important: The season filter limits the matching scope. If you specify -s 1 and a video file is actually from season 2, it will likely be mismatched to a season 1 episode. Only use season filtering when you know all your video files belong to the specified season(s).

Since you're typically processing a single season at a time when ripping discs, specifying the correct season makes the tool much more effective: pass -s 1 (or the long form --season 1), and repeat the flag to include multiple seasons.

Usage

Run dialog_detective --help for complete usage information.

Important options:

  • -s / --season - Filter to specific season (highly encouraged, can be repeated)
  • --model - Select Whisper model
  • --matcher - AI backend: gemini (default) or claude
  • --mode - Operation: dry-run (default), rename, or copy
  • --list-models - Show all available Whisper models
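These flags compose freely. For example, a dry run restricted to the first two seasons, using a larger Whisper model and the Claude backend (all flags as listed above; values illustrative):

```shell
dialog_detective ./videos "The Flash" -s 1 -s 2 \
  --model large-v3-turbo --matcher claude
```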

Filename Templates

Use --format to customize output filenames.

Default: {show} - S{season:02}E{episode:02} - {title}.{ext}

Available variables:

  • {show} - Series name
  • {season} - Season number (use {season:02} for zero-padding)
  • {episode} - Episode number (use {episode:02} for zero-padding)
  • {title} - Episode title
  • {ext} - Original file extension
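The template variables map directly onto ordinary format-string substitution. A minimal Python sketch of how the default template expands (illustrative only, not the tool's actual implementation; the episode values are made up):

```python
# Default DialogDetective filename template, expanded with Python's
# str.format. {season:02} and {episode:02} zero-pad to two digits.
template = "{show} - S{season:02}E{episode:02} - {title}.{ext}"

name = template.format(
    show="The Flash",       # series name from TVMaze
    season=1,               # matched season number
    episode=3,              # matched episode number
    title="Episode Title",  # hypothetical episode title
    ext="mkv",              # original file extension
)
print(name)  # → The Flash - S01E03 - Episode Title.mkv
```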

Example:

dialog_detective ./videos "The Flash" -s 1 \
  --format "{show} S{season:02}E{episode:02} {title}.{ext}"

AI Backend Integration

DialogDetective currently uses CLI tools for LLM access (Gemini CLI and Claude Code). This was the easiest way for me to quickly support LLMs, as I already had both tools installed and authenticated on my system.

The interface is abstracted enough to easily add direct API access via API keys (OpenAI, Anthropic, etc.) if there's demand for it. If you need direct API support, feel free to reach out or submit a PR - contributions are highly welcome!

Cache & Storage

DialogDetective caches various data to avoid redundant processing and speed up repeated runs.

Cache Location

All cached data is stored in a platform-specific cache directory:

  • macOS: ~/Library/Caches/de.westhoffswelt.dialogdetective/
  • Linux: ~/.cache/dialogdetective/
  • Windows: %LOCALAPPDATA%\dialogdetective\

What Gets Cached

  • Whisper Models (models/, kept permanently) - Models are large (39 MB to 2.9 GB) and don't change. Downloaded once from Hugging Face on first use.
  • Series Metadata (metadata/, 24-hour TTL) - Episode lists from TVMaze rarely change; caching reduces API calls and speeds up repeated runs on the same show.
  • Transcripts (transcripts/, 24-hour TTL) - Whisper transcription is CPU/GPU intensive; caching by video file hash means re-running on the same files skips transcription entirely.
  • Match Results (matching/, 24-hour TTL) - LLM matching costs tokens and time; results are cached by a composite key (video hash + show + seasons + matcher), so identical queries return instantly.

The 24-hour TTL balances freshness with efficiency. If you need to force a refresh (e.g., after TVMaze updates episode data), simply delete the relevant cache subdirectory.
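The composite key for match results can be pictured as a hash over all of the query's inputs, so a change to any one of them produces a fresh cache entry. This is a sketch of the idea only; the tool's actual key format is internal and may differ.

```python
import hashlib

def match_cache_key(video_hash: str, show: str,
                    seasons: list[int], matcher: str) -> str:
    """Illustrative composite cache key: changing the video file, show
    name, season filter, or AI backend yields a different key, so a
    cached result is never reused for a different query."""
    parts = [video_hash, show, ",".join(map(str, sorted(seasons))), matcher]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

key1 = match_cache_key("abc123", "The Flash", [1], "gemini")
key2 = match_cache_key("abc123", "The Flash", [1, 2], "gemini")
assert key1 != key2  # different season filter -> different cache entry
```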

Temporary Files

During processing, DialogDetective extracts audio to temporary WAV files in your system's temp directory (/tmp, /var/folders/..., or %TEMP%). These files are automatically cleaned up when processing completes or if the program is interrupted.

Managing Cache

To clear all cached data:

# macOS
rm -rf ~/Library/Caches/de.westhoffswelt.dialogdetective/

# Linux
rm -rf ~/.cache/dialogdetective/

To clear only models (to free disk space):

# macOS
rm -rf ~/Library/Caches/de.westhoffswelt.dialogdetective/models/

# Linux
rm -rf ~/.cache/dialogdetective/models/

Use dialog_detective --list-models to see which models are currently cached and their sizes.

License

MIT License - see LICENSE file for details.