Module matcher

AI-powered subtitle file matching and discovery engine.

This module matches subtitle files with their corresponding video files using AI analysis, language detection, and filename pattern recognition. It handles multiple subtitle languages per video, season/episode structures, and varied naming conventions.

§Core Features

§Intelligent File Discovery

  • Recursive Search: Traverses directory structures to find media and subtitle files
  • Format Detection: Automatically identifies video and subtitle file formats
  • Pattern Recognition: Understands common naming patterns and conventions
  • Language Detection: Identifies subtitle languages from filenames and content

§AI-Powered Matching

  • Semantic Analysis: Uses AI to understand filename semantics beyond patterns
  • Content Correlation: Matches based on content similarity and timing patterns
  • Multi-Language Support: Handles subtitle files in different languages
  • Confidence Scoring: Provides match confidence levels for user validation

§Advanced Matching Algorithms

  • Fuzzy Matching: Tolerates variations in naming conventions
  • Episode Detection: Recognizes season/episode patterns in TV series
  • Quality Assessment: Evaluates subtitle quality and completeness
  • Conflict Resolution: Handles multiple subtitle candidates intelligently

§Architecture Overview

The matching system consists of several interconnected components:

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Discovery     │────│   AI Analysis    │────│   Match Engine  │
│   - Find files  │    │   - Semantic     │    │   - Score calc  │
│   - Language    │    │   - Content      │    │   - Validation  │
│   - Metadata    │    │   - Confidence   │    │   - Ranking     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
        │                        │                        │
        └────────────────────────┼────────────────────────┘
                                 │
                   ┌─────────────────────────┐
                   │       Cache System      │
                   │   - Analysis results    │
                   │   - Match history       │
                   │   - Performance data    │
                   └─────────────────────────┘

§Usage Examples

§Basic File Matching

use subx_cli::core::matcher::{MatchEngine, MatchConfig, FileDiscovery};
use std::path::Path;

// Configure matching parameters
let config = MatchConfig {
    confidence_threshold: 0.8,
    dry_run: false,
    ai_provider: Some("openai".to_string()),
    ..Default::default()
};

// Initialize the matching engine
let engine = MatchEngine::new(config);

// Discover files in directories
let discovery = FileDiscovery::new();
let video_files = discovery.find_media_files(Path::new("/videos"))?;
let subtitle_files = discovery.find_subtitle_files(Path::new("/subtitles"))?;

// Perform matching
let matches = engine.match_files(&video_files, &subtitle_files).await?;

for match_result in matches {
    println!("Matched: {} -> {} (confidence: {:.2})",
        match_result.video_file.name,
        match_result.subtitle_file.name,
        match_result.confidence
    );
}

§Advanced Matching with Language Filtering

use subx_cli::core::matcher::MatchConfig;

let config = MatchConfig {
    target_languages: vec!["zh".to_string(), "en".to_string()],
    exclude_languages: vec!["ja".to_string()], // "ja" is the ISO 639-1 code for Japanese
    confidence_threshold: 0.75,
    max_matches_per_video: 2, // Allow multiple subtitle languages
    ..Default::default()
};

// `engine`, `video_files`, and `subtitle_files` are set up as in the basic example
let matches = engine.match_files_with_config(&video_files, &subtitle_files, config).await?;

§TV Series Episode Matching

// For TV series with a season/episode structure
// (`engine` and the file lists are set up as in the basic example)
let tv_config = MatchConfig {
    series_mode: true,
    season_episode_patterns: vec![
        r"S(\d+)E(\d+)".to_string(),
        r"Season (\d+) Episode (\d+)".to_string(),
    ],
    ..Default::default()
};

let tv_matches = engine.match_tv_series(&video_files, &subtitle_files, tv_config).await?;
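
To illustrate what the first pattern above captures, here is a minimal sketch of a dependency-free parser for the `S(\d+)E(\d+)` form. The names `parse_season_episode` and `read_number` are hypothetical and are not part of this module's API; the engine itself presumably evaluates the configured patterns as regular expressions.

```rust
/// Parses the first `S<digits>E<digits>` marker (case-insensitive) in a
/// filename and returns `(season, episode)`. Hypothetical helper.
fn parse_season_episode(name: &str) -> Option<(u32, u32)> {
    let bytes = name.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i].eq_ignore_ascii_case(&b's') {
            // Read the season digits that follow the 's'.
            let (season, after_season) = read_number(bytes, i + 1);
            if let Some(season) = season {
                let has_e = after_season < bytes.len()
                    && bytes[after_season].eq_ignore_ascii_case(&b'e');
                if has_e {
                    // Read the episode digits that follow the 'e'.
                    let (episode, _) = read_number(bytes, after_season + 1);
                    if let Some(episode) = episode {
                        return Some((season, episode));
                    }
                }
            }
        }
        i += 1;
    }
    None
}

/// Reads a run of ASCII digits starting at `start`. Returns the parsed value
/// (if digits were present and fit in a u32) and the index just past them.
fn read_number(bytes: &[u8], start: usize) -> (Option<u32>, usize) {
    let mut end = start;
    while end < bytes.len() && bytes[end].is_ascii_digit() {
        end += 1;
    }
    let value = std::str::from_utf8(&bytes[start..end])
        .ok()
        .and_then(|s| s.parse::<u32>().ok());
    (value, end)
}
```

A video and a subtitle whose names yield the same `(season, episode)` pair are strong candidates for each other, even when the rest of the filename differs.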

§Matching Algorithms

§1. Filename Analysis

  • Pattern Extraction: Identifies common patterns like episode numbers, years, quality markers
  • Language Code Detection: Recognizes language codes in various formats (en, eng, english, etc.)
  • Normalization: Standardizes filenames for comparison by removing common variations
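
The three steps above can be sketched with the standard library alone. Everything in this block (`tokenize`, `language_code`, `detect_language`, `filename_similarity`, and the small language table) is a hypothetical illustration, not the module's actual implementation, and the Jaccard token overlap is one simple similarity measure chosen for the example.

```rust
use std::collections::HashSet;

/// Lowercases a file stem and splits it into tokens on any non-alphanumeric
/// separator (dots, underscores, dashes, brackets, spaces).
fn tokenize(stem: &str) -> Vec<String> {
    stem.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Maps a token to an ISO 639-1 code when it looks like a language marker.
/// The table is illustrative, not exhaustive.
fn language_code(token: &str) -> Option<&'static str> {
    match token {
        "en" | "eng" | "english" => Some("en"),
        "zh" | "chi" | "zho" | "chinese" => Some("zh"),
        "ja" | "jpn" | "japanese" => Some("ja"),
        _ => None,
    }
}

/// Returns the last language marker in the stem, scanning from the end
/// because language tags usually trail the title.
fn detect_language(stem: &str) -> Option<&'static str> {
    tokenize(stem).iter().rev().find_map(|t| language_code(t))
}

/// Jaccard similarity over tokens, ignoring language markers.
fn filename_similarity(a: &str, b: &str) -> f64 {
    let title_tokens = |s: &str| -> HashSet<String> {
        tokenize(s)
            .into_iter()
            .filter(|t| language_code(t).is_none())
            .collect()
    };
    let (sa, sb) = (title_tokens(a), title_tokens(b));
    if sa.is_empty() && sb.is_empty() {
        return 0.0;
    }
    let shared = sa.intersection(&sb).count() as f64;
    let total = sa.union(&sb).count() as f64;
    shared / total
}
```

A token table like this misfires on titles that contain a short word such as "en", which is one reason filename analysis alone is treated as just one input to the confidence score.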

§2. AI Semantic Analysis

  • Title Extraction: Uses AI to identify actual titles from complex filenames
  • Content Understanding: Analyzes subtitle content to understand context and themes
  • Cross-Reference: Compares extracted information between video and subtitle files

§3. Confidence Scoring

  • Multiple Factors: Combines filename similarity, language match, content correlation
  • Weighted Scoring: Applies different weights based on reliability of each factor
  • Threshold Filtering: Only presents matches above configurable confidence levels
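
As a rough illustration of how such a score could be combined, the sketch below computes a weighted average of three factors and filters by threshold. The types `MatchFactors` and `Weights`, the function names, and any weight values are assumptions made for this example; the module does not document its actual factors or weights.

```rust
/// Hypothetical per-factor scores, each expected in the range 0.0..=1.0.
#[derive(Debug, Clone, Copy)]
struct MatchFactors {
    filename_similarity: f64,
    language_match: f64,
    content_correlation: f64,
}

/// Hypothetical weights; more reliable factors get larger weights.
#[derive(Debug, Clone, Copy)]
struct Weights {
    filename: f64,
    language: f64,
    content: f64,
}

/// Weighted average of the factors, normalized by the sum of the weights so
/// the result stays in 0.0..=1.0 when the inputs do.
fn confidence(f: MatchFactors, w: Weights) -> f64 {
    let total = w.filename + w.language + w.content;
    if total <= 0.0 {
        return 0.0;
    }
    (f.filename_similarity * w.filename
        + f.language_match * w.language
        + f.content_correlation * w.content)
        / total
}

/// Keeps only the scores at or above the threshold.
fn filter_by_threshold(scores: &[f64], threshold: f64) -> Vec<f64> {
    scores.iter().copied().filter(|s| *s >= threshold).collect()
}
```

Normalizing by the weight sum means the weights do not have to add up to 1, so they can be tuned independently without rescaling the threshold.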

§4. Conflict Resolution

  • Ranking: Orders multiple candidates by confidence score
  • Deduplication: Removes duplicate or overlapping matches
  • User Preferences: Applies user-defined preferences for language, quality, etc.
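
One plausible way to implement ranking and deduplication is a greedy pass over candidates sorted by confidence. The sketch below assumes a hypothetical `Candidate` type and a per-video cap similar in spirit to `max_matches_per_video`; the engine's real result type and resolution strategy may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical candidate pairing of one video with one subtitle.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    video: String,
    subtitle: String,
    confidence: f64,
}

/// Ranks candidates by descending confidence, assigns each subtitle to at
/// most one video, and keeps at most `max_per_video` subtitles per video.
fn resolve_conflicts(mut candidates: Vec<Candidate>, max_per_video: usize) -> Vec<Candidate> {
    // Highest confidence first; total_cmp gives a total order over f64.
    candidates.sort_by(|a, b| b.confidence.total_cmp(&a.confidence));

    let mut used_subtitles: HashSet<String> = HashSet::new();
    let mut per_video: HashMap<String, usize> = HashMap::new();
    let mut accepted = Vec::new();

    for c in candidates {
        let count = per_video.entry(c.video.clone()).or_insert(0);
        // Skip when the video is full or the subtitle is already assigned.
        if *count >= max_per_video || used_subtitles.contains(&c.subtitle) {
            continue;
        }
        *count += 1;
        used_subtitles.insert(c.subtitle.clone());
        accepted.push(c);
    }
    accepted
}
```

The greedy pass is simple and deterministic but not globally optimal: a high-confidence pair can claim a subtitle that would have enabled a better overall assignment.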

§Performance Characteristics

  • Caching: Results are cached to avoid re-analysis of unchanged files
  • Parallel Processing: File analysis is performed concurrently for speed
  • Incremental Updates: Only processes new or modified files in subsequent runs
  • Memory Efficient: Streams large directory structures without loading all data
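
The caching and incremental-update behavior can be pictured as a lookup keyed by path and validated by a file fingerprint. The sketch below is an assumption about how such a cache might work, with hypothetical names (`Fingerprint`, `AnalysisCache`, `get_or_analyze`); it is not the API of the `cache` submodule.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical fingerprint used to decide whether a file changed.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Fingerprint {
    size: u64,
    modified: SystemTime,
}

/// Hypothetical in-memory cache of analysis results keyed by path.
#[derive(Default)]
struct AnalysisCache {
    entries: HashMap<PathBuf, (Fingerprint, String)>,
}

impl AnalysisCache {
    /// Returns the cached result when the fingerprint is unchanged; otherwise
    /// runs `analyze`, stores its result, and returns that.
    fn get_or_analyze<F>(&mut self, path: &Path, current: Fingerprint, analyze: F) -> String
    where
        F: FnOnce(&Path) -> String,
    {
        if let Some((fingerprint, result)) = self.entries.get(path) {
            if *fingerprint == current {
                return result.clone();
            }
        }
        let result = analyze(path);
        self.entries
            .insert(path.to_path_buf(), (current, result.clone()));
        result
    }
}
```

In practice the fingerprint would come from `std::fs::metadata` (its `len()` and `modified()` values), so an unchanged file never triggers a second AI request.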

§Error Handling

The matching system reports errors for the following failure classes:

  • File system access issues (permissions, missing directories)
  • AI service connectivity and quota problems
  • Invalid or corrupted subtitle files
  • Configuration validation errors
  • Network timeouts and service degradation
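
For callers who want to branch on these failure classes, an error enum along the following lines is one common Rust shape. `MatchError` and `is_retryable` are hypothetical names used only for this sketch; the crate's real error type is not shown on this page and may be organized differently.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type grouping the failure classes listed above.
#[derive(Debug)]
enum MatchError {
    FileSystem(io::Error),
    AiService { message: String, retryable: bool },
    InvalidSubtitle(String),
    Config(String),
}

impl fmt::Display for MatchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MatchError::FileSystem(e) => write!(f, "file system error: {e}"),
            MatchError::AiService { message, .. } => write!(f, "AI service error: {message}"),
            MatchError::InvalidSubtitle(p) => write!(f, "invalid subtitle file: {p}"),
            MatchError::Config(m) => write!(f, "configuration error: {m}"),
        }
    }
}

impl std::error::Error for MatchError {}

impl From<io::Error> for MatchError {
    fn from(e: io::Error) -> Self {
        MatchError::FileSystem(e)
    }
}

/// Only AI service failures flagged as transient (timeouts, rate limits) are
/// worth retrying; the other classes need user action.
fn is_retryable(err: &MatchError) -> bool {
    matches!(err, MatchError::AiService { retryable: true, .. })
}
```

The `From<io::Error>` impl lets discovery code use `?` on file system calls while still surfacing a matcher-level error to the caller.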

§Thread Safety

All matching operations are thread-safe and can be used concurrently. The cache system uses appropriate synchronization for multi-threaded access.
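
The page does not say which synchronization primitive the cache uses. As a sketch of the general pattern, the example below shares a map between worker threads through `Arc<Mutex<...>>`; `SharedCache` and `analyze_concurrently` are hypothetical names, and the "analysis" is a placeholder.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared cache: a map guarded by a mutex and shared via `Arc`.
type SharedCache = Arc<Mutex<HashMap<String, f64>>>;

/// Spawns one worker per file name; each worker records a placeholder score
/// in the shared cache. Returns the number of entries once all workers finish.
fn analyze_concurrently(files: Vec<String>, cache: SharedCache) -> usize {
    let handles: Vec<_> = files
        .into_iter()
        .map(|name| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                // Placeholder "analysis": score derived from the name length.
                let score = (name.len() as f64 / 100.0).min(1.0);
                // The lock is held only for the duration of the insert.
                cache
                    .lock()
                    .expect("cache mutex poisoned")
                    .insert(name, score);
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    let len = cache.lock().expect("cache mutex poisoned").len();
    len
}
```

Keeping the expensive work outside the lock and holding the mutex only for the insert keeps contention low; a read-heavy cache might prefer `RwLock` instead.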

§Re-exports

pub use discovery::FileDiscovery;
pub use discovery::MediaFile;
pub use discovery::MediaFileType;
pub use engine::MatchConfig;
pub use engine::MatchEngine;
pub use engine::MatchOperation;

§Modules

cache
Caching utilities for the file matching engine.
discovery
Media file discovery utilities.
engine
File matching engine that uses AI content analysis to align video and subtitle files.

§Structs

FileInfo
Extended file information structure with metadata for intelligent matching.