Function detect_encoding_command

pub fn detect_encoding_command(
    file_paths: &[String],
    verbose: bool,
) -> Result<()>

Detect the character encoding of one or more subtitle files and print the results.

For each file, this function reports the detected encoding. In basic mode the output is one line per file; in verbose mode it also includes confidence levels, decoded content samples, and alternative candidate encodings.

§Detection Process

  1. File Validation: Verify file existence and accessibility
  2. Initial Scanning: Read file header and sample content
  3. BOM Detection: Check for Unicode Byte Order Marks
  4. Statistical Analysis: Analyze byte patterns and character frequencies
  5. Language Heuristics: Apply language-specific detection rules
  6. Confidence Calculation: Score each potential encoding
  7. Result Ranking: Order candidates by confidence level
  8. Output Generation: Format results for user presentation
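
Steps 3 and 4 can be illustrated with a minimal sketch. It is not the detector this crate uses: `detect_bom` and `guess_encoding` are hypothetical names, and the statistical analysis is reduced to a single strict UTF-8 validity check, which is an assumption made to keep the example short.

```rust
/// Hypothetical helper: returns the encoding implied by a leading BOM, if any.
fn detect_bom(bytes: &[u8]) -> Option<&'static str> {
    // Check the 4-byte UTF-32 marks before UTF-16, because the UTF-32LE BOM
    // (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
    if bytes.starts_with(&[0x00, 0x00, 0xFE, 0xFF]) {
        Some("UTF-32BE")
    } else if bytes.starts_with(&[0xFF, 0xFE, 0x00, 0x00]) {
        Some("UTF-32LE")
    } else if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some("UTF-8")
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some("UTF-16BE")
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some("UTF-16LE")
    } else {
        None
    }
}

/// Hypothetical fallback: with no BOM, accept the bytes as UTF-8 only if they
/// decode without error; anything else is left for later analysis stages.
fn guess_encoding(bytes: &[u8]) -> &'static str {
    match detect_bom(bytes) {
        Some(name) => name,
        None if std::str::from_utf8(bytes).is_ok() => "UTF-8",
        None => "unknown",
    }
}
```

A BOM is treated as decisive here, which is why it is checked before any byte statistics. A real detector would replace the `"unknown"` branch with scoring of legacy encodings such as Windows-1252 or GB2312.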

§Verbose Mode Features

When verbose is enabled, the output includes:

  • Confidence Percentages: Numerical reliability scores
  • Content Samples: Decoded text previews
  • Alternative Encodings: Other possible encodings with scores
  • Detection Metadata: Technical details about the detection process
  • Language Hints: Probable content language indicators

§Error Handling

The function handles the following failure cases:

  • File Access: Clear messages for permission or existence issues
  • Corruption Detection: Identification of damaged or invalid files
  • Encoding Failures: Graceful handling of undetectable encodings
  • Partial Processing: Continue with other files if individual files fail
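
The partial-processing behavior can be sketched as a loop that records failures instead of returning early. `BatchSummary`, `read_all`, and `into_result` are hypothetical names for illustration, and the policy of failing only when no file was readable is an assumption based on the list above, not the crate's confirmed logic.

```rust
use std::fs;
use std::io;

/// Hypothetical summary of a batch run; not a type from subx_cli.
struct BatchSummary {
    succeeded: Vec<String>,
    failed: Vec<(String, io::Error)>,
}

/// Reads every path, recording failures instead of stopping at the first one.
fn read_all(paths: &[String]) -> BatchSummary {
    let mut summary = BatchSummary { succeeded: Vec::new(), failed: Vec::new() };
    for path in paths {
        match fs::read(path) {
            // The bytes would be handed to the detector here.
            Ok(_bytes) => summary.succeeded.push(path.clone()),
            Err(err) => {
                // Report the problem and keep going with the remaining files.
                eprintln!("{path}: {err}");
                summary.failed.push((path.clone(), err));
            }
        }
    }
    summary
}

/// Assumed policy: the batch fails only if every file was inaccessible.
fn into_result(summary: BatchSummary) -> Result<usize, String> {
    if summary.succeeded.is_empty() && !summary.failed.is_empty() {
        Err(format!("all {} files were inaccessible", summary.failed.len()))
    } else {
        Ok(summary.succeeded.len())
    }
}
```

Keeping the `io::Error` alongside each path lets the caller distinguish a missing file from a permission problem when printing the final report.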

§Output Formats

§Basic Mode

file1.srt: UTF-8
file2.ass: Windows-1252
file3.vtt: GB2312

§Verbose Mode

file1.srt: UTF-8 (99.5% confidence)
Sample: "1\n00:00:01,000 --> 00:00:03,000\nHello World"
Alternatives: ISO-8859-1 (15.2%), Windows-1252 (12.8%)
Language: English (detected)

file2.ass: Windows-1252 (87.3% confidence)
Sample: "[Script Info]\nTitle: Movie Subtitle"
Alternatives: ISO-8859-1 (45.1%), UTF-8 (23.7%)
Language: Mixed/Unknown
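
As a rough illustration of how the two output formats relate, the sketch below renders one result record in either mode. The `Detection` struct and its field names are hypothetical and only mirror the fields visible in the sample output above; the crate's internal types may differ.

```rust
/// Hypothetical result record; field names are illustrative only.
struct Detection {
    file: String,
    encoding: String,
    confidence: f64, // percentage in the range 0.0..=100.0
    sample: String,
    alternatives: Vec<(String, f64)>,
    language: String,
}

/// Renders one record as a single line (basic) or four lines (verbose).
fn render(d: &Detection, verbose: bool) -> String {
    if !verbose {
        return format!("{}: {}", d.file, d.encoding);
    }
    let alternatives = d
        .alternatives
        .iter()
        .map(|(name, score)| format!("{name} ({score:.1}%)"))
        .collect::<Vec<_>>()
        .join(", ");
    // `{:?}` quotes the sample and escapes newlines as `\n`.
    format!(
        "{}: {} ({:.1}% confidence)\nSample: {:?}\nAlternatives: {}\nLanguage: {}",
        d.file, d.encoding, d.confidence, d.sample, alternatives, d.language
    )
}
```

Using the `Debug` format for the sample keeps each record on a fixed number of lines even when the decoded text contains line breaks.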

§Performance Considerations

  • Streaming Analysis: Large files processed efficiently
  • Sample-based Detection: Uses representative file portions
  • Caching: Results cached for repeated operations
  • Parallel Processing: Multiple files analyzed concurrently
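
Sample-based detection can be sketched with a bounded read from the start of the file. `read_sample` is a hypothetical helper and the 8 KiB `SAMPLE_BYTES` value is an assumption; the sample size actually used is not stated here.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Assumed sample size; the real tool's value is not documented here.
const SAMPLE_BYTES: u64 = 8192;

/// Reads at most `limit` bytes from the start of the file.
fn read_sample(path: &Path, limit: u64) -> io::Result<Vec<u8>> {
    let file = File::open(path)?;
    let mut sample = Vec::new();
    // `take` caps the reader, so only the first `limit` bytes are ever read,
    // regardless of how large the file is.
    file.take(limit).read_to_end(&mut sample)?;
    Ok(sample)
}
```

One caveat of any fixed-size sample: the cut can fall in the middle of a multi-byte character, so a detector working on samples should tolerate an incomplete sequence at the very end of the buffer.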

§Arguments

  • file_paths - Slice of file paths to analyze for encoding
  • verbose - Enable detailed output with confidence scores and samples

§Returns

Returns Ok(()) on successful analysis completion, or an error if:

  • Critical system resources are unavailable
  • All specified files are inaccessible
  • The encoding detection system fails to initialize

§Examples

use subx_cli::commands::detect_encoding_command;

// Quick encoding check for single file
detect_encoding_command::detect_encoding_command(
    &["subtitle.srt".to_string()],
    false
)?;

// Detailed analysis for multiple files
let files = vec![
    "episode1.srt".to_string(),
    "episode2.ass".to_string(),
    "episode3.vtt".to_string(),
];
detect_encoding_command::detect_encoding_command(&files, true)?;

// Batch analysis with glob patterns (on the command line, the shell
// expands these before the paths reach this function)
let glob_files = vec![
    "season1/*.srt".to_string(),
    "season2/*.ass".to_string(),
];
detect_encoding_command::detect_encoding_command(&glob_files, false)?;
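
The batch example attributes glob expansion to the shell, so a caller that builds the path list in Rust may need to enumerate files itself. The sketch below does this with the standard library only; `files_with_extension` is a hypothetical helper, not part of subx_cli, and it matches a single extension in one directory rather than full glob syntax.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: lists files in `dir` whose extension equals `ext`.
fn files_with_extension(dir: &Path, ext: &str) -> io::Result<Vec<String>> {
    let mut paths = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Compare the extension without the leading dot, e.g. "srt".
        let matches = path.is_file()
            && path.extension().and_then(|e| e.to_str()) == Some(ext);
        if matches {
            paths.push(path.to_string_lossy().into_owned());
        }
    }
    // `read_dir` order is platform-dependent, so sort for stable output.
    paths.sort();
    Ok(paths)
}
```

The returned `Vec<String>` can be passed by reference as the `file_paths` argument, for example `&files_with_extension(Path::new("season1"), "srt")?`.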

§Use Cases

  • Troubleshooting: Identify encoding issues causing display problems
  • Conversion Planning: Determine current encoding before conversion
  • Quality Assurance: Verify encoding consistency across file collections
  • Migration: Assess encoding diversity when migrating subtitle libraries
  • Automation: Integrate encoding detection into batch processing workflows