subx-cli 1.7.1

AI subtitle processing CLI tool, which automatically matches, renames, and converts subtitle files.
Documentation
//! AI service integration for intelligent subtitle matching and content analysis.
//!
//! This module provides a comprehensive AI service abstraction layer for SubX's
//! intelligent content analysis capabilities. It enables AI-powered subtitle-video
//! file matching through semantic analysis, content understanding, and confidence
//! scoring across multiple AI service providers.
//!
//! # Architecture Overview
//!
//! The AI service layer is built around a provider pattern that supports:
//! - **Multi-Provider Support**: OpenAI, Anthropic, and other AI backends
//! - **Content Analysis**: Deep understanding of video and subtitle content
//! - **Semantic Matching**: Intelligent file pairing beyond filename similarity
//! - **Confidence Scoring**: Quantitative match quality assessment
//! - **Caching Layer**: Persistent caching of expensive AI analysis results
//! - **Retry Logic**: Robust error handling with exponential backoff
//!
//! # Core Capabilities
//!
//! ## Content Analysis Engine
//! - **Video Metadata Extraction**: Title, series, episode, language detection
//! - **Subtitle Content Analysis**: Dialogue patterns, character names, themes
//! - **Cross-Reference Matching**: Semantic similarity between content types
//! - **Language Identification**: Automatic detection and verification
//! - **Quality Assessment**: Content quality scoring and recommendations
//!
//! ## Intelligent Matching Algorithm
//! 1. **Content Sampling**: Extract representative samples from subtitle files
//! 2. **Metadata Analysis**: Parse video filenames and directory structures
//! 3. **Semantic Analysis**: AI-powered content understanding and comparison
//! 4. **Confidence Scoring**: Multi-factor confidence calculation
//! 5. **Conflict Resolution**: Resolve ambiguous matches with user preferences
//! 6. **Verification**: Optional human-in-the-loop verification workflow
//!
//! ## Provider Management
//! - **Dynamic Provider Selection**: Choose optimal provider based on content type
//! - **Automatic Failover**: Seamless fallback between service providers
//! - **Cost Optimization**: Smart routing to minimize API usage costs
//! - **Rate Limiting**: Respect provider-specific rate limits and quotas
//! - **Usage Tracking**: Detailed usage statistics and cost monitoring
//!
//! # Usage Examples
//!
//! ## Basic Content Analysis
//! ```rust,ignore
//! use subx_cli::core::ComponentFactory;
//! use subx_cli::config::ProductionConfigService;
//! use subx_cli::Result;
//! use std::sync::Arc;
//!
//! async fn analyze_content() -> Result<()> {
//!     // Create AI client using component factory
//!     let config_service = Arc::new(ProductionConfigService::new()?);
//!     let factory = ComponentFactory::new(config_service.as_ref())?;
//!     let ai_client = factory.create_ai_provider()?;
//!     
//!     // AI client is ready for content analysis
//!     println!("AI client created and configured");
//!     Ok(())
//! }
//! ```
//!
//! ## Match Verification Workflow
//! ```rust,ignore
//! use subx_cli::services::ai::{AIProvider, VerificationRequest};
//!
//! async fn verify_matches(ai_client: Box<dyn AIProvider>) -> Result<()> {
//!     let verification = VerificationRequest {
//!         video_file: "movie.mp4".to_string(),
//!         subtitle_file: "movie_subtitles.srt".to_string(),
//!         match_factors: vec![
//!             "title_similarity".to_string(),
//!             "content_correlation".to_string(),
//!         ],
//!     };
//!     
//!     let confidence = ai_client.verify_match(verification).await?;
//!     
//!     if confidence.score > 0.9 {
//!         println!("Verification successful: {:.2}%", confidence.score * 100.0);
//!     } else {
//!         println!("Verification failed. Factors: {:?}", confidence.factors);
//!     }
//!     
//!     Ok(())
//! }
//! ```
//!
//! ## Advanced Provider Configuration
//! ```rust,ignore
//! use subx_cli::core::ComponentFactory;
//! use subx_cli::config::ProductionConfigService;
//! use std::sync::Arc;
//!
//! async fn configure_ai_services() -> Result<()> {
//!     // Create component factory with configuration service
//!     let config_service = Arc::new(ProductionConfigService::new()?);
//!     let factory = ComponentFactory::new(config_service.as_ref())?;
//!     
//!     // Create AI client with factory-injected configuration
//!     let client = factory.create_ai_provider()?;
//!     
//!     // Use configured client...
//!     println!("AI client configured with all settings from config service");
//!     Ok(())
//! }
//! ```
//!
//! # Performance Characteristics
//!
//! ## Processing Speed
//! - **Analysis Time**: 2-5 seconds per content analysis request
//! - **Batch Processing**: Concurrent processing of multiple file pairs
//! - **Caching Benefits**: 10-100x speedup for cached results
//! - **Network Latency**: Optimized for high-latency connections
//!
//! ## Resource Usage
//! - **Memory Footprint**: ~50-200MB for typical analysis sessions
//! - **API Costs**: $0.001-0.01 per analysis depending on content size
//! - **Cache Storage**: ~1-10KB per cached analysis result
//! - **Network Bandwidth**: 1-50KB per API request
//!
//! ## Accuracy Metrics
//! - **Match Accuracy**: >95% for properly named content
//! - **False Positive Rate**: <2% with confidence threshold >0.8
//! - **Language Detection**: >99% accuracy for supported languages
//! - **Content Understanding**: Context-aware matching for complex scenarios
//!
//! # Error Handling and Recovery
//!
//! The AI service layer provides comprehensive error handling:
//! - **Network Failures**: Automatic retry with exponential backoff
//! - **API Rate Limits**: Intelligent backoff and queue management
//! - **Service Unavailability**: Graceful fallback to alternative providers
//! - **Invalid Responses**: Response validation and error recovery
//! - **Timeout Handling**: Configurable timeout with partial result recovery
//!
//! # Security and Privacy
//!
//! - **Data Privacy**: Content samples are processed with privacy-focused prompts
//! - **API Key Management**: Secure credential storage and rotation
//! - **Content Filtering**: No permanent storage of user content on AI providers
//! - **Request Sanitization**: Input validation and safe prompt construction

use async_trait::async_trait;
use serde::{Deserialize, Serialize};

/// AI provider trait for content analysis and subtitle matching.
///
/// This trait defines the interface for AI services that can analyze
/// video and subtitle content to determine optimal matches.
#[async_trait]
pub trait AIProvider: Send + Sync {
    /// Analyze multimedia files and subtitle files for matching results.
    ///
    /// # Arguments
    ///
    /// * `request` - Analysis request containing files and content samples
    ///
    /// # Returns
    ///
    /// A `MatchResult` containing potential matches with confidence scores
    async fn analyze_content(&self, request: AnalysisRequest) -> crate::Result<MatchResult>;

    /// Verify file matching confidence.
    ///
    /// # Arguments
    ///
    /// * `verification` - Verification request for existing matches
    ///
    /// # Returns
    ///
    /// A confidence score for the verification request
    async fn verify_match(
        &self,
        verification: VerificationRequest,
    ) -> crate::Result<ConfidenceScore>;

    /// Send a raw chat completion request to the provider.
    ///
    /// This provides a provider-neutral entry point for callers (such as the
    /// translation engine) that need to issue prompts that do not fit the
    /// matching/verification request shape. Each provider implementation is
    /// responsible for handling authentication, retries, response size limits,
    /// and error sanitization, mirroring its specialized methods.
    ///
    /// # Arguments
    ///
    /// * `messages` - OpenAI-compatible chat messages, e.g.
    ///   `[{"role":"system","content":"..."},{"role":"user","content":"..."}]`.
    ///
    /// # Returns
    ///
    /// The assistant message text from the first choice in the response.
    ///
    /// # Errors
    ///
    /// Returns [`crate::error::SubXError::AiService`] for HTTP, parsing, or
    /// upstream provider errors. The default implementation returns
    /// [`crate::error::SubXError::AiService`] indicating the provider does not
    /// support raw chat completion; only providers that override this method
    /// support translation prompts.
    async fn chat_completion(&self, _messages: Vec<serde_json::Value>) -> crate::Result<String> {
        Err(crate::error::SubXError::ai_service(
            "AI provider does not support chat_completion",
        ))
    }
}

/// Analysis request structure for AI content analysis.
///
/// Contains all necessary information for AI services to analyze
/// and match video files with subtitle files.
#[derive(Debug, Serialize, Clone, PartialEq, Eq)]
pub struct AnalysisRequest {
    /// List of video file paths to analyze
    pub video_files: Vec<String>,
    /// List of subtitle file paths to analyze
    pub subtitle_files: Vec<String>,
    /// Content samples from subtitle files for analysis
    pub content_samples: Vec<ContentSample>,
}

/// Subtitle content sample for AI analysis.
///
/// Represents a sample of subtitle content that helps AI services
/// understand the content and context for matching purposes.
#[derive(Debug, Serialize, Clone, PartialEq, Eq)]
pub struct ContentSample {
    /// Filename of the subtitle file
    pub filename: String,
    /// Preview of the subtitle content
    pub content_preview: String,
    /// Size of the subtitle file in bytes
    pub file_size: u64,
}

/// AI analysis result containing potential file matches.
///
/// The primary result structure returned by AI services containing
/// matched files with confidence scores and reasoning.
#[derive(Debug, Deserialize, Clone, PartialEq)]
pub struct MatchResult {
    /// List of potential file matches
    pub matches: Vec<FileMatch>,
    /// Overall confidence score for the analysis (0.0 to 1.0)
    pub confidence: f32,
    /// AI reasoning explanation for the matches
    pub reasoning: String,
}

/// Individual file match information using unique file IDs.
///
/// Represents a single video-subtitle file pairing suggested by the AI
/// identified by unique IDs with associated confidence metrics and reasoning factors.
#[derive(Debug, Deserialize, Clone, PartialEq)]
pub struct FileMatch {
    /// Unique ID of the matched video file
    pub video_file_id: String,
    /// Unique ID of the matched subtitle file
    pub subtitle_file_id: String,
    /// Confidence score for this specific match (0.0 to 1.0)
    pub confidence: f32,
    /// List of factors that contributed to this match
    pub match_factors: Vec<String>,
}

/// Confidence score for AI matching decisions.
///
/// Represents the AI system's confidence in a particular match along
/// with the reasoning factors that led to that decision.
#[derive(Debug, Deserialize, Clone, PartialEq)]
pub struct ConfidenceScore {
    /// Numerical confidence score (typically 0.0 to 1.0)
    pub score: f32,
    /// List of factors that influenced the confidence score
    pub factors: Vec<String>,
}

/// Verification request structure for AI validation.
///
/// Used to request verification of a potential match between
/// a video file and subtitle file from the AI system.
#[derive(Debug, Serialize, Clone, PartialEq, Eq)]
pub struct VerificationRequest {
    /// Path to the video file
    pub video_file: String,
    /// Path to the subtitle file
    pub subtitle_file: String,
    /// Factors to consider when matching subtitles to video content
    pub match_factors: Vec<String>,
}

/// AI usage statistics.
#[derive(Debug, Clone)]
pub struct AiUsageStats {
    /// Name of the model used.
    pub model: String,
    /// Number of prompt tokens used.
    pub prompt_tokens: u32,
    /// Number of completion tokens used.
    pub completion_tokens: u32,
    /// Total number of tokens used.
    pub total_tokens: u32,
}

/// AI response content and usage statistics.
#[derive(Debug, Clone)]
pub struct AiResponse {
    /// Response content text.
    pub content: String,
    /// Usage statistics.
    pub usage: Option<AiUsageStats>,
}

/// Caching functionality for AI analysis results
pub mod cache;

/// OpenAI integration and client implementation
pub mod openai;
/// OpenRouter AI service provider client implementation
pub mod openrouter;

/// Azure OpenAI service provider client implementation
pub mod azure_openai;

/// Local / OpenAI-compatible LLM provider client implementation
pub mod local;

/// AI prompt templates and management
pub mod prompts;

/// Prompt and response helpers for AI-driven subtitle translation.
pub mod translation_prompts;

/// Retry logic and backoff strategies for AI services
pub mod retry;

/// Helpers for sanitizing upstream error payloads before surfacing them.
pub mod error_sanitizer;

/// Shared security helpers for AI client implementations.
pub mod security;

/// Hint-emission helpers shared by hosted-provider clients (`openai`,
/// `openrouter`, `azure-openai`).
pub mod hosted_hint;

pub use cache::AICache;
pub use openai::OpenAIClient;
pub use retry::{RetryConfig, retry_with_backoff};
pub use security::local_provider_hint;