# Trimdown Architecture
## Overview
Trimdown is a Rust CLI tool that compresses PowerPoint, PDF, video, and Word files. Its architecture is modular and follows the DRY, KISS, and separation-of-concerns (SoC) principles: the codebase is a single crate with clear module boundaries and a deliberately small dependency set.
## Project Structure
```
trimdown-rs/
├── src/
│   ├── main.rs         # CLI entry point and orchestration
│   ├── lib.rs          # Public library interface
│   ├── cli.rs          # Command-line argument parsing
│   ├── compression.rs  # Core compression implementations
│   ├── formats.rs      # File type detection and validation
│   ├── processor.rs    # File and folder processing logic
│   └── utils.rs        # Utility functions and helpers
├── Cargo.toml          # Dependencies and build configuration
├── README.md           # User documentation
├── SPEC.md             # Technical specifications
├── ARCHITECTURE.md     # This file
├── TODO.md             # Task tracking
└── tests/              # Integration tests (future)
```
## Module Architecture
### 1. Main Module (`main.rs`)
**Responsibility**: Application entry point and high-level orchestration
**Key Functions**:
- Parse CLI arguments using clap
- Initialize logging with env_logger
- Route to single file or folder processing
- Display application header and basic info
**Dependencies**:
- `cli`: For argument parsing
- `processor`: For file/folder processing
- `colored`: For console output
- `tokio`: For async runtime
**Design Decisions**:
- Async main function to support parallel video compression
- Minimal logic - delegates to processor module
- Clear error messages for user guidance
### 2. CLI Module (`cli.rs`)
**Responsibility**: Command-line interface definition and parsing
**Key Components**:
- `Cli` struct: Main CLI configuration
- `PdfQuality` enum: PDF compression quality levels
- `PdfMethod` enum: PDF compression methods
**Design Decisions**:
- Uses clap derive macros for declarative CLI definition
- Sensible defaults for all optional parameters
- Type-safe enums for quality/method selection
- Implements `Clone` so the parsed configuration can be passed to async tasks
**Configuration Options**:
```rust
use std::path::PathBuf;

pub struct Cli {
    pub input: PathBuf,           // Required input path
    pub output: Option<PathBuf>,  // Optional output path
    pub folder: bool,             // Force folder mode
    pub quality: u8,              // JPEG quality (1-100)
    pub max_width: u32,           // Max image width
    pub video_crf: u8,            // Video CRF (0-51)
    pub pdf_quality: PdfQuality,  // PDF quality level
    pub pdf_method: PdfMethod,    // PDF compression method
    pub force: bool,              // Overwrite existing files
    pub verbose: bool,            // Verbose output
}
```
### 3. Formats Module (`formats.rs`)
**Responsibility**: File type detection and validation
**Key Components**:
- `FileType` enum: Supported file types
- `detect_file_type()`: Extension-based detection
- `is_supported_file()`: Quick validation
**Design Decisions**:
- Simple extension-based detection (sufficient for CLI use)
- Comprehensive test coverage (100%)
- Case-insensitive extension matching
- Clear separation from compression logic
**Supported Types**:
- PowerPoint: .pptx, .ppt
- PDF: .pdf
- Video: .mp4, .avi, .mov, .wmv, .mkv, .m4v, .flv, .webm
- Word: .docx, .doc
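
As a rough illustration of the extension-based, case-insensitive detection described above, the sketch below maps the listed extensions to a `FileType`. The variant names and the `Option` return type are assumptions; the actual signatures in `formats.rs` may differ.

```rust
use std::path::Path;

/// Variant names are assumed; the document only names the enum itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileType {
    PowerPoint,
    Pdf,
    Video,
    Word,
}

/// Maps a path to a file type by its extension, ignoring case.
/// Returns `None` when the extension is missing or unsupported.
pub fn detect_file_type(path: &Path) -> Option<FileType> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pptx" | "ppt" => Some(FileType::PowerPoint),
        "pdf" => Some(FileType::Pdf),
        "mp4" | "avi" | "mov" | "wmv" | "mkv" | "m4v" | "flv" | "webm" => Some(FileType::Video),
        "docx" | "doc" => Some(FileType::Word),
        _ => None,
    }
}

/// Quick validation built on top of the detection function.
pub fn is_supported_file(path: &Path) -> bool {
    detect_file_type(path).is_some()
}
```

Keeping detection in one `match` means a new extension is a one-line change, which is what keeps this module separate from the compression logic.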
### 4. Compression Module (`compression.rs`)
**Responsibility**: Core compression implementations for all file types
**Key Functions**:
- `compress_powerpoint()`: PowerPoint compression with media optimization
- `compress_pdf()`: PDF compression using QPDF
- `compress_video()`: Video compression using FFmpeg
- `compress_word()`: Word document compression
- `compress_image_file()`: Image optimization
- `compress_video_file()`: Video file optimization
- `detect_and_fix_mislabeled_image()`: Smart image format detection
**Design Patterns**:
- **Strategy Pattern**: Different compression strategies per file type
- **Template Method**: Common extraction/compression/repacking workflow
- **Async/Await**: Parallel video compression in PowerPoint files
**Compression Workflows**:
#### PowerPoint Compression
```
1. Extract PPTX ZIP archive → temp directory
2. Scan ppt/media/ for images and videos
3. Compress images sequentially (fast)
4. Compress videos in parallel (slow, benefits from parallelism)
5. Repack ZIP with deflate compression
6. Validate output file
```
#### PDF Compression
```
1. Check for QPDF availability
2. Execute QPDF with optimization flags:
   - Linearization for fast web viewing
   - Stream compression
   - Flate recompression
   - Image optimization
3. Validate PDF header and structure
4. Handle exit codes (0=success, 3=warnings)
```
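
The exit-code handling and header validation steps above can be sketched as follows. The type and function names are hypothetical, and the argument list is an assumption: the flags are real QPDF options matching the four listed optimizations, but this document does not state which flags Trimdown actually passes.

```rust
/// Outcome of a QPDF run, derived from its process exit code.
#[derive(Debug, PartialEq, Eq)]
pub enum QpdfOutcome {
    Success,
    SuccessWithWarnings,
    Failed(i32),
}

/// Exit code 0 means success and 3 means the output was written with
/// warnings; every other code is treated as a failure.
pub fn classify_qpdf_exit(code: i32) -> QpdfOutcome {
    match code {
        0 => QpdfOutcome::Success,
        3 => QpdfOutcome::SuccessWithWarnings,
        other => QpdfOutcome::Failed(other),
    }
}

/// Checks for the `%PDF-` magic bytes at the start of a file.
pub fn has_pdf_header(bytes: &[u8]) -> bool {
    bytes.starts_with(b"%PDF-")
}

/// Assumed argument list covering linearization, stream compression,
/// Flate recompression, and image optimization.
pub fn qpdf_args(input: &str, output: &str) -> Vec<String> {
    vec![
        "--linearize".to_string(),
        "--compress-streams=y".to_string(),
        "--recompress-flate".to_string(),
        "--optimize-images".to_string(),
        input.to_string(),
        output.to_string(),
    ]
}
```

Treating exit code 3 as a distinct outcome lets the caller keep the output file while still surfacing a warning to the user.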
#### Video Compression
```
1. Check for FFmpeg availability
2. Probe video duration for progress estimation
3. Execute FFmpeg with H.264 encoding:
   - CRF-based quality control
   - AAC audio at 128kbps
   - Fast start for streaming
4. Monitor progress via temporary file
5. Validate output and compression ratio
6. Replace original only if compression is significant
```
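
A minimal sketch of steps 3 and 6, assuming the encoder settings map onto standard FFmpeg options (`libx264`, `-crf`, `-b:a 128k`, `-movflags +faststart`). The function names and the `min_savings` threshold are hypothetical; the document does not say what counts as "significant".

```rust
use std::path::Path;
use std::process::Command;

/// Builds the FFmpeg invocation as structured arguments (no shell string).
pub fn ffmpeg_command(input: &Path, output: &Path, crf: u8) -> Command {
    let mut cmd = Command::new("ffmpeg");
    cmd.arg("-i")
        .arg(input)
        .args(["-c:v", "libx264"]) // H.264 video
        .arg("-crf")
        .arg(crf.to_string()) // CRF-based quality control
        .args(["-c:a", "aac", "-b:a", "128k"]) // AAC audio at 128 kbps
        .args(["-movflags", "+faststart"]) // fast start for streaming
        .arg(output);
    cmd
}

/// Returns true when the compressed file saves at least `min_savings`
/// (a fraction between 0.0 and 1.0) relative to the original size.
pub fn should_replace(original: u64, compressed: u64, min_savings: f64) -> bool {
    if original == 0 || compressed >= original {
        return false;
    }
    let savings = (original - compressed) as f64 / original as f64;
    savings >= min_savings
}
```

Returning a `Command` rather than running it keeps the argument construction testable without FFmpeg installed.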
#### Word Compression
```
1. Extract DOCX ZIP archive → temp directory
2. Scan word/media/ for images
3. Compress images with configured quality
4. Repack ZIP with deflate compression
5. Validate output file
```
**Image Compression Strategy**:
- Skip files < 150KB (already small)
- Detect mislabeled files (e.g., JPEG with .png extension)
- Resize if width > max_width using Lanczos3 filter
- Format-specific optimization:
  - JPEG: Progressive encoding with quality setting
  - PNG: Lossless recompression
  - BMP/TIFF: Convert to WebP or JPEG
  - GIF: Preserve animations, resize only
  - WebP/AVIF: Format-specific optimization
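
The decision logic in this strategy can be sketched without any image library. The snippet below covers the size threshold, magic-byte sniffing for mislabeled files, and the aspect-preserving resize calculation. All names are hypothetical, and treating 150KB as 150 × 1024 bytes is an assumption; the actual resampling (Lanczos3) would be done by the `image` crate.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ImageKind {
    Jpeg,
    Png,
    Gif,
    Bmp,
    Unknown,
}

/// Identifies the real format from leading magic bytes, so a JPEG saved
/// with a .png extension is still handled as a JPEG.
pub fn sniff_image_kind(bytes: &[u8]) -> ImageKind {
    if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        ImageKind::Jpeg
    } else if bytes.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]) {
        ImageKind::Png
    } else if bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a") {
        ImageKind::Gif
    } else if bytes.starts_with(b"BM") {
        ImageKind::Bmp
    } else {
        ImageKind::Unknown
    }
}

/// Assumed byte value for the 150KB skip threshold.
pub const MIN_SIZE_BYTES: u64 = 150 * 1024;

/// Small files are left untouched.
pub fn should_skip(size_bytes: u64) -> bool {
    size_bytes < MIN_SIZE_BYTES
}

/// Scales dimensions down to `max_width`, preserving aspect ratio.
/// Images already within the limit are returned unchanged.
pub fn target_dimensions(width: u32, height: u32, max_width: u32) -> (u32, u32) {
    if width == 0 || width <= max_width {
        return (width, height);
    }
    // Integer arithmetic with rounding to the nearest pixel.
    let scaled = (height as u64 * max_width as u64 + width as u64 / 2) / width as u64;
    (max_width, scaled.max(1) as u32)
}
```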
**Error Handling**:
- Graceful degradation when external tools unavailable
- Clear error messages with installation instructions
- Validation of output files before replacing originals
- Atomic file operations to prevent data loss
### 5. Processor Module (`processor.rs`)
**Responsibility**: File and folder processing orchestration
**Key Functions**:
- `process_single_file()`: Single file compression workflow
- `process_folder()`: Batch folder compression workflow
**Design Decisions**:
- Clear separation between single and batch processing
- Comprehensive statistics and reporting
- Force flag validation before processing
- Progress tracking for batch operations
**Single File Workflow**:
```
1. Validate input file exists
2. Generate output path (or use provided)
3. Check for existing output (respect --force flag)
4. Detect file type
5. Execute appropriate compression function
6. Display compression statistics
```
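
Steps 2 and 3 of the workflow above can be sketched as below. The helper names are hypothetical, and the sketch assumes single-file mode uses the same `_compressed` suffix that folder mode does.

```rust
use std::path::{Path, PathBuf};

/// Derives `name_compressed.ext` next to the input file.
pub fn default_output_path(input: &Path) -> PathBuf {
    let stem = input
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("output");
    let name = match input.extension().and_then(|e| e.to_str()) {
        Some(ext) => format!("{stem}_compressed.{ext}"),
        None => format!("{stem}_compressed"),
    };
    input.with_file_name(name)
}

/// Picks the requested output path or the default, and refuses to
/// overwrite an existing file unless `force` is set.
pub fn resolve_output(
    input: &Path,
    requested: Option<&Path>,
    force: bool,
) -> Result<PathBuf, String> {
    let output = requested
        .map(Path::to_path_buf)
        .unwrap_or_else(|| default_output_path(input));
    if output.exists() && !force {
        return Err(format!(
            "{} already exists; pass --force to overwrite",
            output.display()
        ));
    }
    Ok(output)
}
```

Checking for an existing output before any compression work starts means a missing `--force` is reported immediately rather than after a long-running encode.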
**Folder Workflow**:
```
1. Scan folder for supported files (max depth 1)
2. Display file list with sizes and types
3. Process each file sequentially:
   - Generate output path with _compressed suffix
   - Check for existing output (respect --force flag)
   - Compress file
   - Track statistics
4. Display summary:
   - Files processed
   - Total original size
   - Total compressed size
   - Overall compression ratio
```
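
The summary in step 4 only needs a small accumulator. The struct below is a hypothetical sketch; in particular, it interprets "overall compression ratio" as the percentage of bytes saved, which is an assumption.

```rust
/// Running totals for a batch run.
#[derive(Debug, Default)]
pub struct BatchStats {
    pub files_processed: usize,
    pub original_bytes: u64,
    pub compressed_bytes: u64,
}

impl BatchStats {
    /// Records the before and after sizes of one successfully compressed file.
    pub fn record(&mut self, original: u64, compressed: u64) {
        self.files_processed += 1;
        self.original_bytes += original;
        self.compressed_bytes += compressed;
    }

    /// Percentage of bytes saved across the whole batch (0.0 for an empty batch).
    pub fn savings_percent(&self) -> f64 {
        if self.original_bytes == 0 {
            return 0.0;
        }
        let saved = self.original_bytes.saturating_sub(self.compressed_bytes);
        saved as f64 / self.original_bytes as f64 * 100.0
    }
}
```

Computing the percentage from the totals, rather than averaging per-file ratios, weights large files appropriately.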
### 6. Utils Module (`utils.rs`)
**Responsibility**: Shared utility functions
**Key Functions**:
- `check_external_tool()`: Verify external tool availability
- `print_warning()`: Colored warning messages
- `print_success()`: Colored success messages
- `print_error()`: Colored error messages
- `print_info()`: Colored info messages
- `format_size()`: Human-readable file sizes
**Design Decisions**:
- Simple, focused functions
- No business logic
- 100% test coverage
- Consistent output formatting
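
As an example of the kind of helper this module holds, here is one possible `format_size()`. The unit base (1024) and the one-decimal formatting are assumptions; the real function may format differently.

```rust
/// Formats a byte count as a human-readable size using 1024-based units.
pub fn format_size(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KB", "MB", "GB", "TB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        // Plain byte counts are shown without a fractional part.
        format!("{bytes} B")
    } else {
        format!("{value:.1} {}", UNITS[unit])
    }
}
```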
## Data Flow
### Single File Compression
```
User Input (CLI)
↓
main.rs: Parse arguments
↓
processor.rs: process_single_file()
↓
formats.rs: detect_file_type()
↓
compression.rs: compress_*()
↓
utils.rs: Display results
↓
User Output (Console)
```
### Batch Folder Compression
```
User Input (CLI)
↓
main.rs: Parse arguments
↓
processor.rs: process_folder()
↓
walkdir: Scan directory
↓
formats.rs: Filter supported files
↓
Loop: For each file
↓
compression.rs: compress_*()
↓
utils.rs: Display progress
↓
processor.rs: Aggregate statistics
↓
User Output (Console)
```
## Concurrency Model
### Async Runtime
- **Runtime**: Tokio multi-threaded runtime
- **Purpose**: Enable parallel video compression within PowerPoint files
- **Scope**: Limited to video processing tasks
### Parallelism Strategy
- **Images**: Sequential processing (fast, I/O bound)
- **Videos**: Parallel processing (slow, CPU bound)
- **Files**: Sequential processing (simplicity, clear progress)
### Synchronization
- **No shared state**: Each task operates on separate files
- **No locks needed**: Tasks never write to the same file, so no cross-task locking is required
- **Progress tracking**: Clone progress bar for async tasks
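
The "parallel videos, no shared state" model can be illustrated with the standard library alone. This sketch deliberately swaps Tokio tasks for `std::thread::scope` so it stays dependency-free; it is not how Trimdown schedules work. The function name and the `compress` callback are hypothetical stand-ins for the real FFmpeg call.

```rust
use std::path::{Path, PathBuf};
use std::thread;

/// Runs `compress` on every video concurrently, one thread per file,
/// and returns the results in input order.
pub fn compress_videos_parallel<F>(videos: &[PathBuf], compress: F) -> Vec<Result<u64, String>>
where
    F: Fn(&Path) -> Result<u64, String> + Sync,
{
    // Share the callback by reference; each thread works on its own file,
    // so no locks are needed.
    let compress = &compress;
    thread::scope(|scope| {
        let handles: Vec<_> = videos
            .iter()
            .map(|video| scope.spawn(move || compress(video.as_path())))
            .collect();
        handles
            .into_iter()
            .map(|handle| {
                handle
                    .join()
                    .unwrap_or_else(|_| Err("worker panicked".to_string()))
            })
            .collect()
    })
}
```

Because every worker receives a distinct path and returns its own result, the only synchronization point is the final `join`.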
## Error Handling Strategy
### Error Types
1. **User Errors**: Invalid input, missing files, permission issues
2. **System Errors**: Missing external tools, disk space, I/O errors
3. **Format Errors**: Corrupted files, unsupported formats
4. **Compression Errors**: Tool failures, validation failures
### Error Handling Approach
- **anyhow**: For error propagation and context
- **thiserror**: For custom error types (future)
- **`Result<T>`**: All fallible operations return Result
- **Graceful degradation**: Skip problematic files in batch mode
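
To make the four error categories and the batch-mode behavior concrete, here is a std-only sketch. `TrimdownError` and `process_batch` are hypothetical names; the project currently relies on `anyhow`, and a custom type like this is only listed as future work.

```rust
use std::fmt;

/// One variant per error category listed above.
#[derive(Debug)]
pub enum TrimdownError {
    User(String),
    System(String),
    Format(String),
    Compression(String),
}

impl fmt::Display for TrimdownError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let (kind, msg) = match self {
            TrimdownError::User(m) => ("user error", m),
            TrimdownError::System(m) => ("system error", m),
            TrimdownError::Format(m) => ("format error", m),
            TrimdownError::Compression(m) => ("compression error", m),
        };
        write!(f, "{kind}: {msg}")
    }
}

impl std::error::Error for TrimdownError {}

/// Graceful degradation: a failing file is recorded and skipped, and the
/// batch continues. Returns the success count and the skip messages.
pub fn process_batch<F>(files: &[&str], mut compress: F) -> (usize, Vec<String>)
where
    F: FnMut(&str) -> Result<(), TrimdownError>,
{
    let mut succeeded = 0;
    let mut skipped = Vec::new();
    for &file in files {
        match compress(file) {
            Ok(()) => succeeded += 1,
            // Include the file name so the message is contextual.
            Err(err) => skipped.push(format!("{file}: {err}")),
        }
    }
    (succeeded, skipped)
}
```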
### Error Messages
- **Clear**: Describe what went wrong
- **Actionable**: Provide resolution steps
- **Colored**: Red for errors, yellow for warnings
- **Contextual**: Include file names and paths
## Dependencies
### Core Dependencies
- **clap**: CLI argument parsing (derive macros)
- **tokio**: Async runtime for parallel processing
- **anyhow**: Error handling and context
- **colored**: Terminal color output
### Compression Dependencies
- **zip**: ZIP archive handling (PPTX, DOCX)
- **flate2**: Deflate compression
- **image**: Image processing and format conversion
- **lopdf**: PDF manipulation (native fallback)
### UI Dependencies
- **indicatif**: Progress bars
- **console**: Terminal utilities
### Utility Dependencies
- **walkdir**: Directory traversal
- **tempfile**: Temporary file/directory management
- **serde**: Serialization (future config files)
- **log/env_logger**: Logging infrastructure
### External Tools
- **FFmpeg**: Video compression (optional)
- **QPDF**: PDF compression (optional)
## Testing Strategy
### Unit Tests
- **Location**: Inline with modules (`#[cfg(test)]`)
- **Coverage**: Individual functions and edge cases
- **Focus**: formats.rs (100% coverage)
### Integration Tests
- **Location**: `tests/` directory (future)
- **Coverage**: End-to-end compression workflows
- **Focus**: Real file processing with validation
### Test Data
- **Sample Files**: Small test files for each format
- **Corrupted Files**: Invalid/corrupted files for error handling
- **Edge Cases**: Empty files, large files, special characters
### Test Utilities
- **tempfile**: Temporary test files and directories
- **assert_cmd**: CLI testing (future)
## Performance Considerations
### Memory Management
- **Streaming**: Process files in chunks where possible
- **Temporary Files**: Use temp directories for extraction
- **Cleanup**: Automatic cleanup via TempDir RAII
### CPU Utilization
- **Parallel Videos**: Utilize all CPU cores for video compression
- **Sequential Images**: Fast enough without parallelism
- **Batch Processing**: Sequential to avoid resource contention
### I/O Optimization
- **Buffered I/O**: Use buffered readers/writers
- **Minimal Copies**: Avoid unnecessary file copies
- **In-Place Updates**: Compress directly when safe
### Progress Reporting
- **Granular Updates**: Update progress every 5 images or 500ms
- **Estimated Duration**: Use video duration for progress calculation
- **Non-Blocking**: Progress monitoring in separate task
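
The "every 5 images or 500ms" rule reduces to a small predicate. The names below are hypothetical; the caller is assumed to track the counter and the time of the last redraw itself.

```rust
use std::time::Duration;

/// Redraw after this many images have completed.
pub const IMAGE_STEP: usize = 5;
/// Redraw at least this often, regardless of the image count.
pub const MIN_INTERVAL: Duration = Duration::from_millis(500);

/// Decides whether the progress display should be refreshed.
pub fn should_update(images_since_update: usize, since_last_update: Duration) -> bool {
    images_since_update >= IMAGE_STEP || since_last_update >= MIN_INTERVAL
}
```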
## Security Considerations
### Input Validation
- **Path Traversal**: Validate ZIP entries don't escape extraction directory
- **File Size**: Check available disk space before processing
- **Format Validation**: Verify file headers and structure
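
A conservative way to implement the path traversal check is to reject any archive entry whose name contains `..`, a root, or a drive prefix. The function name is hypothetical, and this is a sketch of the idea rather than the project's actual check.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins a ZIP entry name onto the extraction directory, returning `None`
/// if the entry could resolve outside `dest` or names nothing at all.
pub fn safe_extract_path(dest: &Path, entry_name: &str) -> Option<PathBuf> {
    let mut out = dest.to_path_buf();
    let mut pushed = false;
    for component in Path::new(entry_name).components() {
        match component {
            Component::Normal(part) => {
                out.push(part);
                pushed = true;
            }
            Component::CurDir => {}
            // `..`, an absolute root, or a drive prefix could escape `dest`.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    pushed.then_some(out)
}
```

Rejecting every `..` is stricter than necessary (some such paths stay inside the directory), but legitimate Office archives have no reason to contain one.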
### External Tools
- **Command Injection**: Use structured arguments, not shell strings
- **Tool Verification**: Check tool availability before execution
- **Output Validation**: Verify compressed files are valid
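
Tool verification with structured arguments might look like the sketch below. The signature is an assumption (the real `check_external_tool()` in `utils.rs` may take different parameters); the version flag is passed in because FFmpeg uses `-version` while QPDF uses `--version`.

```rust
use std::process::{Command, Stdio};

/// Returns true if `tool` can be launched and exits successfully when
/// asked for its version. Arguments are passed as a list, never through
/// a shell, so file names cannot be interpreted as commands.
pub fn check_external_tool(tool: &str, version_flag: &str) -> bool {
    Command::new(tool)
        .arg(version_flag)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

A caller could use `check_external_tool("ffmpeg", "-version")` before attempting video compression and print installation instructions when it returns `false`.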
### File Operations
- **Atomic Writes**: Use temp files and rename for atomicity
- **Permission Checks**: Verify write permissions before processing
- **Cleanup**: Always clean up temporary files
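
The atomic write pattern (temp file plus rename) can be sketched with `std::fs` alone. The function name and the `.tmp` suffix are hypothetical. The `tempfile` crate listed under dependencies offers `NamedTempFile::persist` for the same purpose.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes `contents` to a sibling temp file, then renames it over `target`.
/// A reader sees either the old file or the new one, never a partial write.
/// The temp file lives in the same directory so the rename stays on one
/// filesystem.
pub fn replace_atomically(target: &Path, contents: &[u8]) -> io::Result<()> {
    let file_name = target.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "target has no file name")
    })?;
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(".tmp");
    let tmp = target.with_file_name(tmp_name);

    let result = fs::write(&tmp, contents).and_then(|_| fs::rename(&tmp, target));
    if result.is_err() {
        // Best-effort cleanup; the original file is left untouched.
        let _ = fs::remove_file(&tmp);
    }
    result
}
```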
## Extensibility
### Adding New File Types
1. Add variant to `FileType` enum in `formats.rs`
2. Update `detect_file_type()` with new extensions
3. Implement `compress_<type>()` in `compression.rs`
4. Add case in `process_single_file()` and `process_folder()`
5. Add tests for new format
### Adding Compression Options
1. Add field to `Cli` struct in `cli.rs`
2. Update compression functions to use new option
3. Update documentation and help text
4. Add tests for new option
### Adding External Tools
1. Add tool check in relevant compression function
2. Implement fallback strategy if tool unavailable
3. Add installation instructions to error messages
4. Update documentation with new dependency
## Build and Release
### Build Configuration
- **Edition**: Rust 2024
- **Optimization**: Size-optimized release builds (`opt-level = "z"`)
- **LTO**: Enabled for smaller binaries
- **Strip**: Debug symbols removed in release
### Release Process
1. Update version in `Cargo.toml`
2. Update `CHANGELOG.md`
3. Run full test suite: `cargo test`
4. Build release binary: `cargo build --release`
5. Test binary manually
6. Create git tag: `git tag v0.1.0`
7. Push tag: `git push origin v0.1.0`
8. Update Homebrew formula
9. Publish to crates.io (future)
### Distribution
- **Homebrew**: Primary distribution method for macOS
- **Cargo**: `cargo install trimdown` (future)
- **GitHub Releases**: Binary releases for all platforms (future)
## Future Architecture Improvements
### Planned Enhancements
1. **Plugin System**: Allow custom compression strategies
2. **Configuration Files**: Support .trimdownrc for defaults
3. **Trait-Based Design**: Define Compressor trait for extensibility
4. **Streaming API**: Process files without full extraction
5. **Web API**: HTTP API for remote compression
6. **GUI**: Desktop application with drag-and-drop
### Refactoring Opportunities
1. **Error Types**: Custom error types with thiserror
2. **Compression Trait**: Abstract compression interface
3. **Progress Trait**: Pluggable progress reporting
4. **Config Module**: Centralized configuration management
5. **Metrics Module**: Detailed performance metrics
## Maintenance Guidelines
### Code Style
- **Formatting**: Use `rustfmt` with default settings
- **Linting**: Use `clippy` with default lints
- **Documentation**: Document all public APIs
- **Comments**: Explain why, not what
### Dependency Management
- **Updates**: Review and update dependencies quarterly
- **Security**: Monitor for security advisories
- **Minimal**: Only add dependencies when necessary
- **Versions**: Use specific versions, not wildcards
### Testing Requirements
- **Coverage**: Maintain >80% test coverage
- **CI**: Run tests on all commits
- **Benchmarks**: Track performance regressions
- **Integration**: Test with real files regularly
### Documentation
- **README**: User-facing documentation
- **SPEC**: Technical specifications
- **ARCHITECTURE**: This document
- **TODO**: Task tracking and roadmap
- **CHANGELOG**: Version history and changes