trimdown 0.1.5

File compression CLI tool for PowerPoint, PDF, Video, and Word documents
# Trimdown Specification

## Overview
Trimdown is a high-performance file compression CLI tool written in Rust, designed to compress PowerPoint, PDF, Video, and Word documents while maintaining visual quality. The tool focuses on reducing file sizes through intelligent media optimization and format-specific compression strategies.

## Core Requirements

### Functional Requirements

#### FR1: File Type Support
- **FR1.1**: Support PowerPoint files (.pptx, .ppt)
- **FR1.2**: Support PDF files (.pdf)
- **FR1.3**: Support Video files (.mp4, .avi, .mov, .wmv, .mkv, .m4v, .flv, .webm)
- **FR1.4**: Support Word documents (.docx, .doc)

#### FR2: Compression Capabilities
- **FR2.1**: Image compression with configurable quality (1-100, default: 85)
- **FR2.2**: Image resizing with configurable max width (default: 1920px)
- **FR2.3**: Video compression using H.264 codec with configurable CRF (0-51, where lower values mean higher quality; default: 28)
- **FR2.4**: PDF compression with multiple quality levels (screen, ebook, printer, prepress, maximum)
- **FR2.5**: Smart image format detection and correction for mislabeled files
- **FR2.6**: Preserve aspect ratios during image resizing
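The resizing rules in FR2.2 and FR2.6 reduce to a small piece of arithmetic: scale the height by the same factor as the width. Below is a sketch of it. The function name `fit_to_max_width` is hypothetical, and the choices not to upscale narrow images and to round to the nearest pixel are assumptions.

```rust
/// Fit an image within `max_width` while preserving its aspect ratio
/// (FR2.2, FR2.6). Images that are already narrow enough are not upscaled.
pub fn fit_to_max_width(width: u32, height: u32, max_width: u32) -> (u32, u32) {
    if width == 0 || max_width == 0 || width <= max_width {
        return (width, height);
    }
    // Scale height by max_width / width, rounding to the nearest pixel
    // (u64 avoids overflow), and never shrink a dimension to zero.
    let scaled = (height as u64 * max_width as u64 + width as u64 / 2) / width as u64;
    (max_width, (scaled as u32).max(1))
}
```

With the default `--max-width 1920`, a 3840x2160 image becomes 1920x1080, and a 1000x500 image is left untouched.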

#### FR3: Processing Modes
- **FR3.1**: Single file compression mode
- **FR3.2**: Batch folder compression mode
- **FR3.3**: Auto-detection of folder vs file input
- **FR3.4**: Parallel processing for videos within PowerPoint files
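One possible way to resolve FR3.1 to FR3.3 is sketched below: an existing directory selects folder mode, and an existing file selects single-file mode. `Mode` and `resolve_mode` are hypothetical names. Treating `--folder` on a non-directory as an error is an assumption rather than documented behavior.

```rust
use std::path::Path;

/// Processing modes from FR3 (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum Mode {
    SingleFile,
    Folder,
}

/// Auto-detect folder vs. file input; `force_folder` mirrors the `--folder` flag.
pub fn resolve_mode(input: &Path, force_folder: bool) -> Result<Mode, String> {
    if input.is_dir() {
        Ok(Mode::Folder)
    } else if force_folder {
        // Assumption: forcing folder mode on something that is not a directory is an error.
        Err(format!("--folder was given but {} is not a directory", input.display()))
    } else if input.is_file() {
        Ok(Mode::SingleFile)
    } else {
        Err(format!("input not found: {}", input.display()))
    }
}
```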

#### FR4: User Interface
- **FR4.1**: Command-line interface with intuitive arguments
- **FR4.2**: Real-time progress bars for long-running operations
- **FR4.3**: Colored console output for better readability
- **FR4.4**: Verbose mode for detailed logging
- **FR4.5**: Compression statistics and reports

#### FR5: File Management
- **FR5.1**: Automatic output filename generation by appending a `_compressed` suffix to the file name stem (e.g., `report.pdf` becomes `report_compressed.pdf`)
- **FR5.2**: Custom output path specification
- **FR5.3**: Force overwrite option for existing files
- **FR5.4**: Validation of output file integrity
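FR5.1 and FR5.3 can be sketched as two small helpers. Both assume the suffix goes between the stem and the extension, and that an existing output is refused unless `--force` is set. The names `default_output_path` and `ensure_writable` are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Insert `_compressed` between the file stem and its extension (FR5.1).
pub fn default_output_path(input: &Path) -> PathBuf {
    let stem = input
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_default();
    let name = match input.extension() {
        Some(ext) => format!("{}_compressed.{}", stem, ext.to_string_lossy()),
        None => format!("{}_compressed", stem),
    };
    // Keep the output next to the input.
    input.with_file_name(name)
}

/// Refuse to overwrite an existing output unless forced (FR5.3).
pub fn ensure_writable(output: &Path, force: bool) -> Result<(), String> {
    if output.exists() && !force {
        Err(format!("{} already exists; pass --force to overwrite", output.display()))
    } else {
        Ok(())
    }
}
```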

### Non-Functional Requirements

#### NFR1: Performance
- **NFR1.1**: Process images in <1 second for typical sizes
- **NFR1.2**: Utilize multi-core processors for parallel video compression
- **NFR1.3**: Memory-efficient processing for large files
- **NFR1.4**: Achieve a 50-75% size reduction for PowerPoint files
- **NFR1.5**: Achieve a 70-90% size reduction for videos

#### NFR2: Reliability
- **NFR2.1**: Graceful handling of corrupted or invalid files
- **NFR2.2**: Atomic file operations to prevent data loss
- **NFR2.3**: Validation of compressed output files
- **NFR2.4**: Comprehensive error messages with actionable guidance

#### NFR3: Usability
- **NFR3.1**: Zero-configuration operation for basic use cases
- **NFR3.2**: Sensible defaults for all compression parameters
- **NFR3.3**: Clear documentation and help text
- **NFR3.4**: Cross-platform compatibility (macOS, Linux, Windows)

#### NFR4: Maintainability
- **NFR4.1**: Modular code architecture
- **NFR4.2**: Comprehensive test coverage (>80%)
- **NFR4.3**: Clear separation of concerns
- **NFR4.4**: Well-documented public APIs

## Technical Specifications

### Compression Algorithms

#### Image Compression
- **JPEG**: Progressive encoding with configurable quality
- **PNG**: Lossless recompression with flate optimization
- **BMP/TIFF**: Convert to WebP or JPEG for better compression
- **GIF**: Preserve animations, resize only if needed
- **WebP/AVIF**: Optimize with format-specific encoders
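Per-format handling depends on knowing the real format. FR2.5 implies that the real format is read from the file content, not from its name. Below is a magic-byte sniffer for the formats listed above, plus a mislabel check. It is a sketch, and `ImageFormat`, `sniff_image_format` and `is_mislabeled` are hypothetical names. The signatures themselves are the standard ones: JPEG `FF D8 FF`, the 8-byte PNG signature, `GIF87a`/`GIF89a`, `BM`, `II*\0`/`MM\0*`, RIFF/WEBP, and an `ftyp` box with an `avif`/`avis` brand.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImageFormat { Jpeg, Png, Gif, Bmp, Tiff, WebP, Avif }

/// Identify an image by its leading bytes, ignoring the file name.
pub fn sniff_image_format(bytes: &[u8]) -> Option<ImageFormat> {
    if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some(ImageFormat::Jpeg)
    } else if bytes.starts_with(b"\x89PNG\r\n\x1a\n") {
        Some(ImageFormat::Png)
    } else if bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a") {
        Some(ImageFormat::Gif)
    } else if bytes.starts_with(b"BM") {
        Some(ImageFormat::Bmp)
    } else if bytes.starts_with(b"II*\0") || bytes.starts_with(b"MM\0*") {
        Some(ImageFormat::Tiff)
    } else if bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WEBP" {
        Some(ImageFormat::WebP)
    } else if bytes.len() >= 12
        && &bytes[4..8] == b"ftyp"
        && (&bytes[8..12] == b"avif" || &bytes[8..12] == b"avis")
    {
        Some(ImageFormat::Avif)
    } else {
        None
    }
}

/// True when a known extension disagrees with the sniffed content (FR2.5).
pub fn is_mislabeled(extension: &str, detected: ImageFormat) -> bool {
    let expected = match extension.to_ascii_lowercase().as_str() {
        "jpg" | "jpeg" => ImageFormat::Jpeg,
        "png" => ImageFormat::Png,
        "gif" => ImageFormat::Gif,
        "bmp" => ImageFormat::Bmp,
        "tif" | "tiff" => ImageFormat::Tiff,
        "webp" => ImageFormat::WebP,
        "avif" => ImageFormat::Avif,
        _ => return false, // unknown extension: nothing to compare against
    };
    expected != detected
}
```

A `.jpg` whose bytes start with the PNG signature would be flagged and could then be handled with the PNG strategy instead of the JPEG one.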

#### Video Compression
- **Codec**: H.264 (libx264)
- **Audio**: AAC at 128kbps
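- **Optimization**: Fast start for web streaming (FFmpeg's `-movflags +faststart`, which moves the MP4 `moov` index to the front of the file so playback can begin before the download finishes)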
- **Progress**: Real-time monitoring via FFmpeg progress output
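The video settings above map onto standard FFmpeg options. The sketch below only builds the argument list, so it can be checked without FFmpeg installed. `ffmpeg_args` is a hypothetical helper, and the exact argument order and the use of `-y` are assumptions (the `--force` check is assumed to happen before FFmpeg runs).

```rust
use std::path::Path;

/// Build FFmpeg arguments for H.264 + AAC output with fast start and
/// machine-readable progress written to `progress_file`.
pub fn ffmpeg_args(input: &Path, output: &Path, crf: u8, progress_file: &Path) -> Vec<String> {
    vec![
        "-y".into(),                              // overwrite without prompting
        "-i".into(), input.display().to_string(),
        "-c:v".into(), "libx264".into(),          // H.264 video
        "-crf".into(), crf.to_string(),           // constant rate factor, 0-51
        "-c:a".into(), "aac".into(),              // AAC audio...
        "-b:a".into(), "128k".into(),             // ...at 128 kbps
        "-movflags".into(), "+faststart".into(),  // moov atom at the front
        "-progress".into(), progress_file.display().to_string(),
        output.display().to_string(),
    ]
}
```

The list can be passed to `std::process::Command::new("ffmpeg").args(...)`, which keeps paths containing spaces intact without shell quoting.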

#### PDF Compression
- **Primary Method**: QPDF with linearization and stream compression
- **Features**: Object stream generation, image recompression, flate optimization
- **Validation**: PDF header verification and file integrity checks
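A sketch of a QPDF invocation covering the features listed above. `--linearize`, `--object-streams=generate`, `--compress-streams=y` and `--recompress-flate` are real qpdf options. Whether Trimdown passes exactly this set is an assumption, and `qpdf_args` is a hypothetical helper.

```rust
use std::path::Path;

/// Arguments for a lossless QPDF optimization pass.
pub fn qpdf_args(input: &Path, output: &Path) -> Vec<String> {
    vec![
        "--linearize".into(),               // web-optimized (fast web view) layout
        "--object-streams=generate".into(), // pack objects into object streams
        "--compress-streams=y".into(),      // compress uncompressed streams
        "--recompress-flate".into(),        // re-deflate already-compressed streams
        input.display().to_string(),
        output.display().to_string(),
    ]
}
```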

### External Dependencies
- **FFmpeg**: Required for video compression
- **QPDF**: Required for PDF compression
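- **Graceful degradation**: When a tool is unavailable, only the steps that depend on it are skipped (for example, videos inside PowerPoint files are not compressed without FFmpeg)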
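Graceful degradation needs a cheap availability check. The sketch below assumes a tool counts as available if it can be spawned and exits successfully when asked for its version (`ffmpeg -version`, `qpdf --version`). `tool_available` is a hypothetical helper.

```rust
use std::process::{Command, Stdio};

/// Return true if `program` runs and exits successfully with `version_flag`.
/// A missing binary makes `status()` fail, which is reported as unavailable.
pub fn tool_available(program: &str, version_flag: &str) -> bool {
    Command::new(program)
        .arg(version_flag)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

For example, `tool_available("ffmpeg", "-version")` could decide whether step 4 of the PowerPoint pipeline runs at all.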

### File Format Handling

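#### PowerPoint (.pptx, .ppt)
The ZIP-based steps below apply to .pptx files; legacy .ppt files are binary OLE compound documents, not ZIP archives.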
1. Extract ZIP archive to temporary directory
2. Locate media files in `ppt/media/`
3. Compress images sequentially
4. Compress videos in parallel (if FFmpeg available)
5. Repack ZIP archive with deflate compression
6. Validate output file size and integrity
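Steps 2 to 4 can be sketched without a ZIP library by working on archive entry names. The sketch selects entries under the media prefix, splits them into images and videos by extension, and then compresses videos concurrently. It uses one scoped thread per video, which is an assumption; a real implementation might bound concurrency with a thread pool. `partition_media` and `compress_videos_parallel` are hypothetical names, and `compress` stands in for the FFmpeg call.

```rust
use std::thread;

const IMAGE_EXTS: &[&str] = &["jpg", "jpeg", "png", "gif", "bmp", "tif", "tiff", "webp", "avif"];
const VIDEO_EXTS: &[&str] = &["mp4", "avi", "mov", "wmv", "mkv", "m4v", "flv", "webm"];

/// Lowercased extension of an archive entry name, or "" if it has none.
fn ext_of(name: &str) -> String {
    name.rsplit_once('.').map(|(_, e)| e.to_ascii_lowercase()).unwrap_or_default()
}

/// Split entries under `media_prefix` (e.g. "ppt/media/") into (images, videos).
pub fn partition_media<'a>(entries: &[&'a str], media_prefix: &str) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut images = Vec::new();
    let mut videos = Vec::new();
    for &name in entries {
        if !name.starts_with(media_prefix) {
            continue;
        }
        let ext = ext_of(name);
        if IMAGE_EXTS.contains(&ext.as_str()) {
            images.push(name);
        } else if VIDEO_EXTS.contains(&ext.as_str()) {
            videos.push(name);
        }
    }
    (images, videos)
}

/// Run `compress` on every video concurrently; results keep input order.
pub fn compress_videos_parallel<F, R>(videos: &[&str], compress: F) -> Vec<R>
where
    F: Fn(&str) -> R + Sync,
    R: Send,
{
    let compress = &compress; // share one closure across all worker threads
    thread::scope(|s| {
        let handles: Vec<_> = videos.iter().map(|&v| s.spawn(move || compress(v))).collect();
        handles.into_iter().map(|h| h.join().expect("video worker panicked")).collect()
    })
}
```

Images would then be processed sequentially from the first list, as step 3 describes.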

#### PDF (.pdf)
1. Validate input PDF structure
2. Execute QPDF with optimization flags
3. Validate output PDF header and content
4. Handle exit codes (0=success, 3=success with warnings)
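Steps 3 and 4 can be expressed as two checks. qpdf documents exit code 0 for success, 2 for errors and 3 for warnings only. A process killed by a signal has no exit code (`ExitStatus::code()` returns `None`). The names `QpdfOutcome`, `interpret_qpdf_exit` and `has_pdf_header` are hypothetical, and the header check is deliberately strict.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum QpdfOutcome {
    Success,
    SuccessWithWarnings,
}

/// Map qpdf's exit code (from `ExitStatus::code()`) to an outcome.
pub fn interpret_qpdf_exit(code: Option<i32>) -> Result<QpdfOutcome, String> {
    match code {
        Some(0) => Ok(QpdfOutcome::Success),
        Some(3) => Ok(QpdfOutcome::SuccessWithWarnings),
        Some(c) => Err(format!("qpdf failed with exit code {c}")),
        None => Err("qpdf was terminated by a signal".to_string()),
    }
}

/// Minimal output validation: the file must begin with a PDF header.
pub fn has_pdf_header(bytes: &[u8]) -> bool {
    bytes.starts_with(b"%PDF-")
}
```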

#### Video (various formats)
1. Probe video duration for progress estimation
2. Execute FFmpeg with H.264 encoding
3. Monitor progress via temporary progress file
4. Validate output file and compression ratio
5. Use the compressed output in place of the original only if it is more than 5% smaller
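Steps 3 and 5 are sketched below. The sketch assumes progress is read from the `out_time_us=` lines (microseconds) that FFmpeg writes in its `-progress` key=value output, and that lines reading `N/A` are skipped. The helper names are hypothetical. The 5% rule is evaluated in integer arithmetic to avoid rounding at the boundary.

```rust
/// Fraction of the video encoded so far, from the latest parseable
/// `out_time_us=` line; `None` if no usable value or duration exists.
pub fn progress_fraction(progress_text: &str, duration_secs: f64) -> Option<f64> {
    if duration_secs <= 0.0 {
        return None;
    }
    let micros: i64 = progress_text
        .lines()
        .rev()
        .find_map(|line| line.strip_prefix("out_time_us=")?.trim().parse::<i64>().ok())?;
    Some((micros as f64 / 1_000_000.0 / duration_secs).clamp(0.0, 1.0))
}

/// Keep the compressed file only if it is more than 5% smaller:
/// compressed < 0.95 * original, i.e. compressed * 100 < original * 95.
pub fn worth_keeping(original_bytes: u64, compressed_bytes: u64) -> bool {
    (compressed_bytes as u128) * 100 < (original_bytes as u128) * 95
}
```

For step 1, the duration can be obtained with `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 input.mp4`, which prints the duration in seconds.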

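#### Word (.docx, .doc)
The ZIP-based steps below apply to .docx files; legacy .doc files are binary OLE compound documents, not ZIP archives.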
1. Extract ZIP archive to temporary directory
2. Locate media files in `word/media/`
3. Compress images with configured quality
4. Repack ZIP archive with deflate compression

### Error Handling Strategy
- **Invalid Input**: Clear error message with supported formats
- **Missing Tools**: Installation instructions for FFmpeg/QPDF
- **Corrupted Files**: Skip with warning, continue batch processing
- **Disk Space**: Check available space before processing
- **Permission Errors**: Clear error message with resolution steps
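A sketch of how the strategy above might be typed. Each variant carries enough context for an actionable message, and a batch loop turns corrupted inputs into warnings instead of stopping. `TrimdownError` and `run_batch` are hypothetical, and aborting the batch on any other error is an assumption.

```rust
use std::fmt;

#[derive(Debug)]
pub enum TrimdownError {
    UnsupportedFormat(String),
    MissingTool(&'static str),
    Corrupted(String),
    PermissionDenied(String),
}

impl fmt::Display for TrimdownError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::UnsupportedFormat(ext) => write!(
                f,
                "unsupported file type '.{ext}'; supported: pptx, ppt, pdf, docx, doc, mp4, avi, mov, wmv, mkv, m4v, flv, webm"
            ),
            Self::MissingTool(tool) => write!(f, "{tool} not found on PATH; install it and re-run"),
            Self::Corrupted(path) => write!(f, "{path} appears to be corrupted"),
            Self::PermissionDenied(path) => {
                write!(f, "permission denied for {path}; check permissions or choose another path with -o")
            }
        }
    }
}

impl std::error::Error for TrimdownError {}

/// Process every file; corrupted inputs become warnings, other errors abort.
/// Returns the number of successes and the collected warnings.
pub fn run_batch<F>(files: &[&str], mut process: F) -> Result<(usize, Vec<String>), TrimdownError>
where
    F: FnMut(&str) -> Result<(), TrimdownError>,
{
    let mut succeeded = 0;
    let mut warnings = Vec::new();
    for &file in files {
        match process(file) {
            Ok(()) => succeeded += 1,
            Err(e @ TrimdownError::Corrupted(_)) => warnings.push(format!("skipping: {e}")),
            Err(e) => return Err(e),
        }
    }
    Ok((succeeded, warnings))
}
```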

### Quality Assurance

#### Testing Strategy
- **Unit Tests**: Individual function validation
- **Integration Tests**: End-to-end compression workflows
- **Property Tests**: Format detection and validation
- **Performance Tests**: Compression speed and ratio benchmarks

#### Test Coverage Goals
- **Formats Module**: 100% coverage
- **Compression Module**: >85% coverage
- **Processor Module**: >80% coverage
- **Utils Module**: 100% coverage

## Configuration Options

### CLI Arguments
```
trimdown [OPTIONS] <INPUT>

Arguments:
  <INPUT>  Input file or folder path

Options:
  -o, --output <OUTPUT>          Output file path (single file mode only)
      --folder                   Force folder mode
  -q, --quality <QUALITY>        JPEG quality (1-100) [default: 85]
  -w, --max-width <MAX_WIDTH>    Maximum image width in pixels [default: 1920]
      --video-crf <VIDEO_CRF>    Video compression factor (0-51) [default: 28]
      --pdf-quality <QUALITY>    PDF compression quality [default: ebook]
      --pdf-method <METHOD>      PDF compression method [default: auto]
  -f, --force                    Overwrite output file if it exists
  -v, --verbose                  Verbose output
  -h, --help                     Print help
  -V, --version                  Print version
```
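The defaults and ranges in the help text can be captured in a plain options struct with a validation step. This sketch uses only the standard library rather than an argument-parsing crate. `Options` and `validate` are hypothetical names, and rejecting a zero `--max-width` is an assumption.

```rust
/// Parsed CLI options with the defaults shown in the help text.
#[derive(Debug, Clone)]
pub struct Options {
    pub quality: u8,
    pub max_width: u32,
    pub video_crf: u8,
    pub pdf_quality: String,
    pub pdf_method: String,
    pub force: bool,
    pub verbose: bool,
}

impl Default for Options {
    fn default() -> Self {
        Self {
            quality: 85,
            max_width: 1920,
            video_crf: 28,
            pdf_quality: "ebook".into(),
            pdf_method: "auto".into(),
            force: false,
            verbose: false,
        }
    }
}

impl Options {
    /// Enforce the documented ranges before any file is touched.
    pub fn validate(&self) -> Result<(), String> {
        if !(1..=100).contains(&self.quality) {
            return Err(format!("--quality must be 1-100, got {}", self.quality));
        }
        if self.video_crf > 51 {
            return Err(format!("--video-crf must be 0-51, got {}", self.video_crf));
        }
        if self.max_width == 0 {
            return Err("--max-width must be greater than 0".into());
        }
        Ok(())
    }
}
```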

### PDF Quality Levels
- **screen**: Lowest quality, smallest size (72 DPI)
- **ebook**: Balanced quality and size (150 DPI) - default
- **printer**: High quality for printing (300 DPI)
- **prepress**: Professional printing quality (300 DPI)
- **maximum**: Maximum quality with all optimizations
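The `screen`, `ebook`, `printer` and `prepress` names and their DPI figures match Ghostscript's `-dPDFSETTINGS` presets; `maximum` has no Ghostscript counterpart. A sketch of parsing `--pdf-quality` into an enum follows. `PdfQuality` and its methods are hypothetical names.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PdfQuality { Screen, Ebook, Printer, Prepress, Maximum }

impl FromStr for PdfQuality {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "screen" => Ok(Self::Screen),
            "ebook" => Ok(Self::Ebook),
            "printer" => Ok(Self::Printer),
            "prepress" => Ok(Self::Prepress),
            "maximum" => Ok(Self::Maximum),
            other => Err(format!(
                "unknown PDF quality '{other}'; expected screen, ebook, printer, prepress, or maximum"
            )),
        }
    }
}

impl PdfQuality {
    /// Target image resolution from the list above; `maximum` lists none.
    pub fn target_dpi(self) -> Option<u32> {
        match self {
            Self::Screen => Some(72),
            Self::Ebook => Some(150),
            Self::Printer | Self::Prepress => Some(300),
            Self::Maximum => None,
        }
    }

    /// Matching Ghostscript `-dPDFSETTINGS` value, where one exists.
    pub fn ghostscript_setting(self) -> Option<&'static str> {
        match self {
            Self::Screen => Some("/screen"),
            Self::Ebook => Some("/ebook"),
            Self::Printer => Some("/printer"),
            Self::Prepress => Some("/prepress"),
            Self::Maximum => None,
        }
    }
}
```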

### PDF Compression Methods
- **auto**: Automatically select best available method
- **native**: Use Rust-based lopdf library
- **qpdf**: Use QPDF external tool (recommended)
- **mutool**: Use MuPDF tools
- **ghostscript**: Use Ghostscript
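One way `auto` could resolve to a concrete method is sketched below. It tries the external tools in a fixed order and falls back to the built-in lopdf path. The preference order after `qpdf` (marked recommended above) is an assumption. The availability check is injected so the logic can be tested without the tools installed. `PdfMethod` and `select_auto` are hypothetical names. The Ghostscript binary is `gs` on Unix-like systems and typically `gswin64c` on Windows.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PdfMethod { Native, Qpdf, Mutool, Ghostscript }

/// Resolve `--pdf-method auto` to the first available backend.
pub fn select_auto(is_available: impl Fn(&str) -> bool) -> PdfMethod {
    if is_available("qpdf") {
        PdfMethod::Qpdf
    } else if is_available("mutool") {
        PdfMethod::Mutool
    } else if is_available("gs") {
        PdfMethod::Ghostscript
    } else {
        PdfMethod::Native // lopdf needs no external binary
    }
}
```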

## Success Metrics

### Compression Ratios
- **PowerPoint**: 50-75% size reduction
- **PDF**: 20-40% size reduction (varies by content)
- **Video**: 70-90% size reduction
- **Word**: 30-60% size reduction (depends on media content)
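The percentages above are size reductions relative to the original, so a 10 MB video compressed to 3 MB meets the 70% target. The arithmetic is shown below; `size_reduction_percent` is a hypothetical helper. A negative result means the output grew.

```rust
/// Size reduction as a percentage of the original size.
/// Returns 0.0 for an empty original; negative values mean the file grew.
pub fn size_reduction_percent(original_bytes: u64, compressed_bytes: u64) -> f64 {
    if original_bytes == 0 {
        return 0.0;
    }
    (1.0 - compressed_bytes as f64 / original_bytes as f64) * 100.0
}
```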

### Performance Targets
- **Image Processing**: <1 second per image
- **Video Processing**: Real-time or faster (1x-2x speed)
- **PDF Processing**: <5 seconds for typical documents
- **Batch Processing**: Linear scaling with file count

### Quality Preservation
- **Images**: Visually lossless at quality 85
- **Videos**: Minimal quality loss at CRF 28 (a more aggressive setting than libx264's default of 23)
- **PDFs**: Text and vector graphics preserved perfectly
- **Documents**: Layout and formatting preserved exactly

## Future Enhancements

### Planned Features
1. Excel file support (.xlsx, .xls)
2. OpenDocument format support (.odp, .odt)
3. Configuration file support (.trimdownrc)
4. Compression presets (fast, balanced, maximum)
5. Dry-run mode for preview
6. Metadata preservation options
7. Undo/restore functionality

### Potential Integrations
1. Web UI for drag-and-drop
2. Cloud storage integration (S3, Google Drive, Dropbox)
3. CI/CD pipeline integration
4. Batch processing API
5. Desktop GUI application