# RustDupe Implementation Roadmap

> **Document Version:** 1.0  
> **Last Updated:** 2026-02-05  
> **Target Release:** v1.0.0 by Q4 2026  
> **Current Version:** v0.2.0 (stable TUI with core duplicate detection)

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [Version Strategy](#version-strategy)
3. [Roadmap Overview](#roadmap-overview)
4. [Phase Breakdown](#phase-breakdown)
5. [Detailed Feature Cards](#detailed-feature-cards)
6. [Dependencies & Prerequisites](#dependencies--prerequisites)
7. [Risk Mitigation](#risk-mitigation)
8. [Success Metrics](#success-metrics)
9. [Resource Requirements](#resource-requirements)
10. [Appendix: Deferred Features](#appendix-deferred-features)

---

## Executive Summary

### Vision

RustDupe aims to become the **premier cross-platform duplicate file finder** by combining the performance of Rust-based tools (like fclones and czkawka) with a superior interactive TUI experience and enterprise-grade reliability. The project targets power users, developers, and IT professionals who need fast, safe, and comprehensive duplicate detection.

### Key Milestones

| Milestone | Version | Target Date | Key Deliverable |
|-----------|---------|-------------|-----------------|
| Foundation | v0.3.0 | Q1 2026 | Stability and quick wins |
| Performance | v0.4.0 | Q2 2026 | Major performance gains and new detection modes |
| Integration | v0.5.0 | Q3 2026 | Cloud support and advanced UX features |
| Production | v1.0.0 | Q4 2026 | Enterprise-ready, comprehensive feature set |

### Strategic Goals

1. **Performance Leadership**: Match or exceed fclones' performance benchmarks (34s for 316GB dataset)
2. **Feature Parity**: Match czkawka's advanced detection modes (similar images, audio, video)
3. **TUI Excellence**: Maintain unique positioning as the premier TUI-based duplicate finder
4. **Cross-Platform Excellence**: First-class support for Windows, macOS, and Linux

---

## Version Strategy

### Semantic Versioning

RustDupe follows [Semantic Versioning 2.0.0](https://semver.org/):

| Version Component | Increment When | Example |
|-------------------|----------------|---------|
| **MAJOR** (X.0.0) | Breaking CLI/API changes, major architectural shifts | v1.0.0: Stable API freeze |
| **MINOR** (0.X.0) | New features, enhancements, non-breaking additions | v0.3.0: Multi-directory scanning |
| **PATCH** (0.0.X) | Bug fixes, performance improvements, documentation | v0.2.1: Critical bug fix |

### Release Cadence

| Phase | Cadence | Notes |
|-------|---------|-------|
| Pre-1.0 (v0.x) | Monthly minor releases | Rapid iteration based on feedback |
| Post-1.0 (v1.x) | Quarterly minor releases | Stability-focused, longer support cycles |
| All versions | As-needed patch releases | Critical fixes within 48 hours |

### Version Support Policy

- **Current minor version**: Full support (bug fixes, features)
- **Previous minor version**: Security fixes only (4-week overlap)
- **Older versions**: Community support only

### Pre-Release Tags

- `alpha`: Feature incomplete, internal testing
- `beta`: Feature complete, community testing
- `rc` (release candidate): Final testing before stable

---

## Roadmap Overview

### High-Level Timeline

```mermaid
gantt
    title RustDupe Implementation Roadmap 2026
    dateFormat YYYY-MM-DD
    axisFormat %b
    
    section Phase 1: Foundation
    Multi-directory scanning      :p1_md, 2026-02-01, 3w
    Configuration file support    :p1_cfg, after p1_md, 2w
    Improved error handling       :p1_err, after p1_cfg, 2w
    Enhanced TUI navigation       :p1_ui, after p1_err, 2w
    Memory-mapped I/O option      :p1_mmap, after p1_ui, 2w
    
    section Phase 2: Performance
    Bloom filters                 :p2_bloom, 2026-04-01, 3w
    Adaptive buffer sizing        :p2_buf, after p2_bloom, 2w
    SIMD optimizations audit      :p2_simd, after p2_buf, 2w
    Perceptual image hashing      :p2_img, after p2_simd, 4w
    Fuzzy text matching           :p2_txt, after p2_img, 3w
    
    section Phase 3: Integration
    Cloud storage scanning        :p3_cloud, 2026-07-01, 4w
    Real-time monitoring          :p3_monitor, after p3_cloud, 4w
    Plugin system                 :p3_plugin, after p3_monitor, 3w
    Advanced selection rules      :p3_rules, after p3_plugin, 3w
    
    section Phase 4: Production
    Performance validation        :p4_perf, 2026-10-01, 3w
    Security audit                :p4_sec, after p4_perf, 3w
    Documentation overhaul        :p4_docs, after p4_sec, 4w
    Enterprise features           :p4_ent, after p4_docs, 4w
    v1.0.0 release                :milestone, p4_end, 2026-12-31, 0d
```

### Quarterly Summary

| Quarter | Focus | Major Deliverables |
|---------|-------|-------------------|
| **Q1 2026** | Foundation | Multi-dir scanning, config files, stability |
| **Q2 2026** | Performance | Bloom filters, perceptual hashing, fuzzy matching |
| **Q3 2026** | Integration | Cloud storage, monitoring, plugin system |
| **Q4 2026** | Production | Security audit, enterprise features, v1.0.0 |

---

## Phase Breakdown

### Phase 1: v0.3.0 - Foundation Improvements (Q1 2026)

**Theme:** Stability, quick wins, and core usability enhancements

#### Goals
- Address most-requested missing features
- Improve stability and error handling
- Establish configuration management foundation
- Maintain backward compatibility

#### Feature List

| Feature | Effort | Priority | Dependencies |
|---------|--------|----------|--------------|
| Multi-directory scanning | 5 days | Critical | None |
| Configuration file support | 4 days | High | None |
| Enhanced TUI navigation | 4 days | High | None |
| Improved error handling | 3 days | Critical | None |
| Memory-mapped I/O option | 3 days | Medium | None |
| Better progress indicators | 2 days | Medium | None |
| Export format improvements | 2 days | Low | None |

**Total Estimated Effort:** 23 days (~1 month with parallel work)

#### v0.3.0 Success Criteria
- [ ] Can scan 3+ directories simultaneously
- [ ] Config file support for all CLI options
- [ ] Zero crashes on malformed inputs
- [ ] Memory-mapped I/O available as `--mmap` flag
- [ ] 95%+ test coverage for new code

---

### Phase 2: v0.4.0 - Performance & Detection (Q2 2026)

**Theme:** Major performance optimizations and advanced detection methods

#### Goals
- Achieve performance parity with fclones
- Add perceptual hashing for images
- Implement fuzzy text matching
- Lay groundwork for audio/video detection

#### Feature List

| Feature | Effort | Priority | Dependencies |
|---------|--------|----------|--------------|
| Bloom filters for pre-filtering | 5 days | Critical | None |
| Adaptive buffer sizing | 3 days | High | Performance profiling |
| SIMD optimization audit | 4 days | Medium | None |
| Perceptual image hashing | 8 days | Critical | Image processing |
| Fuzzy text matching (SimHash) | 6 days | High | Text processing |
| Partial content pre-hashing | 3 days | Medium | Hashing |
| Performance benchmarking suite | 4 days | High | CI/CD |

**Total Estimated Effort:** 33 days (~6 weeks with parallel work)

#### v0.4.0 Success Criteria
- [ ] Within 10% of fclones performance on benchmark suite
- [ ] Perceptual image detection with configurable thresholds
- [ ] Fuzzy document matching operational
- [ ] Bloom filter reduces hash comparisons by 30%+
- [ ] Comprehensive performance regression testing

---

### Phase 3: v0.5.0 - Integration & UX (Q3 2026)

**Theme:** Cloud integration, real-time monitoring, and advanced user experience

#### Goals
- Enable cloud storage scanning
- Add real-time directory monitoring
- Build extensible plugin system
- Implement advanced selection rules

#### Feature List

| Feature | Effort | Priority | Dependencies |
|---------|--------|----------|--------------|
| Cloud storage integration | 10 days | High | OAuth, REST APIs |
| Real-time file monitoring | 6 days | Medium | notify crate |
| Plugin system architecture | 8 days | Medium | DLL/dylib loading |
| Advanced selection rules | 5 days | High | Rule engine |
| Enhanced reporting/exports | 4 days | Medium | Template system |
| Undo/rollback improvements | 3 days | Medium | Transaction log |

**Total Estimated Effort:** 36 days (~7 weeks with parallel work)

#### v0.5.0 Success Criteria
- [ ] Google Drive/OneDrive/Dropbox scanning operational
- [ ] Real-time duplicate detection via file monitoring
- [ ] Plugin API documented with example plugins
- [ ] Rule-based auto-selection working
- [ ] Enhanced HTML reports with visual diff

---

### Phase 4: v1.0.0 - Production Ready (Q4 2026)

**Theme:** Stabilization, security, documentation, and enterprise features

#### Goals
- Achieve production-ready stability
- Complete security audit
- Comprehensive documentation
- Enterprise-grade features

#### Feature List

| Feature | Effort | Priority | Dependencies |
|---------|--------|----------|--------------|
| Security audit & hardening | 5 days | Critical | All prior code |
| Performance validation | 4 days | Critical | Benchmark suite |
| Documentation overhaul | 6 days | Critical | All features |
| Enterprise features | 5 days | Medium | LDAP, policies |
| API stability freeze | 3 days | Critical | None |
| Long-term support planning | 2 days | Medium | None |

**Total Estimated Effort:** 25 days (~5 weeks)

#### v1.0.0 Success Criteria
- [ ] No open critical or high-severity security issues
- [ ] API backward compatibility guarantee
- [ ] Complete user and developer documentation
- [ ] Enterprise authentication integration (optional)
- [ ] 6-month support commitment for v1.0.x

---

## Detailed Feature Cards

### Phase 1 Features

#### FC-001: Multi-Directory Scanning

**Description:**
Enable scanning of multiple directories in a single command, with support for directory groups and exclusion patterns.

**Acceptance Criteria:**
1. Can specify multiple directories: `rustdupe scan /path/1 /path/2 /path/3`
2. Directory groups can be named: `--group photos=/Photos --group docs=/Documents`
3. Per-directory exclusion patterns supported
4. Reference directory concept works across multiple inputs
5. Progress shows per-directory and total progress

**Dependencies:**
- Internal: None (extends existing scanning logic)
- External: None

**Effort Estimate:** 5 days (40 hours)

**Assigned Phase:** v0.3.0

**Technical Notes:**
- Extend `jwalk` parallel traversal to multiple roots
- Aggregate results in hash map before duplicate detection
- UI needs directory tree view or grouped display
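The aggregation step in the notes above can be sketched as follows. This is a minimal illustration, not the project's implementation: it uses a plain recursive `std::fs` walk in place of the `jwalk` parallel traversal so that it stays self-contained, and the names `collect` and `candidate_groups` are hypothetical.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect regular files under `dir` into `by_size`.
/// Stand-in for the parallel traversal; symlinks are not followed.
fn collect(dir: &Path, by_size: &mut HashMap<u64, Vec<PathBuf>>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            collect(&entry.path(), by_size)?;
        } else if file_type.is_file() {
            let size = entry.metadata()?.len();
            by_size.entry(size).or_default().push(entry.path());
        }
    }
    Ok(())
}

/// Walk every root into one shared map, then keep only the size buckets
/// that hold two or more files. Only those buckets need hashing.
fn candidate_groups(roots: &[PathBuf]) -> io::Result<HashMap<u64, Vec<PathBuf>>> {
    let mut by_size = HashMap::new();
    for root in roots {
        collect(root, &mut by_size)?;
    }
    by_size.retain(|_, paths| paths.len() > 1);
    Ok(by_size)
}
```

Because all roots feed a single map, duplicates that span directories fall into the same bucket. A real implementation would also need to de-duplicate overlapping roots (for example `/data` and `/data/photos`), which this sketch does not attempt.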

---

#### FC-002: Configuration File Support

**Description:**
Support for YAML/TOML configuration files to persist commonly used options and directory configurations.

**Acceptance Criteria:**
1. Config file at `~/.config/rustdupe/config.toml` auto-loaded
2. CLI flags override config file settings
3. Config includes all current CLI options
4. Multiple named profiles supported: `--profile photos`
5. Config validation with helpful error messages

**Dependencies:**
- Internal: None
- External: `serde`, `toml` crates

**Effort Estimate:** 4 days (32 hours)

**Assigned Phase:** v0.3.0

**Technical Notes:**
- Use a platform-directories crate (e.g. `directories` or `dirs`) to resolve XDG and OS-specific config paths
- Config structure mirrors CLI arguments
- Consider using `figment` for layered config (file + env + CLI)
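The precedence rule from the acceptance criteria (CLI flags override the config file, which overrides built-in defaults) can be sketched without any crates. The `Settings` fields and the default values below are hypothetical placeholders; a layered-config crate such as `figment` would replace the hand-written merge.

```rust
/// Hypothetical subset of options. Every field is optional so that
/// layers can be merged field by field.
#[derive(Debug, Default, Clone, PartialEq)]
struct Settings {
    min_size: Option<u64>,
    follow_symlinks: Option<bool>,
    profile: Option<String>,
}

impl Settings {
    /// Values set in `self` win; gaps are filled from `fallback`.
    fn or(self, fallback: Settings) -> Settings {
        Settings {
            min_size: self.min_size.or(fallback.min_size),
            follow_symlinks: self.follow_symlinks.or(fallback.follow_symlinks),
            profile: self.profile.or(fallback.profile),
        }
    }
}

/// Resolve in precedence order: CLI flags, then config file, then defaults.
fn resolve(cli: Settings, file: Settings) -> Settings {
    // Assumed defaults, chosen only for illustration.
    let defaults = Settings {
        min_size: Some(1),
        follow_symlinks: Some(false),
        profile: None,
    };
    cli.or(file).or(defaults)
}
```

Keeping each layer as the same all-optional struct means a named profile can later be slotted in as one more layer between the file and the defaults.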

---

#### FC-003: Enhanced TUI Navigation

**Description:**
Improve TUI with vim-style keybindings, search within results, and better keyboard navigation.

**Acceptance Criteria:**
1. Vim keybindings (j/k for navigation, / for search)
2. Search/filter within duplicate groups
3. Bulk selection by pattern (e.g., all in Downloads)
4. Expand/collapse groups
5. Sortable columns (size, path, date)

**Dependencies:**
- Internal: Existing TUI code
- External: ratatui features

**Effort Estimate:** 4 days (32 hours)

**Assigned Phase:** v0.3.0

**Technical Notes:**
- Leverage ratatui's `Table` widget with sorting
- Implement fuzzy search with `nucleo` or `skim`
- Maintain accessibility for non-vim users

---

#### FC-004: Improved Error Handling

**Description:**
Comprehensive error handling improvements with user-friendly messages and graceful degradation.

**Acceptance Criteria:**
1. All errors include context and suggestions
2. Permission errors suggest elevation options
3. Continue scanning on non-fatal errors
4. Error summary at end of scan
5. Structured error codes for scripting

**Dependencies:**
- Internal: Existing error types
- External: `anyhow` enhancements

**Effort Estimate:** 3 days (24 hours)

**Assigned Phase:** v0.3.0

**Technical Notes:**
- Use `anyhow::Context` for rich error messages
- Create error categorization (fatal/warning/info)
- Consider `color-eyre` for better error reports
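One possible shape for the fatal/warning/info categorization is sketched below using only the standard library. The error codes (`E_PERM`, `E_GONE`, `E_IO`), the hint texts, and the mapping from `io::ErrorKind` to severity are all assumptions made for illustration; the roadmap does not fix them.

```rust
use std::fmt;
use std::io;
use std::path::PathBuf;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Fatal,
    Warning,
    Info,
}

/// One entry in the end-of-scan error summary.
#[derive(Debug)]
struct ScanIssue {
    code: &'static str, // stable identifier for scripting
    severity: Severity,
    path: PathBuf,
    hint: &'static str, // user-facing suggestion
}

impl fmt::Display for ScanIssue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}] {}: {}", self.code, self.path.display(), self.hint)
    }
}

/// Map an I/O error to a categorized issue (illustrative mapping only).
fn classify(path: PathBuf, err: &io::Error) -> ScanIssue {
    let (code, severity, hint) = match err.kind() {
        io::ErrorKind::PermissionDenied => (
            "E_PERM",
            Severity::Warning,
            "re-run with elevated privileges or exclude this path",
        ),
        io::ErrorKind::NotFound => (
            "E_GONE",
            Severity::Info,
            "file vanished during the scan and was skipped",
        ),
        _ => (
            "E_IO",
            Severity::Fatal,
            "unexpected I/O failure; check the device and retry",
        ),
    };
    ScanIssue { code, severity, path, hint }
}

/// The scan continues unless at least one fatal issue was recorded.
fn should_abort(issues: &[ScanIssue]) -> bool {
    issues.iter().any(|issue| issue.severity == Severity::Fatal)
}
```

Collecting `ScanIssue` values in a `Vec` during the scan gives both the "continue on non-fatal errors" behavior and the data for the final summary.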

---

#### FC-005: Memory-Mapped I/O Option

**Description:**
Optional memory-mapped file I/O for large files to improve performance on systems with sufficient RAM.

**Acceptance Criteria:**
1. `--mmap` flag enables memory-mapped reading
2. Falls back to buffered I/O if mmap fails
3. Configurable threshold (default: files >64MB)
4. Performance improvement documented
5. Safe handling of files modified during scan

**Dependencies:**
- Internal: Hashing infrastructure
- External: `memmap2` crate

**Effort Estimate:** 3 days (24 hours)

**Assigned Phase:** v0.3.0

**Technical Notes:**
- Use `unsafe` block with proper error handling
- Consider BLAKE3's built-in mmap support
- Profile to verify actual performance gains

---

### Phase 2 Features

#### FC-006: Bloom Filters for Quick Rejection

**Description:**
Implement Bloom filters to quickly reject files that definitely aren't duplicates before expensive hash computation.

**Acceptance Criteria:**
1. Two-stage Bloom filter (by size, then partial hash)
2. Configurable false positive rate (default: 1%)
3. 30%+ reduction in hash computations
4. Memory usage <100MB for 1M files
5. Works with incremental scanning

**Dependencies:**
- Internal: Scanning pipeline
- External: `bloom` or `growable-bloom-filter` crate

**Effort Estimate:** 5 days (40 hours)

**Assigned Phase:** v0.4.0

**Technical Notes:**
- Size filter: 10 bits per element at 1% FP rate
- Partial hash filter: first 4KB of file
- Measure actual vs theoretical performance
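The "10 bits per element at 1% FP rate" figure follows from the standard sizing formulas m = -n·ln(p) / (ln 2)² and k = (m/n)·ln 2, which give about 9.6 bits and 7 hash functions. The sketch below is a minimal standard-library Bloom filter built on those formulas, not the `bloom` crate's API; the `Bloom` type and the double-hashing scheme over `DefaultHasher` are illustrative assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct Bloom {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

impl Bloom {
    /// Size the filter for `expected_items` at the target false positive rate.
    fn with_rate(expected_items: u64, fp_rate: f64) -> Self {
        let ln2 = std::f64::consts::LN_2;
        let n = expected_items.max(1) as f64;
        let num_bits = (-n * fp_rate.ln() / (ln2 * ln2)).ceil().max(64.0) as u64;
        let num_hashes = ((num_bits as f64 / n) * ln2).round().max(1.0) as u32;
        Bloom {
            bits: vec![0; ((num_bits + 63) / 64) as usize],
            num_bits,
            num_hashes,
        }
    }

    /// Two base hashes; the k probe positions are derived as a + i * b.
    fn hash_pair<T: Hash>(item: &T) -> (u64, u64) {
        let mut first = DefaultHasher::new();
        item.hash(&mut first);
        let a = first.finish();
        let mut second = DefaultHasher::new();
        (a, 0x9E37_79B9_7F4A_7C15u64).hash(&mut second);
        (a, second.finish() | 1)
    }

    fn insert<T: Hash>(&mut self, item: &T) {
        let (a, b) = Self::hash_pair(item);
        for i in 0..self.num_hashes as u64 {
            let bit = a.wrapping_add(i.wrapping_mul(b)) % self.num_bits;
            self.bits[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }

    /// `false` means definitely never inserted; `true` means possibly inserted.
    fn may_contain<T: Hash>(&self, item: &T) -> bool {
        let (a, b) = Self::hash_pair(item);
        (0..self.num_hashes as u64).all(|i| {
            let bit = a.wrapping_add(i.wrapping_mul(b)) % self.num_bits;
            self.bits[(bit / 64) as usize] & (1u64 << (bit % 64)) != 0
        })
    }
}
```

In the size stage, a file whose size returns `false` from `may_contain` cannot have a duplicate among the files seen so far, so it can be inserted and skipped without hashing. At one million entries the filter needs roughly 1.2 MB, well inside the 100 MB budget.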

---

#### FC-007: Perceptual Image Hashing

**Description:**
Detect similar images (not just identical) using perceptual hashing algorithms (pHash, dHash, aHash).

**Acceptance Criteria:**
1. Three algorithms supported: pHash, dHash, aHash
2. Configurable similarity threshold (Hamming distance)
3. Default thresholds based on research:
   - pHash: ≤10 bits different
   - dHash: ≤2 bits different
   - aHash: ≤5 bits different
4. BK-tree for efficient similarity search
5. Works with JPEG, PNG, GIF, WebP, HEIC

**Dependencies:**
- Internal: None
- External: `image_hasher`, `image`, `bk-tree` crates

**Effort Estimate:** 8 days (64 hours)

**Assigned Phase:** v0.4.0

**Technical Notes:**
- Use `image_hasher` crate (maintained fork of img_hash)
- BK-tree enables sub-linear similarity search
- Consider GPU acceleration for very large image sets
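To show why a BK-tree gives sub-linear similarity search, here is a minimal standard-library sketch over 64-bit hashes with Hamming distance. It is not the `bk-tree` crate's API; `BkTree`, `Node`, and `within` are hypothetical names, and the hashes are assumed to come from a perceptual hasher such as `image_hasher`.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hamming distance between two 64-bit perceptual hashes.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

struct Node {
    hash: u64,
    children: HashMap<u32, Node>, // keyed by distance to this node
}

impl Node {
    fn new(hash: u64) -> Self {
        Node { hash, children: HashMap::new() }
    }

    fn insert(&mut self, hash: u64) {
        let d = hamming(self.hash, hash);
        if d == 0 {
            return; // identical hash already stored
        }
        match self.children.entry(d) {
            Entry::Occupied(slot) => slot.into_mut().insert(hash),
            Entry::Vacant(slot) => {
                slot.insert(Node::new(hash));
            }
        }
    }

    fn find(&self, query: u64, max_dist: u32, out: &mut Vec<u64>) {
        let d = hamming(self.hash, query);
        if d <= max_dist {
            out.push(self.hash);
        }
        // Triangle inequality: matches can only sit under edges in [d - max, d + max].
        let lo = d.saturating_sub(max_dist);
        let hi = d + max_dist;
        for (&edge, child) in &self.children {
            if edge >= lo && edge <= hi {
                child.find(query, max_dist, out);
            }
        }
    }
}

struct BkTree {
    root: Option<Node>,
}

impl BkTree {
    fn new() -> Self {
        BkTree { root: None }
    }

    fn insert(&mut self, hash: u64) {
        match self.root.as_mut() {
            Some(node) => node.insert(hash),
            None => self.root = Some(Node::new(hash)),
        }
    }

    /// All stored hashes within `max_dist` bits of `query`, sorted.
    fn within(&self, query: u64, max_dist: u32) -> Vec<u64> {
        let mut out = Vec::new();
        if let Some(node) = &self.root {
            node.find(query, max_dist, &mut out);
        }
        out.sort_unstable();
        out
    }
}
```

With the pHash default above, a lookup would be `tree.within(hash, 10)`. Subtrees whose edge distance falls outside the window are skipped entirely, which is where the saving over a linear scan comes from.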

---

#### FC-008: Fuzzy Text Matching

**Description:**
Detect near-duplicate documents using SimHash and MinHash algorithms.

**Acceptance Criteria:**
1. Text extraction from PDF, DOCX, TXT
2. SimHash for near-duplicate detection
3. MinHash LSH for clustering
4. Configurable similarity threshold
5. Support for non-English text

**Dependencies:**
- Internal: None
- External: `pdf-extract`, `docx-rs`, `simhash` crates

**Effort Estimate:** 6 days (48 hours)

**Assigned Phase:** v0.4.0

**Technical Notes:**
- Normalize text (lowercase, remove punctuation)
- Tokenize into words or n-grams
- Shingling with MinHash for large document sets
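The normalize, tokenize, and fingerprint steps can be sketched as a 64-bit SimHash over word tokens. This is a standard-library illustration rather than the `simhash` crate's API: it uses `DefaultHasher` as the per-token hash and weights every token equally, both of which are simplifying assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Lowercase the text and split on non-alphanumeric characters.
/// `char::is_alphanumeric` is Unicode-aware, so non-English words survive.
fn tokens(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_owned)
        .collect()
}

/// 64-bit SimHash: each token votes +1 or -1 on every bit position,
/// and the fingerprint keeps the bits with a positive total.
fn simhash(text: &str) -> u64 {
    let mut votes = [0i32; 64];
    for token in tokens(text) {
        let mut hasher = DefaultHasher::new();
        token.hash(&mut hasher);
        let h = hasher.finish();
        for (bit, vote) in votes.iter_mut().enumerate() {
            if (h >> bit) & 1 == 1 {
                *vote += 1;
            } else {
                *vote -= 1;
            }
        }
    }
    votes
        .iter()
        .enumerate()
        .fold(0u64, |acc, (bit, &vote)| if vote > 0 { acc | (1u64 << bit) } else { acc })
}

/// Number of differing fingerprint bits; smaller means more similar.
fn distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}
```

Two documents are treated as near-duplicates when `distance` is at or below the configured threshold. Because the votes form a bag of words, reordering sentences does not change the fingerprint; using n-gram shingles as tokens instead of single words would make it order-sensitive.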

---

#### FC-009: Performance Benchmarking Suite

**Description:**
Automated performance benchmarking with regression detection.

**Acceptance Criteria:**
1. CI-integrated benchmark tests
2. Standardized test datasets (316GB, 1.4M files like fclones)
3. Compare against fclones, czkawka baseline
4. Performance regression alerts (>10% slowdown)
5. Historical performance tracking

**Dependencies:**
- Internal: None
- External: `criterion`, GitHub Actions

**Effort Estimate:** 4 days (32 hours)

**Assigned Phase:** v0.4.0

**Technical Notes:**
- Use `criterion.rs` for statistical rigor
- Store benchmark results as artifacts
- Consider caching test datasets
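The ">10% slowdown" alert reduces to a comparison of two sets of timings. The sketch below compares medians, which is one reasonable choice and not necessarily what `criterion` reports; the function names are hypothetical and the timings in the usage note are made-up illustration values.

```rust
/// Median of a set of wall-clock timings in seconds.
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Relative slowdown of `current` against `baseline` (0.10 means 10% slower).
fn slowdown(baseline: &[f64], current: &[f64]) -> Option<f64> {
    let base = median(baseline)?;
    let cur = median(current)?;
    if base <= 0.0 {
        return None;
    }
    Some((cur - base) / base)
}

/// True when the slowdown exceeds `tolerance`, e.g. 0.10 for the 10% alert.
fn is_regression(baseline: &[f64], current: &[f64], tolerance: f64) -> bool {
    slowdown(baseline, current).map_or(false, |s| s > tolerance)
}
```

For example, a baseline median of 34.0 s against a current median of 38.0 s is a slowdown of about 11.8% and would trigger the alert, while 36.0 s (about 5.9%) would not.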

---

### Phase 3 Features

#### FC-010: Cloud Storage Integration

**Description:**
Scan cloud storage services (Google Drive, OneDrive, Dropbox) via APIs or local sync folders.

**Acceptance Criteria:**
1. Google Drive API support (OAuth2)
2. OneDrive Graph API support
3. Dropbox API v2 support
4. Local sync folder scanning (fallback)
5. Resumable scans for large cloud stores
6. Rate limiting and retry logic

**Dependencies:**
- Internal: None
- External: `reqwest`, `oauth2`, cloud SDKs

**Effort Estimate:** 10 days (80 hours)

**Assigned Phase:** v0.5.0

**Technical Notes:**
- Use metadata-only approach where possible (hash in API response)
- Implement exponential backoff for rate limits
- Support service account authentication for Google Workspace (formerly G Suite)
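The exponential backoff mentioned in the notes can be sketched with the standard library alone. The base delay (500 ms), the cap (60 s), and the function names are assumptions for illustration; a production client would usually add random jitter and honor any `Retry-After` header the service returns.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(cap, |delay| delay.min(cap))
}

/// Run `op` until it succeeds or `max_attempts` is reached.
/// `sleep` is injected so callers can pass `std::thread::sleep`
/// in production and a recorder in tests.
fn retry<T, E>(
    max_attempts: u32,
    mut op: impl FnMut(u32) -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                // Assumed policy: 500 ms base delay, 60 s ceiling.
                sleep(backoff_delay(
                    attempt,
                    Duration::from_millis(500),
                    Duration::from_secs(60),
                ));
                attempt += 1;
            }
        }
    }
}
```

A call such as `retry(5, |_| fetch_page(), std::thread::sleep)` would wait 0.5 s, 1 s, 2 s, and 4 s between its five attempts, where `fetch_page` stands for any hypothetical request function returning a `Result`.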

---

#### FC-011: Real-Time File Monitoring

**Description:**
Monitor directories for changes and detect duplicates in real-time as files are created/modified.

**Acceptance Criteria:**
1. Cross-platform file system monitoring
2. Configurable debounce interval
3. Low resource usage (<5% CPU when idle)
4. Optional daemon mode
5. Integration with scan cache

**Dependencies:**
- Internal: Scanning pipeline
- External: `notify` crate

**Effort Estimate:** 6 days (48 hours)

**Assigned Phase:** v0.5.0

**Technical Notes:**
- Use `notify` crate for cross-platform support
- inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows
- Debounce events to avoid duplicate work
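Debouncing can be sketched independently of the `notify` crate: remember the time of the latest event per path and only hand a path to the scanner once it has been quiet for the configured interval. The `Debouncer` type below is a hypothetical illustration that takes timestamps as arguments so it can be driven by any event source.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::time::{Duration, Instant};

struct Debouncer {
    window: Duration,
    pending: HashMap<PathBuf, Instant>, // path -> time of its latest event
}

impl Debouncer {
    fn new(window: Duration) -> Self {
        Debouncer { window, pending: HashMap::new() }
    }

    /// Record an event. A repeated event for the same path restarts its quiet period.
    fn record(&mut self, path: PathBuf, now: Instant) {
        self.pending.insert(path, now);
    }

    /// Remove and return the paths that have been quiet for at least `window`.
    fn drain_ready(&mut self, now: Instant) -> Vec<PathBuf> {
        let mut ready: Vec<PathBuf> = self
            .pending
            .iter()
            .filter(|(_, last)| now.duration_since(**last) >= self.window)
            .map(|(path, _)| path.clone())
            .collect();
        for path in &ready {
            self.pending.remove(path);
        }
        ready.sort();
        ready
    }
}
```

A burst of writes to one file therefore produces a single rehash once the file settles, instead of one rehash per write event.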

---

#### FC-012: Plugin System Architecture

**Description:**
Extensible plugin system for custom detection algorithms and integrations.

**Acceptance Criteria:**
1. Plugin API with stable ABI
2. Hot-reloading support
3. Example plugins provided
4. Sandboxed execution (WASM option)
5. Plugin marketplace/registry (future)

**Dependencies:**
- Internal: Modular architecture
- External: `abi_stable`, `wasmtime` (optional)

**Effort Estimate:** 8 days (64 hours)

**Assigned Phase:** v0.5.0

**Technical Notes:**
- Consider C ABI for language interoperability
- WASM plugins for safety and portability
- Plugin manifest format (TOML)

---

#### FC-013: Advanced Selection Rules

**Description:**
Rule-based automatic selection of which duplicate to keep, with priority scoring.

**Acceptance Criteria:**
1. Rule types: path pattern, date, size, resolution
2. Weighted scoring system
3. Dry-run preview mode
4. Save/load rule sets
5. EXIF/metadata-based rules for images

**Dependencies:**
- Internal: Selection logic
- External: `kamadak-exif` for metadata

**Effort Estimate:** 5 days (40 hours)

**Assigned Phase:** v0.5.0

**Technical Notes:**
- JSON/YAML rule definition format
- Rules evaluate to priority scores
- Highest score wins as "original"
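A weighted scoring pass along these lines might look like the sketch below. The rule variants, the weights, and the tie-break (lexicographically smallest path) are assumptions for illustration; the roadmap does not specify them. The function only returns a choice and deletes nothing, which matches the dry-run preview mode.

```rust
#[derive(Debug, Clone)]
struct Candidate {
    path: String,
    modified_unix: u64, // seconds since the Unix epoch
    size_bytes: u64,
}

/// Hypothetical rule types; each one either matches a candidate or not.
enum Rule {
    PathContains(String),
    ModifiedBefore(u64),
    MinSize(u64),
}

struct WeightedRule {
    rule: Rule,
    weight: i32, // negative weights penalize, e.g. files under Downloads
}

impl Rule {
    fn matches(&self, c: &Candidate) -> bool {
        match self {
            Rule::PathContains(needle) => c.path.contains(needle.as_str()),
            Rule::ModifiedBefore(cutoff) => c.modified_unix < *cutoff,
            Rule::MinSize(min) => c.size_bytes >= *min,
        }
    }
}

/// Sum the weights of every rule the candidate matches.
fn score(c: &Candidate, rules: &[WeightedRule]) -> i32 {
    rules
        .iter()
        .filter(|r| r.rule.matches(c))
        .map(|r| r.weight)
        .sum()
}

/// The highest score wins as the "original"; ties go to the smallest path
/// so that the result is deterministic.
fn pick_original<'a>(group: &'a [Candidate], rules: &[WeightedRule]) -> Option<&'a Candidate> {
    group.iter().max_by(|a, b| {
        score(a, rules)
            .cmp(&score(b, rules))
            .then_with(|| b.path.cmp(&a.path))
    })
}
```

With the weights `+10` for paths containing `/Photos/`, `-5` for `/Downloads/`, and `+3` for files modified before a cutoff, an old copy in Downloads scores -2 and a newer copy in Photos scores 10, so the Photos copy is kept.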

---

### Phase 4 Features

#### FC-014: Security Audit

**Description:**
Comprehensive security audit including fuzzing, dependency scanning, and code review.

**Acceptance Criteria:**
1. Fuzz testing for all parsers
2. Dependency vulnerability scan
3. Code review for unsafe blocks
4. Path traversal protection verified
5. Input validation audit

**Dependencies:**
- Internal: All code
- External: `cargo-audit`, `cargo-fuzz`

**Effort Estimate:** 5 days (40 hours)

**Assigned Phase:** v1.0.0

**Technical Notes:**
- Use `cargo-audit` for dependency scanning
- Fuzz test file path parsing
- Review all `unsafe` blocks and exercise them under Miri where possible

---

#### FC-015: Enterprise Features

**Description:**
Features for enterprise deployments: centralized policies, LDAP integration, audit logging.

**Acceptance Criteria:**
1. Group policy templates (Windows)
2. Centralized configuration management
3. Audit log format (JSON/Syslog)
4. LDAP/AD authentication (optional)
5. Compliance reporting

**Dependencies:**
- Internal: Config system
- External: LDAP crate (optional)

**Effort Estimate:** 5 days (40 hours)

**Assigned Phase:** v1.0.0

**Technical Notes:**
- Structured audit logging with JSON output
- Consider SCAP/CIS compliance checks
- Integration with SIEM systems

---

## Dependencies & Prerequisites

### Dependency Graph

```
v0.3.0 Foundation
├── Core Scanning (stable)
├── TUI Framework (stable)
└── Config Management (NEW)
    └── TOML parsing

v0.4.0 Performance
├── Bloom Filters (NEW)
│   └── Hash pipeline
├── Perceptual Hashing (NEW)
│   ├── Image processing
│   └── BK-tree
└── Fuzzy Matching (NEW)
    └── Text extraction

v0.5.0 Integration
├── Cloud APIs (NEW)
│   ├── OAuth
│   └── REST clients
├── File Monitoring (NEW)
│   └── notify crate
└── Plugin System (NEW)
    └── Dynamic loading

v1.0.0 Production
├── All previous features
├── Security hardening
└── Documentation
```

### External Dependencies by Phase

#### Phase 1 Dependencies

| Crate | Version | Purpose | License |
|-------|---------|---------|---------|
| `figment` | ^0.10 | Layered configuration | MIT |
| `toml` | ^0.8 | Config file parsing | MIT/Apache-2.0 |
| `memmap2` | ^0.9 | Memory-mapped files | MIT/Apache-2.0 |

#### Phase 2 Dependencies

| Crate | Version | Purpose | License |
|-------|---------|---------|---------|
| `bloom` | ^0.3 | Bloom filters | MIT/Apache-2.0 |
| `image_hasher` | ^1.0 | Perceptual hashing | MIT |
| `image` | ^0.25 | Image loading | MIT/Apache-2.0 |
| `bk-tree` | ^0.5 | BK-tree search | MIT |
| `pdf-extract` | ^0.7 | PDF text extraction | MIT |
| `docx-rs` | ^0.4 | DOCX parsing | MIT |

#### Phase 3 Dependencies

| Crate | Version | Purpose | License |
|-------|---------|---------|---------|
| `notify` | ^6.0 | File system monitoring | CC0-1.0 |
| `reqwest` | ^0.12 | HTTP client | MIT/Apache-2.0 |
| `oauth2` | ^4.4 | OAuth authentication | MIT/Apache-2.0 |
| `abi_stable` | ^0.11 | Plugin ABI | MIT |

#### Phase 4 Dependencies

| Crate | Version | Purpose | License |
|-------|---------|---------|---------|
| `cargo-audit` | latest | Security scanning | MIT/Apache-2.0 |
| `ldap3` | ^0.11 | LDAP integration (optional) | MIT/Apache-2.0 |

### Prerequisites Checklist

- [x] Rust 1.85+ (current requirement)
- [ ] CI/CD pipeline with multi-platform testing
- [ ] Benchmark infrastructure
- [ ] Documentation hosting
- [ ] Security scanning integration

---

## Risk Mitigation

### Identified Risks

#### Phase 1 Risks

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Multi-dir scanning complexity | Medium | Medium | Start with simple aggregation, iterate |
| Config file migration issues | Low | Medium | Maintain CLI parity, deprecation warnings |
| TUI framework limitations | Low | High | ratatui is mature; have fallback plan |

#### Phase 2 Risks

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Perceptual hashing performance | High | High | Benchmark early; make optional |
| Bloom filter false positives | Medium | Medium | Tuning period; document behavior |
| Image format support gaps | Medium | Low | Use `image` crate; document limits |

#### Phase 3 Risks

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Cloud API rate limits | High | Medium | Exponential backoff; local cache |
| OAuth complexity | Medium | High | Use established `oauth2` crate |
| Plugin security | Medium | High | WASM sandboxing option; code signing |

#### Phase 4 Risks

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Security audit findings | Medium | High | Start security review in Phase 3 |
| API stability concerns | Low | High | Semantic versioning; beta period |
| Enterprise feature bloat | Medium | Medium | Keep enterprise features optional |

### Contingency Plans

1. **Performance Not Meeting Targets:**
   - Fallback to partial hashing (first/last blocks)
   - Profile-guided optimization
   - Consider GPU acceleration for Phase 4

2. **Cloud API Changes:**
   - Abstract cloud interface
   - Local sync folder fallback
   - Community-driven API updates

3. **Resource Constraints:**
   - Defer lower-priority features
   - Focus on core differentiators (TUI, performance)
   - Community contribution program

---

## Success Metrics

### Phase 1 KPIs

| Metric | Target | Measurement |
|--------|--------|-------------|
| Multi-directory support | 100% | Test coverage |
| Config file adoption | 30% | Telemetry/survey |
| Crash rate | <0.1% | Error reports |
| User satisfaction | 4.0/5 | GitHub stars, feedback |

### Phase 2 KPIs

| Metric | Target | Measurement |
|--------|--------|-------------|
| Performance vs fclones | ≤10% slower | Benchmark suite |
| Image similarity accuracy | 95%+ | Test dataset |
| Bloom filter efficiency | 30%+ reduction | Comparison test |
| Test coverage | 90%+ | cargo-tarpaulin |

### Phase 3 KPIs

| Metric | Target | Measurement |
|--------|--------|-------------|
| Cloud integration users | 100+ | Downloads/feedback |
| Plugin downloads | 10+ | Plugin registry |
| Real-time monitoring uptime | 99%+ | Long-running tests |
| Documentation completeness | 100% | Coverage check |

### Phase 4 KPIs

| Metric | Target | Measurement |
|--------|--------|-------------|
| Security issues | 0 critical/high | Audit results |
| API stability | 100% | Breaking changes |
| Enterprise adoption | 5+ orgs | Surveys/feedback |
| Overall rating | 4.5/5 | GitHub, crates.io |

### Long-Term Goals (Post-v1.0)

| Metric | Target | Timeline |
|--------|--------|----------|
| crates.io downloads | 100K+ | 12 months post-v1.0 |
| GitHub stars | 2,000+ | 12 months post-v1.0 |
| Contributor count | 20+ | 12 months post-v1.0 |
| Package manager adoption | 5+ distros | 18 months post-v1.0 |

---

## Resource Requirements

### Effort Summary by Phase

| Phase | Estimated Days | FTE Months* | Parallel Tracks |
|-------|----------------|-------------|-----------------|
| v0.3.0 Foundation | 23 days | 1.2 | 2 |
| v0.4.0 Performance | 33 days | 1.7 | 3 |
| v0.5.0 Integration | 36 days | 1.8 | 3 |
| v1.0.0 Production | 25 days | 1.3 | 2 |
| **Total** | **117 days** | **6.0** | - |

\* FTE months are estimated days divided by 20 working days per month; calendar duration is shorter when the parallel tracks run concurrently.

### Skills Required

| Skill | Phase 1 | Phase 2 | Phase 3 | Phase 4 |
|-------|---------|---------|---------|---------|
| Rust (core) | ★★★ | ★★★ | ★★★ | ★★★ |
| TUI Development | ★★☆ | ★☆☆ | ★☆☆ | ★☆☆ |
| Image Processing | ☆☆☆ | ★★☆ | ★☆☆ | ★☆☆ |
| Cryptography/Hashing | ★☆☆ | ★★☆ | ★☆☆ | ★☆☆ |
| Cloud APIs | ☆☆☆ | ☆☆☆ | ★★☆ | ★☆☆ |
| Security | ★☆☆ | ★☆☆ | ★☆☆ | ★★★ |
| Technical Writing | ★☆☆ | ★☆☆ | ★★☆ | ★★★ |

### Tooling Requirements

| Tool | Purpose | Cost |
|------|---------|------|
| GitHub Actions | CI/CD | Free (open source) |
| GitHub Projects | Issue tracking | Free |
| crates.io | Package distribution | Free |
| docs.rs | Documentation hosting | Free |
| Criterion.rs | Benchmarking | Free |
| cargo-audit | Security scanning | Free |
| Coveralls/Codecov | Test coverage | Free (open source) |

### Infrastructure Needs

- **Benchmark Server:** Dedicated machine for consistent performance testing
- **Test Data Storage:** 500GB+ for benchmark datasets
- **Cloud Test Accounts:** Google Drive, OneDrive, Dropbox for integration testing
- **Cross-Platform VMs:** Windows, macOS, Linux for manual testing

---

## Appendix: Deferred Features

### Deferred to Post-v1.0

| Feature | Reason for Deferral | Potential Phase |
|---------|---------------------|-----------------|
| Audio fingerprinting (Chromaprint) | High complexity, limited demand | v1.1+ |
| Video deduplication | Requires ffmpeg, complex dependencies | v1.2+ |
| GUI version (egui/iced) | Would compete with czkawka; focus on TUI | v1.x or separate project |
| Mobile apps (Android/iOS) | Platform restrictions, new codebase | Future project |
| Machine learning similarity | Overkill for most use cases; high resource cost | Research phase |
| GPU acceleration | CPU performance sufficient; adds complexity | v1.x if needed |
| Network deduplication | Enterprise niche; complex distributed systems | v2.0+ |
| Continuous background dedup | Daemon complexity, resource concerns | v1.x optional component |

### Deferred Features - Detailed Rationale

#### Audio Fingerprinting

**Why Deferred:**
- Chromaprint integration requires external binary or complex FFI
- Audio duplicate detection is niche use case
- Would add significant binary size

**Conditions for Revival:**
- Plugin system enables external implementation
- Community demand reaches threshold
- Simpler Rust-native fingerprinting available

#### Video Deduplication

**Why Deferred:**
- Requires ffmpeg dependency or complex video parsing
- Keyframe extraction is computationally expensive
- Video files are typically large; hashing is already slow

**Conditions for Revival:**
- Plugin system provides isolation
- GPU acceleration makes it feasible
- Frame sampling approach proves effective

#### GUI Version

**Why Deferred:**
- Would directly compete with czkawka (excellent GUI already exists)
- Maintaining GUI + TUI = double maintenance burden
- TUI is unique differentiator

**Conditions for Revival:**
- Separate GUI crate in same repository
- egui or iced matures with good accessibility
- Strong community contributor interest

---

## Document Control

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-02-05 | Documentation Team | Initial release |

### Approval

This roadmap represents the strategic direction for RustDupe development. Individual features may be reprioritized based on community feedback and technical discoveries.

---

**Next Review Date:** 2026-04-01 (post-v0.3.0 release)

**Feedback:** To comment on this roadmap, please open an issue at https://github.com/MasuRii/RustDupe/issues