rustdupe 0.2.0

Smart duplicate file finder with interactive TUI
# RustDupe Implementation Roadmap

> **Version:** 1.0  
> **Last Updated:** 2026-02-05  
> **Target Release:** v1.0.0 by Q4 2026  
> **Current Version:** v0.2.0 (stable TUI with core duplicate detection)

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [Version Strategy](#version-strategy)
3. [Roadmap Overview](#roadmap-overview)
4. [Phase Breakdown](#phase-breakdown)
5. [Detailed Feature Cards](#detailed-feature-cards)
6. [Dependencies & Prerequisites](#dependencies--prerequisites)
7. [Risk Mitigation](#risk-mitigation)
8. [Success Metrics](#success-metrics)
9. [Resource Requirements](#resource-requirements)
10. [Appendix: Deferred Features](#appendix-deferred-features)

---

## Executive Summary

### Vision

RustDupe aims to become the **premier cross-platform duplicate file finder** by combining the performance of Rust-based tools (like fclones and czkawka) with a superior interactive TUI experience and enterprise-grade reliability. The project targets power users, developers, and IT professionals who need fast, safe, and comprehensive duplicate detection.

### Key Milestones

| Milestone | Version | Target Date | Key Deliverable |
|-----------|---------|-------------|-----------------|
| Foundation | v0.3.0 | Q1 2026 | Stability and quick wins |
| Performance | v0.4.0 | Q2 2026 | Major performance gains and new detection modes |
| Integration | v0.5.0 | Q3 2026 | Cloud support and advanced UX features |
| Production | v1.0.0 | Q4 2026 | Enterprise-ready, comprehensive feature set |

### Strategic Goals

1. **Performance Leadership**: Match or exceed fclones' performance benchmarks (34s for 316GB dataset)
2. **Feature Parity**: Match czkawka's advanced detection modes (similar images, audio, video)
3. **TUI Excellence**: Maintain unique positioning as the premier TUI-based duplicate finder
4. **Cross-Platform Excellence**: First-class support for Windows, macOS, and Linux

---

## Version Strategy

### Semantic Versioning

RustDupe follows [Semantic Versioning 2.0.0](https://semver.org/):

| Version Component | Increment When | Example |
|-------------------|----------------|---------|
| **MAJOR** (X.0.0) | Breaking CLI/API changes, major architectural shifts | v1.0.0: Stable API freeze |
| **MINOR** (0.X.0) | New features, enhancements, non-breaking additions | v0.3.0: Multi-directory scanning |
| **PATCH** (0.0.X) | Bug fixes, performance improvements, documentation | v0.2.1: Critical bug fix |

### Release Cadence

| Phase | Cadence | Notes |
|-------|---------|-------|
| Pre-1.0 (v0.x) | Monthly minor releases | Rapid iteration based on feedback |
| Post-1.0 (v1.x) | Quarterly minor releases | Stability-focused, longer support cycles |
| All versions | As-needed patch releases | Critical fixes within 48 hours |

### Version Support Policy

- **Current minor version**: Full support (bug fixes, features)
- **Previous minor version**: Security fixes only (4 weeks overlap)
- **Older versions**: Community support only

### Pre-Release Tags

- `alpha`: Feature incomplete, internal testing
- `beta`: Feature complete, community testing
- `rc` (release candidate): Final testing before stable

---

## Roadmap Overview

### High-Level Timeline

```mermaid
gantt
    title RustDupe Implementation Roadmap 2026
    dateFormat YYYY-MM-DD
    axisFormat %b
    
    section Phase 1: Foundation
    Multi-directory scanning      :p1_md, 2026-02-01, 3w
    Configuration file support    :p1_cfg, after p1_md, 2w
    Improved error handling       :p1_err, after p1_cfg, 2w
    Enhanced TUI navigation       :p1_ui, after p1_err, 2w
    Memory-mapped I/O option      :p1_mmap, after p1_ui, 2w
    
    section Phase 2: Performance
    Bloom filters                 :p2_bloom, 2026-04-01, 3w
    Adaptive buffer sizing        :p2_buf, after p2_bloom, 2w
    SIMD optimizations audit      :p2_simd, after p2_buf, 2w
    Perceptual image hashing      :p2_img, after p2_simd, 4w
    Fuzzy text matching           :p2_txt, after p2_img, 3w
    
    section Phase 3: Integration
    Cloud storage scanning        :p3_cloud, 2026-07-01, 4w
    Real-time monitoring          :p3_monitor, after p3_cloud, 4w
    Plugin system                 :p3_plugin, after p3_monitor, 3w
    Advanced selection rules      :p3_rules, after p3_plugin, 3w
    
    section Phase 4: Production
    Performance validation        :p4_perf, 2026-10-01, 3w
    Security audit                :p4_sec, after p4_perf, 3w
    Documentation overhaul        :p4_docs, after p4_sec, 4w
    Enterprise features           :p4_ent, after p4_docs, 4w
    v1.0.0 release                :milestone, p4_end, 2026-12-31, 0d
```

### Quarterly Summary

| Quarter | Focus | Major Deliverables |
|---------|-------|-------------------|
| **Q1 2026** | Foundation | Multi-dir scanning, config files, stability |
| **Q2 2026** | Performance | Bloom filters, perceptual hashing, fuzzy matching |
| **Q3 2026** | Integration | Cloud storage, monitoring, plugin system |
| **Q4 2026** | Production | Security audit, enterprise features, v1.0.0 |

---

## Phase Breakdown

### Phase 1: v0.3.0 - Foundation Improvements (Q1 2026)

**Theme:** Stability, quick wins, and core usability enhancements

#### Goals
- Address most-requested missing features
- Improve stability and error handling
- Establish configuration management foundation
- Maintain backward compatibility

#### Feature List

| Feature | Effort | Priority | Dependencies |
|---------|--------|----------|--------------|
| Multi-directory scanning | 5 days | Critical | None |
| Configuration file support | 4 days | High | None |
| Enhanced TUI navigation | 4 days | High | None |
| Improved error handling | 3 days | Critical | None |
| Memory-mapped I/O option | 3 days | Medium | None |
| Better progress indicators | 2 days | Medium | None |
| Export format improvements | 2 days | Low | None |

**Total Estimated Effort:** 23 days (~1 month with parallel work)

#### v0.3.0 Success Criteria
- [ ] Can scan 3+ directories simultaneously
- [ ] Config file support for all CLI options
- [ ] Zero crashes on malformed inputs
- [ ] Memory-mapped I/O available as `--mmap` flag
- [ ] 95%+ test coverage for new code

---

### Phase 2: v0.4.0 - Performance & Detection (Q2 2026)

**Theme:** Major performance optimizations and advanced detection methods

#### Goals
- Achieve performance parity with fclones
- Add perceptual hashing for images
- Implement fuzzy text matching
- Lay groundwork for audio/video detection

#### Feature List

| Feature | Effort | Priority | Dependencies |
|---------|--------|----------|--------------|
| Bloom filters for pre-filtering | 5 days | Critical | None |
| Adaptive buffer sizing | 3 days | High | Performance profiling |
| SIMD optimization audit | 4 days | Medium | None |
| Perceptual image hashing | 8 days | Critical | Image processing |
| Fuzzy text matching (SimHash) | 6 days | High | Text processing |
| Partial content pre-hashing | 3 days | Medium | Hashing |
| Performance benchmarking suite | 4 days | High | CI/CD |

**Total Estimated Effort:** 33 days (~6 weeks with parallel work)

#### v0.4.0 Success Criteria
- [ ] Within 10% of fclones performance on benchmark suite
- [ ] Perceptual image detection with configurable thresholds
- [ ] Fuzzy document matching operational
- [ ] Bloom filter reduces hash comparisons by 30%+
- [ ] Comprehensive performance regression testing

---

### Phase 3: v0.5.0 - Integration & UX (Q3 2026)

**Theme:** Cloud integration, real-time monitoring, and advanced user experience

#### Goals
- Enable cloud storage scanning
- Add real-time directory monitoring
- Build extensible plugin system
- Implement advanced selection rules

#### Feature List

| Feature | Effort | Priority | Dependencies |
|---------|--------|----------|--------------|
| Cloud storage integration | 10 days | High | OAuth, REST APIs |
| Real-time file monitoring | 6 days | Medium | notify crate |
| Plugin system architecture | 8 days | Medium | DLL/dylib loading |
| Advanced selection rules | 5 days | High | Rule engine |
| Enhanced reporting/exports | 4 days | Medium | Template system |
| Undo/rollback improvements | 3 days | Medium | Transaction log |

**Total Estimated Effort:** 36 days (~7 weeks with parallel work)

#### v0.5.0 Success Criteria
- [ ] Google Drive/OneDrive/Dropbox scanning operational
- [ ] Real-time duplicate detection via file monitoring
- [ ] Plugin API documented with example plugins
- [ ] Rule-based auto-selection working
- [ ] Enhanced HTML reports with visual diff

---

### Phase 4: v1.0.0 - Production Ready (Q4 2026)

**Theme:** Stabilization, security, documentation, and enterprise features

#### Goals
- Achieve production-ready stability
- Complete security audit
- Comprehensive documentation
- Enterprise-grade features

#### Feature List

| Feature | Effort | Priority | Dependencies |
|---------|--------|----------|--------------|
| Security audit & hardening | 5 days | Critical | All prior code |
| Performance validation | 4 days | Critical | Benchmark suite |
| Documentation overhaul | 6 days | Critical | All features |
| Enterprise features | 5 days | Medium | LDAP, policies |
| API stability freeze | 3 days | Critical | None |
| Long-term support planning | 2 days | Medium | None |

**Total Estimated Effort:** 25 days (~5 weeks)

#### v1.0.0 Success Criteria
- [ ] No open critical or high-severity security issues
- [ ] API backward compatibility guarantee
- [ ] Complete user and developer documentation
- [ ] Enterprise authentication integration (optional)
- [ ] 6-month support commitment for v1.0.x

---

## Detailed Feature Cards

### Phase 1 Features

#### FC-001: Multi-Directory Scanning

**Description:**
Enable scanning of multiple directories in a single command, with support for directory groups and exclusion patterns.

**Acceptance Criteria:**
1. Can specify multiple directories: `rustdupe scan /path/1 /path/2 /path/3`
2. Directory groups can be named: `--group photos=/Photos --group docs=/Documents`
3. Per-directory exclusion patterns supported
4. Reference directory concept works across multiple inputs
5. Progress shows per-directory and total progress

**Dependencies:**
- Internal: None (extends existing scanning logic)
- External: None

**Effort Estimate:** 5 days (40 hours)

**Assigned Phase:** v0.3.0

**Technical Notes:**
- Extend `jwalk` parallel traversal to multiple roots
- Aggregate results in hash map before duplicate detection
- UI needs directory tree view or grouped display
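
A minimal sketch of the aggregation step, assuming each root has already been walked (for example by one `jwalk` traversal per root) and that the walker canonicalized paths, so a file reached through overlapping roots such as `/data` and `/data/photos` compares equal. The `Entry` type and `group_by_size` function are hypothetical and not part of RustDupe's current API.

```rust
use std::collections::{HashMap, HashSet};
use std::path::PathBuf;

/// One file reported by a directory walk (hypothetical type for this sketch).
pub struct Entry {
    pub path: PathBuf, // assumed already canonicalized by the walker
    pub size: u64,
}

/// Merge per-root results into size buckets before duplicate detection.
/// Paths already seen via another (overlapping) root are skipped, and only
/// sizes shared by at least two distinct files are kept as candidates.
pub fn group_by_size(roots: Vec<Vec<Entry>>) -> HashMap<u64, Vec<PathBuf>> {
    let mut seen: HashSet<PathBuf> = HashSet::new();
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for entries in roots {
        for e in entries {
            // `insert` returns false when this path was already counted.
            if seen.insert(e.path.clone()) {
                by_size.entry(e.size).or_default().push(e.path);
            }
        }
    }
    // A size that occurs only once cannot have a duplicate.
    by_size.retain(|_, paths| paths.len() > 1);
    by_size
}
```

Deduplicating paths up front matters: without it, passing both a directory and one of its subdirectories would report every file in the overlap as a duplicate of itself.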

---

#### FC-002: Configuration File Support

**Description:**
Support for YAML/TOML configuration files to persist commonly used options and directory configurations.

**Acceptance Criteria:**
1. Config file at `~/.config/rustdupe/config.toml` auto-loaded
2. CLI flags override config file settings
3. Config includes all current CLI options
4. Multiple named profiles supported: `--profile photos`
5. Config validation with helpful error messages

**Dependencies:**
- Internal: None
- External: `serde`, `toml` crates

**Effort Estimate:** 4 days (32 hours)

**Assigned Phase:** v0.3.0

**Technical Notes:**
- Resolve the config path with an XDG-aware directories crate (e.g. `directories`) for cross-platform paths
- Config structure mirrors CLI arguments
- Consider using `figment` for layered config (file + env + CLI)
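
The precedence rule in acceptance criterion 2 can be sketched without any crates: each layer contributes an `Option` per setting, and the first layer that sets a value wins. The field names, the built-in defaults, and the position of the `--profile` layer in the order are illustrative assumptions, and the environment-variable layer is omitted; `figment` would perform the same kind of merge declaratively.

```rust
/// One configuration layer; `None` means "not set at this layer".
/// Field names are hypothetical, not RustDupe's actual options.
#[derive(Debug, Default, Clone)]
pub struct Layer {
    pub min_size: Option<u64>,
    pub threads: Option<usize>,
    pub follow_symlinks: Option<bool>,
}

/// Fully resolved settings after merging all layers.
#[derive(Debug, PartialEq)]
pub struct Settings {
    pub min_size: u64,
    pub threads: usize,
    pub follow_symlinks: bool,
}

/// Precedence (highest first): CLI flags, selected `--profile`, config file,
/// built-in defaults. The default values below are placeholders.
pub fn resolve(cli: &Layer, profile: &Layer, file: &Layer) -> Settings {
    Settings {
        min_size: cli.min_size.or(profile.min_size).or(file.min_size).unwrap_or(1),
        threads: cli.threads.or(profile.threads).or(file.threads).unwrap_or(4),
        follow_symlinks: cli
            .follow_symlinks
            .or(profile.follow_symlinks)
            .or(file.follow_symlinks)
            .unwrap_or(false),
    }
}
```

Keeping CLI values as `Option` is what lets a flag the user did not pass fall through to the profile or file instead of silently overriding them with a default.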

---

#### FC-003: Enhanced TUI Navigation

**Description:**
Improve TUI with vim-style keybindings, search within results, and better keyboard navigation.

**Acceptance Criteria:**
1. Vim keybindings (j/k for navigation, / for search)
2. Search/filter within duplicate groups
3. Bulk selection by pattern (e.g., all in Downloads)
4. Expand/collapse groups
5. Sortable columns (size, path, date)

**Dependencies:**
- Internal: Existing TUI code
- External: ratatui features

**Effort Estimate:** 4 days (32 hours)

**Assigned Phase:** v0.3.0

**Technical Notes:**
- Use ratatui's `Table` widget for display; it renders rows in the order given, so sorting happens in application state before each render
- Implement fuzzy search with `nucleo` or `skim`
- Maintain accessibility for non-vim users
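
The navigation logic can be kept independent of the terminal backend, which also makes it unit-testable. In this sketch vim keys and their non-vim equivalents map to the same action, the key names are placeholders for a backend's key events, and `g`/`G` stand for top/bottom (vim uses `gg`; a single `g` keeps the sketch stateless). `Action`, `move_cursor`, `Row`, and `sort_rows` are hypothetical names.

```rust
/// Navigation actions the TUI reacts to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action { Down, Up, Top, Bottom, Search, Toggle, Quit }

/// Map a key name to an action; vim and non-vim keys share actions.
pub fn map_key(key: &str) -> Option<Action> {
    match key {
        "j" | "Down" => Some(Action::Down),
        "k" | "Up" => Some(Action::Up),
        "g" | "Home" => Some(Action::Top),
        "G" | "End" => Some(Action::Bottom),
        "/" => Some(Action::Search),
        " " | "Enter" => Some(Action::Toggle), // expand/collapse a group
        "q" => Some(Action::Quit),
        _ => None,
    }
}

/// New cursor position after a movement, clamped to `0..len`.
pub fn move_cursor(cursor: usize, len: usize, action: Action) -> usize {
    if len == 0 {
        return 0;
    }
    match action {
        Action::Down => (cursor + 1).min(len - 1),
        Action::Up => cursor.saturating_sub(1),
        Action::Top => 0,
        Action::Bottom => len - 1,
        _ => cursor, // non-movement actions leave the cursor alone
    }
}

/// Sortable columns for a results row.
#[derive(Debug, Clone, Copy)]
pub enum SortBy { Size, Path, Modified }

pub struct Row { pub path: String, pub size: u64, pub modified: u64 }

/// Sort rows in place before they are handed to the table widget.
pub fn sort_rows(rows: &mut [Row], by: SortBy, descending: bool) {
    match by {
        SortBy::Size => rows.sort_by_key(|r| r.size),
        SortBy::Path => rows.sort_by(|a, b| a.path.cmp(&b.path)),
        SortBy::Modified => rows.sort_by_key(|r| r.modified),
    }
    if descending {
        rows.reverse();
    }
}
```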

---

#### FC-004: Improved Error Handling

**Description:**
Comprehensive error handling improvements with user-friendly messages and graceful degradation.

**Acceptance Criteria:**
1. All errors include context and suggestions
2. Permission errors suggest elevation options
3. Continue scanning on non-fatal errors
4. Error summary at end of scan
5. Structured error codes for scripting

**Dependencies:**
- Internal: Existing error types
- External: `anyhow` enhancements

**Effort Estimate:** 3 days (24 hours)

**Assigned Phase:** v0.3.0

**Technical Notes:**
- Use `anyhow::Context` for rich error messages
- Create error categorization (fatal/warning/info)
- Consider `color-eyre` for better error reports
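
The categorization and end-of-scan summary can be sketched as below. Severity levels follow this card (fatal/warning/info); the rule that only a failing scan root is fatal, the hint strings, and the 0/1/2 exit codes are hypothetical placeholders. Real code would likely also attach context with `anyhow::Context`.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Ordered so that `max()` over a scan's issues yields the worst one.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity { Info, Warning, Fatal }

/// One problem recorded during a scan, kept for the end-of-scan summary.
#[derive(Debug)]
pub struct ScanIssue {
    pub severity: Severity,
    pub path: PathBuf,
    pub hint: &'static str,
}

/// Classify an I/O error. Failing to read a scan root is fatal; errors on
/// individual entries are recorded and the scan continues.
pub fn classify(err: &io::Error, path: &Path, is_root: bool) -> ScanIssue {
    let (severity, hint) = match err.kind() {
        io::ErrorKind::PermissionDenied => (
            Severity::Warning,
            "check permissions, re-run with elevated privileges, or exclude this path",
        ),
        io::ErrorKind::NotFound => (Severity::Info, "path was removed or renamed during the scan"),
        _ => (Severity::Warning, "see the underlying OS error for details"),
    };
    let severity = if is_root { Severity::Fatal } else { severity };
    ScanIssue { severity, path: path.to_path_buf(), hint }
}

/// Hypothetical exit codes for scripting: 0 = clean, 1 = warnings, 2 = fatal.
pub fn exit_code(issues: &[ScanIssue]) -> i32 {
    match issues.iter().map(|i| i.severity).max() {
        Some(Severity::Fatal) => 2,
        Some(Severity::Warning) => 1,
        _ => 0,
    }
}
```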

---

#### FC-005: Memory-Mapped I/O Option

**Description:**
Optional memory-mapped file I/O for large files to improve performance on systems with sufficient RAM.

**Acceptance Criteria:**
1. `--mmap` flag enables memory-mapped reading
2. Falls back to buffered I/O if mmap fails
3. Configurable threshold (default: files >64MB)
4. Performance improvement documented
5. Safe handling of files modified during scan

**Dependencies:**
- Internal: Hashing infrastructure
- External: `memmap2` crate

**Effort Estimate:** 3 days (24 hours)

**Assigned Phase:** v0.3.0

**Technical Notes:**
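- `memmap2::Mmap::map` is `unsafe` because another process may modify or truncate the file while it is mapped (on Unix, truncation can raise `SIGBUS`); keep the call in a small wrapper and fall back to buffered reads on error
- Consider BLAKE3's built-in mmap support (`Hasher::update_mmap` / `update_mmap_rayon`, behind the `blake3` crate's `mmap` feature)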
- Profile to verify actual performance gains

---

### Phase 2 Features

#### FC-006: Bloom Filters for Quick Rejection

**Description:**
Implement Bloom filters to quickly reject files that definitely aren't duplicates before expensive hash computation.

**Acceptance Criteria:**
1. Two-stage Bloom filter (by size, then partial hash)
2. Configurable false positive rate (default: 1%)
3. 30%+ reduction in hash computations
4. Memory usage <100MB for 1M files
5. Works with incremental scanning

**Dependencies:**
- Internal: Scanning pipeline
- External: `bloom` or `growable-bloom-filter` crate

**Effort Estimate:** 5 days (40 hours)

**Assigned Phase:** v0.4.0

**Technical Notes:**
- Size filter: ≈9.6 bits per element with 7 hash functions at a 1% FP rate (≈1.2 MB per million entries)
- Partial hash filter: first 4KB of file
- Measure actual vs theoretical performance
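
To make the sizing and the "definitely not a duplicate" logic concrete, here is a self-contained Bloom filter sketch. It is not the `bloom` crate's API: it sizes itself with the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions, derives the k bit indices by double hashing from std's `DefaultHasher`, and uses two filters (`seen`, `repeated`) to flag file sizes that may occur more than once. `Bloom` and `maybe_repeated` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter (a sketch, not the `bloom` crate's API).
pub struct Bloom {
    bits: Vec<u64>,
    pub m: usize, // number of bits
    pub k: u32,   // number of hash functions
}

impl Bloom {
    /// Size for `n` expected items at false-positive rate `p`:
    /// m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
    pub fn with_rate(n: usize, p: f64) -> Self {
        let n = n.max(1) as f64;
        let ln2 = std::f64::consts::LN_2;
        let m = ((-n * p.ln()) / (ln2 * ln2)).ceil() as usize;
        let k = ((m as f64 / n) * ln2).round().max(1.0) as u32;
        Bloom { bits: vec![0; (m + 63) / 64], m, k }
    }

    /// Two base hashes for double hashing: index_i = h1 + i * h2 (mod m).
    fn base_hashes<T: Hash>(item: &T) -> (u64, u64) {
        let mut a = DefaultHasher::new();
        item.hash(&mut a);
        let h1 = a.finish();
        let mut b = DefaultHasher::new();
        (h1, 0x9E37_79B9_7F4A_7C15u64).hash(&mut b);
        (h1, b.finish() | 1) // odd step so the k indices don't collapse
    }

    fn index(&self, h1: u64, h2: u64, i: u32) -> usize {
        (h1.wrapping_add((i as u64).wrapping_mul(h2)) % self.m as u64) as usize
    }

    pub fn insert<T: Hash>(&mut self, item: &T) {
        let (h1, h2) = Self::base_hashes(item);
        for i in 0..self.k {
            let idx = self.index(h1, h2, i);
            self.bits[idx / 64] |= 1u64 << (idx % 64);
        }
    }

    /// `false` means "definitely never inserted"; `true` means "probably inserted".
    pub fn contains<T: Hash>(&self, item: &T) -> bool {
        let (h1, h2) = Self::base_hashes(item);
        (0..self.k).all(|i| {
            let idx = self.index(h1, h2, i);
            (self.bits[idx / 64] & (1u64 << (idx % 64))) != 0
        })
    }
}

/// Sizes that *may* occur more than once. No false negatives: every size that
/// really repeats is returned; a false positive only costs extra hashing later.
pub fn maybe_repeated(sizes: &[u64], p: f64) -> Vec<u64> {
    let mut seen = Bloom::with_rate(sizes.len(), p);
    let mut repeated = Bloom::with_rate(sizes.len(), p);
    for s in sizes {
        if seen.contains(s) {
            repeated.insert(s);
        } else {
            seen.insert(s);
        }
    }
    sizes.iter().copied().filter(|s| repeated.contains(s)).collect()
}
```

At a 1% rate, one million entries need about 9.6 million bits (≈1.2 MB) per filter, well inside the 100 MB budget of criterion 4. An exact `HashMap<u64, usize>` of size counts would have no false positives; the Bloom variant trades a small false-positive rate for a fixed, smaller footprint.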

---

#### FC-007: Perceptual Image Hashing

**Description:**
Detect similar images (not just identical) using perceptual hashing algorithms (pHash, dHash, aHash).

**Acceptance Criteria:**
1. Three algorithms supported: pHash, dHash, aHash
2. Configurable similarity threshold (Hamming distance)
3. Default thresholds based on research:
   - pHash: ≤10 bits different
   - dHash: ≤2 bits different
   - aHash: ≤5 bits different
4. BK-tree for efficient similarity search
5. Works with JPEG, PNG, GIF, WebP, HEIC (HEIC is not decoded by the `image` crate and needs a separate decoder, e.g. libheif bindings)

**Dependencies:**
- Internal: None
- External: `image_hasher`, `image`, `bk_tree` crates

**Effort Estimate:** 8 days (64 hours)

**Assigned Phase:** v0.4.0

**Technical Notes:**
- Use `image_hasher` crate (maintained fork of img_hash)
- BK-tree enables sub-linear similarity search
- Consider GPU acceleration for very large image sets
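
The BK-tree idea can be sketched over 64-bit perceptual hashes (assuming an 8×8 hash size, so each hash fits in a `u64`), with Hamming distance computed as XOR plus popcount. This is a hand-rolled illustration, not the `bk-tree` or `image_hasher` API; a query with the pHash default from criterion 3 would pass `max_dist = 10`.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hamming distance between two 64-bit perceptual hashes.
pub fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

struct Node {
    hash: u64,
    id: usize,                    // index of the image in the scan
    children: HashMap<u32, Node>, // keyed by distance to this node
}

impl Node {
    fn new(hash: u64, id: usize) -> Self {
        Node { hash, id, children: HashMap::new() }
    }
}

/// Minimal BK-tree over 64-bit hashes (a sketch, not the `bk-tree` crate's API).
#[derive(Default)]
pub struct BkTree {
    root: Option<Node>,
}

impl BkTree {
    pub fn insert(&mut self, hash: u64, id: usize) {
        if self.root.is_none() {
            self.root = Some(Node::new(hash, id));
            return;
        }
        let mut cur = self.root.as_mut().unwrap();
        loop {
            let d = hamming(cur.hash, hash);
            match cur.children.entry(d) {
                Entry::Vacant(slot) => {
                    slot.insert(Node::new(hash, id));
                    return;
                }
                Entry::Occupied(child) => cur = child.into_mut(),
            }
        }
    }

    /// All (id, distance) pairs within `max_dist` of `hash`. By the triangle
    /// inequality only children whose edge lies in [d - max_dist, d + max_dist]
    /// can contain matches, so other subtrees are skipped.
    pub fn find(&self, hash: u64, max_dist: u32) -> Vec<(usize, u32)> {
        let mut out = Vec::new();
        let mut stack: Vec<&Node> = self.root.iter().collect();
        while let Some(node) = stack.pop() {
            let d = hamming(node.hash, hash);
            if d <= max_dist {
                out.push((node.id, d));
            }
            let (lo, hi) = (d.saturating_sub(max_dist), d + max_dist);
            for (&edge, child) in &node.children {
                if (lo..=hi).contains(&edge) {
                    stack.push(child);
                }
            }
        }
        out
    }
}
```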

---

#### FC-008: Fuzzy Text Matching

**Description:**
Detect near-duplicate documents using SimHash and MinHash algorithms.

**Acceptance Criteria:**
1. Text extraction from PDF, DOCX, TXT
2. SimHash for near-duplicate detection
3. MinHash LSH for clustering
4. Configurable similarity threshold
5. Support for non-English text

**Dependencies:**
- Internal: None
- External: `pdf-extract`, `docx-rs`, `simhash`

**Effort Estimate:** 6 days (48 hours)

**Assigned Phase:** v0.4.0

**Technical Notes:**
- Normalize text (lowercase, remove punctuation)
- Tokenize into words or n-grams
- Shingling with MinHash for large document sets
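
The normalize, tokenize, and fingerprint steps above can be sketched as a 64-bit SimHash over word shingles. This is a generic sketch, not the `simhash` crate's API: it uses an inline FNV-1a hash so fingerprints are stable across runs, and Unicode-aware `char::is_alphanumeric` and `to_lowercase` so non-English words survive normalization. Near-duplicates are pairs whose fingerprints differ in only a few bits; the exact threshold is a tuning assumption.

```rust
/// 64-bit FNV-1a: small, deterministic, and stable across Rust versions.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Lowercase and split on any non-alphanumeric character (Unicode-aware).
fn tokens(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// SimHash over word n-gram shingles: each shingle's hash votes +1/-1 per
/// bit position, and the fingerprint keeps the bits with a positive total.
pub fn simhash(text: &str, n: usize) -> u64 {
    let n = n.max(1);
    let toks = tokens(text);
    let shingles: Vec<String> = if toks.len() < n {
        vec![toks.join(" ")]
    } else {
        toks.windows(n).map(|w| w.join(" ")).collect()
    };
    let mut votes = [0i64; 64];
    for s in &shingles {
        let h = fnv1a(s.as_bytes());
        for (bit, v) in votes.iter_mut().enumerate() {
            if ((h >> bit) & 1) == 1 { *v += 1 } else { *v -= 1 }
        }
    }
    let mut fp = 0u64;
    for (bit, &v) in votes.iter().enumerate() {
        if v > 0 {
            fp |= 1u64 << bit;
        }
    }
    fp
}

/// Number of differing bits between two fingerprints.
pub fn distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}
```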

---

#### FC-009: Performance Benchmarking Suite

**Description:**
Automated performance benchmarking with regression detection.

**Acceptance Criteria:**
1. CI-integrated benchmark tests
2. Standardized test datasets (316GB, 1.4M files like fclones)
3. Compare against fclones, czkawka baseline
4. Performance regression alerts (>10% slowdown)
5. Historical performance tracking

**Dependencies:**
- Internal: None
- External: `criterion`, GitHub Actions

**Effort Estimate:** 4 days (32 hours)

**Assigned Phase:** v0.4.0

**Technical Notes:**
- Use `criterion.rs` for statistical rigor
- Store benchmark results as artifacts
- Consider caching test datasets

---

### Phase 3 Features

#### FC-010: Cloud Storage Integration

**Description:**
Scan cloud storage services (Google Drive, OneDrive, Dropbox) via APIs or local sync folders.

**Acceptance Criteria:**
1. Google Drive API support (OAuth2)
2. OneDrive Graph API support
3. Dropbox API v2 support
4. Local sync folder scanning (fallback)
5. Resumable scans for large cloud stores
6. Rate limiting and retry logic

**Dependencies:**
- Internal: None
- External: `reqwest`, `oauth2`, cloud SDKs

**Effort Estimate:** 10 days (80 hours)

**Assigned Phase:** v0.5.0

**Technical Notes:**
- Use metadata-only approach where possible (hash in API response)
- Implement exponential backoff for rate limits
- Support service account authentication for Google Workspace (formerly G Suite)
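
The retry policy can be sketched as a pure function. Rate-limited HTTP APIs commonly answer with status 429 and may include a `Retry-After` header; this sketch prefers that server hint when present and otherwise uses capped exponential backoff with "full jitter". The function name and parameters are hypothetical, and the jitter value is passed in so the sketch needs no RNG crate.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based).
/// - If the service sent a `Retry-After` value, honour it (capped at `max`).
/// - Otherwise use base * 2^attempt, capped at `max`, scaled by `jitter`
///   in [0, 1] so parallel workers don't retry in lockstep.
/// `jitter` is assumed finite (a NaN would make `mul_f64` panic).
pub fn retry_delay(
    attempt: u32,
    base: Duration,
    max: Duration,
    retry_after: Option<Duration>,
    jitter: f64,
) -> Duration {
    if let Some(server_hint) = retry_after {
        return server_hint.min(max);
    }
    let exp = base.saturating_mul(2u32.saturating_pow(attempt));
    exp.min(max).mul_f64(jitter.clamp(0.0, 1.0))
}
```

Saturating arithmetic keeps very high attempt counts from overflowing before the cap is applied.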

---

#### FC-011: Real-Time File Monitoring

**Description:**
Monitor directories for changes and detect duplicates in real-time as files are created/modified.

**Acceptance Criteria:**
1. Cross-platform file system monitoring
2. Configurable debounce interval
3. Low resource usage (<5% CPU when idle)
4. Optional daemon mode
5. Integration with scan cache

**Dependencies:**
- Internal: Scanning pipeline
- External: `notify` crate

**Effort Estimate:** 6 days (48 hours)

**Assigned Phase:** v0.5.0

**Technical Notes:**
- Use `notify` crate for cross-platform support
- Inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows
- Debounce events to avoid duplicate work
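
Debouncing can be sketched independently of the event source: record the time of the latest event per path, and only hand a path to the scanner once it has been quiet for the configured interval. This hand-rolled `Debouncer` is hypothetical; the `notify` project also publishes ready-made debouncer crates (`notify-debouncer-mini`, `notify-debouncer-full`).

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::time::{Duration, Instant};

/// Coalesces bursts of events per path: a path becomes "ready" once no new
/// event for it has arrived for `quiet`.
pub struct Debouncer {
    quiet: Duration,
    pending: HashMap<PathBuf, Instant>, // path -> time of most recent event
}

impl Debouncer {
    pub fn new(quiet: Duration) -> Self {
        Debouncer { quiet, pending: HashMap::new() }
    }

    /// Record a create/modify event for `path` observed at `at`.
    pub fn record(&mut self, path: PathBuf, at: Instant) {
        let last = self.pending.entry(path).or_insert(at);
        if at > *last {
            *last = at; // keep the latest event time
        }
    }

    /// Remove and return (sorted) the paths that have been quiet long enough.
    pub fn drain_ready(&mut self, now: Instant) -> Vec<PathBuf> {
        let quiet = self.quiet;
        let mut ready = Vec::new();
        self.pending.retain(|path, last| {
            if now.saturating_duration_since(*last) >= quiet {
                ready.push(path.clone());
                false // drop from pending
            } else {
                true
            }
        });
        ready.sort();
        ready
    }
}
```

A file that is written in many small chunks therefore triggers one re-hash after the writes stop, rather than one per event.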

---

#### FC-012: Plugin System Architecture

**Description:**
Extensible plugin system for custom detection algorithms and integrations.

**Acceptance Criteria:**
1. Plugin API with stable ABI
2. Hot-reloading support
3. Example plugins provided
4. Sandboxed execution (WASM option)
5. Plugin marketplace/registry (future)

**Dependencies:**
- Internal: Modular architecture
- External: `abi_stable`, `wasmtime` (optional)

**Effort Estimate:** 8 days (64 hours)

**Assigned Phase:** v0.5.0

**Technical Notes:**
- Consider C ABI for language interoperability
- WASM plugins for safety and portability
- Plugin manifest format (TOML)
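
A sketch of what a detector plugin interface could look like, written as an in-process Rust trait for readability. All names (`DetectorPlugin`, `Registry`, `ExactText`) are hypothetical. A plain Rust trait object is not a stable ABI, since its layout and vtable are not guaranteed across compiler versions, which is why this card points to `abi_stable`, a C ABI, or WASM for plugins built separately from the host.

```rust
/// Hypothetical interface a detection plugin implements.
pub trait DetectorPlugin {
    /// Unique name, e.g. as declared in the plugin's TOML manifest.
    fn name(&self) -> &str;
    /// Whether this plugin handles files with the given extension.
    fn supports(&self, extension: &str) -> bool;
    /// Content fingerprint; `None` if the plugin cannot process these bytes.
    fn fingerprint(&self, bytes: &[u8]) -> Option<u64>;
    /// Maximum fingerprint distance still treated as a match (0 = exact).
    fn max_distance(&self) -> u32 {
        0
    }
}

/// Host-side registry that dispatches files to matching plugins.
#[derive(Default)]
pub struct Registry {
    plugins: Vec<Box<dyn DetectorPlugin>>,
}

impl Registry {
    pub fn register(&mut self, plugin: Box<dyn DetectorPlugin>) {
        self.plugins.push(plugin);
    }

    /// Plugins that claim the given extension, in registration order.
    pub fn for_extension(&self, ext: &str) -> Vec<&dyn DetectorPlugin> {
        self.plugins.iter().filter(|p| p.supports(ext)).map(|p| &**p).collect()
    }
}

/// Example plugin: exact matching of `.txt` files via an FNV-1a hash.
pub struct ExactText;

impl DetectorPlugin for ExactText {
    fn name(&self) -> &str {
        "exact-text"
    }
    fn supports(&self, extension: &str) -> bool {
        extension.eq_ignore_ascii_case("txt")
    }
    fn fingerprint(&self, bytes: &[u8]) -> Option<u64> {
        let mut h: u64 = 0xcbf2_9ce4_8422_2325;
        for &b in bytes {
            h ^= b as u64;
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
        Some(h)
    }
}
```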

---

#### FC-013: Advanced Selection Rules

**Description:**
Rule-based automatic selection of which duplicate to keep, with priority scoring.

**Acceptance Criteria:**
1. Rule types: path pattern, date, size, resolution
2. Weighted scoring system
3. Dry-run preview mode
4. Save/load rule sets
5. EXIF/metadata-based rules for images

**Dependencies:**
- Internal: Selection logic
- External: `kamadak-exif` for metadata

**Effort Estimate:** 5 days (40 hours)

**Assigned Phase:** v0.5.0

**Technical Notes:**
- JSON/YAML rule definition format
- Rules evaluate to priority scores
- Highest score wins as "original"
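
The weighted-scoring idea can be sketched with two of the listed rule types (path pattern and date). The `Candidate` and `Rule` types, the linear date scaling, and the tie-break (earliest file in group order wins) are assumptions made for illustration; a JSON/YAML rule file would deserialize into something like `Rule`.

```rust
/// A file in a duplicate group (hypothetical fields; `modified` is a Unix timestamp).
pub struct Candidate {
    pub path: String,
    pub modified: u64,
}

/// Hypothetical rule types; each contributes `weight * score`, score in [0, 1].
pub enum Rule {
    /// Full weight if the path contains the pattern (e.g. "/Photos/").
    PathContains(String, f64),
    /// Oldest file in the group gets full weight, newest gets 0, linear between.
    PreferOldest(f64),
}

/// Total score for each candidate, in group order.
pub fn scores(group: &[Candidate], rules: &[Rule]) -> Vec<f64> {
    let oldest = group.iter().map(|c| c.modified).min().unwrap_or(0);
    let newest = group.iter().map(|c| c.modified).max().unwrap_or(0);
    group
        .iter()
        .map(|c| {
            rules
                .iter()
                .map(|rule| match rule {
                    Rule::PathContains(pat, w) => {
                        if c.path.contains(pat.as_str()) { *w } else { 0.0 }
                    }
                    Rule::PreferOldest(w) if newest > oldest => {
                        *w * (newest - c.modified) as f64 / (newest - oldest) as f64
                    }
                    Rule::PreferOldest(_) => 0.0, // all files equally old
                })
                .sum::<f64>()
        })
        .collect()
}

/// Index of the file to keep as "original": highest score wins; on a tie the
/// earlier file in group order is kept, so results are deterministic.
pub fn pick_original(group: &[Candidate], rules: &[Rule]) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (i, s) in scores(group, rules).into_iter().enumerate() {
        match best {
            Some((_, top)) if s <= top => {}
            _ => best = Some((i, s)),
        }
    }
    best.map(|(i, _)| i)
}
```

Printing `scores` for each group without deleting anything is essentially the dry-run preview from criterion 3.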

---

### Phase 4 Features

#### FC-014: Security Audit

**Description:**
Comprehensive security audit including fuzzing, dependency scanning, and code review.

**Acceptance Criteria:**
1. Fuzz testing for all parsers
2. Dependency vulnerability scan
3. Code review for unsafe blocks
4. Path traversal protection verified
5. Input validation audit

**Dependencies:**
- Internal: All code
- External: `cargo-audit`, `cargo-fuzz`

**Effort Estimate:** 5 days (40 hours)

**Assigned Phase:** v1.0.0

**Technical Notes:**
- Use `cargo-audit` for dependency scanning
- Fuzz test file path parsing
- Review all unsafe blocks manually and run the tests that exercise them under Miri
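
For acceptance criterion 4, one check worth covering in tests is that a path built from user or config input cannot escape its scan root through `..` segments or an absolute path. The sketch below is purely lexical: it does not follow symlinks, so paths that exist on disk should additionally be checked after `std::fs::canonicalize`. `normalize` and `is_within` are hypothetical helpers.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve `.` and `..` purely lexically (no filesystem access).
/// `..` at the root is a no-op, as on Unix.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// True if `candidate` (absolute, or relative to `root`) stays inside `root`.
/// `Path::starts_with` compares whole components, so `/srv/scanner` is not
/// treated as inside `/srv/scan`.
pub fn is_within(root: &Path, candidate: &Path) -> bool {
    let root = normalize(root);
    normalize(&root.join(candidate)).starts_with(&root)
}
```

Comparing whole path components rather than string prefixes is the detail that most often goes wrong in hand-written checks, so it is worth a dedicated fuzz or property test.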

---

#### FC-015: Enterprise Features

**Description:**
Features for enterprise deployments: centralized policies, LDAP integration, audit logging.

**Acceptance Criteria:**
1. Group policy templates (Windows)
2. Centralized configuration management
3. Audit log format (JSON/Syslog)
4. LDAP/AD authentication (optional)
5. Compliance reporting

**Dependencies:**
- Internal: Config system
- External: LDAP crate (optional)

**Effort Estimate:** 5 days (40 hours)

**Assigned Phase:** v1.0.0

**Technical Notes:**
- Structured audit logging with JSON output
- Consider SCAP/CIS compliance checks
- Integration with SIEM systems
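
A JSON audit record can be emitted one object per line (JSON Lines), a format SIEM pipelines generally ingest and which can also be wrapped in a syslog message. The field names and `AuditEvent` type are hypothetical, and the escaping is hand-written only to keep the sketch dependency-free; production code would more likely use `serde_json`.

```rust
/// Escape a string for a JSON string literal (quotes, backslash, control chars).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// One audit record (hypothetical fields).
pub struct AuditEvent<'a> {
    pub unix_time: u64,
    pub actor: &'a str,
    pub action: &'a str, // e.g. "delete", "hardlink", "move"
    pub path: &'a str,
    pub bytes: u64,
}

impl AuditEvent<'_> {
    /// Serialize as a single-line JSON object.
    pub fn to_json_line(&self) -> String {
        format!(
            "{{\"ts\":{},\"actor\":\"{}\",\"action\":\"{}\",\"path\":\"{}\",\"bytes\":{}}}",
            self.unix_time,
            json_escape(self.actor),
            json_escape(self.action),
            json_escape(self.path),
            self.bytes
        )
    }
}
```

Escaping matters here because file paths may legally contain quotes, backslashes, or control characters, and an unescaped path would corrupt the log line.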

---

## Dependencies & Prerequisites

### Dependency Graph

```
v0.3.0 Foundation
├── Core Scanning (stable)
├── TUI Framework (stable)
└── Config Management (NEW)
    └── TOML parsing

v0.4.0 Performance
├── Bloom Filters (NEW)
│   └── Hash pipeline
├── Perceptual Hashing (NEW)
│   ├── Image processing
│   └── BK-tree
└── Fuzzy Matching (NEW)
    └── Text extraction

v0.5.0 Integration
├── Cloud APIs (NEW)
│   ├── OAuth
│   └── REST clients
├── File Monitoring (NEW)
│   └── notify crate
└── Plugin System (NEW)
    └── Dynamic loading

v0.6.0/v1.0.0 Production
├── All previous features
├── Security hardening
└── Documentation
```

### External Dependencies by Phase

#### Phase 1 Dependencies

| Crate | Version | Purpose | License |
|-------|---------|---------|---------|
| `figment` | ^0.10 | Layered configuration | MIT/Apache-2.0 |
| `toml` | ^0.8 | Config file parsing | MIT/Apache-2.0 |
| `memmap2` | ^0.9 | Memory-mapped files | MIT/Apache-2.0 |

#### Phase 2 Dependencies

| Crate | Version | Purpose | License |
|-------|---------|---------|---------|
| `bloom` | ^0.3 | Bloom filters | MIT/Apache-2.0 |
| `image_hasher` | ^1.0 | Perceptual hashing | MIT |
| `image` | ^0.25 | Image loading | MIT/Apache-2.0 |
| `bk-tree` | ^0.5 | BK-tree search | MIT |
| `pdf-extract` | ^0.7 | PDF text extraction | MIT |
| `docx-rs` | ^0.4 | DOCX parsing | MIT |

#### Phase 3 Dependencies

| Crate | Version | Purpose | License |
|-------|---------|---------|---------|
| `notify` | ^6.0 | File system monitoring | CC0-1.0 |
| `reqwest` | ^0.12 | HTTP client | MIT/Apache-2.0 |
| `oauth2` | ^4.4 | OAuth authentication | MIT/Apache-2.0 |
| `abi_stable` | ^0.11 | Plugin ABI | MIT |

#### Phase 4 Dependencies

| Crate | Version | Purpose | License |
|-------|---------|---------|---------|
| `cargo-audit` | latest | Security scanning | MIT/Apache-2.0 |
| `ldap3` | ^0.11 | LDAP integration | MIT/Apache-2.0 (optional) |

### Prerequisites Checklist

- [x] Rust 1.85+ (current requirement)
- [ ] CI/CD pipeline with multi-platform testing
- [ ] Benchmark infrastructure
- [ ] Documentation hosting
- [ ] Security scanning integration

---

## Risk Mitigation

### Identified Risks

#### Phase 1 Risks

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Multi-dir scanning complexity | Medium | Medium | Start with simple aggregation, iterate |
| Config file migration issues | Low | Medium | Maintain CLI parity, deprecation warnings |
| TUI framework limitations | Low | High | ratatui is mature; have fallback plan |

#### Phase 2 Risks

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Perceptual hashing performance | High | High | Benchmark early; make optional |
| Bloom filter false positives | Medium | Medium | Tuning period; document behavior |
| Image format support gaps | Medium | Low | Use `image` crate; document limits |

#### Phase 3 Risks

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Cloud API rate limits | High | Medium | Exponential backoff; local cache |
| OAuth complexity | Medium | High | Use established `oauth2` crate |
| Plugin security | Medium | High | WASM sandboxing option; code signing |

#### Phase 4 Risks

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Security audit findings | Medium | High | Start security review in Phase 3 |
| API stability concerns | Low | High | Semantic versioning; beta period |
| Enterprise feature bloat | Medium | Medium | Keep enterprise features optional |

### Contingency Plans

1. **Performance Not Meeting Targets:**
   - Fallback to partial hashing (first/last blocks)
   - Profile-guided optimization
   - Consider GPU acceleration for Phase 4

2. **Cloud API Changes:**
   - Abstract cloud interface
   - Local sync folder fallback
   - Community-driven API updates

3. **Resource Constraints:**
   - Defer lower-priority features
   - Focus on core differentiators (TUI, performance)
   - Community contribution program

---

## Success Metrics

### Phase 1 KPIs

| Metric | Target | Measurement |
|--------|--------|-------------|
| Multi-directory support | 100% | Test coverage |
| Config file adoption | 30% | Telemetry/survey |
| Crash rate | <0.1% | Error reports |
| User satisfaction | 4.0/5 | GitHub stars, feedback |

### Phase 2 KPIs

| Metric | Target | Measurement |
|--------|--------|-------------|
| Performance vs fclones | ≤10% slower | Benchmark suite |
| Image similarity accuracy | 95%+ | Test dataset |
| Bloom filter efficiency | 30%+ reduction | Comparison test |
| Test coverage | 90%+ | cargo-tarpaulin |

### Phase 3 KPIs

| Metric | Target | Measurement |
|--------|--------|-------------|
| Cloud integration users | 100+ | Downloads/feedback |
| Plugin downloads | 10+ | Plugin registry |
| Real-time monitoring uptime | 99%+ | Long-running tests |
| Documentation completeness | 100% | Coverage check |

### Phase 4 KPIs

| Metric | Target | Measurement |
|--------|--------|-------------|
| Security issues | 0 critical/high | Audit results |
| API stability | 100% | Breaking changes |
| Enterprise adoption | 5+ orgs | Surveys/feedback |
| Overall rating | 4.5/5 | GitHub, crates.io |

### Long-Term Goals (Post-v1.0)

| Metric | Target | Timeline |
|--------|--------|----------|
| crates.io downloads | 100K+ | 12 months post-v1.0 |
| GitHub stars | 2,000+ | 12 months post-v1.0 |
| Contributor count | 20+ | 12 months post-v1.0 |
| Package manager adoption | 5+ distros | 18 months post-v1.0 |

---

## Resource Requirements

### Effort Summary by Phase

| Phase | Estimated Days | FTE Months* | Parallel Tracks |
|-------|----------------|-------------|-----------------|
| v0.3.0 Foundation | 23 days | 1.2 | 2 |
| v0.4.0 Performance | 33 days | 1.7 | 3 |
| v0.5.0 Integration | 36 days | 1.8 | 3 |
| v0.6.0/v1.0.0 Production | 25 days | 1.3 | 2 |
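| **Total** | **117 days** | **≈5.9** | - |

\* FTE months = estimated days ÷ 20 working days per month (rows are rounded individually, so they sum to 6.0). Parallel tracks shorten calendar time, not total effort.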

### Skills Required

| Skill | Phase 1 | Phase 2 | Phase 3 | Phase 4 |
|-------|---------|---------|---------|---------|
| Rust (core) | ★★★ | ★★★ | ★★★ | ★★★ |
| TUI Development | ★★☆ | ★☆☆ | ★☆☆ | ★☆☆ |
| Image Processing | ☆☆☆ | ★★☆ | ★☆☆ | ★☆☆ |
| Cryptography/Hashing | ★☆☆ | ★★☆ | ★☆☆ | ★☆☆ |
| Cloud APIs | ☆☆☆ | ☆☆☆ | ★★☆ | ★☆☆ |
| Security | ★☆☆ | ★☆☆ | ★☆☆ | ★★★ |
| Technical Writing | ★☆☆ | ★☆☆ | ★★☆ | ★★★ |

### Tooling Requirements

| Tool | Purpose | Cost |
|------|---------|------|
| GitHub Actions | CI/CD | Free (open source) |
| GitHub Projects | Issue tracking | Free |
| crates.io | Package distribution | Free |
| docs.rs | Documentation hosting | Free |
| Criterion.rs | Benchmarking | Free |
| cargo-audit | Security scanning | Free |
| Coveralls/Codecov | Test coverage | Free (open source) |

### Infrastructure Needs

- **Benchmark Server:** Dedicated machine for consistent performance testing
- **Test Data Storage:** 500GB+ for benchmark datasets
- **Cloud Test Accounts:** Google Drive, OneDrive, Dropbox for integration testing
- **Cross-Platform VMs:** Windows, macOS, Linux for manual testing

---

## Appendix: Deferred Features

### Deferred to Post-v1.0

| Feature | Reason for Deferral | Potential Phase |
|---------|---------------------|-----------------|
| Audio fingerprinting (Chromaprint) | High complexity, limited demand | v1.1+ |
| Video deduplication | Requires ffmpeg, complex dependencies | v1.2+ |
| GUI version (egui/iced) | Would compete with czkawka; focus on TUI | v1.x or separate project |
| Mobile apps (Android/iOS) | Platform restrictions, new codebase | Future project |
| Machine learning similarity | Overkill for most use cases; high resource cost | Research phase |
| GPU acceleration | CPU performance sufficient; adds complexity | v1.x if needed |
| Network deduplication | Enterprise niche; complex distributed systems | v2.0+ |
| Continuous background dedup | Daemon complexity, resource concerns | v1.x optional component |

### Deferred Features - Detailed Rationale

#### Audio Fingerprinting

**Why Deferred:**
- Chromaprint integration requires an external binary or complex FFI
- Audio duplicate detection is a niche use case
- Would add significant binary size

**Conditions for Revival:**
- Plugin system enables external implementation
- Community demand reaches threshold
- Simpler Rust-native fingerprinting available

#### Video Deduplication

**Why Deferred:**
- Requires ffmpeg dependency or complex video parsing
- Keyframe extraction is computationally expensive
- Video files are typically large; hashing is already slow

**Conditions for Revival:**
- Plugin system provides isolation
- GPU acceleration makes it feasible
- Frame sampling approach proves effective

#### GUI Version

**Why Deferred:**
- Would directly compete with czkawka (excellent GUI already exists)
- Maintaining GUI + TUI = double maintenance burden
- TUI is unique differentiator

**Conditions for Revival:**
- Separate GUI crate in same repository
- egui or iced matures with good accessibility
- Strong community contributor interest

---

## Document Control

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-02-05 | Documentation Team | Initial release |

### Approval

This roadmap represents the strategic direction for RustDupe development. Individual features may be reprioritized based on community feedback and technical discoveries.

---

**Next Review Date:** 2026-04-01 (post-v0.3.0 release)

**Feedback:** Please open an issue at https://github.com/MasuRii/RustDupe/issues for roadmap feedback