A high-performance disk cleanup library with parallel processing.
This library provides functionality to scan directories, find duplicate files
based on MD5 hashes, detect storage outliers, and generate detailed reports
using Polars DataFrames.
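Duplicate detection by content hash comes down to bucketing files by a digest and keeping the buckets that hold more than one path. The following is a minimal std-only sketch of that idea, not this crate's implementation. It makes three assumptions: it works on in-memory `(path, bytes)` pairs instead of reading files from disk, std's `DefaultHasher` stands in for MD5 (it is neither cryptographic nor stable across Rust releases), and `content_hash` and `find_duplicate_groups` are hypothetical names, not part of this crate's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash a file's contents. Stand-in for MD5 (assumption): DefaultHasher
/// is not a cryptographic digest and may change between Rust releases.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Group paths whose contents share the same (size, hash) key.
/// Only groups with two or more members are returned.
fn find_duplicate_groups(files: &[(&str, &[u8])]) -> Vec<Vec<String>> {
    // Files of different sizes can never be duplicates, so size is part of the key.
    let mut buckets: HashMap<(usize, u64), Vec<String>> = HashMap::new();
    for (path, bytes) in files {
        buckets
            .entry((bytes.len(), content_hash(bytes)))
            .or_default()
            .push(path.to_string());
    }
    let mut groups: Vec<Vec<String>> = buckets
        .into_values()
        .filter(|paths| paths.len() > 1)
        .collect();
    // HashMap iteration order is unspecified; sort for deterministic output.
    for group in &mut groups {
        group.sort();
    }
    groups.sort();
    groups
}

fn main() {
    // Hypothetical in-memory "files".
    let files: [(&str, &[u8]); 4] = [
        ("a.txt", b"hello"),
        ("b.txt", b"world"),
        ("copy_of_a.txt", b"hello"),
        ("empty.txt", b""),
    ];
    for group in find_duplicate_groups(&files) {
        println!("duplicates: {:?}", group);
    }
}
```

Because a 64-bit hash can collide, a careful tool would confirm each group with a byte-for-byte comparison before deleting anything.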
Re-exports
pub use comfy_table;
pub use globset;
pub use regex;
Modules
- clustering - Clustering module for detecting groups of similar files.
- mcp_server
- models
- outliers - Outlier detection module for finding files that consume disproportionate disk space.
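The `outliers` module's documentation does not say which statistic it uses to decide that a file consumes "disproportionate" space. As an illustration only, one common choice is a k-sigma rule: flag files whose size lies more than `k` standard deviations above the mean. The sketch below assumes that rule and uses a hypothetical `size_outliers` helper that is not this crate's API.

```rust
/// Indices of `sizes` lying more than `k` population standard
/// deviations above the mean. Only unusually large files are flagged.
fn size_outliers(sizes: &[u64], k: f64) -> Vec<usize> {
    if sizes.is_empty() {
        return Vec::new();
    }
    let n = sizes.len() as f64;
    let mean = sizes.iter().map(|&s| s as f64).sum::<f64>() / n;
    let variance = sizes
        .iter()
        .map(|&s| (s as f64 - mean).powi(2))
        .sum::<f64>()
        / n;
    let std_dev = variance.sqrt();
    if std_dev == 0.0 {
        // Every file has the same size, so nothing stands out.
        return Vec::new();
    }
    sizes
        .iter()
        .enumerate()
        .filter(|&(_, &s)| (s as f64 - mean) / std_dev > k)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Hypothetical file sizes in bytes: ten small files and one large one.
    let sizes: [u64; 11] = [10, 12, 11, 9, 10, 11, 10, 12, 9, 10, 5000];
    for i in size_outliers(&sizes, 2.5) {
        println!("file #{i} ({} bytes) is an outlier", sizes[i]);
    }
}
```

With the population standard deviation, no value among `n` can score above `sqrt(n - 1)`, so in very small directories a k-sigma rule rarely fires; median or interquartile-range rules are a more robust alternative.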
Structs
- FileInfo - Information about a file including metadata and hash.
- Glob - Glob represents a successfully parsed shell glob pattern.
- GlobSet - GlobSet represents a group of globs that can be matched together in a single pass.
- GlobSetBuilder - GlobSetBuilder builds a group of patterns that can be used to simultaneously match a file path.
- Regex - A compiled regular expression for searching Unicode haystacks.
- WalkOptions - Options for directory walking.
Enums
- PatternType - Pattern matching type for file filtering.
Functions
- calculate_similarity - Calculate similarity between two fuzzy hashes.
- checksum - Compute checksums for files in parallel.
- collect_file_info - Collect detailed file information in parallel.
- create_dataframe
- display_thread_info - Display threading information including CPU cores and thread pool size.
- find - Find files matching a pattern.
- find_advanced - Find files matching an advanced pattern.
- find_duplicates
- find_similar_files - Vector of groups of similar files with their similarity scores.
- generate_csv_report
- generate_statistics - Generate file statistics summary.
- run - Run the deduplication process.
- run_with_advanced_options - Run deduplication with DataFrame support using advanced pattern matching.
- run_with_dataframe - Run deduplication with DataFrame support and optional CSV output.
- run_with_similarity - Run deduplication with similarity detection using fuzzy hashing.
- validate_duplicates
- walk
- walk_with_options - Walk a directory recursively with gitignore support and return all file paths.
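Several functions above (`checksum`, `collect_file_info`) work in parallel, and `display_thread_info` reports CPU cores and thread pool size. The docs do not say which pool the crate uses, so the following is a std-only sketch of the general pattern under stated assumptions: scoped threads replace the crate's pool, the worker count comes from `std::thread::available_parallelism`, `DefaultHasher` stands in for MD5, and in-memory buffers replace file reads. `checksum_one` and `parallel_checksums` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::num::NonZeroUsize;
use std::thread;

/// Checksum one buffer. DefaultHasher stands in for MD5 (assumption).
fn checksum_one(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Checksum every buffer using up to `threads` scoped worker threads.
/// Each worker handles one contiguous chunk, so results keep input order.
fn parallel_checksums(buffers: &[Vec<u8>], threads: usize) -> Vec<u64> {
    let threads = threads.max(1);
    // Ceiling division; `chunks` panics on a size of 0, hence `max(1)`.
    let chunk_size = ((buffers.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        // Spawn every worker before joining so chunks are hashed concurrently.
        let handles: Vec<_> = buffers
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|b| checksum_one(b)).collect::<Vec<u64>>())
            })
            .collect();
        // Join in spawn order to preserve the input ordering.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    // Size the worker count from the logical CPUs the OS reports.
    let threads = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    println!("using {threads} worker threads");

    let buffers: Vec<Vec<u8>> = (0..8u8).map(|i| vec![i; 1024]).collect();
    for (i, sum) in parallel_checksums(&buffers, threads).iter().enumerate() {
        println!("buffer {i}: {sum:016x}");
    }
}
```

Splitting into contiguous chunks keeps the code simple and the output ordered; a work-stealing pool balances better when file sizes vary widely.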
Type Aliases
- SimilarFileGroup - Find similar files based on fuzzy hashing and similarity threshold.
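`calculate_similarity` and `SimilarFileGroup` rely on fuzzy hashing, but the docs do not name the scheme. To illustrate the general idea of a graded similarity score compared against a threshold, this sketch swaps in a different, simpler technique: Jaccard similarity over sets of overlapping byte n-grams ("shingles"). It is not a fuzzy hash such as ssdeep, and `shingles`, `jaccard_similarity`, and `similar_pairs` are hypothetical helpers, not this crate's API.

```rust
use std::collections::HashSet;

/// Set of overlapping byte n-grams in `data`. `n` must be at least 1.
fn shingles(data: &[u8], n: usize) -> HashSet<&[u8]> {
    assert!(n > 0, "shingle size must be at least 1");
    data.windows(n).collect()
}

/// Jaccard similarity |A ∩ B| / |A ∪ B| of the two shingle sets, in [0, 1].
fn jaccard_similarity(a: &[u8], b: &[u8], n: usize) -> f64 {
    let (sa, sb) = (shingles(a, n), shingles(b, n));
    let union = sa.union(&sb).count();
    if union == 0 {
        // Both inputs are shorter than n: fall back to exact equality.
        return if a == b { 1.0 } else { 0.0 };
    }
    sa.intersection(&sb).count() as f64 / union as f64
}

/// Every pair of files whose similarity is at least `threshold`.
fn similar_pairs<'a>(
    files: &[(&'a str, &[u8])],
    n: usize,
    threshold: f64,
) -> Vec<(&'a str, &'a str, f64)> {
    let mut pairs = Vec::new();
    for i in 0..files.len() {
        for j in (i + 1)..files.len() {
            let score = jaccard_similarity(files[i].1, files[j].1, n);
            if score >= threshold {
                pairs.push((files[i].0, files[j].0, score));
            }
        }
    }
    pairs
}

fn main() {
    // Hypothetical in-memory "files".
    let files: [(&str, &[u8]); 3] = [
        ("notes_v1.txt", b"disk cleanup notes, draft one"),
        ("notes_v2.txt", b"disk cleanup notes, draft two"),
        ("photo.raw", b"\x00\x01\x02\x03\x04\x05\x06\x07"),
    ];
    for (a, b, score) in similar_pairs(&files, 3, 0.5) {
        println!("{a} ~ {b}: {score:.2}");
    }
}
```

The all-pairs loop is quadratic in the number of files; fuzzy-hash schemes exist partly to make such comparisons cheap enough for large directory trees.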