pub fn run(input: PathBuf) -> Result<()>
Executes the analyze command to optimize CDC parameters using DCAM.
Reads a sample of the input file, performs a baseline CDC chunking pass to
measure deduplication characteristics, calculates the change probability c,
and then uses a greedy search algorithm to find optimal chunking parameters
(fingerprint bits f and minimum chunk size m). Displays the baseline,
recommended parameters, and predicted deduplication ratio.
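As a rough illustration of the greedy search step, the sketch below walks the (f, m) space by moving to any neighboring setting that lowers a cost. It is not the body of find_optimal_parameters. The DCAM objective is not given here, so the cost is passed in as a closure, and the Params type, the neighborhood (f ± 1, m doubled or halved), and the bounds are all assumptions made for the example.

```rust
/// Candidate chunking parameters (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Params {
    /// Fingerprint bits.
    pub f: u32,
    /// Minimum chunk size in bytes.
    pub m: u64,
}

/// Greedy local search: starting from `start`, keep stepping to a neighbor
/// with strictly lower cost until no neighbor improves on the current point.
/// `cost` stands in for the model's objective (lower is better).
pub fn greedy_search<F>(
    start: Params,
    f_range: (u32, u32),
    m_range: (u64, u64),
    cost: F,
) -> Params
where
    F: Fn(Params) -> f64,
{
    let mut best = start;
    let mut best_cost = cost(best);
    loop {
        // Assumed neighborhood: one bit up or down, min size doubled or halved.
        let neighbors = [
            Params { f: best.f.saturating_add(1), ..best },
            Params { f: best.f.saturating_sub(1), ..best },
            Params { m: best.m.saturating_mul(2), ..best },
            Params { m: best.m / 2, ..best },
        ];
        let mut improved = false;
        for cand in neighbors {
            let in_bounds = (f_range.0..=f_range.1).contains(&cand.f)
                && (m_range.0..=m_range.1).contains(&cand.m);
            if !in_bounds {
                continue;
            }
            let c = cost(cand);
            if c < best_cost {
                best = cand;
                best_cost = c;
                improved = true;
            }
        }
        if !improved {
            // Local minimum under this neighborhood; may not be global.
            return best;
        }
    }
}
```

Because each accepted step strictly lowers the cost and the bounds make the search space finite, the loop terminates. Like any greedy search, it can stop at a local minimum.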
§Arguments
input - Path to the disk image file (raw, qcow2, or any binary file)
§Output Format
Analyzing disk.img using DCAM...
Reading 512.0 MB sample for analysis...
Running Baseline CDC Pass (Avg Chunk: 8KB)...
Processed: 512.0 MB
Unique: 384.0 MB (75.0%)
Chunks: 65536
Estimated Change Prob (c): 0.750000
Optimizing parameters using DCAM...
--- Optimization Results ---
Parameter | Baseline (LBFS) | Recommended
--------------------------|-----------------|----------------
Fingerprint Bits (f) | 13 | 14
Min Chunk Size (m) | 256 | 1024
Avg Chunk Size | 8.0 KB | 16.0 KB
--- Predictions ---
Predicted Ratio: 0.7234
Est. Final Size: 7.2 GB
Est. Savings: 2.8 GB
§Algorithm Details
The function implements these steps:
- File Reading: Opens input file and reads up to 512 MiB sample
- Header Skipping: For large files, skips first 1 MiB to avoid partition metadata
- Baseline Chunking: Runs FastCDC with LBFS parameters (f=13, m=256)
- Statistics Collection: Counts total bytes, unique bytes, and chunks
- Change Probability: Calculates c = unique_bytes / total_bytes
- Greedy Optimization: Calls find_optimal_parameters to search the parameter space
- Prediction: Uses the DCAM model to estimate the deduplication ratio
- Result Display: Prints comparison table and predicted savings
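The statistics, change-probability, and size-estimate steps above come down to simple arithmetic, sketched below. The sketch makes three assumptions. Chunk boundaries are supplied by the caller, so no FastCDC pass is performed. The average chunk size is taken to be 2^f bytes, which is inferred from the sample table (f = 13 gives 8 KB). The predicted ratio is read as final size divided by original size, which matches the sample figures. All type and function names are hypothetical.

```rust
use std::collections::HashSet;

/// Totals gathered during a chunking pass (hypothetical type).
#[derive(Debug, Default, PartialEq)]
pub struct ChunkStats {
    pub total_bytes: u64,
    pub unique_bytes: u64,
    pub chunks: u64,
}

/// Counts total bytes, unique bytes, and chunks over pre-cut chunks.
/// A chunk is "unique" the first time its exact content is seen.
pub fn collect_stats<'a, I>(chunks: I) -> ChunkStats
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let mut seen: HashSet<&'a [u8]> = HashSet::new();
    let mut stats = ChunkStats::default();
    for chunk in chunks {
        let len = chunk.len() as u64;
        stats.total_bytes += len;
        stats.chunks += 1;
        if seen.insert(chunk) {
            stats.unique_bytes += len;
        }
    }
    stats
}

/// c = unique_bytes / total_bytes; `None` when there is no data.
pub fn change_probability(stats: &ChunkStats) -> Option<f64> {
    if stats.total_bytes == 0 {
        None
    } else {
        Some(stats.unique_bytes as f64 / stats.total_bytes as f64)
    }
}

/// Assumed relation between fingerprint bits and average chunk size: 2^f bytes.
/// Valid for f < 64.
pub fn avg_chunk_size(f: u32) -> u64 {
    1u64 << f
}

/// Returns (estimated final size, estimated savings) for a predicted ratio,
/// assuming ratio = final size / original size.
pub fn estimate_sizes(original: f64, ratio: f64) -> (f64, f64) {
    let final_size = original * ratio;
    (final_size, original - final_size)
}
```

With the sample numbers, 384 MB unique out of 512 MB gives c = 0.75, and a ratio of 0.7234 applied to a 10 GB input gives roughly 7.2 GB final size and 2.8 GB saved.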
§Errors
Returns an error if:
- Input file cannot be opened (file not found, permission denied)
- File metadata cannot be read
- File read operations fail (I/O error, disk full)
- CDC analysis fails (invalid data, algorithm error)
Note: Empty files are handled gracefully with an early return.
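The file-handling steps and the error cases listed above can be sketched with the standard library alone. This is a minimal illustration, not the function's actual code. SampleConfig and read_sample are hypothetical names, and large_threshold is an assumption because the documentation only says "large files" without giving a cutoff. Open, metadata, seek, and read failures propagate through `?`, and an empty file yields Ok(None) as the early return.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Sampling limits (hypothetical type). The documented values are
/// header_skip = 1 MiB and sample_limit = 512 MiB.
pub struct SampleConfig {
    /// Bytes skipped at the start of large files.
    pub header_skip: u64,
    /// Maximum number of bytes read into the sample.
    pub sample_limit: u64,
    /// Assumed cutoff: files at least this long count as "large".
    pub large_threshold: u64,
}

/// Reads a sample from `path`. Returns `Ok(None)` for an empty file.
pub fn read_sample(path: &Path, cfg: &SampleConfig) -> io::Result<Option<Vec<u8>>> {
    // Fails with NotFound, PermissionDenied, etc. if the file cannot be opened.
    let mut file = File::open(path)?;
    // Fails if the metadata cannot be read.
    let len = file.metadata()?.len();
    if len == 0 {
        return Ok(None);
    }
    // Skip the header only for files considered large.
    let start = if len >= cfg.large_threshold { cfg.header_skip } else { 0 };
    file.seek(SeekFrom::Start(start))?;
    // Read at most `sample_limit` bytes; any I/O error propagates.
    let mut buf = Vec::new();
    file.take(cfg.sample_limit).read_to_end(&mut buf)?;
    Ok(Some(buf))
}
```

Returning an Option keeps the empty-file case distinct from real failures, so a caller can return early without reporting an error.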
§Examples
use std::path::PathBuf;
use hexz_cli::cmd::data::analyze;
// Analyze a disk image
analyze::run(PathBuf::from("vm-disk.img"))?;
// Analyze a large backup file
analyze::run(PathBuf::from("/backup/system.tar"))?;