A difference library similar to rsync.
den is performant enough for large files, including files exceeding 10 MB (when using rolling hashes).
Compile with the release profile for roughly 10x the performance.
Allocating data and keeping it in memory is very fast compared to hashing. In the future, Den will support reading data incrementally, greatly reducing memory usage.
Keep in mind that this isn’t guaranteed to reproduce the exact same data. Check the result with a secure hashing algorithm (e.g. SHA-3) to ensure integrity.
Sending the data is possible because serde provides serialization and deserialization.
This requires the cargo feature serde to be enabled.
You can serialize all the structs in this library to any format serde supports.
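Enabling the feature in Cargo.toml could look like the fragment below. The feature name serde comes from the text above; the version requirement is only a placeholder assumption and should be replaced with the release you actually depend on.

```toml
[dependencies]
# "*" is a placeholder, not a recommendation: pin the den version you use.
den = { version = "*", features = ["serde"] }
```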
These examples should cover what rsync does.
Gets a small diff to send to others, almost like how
base_data is considered prior knowledge.
target_data is the modified data.
The data segments can be any size. Performance should still be good.
    use den::Signature;

    let base_data = b"This is a document everyone has. It's about some new difference library.";
    let target_data = b"This is a document only I have. It's about some new difference library.";

    // Build the signature of the data both sides already have.
    let mut signature = Signature::new(128);
    signature.write(base_data);
    let signature = signature.finish();

    // Compare the modified data against the signature.
    let diff = signature.diff(target_data);

    // This is the small diff you could serialize with Serde and send.
    let minified = diff.minify(8, base_data)
        .expect("This won't panic, as the data hasn't changed from calling the other functions.");
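To make the idea behind the example more concrete, here is a minimal, std-only sketch of a block-based diff. It is not den's implementation or API: `Segment`, `signature`, `diff` and `apply` are hypothetical names chosen for this illustration, and it rehashes every window with the standard library's `DefaultHasher` instead of using a rolling hash, so it is much slower than a real implementation would be.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical segment type for this sketch (not den's own types).
#[derive(Debug, PartialEq)]
enum Segment {
    /// A block the receiver already has, given by its index in the base data.
    Ref(usize),
    /// Bytes the receiver doesn't have; these are transmitted verbatim.
    Unknown(Vec<u8>),
}

fn hash_block(block: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    block.hash(&mut hasher);
    hasher.finish()
}

/// Maps the hash of every full block of `base` to the block's index.
fn signature(base: &[u8], block_size: usize) -> HashMap<u64, usize> {
    let mut map = HashMap::new();
    for (index, block) in base.chunks_exact(block_size).enumerate() {
        map.entry(hash_block(block)).or_insert(index);
    }
    map
}

/// Describes `target` as references to known blocks plus unknown bytes.
fn diff(sig: &HashMap<u64, usize>, target: &[u8], block_size: usize) -> Vec<Segment> {
    let mut segments = Vec::new();
    let mut unknown = Vec::new();
    let mut pos = 0;
    while pos < target.len() {
        let end = pos + block_size;
        // Only a full window can match a block of the base data.
        let hit = if end <= target.len() {
            sig.get(&hash_block(&target[pos..end])).copied()
        } else {
            None
        };
        match hit {
            Some(index) => {
                if !unknown.is_empty() {
                    segments.push(Segment::Unknown(std::mem::take(&mut unknown)));
                }
                segments.push(Segment::Ref(index));
                pos = end;
            }
            None => {
                // No match: keep the byte and slide the window by one.
                unknown.push(target[pos]);
                pos += 1;
            }
        }
    }
    if !unknown.is_empty() {
        segments.push(Segment::Unknown(unknown));
    }
    segments
}

/// Rebuilds the target from the base data and the segments.
fn apply(base: &[u8], segments: &[Segment], block_size: usize) -> Vec<u8> {
    let mut out = Vec::new();
    for segment in segments {
        match segment {
            Segment::Ref(index) => {
                let start = index * block_size;
                out.extend_from_slice(&base[start..start + block_size]);
            }
            Segment::Unknown(bytes) => out.extend_from_slice(bytes),
        }
    }
    out
}

fn main() {
    let base = b"This is a document everyone has. It's about some new difference library.";
    let target = b"This is a document only I have. It's about some new difference library.";
    let block_size = 8;

    let sig = signature(base, block_size);
    let segments = diff(&sig, target, block_size);
    assert_eq!(apply(base, &segments, block_size), target.to_vec());
    println!("{} segments: {:?}", segments.len(), segments);
}
```

In this sketch only the `Unknown` bytes and the block indices need to be sent. Note that blocks are matched by hash alone, so a collision would silently produce wrong output; checking the result with a secure hash guards against that.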
- Rolling hash
- Multi-threaded: there is no feasible way to implement this, as we look ahead and change which window we’re looking at after each iteration. Now with rolling hash, the performance is great.
- Support to diff a reader
- Support to apply to a writer
- Fetch API for apply to get data on demand.
  - This could slow things down dramatically.
- Implement Write for
- Use SHA(1|256?) to verify integrity of data. Bundled with the ... The implementer should provide this.
A generic rolling hash implementation.
One or more successive blocks found in the common data.
A segment with unknown contents. This will transmit the data.
An identifier of a file, much smaller than the file itself.
The algorithms which can be used for hashing the data.
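As background for the rolling hash mentioned above, the following is a minimal sketch of an rsync-style rolling checksum. It is an illustration under assumptions, not den's own implementation: the `Rolling` type and its methods are hypothetical names. The point it shows is that moving the window forward by one byte costs constant time instead of rehashing the whole window.

```rust
/// A minimal rsync-style rolling checksum (illustrative, not den's code).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Rolling {
    a: u16,   // sum of all bytes in the window (mod 2^16)
    b: u16,   // weighted sum: the oldest byte counts `len` times (mod 2^16)
    len: u16, // window length (mod 2^16)
}

impl Rolling {
    /// Hashes a whole window from scratch.
    fn new(window: &[u8]) -> Self {
        let mut a: u16 = 0;
        let mut b: u16 = 0;
        for &byte in window {
            a = a.wrapping_add(byte as u16);
            // Adding the running sum weights earlier bytes more heavily.
            b = b.wrapping_add(a);
        }
        Rolling { a, b, len: window.len() as u16 }
    }

    /// Slides the window one byte: `out` leaves at the front, `inp` enters at the back.
    fn roll(&mut self, out: u8, inp: u8) {
        self.a = self.a.wrapping_sub(out as u16).wrapping_add(inp as u16);
        self.b = self
            .b
            .wrapping_sub(self.len.wrapping_mul(out as u16))
            .wrapping_add(self.a);
    }

    /// Combines both halves into a single 32-bit value.
    fn digest(&self) -> u32 {
        ((self.b as u32) << 16) | self.a as u32
    }
}

fn main() {
    let data = b"the quick brown fox jumps over the lazy dog";
    let window = 8;

    let mut hash = Rolling::new(&data[..window]);
    for start in 1..=data.len() - window {
        hash.roll(data[start - 1], data[start + window - 1]);
        // Rolling must agree with hashing the window from scratch.
        assert_eq!(hash, Rolling::new(&data[start..start + window]));
    }
    println!("final digest: {:#010x}", hash.digest());
}
```

A checksum like this is weak: different windows can share a digest, so it is only suitable for finding candidate matches quickly.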