FastCDC
This crate implements the "FastCDC" content-defined chunking algorithm in pure Rust. A critical aspect of its behavior is that it is deterministic: the same input always produces exactly the same results. To learn more about content-defined chunking and its applications, see the reference material linked below.
Requirements
- Rust stable (2018 edition)
Building and Testing
$ cargo clean
$ cargo build
$ cargo test
Example Usage
An example can be found in the examples directory of the source repository. It demonstrates reading files of arbitrary size into a memory-mapped buffer, passing them through the chunker, and computing the SHA-256 hash digest of each chunk.
The unit tests also contain short examples of using the chunker, such as this snippet:
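let read_result = fs::read(...);
assert!(read_result.is_ok());
let contents = read_result.unwrap();
let chunker = FastCDC::new(&contents, ...);
let results: Vec<Chunk> = chunker.collect();
assert_eq!(results.len(), ...);
assert_eq!(results[0].offset, ...);
assert_eq!(results[0].length, ...);
assert_eq!(results[1].offset, ...);
assert_eq!(results[1].length, ...);
assert_eq!(results[2].offset, ...);
assert_eq!(results[2].length, ...);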
Reference Material
The algorithm is described in "FastCDC: a Fast and Efficient Content-Defined Chunking Approach for Data Deduplication"; see the paper and the presentation for details.
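
To give a feel for the general technique, here is a minimal, standard-library-only sketch of a gear-hash chunker with two masks (a stricter one before the average size, a looser one after). It is not this crate's implementation: the `Chunk` struct, the `gear_table`, `cut`, and `chunk` functions, the seed-generated gear table, and the high-bit masks are all assumptions made for illustration, and the boundaries it produces will not match the crate's.

```rust
/// Illustrative chunk record: byte offset into the input and length in bytes.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    offset: usize,
    length: usize,
}

/// Build a 256-entry gear table from a fixed seed (assumption for this
/// sketch; real implementations typically ship a precomputed table).
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        // xorshift64 step: deterministic, so the table is identical every run.
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        *entry = state;
    }
    table
}

/// Return the length of the next chunk at the start of `data`.
fn cut(data: &[u8], gear: &[u64; 256], min: usize, avg: usize, max: usize) -> usize {
    if data.len() <= min {
        return data.len(); // too short to search: emit the remainder
    }
    let end = data.len().min(max);
    let center = avg.min(end);
    let bits = avg.trailing_zeros(); // log2(avg), since avg is a power of two
    // More mask bits before the average size make a cut less likely there;
    // fewer bits afterwards make one more likely.
    let mask_s = u64::MAX << (64 - (bits + 1));
    let mask_l = u64::MAX << (64 - (bits - 1));
    let mut hash: u64 = 0;
    let mut i = min; // bytes before the minimum size are skipped entirely
    while i < end {
        hash = (hash << 1).wrapping_add(gear[data[i] as usize]);
        let mask = if i < center { mask_s } else { mask_l };
        if hash & mask == 0 {
            return i + 1; // cut after byte i
        }
        i += 1;
    }
    end // no cut point found: force a cut at max (or at end of input)
}

/// Split `data` into contiguous chunks.
fn chunk(data: &[u8], min: usize, avg: usize, max: usize) -> Vec<Chunk> {
    assert!(0 < min && min <= avg && avg <= max);
    assert!(avg.is_power_of_two() && avg >= 4);
    let gear = gear_table();
    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < data.len() {
        let length = cut(&data[offset..], &gear, min, avg, max);
        chunks.push(Chunk { offset, length });
        offset += length;
    }
    chunks
}

fn main() {
    // Pseudo-random but reproducible test input.
    let data: Vec<u8> = (0..100_000u32)
        .map(|i| (i.wrapping_mul(2654435761) >> 24) as u8)
        .collect();
    let chunks = chunk(&data, 256, 1024, 4096);
    println!("{} chunks", chunks.len());
}
```

Because the gear table and the hash are fully deterministic, calling `chunk` twice on the same bytes with the same sizes yields identical boundaries, which is the property deduplication relies on.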
Prior Art
This crate is little more than a rewrite of the implementation by Joran Dirk Greef (see the ronomon link below), in Rust, and greatly simplified in usage. One significant difference is that the chunker in this crate does not calculate a hash digest of the chunks.
- ronomon/deduplication
  - C++ and JavaScript implementation on which this code is based.
- rdedup_cdc at docs.rs
  - An alternative implementation of FastCDC to the one in this crate.
- jrobhoward/quickcdc
  - Similar but slightly earlier algorithm by some of the same researchers.
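
Since the chunker in this crate does not compute digests, callers that need them hash each chunk themselves. The sketch below shows one way that could look using only the standard library. The `Chunk` struct here is a hypothetical stand-in for whatever offset/length pair the chunker reports, `digest` and `unique_chunks` are illustrative helpers, and `DefaultHasher` is only a placeholder: it is not cryptographic and its algorithm is unspecified, so a real deduplication tool would substitute a hash such as SHA-256 from a dedicated crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::Hasher;

/// Hypothetical stand-in for a chunk reported by a chunker.
struct Chunk {
    offset: usize,
    length: usize,
}

/// Digest the bytes covered by one chunk.
/// DefaultHasher is a placeholder here, not a cryptographic hash.
fn digest(data: &[u8], chunk: &Chunk) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(&data[chunk.offset..chunk.offset + chunk.length]);
    hasher.finish()
}

/// Count how many distinct chunk digests appear in the input.
fn unique_chunks(data: &[u8], chunks: &[Chunk]) -> usize {
    let digests: HashSet<u64> = chunks.iter().map(|c| digest(data, c)).collect();
    digests.len()
}

fn main() {
    // Hand-picked boundaries for illustration; a chunker would supply these.
    let data: &[u8] = b"aaaabbbbaaaa";
    let chunks = [
        Chunk { offset: 0, length: 4 },
        Chunk { offset: 4, length: 4 },
        Chunk { offset: 8, length: 4 },
    ];
    println!(
        "{} unique chunk(s) out of {}",
        unique_chunks(data, &chunks),
        chunks.len()
    );
}
```

Chunks with identical bytes produce identical digests, so a store keyed by digest only needs to keep one copy of each.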