fastcdc 1.0.2

FastCDC (content defined chunking) in pure Rust.
FastCDC

This crate implements the "FastCDC" content defined chunking algorithm in pure Rust. A critical aspect of its behavior is that it is deterministic: the same input always yields exactly the same chunk boundaries. To learn more about content defined chunking and its applications, see the reference material linked below.
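To make the idea of content defined chunking concrete, here is a minimal sketch in plain Rust. It is not this crate's implementation and not the FastCDC algorithm proper: it uses a simple gear-style rolling hash with a single low-bit mask, and the names `ToyChunk`, `gear`, and `toy_chunks` are hypothetical, chosen only for illustration. It shows that cut points are picked from the content itself and that the same input always produces the same chunks.

```rust
/// Illustrative chunk record: byte offset into the source and length in bytes.
/// (Hypothetical type for this sketch, not the crate's own.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToyChunk {
    pub offset: usize,
    pub length: usize,
}

/// Hypothetical per-byte mixing function standing in for a gear lookup table.
fn gear(byte: u8) -> u64 {
    let mut x = (byte as u64).wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Splits `source` into chunks whose boundaries depend on the bytes seen.
/// Assumes `avg_size` is a power of two so it can be turned into a bit mask.
pub fn toy_chunks(
    source: &[u8],
    min_size: usize,
    avg_size: usize,
    max_size: usize,
) -> Vec<ToyChunk> {
    assert!(min_size > 0 && min_size <= avg_size && avg_size <= max_size);
    assert!(avg_size.is_power_of_two());
    let mask = (avg_size as u64) - 1;

    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < source.len() {
        let remaining = &source[offset..];
        // Never look further than max_size bytes ahead.
        let limit = remaining.len().min(max_size);
        let mut hash: u64 = 0;
        // If no cut point is found, the chunk ends at the limit.
        let mut length = limit;
        for (i, &b) in remaining[..limit].iter().enumerate() {
            hash = (hash << 1).wrapping_add(gear(b));
            // Cut once the minimum size is reached and the masked hash bits are zero.
            if i + 1 >= min_size && hash & mask == 0 {
                length = i + 1;
                break;
            }
        }
        chunks.push(ToyChunk { offset, length });
        offset += length;
    }
    chunks
}
```

Calling `toy_chunks(&data, 16, 64, 256)` twice on the same `data` returns identical vectors, and every chunk except possibly the last has a length between the minimum and maximum sizes.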

Requirements

  • Rust stable (2018 edition)

Building and Testing

$ cargo clean
$ cargo build
$ cargo test

Example Usage

An example can be found in the examples directory of the source repository, which demonstrates reading files of arbitrary size into a memory-mapped buffer and passing them through the chunker (and computing the SHA256 hash digest of each chunk).

The unit tests also contain short examples of using the chunker, such as this snippet:

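use std::fs;
use fastcdc::{Chunk, FastCDC};

// Read the whole test fixture into memory.
let read_result = fs::read("test/fixtures/SekienAkashita.jpg");
assert!(read_result.is_ok());
let contents = read_result.unwrap();
// Arguments after the buffer: minimum, average, and maximum chunk size in bytes.
let chunker = FastCDC::new(&contents, 16384, 32768, 65536);
let results: Vec<Chunk> = chunker.collect();
// The chunks are contiguous: each offset equals the previous offset plus length.
assert_eq!(results.len(), 3);
assert_eq!(results[0].offset, 0);
assert_eq!(results[0].length, 32857);
assert_eq!(results[1].offset, 32857);
assert_eq!(results[1].length, 16408);
assert_eq!(results[2].offset, 49265);
assert_eq!(results[2].length, 60201);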

Reference Material

The algorithm is as described in "FastCDC: a Fast and Efficient Content-Defined Chunking Approach for Data Deduplication"; see the paper and presentation for details.

Prior Art

This crate is little more than a Rust rewrite of the implementation by Joran Dirk Greef (see the ronomon link below), with greatly simplified usage. One significant difference is that the chunker in this crate does not calculate a hash digest of the chunks.
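Because the chunker yields only offsets and lengths, a caller who needs a digest per chunk can compute one by slicing the original buffer. The following is a rough sketch under stated assumptions: `ChunkSpan` and `digest_chunks` are hypothetical names, and the standard library's `DefaultHasher` stands in for a cryptographic hash such as SHA256, which would come from a separate crate. `DefaultHasher` output is not guaranteed to be stable across Rust releases, so it is unsuitable for stored digests.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical stand-in for a chunk record with `offset` and `length` fields.
#[derive(Debug, Clone, Copy)]
pub struct ChunkSpan {
    pub offset: usize,
    pub length: usize,
}

/// Computes one digest per chunk by slicing the original buffer.
/// Panics if a span reaches past the end of `source`.
pub fn digest_chunks(source: &[u8], chunks: &[ChunkSpan]) -> Vec<u64> {
    chunks
        .iter()
        .map(|c| {
            let bytes = &source[c.offset..c.offset + c.length];
            // Stand-in hash; swap in a real digest for deduplication use.
            let mut hasher = DefaultHasher::new();
            hasher.write(bytes);
            hasher.finish()
        })
        .collect()
}
```

Chunks with identical bytes produce identical digests, which is the property a deduplicating store relies on when it uses the digest as a lookup key.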