pub struct FastCDC { /* private fields */ }
The FastCDC chunker implementation from 2020.
There are two ways to use this struct.
One is to tell the struct the expected content length using
set_content_length() and then invoke
cut() with successive buffers until all data is chunked.
The other is to use the as_iterator() method to get a
FastCDCIterator that yields Chunk structs.
See the FastCDC::cut() method for more usage documentation on the first way.
The following example reads a file into memory and splits it into chunks that are
roughly 16 KB in size. The minimum and maximum sizes are the absolute limits
on the returned chunk sizes. With this algorithm, it is helpful to be more
lenient on the maximum chunk size as the results are highly dependent on the
input data. Changing the minimum chunk size will affect the results as the
algorithm may find different cut points given it uses the minimum as a
starting point (cut-point skipping).
let contents = fs::read("test/fixtures/SekienAkashita.jpg").unwrap();
let mut chunker = v2020::FastCDC::new(8192, 16384, 65535).unwrap();
for chunk in chunker.as_iterator(&contents) {
    println!("offset={} size={}", chunk.offset, chunk.get_length());
}
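To make cut-point skipping concrete, here is a minimal sketch of a content-defined chunker. It is not this crate's implementation: the gear table, the single low-bit mask, and the names toy_gear_table, toy_cut, and toy_chunks are all assumptions made for illustration, and it assumes min_size < avg_size < max_size. It only shows why the minimum size influences which cut points are found: bytes before the minimum are never hashed.

```rust
/// Build a reproducible pseudo-random table (hypothetical, NOT the crate's table).
fn toy_gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        // xorshift64 step, used only to obtain deterministic values
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        *entry = state;
    }
    table
}

/// Return the length of the next chunk at the start of `data`.
fn toy_cut(
    data: &[u8],
    min_size: usize,
    avg_size: usize,
    max_size: usize,
    gear: &[u64; 256],
) -> usize {
    // Too little data left to search: the rest becomes the final chunk.
    if data.len() <= min_size {
        return data.len();
    }
    let end = data.len().min(max_size);
    // About log2(avg_size) bits must be zero for a cut point to match.
    let mask = (avg_size as u64).next_power_of_two() - 1;
    let mut hash: u64 = 0;
    // Cut-point skipping: hashing starts at min_size, not at 0.
    for i in min_size..end {
        hash = (hash << 1).wrapping_add(gear[data[i] as usize]);
        if hash & mask == 0 {
            return i + 1;
        }
    }
    // No cut point found: force a cut at max_size (or at the end of the data).
    end
}

/// Split `data` into (offset, length) pairs using `toy_cut`.
fn toy_chunks(data: &[u8], min: usize, avg: usize, max: usize) -> Vec<(usize, usize)> {
    let gear = toy_gear_table();
    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < data.len() {
        let len = toy_cut(&data[offset..], min, avg, max, &gear);
        chunks.push((offset, len));
        offset += len;
    }
    chunks
}
```

Calling toy_chunks(&contents, 8192, 16384, 65535) yields chunks whose lengths never exceed the maximum, and only the final chunk can be shorter than the minimum. Because the hash starts fresh at min_size in this sketch, changing the minimum shifts where hashing begins and can therefore move every later cut point.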
Implementations
impl FastCDC
pub fn new(min_size: u32, avg_size: u32, max_size: u32) -> Result<Self, Error>
Construct a FastCDC with the given minimum, average, and maximum chunk sizes.
Uses chunk size normalization level 1 by default.
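As a rough illustration of what a normalization level means, the sketch below follows the scheme described in the FastCDC paper: before the average size is reached the cut-point mask is made stricter by `level` bits, and after it the mask is made looser by `level` bits, which pulls chunk sizes toward the average. The helper name normalized_mask_bits and the use of a plain integer for the level are assumptions for illustration; how this crate maps its Normalization enum to mask values is not shown here.

```rust
/// Hypothetical helper, not part of the crate.
/// Returns (bits checked before the average size, bits checked after it).
/// Assumes `avg_size` is greater than zero.
fn normalized_mask_bits(avg_size: u32, level: u32) -> (u32, u32) {
    assert!(avg_size > 0, "avg_size must be greater than zero");
    // floor(log2(avg_size))
    let bits = 31 - avg_size.leading_zeros();
    // More bits means a cut point is less likely per byte; fewer bits, more likely.
    (bits + level, bits.saturating_sub(level))
}
```

With an average size of 16384 and level 1, this gives 15 bits before the average point and 13 bits after it, so under this scheme a match is four times more likely per byte once the chunk has grown past the average.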
pub fn new_advanced(
    min_size: u32,
    avg_size: u32,
    max_size: u32,
    level: Normalization,
    content_length: Option<usize>,
) -> Result<Self, Error>
Create a new FastCDC with the given normalization level and pre-set content length.
pub fn set_content_length(&mut self, length: usize)
Set the length of the content to create chunks for. This method resets the internal context: any preceding buffers passed to cut() that did not yield a chunk are forgotten and are no longer included in the calculation of the next chunk.
pub fn cut(&mut self, buffer: &[u8]) -> Option<Chunk>
Try to identify the next cut point in the data.
If no chunk has been identified, this method returns None.
Calls that do not yield a chunk are remembered in an internal context
and will be relevant for identifying the chunk in subsequent calls.
If None is returned, the next passed buffer must not overlap with the previous one.
This method returns a Chunk struct when a chunk has been successfully identified.
See the documentation for Chunk for all available fields.
If a Chunk is returned, the next passed buffer must not overlap with the identified chunk.
The example below uses a cursor into the current buffer
to prevent such an overlap.
Example usage:
let buffers = vec![
    vec![0; 1024],
    vec![0; 1024],
    vec![0; 1024],
    vec![0; 1024]
];
let mut chunker = FastCDC::new(64, 1024, 2560).unwrap();
chunker.set_content_length(4096);
for (i, buffer) in buffers.iter().enumerate() {
    let mut cursor = 0;
    loop {
        // Buffer 1 with cursor at 0 returned None.
        // Buffer 2 with cursor at 0 returned None.
        // Buffer 3 with cursor at 0 returned offset -2048 and cut point 512.
        // Buffer 3 with cursor at 512 returned None.
        // Buffer 4 with cursor at 0 returned offset -512 and cut point 1024.
        if let Some(chunk) = chunker.cut(&buffer[cursor..]) {
            println!(
                "Buffer {} with cursor at {} returned offset {} and cut point {}.",
                i + 1, cursor, chunk.offset, chunk.cutpoint
            );
            cursor += chunk.cutpoint;
            if cursor == buffer.len() { break }
        } else {
            println!("Buffer {} with cursor at {} returned None.", i + 1, cursor);
            break;
        }
    }
}
There is a special case in which the remaining bytes are fewer than the minimum chunk size. In that case the returned chunk has a hash of 0 and its cut point is the end of the source data.
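The offsets in the trace above can look surprising at first: they are relative to the start of the slice passed to cut(), so a chunk that began in an earlier buffer reports a negative offset. The following sketch reproduces the same trace with a toy stand-in that cuts fixed-size chunks. ToyChunker, ToyChunk, and trace are hypothetical names, the signed offset type is an assumption, and the fixed-size rule is not the crate's algorithm; only the bookkeeping of remembered bytes and the cursor handling are meant to carry over.

```rust
/// Toy chunk with the two fields used by the cursor loop (types assumed).
struct ToyChunk {
    /// Chunk start relative to the slice passed to `cut`; negative when the
    /// chunk began in an earlier buffer.
    offset: isize,
    /// Chunk end relative to the slice passed to `cut`.
    cutpoint: usize,
}

/// Toy stand-in that cuts fixed-size chunks; NOT the crate's algorithm.
struct ToyChunker {
    chunk_size: usize,
    content_length: usize,
    consumed: usize, // bytes seen so far across all calls
    pending: usize,  // bytes remembered since the last cut
}

impl ToyChunker {
    fn new(chunk_size: usize, content_length: usize) -> Self {
        ToyChunker { chunk_size, content_length, consumed: 0, pending: 0 }
    }

    fn cut(&mut self, buffer: &[u8]) -> Option<ToyChunk> {
        // Bytes still needed to finish the chunk, capped by the content end.
        let to_cut = (self.chunk_size - self.pending)
            .min(self.content_length - self.consumed);
        if to_cut == 0 || buffer.len() < to_cut {
            // No cut yet: remember the whole buffer for the next call.
            self.pending += buffer.len();
            self.consumed += buffer.len();
            return None;
        }
        let chunk = ToyChunk { offset: -(self.pending as isize), cutpoint: to_cut };
        self.consumed += to_cut;
        self.pending = 0;
        Some(chunk)
    }
}

/// Drive the chunker with the same cursor loop as the documented example.
fn trace(buffers: &[Vec<u8>], chunker: &mut ToyChunker) -> Vec<String> {
    let mut lines = Vec::new();
    for (i, buffer) in buffers.iter().enumerate() {
        let mut cursor = 0;
        loop {
            match chunker.cut(&buffer[cursor..]) {
                Some(chunk) => {
                    lines.push(format!(
                        "Buffer {} cursor {}: offset {} cut point {}",
                        i + 1, cursor, chunk.offset, chunk.cutpoint
                    ));
                    // Skip past the chunk so the next slice does not overlap it.
                    cursor += chunk.cutpoint;
                    if cursor == buffer.len() { break; }
                }
                None => {
                    lines.push(format!("Buffer {} cursor {}: None", i + 1, cursor));
                    break;
                }
            }
        }
    }
    lines
}
```

With four 1024-byte buffers, a chunk size of 2560, and a content length of 4096, the toy produces the same five steps as the comments in the example: two None results, a chunk at offset -2048 with cut point 512, another None, and a final chunk at offset -512 with cut point 1024 once the content length is reached.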
pub fn as_iterator<'a, 'b>(
    &'a mut self,
    buffer: &'b [u8],
) -> FastCDCIterator<'a, 'b>
Construct a FastCDCIterator by mutably referencing the base FastCDC instance.