Struct FastCDC

pub struct FastCDC { /* private fields */ }

The FastCDC chunker implementation from 2020.

There are two ways to use this struct.
One is to tell the struct the expected content length using set_content_length() and then invoke cut() with successive buffers until all data is chunked.
The other is to use the as_iterator() method to get a FastCDCIterator that yields Chunk structs.

See the FastCDC::cut() method for more usage documentation on the first approach.

The following example reads a file into memory and splits it into chunks that are roughly 16 KiB in size. The minimum and maximum sizes are the absolute limits on the returned chunk sizes. With this algorithm, it helps to be lenient with the maximum chunk size, because the results depend heavily on the input data. Changing the minimum chunk size also affects the results: the algorithm uses the minimum as the starting point of its search (cut-point skipping), so it may find different cut points.

use std::fs;

// Read the whole file into memory, then iterate over its chunks.
let contents = fs::read("test/fixtures/SekienAkashita.jpg").unwrap();
let mut chunker = v2020::FastCDC::new(8192, 16384, 65535).unwrap();
for chunk in chunker.as_iterator(&contents) {
    println!("offset={} size={}", chunk.offset, chunk.get_length());
}
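
To make the role of the minimum and maximum sizes more concrete, the following is a minimal, simplified sketch of content-defined chunking with cut-point skipping. It is not this crate's implementation: the gear_table(), find_cut(), and split() helpers are hypothetical, the table is generated with a small xorshift generator instead of a fixed precomputed table, and a single mask is used without any chunk size normalization. It assumes avg is a power of two and min < max.

```rust
/// Build a 256-entry table of pseudo-random values (illustrative only;
/// this is an assumption, not the table used by any real implementation).
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        // One xorshift64 step per table entry.
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        *entry = state;
    }
    table
}

/// Return the length of the next chunk at the start of `data`.
fn find_cut(data: &[u8], gear: &[u64; 256], min: usize, avg: usize, max: usize) -> usize {
    assert!(min < max && avg.is_power_of_two());
    // Fewer bytes than the minimum remain: the rest is one final chunk.
    if data.len() <= min {
        return data.len();
    }
    // Never look past the maximum chunk size.
    let end = data.len().min(max);
    // log2(avg) one-bits, moved away from the lowest bits of the hash.
    let mask: u64 = ((avg as u64) - 1) << 16;
    let mut hash: u64 = 0;
    // Cut-point skipping: the search starts at `min`, not at 0.
    for i in min..end {
        hash = (hash << 1).wrapping_add(gear[data[i] as usize]);
        if (hash & mask) == 0 {
            return i + 1;
        }
    }
    // No cut point found: cut at the maximum size or the end of the data.
    end
}

/// Split `data` into (offset, length) pairs.
fn split(data: &[u8], min: usize, avg: usize, max: usize) -> Vec<(usize, usize)> {
    let gear = gear_table();
    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < data.len() {
        let len = find_cut(&data[offset..], &gear, min, avg, max);
        chunks.push((offset, len));
        offset += len;
    }
    chunks
}
```

Calling split(&contents, 8192, 16384, 65535) in this sketch yields chunks that never exceed the maximum, and only the last chunk can be as small as the minimum or smaller. Because the search begins at min, changing min changes which bytes contribute to the hash and therefore which cut points are found.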

Implementations

impl FastCDC

pub fn new(min_size: u32, avg_size: u32, max_size: u32) -> Result<Self, Error>

Construct a FastCDC with the given minimum, average, and maximum chunk sizes.

Uses chunk size normalization level 1 by default.
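
As a rough illustration of what a normalization level means, the sketch below follows the idea from the FastCDC paper: a stricter mask (more bits) is used before the average size is reached and a looser mask (fewer bits) afterwards, with the level controlling how many bits are added or removed. The normalized_mask_bits() and low_bit_mask() helpers are hypothetical, and the plain low-bit masks are a simplification; the masks actually used by this crate may differ.

```rust
/// Number of mask bits used before and after the average size is reached,
/// for a power-of-two `avg_size` and a normalization `level`.
/// Hypothetical helper for illustration; not part of the documented API.
fn normalized_mask_bits(avg_size: u32, level: u32) -> (u32, u32) {
    assert!(avg_size.is_power_of_two());
    // log2 of a power of two.
    let bits = 31 - avg_size.leading_zeros();
    assert!(level <= bits);
    // More bits: a cut point is less likely (used before the average size).
    // Fewer bits: a cut point is more likely (used after the average size).
    (bits + level, bits - level)
}

/// Simplified mask with the given number of low bits set.
fn low_bit_mask(bits: u32) -> u64 {
    assert!(bits < 64);
    (1u64 << bits) - 1
}
```

Under these assumptions, an average size of 16384 (14 bits) at level 1 gives a 15-bit mask before the average size and a 13-bit mask after it, which pulls chunk sizes closer to the average than a single 14-bit mask would.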

pub fn new_advanced(min_size: u32, avg_size: u32, max_size: u32, level: Normalization, content_length: Option<usize>) -> Result<Self, Error>

Create a new FastCDC with the given normalization level and pre-set content length.

pub fn set_content_length(&mut self, length: usize)

Set the total content length for which chunks will be created. This method resets the internal context: any buffers previously processed by cut() that did not yield a chunk are forgotten and no longer included in the calculation of the next chunk.

pub fn cut(&mut self, buffer: &[u8]) -> Option<Chunk>

Try to identify the next cut point in the data.
If no chunk has been identified, this method returns None.
Calls that do not yield a chunk are remembered in an internal context and will be relevant for identifying the chunk in subsequent calls.
If None is returned, the next passed buffer must not overlap with the previous one.

This method returns a Chunk struct when a chunk has been successfully identified.
See the documentation for Chunk for all available fields.
If a Chunk is returned, the next passed buffer must not overlap with the identified chunk.
See the example usage below, where a cursor is used to prevent such an overlap.

Example usage:

let buffers = vec![
    vec![0; 1024],
    vec![0; 1024],
    vec![0; 1024],
    vec![0; 1024]
];

let mut chunker = FastCDC::new(64, 1024, 2560).unwrap();
chunker.set_content_length(4096);

for (i, buffer) in buffers.iter().enumerate() {
    let mut cursor = 0;
    loop {
        // Buffer 1 with cursor at 0 returned None.
        // Buffer 2 with cursor at 0 returned None.
        // Buffer 3 with cursor at 0 returned offset -2048 and cut point 512.
        // Buffer 3 with cursor at 512 returned None.
        // Buffer 4 with cursor at 0 returned offset -512 and cut point 1024.
        if let Some(chunk) = chunker.cut(&buffer[cursor..]) {
            println!("Buffer {} with cursor at {} returned offset {} and cut point {}.", i + 1, cursor, chunk.offset, chunk.cutpoint);

            cursor += chunk.cutpoint;
            if cursor == buffer.len() { break }
        } else {
            println!("Buffer {} with cursor at {} returned None.", i + 1, cursor);
            break;
        }
    }
}

There is a special case in which the remaining bytes are fewer than the minimum chunk size. In that case, this function returns a chunk with a hash of 0 whose cut point is the end of the source data.
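
The comments in the example above suggest that offset is relative to the start of the slice passed to cut() (and can therefore be negative), while cutpoint is the end of the chunk within that slice. Under that assumption, a caller could recover absolute stream positions as sketched below. The absolute_range() helper and its parameter types (isize for the offset, usize for the cut point) are hypothetical and not part of the documented API.

```rust
/// Convert a chunk's slice-relative `offset` and `cutpoint` into an absolute
/// (start, end) byte range within the whole stream.
///
/// `buffer_start` is the absolute position of the current buffer's first byte,
/// and `cursor` is the position within that buffer where the slice began.
/// Assumes `offset` is relative to the slice start, as the example implies.
fn absolute_range(buffer_start: usize, cursor: usize, offset: isize, cutpoint: usize) -> (usize, usize) {
    let slice_start = buffer_start + cursor;
    // A negative offset reaches back into previously passed buffers.
    let start = (slice_start as isize + offset) as usize;
    let end = slice_start + cutpoint;
    (start, end)
}
```

With the values from the example, buffer 3 starts at absolute position 2048, so offset -2048 and cut point 512 map to the range 0..2560; buffer 4 starts at 3072, so offset -512 and cut point 1024 map to 2560..4096. Together the two chunks cover the full 4096 bytes.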

pub fn as_iterator<'a, 'b>(&'a mut self, buffer: &'b [u8]) -> FastCDCIterator<'a, 'b>

Construct a FastCDCIterator by mutably referencing the base FastCDC instance.

Trait Implementations

impl Clone for FastCDC

fn clone(&self) -> FastCDC

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for FastCDC

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl PartialEq for FastCDC

fn eq(&self, other: &FastCDC) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Eq for FastCDC

impl StructuralPartialEq for FastCDC
