Chunk

Trait Chunk 

pub trait Chunk {
    type SearchState;

    // Required methods
    fn to_search_state(&self) -> Self::SearchState;
    fn find_chunk_edge(
        &self,
        state: &mut Self::SearchState,
        data: &[u8],
    ) -> (Option<usize>, usize);
}

Implemented by algorithms that define methods of chunking data.

This is the lowest-level (but somewhat restrictive) trait for chunking algorithms. It assumes that the input is provided as a contiguous slice. If your input is not a contiguous slice, ChunkIncr may be a better choice (it allows non-contiguous input, but may be slower for some chunking algorithms).

Required Associated Types

type SearchState

SearchState allows searching for the chunk edge to resume without duplicating work already done.
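As a rough illustration of how a SearchState can avoid rescanning, here is a minimal sketch. It does not use any hash_roll implementation: the `Chunk` trait below is a local stand-in that copies the signature shown on this page, and `NewlineChunk` is a hypothetical toy chunker that cuts after every `\n`. Its state is simply the number of bytes at the front of `data` that have already been scanned; real algorithms will usually keep richer state (for example a rolling hash).

```rust
// Local stand-in mirroring the trait signature on this page, so the sketch
// compiles without the crate. It is an assumption, not the crate's code.
trait Chunk {
    type SearchState;
    fn to_search_state(&self) -> Self::SearchState;
    fn find_chunk_edge(
        &self,
        state: &mut Self::SearchState,
        data: &[u8],
    ) -> (Option<usize>, usize);
}

// Hypothetical toy chunker: a chunk ends right after each b'\n'.
struct NewlineChunk;

impl Chunk for NewlineChunk {
    // Number of bytes at the front of `data` that were already scanned.
    type SearchState = usize;

    fn to_search_state(&self) -> usize {
        0
    }

    fn find_chunk_edge(&self, state: &mut usize, data: &[u8]) -> (Option<usize>, usize) {
        // Resume after the bytes examined by earlier calls.
        let scanned = (*state).min(data.len());
        match data[scanned..].iter().position(|&b| b == b'\n') {
            Some(i) => {
                let cut = scanned + i + 1;
                // The next call starts on fresh data, so reset the state.
                *state = 0;
                (Some(cut), cut)
            }
            None => {
                // Nothing found: remember how far we looked, discard nothing.
                *state = data.len();
                (None, 0)
            }
        }
    }
}

fn main() {
    let chunker = NewlineChunk;
    let mut state = chunker.to_search_state();

    // First call: no edge yet, but the state records that 2 bytes were scanned.
    let first = chunker.find_chunk_edge(&mut state, b"ab");
    println!("first: {:?}, scanned so far: {}", first, state);

    // Second call with the same bytes plus more: scanning resumes at offset 2.
    let second = chunker.find_chunk_edge(&mut state, b"ab\ncd");
    println!("second: {:?}", second);
}
```

In this toy, returning a discard offset of 0 when no edge is found means the caller must pass the same bytes again (extended with new input), and the stored offset is what keeps the second call from rescanning them.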

Required Methods

fn to_search_state(&self) -> Self::SearchState

Provide an initial SearchState for use with find_chunk_edge(). Generally, one should generate a new SearchState for each input.

fn find_chunk_edge(&self, state: &mut Self::SearchState, data: &[u8]) -> (Option<usize>, usize)

Find the next “chunk” in data to emit

The return value is a pair: the cut point marking the end of the chunk being emitted (None if no chunk edge was found in data), and the offset into data from which the remaining data should be passed to the next call to find_chunk_edge. The example below names this offset discard_ct.

state is mutated so that previously examined data is not re-examined, even when a chunk is not emitted.

data may be extended with additional data between calls to find_chunk_edge(). The bytes that were previously in data and were not discarded (those at or after the returned offset, discard_ct in the example below) must be preserved at the start of the data passed to the next call.

use hash_roll::Chunk;

fn some_chunk() -> impl Chunk {
    hash_roll::mii::Mii::default()
}

let chunk = some_chunk();
let orig_data = b"hello";
let mut data = &orig_data[..];
let mut ss = chunk.to_search_state();
let mut prev_cut = 0;

loop {
   let (cut, discard_ct) = chunk.find_chunk_edge(&mut ss, data);

   match cut {
       Some(cut_point) => {
           // map `cut_point` from the current slice back into the original slice so we can
           // have consistent indexes
           let g_cut = cut_point + orig_data.len() - data.len();
           println!("chunk: {:?}", &orig_data[prev_cut..g_cut]);
           // remember where this chunk ended so the next chunk starts there
           prev_cut = g_cut;
       },
       None => {
           println!("no chunk, done with data we have");
           println!("remain: {:?}", &data[discard_ct..]);
           break;
       }
   }

   data = &data[discard_ct..];
}

Note: to obtain subsequent chunks from the same input, call find_chunk_edge again with the same SearchState and the remaining data. To handle a separate input, use a new SearchState.

Note: calling with a previous state and new data that isn’t an extension of the previous data will result in split points that may not follow the design of the underlying algorithm. Avoid relying on consistent cut points to reason about memory safety.
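The rule that data may grow between calls, as long as the undiscarded bytes are kept, can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: the `Chunk` trait is a local copy of the signature on this page, `NewlineChunk` is a hypothetical toy chunker that cuts after each `\n`, and `chunk_pieces` is a hypothetical helper. The helper keeps every received byte in one buffer and tracks absolute offsets, in the same way the example above maps cut points back into the original slice.

```rust
// Local stand-in mirroring the trait signature on this page (an assumption,
// so the sketch compiles without the crate).
trait Chunk {
    type SearchState;
    fn to_search_state(&self) -> Self::SearchState;
    fn find_chunk_edge(
        &self,
        state: &mut Self::SearchState,
        data: &[u8],
    ) -> (Option<usize>, usize);
}

// Hypothetical toy chunker: a chunk ends right after each b'\n'. The state is
// the count of bytes at the front of `data` that were already scanned.
struct NewlineChunk;

impl Chunk for NewlineChunk {
    type SearchState = usize;

    fn to_search_state(&self) -> usize {
        0
    }

    fn find_chunk_edge(&self, state: &mut usize, data: &[u8]) -> (Option<usize>, usize) {
        let scanned = (*state).min(data.len());
        match data[scanned..].iter().position(|&b| b == b'\n') {
            Some(i) => {
                *state = 0;
                (Some(scanned + i + 1), scanned + i + 1)
            }
            None => {
                *state = data.len();
                (None, 0)
            }
        }
    }
}

/// Feed `pieces` one at a time; return the emitted chunks and the bytes that
/// have not been emitted as part of any chunk yet.
fn chunk_pieces<C: Chunk>(chunker: &C, pieces: &[&[u8]]) -> (Vec<Vec<u8>>, Vec<u8>) {
    let mut all: Vec<u8> = Vec::new(); // every byte received so far
    let mut start = 0; // offset in `all` where the next `data` slice begins
    let mut prev_cut = 0; // absolute offset of the last emitted cut point
    let mut state = chunker.to_search_state(); // one state for the whole input
    let mut chunks = Vec::new();

    for piece in pieces {
        // Extend the data; bytes at or after `start` are preserved as required.
        all.extend_from_slice(piece);
        loop {
            let (cut, discard_ct) = chunker.find_chunk_edge(&mut state, &all[start..]);
            let found = match cut {
                Some(cut_point) => {
                    // Map the slice-relative cut point to an absolute offset.
                    let g_cut = start + cut_point;
                    chunks.push(all[prev_cut..g_cut].to_vec());
                    prev_cut = g_cut;
                    true
                }
                None => false,
            };
            start += discard_ct;
            if !found {
                break; // wait for more input
            }
        }
    }
    (chunks, all[prev_cut..].to_vec())
}

fn main() {
    let pieces: [&[u8]; 4] = [b"ab", b"c\nde", b"f\n", b"gh"];
    let (chunks, tail) = chunk_pieces(&NewlineChunk, &pieces);
    for c in &chunks {
        println!("chunk: {:?}", String::from_utf8_lossy(c));
    }
    println!("tail: {:?}", String::from_utf8_lossy(&tail));
}
```

Keeping the whole input in `all` is a simplification for clarity; a real caller would more likely drop bytes once they have been both discarded and emitted.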

Implementors