pub trait Chunk {
type SearchState;
// Required methods
fn to_search_state(&self) -> Self::SearchState;
fn find_chunk_edge(
&self,
state: &mut Self::SearchState,
data: &[u8],
) -> (Option<usize>, usize);
}
Implement this on algorithms that define methods of chunking data.
This is the lowest-level (but somewhat restrictive) trait for chunking algorithms. It assumes
that the input is provided to it in a contiguous slice. If you don't have your input as a
contiguous slice, ChunkIncr may be a better choice (it allows non-contiguous input, but may
be slower for some chunking algorithms).
Required Associated Types§
type SearchState
SearchState allows searching for the chunk edge to resume without duplicating work
already done.
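As a rough illustration of what a SearchState can hold, here is a minimal sketch. It declares a local copy of the trait signature shown above so that it compiles on its own. `NewlineChunk` and `NewlineState` are hypothetical names invented for this example and are not part of hash_roll. The toy algorithm (cut after every `\n` byte) stands in for a real chunking algorithm, and its choice to return a discard count of 0 when no edge is found is an assumption of this sketch.

```rust
// Local mirror of the trait signature shown above, so the sketch is self-contained.
pub trait Chunk {
    type SearchState;
    fn to_search_state(&self) -> Self::SearchState;
    fn find_chunk_edge(&self, state: &mut Self::SearchState, data: &[u8])
        -> (Option<usize>, usize);
}

/// Hypothetical toy chunker: ends a chunk after every `\n` byte.
pub struct NewlineChunk;

/// Hypothetical state: number of leading bytes of `data` already scanned.
#[derive(Debug, Default, PartialEq)]
pub struct NewlineState {
    pub scanned: usize,
}

impl Chunk for NewlineChunk {
    type SearchState = NewlineState;

    fn to_search_state(&self) -> NewlineState {
        NewlineState::default()
    }

    fn find_chunk_edge(&self, state: &mut NewlineState, data: &[u8])
        -> (Option<usize>, usize)
    {
        // Skip bytes examined by an earlier call; clamp in case `data` shrank.
        let start = state.scanned.min(data.len());
        match data[start..].iter().position(|&b| b == b'\n') {
            Some(i) => {
                let cut = start + i + 1; // the chunk is data[..cut]
                state.scanned = 0; // the next chunk starts from scratch
                (Some(cut), cut)
            }
            None => {
                state.scanned = data.len(); // remember how far we got
                (None, 0) // nothing discarded: the caller must keep all of `data`
            }
        }
    }
}
```

In this sketch, a first call on `b"ab"` finds no edge and records that two bytes were scanned. A second call on the extended slice `b"abcd\nef"` resumes at index 2 instead of rescanning from the start.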
Required Methods§
fn to_search_state(&self) -> Self::SearchState
Provide an initial [SearchState] for use with [find_chunk_edge()]. Generally, for each
input one should generate a new [SearchState].
fn find_chunk_edge( &self, state: &mut Self::SearchState, data: &[u8], ) -> (Option<usize>, usize)
Find the next “chunk” in data to emit.
The return value is a pair: the cut point marking the end of the chunk being emitted, if
one was found (an index into data), and the offset (discard_ct) from which subsequent data
subsets should be passed to the next call to find_chunk_edge.
state is mutated so that it does not re-examine previously examined data, even when a chunk
is not emitted.
data may be extended with additional data between calls to find_chunk_edge(). The bytes
that were previously in data and are not indicated by discard_ct must be preserved in
the data buffer passed to the next call.
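The buffer handling described above can be sketched roughly as follows. Input arrives in pieces, each piece extends the buffer, and only the first discard_ct bytes are dropped after each call. The trait is mirrored locally so the block compiles on its own. `NewlineChunk`, `NewlineState`, and `chunk_pieces` are hypothetical names for this illustration, not hash_roll items. The sketch reports cut points as offsets into the whole input rather than copying chunk contents.

```rust
// Local mirror of the trait signature shown above, so the sketch is self-contained.
pub trait Chunk {
    type SearchState;
    fn to_search_state(&self) -> Self::SearchState;
    fn find_chunk_edge(&self, state: &mut Self::SearchState, data: &[u8])
        -> (Option<usize>, usize);
}

/// Hypothetical toy chunker: ends a chunk after every `\n` byte.
pub struct NewlineChunk;

/// Hypothetical state: number of leading bytes of `data` already scanned.
#[derive(Default)]
pub struct NewlineState {
    scanned: usize,
}

impl Chunk for NewlineChunk {
    type SearchState = NewlineState;

    fn to_search_state(&self) -> NewlineState {
        NewlineState::default()
    }

    fn find_chunk_edge(&self, state: &mut NewlineState, data: &[u8])
        -> (Option<usize>, usize)
    {
        let start = state.scanned.min(data.len());
        match data[start..].iter().position(|&b| b == b'\n') {
            Some(i) => {
                state.scanned = 0;
                (Some(start + i + 1), start + i + 1)
            }
            None => {
                state.scanned = data.len();
                (None, 0) // keep everything; the caller extends the buffer
            }
        }
    }
}

/// Feed `pieces` one at a time. Returns the cut points as offsets into the
/// concatenated input, plus the bytes still buffered at the end.
pub fn chunk_pieces<C: Chunk>(chunker: &C, pieces: &[&[u8]]) -> (Vec<usize>, Vec<u8>) {
    let mut state = chunker.to_search_state();
    let mut buf: Vec<u8> = Vec::new(); // bytes not yet discarded
    let mut base = 0; // offset of buf[0] within the whole input
    let mut cuts = Vec::new();
    for piece in pieces {
        buf.extend_from_slice(piece); // extend the previous `data`
        loop {
            let (cut, discard_ct) = chunker.find_chunk_edge(&mut state, &buf);
            if let Some(c) = cut {
                cuts.push(base + c);
            }
            // Drop only what the chunker said it no longer needs (clamped).
            let n = discard_ct.min(buf.len());
            buf.drain(..n);
            base += n;
            if cut.is_none() {
                break; // need more input before another edge can be found
            }
        }
    }
    (cuts, buf)
}
```

Feeding the pieces `"ab"`, `"cd\nef"`, `"g\nh"` through this sketch yields the cut points 5 and 9 and leaves `"h"` buffered, the same edges as chunking `"abcd\nefg\nh"` in one slice.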
use hash_roll::Chunk;

fn some_chunk() -> impl Chunk {
    hash_roll::mii::Mii::default()
}

let chunk = some_chunk();
let orig_data = b"hello";
let mut data = &orig_data[..];
let mut ss = chunk.to_search_state();
let mut prev_cut = 0;

loop {
    let (cut, discard_ct) = chunk.find_chunk_edge(&mut ss, data);

    match cut {
        Some(cut_point) => {
            // map `cut_point` from the current slice back into the original slice so we can
            // have consistent indexes
            let g_cut = cut_point + orig_data.len() - data.len();
            println!("chunk: {:?}", &orig_data[prev_cut..g_cut]);
            // the next chunk starts where this one ended
            prev_cut = g_cut;
        },
        None => {
            println!("no chunk, done with data we have");
            println!("remain: {:?}", &data[discard_ct..]);
            break;
        }
    }

    data = &data[discard_ct..];
}
Note: call additional times on the same SearchState and the required data to obtain
subsequent chunks in the same input data. To handle a separate input, use a new
SearchState.
Note: calling with a previous state and new data that isn't an extension of the
previous data will result in split points that may not follow the design of the
underlying algorithm. Avoid relying on consistent cut points to reason about memory safety.
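One way to follow both notes is sketched below. It creates a fresh SearchState for each input and slices with checked `get` calls, so an unexpected cut point ends the loop rather than being trusted. The trait is mirrored locally so the block compiles on its own. `NewlineChunk`, `NewlineState`, and `split_input` are hypothetical names for this illustration, not hash_roll items.

```rust
// Local mirror of the trait signature shown above, so the sketch is self-contained.
pub trait Chunk {
    type SearchState;
    fn to_search_state(&self) -> Self::SearchState;
    fn find_chunk_edge(&self, state: &mut Self::SearchState, data: &[u8])
        -> (Option<usize>, usize);
}

/// Hypothetical toy chunker: ends a chunk after every `\n` byte.
pub struct NewlineChunk;

/// Hypothetical state: number of leading bytes of `data` already scanned.
#[derive(Default)]
pub struct NewlineState {
    scanned: usize,
}

impl Chunk for NewlineChunk {
    type SearchState = NewlineState;

    fn to_search_state(&self) -> NewlineState {
        NewlineState::default()
    }

    fn find_chunk_edge(&self, state: &mut NewlineState, data: &[u8])
        -> (Option<usize>, usize)
    {
        let start = state.scanned.min(data.len());
        match data[start..].iter().position(|&b| b == b'\n') {
            Some(i) => {
                state.scanned = 0;
                (Some(start + i + 1), start + i + 1)
            }
            None => {
                state.scanned = data.len();
                (None, 0)
            }
        }
    }
}

/// Split one contiguous `input` into chunks plus a trailing remainder.
pub fn split_input<'a, C: Chunk>(chunker: &C, input: &'a [u8]) -> (Vec<&'a [u8]>, &'a [u8]) {
    let mut state = chunker.to_search_state(); // new state for this input only
    let mut data = input;
    let mut prev_cut = 0;
    let mut chunks = Vec::new();
    loop {
        let (cut, discard_ct) = chunker.find_chunk_edge(&mut state, data);
        let cut = match cut {
            Some(c) => c,
            None => break,
        };
        // Map the cut point back into `input`, as in the example above.
        let g_cut = cut + (input.len() - data.len());
        // Checked slicing: an out-of-range or backwards cut stops the loop.
        match input.get(prev_cut..g_cut) {
            Some(chunk) => chunks.push(chunk),
            None => break,
        }
        prev_cut = g_cut;
        data = data.get(discard_ct..).unwrap_or(&[]);
    }
    (chunks, &input[prev_cut..])
}
```

Because `split_input` builds its own state on every call, two unrelated inputs never share search progress. The checked slicing matters most in code that would otherwise use the cut points in unchecked or `unsafe` indexing.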