pub trait Chunk {
    type SearchState;

    // Required methods
    fn to_search_state(&self) -> Self::SearchState;
    fn find_chunk_edge(
        &self,
        state: &mut Self::SearchState,
        data: &[u8],
    ) -> (Option<usize>, usize);
}
Implement on algorithms that define methods of chunking data.
This is the lowest-level (but somewhat restrictive) trait for chunking algorithms. It assumes that the input is provided to it in a contiguous slice. If you don’t have your input as a contiguous slice, ChunkIncr may be a better choice (it allows non-contiguous input, but may be slower for some chunking algorithms).
Required Associated Types§
type SearchState
SearchState allows searching for the chunk edge to resume without duplicating work already done.
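As a rough illustration of what a SearchState can hold, here is a minimal sketch of a toy implementor. It is not part of hash_roll: the trait is re-declared locally with the signature shown on this page so the block compiles on its own, and DelimChunk, DelimState, and the field scanned are hypothetical names. The sketch also assumes that the first element of the returned pair is a cut offset into data and the second is the number of leading bytes the caller may drop.

```rust
// Local mirror of the trait signature shown on this page, so the sketch
// compiles without the `hash_roll` crate. With the crate available, import
// `hash_roll::Chunk` instead of declaring this.
pub trait Chunk {
    type SearchState;
    fn to_search_state(&self) -> Self::SearchState;
    fn find_chunk_edge(
        &self,
        state: &mut Self::SearchState,
        data: &[u8],
    ) -> (Option<usize>, usize);
}

/// Hypothetical toy chunker: ends a chunk just after every `delim` byte.
pub struct DelimChunk {
    pub delim: u8,
}

/// Hypothetical state: how many bytes at the front of `data` were already
/// scanned without finding a delimiter.
#[derive(Default)]
pub struct DelimState {
    scanned: usize,
}

impl Chunk for DelimChunk {
    type SearchState = DelimState;

    fn to_search_state(&self) -> DelimState {
        DelimState::default()
    }

    fn find_chunk_edge(&self, state: &mut DelimState, data: &[u8]) -> (Option<usize>, usize) {
        // Resume after the bytes examined on an earlier call.
        let start = state.scanned.min(data.len());
        match data[start..].iter().position(|&b| b == self.delim) {
            Some(i) => {
                let cut = start + i + 1; // the chunk ends just past the delimiter
                state.scanned = 0; // the caller drops `cut` bytes, so start fresh
                (Some(cut), cut)
            }
            None => {
                // No edge yet: remember how far we looked and discard nothing.
                state.scanned = data.len();
                (None, 0)
            }
        }
    }
}
```

When the same state is passed again with a longer data slice, the scan resumes at state.scanned rather than at the start of the slice, which is the work-saving behaviour described above.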
Required Methods§
fn to_search_state(&self) -> Self::SearchState
Provide an initial SearchState for use with find_chunk_edge(). Generally, for each input one should generate a new SearchState.
fn find_chunk_edge(
    &self,
    state: &mut Self::SearchState,
    data: &[u8],
) -> (Option<usize>, usize)
Find the next “chunk” in data to emit.
The return value is a pair: the cut point of the chunk being emitted (None if no chunk edge was found in data), and the offset from which subsequent data subsets should be passed to the next call to find_chunk_edge().
state is mutated so that it does not re-examine previously examined data, even when a chunk is not emitted.
data may be extended with additional data between calls to find_chunk_edge(). The bytes that were previously in data and are not covered by the returned discard count (discard_ct in the example) must be preserved in the next data buffer passed in.
use hash_roll::Chunk;

fn some_chunk() -> impl Chunk {
    hash_roll::mii::Mii::default()
}

let chunk = some_chunk();
let orig_data = b"hello";
let mut data = &orig_data[..];
let mut ss = chunk.to_search_state();
// end of the previously emitted chunk, as an index into `orig_data`
let mut prev_cut = 0;

loop {
    let (cut, discard_ct) = chunk.find_chunk_edge(&mut ss, data);
    match cut {
        Some(cut_point) => {
            // map `cut_point` from the current slice back into the original slice so we can
            // have consistent indexes
            let g_cut = cut_point + orig_data.len() - data.len();
            println!("chunk: {:?}", &orig_data[prev_cut..g_cut]);
            prev_cut = g_cut;
        },
        None => {
            println!("no chunk, done with data we have");
            println!("remain: {:?}", &data[discard_ct..]);
            break;
        }
    }
    // drop the bytes the chunker no longer needs before the next call
    data = &data[discard_ct..];
}
Note: call additional times on the same SearchState and the required data to obtain subsequent chunks in the same input data. To handle a separate input, use a new SearchState.
Note: calling with a previous state with a new data that isn’t an extension of the previous data will result in split points that may not follow the design of the underlying algorithm. Avoid relying on consistent cut points to reason about memory safety.
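The contract on extending data can be sketched for input that arrives in pieces. This is a minimal sketch under assumptions, not hash_roll code: the trait is re-declared locally with the signature from this page, FixedChunk and chunk_pieces are hypothetical names, and the pair is assumed to hold a cut offset into data followed by a discard count. For simplicity the driver keeps the whole input in memory and only moves a start index forward; a real caller would likely free the discarded bytes.

```rust
// Local mirror of the trait signature from this page (assumption: it stands
// in for `hash_roll::Chunk` so the sketch compiles without the crate).
pub trait Chunk {
    type SearchState;
    fn to_search_state(&self) -> Self::SearchState;
    fn find_chunk_edge(
        &self,
        state: &mut Self::SearchState,
        data: &[u8],
    ) -> (Option<usize>, usize);
}

/// Hypothetical toy chunker: every chunk is exactly `size` bytes long.
pub struct FixedChunk {
    pub size: usize,
}

impl Chunk for FixedChunk {
    type SearchState = ();

    fn to_search_state(&self) -> Self::SearchState {}

    fn find_chunk_edge(&self, _state: &mut (), data: &[u8]) -> (Option<usize>, usize) {
        if data.len() >= self.size {
            (Some(self.size), self.size)
        } else {
            (None, 0)
        }
    }
}

/// Hypothetical driver: feeds `pieces` one at a time and returns the emitted
/// chunks plus the tail that has not been chunked yet.
pub fn chunk_pieces<C: Chunk>(chunker: &C, pieces: &[&[u8]]) -> (Vec<Vec<u8>>, Vec<u8>) {
    let mut state = chunker.to_search_state(); // one state for the whole input
    let mut all: Vec<u8> = Vec::new(); // everything received so far
    let mut base = 0; // bytes the chunker has told us to discard
    let mut prev_cut = 0; // end of the last emitted chunk, as an index into `all`
    let mut chunks = Vec::new();

    for piece in pieces {
        // Extend the buffer: every undiscarded byte stays in place in front
        // of the new bytes, so `all[base..]` extends the previous `data`.
        all.extend_from_slice(piece);
        loop {
            let (cut, discard_ct) = chunker.find_chunk_edge(&mut state, &all[base..]);
            let found = match cut {
                Some(cut_point) => {
                    let g_cut = base + cut_point; // map back into `all`
                    chunks.push(all[prev_cut..g_cut].to_vec());
                    prev_cut = g_cut;
                    true
                }
                None => false,
            };
            base += discard_ct;
            if !found {
                break; // wait for more input
            }
        }
    }
    (chunks, all[prev_cut..].to_vec())
}
```

With a FixedChunk of size 3 and the pieces "abc", "de", "fg", the driver emits "abc" and "def" and leaves "g" as the tail. The "de" piece yields no chunk on its own, and its bytes are carried over into the next call.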