Struct cdchunking::Chunker

pub struct Chunker<I: ChunkerImpl> { /* fields omitted */ }

Chunker object; wraps a chunking method (such as a rolling hash) into a stream-splitting object.

Methods

impl<I: ChunkerImpl> Chunker<I>

Creates a Chunker from a specific way of finding chunk boundaries.
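
For instance, a minimal sketch using the ZPAQ implementation shipped with the crate (the parameter is a number of bits; 13 bits gives an average chunk size of 8 KiB):

use cdchunking::{Chunker, ZPAQ};

// Build a Chunker from the ZPAQ rolling hash.
let chunker = Chunker::new(ZPAQ::new(13));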

Iterates on whole chunks from a file, read into new vectors.
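
A sketch, assuming this describes the whole_chunks() method; each item is an IO result wrapping a freshly allocated vector (the input file name is hypothetical):

use std::fs::File;
use cdchunking::{Chunker, ZPAQ};

let chunker = Chunker::new(ZPAQ::new(13));
let reader = File::open("input.bin").expect("open failed"); // hypothetical file
for chunk in chunker.whole_chunks(reader) {
    let chunk = chunk.expect("Error reading from file");
    println!("Got chunk of {} bytes", chunk.len());
}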

Reads all the chunks at once, into a vector of chunks (each chunk itself a vector of bytes).

This is similar to .whole_chunks().collect(), but takes care of the IO errors, returning an error if any of the chunks failed to read.
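
A sketch, assuming this describes all_chunks(); anything implementing Read works as input, including a byte slice:

use cdchunking::{Chunker, ZPAQ};

let chunker = Chunker::new(ZPAQ::new(13));
let data: &[u8] = b"some example bytes to split into chunks";
// One io::Result for the whole read: Ok(all chunks) or the first error.
let chunks: Vec<Vec<u8>> = chunker.all_chunks(data).expect("Error reading");
println!("Read {} chunks", chunks.len());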

Reads chunks with zero allocations.

This streaming iterator provides you with the chunks from an internal buffer that gets reused, instead of allocating new memory to hold each chunk. This is very memory efficient, even when reading large chunks from a large file (you will get each chunk in multiple parts). Unfortunately, because the buffer gets reused, you have to use a while loop; Iterator cannot be implemented.

Example:

use cdchunking::{ChunkInput, Chunker, ZPAQ};

// Setup (any ChunkerImpl works; ZPAQ is shipped with the crate):
let chunker = Chunker::new(ZPAQ::new(13));
let reader: &[u8] = b"some example bytes to split into chunks";

let mut chunk_iterator = chunker.stream(reader);
while let Some(chunk) = chunk_iterator.read() {
    let chunk = chunk.unwrap();
    match chunk {
        // Data is handed out from the reused internal buffer,
        // possibly in several parts per chunk.
        ChunkInput::Data(d) => {
            print!("{:?}, ", d);
        }
        // End marks the chunk boundary.
        ChunkInput::End => println!(" end of chunk"),
    }
}

Describes the chunks (doesn't return the data).

This iterator gives you the offset and size of the chunks, but not the data in them. If you want to iterate on the data in the chunks in an easy way, use the whole_chunks() method.
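
A sketch, assuming this describes the chunks() method; the start and length names on the yielded descriptor are illustrative assumptions:

use cdchunking::{Chunker, ZPAQ};

let chunker = Chunker::new(ZPAQ::new(13));
let data: &[u8] = b"some example bytes to split into chunks";
for info in chunker.chunks(data) {
    let info = info.expect("Error reading");
    // Only the position is yielded, not the chunk data itself.
    // `start`/`length` are assumed field names.
    println!("Chunk at offset {}, {} bytes long", info.start, info.length);
}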

Iterate on chunks in an in-memory buffer as slices.

If your data is already in memory, you can use this method instead of whole_chunks() to get slices referencing the buffer rather than copying it to new vectors.
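
A sketch, assuming this describes slices(); each yielded item borrows from the input buffer, so nothing is copied:

use cdchunking::{Chunker, ZPAQ};

let chunker = Chunker::new(ZPAQ::new(13));
let data: &[u8] = b"some example bytes to split into chunks";
for slice in chunker.slices(data) {
    // `slice` is a &[u8] pointing into `data`.
    println!("Got slice of {} bytes: {:?}", slice.len(), slice);
}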

Returns a new Chunker object that will not go over a size limit.

Note that the inner chunking method IS reset when a chunk boundary is emitted because of the size limit. That means that using a size limit will not only add new boundaries inside blocks that are too big; because the inner method restarts at the forced cut, the boundary it would otherwise have emitted after such a block may no longer appear.
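
A sketch, assuming this describes max_size() and that the limit is given in bytes:

use cdchunking::{Chunker, ZPAQ};

// Average chunk size of 8 KiB, but never emit a chunk above 16 KiB.
let chunker = Chunker::new(ZPAQ::new(13)).max_size(16 * 1024);
let data: &[u8] = b"some example bytes to split into chunks";
for chunk in chunker.whole_chunks(data) {
    let chunk = chunk.expect("Error reading");
    assert!(chunk.len() <= 16 * 1024);
}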