pub struct OwnedChunker { /* private fields */ }
Owned chunker for FFI bindings (Python, WASM).
Unlike Chunker, this owns its data and returns owned chunks.
Use this when you need to cross FFI boundaries where lifetimes can’t be tracked.
Example
use chunk::OwnedChunker;

let text = b"Hello world. How are you?".to_vec();
let mut chunker = OwnedChunker::new(text)
    .size(15)
    .delimiters(b"\n.?".to_vec());

while let Some(chunk) = chunker.next_chunk() {
    println!("{:?}", chunk);
}

Implementations
impl OwnedChunker
pub fn delimiters(self, delimiters: Vec<u8>) -> Self
Set single-byte delimiters to split on.
Mutually exclusive with pattern(): whichever is set last wins.
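The following is a minimal stand-alone sketch of how a split point might be chosen with single-byte delimiters in suffix mode. It is a simplified model written for illustration, not the crate's implementation: `split_point` and its parameters are hypothetical names, and the backward search over a size-limited window is an assumption inferred from the descriptions on this page.

```rust
/// Hypothetical helper (not part of the crate): returns the exclusive end
/// index of the chunk that starts at `start`.
fn split_point(data: &[u8], start: usize, size: usize, delimiters: &[u8]) -> usize {
    // The chunk may not extend past this index.
    let target_end = (start + size).min(data.len());
    if target_end == data.len() {
        return target_end; // the remainder fits in one chunk
    }
    // Search backward through the window for the last delimiter byte.
    match data[start..target_end]
        .iter()
        .rposition(|b| delimiters.contains(b))
    {
        // Suffix mode: the delimiter stays at the end of the current chunk.
        Some(i) => start + i + 1,
        // No delimiter in the window: hard split at the size limit (assumed).
        None => target_end,
    }
}
```

With the text from the example above, a size of 15, and the delimiters `\n`, `.`, and `?`, this model ends the first chunk just after the first period, and the remainder fits in a second chunk.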
pub fn pattern(self, pattern: Vec<u8>) -> Self
Set a multi-byte pattern to split on.
Use this for multi-byte delimiters like UTF-8 characters (e.g., metaspace ▁).
Mutually exclusive with delimiters(): whichever is set last wins.
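To see why a multi-byte pattern is needed for a character like the metaspace, note that U+2581 occupies three bytes in UTF-8, so no single-byte delimiter can match it. The sketch below uses only the standard library; `rfind_pattern` is a hypothetical helper for illustration and says nothing about how the crate searches internally.

```rust
/// Hypothetical helper (not part of the crate): byte index of the last
/// occurrence of `pattern` inside `window`, if any.
fn rfind_pattern(window: &[u8], pattern: &[u8]) -> Option<usize> {
    // `windows(0)` would panic, so reject an empty pattern up front.
    if pattern.is_empty() || pattern.len() > window.len() {
        return None;
    }
    // Compare every pattern-sized window, scanning from the back.
    window
        .windows(pattern.len())
        .rposition(|candidate| candidate == pattern)
}

/// The metaspace character U+2581 encoded as UTF-8 bytes.
fn metaspace() -> Vec<u8> {
    "\u{2581}".as_bytes().to_vec()
}
```

A byte vector like the one returned by `metaspace()` is the kind of value that would be passed to `pattern()`.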
pub fn suffix(self) -> Self
Put the delimiter at the end of the current chunk (suffix mode, the default).
pub fn consecutive(self) -> Self
Enable consecutive delimiter/pattern handling.
When splitting, ensures the split lands at the start of a consecutive run
of the same delimiter or pattern, not in the middle of it.
Works with both .pattern() and .delimiters().
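As a rough illustration of locating the start of a consecutive run, the sketch below walks backward from a delimiter that has already been found until the preceding byte differs. `run_start` is a hypothetical helper, and it covers only the single-byte case; a multi-byte pattern would presumably step back by the pattern length instead.

```rust
/// Hypothetical helper (not part of the crate): given the index `found` of a
/// delimiter byte, return the index where its consecutive run begins.
fn run_start(data: &[u8], found: usize) -> usize {
    let delimiter = data[found];
    let mut index = found;
    // Step back while the previous byte is the same delimiter.
    while index > 0 && data[index - 1] == delimiter {
        index -= 1;
    }
    index
}
```

For input such as `one\n\n\ntwo`, any of the three newlines maps back to the first one, so a split never lands between them.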
pub fn forward_fallback(self) -> Self
Enable forward fallback search.
When no delimiter/pattern is found in the backward search window,
search forward from target_end instead of doing a hard split.
Works with both .pattern() and .delimiters().
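The fallback described above can be sketched as a two-stage search: first backward inside the window, then forward from target_end. This is a simplified model under stated assumptions, not the crate's code. `split_point_with_fallback` is a hypothetical name, and the behavior when no delimiter exists anywhere (a hard split at target_end) is an assumption chosen for the sketch.

```rust
/// Hypothetical helper (not part of the crate): exclusive end index of the
/// chunk starting at `start`, with a forward search as fallback.
fn split_point_with_fallback(
    data: &[u8],
    start: usize,
    size: usize,
    delimiters: &[u8],
) -> usize {
    let target_end = (start + size).min(data.len());
    if target_end == data.len() {
        return target_end; // the remainder fits in one chunk
    }
    let is_delimiter = |b: &u8| delimiters.contains(b);
    // Stage 1: search backward inside the window.
    if let Some(i) = data[start..target_end].iter().rposition(is_delimiter) {
        return start + i + 1;
    }
    // Stage 2: nothing in the window, so search forward from target_end.
    match data[target_end..].iter().position(is_delimiter) {
        Some(j) => target_end + j + 1, // chunk grows past the size limit
        None => target_end,            // assumed: hard split if nothing is found
    }
}
```

The trade-off is that a chunk may exceed the requested size, in exchange for never cutting in the middle of a long run of delimiter-free bytes.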
pub fn next_chunk(&mut self) -> Option<Vec<u8>>
Get the next chunk, or None if exhausted.
pub fn collect_offsets(&mut self) -> Vec<(usize, usize)>
Collect all chunk offsets as (start, end) pairs.
This is more efficient for FFI than calling next_chunk() repeatedly, because all offsets are returned in a single call.
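One way a caller might use such offsets is to slice a buffer it already holds, avoiding a copy per chunk. The sketch below is stand-alone and assumes the end offset is exclusive, which this page does not state. `slices_from_offsets` is a hypothetical helper.

```rust
/// Hypothetical helper (not part of the crate): borrow one slice per
/// (start, end) pair, treating `end` as exclusive (an assumption).
fn slices_from_offsets<'a>(data: &'a [u8], offsets: &[(usize, usize)]) -> Vec<&'a [u8]> {
    offsets
        .iter()
        .map(|&(start, end)| &data[start..end])
        .collect()
}
```

On the binding side, the same pairs could be applied to the host language's own copy of the text, so only the offsets need to cross the FFI boundary.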