Struct OwnedChunker 

pub struct OwnedChunker { /* private fields */ }

Owned chunker for FFI bindings (Python, WASM).

Unlike Chunker, this owns its data and returns owned chunks. Use this when you need to cross FFI boundaries where lifetimes can’t be tracked.

Example

use chunk::OwnedChunker;

let text = b"Hello world. How are you?".to_vec();
let mut chunker = OwnedChunker::new(text)
    .size(15)
    .delimiters(b"\n.?".to_vec());

while let Some(chunk) = chunker.next_chunk() {
    println!("{:?}", chunk);
}
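
To make the ownership distinction concrete, here is a minimal sketch; it is not the crate's implementation. `BorrowedCursor` and `OwnedCursor` are hypothetical types invented for this illustration, and they cut at fixed sizes only, ignoring delimiters. The borrowed variant hands out slices tied to the input's lifetime, while the owned variant copies each chunk into a fresh `Vec<u8>` that a binding layer can keep for as long as it likes.

```rust
/// Hypothetical borrowed cursor: every chunk borrows from `text`,
/// so the lifetime `'a` ties the chunks to the input buffer.
struct BorrowedCursor<'a> {
    text: &'a [u8],
    pos: usize,
}

impl<'a> BorrowedCursor<'a> {
    fn next_fixed(&mut self, size: usize) -> Option<&'a [u8]> {
        if size == 0 || self.pos >= self.text.len() {
            return None;
        }
        let end = (self.pos + size).min(self.text.len());
        let chunk = &self.text[self.pos..end];
        self.pos = end;
        Some(chunk)
    }
}

/// Hypothetical owned cursor: it owns the buffer and copies each chunk,
/// so no lifetime parameter appears in its type.
struct OwnedCursor {
    text: Vec<u8>,
    pos: usize,
}

impl OwnedCursor {
    fn next_fixed(&mut self, size: usize) -> Option<Vec<u8>> {
        if size == 0 || self.pos >= self.text.len() {
            return None;
        }
        let end = (self.pos + size).min(self.text.len());
        let chunk = self.text[self.pos..end].to_vec();
        self.pos = end;
        Some(chunk)
    }
}
```

The copy in `to_vec()` is the price of dropping the lifetime parameter: the caller receives data that stays valid even after the cursor is gone.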

Implementations

impl OwnedChunker

pub fn new(text: Vec<u8>) -> Self

Create a new owned chunker with the given text.

pub fn size(self, size: usize) -> Self

Set the target chunk size in bytes.

pub fn delimiters(self, delimiters: Vec<u8>) -> Self

Set single-byte delimiters to split on.

Mutually exclusive with pattern() - last one set wins.

pub fn pattern(self, pattern: Vec<u8>) -> Self

Set a multi-byte pattern to split on.

Use this for multi-byte delimiters like UTF-8 characters (e.g., metaspace ▁). Mutually exclusive with delimiters() - last one set wins.
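
The difference between single-byte delimiters and a multi-byte pattern can be sketched as two backward searches. This page does not describe how the crate searches internally, so `rfind_delimiter` and `rfind_pattern` below are hypothetical helpers written only to illustrate the idea.

```rust
/// Hypothetical helper: index of the last byte in `window` that is
/// one of the single-byte `delimiters`.
fn rfind_delimiter(window: &[u8], delimiters: &[u8]) -> Option<usize> {
    window.iter().rposition(|b| delimiters.contains(b))
}

/// Hypothetical helper: start index of the last occurrence of the
/// multi-byte `pattern` in `window`.
fn rfind_pattern(window: &[u8], pattern: &[u8]) -> Option<usize> {
    // `windows(0)` would panic, so reject an empty pattern up front.
    if pattern.is_empty() || pattern.len() > window.len() {
        return None;
    }
    window.windows(pattern.len()).rposition(|w| w == pattern)
}
```

The metaspace character ▁ (U+2581) encodes to three bytes in UTF-8, so it has to be matched as a whole sequence; a set of single-byte delimiters can only match individual bytes.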

pub fn prefix(self) -> Self

Put delimiter at the start of the next chunk (prefix mode).

pub fn suffix(self) -> Self

Put delimiter at the end of the current chunk (suffix mode, default).
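
The two modes differ only in which side of the delimiter the cut falls on. A minimal sketch follows, using hypothetical helpers `split_point` and `split_at_delimiter` that are not part of the crate:

```rust
/// Hypothetical helper: byte offset of the cut for a delimiter of
/// `delim_len` bytes found at `delim_pos`.
fn split_point(delim_pos: usize, delim_len: usize, prefix: bool) -> usize {
    if prefix {
        // Prefix mode: cut before the delimiter, so it starts the next chunk.
        delim_pos
    } else {
        // Suffix mode: cut after the delimiter, so it ends the current chunk.
        delim_pos + delim_len
    }
}

/// Hypothetical helper: split `text` around the delimiter at `delim_pos`.
fn split_at_delimiter(
    text: &[u8],
    delim_pos: usize,
    delim_len: usize,
    prefix: bool,
) -> (&[u8], &[u8]) {
    text.split_at(split_point(delim_pos, delim_len, prefix))
}
```

For the input `Hello world. How` with the `.` at byte 11, suffix mode yields `Hello world.` followed by ` How`, while prefix mode yields `Hello world` followed by `. How`.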

pub fn consecutive(self) -> Self

Enable consecutive delimiter/pattern handling.

When splitting, ensures we split at the START of a consecutive run of the same delimiter/pattern, not in the middle. Works with both .pattern() and .delimiters().
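
Finding the start of a consecutive run can be sketched as a backward walk from the matched delimiter. `run_start` is a hypothetical helper for the single-byte case, not the crate's code; a multi-byte pattern would step back by the pattern length instead of by one byte.

```rust
/// Hypothetical helper: given the index `pos` of a delimiter byte in `text`,
/// return the index where the run of that same byte begins.
/// Assumes `pos < text.len()`.
fn run_start(text: &[u8], pos: usize) -> usize {
    let byte = text[pos];
    let mut start = pos;
    // Walk backward while the preceding byte is the same delimiter.
    while start > 0 && text[start - 1] == byte {
        start -= 1;
    }
    start
}
```

With the input `a\n\n\nb` and a match on the last newline (index 3), the run starts at index 1, so the three newlines stay together instead of being divided between two chunks.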

pub fn forward_fallback(self) -> Self

Enable forward fallback search.

When no delimiter/pattern is found in the backward search window, search forward from target_end instead of doing a hard split. Works with both .pattern() and .delimiters().
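
The decision between a hard split and a forward search can be sketched as follows. `chunk_end` is a hypothetical function: it assumes suffix mode, single-byte delimiters, a backward window that spans the whole target range, `size > 0`, and `start <= text.len()`. The crate's actual window bounds are not stated on this page.

```rust
/// Hypothetical helper: exclusive end offset of the chunk starting at `start`.
fn chunk_end(
    text: &[u8],
    start: usize,
    size: usize,
    delimiters: &[u8],
    forward_fallback: bool,
) -> usize {
    let target_end = (start + size).min(text.len());
    // The remaining text fits in one chunk.
    if target_end == text.len() {
        return text.len();
    }
    // Backward search: last delimiter inside the target range.
    if let Some(i) = text[start..target_end]
        .iter()
        .rposition(|b| delimiters.contains(b))
    {
        return start + i + 1;
    }
    if forward_fallback {
        // Forward search from `target_end`; take the rest if nothing is found.
        return match text[target_end..]
            .iter()
            .position(|b| delimiters.contains(b))
        {
            Some(j) => target_end + j + 1,
            None => text.len(),
        };
    }
    // Hard split at the target size.
    target_end
}
```

The trade-off is that forward fallback can produce chunks larger than the target size, in exchange for never cutting in the middle of a delimiter-free span.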

pub fn next_chunk(&mut self) -> Option<Vec<u8>>

Get the next chunk, or None if exhausted.

pub fn reset(&mut self)

Reset the chunker to start from the beginning.

pub fn text(&self) -> &[u8]

Get a reference to the underlying text.

pub fn collect_offsets(&mut self) -> Vec<(usize, usize)>

Collect all chunk offsets as (start, end) pairs. For FFI this is more efficient than calling next_chunk() repeatedly, because all offsets are returned in a single call.
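
Once the offsets are available, the caller can slice the original buffer without copying. The sketch below assumes that `end` is exclusive and that every pair lies within the text; neither is stated on this page. `slices_from_offsets` is a hypothetical helper, and the offsets in the usage note are hand-written for illustration rather than produced by the crate.

```rust
/// Hypothetical helper: turn `(start, end)` pairs into borrowed slices of `text`.
/// Assumes `end` is exclusive and `start <= end <= text.len()`; panics otherwise.
fn slices_from_offsets<'a>(text: &'a [u8], offsets: &[(usize, usize)]) -> Vec<&'a [u8]> {
    offsets
        .iter()
        .map(|&(start, end)| &text[start..end])
        .collect()
}
```

For example, applying the hand-written offsets `[(0, 12), (12, 25)]` to `Hello world. How are you?` gives the slices `Hello world.` and ` How are you?`.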

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.