pub struct TokenAwareSplitter { /* private fields */ }
Splits text so that each chunk's token count, as measured by the supplied
Tokenizer, stays under max_tokens. Structural cuts are delegated to a
recursive character splitter; this type only adds a token-aware re-pack step on top.
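The re-pack step described above can be sketched as a greedy merge over the pieces produced by the structural splitter. This is a minimal illustration, not the crate's implementation: the `Tokenizer` trait and its `count_tokens` method, the `WhitespaceTokenizer` type, and the `repack` function are all hypothetical names introduced here, and the sketch assumes a chunk of exactly `max_tokens` tokens is acceptable.

```rust
/// Hypothetical stand-in for the crate's `Tokenizer` trait; the real trait's
/// methods are not shown on this page.
trait Tokenizer {
    fn count_tokens(&self, text: &str) -> usize;
}

/// Toy tokenizer for illustration: one token per whitespace-separated word.
struct WhitespaceTokenizer;

impl Tokenizer for WhitespaceTokenizer {
    fn count_tokens(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Greedily merge structural pieces into chunks of at most `max_tokens` tokens.
/// A single piece that is already over the cap is passed through unchanged;
/// a real splitter would need to cut it further.
fn repack(pieces: &[&str], tokenizer: &dyn Tokenizer, max_tokens: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for piece in pieces {
        // Tentatively append the next piece to the chunk being built.
        let candidate = if current.is_empty() {
            piece.to_string()
        } else {
            format!("{current} {piece}")
        };
        if current.is_empty() || tokenizer.count_tokens(&candidate) <= max_tokens {
            current = candidate;
        } else {
            // The candidate would exceed the cap: emit the chunk and start a new one.
            chunks.push(std::mem::take(&mut current));
            current = piece.to_string();
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Under these assumptions, repacking the pieces `"a b"`, `"c d"`, `"e f g"`, `"h"` with a cap of 4 tokens yields two chunks of four words each.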
Implementations§
impl TokenAwareSplitter
pub fn new(tokenizer: Arc<dyn Tokenizer>, max_tokens: usize) -> Self
Builds a splitter from a tokenizer and a maximum token count per chunk.
pub fn with_overlap_tokens(self, n: usize) -> Self
Sets the token overlap between adjacent chunks. The value is clamped to
max_tokens - 1 so that the second-pass walker always makes progress: with
a larger overlap, each chunk would feed itself back in as overlap, and the
splitter would either loop or emit chunks larger than max_tokens.
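The reason for the clamp can be seen in a small sliding-window sketch: the walker advances by `max_tokens - overlap` tokens per chunk, so the overlap must be strictly smaller than `max_tokens` for that step to be at least 1. The functions `clamp_overlap` and `windows_with_overlap` below are hypothetical names for illustration, and the handling of `max_tokens == 0` is an assumption rather than documented behavior.

```rust
/// Clamp the requested overlap to `max_tokens - 1`.
/// `saturating_sub` avoids underflow when `max_tokens` is 0 (assumed edge-case
/// handling, not taken from the source).
fn clamp_overlap(requested: usize, max_tokens: usize) -> usize {
    requested.min(max_tokens.saturating_sub(1))
}

/// Walk a token slice in windows of at most `max_tokens`, repeating the last
/// `overlap` tokens of each window at the start of the next one.
fn windows_with_overlap<T: Clone>(
    tokens: &[T],
    max_tokens: usize,
    overlap: usize,
) -> Vec<Vec<T>> {
    let overlap = clamp_overlap(overlap, max_tokens);
    // After clamping, the step is at least 1 whenever max_tokens >= 1.
    let step = max_tokens - overlap;
    let mut out = Vec::new();
    if step == 0 {
        // Only reachable when max_tokens == 0: no chunk can be emitted.
        return out;
    }
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + max_tokens).min(tokens.len());
        out.push(tokens[start..end].to_vec());
        if end == tokens.len() {
            break;
        }
        start += step;
    }
    out
}
```

In this sketch, a requested overlap of 10 with a cap of 3 is reduced to 2, so the walker still advances one token per chunk and terminates.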
Trait Implementations§
Auto Trait Implementations§
impl Freeze for TokenAwareSplitter
impl !RefUnwindSafe for TokenAwareSplitter
impl Send for TokenAwareSplitter
impl Sync for TokenAwareSplitter
impl Unpin for TokenAwareSplitter
impl UnsafeUnpin for TokenAwareSplitter
impl !UnwindSafe for TokenAwareSplitter
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.