pub struct TokenCountSplitter<C: TokenCounter + ?Sized + 'static = dyn TokenCounter> { /* private fields */ }
Recursive token-budget splitter.
Construct via Self::new with any Arc<C> where
C: TokenCounter + ?Sized + 'static. Both concrete
(Arc<TiktokenCounter>) and type-erased
(Arc<dyn TokenCounter>) inputs are accepted: the splitter is
monomorphised per concrete counter, so hot-path calls can be
inlined, and falls back to dynamic dispatch when the operator
passes an erased Arc. Chain Self::with_chunk_size /
Self::with_chunk_overlap / Self::with_separators for
tuning. Cloning is cheap: the counter sits behind an Arc
and the separator list is held in an Arc<[String]>. The Clone
impl listed below is bounded on C: Clone, however, so it is
available for concrete counters but not for the type-erased
TokenCountSplitter<dyn TokenCounter> (trait objects are unsized
and cannot implement Clone).
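To make the generic shape above concrete, here is a minimal, self-contained sketch of the same pattern. It does not use the real crate: TokenCounter, WordCounter and Splitter below are hypothetical stand-ins (the count_tokens method, the separators getter and the bounds on with_separators are assumptions), and the builders are plain fns rather than const fns. It shows one struct accepting both Arc<WordCounter> and Arc<dyn TokenCounter>, builder chaining with the documented 512/64 defaults, and a clone that only bumps Arc reference counts.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the crate's `TokenCounter` trait; the real
/// trait's method names and signatures are not shown on this page.
pub trait TokenCounter: Send + Sync {
    fn count_tokens(&self, text: &str) -> usize;
    fn encoding_name(&self) -> &str;
}

/// Toy counter (assumption): one token per whitespace-separated word.
pub struct WordCounter;

impl TokenCounter for WordCounter {
    fn count_tokens(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
    fn encoding_name(&self) -> &str {
        "whitespace-words"
    }
}

/// Same generic shape as the documented type: `C` may be a concrete
/// counter (static dispatch) or, by default, `dyn TokenCounter` (vtable).
pub struct Splitter<C: TokenCounter + ?Sized + 'static = dyn TokenCounter> {
    counter: Arc<C>,
    chunk_size: usize,
    chunk_overlap: usize,
    separators: Arc<[String]>,
}

impl<C: TokenCounter + ?Sized + 'static> Splitter<C> {
    /// Defaults mirror the docs: 512-token chunks, 64-token overlap.
    pub fn new(counter: Arc<C>) -> Self {
        Self {
            counter,
            chunk_size: 512,
            chunk_overlap: 64,
            separators: ["\n\n", "\n", " ", ""].iter().map(|s| s.to_string()).collect(),
        }
    }

    pub fn with_chunk_size(mut self, chunk_size: usize) -> Self {
        self.chunk_size = chunk_size;
        self
    }

    pub fn with_chunk_overlap(mut self, chunk_overlap: usize) -> Self {
        self.chunk_overlap = chunk_overlap;
        self
    }

    /// The `IntoIterator` / `Into<String>` bounds are assumptions.
    pub fn with_separators<I, S>(mut self, separators: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        self.separators = separators.into_iter().map(|s| -> String { s.into() }).collect();
        self
    }

    pub fn chunk_size(&self) -> usize { self.chunk_size }
    pub fn chunk_overlap(&self) -> usize { self.chunk_overlap }
    pub fn counter(&self) -> &Arc<C> { &self.counter }
    /// Hypothetical getter, added only so the sketch can be inspected.
    pub fn separators(&self) -> &[String] { &self.separators }
}

// Hand-written Clone: only the two Arcs are cloned (refcount bumps), and
// no `C: Clone` bound is needed, so the erased form is cloneable as well.
impl<C: TokenCounter + ?Sized + 'static> Clone for Splitter<C> {
    fn clone(&self) -> Self {
        Self {
            counter: Arc::clone(&self.counter),
            chunk_size: self.chunk_size,
            chunk_overlap: self.chunk_overlap,
            separators: Arc::clone(&self.separators),
        }
    }
}
```

The hand-written Clone is a choice of this sketch: because it clones only the Arcs, it needs no C: Clone bound and therefore also works for Splitter<dyn TokenCounter>.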
Implementations
impl<C: TokenCounter + ?Sized + 'static> TokenCountSplitter<C>
pub fn new(counter: Arc<C>) -> Self
Build with the supplied TokenCounter and the defaults:
512-token chunks with a 64-token overlap.
pub const fn with_chunk_size(self, chunk_size: usize) -> Self
Override the target chunk size in tokens.
pub const fn with_chunk_overlap(self, chunk_overlap: usize) -> Self
Override the overlap (in tokens) between consecutive chunks.
Values at or above the chunk size are silently clamped to
chunk_size - 1 at split time, so every step still advances by
at least one token and the recursion terminates.
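The clamp can be illustrated with a plain sliding window over pre-tokenised input. This is a simplified sketch, not the crate's algorithm (which presumably applies overlap while merging separator-delimited pieces); effective_overlap and windows are hypothetical helpers. Because the effective overlap is at most chunk_size - 1, the step chunk_size - overlap is always at least one token, so the loop cannot stall.

```rust
/// Clamp rule from the docs: an overlap >= chunk_size becomes chunk_size - 1.
pub fn effective_overlap(chunk_size: usize, chunk_overlap: usize) -> usize {
    chunk_overlap.min(chunk_size.saturating_sub(1))
}

/// Hypothetical helper: overlapping windows of at most `chunk_size` items.
pub fn windows<T>(tokens: &[T], chunk_size: usize, chunk_overlap: usize) -> Vec<&[T]> {
    assert!(chunk_size > 0, "chunk_size must be at least 1");
    // The clamp guarantees step >= 1, so `start` strictly increases.
    let step = chunk_size - effective_overlap(chunk_size, chunk_overlap);
    let mut out = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_size).min(tokens.len());
        out.push(&tokens[start..end]);
        if end == tokens.len() {
            break; // the last window reached the end of the input
        }
        start += step;
    }
    out
}
```

With chunk_size = 3 and a requested overlap of 10, the overlap is clamped to 2 and the window advances one token at a time instead of looping forever.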
pub fn with_separators<I, S>(self, separators: I) -> Self
Override the separator priority list. Defaults to
["\n\n", "\n", " ", ""]: paragraph → line → word → unit
fallback. Pipelines that split source code or LaTeX can
supply their own priority lists.
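The recursive cascade over the separator list can be sketched as follows. This is a simplified illustration of the common recursive-character-splitter technique, not this crate's code: split_recursive is hypothetical, the token counter is passed as a closure, and overlap is omitted. The text is cut on the highest-priority separator that occurs in it; adjacent pieces are greedily re-joined while they fit the budget, and any piece that is still too large is split again with the remaining, finer separators. The empty separator falls back to single characters.

```rust
/// Split `text` into pieces of at most `budget` tokens (as measured by
/// `count`), trying `separators` in priority order. Hypothetical sketch.
pub fn split_recursive(
    text: &str,
    separators: &[&str],
    budget: usize,
    count: &dyn Fn(&str) -> usize,
) -> Vec<String> {
    if count(text) <= budget {
        return if text.is_empty() { Vec::new() } else { vec![text.to_string()] };
    }
    // Highest-priority separator present in the text; "" always matches.
    let idx = match separators.iter().position(|s| s.is_empty() || text.contains(*s)) {
        Some(i) => i,
        None => return vec![text.to_string()], // nothing left to split on
    };
    let sep = separators[idx];
    let finer = &separators[idx + 1..];
    let pieces: Vec<String> = if sep.is_empty() {
        text.chars().map(|c| c.to_string()).collect()
    } else {
        text.split(sep).map(String::from).collect()
    };

    let mut chunks = Vec::new();
    let mut current = String::new();
    for piece in pieces.into_iter().filter(|p| !p.is_empty()) {
        // Try to grow the current chunk by one more piece.
        let candidate = if current.is_empty() {
            piece.clone()
        } else {
            format!("{}{}{}", current, sep, piece)
        };
        if count(&candidate) <= budget {
            current = candidate;
            continue;
        }
        // Does not fit: flush what we have, then place this piece.
        if !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
        }
        if count(&piece) <= budget {
            current = piece;
        } else {
            // Still too large on its own: recurse with finer separators.
            chunks.extend(split_recursive(&piece, finer, budget, count));
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

In this sketch a separator that falls on a chunk boundary is dropped, while separators inside a chunk are restored by the re-join.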
pub const fn chunk_size(&self) -> usize
Effective chunk size in tokens.
pub const fn chunk_overlap(&self) -> usize
Effective chunk overlap in tokens.
pub const fn counter(&self) -> &Arc<C>
Borrow the wired token counter, e.g. to read
TokenCounter::encoding_name for OTel attribute emission
and operator diagnostics.
Trait Implementations
impl<C: Clone + TokenCounter + ?Sized + 'static> Clone for TokenCountSplitter<C>
fn clone(&self) -> TokenCountSplitter<C>
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl<C: TokenCounter + ?Sized + 'static> Debug for TokenCountSplitter<C>
impl<C: TokenCounter + ?Sized + 'static> TextSplitter for TokenCountSplitter<C>
fn name(&self) -> &'static str
Name recorded in the Lineage::splitter
field, e.g. "recursive-character", "markdown-structure",
"token-count".
fn split(&self, document: &Document) -> Vec<Document>
Split document and return the resulting chunks. Returning
a single-element vec equal to the input is a valid no-op
(e.g. when the document already fits the target size); an
empty vec is also valid (e.g. when the content is whitespace-only).
Auto Trait Implementations
impl<C> Freeze for TokenCountSplitter<C> where
C: ?Sized,
impl<C> RefUnwindSafe for TokenCountSplitter<C> where
C: RefUnwindSafe + ?Sized,
impl<C> Send for TokenCountSplitter<C> where
C: ?Sized,
impl<C> Sync for TokenCountSplitter<C> where
C: ?Sized,
impl<C> Unpin for TokenCountSplitter<C> where
C: ?Sized,
impl<C> UnsafeUnpin for TokenCountSplitter<C> where
C: ?Sized,
impl<C> UnwindSafe for TokenCountSplitter<C> where
C: RefUnwindSafe + ?Sized,
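The contract documented for TextSplitter::split (a one-element vec equal to the input is a valid no-op, and an empty vec is valid for whitespace-only content) can be illustrated with a toy implementor. Document, TextSplitter and WordBudgetSplitter below are hypothetical stand-ins: the real Document type and the full trait definition are not shown on this page, and the budget here is counted in whitespace-separated words rather than tokens.

```rust
/// Hypothetical minimal document; the real type likely carries metadata
/// and lineage as well.
#[derive(Clone, Debug, PartialEq)]
pub struct Document {
    pub content: String,
}

/// Mirrors the two trait methods listed on this page (assumed shape).
pub trait TextSplitter {
    fn name(&self) -> &'static str;
    fn split(&self, document: &Document) -> Vec<Document>;
}

/// Toy splitter: the budget is a maximum number of words per chunk.
pub struct WordBudgetSplitter {
    pub max_words: usize,
}

impl TextSplitter for WordBudgetSplitter {
    fn name(&self) -> &'static str {
        "word-budget" // hypothetical identifier for Lineage::splitter
    }

    fn split(&self, document: &Document) -> Vec<Document> {
        let words: Vec<&str> = document.content.split_whitespace().collect();
        if words.is_empty() {
            return Vec::new(); // whitespace-only: an empty vec is valid
        }
        if words.len() <= self.max_words {
            return vec![document.clone()]; // already fits: valid no-op
        }
        words
            .chunks(self.max_words.max(1))
            .map(|w| Document { content: w.join(" ") })
            .collect()
    }
}
```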
Blanket Implementations
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where
T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.