pub struct RecursiveCharacterSplitter { /* private fields */ }
Recursive character-budget splitter.
Construct via Self::new for the default shape (1000-character chunks
with a 100-character overlap), or chain Self::with_chunk_size,
Self::with_chunk_overlap, and Self::with_separators to tune it.
Cloning is cheap: the separator list is held behind an Arc, so
multiple pipelines can share one configured splitter.
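The builder shape and the cheap-clone claim can be sketched with a hypothetical stand-in type. `SplitterSketch` and its field layout are assumptions (the real struct's fields are private), as is the default separator list; only the 1000/100 defaults and the Arc-held separators come from the description above.

```rust
use std::sync::Arc;

// Hypothetical stand-in mirroring the documented builder shape; the real
// struct's private fields are not shown, so this layout is an assumption.
#[derive(Clone, Debug)]
pub struct SplitterSketch {
    chunk_size: usize,
    chunk_overlap: usize,
    // Shared, immutable separator list: clone() copies the Arc pointer,
    // not the strings themselves.
    separators: Arc<[String]>,
}

impl SplitterSketch {
    pub fn new() -> Self {
        Self {
            chunk_size: 1000,   // documented default
            chunk_overlap: 100, // documented default
            // Assumed priority list; the real default is not listed in these docs.
            separators: Arc::from(vec![
                "\n\n".to_string(),
                "\n".to_string(),
                " ".to_string(),
            ]),
        }
    }

    // Consuming builders: each returns the updated value for chaining.
    pub fn with_chunk_size(mut self, chunk_size: usize) -> Self {
        self.chunk_size = chunk_size;
        self
    }

    pub fn with_chunk_overlap(mut self, chunk_overlap: usize) -> Self {
        self.chunk_overlap = chunk_overlap;
        self
    }
}

impl Default for SplitterSketch {
    fn default() -> Self {
        Self::new()
    }
}

fn main() {
    let base = SplitterSketch::new().with_chunk_size(500).with_chunk_overlap(50);
    let shared = base.clone();
    // Both values point at the same separator allocation.
    println!("{}", Arc::ptr_eq(&base.separators, &shared.separators));
}
```

Holding the list as `Arc<[String]>` means a clone costs one reference-count increment regardless of how many separators are configured.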
Implementations
impl RecursiveCharacterSplitter
pub fn new() -> Self
Build with the default chunk size, overlap, and separator priority. This covers the common case of English and other Latin-script corpora.
pub const fn with_chunk_size(self, chunk_size: usize) -> Self
Override the target chunk size in characters. Pieces larger than this are split again recursively; smaller pieces are accumulated greedily into chunks.
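The greedy accumulation step can be illustrated with a small packing function. This is a sketch, not the library's code: `merge_greedy` is a hypothetical helper, and joining neighbours with the separator and passing oversized pieces through unchanged are assumptions. Lengths are counted in `char`s to match the character-based budget.

```rust
/// Greedily packs `pieces` into chunks of at most `chunk_size` characters,
/// joining neighbours with `sep`. A piece longer than `chunk_size` is emitted
/// on its own here; a recursive splitter would split it further first.
fn merge_greedy(pieces: &[&str], sep: &str, chunk_size: usize) -> Vec<String> {
    let sep_len = sep.chars().count();
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_len = 0; // length of `current` in chars

    for piece in pieces {
        let piece_len = piece.chars().count();
        // Characters needed to append this piece to the open chunk.
        let added = if current.is_empty() { piece_len } else { sep_len + piece_len };
        // Flush the open chunk if this piece would push it over budget.
        if !current.is_empty() && current_len + added > chunk_size {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }
        if !current.is_empty() {
            current.push_str(sep);
            current_len += sep_len;
        }
        current.push_str(piece);
        current_len += piece_len;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let chunks = merge_greedy(&["aaa", "bb", "cccc", "d"], " ", 6);
    println!("{chunks:?}"); // ["aaa bb", "cccc d"]
}
```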
pub const fn with_chunk_overlap(self, chunk_overlap: usize) -> Self
Override the overlap (in characters) between consecutive
chunks. It should be strictly less than Self::chunk_size:
an overlap equal to or greater than the chunk size would loop
indefinitely, so values at or above the chunk size are silently
clamped to chunk_size - 1 at split time.
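The clamp rule and the reason for it can be shown with plain arithmetic. `effective_overlap` implements the documented `chunk_size - 1` clamp; `sliding_windows` is a hypothetical, simplified fixed-window splitter (not the recursive algorithm) used only to show that the window step is `chunk_size - overlap`, which must stay at least 1 for the loop to make progress. Treating a zero chunk size as 1 is an assumption of this sketch.

```rust
/// Documented clamp: an overlap at or above `chunk_size` becomes
/// `chunk_size - 1`. `saturating_sub` keeps `chunk_size == 0` from underflowing.
fn effective_overlap(chunk_size: usize, chunk_overlap: usize) -> usize {
    chunk_overlap.min(chunk_size.saturating_sub(1))
}

/// Simplified fixed-size windows (not the recursive algorithm): each window
/// starts `chunk_size - overlap` chars after the previous one. Without the
/// clamp, overlap == chunk_size gives a step of 0 and the loop never advances.
fn sliding_windows(text: &str, chunk_size: usize, chunk_overlap: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let size = chunk_size.max(1); // assumption: treat 0 as 1 so the sketch terminates
    let step = size - effective_overlap(size, chunk_overlap); // always >= 1
    let mut windows: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        windows.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    windows
}

fn main() {
    println!("{}", effective_overlap(1000, 5000)); // 999
    println!("{:?}", sliding_windows("abcdefghij", 4, 1)); // ["abcd", "defg", "ghij"]
}
```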
pub fn with_separators<I, S>(self, separators: I) -> Self
Override the separator priority list. Pipelines splitting LaTeX, Python source, or other domain-specific corpora can supply alternative priority lists that favor boundaries meaningful in that language.
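How a priority list drives recursion can be sketched as follows. The `where` clause of `with_separators` is not shown in these docs, so the `I: IntoIterator<Item = S>, S: Into<String>` bounds on the hypothetical `collect_separators` helper are an assumption. `split_recursive` is also hypothetical: it tries the highest-priority separator first, re-splits oversized pieces with the remaining separators, and hard-cuts by characters once the list is exhausted. It drops separators and skips the greedy merge step, so it shows only the recursion order.

```rust
/// Assumed generic bounds for accepting a separator list; the real
/// `with_separators` signature's `where` clause is not shown.
fn collect_separators<I, S>(separators: I) -> Vec<String>
where
    I: IntoIterator<Item = S>,
    S: Into<String>,
{
    separators.into_iter().map(Into::into).collect()
}

/// Splits on the highest-priority separator; any piece still longer than
/// `chunk_size` chars is re-split with the remaining separators. When the
/// list runs out, the piece is hard-cut every `chunk_size` chars.
fn split_recursive(text: &str, separators: &[String], chunk_size: usize) -> Vec<String> {
    if text.chars().count() <= chunk_size {
        return vec![text.to_string()];
    }
    match separators.split_first() {
        Some((sep, rest)) if !sep.is_empty() => text
            .split(sep.as_str())
            .filter(|piece| !piece.is_empty())
            .flat_map(|piece| split_recursive(piece, rest, chunk_size))
            .collect(),
        _ => {
            // No usable separator left: fall back to fixed character runs.
            let chars: Vec<char> = text.chars().collect();
            chars
                .chunks(chunk_size.max(1))
                .map(|run| run.iter().collect::<String>())
                .collect()
        }
    }
}

fn main() {
    let seps = collect_separators(["\n\n", "\n", " "]);
    for piece in split_recursive("first paragraph\n\nsecond one here", &seps, 12) {
        println!("{piece:?}");
    }
}
```

A LaTeX or Python pipeline would pass a different list (for example, section or `def` boundaries first) to `collect_separators`; the recursion itself does not change.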
pub const fn chunk_size(&self) -> usize
Effective chunk size in characters.
pub const fn chunk_overlap(&self) -> usize
Effective chunk overlap in characters.
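To illustrate how a configured chunk size might feed the TextSplitter::split contract listed under Trait Implementations (a one-element vec equal to the input when it already fits, an empty vec for whitespace-only content), here is a minimal sketch. `SketchDocument`, `SketchSplitter`, and `FixedCharSplitter` are hypothetical stand-ins; the real `Document` and trait carry more than this, and the fixed character cut replaces the real recursive logic.

```rust
// Hypothetical stand-ins: the real `Document` and `TextSplitter` carry more
// (metadata, lineage) than this sketch models.
#[derive(Clone, Debug, PartialEq)]
struct SketchDocument {
    content: String,
}

trait SketchSplitter {
    fn name(&self) -> &'static str;
    fn split(&self, document: &SketchDocument) -> Vec<SketchDocument>;
}

struct FixedCharSplitter {
    chunk_size: usize,
}

impl SketchSplitter for FixedCharSplitter {
    fn name(&self) -> &'static str {
        "fixed-char-sketch" // hypothetical identifier
    }

    fn split(&self, document: &SketchDocument) -> Vec<SketchDocument> {
        // Whitespace-only content: an empty vec is a valid result.
        if document.content.trim().is_empty() {
            return Vec::new();
        }
        // Already fits: a one-element vec equal to the input is a valid no-op.
        if document.content.chars().count() <= self.chunk_size {
            return vec![document.clone()];
        }
        // Otherwise hard-cut into character runs (stand-in for the recursive logic).
        let chars: Vec<char> = document.content.chars().collect();
        chars
            .chunks(self.chunk_size.max(1))
            .map(|run| SketchDocument { content: run.iter().collect() })
            .collect()
    }
}

fn main() {
    let splitter = FixedCharSplitter { chunk_size: 4 };
    let doc = SketchDocument { content: "abcdefghij".to_string() };
    for chunk in splitter.split(&doc) {
        println!("{}: {:?}", splitter.name(), chunk.content);
    }
}
```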
Trait Implementations
impl Clone for RecursiveCharacterSplitter
fn clone(&self) -> RecursiveCharacterSplitter
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for RecursiveCharacterSplitter
impl Default for RecursiveCharacterSplitter
impl TextSplitter for RecursiveCharacterSplitter
fn name(&self) -> &'static str
Name recorded in the Lineage::splitter field, e.g.
"recursive-character", "markdown-structure", "token-count".
fn split(&self, document: &Document) -> Vec<Document>
Split the document and return the resulting chunks. Returning
a single-element vec equal to the input is a valid no-op
(e.g. when the document already fits the target size); an
empty vec is also valid (e.g. when the content is whitespace-only).
Auto Trait Implementations
impl Freeze for RecursiveCharacterSplitter
impl RefUnwindSafe for RecursiveCharacterSplitter
impl Send for RecursiveCharacterSplitter
impl Sync for RecursiveCharacterSplitter
impl Unpin for RecursiveCharacterSplitter
impl UnsafeUnpin for RecursiveCharacterSplitter
impl UnwindSafe for RecursiveCharacterSplitter
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.