pub struct TextSplitter {
    pub split_separator: Separator,
    pub recursive: bool,
    pub clean_text: bool,
    pub max_token_size: Option<u32>,
    pub tokenizer: Option<Arc<dyn Tokenizer + Send + Sync>>,
}
Fields
split_separator: Separator
recursive: bool
clean_text: bool
max_token_size: Option<u32>
Optional maximum token size for chunks. When set, recursive splitting will only drill down to finer separators when a chunk exceeds this limit.
tokenizer: Option<Arc<dyn Tokenizer + Send + Sync>>
Optional tokenizer for accurate token counting. If not set, token counting is skipped and the max_token_size constraint is ignored.
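The rule that max_token_size only takes effect together with a tokenizer can be sketched as a small gating function. This is a minimal sketch, not the crate's code: the methods of the real Tokenizer trait are not documented on this page, so TokenCount, WhitespaceCounter, and exceeds_limit are hypothetical stand-ins, and counting whitespace-separated words is only a placeholder for real tokenization.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the crate's `Tokenizer` trait, whose methods
/// are not documented on this page.
pub trait TokenCount {
    fn count_tokens(&self, text: &str) -> u32;
}

/// Toy counter: one token per whitespace-separated word (an assumption,
/// not a real tokenizer).
pub struct WhitespaceCounter;

impl TokenCount for WhitespaceCounter {
    fn count_tokens(&self, text: &str) -> u32 {
        text.split_whitespace().count() as u32
    }
}

/// Returns true only when BOTH a limit and a tokenizer are configured and
/// the chunk is over the limit; otherwise the limit is treated as absent.
pub fn exceeds_limit(
    max_token_size: Option<u32>,
    tokenizer: Option<&Arc<dyn TokenCount + Send + Sync>>,
    chunk: &str,
) -> bool {
    match (max_token_size, tokenizer) {
        (Some(max), Some(tok)) => tok.count_tokens(chunk) > max,
        // No limit, or no tokenizer to count with: never "too big".
        _ => false,
    }
}
```

Without a tokenizer, even a very long chunk is reported as within the limit, which matches the documented behavior of skipping the constraint.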
Implementations
impl TextSplitter
pub fn new() -> Self
pub fn split_text(&self, text: &str) -> Option<VecDeque<TextSplit>>
pub fn on_two_plus_newline(self) -> Self
pub fn on_single_newline(self) -> Self
pub fn on_sentences_rule_based(self) -> Self
pub fn on_sentences_unicode(self) -> Self
pub fn on_words_unicode(self) -> Self
pub fn on_graphemes_unicode(self) -> Self
pub fn on_separator(self, split_separator: &Separator) -> Self
pub fn recursive(self, recursive: bool) -> Self
pub fn clean_text(self, clean_text: bool) -> Self
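The builder methods above consume self and return Self, so a configuration is written as one chained expression. The sketch below imitates that style with a hypothetical MiniSplitter and Sep enum covering only the two newline separators; the default values and the reading of "two plus newline" as a run of two or more '\n' characters are assumptions, not the crate's documented behavior.

```rust
/// Hypothetical separator choice; the real `Separator` type's variants are
/// not shown on this page.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Sep {
    TwoPlusNewline,
    SingleNewline,
}

/// Hypothetical consuming builder in the same `self -> Self` style.
#[derive(Debug)]
pub struct MiniSplitter {
    sep: Sep,
    clean_text: bool,
}

impl MiniSplitter {
    pub fn new() -> Self {
        // Assumed defaults; the crate's real defaults are not documented here.
        MiniSplitter { sep: Sep::TwoPlusNewline, clean_text: true }
    }
    pub fn on_two_plus_newline(mut self) -> Self {
        self.sep = Sep::TwoPlusNewline;
        self
    }
    pub fn on_single_newline(mut self) -> Self {
        self.sep = Sep::SingleNewline;
        self
    }
    pub fn clean_text(mut self, clean_text: bool) -> Self {
        self.clean_text = clean_text;
        self
    }

    /// Splits on the configured separator, trims pieces when `clean_text`
    /// is on, and drops pieces that are empty or whitespace-only.
    pub fn split(&self, text: &str) -> Vec<String> {
        let pieces: Vec<&str> = match self.sep {
            Sep::SingleNewline => text.split('\n').collect(),
            Sep::TwoPlusNewline => split_on_blank_lines(text),
        };
        pieces
            .into_iter()
            .map(|p| if self.clean_text { p.trim() } else { p })
            .filter(|p| !p.trim().is_empty())
            .map(String::from)
            .collect()
    }
}

/// Splits wherever two or more consecutive '\n' bytes occur.
/// Only '\n' is treated as a newline; "\r\n" pairs are not recognized.
fn split_on_blank_lines(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut out = Vec::new();
    let (mut start, mut i) = (0, 0);
    while i < bytes.len() {
        if bytes[i] == b'\n' {
            let run_start = i;
            while i < bytes.len() && bytes[i] == b'\n' {
                i += 1;
            }
            if i - run_start >= 2 {
                out.push(&text[start..run_start]);
                start = i;
            }
        } else {
            i += 1;
        }
    }
    out.push(&text[start..]);
    out
}
```

Usage follows the same chained shape as the real API, for example MiniSplitter::new().on_single_newline().clean_text(true).split(text).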
pub fn max_token_size(self, max_tokens: u32) -> Self
Set the maximum token size for chunks. When set with a tokenizer, recursive splitting will only drill down to finer separators when a chunk exceeds this limit.
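The drill-down behavior described for max_token_size can be illustrated with a recursive function over a fixed hierarchy of separators (blank line, newline, space). This is a hedged sketch of the idea only: the hierarchy, the word-count tokenizer, and the name recursive_split are assumptions, and unlike a production splitter it never merges small pieces back together up to the limit.

```rust
/// Separator levels from coarsest to finest (an assumed hierarchy).
const LEVELS: [&str; 3] = ["\n\n", "\n", " "];

/// Toy token count: whitespace-separated words (not a real tokenizer).
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Splits `text` at `LEVELS[level]`; any piece still over `max_tokens` is
/// re-split at the next finer level. Pieces within the limit stay whole.
pub fn recursive_split(text: &str, max_tokens: usize, level: usize) -> Vec<String> {
    let mut out = Vec::new();
    for piece in text.split(LEVELS[level]) {
        let piece = piece.trim();
        if piece.is_empty() {
            continue;
        }
        if count_tokens(piece) > max_tokens && level + 1 < LEVELS.len() {
            // Only oversized pieces drill down to a finer separator.
            out.extend(recursive_split(piece, max_tokens, level + 1));
        } else {
            // Within the limit, or no finer separator left: keep as-is.
            out.push(piece.to_string());
        }
    }
    out
}
```

With a limit of 2, the five-word paragraph "one two\nthree four five" is split at its line break, and only the three-word line is broken down further into single words; with a limit of 10 nothing below the paragraph level is touched.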
pub fn with_tokenizer(self, tokenizer: Arc<dyn Tokenizer + Send + Sync>) -> Self
Set the tokenizer for accurate token counting. A tokenizer is required for max_token_size to have any effect.
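The Send + Sync bounds on the tokenizer type are what allow a single tokenizer, wrapped in an Arc, to be shared across threads. The sketch below shows that pattern with hypothetical names (TokenCount, CharCounter, count_in_parallel); it illustrates the trait-object bounds only, not any method of the real Tokenizer trait.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the crate's `Tokenizer` trait.
pub trait TokenCount {
    fn count_tokens(&self, text: &str) -> usize;
}

/// Toy counter: one token per Unicode scalar value.
pub struct CharCounter;

impl TokenCount for CharCounter {
    fn count_tokens(&self, text: &str) -> usize {
        text.chars().count()
    }
}

/// Counts tokens for each document on its own thread, sharing one
/// tokenizer. The `Send + Sync` bounds let the `Arc` clones cross threads.
pub fn count_in_parallel(
    tokenizer: Arc<dyn TokenCount + Send + Sync>,
    docs: Vec<String>,
) -> Vec<usize> {
    let handles: Vec<_> = docs
        .into_iter()
        .map(|doc| {
            // Each worker gets its own reference-counted handle.
            let tok = Arc::clone(&tokenizer);
            thread::spawn(move || tok.count_tokens(&doc))
        })
        .collect();
    // Joining in spawn order keeps results aligned with `docs`.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

Cloning an Arc only bumps a reference count, so every thread uses the same tokenizer instance without copying it.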
pub fn split_text_split(self, split: &TextSplit) -> Option<VecDeque<TextSplit>>
Split a TextSplit into smaller splits using the configured separator. This is the public API that accepts a TextSplit directly.
pub fn splits_to_text(splits: &VecDeque<TextSplit>, with_separator: bool) -> String
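splits_to_text takes no self, so it is called as TextSplitter::splits_to_text(&splits, with_separator). The exact semantics of with_separator are not documented here; the sketch below assumes each split records the separator that followed it, re-inserts it when with_separator is true (so the original text round-trips), and otherwise joins pieces with a single space. Piece and pieces_to_text are hypothetical stand-ins for TextSplit and splits_to_text.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for `TextSplit`: a piece of text plus the
/// separator that followed it in the source (an assumed layout).
pub struct Piece {
    pub text: String,
    pub separator: String,
}

/// Rebuilds text from pieces. With `with_separator`, each piece's recorded
/// separator is re-inserted; without it, pieces are joined by one space
/// (an assumed fallback, not documented behavior).
pub fn pieces_to_text(pieces: &VecDeque<Piece>, with_separator: bool) -> String {
    if with_separator {
        pieces
            .iter()
            .map(|p| format!("{}{}", p.text, p.separator))
            .collect()
    } else {
        pieces
            .iter()
            .map(|p| p.text.as_str())
            .collect::<Vec<_>>()
            .join(" ")
    }
}
```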
Trait Implementations
impl Default for TextSplitter
fn default() -> TextSplitter
Returns the “default value” for a type.
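A common way to implement Default for a builder like this is to delegate to new(), so both constructors yield the same configuration. Whether TextSplitter's Default impl does exactly that is not shown on this page; the Config type and its default values below are hypothetical.

```rust
/// Hypothetical configuration type mirroring a few TextSplitter fields.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub recursive: bool,
    pub clean_text: bool,
    pub max_token_size: Option<u32>,
}

impl Config {
    pub fn new() -> Self {
        // Assumed defaults, for illustration only.
        Config { recursive: false, clean_text: true, max_token_size: None }
    }

    pub fn max_token_size(mut self, max_tokens: u32) -> Self {
        self.max_token_size = Some(max_tokens);
        self
    }
}

// Delegating keeps `Config::default()` and `Config::new()` identical.
impl Default for Config {
    fn default() -> Self {
        Self::new()
    }
}
```

Implementing Default also lets generic code such as Option::unwrap_or_default produce a ready-to-use value.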
Auto Trait Implementations
impl Freeze for TextSplitter
impl !RefUnwindSafe for TextSplitter
impl Send for TextSplitter
impl Sync for TextSplitter
impl Unpin for TextSplitter
impl UnsafeUnpin for TextSplitter
impl !UnwindSafe for TextSplitter
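TextSplitter is !UnwindSafe and !RefUnwindSafe, most likely because the tokenizer field holds a trait object without a RefUnwindSafe bound. In practice this means a closure that captures a TextSplitter (or a reference to one) must be wrapped in AssertUnwindSafe before being passed to std::panic::catch_unwind. The sketch below reproduces the field shape with a hypothetical Holder struct and TokenCount trait.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::Arc;

/// Hypothetical stand-in for the crate's `Tokenizer` trait.
pub trait TokenCount {
    fn count_tokens(&self, text: &str) -> usize;
}

/// Toy counter: whitespace-separated words.
pub struct WordCounter;

impl TokenCount for WordCounter {
    fn count_tokens(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Same shape as the `tokenizer` field. The trait object has no
/// `RefUnwindSafe` bound, so `Holder` is `!RefUnwindSafe` and
/// `&Holder` is `!UnwindSafe`.
pub struct Holder {
    pub tokenizer: Arc<dyn TokenCount + Send + Sync>,
}

/// Runs a count inside `catch_unwind`, returning `None` on panic. Without
/// `AssertUnwindSafe` the closure would fail the `UnwindSafe` bound; the
/// caller takes responsibility for not observing broken state afterwards.
pub fn count_catching_panics(holder: &Holder, text: &str) -> Option<usize> {
    panic::catch_unwind(AssertUnwindSafe(|| holder.tokenizer.count_tokens(text))).ok()
}
```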
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.