Structs§
- CharRatioTokenizer - A simple character-based tokenizer that approximates token count. Useful as a fallback when no real tokenizer is available.
- TextSplit
- TextSplitter
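
As a rough illustration of the character-ratio idea behind CharRatioTokenizer, here is a minimal sketch. The type `RatioEstimator`, its `chars_per_token` field, and the `estimate` method are hypothetical names chosen for this example, not the crate's actual API; the ceiling-division rule is likewise an assumption about how such an approximation might work.

```rust
/// Hypothetical stand-in for a character-ratio tokenizer. The field and
/// method names are assumptions, not the crate's actual API.
pub struct RatioEstimator {
    /// Assumed average number of characters per token.
    chars_per_token: f64,
}

impl RatioEstimator {
    pub fn new(chars_per_token: f64) -> Self {
        assert!(chars_per_token > 0.0, "ratio must be positive");
        Self { chars_per_token }
    }

    /// Estimate the token count as ceil(char_count / chars_per_token).
    pub fn estimate(&self, text: &str) -> usize {
        // Count Unicode scalar values rather than bytes, so multi-byte
        // characters are not overcounted.
        let chars = text.chars().count() as f64;
        (chars / self.chars_per_token).ceil() as usize
    }
}
```

With an assumed ratio of 4 characters per token, an 11-character string is estimated at 3 tokens. The estimate is only a fallback; a real tokenizer may differ substantially, especially for non-English text or code.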
Enums§
Statics§
Traits§
- Tokenizer - Trait for counting tokens in text. Implement this to integrate with specific tokenizers (e.g., tiktoken, HuggingFace tokenizers).
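
The listing does not show the trait's signature, so the following is only a sketch of what a token-counting trait and a custom implementor could look like. The trait name `CountTokens`, the method `count_tokens`, and the helper `fits_budget` are all hypothetical; consult the Tokenizer trait page for the real method names.

```rust
/// Hypothetical shape of a token-counting trait. The method name
/// `count_tokens` is an assumption for illustration only.
pub trait CountTokens {
    fn count_tokens(&self, text: &str) -> usize;
}

/// Toy implementor that treats each whitespace-separated word as one token.
pub struct WhitespaceCounter;

impl CountTokens for WhitespaceCounter {
    fn count_tokens(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Returns true if `text` fits within `budget` tokens under the given counter.
pub fn fits_budget<T: CountTokens>(counter: &T, text: &str, budget: usize) -> bool {
    counter.count_tokens(text) <= budget
}
```

Keeping the counter behind a trait lets splitting logic stay generic: the same budget check works whether the counter wraps a real tokenizer library or a cheap approximation.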
Functions§
- split_text_into_indices - Split text into sentence indices using Unicode-aware sentence boundary detection.
- split_text_into_sentences - Split text into sentences using improved Unicode-aware sentence boundary detection.
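
To show how sentence indices relate to sentence slices, here is a deliberately simplified, standard-library-only sketch. It is not Unicode-aware boundary detection and does not reproduce the crate's functions: it only splits on `.`, `!`, or `?` followed by whitespace or end of input. The function names `naive_sentence_indices` and `naive_sentences`, and the choice of byte ranges as the index representation, are assumptions made for this example.

```rust
/// Naive sentence boundaries as (start, end) byte ranges into `text`.
/// Hypothetical helper; a simplified stand-in for real Unicode-aware
/// boundary detection.
pub fn naive_sentence_indices(text: &str) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    let mut start: Option<usize> = None;
    let mut chars = text.char_indices().peekable();

    while let Some((i, c)) = chars.next() {
        // Skip whitespace between sentences.
        if start.is_none() {
            if c.is_whitespace() {
                continue;
            }
            start = Some(i);
        }
        let is_terminator = matches!(c, '.' | '!' | '?');
        // A terminator only ends a sentence before whitespace or end of
        // input, so "3.14" is not split.
        let at_boundary = match chars.peek() {
            None => true,
            Some(&(_, next)) => next.is_whitespace(),
        };
        if is_terminator && at_boundary {
            if let Some(s) = start.take() {
                ranges.push((s, i + c.len_utf8()));
            }
        }
    }
    // Trailing text without a terminator becomes a final sentence.
    if let Some(s) = start {
        ranges.push((s, text.trim_end().len()));
    }
    ranges
}

/// Map the byte ranges back to string slices.
pub fn naive_sentences(text: &str) -> Vec<&str> {
    naive_sentence_indices(text)
        .into_iter()
        .map(|(s, e)| &text[s..e])
        .collect()
}
```

Returning indices rather than owned strings avoids copying and lets callers recover the slices later, which is presumably why the module offers both an index-returning and a sentence-returning function.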