pub struct Split { /* private fields */ }
Split input strings using a pattern.
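As a rough illustration of what splitting into chunks that borrow from the input looks like, here is a minimal std-only sketch. It does not use this crate or a regex: `Class`, `classify`, and `split_chunks` are hypothetical names, and the character-class rule is a simplified stand-in for a real pattern, not the GPT-2 regex. The point is the shape of the result, a `Vec<&'a str>` whose elements are sub-slices of the original text, as in the `pre_tokenize` signature.

```rust
/// Coarse character classes used as a stand-in for a regex pattern.
/// (Hypothetical; not part of the documented crate.)
#[derive(Clone, Copy, PartialEq, Eq)]
enum Class {
    Word,
    Space,
    Other,
}

/// Assign a character to one of the coarse classes.
fn classify(c: char) -> Class {
    if c.is_alphanumeric() {
        Class::Word
    } else if c.is_whitespace() {
        Class::Space
    } else {
        Class::Other
    }
}

/// Split `text` into maximal runs of same-class characters.
/// Every chunk is a sub-slice of `text`, so no string data is copied.
fn split_chunks<'a>(text: &'a str) -> Vec<&'a str> {
    let mut chunks = Vec::new();
    let mut start = 0; // byte offset where the current chunk begins
    let mut prev: Option<Class> = None;

    for (idx, c) in text.char_indices() {
        let class = classify(c);
        if let Some(p) = prev {
            if p != class {
                // Class changed: close the current chunk at this byte offset.
                chunks.push(&text[start..idx]);
                start = idx;
            }
        }
        prev = Some(class);
    }

    // Push the trailing chunk, if any input remains.
    if start < text.len() {
        chunks.push(&text[start..]);
    }
    chunks
}
```

With this sketch, `split_chunks("Hello, world!")` yields `["Hello", ",", " ", "world", "!"]`, and concatenating the chunks reproduces the input. Returning borrowed slices keeps the split allocation-light, at the cost of tying the result's lifetime to the input string.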
Implementations
impl Split
pub fn new(opts: SplitOptions<'_>) -> Result<Self, PreTokenizeError>
Construct a pre-tokenizer which splits input using a given regex pattern.
pub fn gpt2() -> Self
Split input strings into chunks using the GPT2_REGEX pattern
originating from GPT-2 and subsequently used by many other models.
Use Split::new to specify a custom pattern.
Trait Implementations
impl PreTokenizer for Split
fn pre_tokenize<'a>(&self, text: &'a str) -> Result<Vec<&'a str>, PreTokenizeError>
Split text into chunks and return a vector of sub-slices.
Auto Trait Implementations
impl Freeze for Split
impl RefUnwindSafe for Split
impl Send for Split
impl Sync for Split
impl Unpin for Split
impl UnwindSafe for Split
Blanket Implementations
impl<T> BorrowMut<T> for T
where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.