pub struct BpeTokenizer {
pub vocab_size: usize,
/* private fields */
}
BPE (Byte Pair Encoding) tokenizer. Subword tokens are learned from a corpus.
Training: iteratively merge the most frequent adjacent pair of tokens until the target vocabulary size is reached. Each merge creates one new token from two existing ones.
Encoding: greedily apply merges in learned order (longest match first).
Decoding: expand each token to its character sequence.
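The train / encode / decode cycle described above can be sketched as a small, self-contained toy. This is not the implementation behind `BpeTokenizer`, whose private fields and matching strategy are not shown on this page. `ToyBpe`, `most_frequent_pair`, and `merge_pair` are hypothetical names, and the choices to break ties by the smaller pair, to stop early when no pair repeats, and to skip unknown characters are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Most frequent adjacent pair that occurs at least twice.
/// Ties go to the smaller pair so results are deterministic (an assumption).
fn most_frequent_pair(tokens: &[u32]) -> Option<(u32, u32)> {
    let mut counts: HashMap<(u32, u32), usize> = HashMap::new();
    for w in tokens.windows(2) {
        *counts.entry((w[0], w[1])).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .filter(|&(_, n)| n >= 2)
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(&a.0)))
        .map(|(pair, _)| pair)
}

/// Replace every non-overlapping occurrence of `pair` with `new_id`, left to right.
fn merge_pair(tokens: &[u32], pair: (u32, u32), new_id: u32) -> Vec<u32> {
    let mut out = Vec::with_capacity(tokens.len());
    let mut i = 0;
    while i < tokens.len() {
        if i + 1 < tokens.len() && (tokens[i], tokens[i + 1]) == pair {
            out.push(new_id);
            i += 2;
        } else {
            out.push(tokens[i]);
            i += 1;
        }
    }
    out
}

/// Hypothetical toy tokenizer: `vocab[id]` is the string a token expands to,
/// `merges` holds the learned pairs in the order they were learned.
struct ToyBpe {
    vocab: Vec<String>,
    merges: Vec<((u32, u32), u32)>,
}

impl ToyBpe {
    fn train(text: &str, target_vocab: usize) -> Self {
        // Base vocabulary: the distinct characters of the corpus, sorted.
        let mut chars: Vec<char> = text.chars().collect();
        chars.sort_unstable();
        chars.dedup();
        let mut vocab: Vec<String> = chars.iter().map(|c| c.to_string()).collect();
        let mut tokens: Vec<u32> = text
            .chars()
            .map(|c| chars.binary_search(&c).unwrap() as u32)
            .collect();
        let mut merges = Vec::new();
        // Each iteration adds exactly one token built from two existing ones.
        while vocab.len() < target_vocab {
            let Some(pair) = most_frequent_pair(&tokens) else { break };
            let new_id = vocab.len() as u32;
            let merged = format!("{}{}", vocab[pair.0 as usize], vocab[pair.1 as usize]);
            vocab.push(merged);
            tokens = merge_pair(&tokens, pair, new_id);
            merges.push((pair, new_id));
        }
        ToyBpe { vocab, merges }
    }

    fn encode(&self, text: &str) -> Vec<u32> {
        // Start from base characters; characters unseen in training are skipped here.
        let mut tokens: Vec<u32> = text
            .chars()
            .filter_map(|c| {
                let s = c.to_string();
                self.vocab.iter().position(|v| *v == s)
            })
            .map(|i| i as u32)
            .collect();
        // Replay the merges in the order they were learned.
        for &(pair, new_id) in &self.merges {
            tokens = merge_pair(&tokens, pair, new_id);
        }
        tokens
    }

    fn decode(&self, tokens: &[u32]) -> String {
        // Expand each token back to its character sequence.
        tokens.iter().map(|&t| self.vocab[t as usize].as_str()).collect()
    }
}
```

In this sketch, training on `"abababab"` with a target of 4 learns `ab` and then `abab`, so `encode("abab")` yields a single token and `decode` restores the original string.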
Fields
vocab_size: usize
Vocabulary size (base chars + merges).
Implementations
impl BpeTokenizer
pub fn train(text: &str, target_vocab: usize) -> Self
Train a BPE tokenizer from a text corpus.
target_vocab: desired vocabulary size (base characters + merge tokens).
Typical values: 512, 1024, 2048. A larger vocabulary means each token carries more semantic content.
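Since `target_vocab` counts base characters plus merge tokens, the number of merges available to training is whatever remains after the base characters are counted. A minimal sketch of that arithmetic follows. It assumes the base vocabulary is the set of distinct characters in the corpus, which this page does not confirm, and `merge_budget` is a hypothetical helper rather than part of the documented API.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not part of the documented API): the number of merge
/// steps that fit in `target_vocab` once the base characters are counted.
/// Assumes the base vocabulary is the set of distinct characters in `text`.
fn merge_budget(text: &str, target_vocab: usize) -> usize {
    let base: BTreeSet<char> = text.chars().collect();
    // If the corpus already has more distinct characters than the target,
    // no merges are possible.
    target_vocab.saturating_sub(base.len())
}
```

Under these assumptions, a corpus with 8 distinct characters and `target_vocab = 512` leaves room for 504 merges.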
Trait Implementations
impl Clone for BpeTokenizer
fn clone(&self) -> BpeTokenizer
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations
impl Freeze for BpeTokenizer
impl RefUnwindSafe for BpeTokenizer
impl Send for BpeTokenizer
impl Sync for BpeTokenizer
impl Unpin for BpeTokenizer
impl UnsafeUnpin for BpeTokenizer
impl UnwindSafe for BpeTokenizer
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T where T: Clone
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.