#[non_exhaustive]
pub struct UnicodeBpeConfig {
    pub vocab_size: usize,
    pub min_frequency: usize,
    pub normalize: bool,
    pub byte_fallback: bool,
}
Configuration for the Unicode-aware BPE tokenizer.
Fields (Non-exhaustive)
This struct is marked as non-exhaustive. Non-exhaustive structs could have additional fields added in the future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
vocab_size: usize
Target vocabulary size (base chars + merge operations).
min_frequency: usize
Minimum pair frequency for a merge operation to be kept.
normalize: bool
Apply NFC-style normalization (simplified: recompose via canonical form).
byte_fallback: bool
Represent characters absent from the training vocabulary as <0xHH> byte tokens.
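The field descriptions above can be read as three small rules. The sketch below is one possible reading, not the crate's implementation. The names merge_budget, keeps_merge, and byte_fallback_tokens are hypothetical. It assumes that the number of merges is vocab_size minus the number of base characters, that the min_frequency threshold is inclusive, and that <0xHH> means one uppercase-hex token per UTF-8 byte.

```rust
/// Hypothetical helper: how many merge operations fit once the base
/// characters have taken their share of `vocab_size`.
/// Assumption: vocab_size = base chars + merges, as the field doc suggests.
fn merge_budget(vocab_size: usize, base_chars: usize) -> usize {
    // saturating_sub avoids underflow when the base alphabet alone
    // already exceeds the target size.
    vocab_size.saturating_sub(base_chars)
}

/// Hypothetical helper: whether a candidate pair is frequent enough to be
/// kept as a merge. Assumption: the threshold is inclusive.
fn keeps_merge(pair_count: usize, min_frequency: usize) -> bool {
    pair_count >= min_frequency
}

/// Hypothetical helper: render one character as `<0xHH>` tokens.
/// Assumption: one token per UTF-8 byte, two uppercase hex digits each.
fn byte_fallback_tokens(c: char) -> Vec<String> {
    let mut buf = [0u8; 4]; // a char needs at most 4 bytes in UTF-8
    c.encode_utf8(&mut buf)
        .bytes()
        .map(|b| format!("<0x{:02X}>", b))
        .collect()
}
```

Under these assumptions, merge_budget(8000, 256) gives 7744, and byte_fallback_tokens('é') yields <0xC3> followed by <0xA9>.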
Trait Implementations
impl Clone for UnicodeBpeConfig
fn clone(&self) -> UnicodeBpeConfig
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for UnicodeBpeConfig
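Because the struct is non-exhaustive, an external crate cannot use a struct literal or struct update syntax, but it can still clone an existing value and overwrite its public fields. The sketch below illustrates that pattern with a local stand-in type that mirrors the documented fields. The helper with_byte_fallback is hypothetical, and how the real crate supplies an initial value is not shown on this page.

```rust
// Local stand-in mirroring the documented fields and derives. The real type
// lives in the tokenizer crate; this copy exists only so the sketch compiles
// on its own.
#[non_exhaustive]
#[derive(Clone, Debug)]
pub struct UnicodeBpeConfig {
    pub vocab_size: usize,
    pub min_frequency: usize,
    pub normalize: bool,
    pub byte_fallback: bool,
}

/// Hypothetical helper: derive a variant of an existing config by cloning it
/// and assigning to a public field. This needs neither a struct literal nor
/// struct update syntax, so it also works from outside the defining crate.
pub fn with_byte_fallback(base: &UnicodeBpeConfig, enabled: bool) -> UnicodeBpeConfig {
    let mut cfg = base.clone(); // Clone leaves the original untouched
    cfg.byte_fallback = enabled;
    cfg
}
```

The Debug implementation then allows the result to be inspected with the {:?} formatter, for example when logging the configuration before training.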
Auto Trait Implementations
impl Freeze for UnicodeBpeConfig
impl RefUnwindSafe for UnicodeBpeConfig
impl Send for UnicodeBpeConfig
impl Sync for UnicodeBpeConfig
impl Unpin for UnicodeBpeConfig
impl UnsafeUnpin for UnicodeBpeConfig
impl UnwindSafe for UnicodeBpeConfig
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.