pub struct MiTokenizer {
    pub vocab_size: usize,
    /* private fields */
}
Single-Pass Mutual Information Tokenizer (BA-37).
Instead of BPE’s iterative pair counting, which costs O(merges × corpus_len), this tokenizer computes the mutual information of all character pairs in ONE pass:
MI(a,b) = ln(P(ab) / (P(a) × P(b)))
Pairs with MI > ln(φ) ≈ 0.481 co-occur more than φ times as often as chance would predict, so they are merged.
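The single counting pass and the threshold test can be sketched as follows. This is a minimal illustration, not the crate's implementation: the function names `ln_phi`, `pair_mi`, and `merge_candidates` are hypothetical, and the sketch assumes that P(a) is estimated from character counts and P(ab) from adjacent-pair counts.

```rust
use std::collections::HashMap;

/// ln(φ), where φ = (1 + √5) / 2 is the golden ratio (about 0.4812).
fn ln_phi() -> f64 {
    ((1.0 + 5.0_f64.sqrt()) / 2.0).ln()
}

/// Hypothetical helper: counts characters and adjacent pairs in one pass,
/// then scores each observed pair with MI(a,b) = ln(P(ab) / (P(a) * P(b))).
fn pair_mi(corpus: &str) -> HashMap<(char, char), f64> {
    let mut uni: HashMap<char, usize> = HashMap::new();
    let mut bi: HashMap<(char, char), usize> = HashMap::new();
    let mut n_chars = 0usize;
    let mut prev: Option<char> = None;

    // Single pass over the corpus: O(corpus_len).
    for c in corpus.chars() {
        *uni.entry(c).or_insert(0) += 1;
        n_chars += 1;
        if let Some(p) = prev {
            *bi.entry((p, c)).or_insert(0) += 1;
        }
        prev = Some(c);
    }

    // Scoring touches only pairs that actually occurred (at most V² of them).
    let n_pairs = n_chars.saturating_sub(1);
    let mut scores = HashMap::new();
    for (&(a, b), &count) in &bi {
        let p_ab = count as f64 / n_pairs as f64;
        let p_a = uni[&a] as f64 / n_chars as f64;
        let p_b = uni[&b] as f64 / n_chars as f64;
        scores.insert((a, b), (p_ab / (p_a * p_b)).ln());
    }
    scores
}

/// Hypothetical helper: pairs whose MI exceeds ln(φ), sorted for determinism.
fn merge_candidates(corpus: &str) -> Vec<(char, char)> {
    let threshold = ln_phi();
    let mut pairs: Vec<(char, char)> = pair_mi(corpus)
        .into_iter()
        .filter(|&(_, mi)| mi > threshold)
        .map(|(pair, _)| pair)
        .collect();
    pairs.sort();
    pairs
}
```

For the corpus "aaab", the pair (a, a) scores ln(32/27) ≈ 0.17 and stays below the threshold, while (a, b) scores ln(16/9) ≈ 0.58 and is selected for merging.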
Total cost: O(corpus_len) for counting, O(V²) for MI, and O(corpus_len) per merge round. Typically 2-3 rounds run on progressively shorter corpora. The result is roughly 500x faster than BPE.
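One merge round can be sketched as a single left-to-right scan, which is where the O(corpus_len) per-round cost comes from. This is an assumption about how merges are applied, not the crate's code: `merge_round` is a hypothetical name, and the greedy, non-overlapping replacement policy is chosen here only for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical helper: replaces each selected adjacent pair with one merged
/// token in a single greedy scan. The output is never longer than the input,
/// so later rounds operate on a progressively shorter token sequence.
fn merge_round(tokens: &[String], pairs: &HashSet<(String, String)>) -> Vec<String> {
    let mut out = Vec::with_capacity(tokens.len());
    let mut i = 0;
    while i < tokens.len() {
        let mergeable = i + 1 < tokens.len()
            && pairs.contains(&(tokens[i].clone(), tokens[i + 1].clone()));
        if mergeable {
            // Emit the merged token and skip both halves of the pair.
            out.push(format!("{}{}", tokens[i], tokens[i + 1]));
            i += 2;
        } else {
            out.push(tokens[i].clone());
            i += 1;
        }
    }
    out
}
```

With tokens `["a", "a", "a", "b"]` and the selected pair `("a", "b")`, the round yields `["a", "a", "ab"]`, one token shorter than the input.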
The golden ratio appears as the significance threshold: ln(φ) separates coherent pairs (signal) from independent pairs (noise).
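As a worked check of the threshold, using only the definitions stated above, exponentiating both sides of the MI inequality shows what the ln(φ) cutoff means in terms of probabilities:

```latex
\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618, \qquad \ln\varphi \approx 0.4812

\mathrm{MI}(a,b) > \ln\varphi
\;\Longleftrightarrow\;
\frac{P(ab)}{P(a)\,P(b)} > \varphi
\;\Longleftrightarrow\;
P(ab) > \varphi \, P(a)\,P(b)
```

A pair that is statistically independent has P(ab) = P(a)P(b), so its MI is 0 and it falls below the threshold.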
Fields§
§vocab_size: usize
Implementations§
Trait Implementations§
impl Clone for MiTokenizer
fn clone(&self) -> MiTokenizer
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for MiTokenizer
impl RefUnwindSafe for MiTokenizer
impl Send for MiTokenizer
impl Sync for MiTokenizer
impl Unpin for MiTokenizer
impl UnsafeUnpin for MiTokenizer
impl UnwindSafe for MiTokenizer
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.