pub struct Tokenizer { /* private fields */ }
Wraps a CoreBPE for one specific encoding.
Implementations
impl Tokenizer
pub fn for_model(model: &str) -> Result<Self>
Constructs a tokenizer from an OpenAI model name ("gpt-4", "gpt-4o",
"gpt-4.1", "gpt-5", etc.). Tries tiktoken_rs::get_bpe_from_model
first; if that fails (for example, because the model is newer than the
bundled mapping), falls back to inferring the encoding from the name pattern.
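The two-stage lookup described above can be sketched roughly as follows. This is a minimal illustration, not the crate's implementation: `infer_encoding`, `resolve_encoding`, the `primary` closure (standing in for the bundled model table), and the prefix lists are all hypothetical names and rules chosen for the example.

```rust
/// Hypothetical helper: guess an encoding name from a model name.
/// The prefix rules are illustrative assumptions, not the crate's actual table.
fn infer_encoding(model: &str) -> Option<&'static str> {
    let m = model.to_ascii_lowercase();
    // Newer families are checked first, because "gpt-4o" also starts with "gpt-4".
    const O200K_PREFIXES: [&str; 5] = ["gpt-4o", "gpt-4.1", "gpt-5", "o1", "o3"];
    const CL100K_PREFIXES: [&str; 2] = ["gpt-4", "gpt-3.5"];
    if O200K_PREFIXES.iter().any(|p| m.starts_with(*p)) {
        Some("o200k_base")
    } else if CL100K_PREFIXES.iter().any(|p| m.starts_with(*p)) {
        Some("cl100k_base")
    } else {
        None
    }
}

/// Two-stage lookup: try a primary mapping, then fall back to pattern inference.
/// `primary` stands in for the bundled model table (an assumption for this sketch).
fn resolve_encoding(
    model: &str,
    primary: impl Fn(&str) -> Option<&'static str>,
) -> Result<&'static str, String> {
    primary(model)
        .or_else(|| infer_encoding(model))
        .ok_or_else(|| format!("unknown model: {model}"))
}
```

With a primary table that only knows "gpt-4", a name such as "gpt-5-mini" would still resolve through the prefix rules, while an unrecognized name yields an error rather than a silent default.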
pub fn for_encoding(name: &str) -> Result<Self>
Construct from an encoding name. Accepts "cl100k_base" and
"o200k_base".
pub fn encoding_name(&self) -> &str
Encoding name ("cl100k_base" or "o200k_base").
pub fn count_many(&self, texts: &[&str], parallel: bool) -> Vec<usize>
Counts tokens for each text in bulk, returning one count per input. With
parallel = true, the work is distributed across rayon’s thread pool.
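The shape of a bulk count with an opt-in parallel path might look like the sketch below. It deliberately swaps rayon for std::thread::scope so that it compiles with the standard library alone, and `count_one` is a hypothetical stand-in that counts whitespace-separated words rather than BPE tokens.

```rust
use std::thread;

/// Stand-in for a real BPE token count (assumption: whitespace-separated words).
fn count_one(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Count every text, optionally splitting the work across scoped threads.
/// Results keep the input order in both modes.
fn count_many(texts: &[&str], parallel: bool) -> Vec<usize> {
    if !parallel || texts.len() < 2 {
        return texts.iter().map(|t| count_one(t)).collect();
    }
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Ceiling division; at least 1 because texts.len() >= 2 here.
    let chunk_len = (texts.len() + workers - 1) / workers;
    thread::scope(|s| {
        // Spawn one worker per chunk, collecting the handles before joining.
        let handles: Vec<_> = texts
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|t| count_one(t)).collect::<Vec<_>>())
            })
            .collect();
        // Joining in spawn order preserves the order of the inputs.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

For small batches the sequential path is usually the better choice, since thread startup can cost more than the counting itself.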
pub fn encode(&self, text: &str) -> Vec<u32>
Encode to BPE token IDs (ordinary mode, no special tokens).
pub fn decode(&self, tokens: &[u32]) -> Result<String>
Decode a slice of BPE token IDs back to a string.
pub fn fits(&self, text: &str, budget: usize) -> bool
Returns true if and only if text encodes to at most budget BPE tokens.
pub fn truncate_to(&self, text: &str, budget: usize) -> Result<String>
Encodes text, keeps the first budget tokens, and decodes them back to a
string. If text already fits, it is returned unchanged. The cut always
falls on a token boundary, so boundary handling is whatever tiktoken-rs’s
decode does with the truncated token sequence, which is well-defined for
cl100k/o200k since each token decodes to a complete UTF-8 sequence in the
merged-vocabulary case.
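The encode, truncate, decode flow and its relationship to fits can be sketched with a toy tokenizer. The free functions below are hypothetical stand-ins that emit one token per Unicode scalar value, so every token trivially decodes to complete UTF-8; they do not call tiktoken-rs and only mirror the method names for readability.

```rust
/// Toy stand-in for a BPE encoder: one token per Unicode scalar value.
fn encode(text: &str) -> Vec<u32> {
    text.chars().map(u32::from).collect()
}

/// Toy decoder: fails if an ID is not a valid Unicode scalar value.
fn decode(tokens: &[u32]) -> Result<String, String> {
    tokens
        .iter()
        .map(|&t| char::from_u32(t).ok_or_else(|| format!("invalid token id {t}")))
        .collect()
}

/// True if and only if `text` encodes to at most `budget` tokens.
fn fits(text: &str, budget: usize) -> bool {
    encode(text).len() <= budget
}

/// Keep the first `budget` tokens and decode them; unchanged if it already fits.
fn truncate_to(text: &str, budget: usize) -> Result<String, String> {
    let tokens = encode(text);
    if tokens.len() <= budget {
        return Ok(text.to_string());
    }
    decode(&tokens[..budget])
}
```

Under these assumptions, any successful result of truncate_to(text, budget) satisfies fits(result, budget), which is the property callers typically rely on when trimming input to a context window.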
Auto Trait Implementations
impl Freeze for Tokenizer
impl RefUnwindSafe for Tokenizer
impl Send for Tokenizer
impl Sync for Tokenizer
impl Unpin for Tokenizer
impl UnsafeUnpin for Tokenizer
impl UnwindSafe for Tokenizer
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.