pub struct TiktokenTokenizer { /* private fields */ }
Tiktoken tokenizer wrapper that supports both built-in OpenAI encodings and hub-loaded models.
Implementations
impl TiktokenTokenizer
pub fn new(model: TiktokenModel) -> Result<Self>
Creates a new Tiktoken tokenizer for the specified built-in model.
pub fn from_dir(dir: &Path) -> Result<Self>
Creates a tokenizer from a directory containing tiktoken.model and tokenizer_config.json.
pub fn from_dir_with_chat_template(dir: &Path, chat_template_path: Option<&str>) -> Result<Self>
Create from a directory with an optional chat template file path.
Discovers the tiktoken model file automatically via find_tiktoken_file.
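The rules that find_tiktoken_file applies are not spelled out on this page. As a rough illustration only, the sketch below assumes that a file named tiktoken.model is preferred and that any *.tiktoken file is accepted otherwise. The function name find_tiktoken_file_sketch and that preference order are assumptions, not the crate's actual behavior.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the crate's `find_tiktoken_file`.
/// Assumption: `tiktoken.model` wins, otherwise the first `*.tiktoken`
/// file (sorted by path) is used.
fn find_tiktoken_file_sketch(dir: &Path) -> Option<PathBuf> {
    // Prefer the conventional file name when it exists.
    let preferred = dir.join("tiktoken.model");
    if preferred.is_file() {
        return Some(preferred);
    }

    // Otherwise collect every regular file with a `.tiktoken` extension.
    let mut candidates: Vec<PathBuf> = fs::read_dir(dir)
        .ok()?
        .filter_map(|entry| entry.ok().map(|e| e.path()))
        .filter(|p| {
            p.is_file() && p.extension().and_then(|ext| ext.to_str()) == Some("tiktoken")
        })
        .collect();

    // Sort so the choice does not depend on directory iteration order.
    candidates.sort();
    candidates.into_iter().next()
}
```

Returning Option rather than an error keeps the sketch small; a real loader would more likely report which directory was searched when nothing is found.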
pub fn from_file(tiktoken_path: &Path) -> Result<Self>
Create from an exact tiktoken file path (.tiktoken or tiktoken.model).
Looks for tokenizer_config.json in the same directory.
pub fn from_file_with_chat_template(tiktoken_path: &Path, chat_template_path: Option<&str>) -> Result<Self>
Create from an exact tiktoken file path with an optional chat template.
pub fn from_model_name(model_name: &str) -> Result<Self>
Creates a tokenizer from a model name string (e.g., "gpt-4", "gpt-3.5-turbo").
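How a model name is resolved to an encoding is not documented here. One plausible approach is prefix matching, sketched below. The EncodingSketch enum and encoding_for_model_sketch function are hypothetical and do not mirror the crate's TiktokenModel type. The name-to-encoding pairs follow OpenAI's published tiktoken mapping.

```rust
/// Hypothetical encoding identifiers; the crate's real `TiktokenModel`
/// variants are not shown on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EncodingSketch {
    Cl100kBase,
    O200kBase,
}

/// Resolve a model name to an encoding by prefix, so that dated or
/// suffixed names such as "gpt-4-0613" still match.
fn encoding_for_model_sketch(model_name: &str) -> Option<EncodingSketch> {
    // Longer prefixes come first so "gpt-4o" is not captured by "gpt-4".
    const PREFIXES: &[(&str, EncodingSketch)] = &[
        ("gpt-4o", EncodingSketch::O200kBase),
        ("gpt-4", EncodingSketch::Cl100kBase),
        ("gpt-3.5-turbo", EncodingSketch::Cl100kBase),
    ];

    PREFIXES
        .iter()
        .find(|(prefix, _)| model_name.starts_with(*prefix))
        .map(|(_, encoding)| *encoding)
}
```

Unknown names yield None in this sketch, which a constructor returning Result would presumably turn into an error.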
Trait Implementations
impl Decoder for TiktokenTokenizer
fn decode(&self, token_ids: &[TokenIdType], _skip_special_tokens: bool) -> Result<String>
fn decode_step(&self, token_id: TokenIdType, ids: &mut Vec<TokenIdType>, prefix: &mut String, prefix_index: &mut usize, skip_special_tokens: bool) -> Result<Option<String>>
Incremental decode step, called once per generated token.
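The exact meaning of prefix and prefix_index is not described on this page. The sketch below shows one common way to structure incremental decoding: buffer token IDs, decode the whole buffer, and hold output back while the text ends in an incomplete UTF-8 sequence. It assumes a toy vocabulary where every token ID is a single byte, treats prefix as the text already emitted and prefix_index as its byte length, and drops the skip_special_tokens flag and the Result wrapper. All of these are assumptions made for illustration.

```rust
type TokenIdType = u32;

/// Toy decoder: each token ID is one byte (assumption for this sketch).
fn decode_bytes(ids: &[TokenIdType]) -> String {
    let bytes: Vec<u8> = ids.iter().map(|&id| id as u8).collect();
    String::from_utf8_lossy(&bytes).into_owned()
}

/// Hypothetical incremental step: returns newly completed text, or `None`
/// while the buffered tokens end in an incomplete character.
fn decode_step_sketch(
    token_id: TokenIdType,
    ids: &mut Vec<TokenIdType>,
    prefix: &mut String,
    prefix_index: &mut usize,
) -> Option<String> {
    ids.push(token_id);
    let text = decode_bytes(ids);

    // A trailing U+FFFD means the last bytes do not yet form a full
    // character, so wait for more tokens before emitting anything.
    if text.ends_with('\u{FFFD}') {
        return None;
    }

    // Everything past the previously emitted prefix is new output.
    let new_text = text[*prefix_index..].to_string();
    if new_text.is_empty() {
        return None;
    }

    *prefix = text;
    *prefix_index = prefix.len();
    Some(new_text)
}
```

Feeding the three bytes of "hé" one at a time yields "h", then nothing for the first byte of "é", then "é" once the second byte arrives.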
impl Encoder for TiktokenTokenizer
impl Tokenizer for TiktokenTokenizer
fn vocab_size(&self) -> usize
fn get_special_tokens(&self) -> &SpecialTokens
fn token_to_id(&self, token: &str) -> Option<TokenIdType>
fn id_to_token(&self, id: TokenIdType) -> Option<String>
fn apply_chat_template(&self, messages: &[Value], params: ChatTemplateParams<'_>) -> Result<String>
Apply chat template to messages. Default returns an error for tokenizers without template support.
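The "default returns an error" behavior corresponds to a provided trait method that implementors may override. The sketch below shows that pattern with simplified stand-in types. ChatTemplateSketch, PlainTokenizer, TemplatedTokenizer, and the `<|role|>` output format are all hypothetical. Messages are plain (role, content) pairs rather than the Value and ChatTemplateParams types in the real signature.

```rust
/// Hypothetical, simplified version of the trait method described above.
trait ChatTemplateSketch {
    /// Provided method: tokenizers without template support inherit this
    /// body and therefore report an error.
    fn apply_chat_template(&self, _messages: &[(&str, &str)]) -> Result<String, String> {
        Err("chat template not supported by this tokenizer".to_string())
    }
}

/// Keeps the default, so template application fails.
struct PlainTokenizer;
impl ChatTemplateSketch for PlainTokenizer {}

/// Overrides the default with a made-up rendering format.
struct TemplatedTokenizer;
impl ChatTemplateSketch for TemplatedTokenizer {
    fn apply_chat_template(&self, messages: &[(&str, &str)]) -> Result<String, String> {
        let mut out = String::new();
        for (role, content) in messages {
            // One line per message: "<|role|>content".
            out.push_str(&format!("<|{role}|>{content}\n"));
        }
        Ok(out)
    }
}
```

Because TiktokenTokenizer lists apply_chat_template among its implemented methods, it presumably overrides the default in the same way the second type does here.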
fn chat_template_content_format(&self) -> ChatTemplateContentFormat
Get the content format expected by the chat template.
fn thinking_toggle(&self) -> ThinkingToggle
Get the thinking toggle support for this template.
fn thinking_key_name(&self) -> Option<ThinkingKeyName>
The variable name the template uses for the thinking toggle.
fn think_in_prefill(&self) -> bool
Whether the template injects <think> in the generation prompt.
Auto Trait Implementations
impl !Freeze for TiktokenTokenizer
impl !RefUnwindSafe for TiktokenTokenizer
impl Send for TiktokenTokenizer
impl Sync for TiktokenTokenizer
impl Unpin for TiktokenTokenizer
impl UnsafeUnpin for TiktokenTokenizer
impl !UnwindSafe for TiktokenTokenizer
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.