pub struct TiktokenTokenCounter { /* private fields */ }
Accurate BPE-based token counter using OpenAI’s o200k_base encoding.
Uses tiktoken-rs with the vocabulary bundled at compile time — no runtime
downloads. This is the recommended counter for production use.
Implementations
impl TiktokenTokenCounter
pub fn truncate_to_token_prefix(&self, text: &str, max_tokens: u32) -> String
Truncate text to at most max_tokens tokens, keeping the START.
Encodes the text once and decodes the first max_tokens tokens back
to a string — O(N) (one encode + one decode), versus the O(N²)
char-by-char re-tokenization the previous find_prefix_within_tokens
performed (which called count_text(&text[..i]) on every char index).
Semantics
- max_tokens == 0 → empty string (exactly 0 tokens; never exceeds budget).
- Text already within max_tokens → returned unchanged (fast path).
- Otherwise the result is an exact prefix of text (its START preserved), is valid UTF-8, and re-counts to ≤ max_tokens.
If the o200k encoder is unavailable (the issue #25 fallback path), this degrades to a conservative char-based cut instead of panicking.
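The encode-once, decode-prefix idea can be sketched without tiktoken-rs. This is an illustration only, not the crate's implementation: `toy_encode` is a hypothetical stand-in tokenizer that treats every 3 bytes of UTF-8 as one token (so, like byte-level BPE, a token boundary may fall inside a multi-byte character), and `truncate_prefix_sketch` is a hypothetical free function mirroring the three rules above.

```rust
/// Hypothetical toy tokenizer (NOT o200k_base): every chunk of up to
/// 3 UTF-8 bytes is one token, so tokens may split a multi-byte char.
fn toy_encode(text: &str) -> Vec<Vec<u8>> {
    text.as_bytes().chunks(3).map(|c| c.to_vec()).collect()
}

/// Sketch of prefix truncation: one encode pass, then keep the bytes
/// of the first `max_tokens` tokens.
fn truncate_prefix_sketch(text: &str, max_tokens: u32) -> String {
    let max = max_tokens as usize;
    // Rule 1: a zero budget yields the empty string.
    if max == 0 {
        return String::new();
    }
    let tokens = toy_encode(text);
    // Rule 2: fast path, the text already fits.
    if tokens.len() <= max {
        return text.to_string();
    }
    // Rule 3: the kept tokens cover a byte prefix of `text`.
    let mut end: usize = tokens[..max].iter().map(|t| t.len()).sum();
    // Back off to a char boundary so the result stays valid UTF-8.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    text[..end].to_string()
}
```

With this toy tokenizer, `truncate_prefix_sketch("héllo wörld", 3)` cuts at byte 9, which lands inside `ö`, and backs off one byte to return `"héllo w"`. Shrinking the prefix can only lower the toy token count; a real BPE encoder may need additional care to guarantee the re-count bound.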
pub fn truncate_to_token_suffix(&self, text: &str, max_tokens: u32) -> String
Truncate text to at most max_tokens tokens, keeping the END.
Symmetric to truncate_to_token_prefix:
encodes once and decodes the last max_tokens tokens. Same budget /
fast-path / fallback semantics; the result is a valid-UTF-8 exact suffix
of text (its END preserved) that re-counts to ≤ max_tokens.
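The suffix variant can be sketched the same way. As an assumption for illustration, `toy_encode` below is a hypothetical tokenizer that counts every 3 bytes of UTF-8 as one token (it is not o200k_base), and `truncate_suffix_sketch` is a hypothetical free function, not the crate's method.

```rust
/// Hypothetical toy tokenizer (NOT o200k_base): every chunk of up to
/// 3 UTF-8 bytes is one token, so tokens may split a multi-byte char.
fn toy_encode(text: &str) -> Vec<Vec<u8>> {
    text.as_bytes().chunks(3).map(|c| c.to_vec()).collect()
}

/// Sketch of suffix truncation: one encode pass, then keep the bytes
/// of the last `max_tokens` tokens.
fn truncate_suffix_sketch(text: &str, max_tokens: u32) -> String {
    let max = max_tokens as usize;
    // A zero budget yields the empty string.
    if max == 0 {
        return String::new();
    }
    let tokens = toy_encode(text);
    // Fast path: the text already fits.
    if tokens.len() <= max {
        return text.to_string();
    }
    // Byte length covered by the last `max` tokens.
    let kept: usize = tokens[tokens.len() - max..].iter().map(|t| t.len()).sum();
    let mut start = text.len() - kept;
    // Move forward to a char boundary so the result stays valid UTF-8.
    while !text.is_char_boundary(start) {
        start += 1;
    }
    text[start..].to_string()
}
```

Here the boundary adjustment moves forward rather than backward, so a split character is dropped instead of exceeding the budget. For `"héllo wörld"` with a budget of 2, the cut at byte 9 falls inside `ö` and advances to byte 10, giving `"rld"`.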