bamboo_compression::counter

Struct TiktokenTokenCounter

pub struct TiktokenTokenCounter { /* private fields */ }

Accurate BPE-based token counter using OpenAI’s o200k_base encoding.

Uses tiktoken-rs with the vocabulary bundled at compile time, so no runtime downloads are needed. This is the recommended counter for production use.

Implementations

impl TiktokenTokenCounter

pub fn new(metadata_overhead: u32) -> Self

Creates a counter with a custom metadata overhead.

pub fn truncate_to_token_prefix(&self, text: &str, max_tokens: u32) -> String

Truncate text to at most max_tokens tokens, keeping the START.

Encodes the text once and decodes the first max_tokens tokens back to a string. This is O(N) (one encode plus one decode), whereas the previous find_prefix_within_tokens re-tokenized char by char in O(N²), calling count_text(&text[..i]) at every char index.
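
To make the quadratic cost concrete, here is a minimal sketch of the rescanning pattern described above. It is not the removed function's actual code: `toy_count` is a hypothetical whitespace-based stand-in for count_text, and `prefix_by_rescanning` is an illustrative name. Each loop iteration re-counts a prefix that is one char longer, so N chars cost roughly 1 + 2 + ... + N units of tokenizer work.

```rust
/// Hypothetical stand-in for `count_text`: one token per whitespace-separated word.
fn toy_count(text: &str) -> u32 {
    text.split_whitespace().count() as u32
}

/// Grows the prefix one char at a time and re-counts it from scratch each time.
/// Returns the longest prefix within budget and the number of counting calls made.
fn prefix_by_rescanning(text: &str, max_tokens: u32) -> (&str, usize) {
    let mut best = 0; // byte length of the longest prefix known to fit
    let mut calls = 0; // how many times the whole prefix was re-counted
    for (i, ch) in text.char_indices() {
        let end = i + ch.len_utf8();
        calls += 1;
        if toy_count(&text[..end]) > max_tokens {
            break;
        }
        best = end;
    }
    (&text[..best], calls)
}
```

For a 13-char input with a budget of two words, this sketch already re-counts nine prefixes before stopping; the encode-once approach does a single pass regardless of where the cut lands.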

Semantics

max_tokens == 0 → empty string (exactly 0 tokens; never exceeds budget).
Text already within max_tokens → returned unchanged (fast path).
Otherwise the result is an exact prefix of text (its START preserved), is valid UTF-8, and re-counts to ≤ max_tokens.

If the o200k encoder is unavailable (the issue #25 fallback path), this degrades to a conservative char-based cut instead of panicking.
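
The semantics above can be sketched with a toy tokenizer. Everything here is an assumption for illustration: `toy_token_ends` treats every 4 bytes as one token (real o200k_base tokens vary in length), and `truncate_prefix_sketch` borrows a `&str` rather than returning a `String`. Like real BPE tokens, the toy tokens can end in the middle of a multi-byte character, so the cut is moved back to a char boundary to keep the result valid UTF-8 and an exact prefix.

```rust
/// Hypothetical token width: every toy token covers up to 4 bytes of input.
const TOY_TOKEN_BYTES: usize = 4;

/// Toy "encode": returns the end byte offset of each token.
fn toy_token_ends(text: &str) -> Vec<usize> {
    let len = text.len();
    let mut ends = Vec::new();
    let mut pos = 0;
    while pos < len {
        pos = (pos + TOY_TOKEN_BYTES).min(len);
        ends.push(pos);
    }
    ends
}

/// Keeps the START of `text`, using at most `max_tokens` toy tokens.
fn truncate_prefix_sketch(text: &str, max_tokens: u32) -> &str {
    if max_tokens == 0 {
        return ""; // zero budget: empty result
    }
    let ends = toy_token_ends(text); // the single encode pass
    let max = max_tokens as usize;
    if ends.len() <= max {
        return text; // fast path: already within budget
    }
    // Cut where the last kept token ends, then back off to a char boundary.
    let mut cut = ends[max - 1];
    while !text.is_char_boundary(cut) {
        cut -= 1;
    }
    &text[..cut]
}
```

Backing off can only shorten the result, so in this sketch the truncated text never re-counts above the budget.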

pub fn truncate_to_token_suffix(&self, text: &str, max_tokens: u32) -> String

Truncate text to at most max_tokens tokens, keeping the END.

Symmetric to truncate_to_token_prefix: encodes once and decodes the last max_tokens tokens. Same budget / fast-path / fallback semantics; the result is a valid-UTF-8 exact suffix of text (its END preserved) that re-counts to ≤ max_tokens.
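
The mirrored case can be sketched the same way. As before, the names and the fixed 4-byte toy tokenizer are assumptions for illustration, not the crate's implementation. The only differences from the prefix case are that the kept range starts where the last dropped token ends, and that a start offset falling inside a multi-byte character is moved forward instead of backward.

```rust
/// Hypothetical token width: every toy token covers up to 4 bytes of input.
const TOY_TOKEN_BYTES: usize = 4;

/// Toy "encode": returns the end byte offset of each token.
fn toy_token_ends(text: &str) -> Vec<usize> {
    let len = text.len();
    let mut ends = Vec::new();
    let mut pos = 0;
    while pos < len {
        pos = (pos + TOY_TOKEN_BYTES).min(len);
        ends.push(pos);
    }
    ends
}

/// Keeps the END of `text`, using at most `max_tokens` toy tokens.
fn truncate_suffix_sketch(text: &str, max_tokens: u32) -> &str {
    if max_tokens == 0 {
        return ""; // zero budget: empty result
    }
    let ends = toy_token_ends(text); // the single encode pass
    let max = max_tokens as usize;
    if ends.len() <= max {
        return text; // fast path: already within budget
    }
    // The kept suffix begins where the last dropped token ends.
    let mut start = ends[ends.len() - max - 1];
    // Move forward to a char boundary so the suffix is valid UTF-8.
    while !text.is_char_boundary(start) {
        start += 1;
    }
    &text[start..]
}
```

Moving the start forward only shortens the suffix, so the budget still holds in this sketch.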

Trait Implementations

impl Debug for TiktokenTokenCounter

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for TiktokenTokenCounter

fn default() -> Self

Returns the “default value” for a type.

impl TokenCounter for TiktokenTokenCounter

fn count_message(&self, message: &Message) -> u32

Count tokens in a single message.

fn count_text(&self, text: &str) -> u32

Count tokens in a plain text string.

fn count_messages(&self, messages: &[Message]) -> u32

Count tokens in multiple messages.
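
How the three counting methods might relate can be sketched as follows. This is a guess at the shape, not the crate's definition: `Message` here is a hypothetical one-field struct, `TokenCounterSketch` and `WhitespaceCounter` are illustrative names, and adding a fixed `metadata_overhead` to every message is an assumption about what the constructor argument is for.

```rust
/// Hypothetical message type; the crate's real `Message` is not shown on this page.
struct Message {
    content: String,
}

/// Illustrative trait with the same three method names as `TokenCounter`.
trait TokenCounterSketch {
    fn count_text(&self, text: &str) -> u32;
    fn count_message(&self, message: &Message) -> u32;

    /// Default: the total for a slice is the sum of the per-message counts.
    fn count_messages(&self, messages: &[Message]) -> u32 {
        messages.iter().map(|m| self.count_message(m)).sum()
    }
}

/// Toy counter: one token per whitespace-separated word.
struct WhitespaceCounter {
    /// Assumed meaning: extra tokens charged once per message.
    metadata_overhead: u32,
}

impl TokenCounterSketch for WhitespaceCounter {
    fn count_text(&self, text: &str) -> u32 {
        text.split_whitespace().count() as u32
    }

    fn count_message(&self, message: &Message) -> u32 {
        self.count_text(&message.content) + self.metadata_overhead
    }
}
```

With an overhead of 4, a two-word message and a three-word message would total (2 + 4) + (3 + 4) = 13 tokens under these assumptions.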

Auto Trait Implementations

impl Freeze for TiktokenTokenCounter

impl RefUnwindSafe for TiktokenTokenCounter

impl Send for TiktokenTokenCounter

impl Sync for TiktokenTokenCounter

impl Unpin for TiktokenTokenCounter

impl UnsafeUnpin for TiktokenTokenCounter

impl UnwindSafe for TiktokenTokenCounter

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.