Struct Sequence

pub struct Sequence { /* private fields */ }

Maintains state for an ongoing sequence of tokens and their decoded text.

Mirrors the design of the native DecodeStream in the HuggingFace tokenizers crate but works through the dyn Tokenizer trait so it supports all backends (HuggingFace, Tiktoken, Mock).

Key design decisions (matching native DecodeStream):

Token draining: consumed tokens are drained from the buffer after each successful step, keeping memory bounded regardless of generation length.
Prefix caching: the decoded prefix string is cached between calls, avoiding a redundant decode() on the next step.

Implementations

impl Sequence

pub fn new(tokenizer: Arc<dyn TokenizerTrait>) -> Self

Create a new empty sequence

pub fn new_with_options( tokenizer: Arc<dyn TokenizerTrait>, skip_special_tokens: bool, ) -> Self

Create a new empty sequence, choosing whether special tokens are skipped during decoding (skip_special_tokens).

pub fn with_tokens( tokenizer: Arc<dyn TokenizerTrait>, token_ids: Vec<TokenIdType>, ) -> Self

Create a sequence with initial tokens

pub fn with_tokens_and_options( tokenizer: Arc<dyn TokenizerTrait>, token_ids: Vec<TokenIdType>, skip_special_tokens: bool, ) -> Self

Create a sequence with initial tokens, choosing whether special tokens are skipped during decoding (skip_special_tokens).

pub fn is_empty(&self) -> bool

Check if the sequence is empty

pub fn len(&self) -> usize

Get the total number of tokens appended (logical length, not buffer size)

pub fn clear(&mut self)

Clear the sequence

pub fn append_text( &mut self, input: &str, add_special_tokens: bool, ) -> Result<()>

Append text to the sequence by encoding it.

WARNING: Do not mix append_text() and append_token() on the same instance. append_text() does not invalidate the incremental decode cache (cached_prefix/prefix_index), so subsequent append_token() calls would diff against stale state.

Set add_special_tokens to true for embeddings, or false for chat completion where the chat template already handles special tokens.

pub fn append_token(&mut self, token_id: TokenIdType) -> Result<String>

Append a single token to the sequence and return newly decoded text.

Delegates to Decoder::decode_step on the tokenizer trait. For HuggingFace tokenizers this uses the native step_decode_stream; other backends use the default double-decode fallback. Both paths handle token draining and prefix caching internally.
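The double-decode fallback can be pictured with a minimal sketch. Everything named here (ToyDecoder, StepState, ByteVocab, stream_demo) is a hypothetical stand-in, not this crate's API. The step logic is modeled on the approach used by the native decode stream in the HuggingFace tokenizers crate as I understand it, and the treatment of a prefix mismatch is a simplification.

```rust
use std::collections::HashMap;

/// Hypothetical per-stream state; field names echo the cache described above.
#[derive(Default)]
struct StepState {
    ids: Vec<u32>,         // retained window of token ids
    cached_prefix: String, // decoded text of the window after the last emit
    prefix_index: usize,   // leading window tokens the next successful step drops
}

/// Hypothetical stand-in for the tokenizer trait (not the crate's real trait).
trait ToyDecoder {
    fn decode(&self, ids: &[u32]) -> String;

    /// Default "double-decode" step: decode the window, emit the new suffix,
    /// drain consumed tokens, then decode again to refresh the cached prefix.
    fn decode_step(&self, id: u32, st: &mut StepState) -> Option<String> {
        st.ids.push(id);
        let text = self.decode(&st.ids);
        // A trailing U+FFFD means the newest token ended mid-character,
        // so nothing is emitted until more tokens arrive.
        if text.len() <= st.cached_prefix.len() || text.ends_with('\u{FFFD}') {
            return None;
        }
        // Simplification for this sketch: a prefix mismatch yields None
        // (the pushed id stays in the window); a real backend may report an error.
        let new_text = text.strip_prefix(st.cached_prefix.as_str())?.to_string();
        let next_prefix_index = st.ids.len() - st.prefix_index;
        // Token draining: keep only the ids from prefix_index onward.
        st.ids = st.ids.split_off(st.prefix_index);
        // Prefix caching: the second decode is stored for the next step.
        st.cached_prefix = self.decode(&st.ids);
        st.prefix_index = next_prefix_index;
        Some(new_text)
    }
}

/// Toy byte-level vocabulary: each id maps to a chunk of raw bytes.
struct ByteVocab(HashMap<u32, Vec<u8>>);

impl ToyDecoder for ByteVocab {
    fn decode(&self, ids: &[u32]) -> String {
        let bytes: Vec<u8> = ids
            .iter()
            .filter_map(|id| self.0.get(id))
            .flatten()
            .copied()
            .collect();
        String::from_utf8_lossy(&bytes).into_owned()
    }
}

/// Streams four tokens; ids 3 and 4 are the two UTF-8 bytes of 'é'.
/// Returns the per-step output and the final window size.
fn stream_demo() -> (Vec<Option<String>>, usize) {
    let vocab = ByteVocab(HashMap::from([
        (1, b"Hi".to_vec()),
        (2, b" there".to_vec()),
        (3, vec![0xC3]),
        (4, vec![0xA9]),
    ]));
    let mut st = StepState::default();
    let steps: Vec<Option<String>> =
        (1..=4u32).map(|id| vocab.decode_step(id, &mut st)).collect();
    (steps, st.ids.len())
}
```

In this sketch the third step returns nothing because the token ends in the middle of a character, and the fourth step emits the completed character. The window holds two ids at the end even though four tokens were appended.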

pub fn tokenizer(&self) -> &Arc<dyn TokenizerTrait>

Get a reference to the tokenizer

pub fn token_ids(&self) -> &[TokenIdType]

Get the current token ids in the buffer (sliding window, not full history)

pub fn text(&self) -> Result<String>

Decode the current buffer to text.

WARNING: after append_token() calls, this only decodes the sliding window (retained tokens), not the full sequence history. Use the incremental return values from append_token() to build the full text.
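The difference between the window and the full history can be illustrated with a deliberately tiny model. ToySequence and stream_all below are hypothetical and are not this crate's types: each "token" is already a text piece, so decoding is plain concatenation, and the window is assumed to keep only the newest token.

```rust
/// Hypothetical miniature of a sliding-window sequence (not the crate's type).
struct ToySequence {
    window: Vec<String>, // retained tokens only
    total: usize,        // logical length, counting drained tokens too
}

impl ToySequence {
    fn new() -> Self {
        ToySequence { window: Vec::new(), total: 0 }
    }

    /// Returns the newly decoded text, then drains all but the newest token.
    fn append_token(&mut self, piece: &str) -> String {
        self.window.push(piece.to_string());
        self.total += 1;
        let keep_from = self.window.len() - 1;
        self.window = self.window.split_off(keep_from);
        piece.to_string()
    }

    /// Decodes the retained window only, not the full history.
    fn text(&self) -> String {
        self.window.concat()
    }

    /// Logical length: every token ever appended.
    fn len(&self) -> usize {
        self.total
    }
}

/// Caller-side accumulation: concatenate what each append_token call returns.
/// Returns (accumulated text, window text, logical length).
fn stream_all(pieces: &[&str]) -> (String, String, usize) {
    let mut seq = ToySequence::new();
    let mut full = String::new();
    for piece in pieces {
        full.push_str(&seq.append_token(piece));
    }
    (full, seq.text(), seq.len())
}
```

Streaming the pieces "Hello", "," and " world" through stream_all yields the accumulated text "Hello, world", while text() on the toy sequence sees only " world" and len() still reports 3.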

pub fn skip_special_tokens(&self) -> bool

Get whether special tokens are skipped during decoding

Trait Implementations

impl Debug for Sequence

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

impl !UnwindSafe for Sequence

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

Source §

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
