Struct Sequence 

pub struct Sequence { /* private fields */ }

Maintains state for an ongoing sequence of tokens and their decoded text.

Mirrors the design of the native DecodeStream in the HuggingFace tokenizers crate, but works through the dyn Tokenizer trait, so it supports all backends (HuggingFace, Tiktoken, Mock).

Key design decisions (matching native DecodeStream):

  • Token draining: consumed tokens are drained from the buffer after each successful step, keeping memory bounded regardless of generation length.
  • Prefix caching: the decoded prefix string is cached between calls, avoiding a redundant decode() on the next step.
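
The two decisions above can be illustrated with a minimal sketch. It is not this crate's implementation: ToyStream, toy_decode, and the one-byte-per-token vocabulary are assumptions made purely for illustration, and the control flow only loosely follows the step-decode idea behind the HuggingFace DecodeStream.

```rust
/// Toy decoder (assumption): every token id is one byte, and bytes that do
/// not yet form valid UTF-8 decode to U+FFFD.
fn toy_decode(ids: &[u8]) -> String {
    String::from_utf8_lossy(ids).into_owned()
}

/// Hypothetical stand-in for the state a streaming decoder keeps.
struct ToyStream {
    ids: Vec<u8>,        // retained tokens only; older ones are drained
    prefix: String,      // cached decode of the retained context
    prefix_index: usize, // how many leading `ids` belong to the old context
    total: usize,        // logical length: every token ever appended
}

impl ToyStream {
    fn new() -> Self {
        ToyStream { ids: Vec::new(), prefix: String::new(), prefix_index: 0, total: 0 }
    }

    /// Push one token; return the newly completed text, or None while the
    /// buffer still ends in an incomplete character.
    fn step(&mut self, id: u8) -> Option<String> {
        self.ids.push(id);
        self.total += 1;
        let text = toy_decode(&self.ids);
        if text.len() <= self.prefix.len() || text.ends_with('\u{FFFD}') {
            return None; // nothing new is printable yet; keep buffering
        }
        // Prefix caching: the old context is not decoded again here.
        let new_text = text.strip_prefix(self.prefix.as_str())?.to_string();
        // Token draining: drop the old context and keep the tokens just
        // consumed as context for the next step.
        let new_prefix_index = self.ids.len() - self.prefix_index;
        self.ids.drain(..self.prefix_index);
        self.prefix = toy_decode(&self.ids);
        self.prefix_index = new_prefix_index;
        Some(new_text)
    }
}

/// Streams the bytes of "hié!" and reports
/// (chunks, retained token count, logical length).
fn demo_stream() -> (Vec<Option<String>>, usize, usize) {
    let mut stream = ToyStream::new();
    let chunks: Vec<Option<String>> = "hié!".bytes().map(|b| stream.step(b)).collect();
    (chunks, stream.ids.len(), stream.total)
}
```

In this sketch the first byte of "é" produces no output because the buffer still ends in an incomplete character; the character is emitted once its second byte arrives. The buffer never holds more than three tokens, while the logical length keeps counting every appended token.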

Implementations

impl Sequence

pub fn new(tokenizer: Arc<dyn TokenizerTrait>) -> Self

Create a new empty sequence.

pub fn new_with_options(tokenizer: Arc<dyn TokenizerTrait>, skip_special_tokens: bool) -> Self

Create a new empty sequence with an explicit skip_special_tokens setting, which controls whether special tokens are skipped during decoding.

pub fn with_tokens(tokenizer: Arc<dyn TokenizerTrait>, token_ids: Vec<TokenIdType>) -> Self

Create a sequence with initial tokens.

pub fn with_tokens_and_options(tokenizer: Arc<dyn TokenizerTrait>, token_ids: Vec<TokenIdType>, skip_special_tokens: bool) -> Self

Create a sequence with initial tokens and an explicit skip_special_tokens setting.

pub fn is_empty(&self) -> bool

Check whether the sequence is empty.

pub fn len(&self) -> usize

Get the total number of tokens appended so far. This is the logical length of the sequence, not the size of the retained buffer, which shrinks as consumed tokens are drained.

pub fn clear(&mut self)

Clear the sequence.

pub fn append_text(&mut self, input: &str, add_special_tokens: bool) -> Result<()>

Append text to the sequence by encoding it.

WARNING: Do not mix append_text() and append_token() on the same instance. append_text() does not invalidate the incremental decode cache (cached_prefix/prefix_index), so a subsequent append_token() call would diff against stale state.

Set add_special_tokens to true when encoding for embeddings, and to false for chat completion, where the chat template already handles special tokens.
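
As a rough illustration of why the flag matters, the sketch below uses a toy encoder and a toy chat template. Everything in it is hypothetical (the BOS id, toy_encode, toy_chat_template, and the word-length ids); it is not the behavior of any real backend. It only shows how a special token can end up duplicated when both the template and the encoder add it.

```rust
/// Hypothetical beginning-of-sequence id; real tokenizers define their own.
const BOS: u32 = 1;

/// Toy encoder (assumption): one id per whitespace-separated word, with the
/// literal marker "<s>" mapped to BOS.
fn toy_encode(text: &str, add_special_tokens: bool) -> Vec<u32> {
    let mut ids = Vec::new();
    if add_special_tokens {
        ids.push(BOS); // the encoder itself prepends the special token
    }
    for word in text.split_whitespace() {
        ids.push(if word == "<s>" { BOS } else { 100 + word.len() as u32 });
    }
    ids
}

/// Toy chat template (assumption): it already emits the BOS marker.
fn toy_chat_template(user_message: &str) -> String {
    format!("<s> {user_message}")
}

/// Counts how many BOS tokens an encoded sequence contains.
fn count_bos(ids: &[u32]) -> usize {
    ids.iter().filter(|&&id| id == BOS).count()
}
```

With these stand-ins, encoding plain text with add_special_tokens set to true yields a single BOS, whereas encoding templated text with the flag set to true yields two. Passing false for templated input keeps exactly one.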

pub fn append_token(&mut self, token_id: TokenIdType) -> Result<String>

Append a single token to the sequence and return the newly decoded text.

Delegates to Decoder::decode_step on the tokenizer trait. For HuggingFace tokenizers this uses the native step_decode_stream; other backends use the default double-decode fallback. Both paths handle token draining and prefix caching internally.

pub fn tokenizer(&self) -> &Arc<dyn TokenizerTrait>

Get a reference to the tokenizer.

pub fn token_ids(&self) -> &[TokenIdType]

Get the token ids currently held in the buffer. This is a sliding window, not the full history.

pub fn text(&self) -> Result<String>

Decode the current buffer to text.

WARNING: after append_token() calls, this decodes only the sliding window (the retained tokens), not the full sequence history. To build the full text, concatenate the incremental return values of append_token().
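
The accumulation pattern recommended in the warning might look like the following sketch. The closure passed as step stands in for a call such as sequence.append_token(id); collect_stream, demo_collect, and the four-entry vocabulary are hypothetical helpers, and the sketch assumes a step may return an empty string when no text is ready yet.

```rust
/// Concatenates the chunks returned by a per-token decode step.
/// `step` stands in for a call such as `sequence.append_token(id)`.
fn collect_stream<E, F>(ids: &[u32], mut step: F) -> Result<String, E>
where
    F: FnMut(u32) -> Result<String, E>,
{
    let mut full = String::new();
    for &id in ids {
        // An empty chunk simply appends nothing.
        full.push_str(&step(id)?);
    }
    Ok(full)
}

/// Demo with a fake step (assumption): a four-entry vocabulary and a
/// "window" that retains only the most recent piece, mimicking a drained
/// buffer. Returns (accumulated full text, final window contents).
fn demo_collect() -> (String, String) {
    let vocab = ["Hel", "lo", ",", " world"];
    let mut window = String::new();
    let full = collect_stream(&[0, 1, 2, 3], |id| -> Result<String, String> {
        let piece = vocab
            .get(id as usize)
            .ok_or_else(|| format!("unknown token id {id}"))?;
        window = piece.to_string(); // only the latest piece is retained
        Ok(piece.to_string())
    })
    .expect("all ids are in the toy vocabulary");
    (full, window)
}
```

Here the accumulated string holds the whole output, while the window holds only the last piece. This is the same gap that exists between concatenating append_token() results and calling text() on a drained buffer.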

pub fn skip_special_tokens(&self) -> bool

Get whether special tokens are skipped during decoding.

Trait Implementations

impl Debug for Sequence

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
