pub struct StreamingDecoder<'a> { /* private fields */ }
A streaming decoder that yields well-formed UTF-8 slices as tokens arrive.
The decoder holds a reference to its parent OxiTokenizer so that
special-token handling, vocabulary lookup and byte-level decoding remain
consistent with OxiTokenizer::decode.
Implementations
impl<'a> StreamingDecoder<'a>
pub fn new(tokenizer: &'a OxiTokenizer) -> Self
Create a fresh decoder tied to tokenizer.
pub fn push_token(&mut self, id: u32) -> Option<String>
Push a single token ID and return the next well-formed UTF-8 slice, if
any. Returns None when the token’s bytes, combined with any previously
pending bytes, do not complete at least one UTF-8 character.
The returned String contains all characters that became complete as
a result of this push. It may hold multiple characters if the token
carries several whole code points.
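The buffering behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `PendingUtf8` and `push_bytes` are hypothetical names, and the sketch takes a token's raw bytes directly instead of looking up a token ID in an OxiTokenizer vocabulary.

```rust
/// Hypothetical stand-in for the decoder's pending buffer (not the crate's code).
struct PendingUtf8 {
    pending: Vec<u8>,
}

impl PendingUtf8 {
    fn new() -> Self {
        Self { pending: Vec::new() }
    }

    /// Append one token's bytes and return the characters that are now complete.
    /// Assumption: the input is a split-up valid UTF-8 stream; a genuinely
    /// invalid byte would stall this sketch instead of being reported.
    fn push_bytes(&mut self, bytes: &[u8]) -> Option<String> {
        self.pending.extend_from_slice(bytes);
        // Length of the longest valid UTF-8 prefix of the buffer.
        let valid = match std::str::from_utf8(&self.pending) {
            Ok(s) => s.len(),
            Err(e) => e.valid_up_to(),
        };
        if valid == 0 {
            return None;
        }
        // Move the valid prefix out; the incomplete tail stays pending.
        let rest = self.pending.split_off(valid);
        let done = std::mem::replace(&mut self.pending, rest);
        Some(String::from_utf8(done).expect("prefix was validated above"))
    }
}
```

Feeding the two bytes of "é" (0xC3, 0xA9) in separate calls yields None for the first call and the complete character on the second.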
pub fn push_tokens(&mut self, ids: &[u32]) -> Option<String>
Push many tokens at once. Equivalent to calling Self::push_token
repeatedly, but returns only once, with all completed characters
concatenated.
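The stated equivalence with repeated single-token pushes can be sketched as a fold over per-token results. `push_many` is a hypothetical helper, and the closure stands in for a single-token push. Returning None when nothing completed is an assumption of this sketch; the documentation above does not say what the batch call returns in that case.

```rust
/// Hypothetical helper: push each ID through `push_one` and concatenate
/// whatever text became complete along the way.
fn push_many<F>(ids: &[u32], mut push_one: F) -> Option<String>
where
    F: FnMut(u32) -> Option<String>,
{
    let mut out = String::new();
    for &id in ids {
        if let Some(piece) = push_one(id) {
            out.push_str(&piece);
        }
    }
    // Assumption: an empty result is reported as None.
    if out.is_empty() {
        None
    } else {
        Some(out)
    }
}
```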
pub fn finish(self) -> TokenizerResult<String>
Finish the stream and return any remaining bytes as a String.
Returns an error if the pending buffer still contains an incomplete
UTF-8 sequence (strict mode). If lossy finishing is desired, use
Self::finish_lossy instead.
pub fn finish_lossy(self) -> String
Finish the stream, replacing any trailing invalid bytes with
\u{FFFD}. Never fails.
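The difference between strict and lossy finishing mirrors two standard-library conversions. The sketch below uses `String::from_utf8` and `String::from_utf8_lossy` on a leftover byte buffer. The function names are hypothetical, and the strict variant returns the standard library's error type here, not the crate's TokenizerResult.

```rust
/// Strict finishing: fails if the leftover bytes are not complete UTF-8.
fn finish_strict_sketch(pending: Vec<u8>) -> Result<String, std::string::FromUtf8Error> {
    String::from_utf8(pending)
}

/// Lossy finishing: invalid or truncated bytes become U+FFFD, so it cannot fail.
fn finish_lossy_sketch(pending: &[u8]) -> String {
    String::from_utf8_lossy(pending).into_owned()
}
```

With a buffer such as `[b'o', b'k', 0xC3]`, where 0xC3 starts a two-byte character that never finishes, the strict variant returns an error while the lossy variant returns "ok" followed by the replacement character.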
pub fn pending_len(&self) -> usize
Number of bytes currently held in the pending buffer.
A non-zero value after a push_token call indicates that the last
token ended mid-UTF-8-sequence.
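One way to see what "ended mid-UTF-8-sequence" means is to measure the truncated tail of a byte slice with `std::str::Utf8Error`. `incomplete_tail_len` is a hypothetical helper for illustration, not part of this API; it relies on `error_len()` returning None when the input ends inside a character.

```rust
/// Hypothetical helper: number of trailing bytes that form the start of a
/// character whose remaining bytes have not arrived yet.
fn incomplete_tail_len(bytes: &[u8]) -> usize {
    match std::str::from_utf8(bytes) {
        // Everything decoded; nothing is pending.
        Ok(_) => 0,
        // error_len() == None means the input ended mid-character.
        Err(e) if e.error_len().is_none() => bytes.len() - e.valid_up_to(),
        // An invalid byte inside the input is not a truncated tail.
        // This sketch does not look past it.
        Err(_) => 0,
    }
}
```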
pub fn reset(&mut self)
Reset the decoder state while keeping the OxiTokenizer reference.
Useful when processing multiple independent streams with one decoder.
pub fn total_bytes(&self) -> usize
Total bytes processed since construction or last Self::reset.
pub fn total_tokens(&self) -> usize
Total tokens processed since construction or last Self::reset.
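The relationship between the counters and a reset can be sketched with a small bookkeeping struct. `StreamCounters` and `record` are hypothetical. The sketch assumes that total_bytes counts the bytes of every pushed token (not only the bytes already emitted as text), which the documentation above does not spell out.

```rust
/// Hypothetical bookkeeping, for illustration only.
#[derive(Default)]
struct StreamCounters {
    pending: Vec<u8>,
    total_bytes: usize,
    total_tokens: usize,
}

impl StreamCounters {
    /// Account for one pushed token's bytes.
    fn record(&mut self, token_bytes: &[u8]) {
        self.pending.extend_from_slice(token_bytes);
        self.total_bytes += token_bytes.len();
        self.total_tokens += 1;
    }

    /// Clear the buffer and counters so the next stream starts from zero.
    fn reset(&mut self) {
        self.pending.clear();
        self.total_bytes = 0;
        self.total_tokens = 0;
    }
}
```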