pub struct HydraByteTokenizer { /* private fields */ }
Byte-level tokenizer that matches Hydra’s training tokenizer.
Uses the same encoding as the Python SimpleTokenizer:
- PAD = 0, EOS = 1, BOS = 2
- Byte values 0-255 map to token IDs 3-258
- Sequences are wrapped with BOS and EOS tokens
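The mapping above can be sketched as a stand-alone function. This is a minimal illustration of the documented scheme, not the crate's implementation: `encode_sketch` is a hypothetical name, and the constants are copied from the values listed on this page.

```rust
// Values copied from the documented constants on this page.
const EOS_TOKEN_ID: u32 = 1;
const BOS_TOKEN_ID: u32 = 2;
const BYTE_OFFSET: u32 = 3;

/// Hypothetical helper (not part of the documented API):
/// emits BOS, then one token per UTF-8 byte (byte value + 3), then EOS.
fn encode_sketch(text: &str) -> Vec<u32> {
    let mut ids = Vec::with_capacity(text.len() + 2);
    ids.push(BOS_TOKEN_ID);
    ids.extend(text.bytes().map(|b| u32::from(b) + BYTE_OFFSET));
    ids.push(EOS_TOKEN_ID);
    ids
}
```

Under this sketch, `encode_sketch("Hi")` yields `[2, 75, 108, 1]`: `H` is byte 72 and `i` is byte 105, each shifted by 3. Multi-byte UTF-8 characters produce one token per byte.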
Implementations
impl HydraByteTokenizer
pub const PAD_TOKEN_ID: u32 = 0
PAD token ID
pub const EOS_TOKEN_ID: u32 = 1
EOS token ID
pub const BOS_TOKEN_ID: u32 = 2
BOS token ID
pub const BYTE_OFFSET: u32 = 3
Offset for byte values (first 3 IDs reserved for special tokens)
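Because the offset reserves IDs 0 to 2, the inverse mapping subtracts it again. The sketch below assumes that special tokens and out-of-range IDs are simply dropped when decoding; this page does not document the crate's actual decoding behavior, and `decode_sketch` is a hypothetical name.

```rust
use std::convert::TryFrom;

// Value copied from the documented BYTE_OFFSET constant.
const BYTE_OFFSET: u32 = 3;

/// Hypothetical helper (not part of the documented API):
/// maps token IDs back to raw bytes.
/// Assumption: IDs below BYTE_OFFSET (PAD, EOS, BOS) and IDs above 258 are skipped.
fn decode_sketch(ids: &[u32]) -> Vec<u8> {
    ids.iter()
        // checked_sub returns None for the reserved IDs 0, 1 and 2.
        .filter_map(|&id| id.checked_sub(BYTE_OFFSET))
        // try_from fails for values above 255, i.e. IDs above 258.
        .filter_map(|b| u8::try_from(b).ok())
        .collect()
}
```

The result is a byte vector and not a `String`, since an arbitrary token sequence is not guaranteed to be valid UTF-8.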
pub fn with_max_length(max_length: usize) -> Self
Creates a tokenizer with a custom maximum length.
Trait Implementations
impl Clone for HydraByteTokenizer
fn clone(&self) -> HydraByteTokenizer
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for HydraByteTokenizer
impl Default for HydraByteTokenizer
impl HydraTokenizer for HydraByteTokenizer
Auto Trait Implementations
impl Freeze for HydraByteTokenizer
impl RefUnwindSafe for HydraByteTokenizer
impl Send for HydraByteTokenizer
impl Sync for HydraByteTokenizer
impl Unpin for HydraByteTokenizer
impl UnwindSafe for HydraByteTokenizer
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.