pub struct CustomFormatTokenizer { /* private fields */ }
Custom format tokenizer implementation
Implementations
impl CustomFormatTokenizer
pub fn from_format(format: CustomTokenizerFormat) -> Result<Self>
Creates a new tokenizer from a custom format.
pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self>
Loads a tokenizer from a custom format file.
pub fn save_to_file<P: AsRef<Path>>(&self, path: P) -> Result<()>
Saves the tokenizer to a custom format file.
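Both from_file and save_to_file take P: AsRef<Path>, so a caller can pass a string literal, a String, a Path, or a PathBuf. The sketch below illustrates only that generic bound. The function accepts_path and the file name tokenizer.bin are hypothetical stand-ins, and nothing here reads or writes a tokenizer file.

```rust
use std::path::{Path, PathBuf};

// Hypothetical stand-in with the same generic bound as from_file and
// save_to_file. It only echoes the path back and performs no I/O.
fn accepts_path<P: AsRef<Path>>(path: P) -> PathBuf {
    path.as_ref().to_path_buf()
}

// Each call passes a different type that implements AsRef<Path>.
fn demo_paths() -> Vec<PathBuf> {
    vec![
        accepts_path("tokenizer.bin"),
        accepts_path(String::from("tokenizer.bin")),
        accepts_path(Path::new("tokenizer.bin")),
        accepts_path(PathBuf::from("tokenizer.bin")),
    ]
}
```

Since both methods return a Result, a real call site would normally propagate the error with ? or match on it instead of unwrapping.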
pub fn with_max_length(self, max_length: Option<usize>) -> Self
Sets the maximum sequence length.
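The signature of with_max_length takes self by value and returns Self, which is the usual builder shape: the call consumes the tokenizer and hands back the updated one, so it chains directly onto a constructor. A minimal sketch of that shape, using a hypothetical SketchTokenizer (the real struct's fields are private, and the meaning of None as "no limit" is an assumption, not something the documentation states):

```rust
// Hypothetical stand-in for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct SketchTokenizer {
    max_length: Option<usize>,
}

impl SketchTokenizer {
    fn new() -> Self {
        Self { max_length: None }
    }

    // Same shape as with_max_length: consumes self, returns the updated value.
    fn with_max_length(mut self, max_length: Option<usize>) -> Self {
        self.max_length = max_length;
        self
    }
}

fn demo_builder() -> (SketchTokenizer, SketchTokenizer) {
    // Chain the builder call onto construction.
    let capped = SketchTokenizer::new().with_max_length(Some(512));
    // The call consumes its receiver, so clone first to keep the original.
    let uncapped = capped.clone().with_max_length(None);
    (capped, uncapped)
}
```

Because CustomFormatTokenizer implements Clone, the same clone-then-configure pattern should apply when the original value is still needed.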
pub fn vocab_size(&self) -> usize
Returns the vocabulary size.
pub fn token_to_id(&self, token: &str) -> Option<u32>
Returns the ID for the given token, if any.
pub fn id_to_token(&self, id: u32) -> Option<String>
Returns the token for the given ID, if any.
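The lookup methods token_to_id and id_to_token both return Option, so a miss has to be handled by the caller. The sketch below mirrors those three signatures on a hypothetical SketchVocab to show a round trip. The internal layout and the ID assignment order are assumptions made for the example, not a description of how CustomFormatTokenizer stores its vocabulary.

```rust
use std::collections::HashMap;

// Hypothetical stand-in; the real struct's fields are private and may differ.
struct SketchVocab {
    token_to_id: HashMap<String, u32>,
    id_to_token: Vec<String>,
}

impl SketchVocab {
    // Assigns IDs in order of first appearance, starting at 0 (an assumption).
    fn from_tokens(tokens: &[&str]) -> Self {
        let mut token_to_id = HashMap::new();
        let mut id_to_token = Vec::new();
        for &token in tokens {
            if !token_to_id.contains_key(token) {
                token_to_id.insert(token.to_string(), id_to_token.len() as u32);
                id_to_token.push(token.to_string());
            }
        }
        Self { token_to_id, id_to_token }
    }

    fn vocab_size(&self) -> usize {
        self.id_to_token.len()
    }

    fn token_to_id(&self, token: &str) -> Option<u32> {
        self.token_to_id.get(token).copied()
    }

    fn id_to_token(&self, id: u32) -> Option<String> {
        self.id_to_token.get(id as usize).cloned()
    }
}

// Maps a token to its ID and back; returns None if either lookup misses.
fn round_trip(vocab: &SketchVocab, token: &str) -> Option<String> {
    let id = vocab.token_to_id(token)?;
    vocab.id_to_token(id)
}
```

The ? operator on Option keeps the miss case explicit without nesting matches.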
Trait Implementations
impl Clone for CustomFormatTokenizer
fn clone(&self) -> CustomFormatTokenizer
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for CustomFormatTokenizer
impl Tokenizer for CustomFormatTokenizer
fn encode(&self, text: &str) -> Result<TokenizedInput>
Encodes a single text string into tokens.
fn encode_pair(&self, text_a: &str, text_b: &str) -> Result<TokenizedInput>
Encodes a pair of texts for sequence-pair tasks.
fn vocab_size(&self) -> usize
Returns the size of the tokenizer’s vocabulary.
fn get_vocab(&self) -> HashMap<String, u32>
Returns a copy of the vocabulary as a mapping from tokens to IDs.
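Since get_vocab returns an owned HashMap<String, u32>, a caller is free to consume it, for example to build a reverse ID-to-token table for decoding. The sketch below shows one way to do that. Both invert_vocab and sample_vocab are hypothetical helpers, and the sample entries stand in for a real get_vocab() result.

```rust
use std::collections::HashMap;

// Builds an ID-to-token table from a token-to-ID map of the shape that
// get_vocab returns. If two tokens share an ID, which one is kept is
// unspecified, because HashMap iteration order is arbitrary.
fn invert_vocab(vocab: HashMap<String, u32>) -> HashMap<u32, String> {
    vocab.into_iter().map(|(token, id)| (id, token)).collect()
}

// Hypothetical sample data standing in for a real get_vocab() result.
fn sample_vocab() -> HashMap<String, u32> {
    HashMap::from([
        ("[PAD]".to_string(), 0),
        ("hello".to_string(), 1),
        ("world".to_string(), 2),
    ])
}
```

Because each call to get_vocab copies the whole vocabulary, it is usually worth calling it once and reusing the result when many lookups are needed.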
Auto Trait Implementations
impl Freeze for CustomFormatTokenizer
impl RefUnwindSafe for CustomFormatTokenizer
impl Send for CustomFormatTokenizer
impl Sync for CustomFormatTokenizer
impl Unpin for CustomFormatTokenizer
impl UnsafeUnpin for CustomFormatTokenizer
impl UnwindSafe for CustomFormatTokenizer
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.