Struct rust_tokenizers::vocab::BaseVocab
pub struct BaseVocab {
pub values: HashMap<String, i64>,
pub indices: HashMap<i64, String>,
pub special_token_map: SpecialTokenMap,
pub special_values: HashMap<String, i64>,
pub special_indices: HashMap<i64, String>,
}
BaseVocab
Base vocabulary with an [UNK] unknown token, used as a pre-tokenization step for BERT-class tokenizers. Expects a flat text vocabulary when created from a file.
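For illustration, here is a minimal sketch of reading a flat text vocabulary using only the standard library. It assumes one token per line, with each token's ID equal to its zero-based line number (the layout of BERT's vocab.txt files). read_flat_vocab is a hypothetical helper, not part of rust_tokenizers. It builds only the values and indices maps described under Fields.

```rust
use std::collections::HashMap;

/// Hypothetical parser for a flat vocabulary (not the crate's implementation).
/// ASSUMPTION: one token per line, ID = zero-based line number.
pub fn read_flat_vocab(text: &str) -> (HashMap<String, i64>, HashMap<i64, String>) {
    let mut values = HashMap::new(); // token -> ID (encoder base)
    let mut indices = HashMap::new(); // ID -> token (decoder base)
    for (id, line) in text.lines().enumerate() {
        // Trim surrounding whitespace, including a stray '\r' from CRLF files.
        let token = line.trim().to_string();
        values.insert(token.clone(), id as i64);
        indices.insert(id as i64, token);
    }
    (values, indices)
}
```

For a file containing [PAD], [UNK] and hello on three lines, this sketch maps [UNK] to 1 and 2 back to hello.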
Fields
values: HashMap<String, i64>
A mapping of token strings to indices (i.e. the encoder base)
indices: HashMap<i64, String>
A mapping of token ids to strings (i.e. the decoder base)
special_token_map: SpecialTokenMap
Special tokens used by the vocabulary
special_values: HashMap<String, i64>
A mapping of special token strings to IDs (i.e. the encoder base for special values). Special values typically include BOS/EOS markers, class markers, mask markers and padding markers
special_indices: HashMap<i64, String>
A mapping of special token IDs to strings (i.e. the decoder base for special values)
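Each encoder map has a matching decoder map that is its inverse. The sketch below uses a hypothetical VocabMaps struct that mirrors only the four map fields (special_token_map is left out). It also uses a hypothetical invert helper to derive each decoder map. The sketch assumes that the special maps reuse the IDs that the same tokens already have in values. Neither name is part of rust_tokenizers.

```rust
use std::collections::HashMap;

/// Hypothetical helper: build a decoder map (ID -> token) from an encoder map (token -> ID).
pub fn invert(values: &HashMap<String, i64>) -> HashMap<i64, String> {
    values.iter().map(|(token, id)| (*id, token.clone())).collect()
}

/// Hypothetical stand-in mirroring BaseVocab's map fields (special_token_map omitted).
pub struct VocabMaps {
    pub values: HashMap<String, i64>,
    pub indices: HashMap<i64, String>,
    pub special_values: HashMap<String, i64>,
    pub special_indices: HashMap<i64, String>,
}

impl VocabMaps {
    /// Builds all four maps from an ordered token list and the subset treated as special.
    pub fn new(tokens: &[&str], special: &[&str]) -> Self {
        let values: HashMap<String, i64> = tokens
            .iter()
            .enumerate()
            .map(|(id, t)| (t.to_string(), id as i64))
            .collect();
        // ASSUMPTION: special tokens keep the IDs already assigned in `values`.
        let special_values: HashMap<String, i64> = special
            .iter()
            .filter_map(|t| values.get(*t).map(|id| (t.to_string(), *id)))
            .collect();
        let indices = invert(&values);
        let special_indices = invert(&special_values);
        VocabMaps { values, indices, special_values, special_indices }
    }
}
```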
Trait Implementations
impl Vocab for BaseVocab
fn get_unknown_value(&self) -> &str
Returns the unknown token string of the instance
fn special_indices(&self) -> &HashMap<i64, String>
Return the map of token IDs to strings for special values
fn values_mut(&mut self) -> &mut HashMap<String, i64>
Return a mutable reference to the map of token strings to IDs
fn indices_mut(&mut self) -> &mut HashMap<i64, String>
Return a mutable reference to the map of token IDs to strings
fn special_values_mut(&mut self) -> &mut HashMap<String, i64>
Return a mutable reference to the map of token strings to IDs for special values
fn special_indices_mut(&mut self) -> &mut HashMap<i64, String>
Return a mutable reference to the map of token IDs to strings for special values
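The mutable accessors let generic code update a vocabulary's maps without knowing its concrete type. In the sketch below, the hypothetical MapsMut trait copies the four accessor signatures listed above. The hypothetical mark_special function then uses them to register a token that already exists as a special token, keeping the special encoder and decoder maps consistent. Neither name belongs to rust_tokenizers.

```rust
use std::collections::HashMap;

/// Hypothetical trait copying the four `*_mut` accessor signatures listed above.
pub trait MapsMut {
    fn values_mut(&mut self) -> &mut HashMap<String, i64>;
    fn indices_mut(&mut self) -> &mut HashMap<i64, String>;
    fn special_values_mut(&mut self) -> &mut HashMap<String, i64>;
    fn special_indices_mut(&mut self) -> &mut HashMap<i64, String>;
}

/// Hypothetical helper: registers a token that already has an ID in `values`
/// as a special token. Returns false (and changes nothing) if the token is unknown.
pub fn mark_special<V: MapsMut>(vocab: &mut V, token: &str) -> bool {
    // Copy the ID out so the borrow of `values` ends before the next accessor call.
    let id = match vocab.values_mut().get(token) {
        Some(id) => *id,
        None => return false,
    };
    vocab.special_values_mut().insert(token.to_string(), id);
    vocab.special_indices_mut().insert(id, token.to_string());
    true
}

/// Minimal concrete type implementing the trait, for demonstration only.
#[derive(Default)]
pub struct Maps {
    pub values: HashMap<String, i64>,
    pub indices: HashMap<i64, String>,
    pub special_values: HashMap<String, i64>,
    pub special_indices: HashMap<i64, String>,
}

impl MapsMut for Maps {
    fn values_mut(&mut self) -> &mut HashMap<String, i64> {
        &mut self.values
    }
    fn indices_mut(&mut self) -> &mut HashMap<i64, String> {
        &mut self.indices
    }
    fn special_values_mut(&mut self) -> &mut HashMap<String, i64> {
        &mut self.special_values
    }
    fn special_indices_mut(&mut self) -> &mut HashMap<i64, String> {
        &mut self.special_indices
    }
}
```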
fn from_file<P: AsRef<Path>>(path: P) -> Result<BaseVocab, TokenizerError>
Read a vocabulary from a file
fn from_file_with_special_token_mapping<P: AsRef<Path>, S: AsRef<Path>>(
path: P,
special_token_mapping_path: S
) -> Result<Self, TokenizerError>
Read a vocabulary from a file, with the special token mapping read from a second file
fn from_values_and_special_token_map(
values: HashMap<String, i64>,
special_token_map: SpecialTokenMap
) -> Result<Self, TokenizerError>
where
Self: Sized,
Create a vocabulary from a map of token strings to IDs and a special token map
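Here is a rough sketch of what such a constructor has to do. The function below derives indices from values and registers each special token in the special maps. SimpleSpecialTokens is a hypothetical stand-in that is much smaller than SpecialTokenMap, and errors are plain Strings rather than TokenizerError. The sketch also assumes that construction fails when a special token is missing from values; that assumption is marked in the code.

```rust
use std::collections::HashMap;

/// Hypothetical, much smaller stand-in for SpecialTokenMap.
pub struct SimpleSpecialTokens {
    pub unk_token: String,
    pub additional: Vec<String>,
}

/// Hypothetical container for the four maps a BaseVocab holds.
pub struct VocabParts {
    pub values: HashMap<String, i64>,
    pub indices: HashMap<i64, String>,
    pub special_values: HashMap<String, i64>,
    pub special_indices: HashMap<i64, String>,
}

/// Sketch of building a vocabulary from `values` and a set of special tokens.
/// ASSUMPTION: construction fails if any special token is missing from `values`.
pub fn build_vocab(
    values: HashMap<String, i64>,
    specials: &SimpleSpecialTokens,
) -> Result<VocabParts, String> {
    // The decoder map is the inverse of the encoder map.
    let indices: HashMap<i64, String> = values.iter().map(|(t, id)| (*id, t.clone())).collect();
    let mut special_values = HashMap::new();
    let mut special_indices = HashMap::new();
    // Register the unknown token and every additional special token.
    for token in std::iter::once(&specials.unk_token).chain(specials.additional.iter()) {
        let id = *values
            .get(token.as_str())
            .ok_or_else(|| format!("special token {token} not found in vocabulary"))?;
        special_values.insert(token.clone(), id);
        special_indices.insert(id, token.clone());
    }
    Ok(VocabParts { values, indices, special_values, special_indices })
}
```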
fn _token_to_id(
&self,
token: &str,
values: &HashMap<String, i64>,
special_values: &HashMap<String, i64>,
unknown_value: &str
) -> i64
Converts a token to an ID, given a HashMap of values, a HashMap of special values, and the string representation of the unknown token. This method is not meant to be used directly: the token_to_id method offers a more convenient interface for most vocabularies, but it must be implemented by each specific vocabulary.
fn _id_to_token(
&self,
id: &i64,
indices: &HashMap<i64, String>,
special_indices: &HashMap<i64, String>,
unknown_value: &str
) -> String
Converts an ID to a token, given a HashMap of indices, a HashMap of special indices, and the string representation of the unknown token. This method is not meant to be used directly: the id_to_token method offers a more convenient interface for most vocabularies, but it must be implemented by each specific vocabulary.
fn convert_tokens_to_ids(&self, tokens: &[&str]) -> Vec<i64>
Converts a list of tokens to a list of indices.
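The lookup methods above can be sketched with plain HashMaps. The hypothetical lookup_id, lookup_token and lookup_ids functions below assume that the special maps are checked before the regular ones. They also assume that a miss falls back to the unknown token: its ID for token lookups, its string for ID lookups. They show what the documented parameters are for; they are not copies of the crate's code.

```rust
use std::collections::HashMap;

/// Sketch of token -> ID: special tokens first, then regular tokens, then the
/// unknown token's ID (lookup order assumed). Panics if the unknown token is absent.
pub fn lookup_id(
    token: &str,
    values: &HashMap<String, i64>,
    special_values: &HashMap<String, i64>,
    unknown_value: &str,
) -> i64 {
    match special_values.get(token) {
        Some(id) => *id,
        None => match values.get(token) {
            Some(id) => *id,
            None => *values
                .get(unknown_value)
                .expect("unknown token must be in the vocabulary"),
        },
    }
}

/// Sketch of ID -> token: special indices first, then regular indices, then the
/// unknown token string itself.
pub fn lookup_token(
    id: &i64,
    indices: &HashMap<i64, String>,
    special_indices: &HashMap<i64, String>,
    unknown_value: &str,
) -> String {
    special_indices
        .get(id)
        .or_else(|| indices.get(id))
        .cloned()
        .unwrap_or_else(|| unknown_value.to_string())
}

/// Batch version in the spirit of convert_tokens_to_ids.
pub fn lookup_ids(
    tokens: &[&str],
    values: &HashMap<String, i64>,
    special_values: &HashMap<String, i64>,
    unknown_value: &str,
) -> Vec<i64> {
    tokens
        .iter()
        .map(|token| lookup_id(token, values, special_values, unknown_value))
        .collect()
}
```

Because special tokens are checked first, a string registered as special always resolves to its special ID in this sketch.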
fn add_extra_ids(&mut self, num_extra_ids: i64)
Add extra token IDs to the vocabulary
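This page does not say how the extra IDs are named. The sketch below assumes T5-style sentinel tokens named <extra_id_0>, <extra_id_1>, and so on. It gives them consecutive IDs after the existing vocabulary and registers each one in both the regular and the special maps. The free function add_extra_ids shown here is hypothetical and takes the four maps directly.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: appends `num_extra_ids` sentinel tokens.
/// ASSUMPTIONS: names follow `<extra_id_N>`, IDs are contiguous so the next
/// free ID is `values.len()`, and new tokens are also registered as special.
pub fn add_extra_ids(
    values: &mut HashMap<String, i64>,
    indices: &mut HashMap<i64, String>,
    special_values: &mut HashMap<String, i64>,
    special_indices: &mut HashMap<i64, String>,
    num_extra_ids: i64,
) {
    for extra_id in 0..num_extra_ids {
        let token = format!("<extra_id_{extra_id}>");
        if values.contains_key(&token) {
            continue; // already present: keep its existing ID
        }
        let id = values.len() as i64;
        values.insert(token.clone(), id);
        indices.insert(id, token.clone());
        special_values.insert(token.clone(), id);
        special_indices.insert(id, token);
    }
}
```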
fn add_tokens(&mut self, tokens: &[&str])
Add arbitrary tokens to the vocabulary.
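Here is a minimal sketch of adding arbitrary tokens. It assumes IDs are contiguous from 0, so the next free ID is values.len(), and it skips tokens that are already present or repeated in the input. TokenMaps is a hypothetical stand-in that holds only the regular maps, so the sketch does not touch the special maps.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in holding only the regular encoder/decoder maps.
#[derive(Default)]
pub struct TokenMaps {
    pub values: HashMap<String, i64>,
    pub indices: HashMap<i64, String>,
}

impl TokenMaps {
    /// Sketch of add_tokens: existing or repeated tokens are skipped; new ones
    /// get consecutive IDs starting at `values.len()` (contiguous IDs assumed).
    pub fn add_tokens(&mut self, tokens: &[&str]) {
        for &token in tokens {
            if self.values.contains_key(token) {
                continue;
            }
            let id = self.values.len() as i64;
            self.values.insert(token.to_string(), id);
            self.indices.insert(id, token.to_string());
        }
    }
}
```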
Auto Trait Implementations
impl RefUnwindSafe for BaseVocab
impl Send for BaseVocab
impl Sync for BaseVocab
impl Unpin for BaseVocab
impl UnwindSafe for BaseVocab
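Because BaseVocab is Send and Sync, several threads can share one instance behind an Arc, and read-only lookups through &self need no lock. The sketch below uses a plain HashMap<String, i64> as a stand-in for the vocabulary. parallel_lookup and assert_send_sync are hypothetical helpers.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Compiles only if T is Send + Sync; useful as a static check.
pub fn assert_send_sync<T: Send + Sync>() {}

/// Hypothetical example: each thread looks up one batch of tokens in a shared,
/// read-only map. Results are returned in the same order as `batches`.
pub fn parallel_lookup(
    vocab: Arc<HashMap<String, i64>>,
    batches: Vec<Vec<String>>,
) -> Vec<Vec<Option<i64>>> {
    let handles: Vec<_> = batches
        .into_iter()
        .map(|batch| {
            // Each thread gets its own Arc handle to the same map.
            let vocab = Arc::clone(&vocab);
            thread::spawn(move || batch.iter().map(|t| vocab.get(t).copied()).collect::<Vec<_>>())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```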
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.