Struct rust_tokenizers::vocab::RobertaVocab

pub struct RobertaVocab {
    pub values: HashMap<String, i64>,
    pub indices: HashMap<i64, String>,
    pub unknown_value: &'static str,
    pub special_values: HashMap<String, i64>,
    pub special_indices: HashMap<i64, String>,
}

RoBERTa Vocab

Vocabulary for the RoBERTa tokenizer. It contains the following special values:

  • PAD token
  • BOS token
  • EOS token
  • SEP token
  • MASK token
  • CLS token

Expects a JSON-format vocabulary when created from file.

Fields

values: HashMap<String, i64>

A mapping of tokens as strings to IDs (i.e. the encoder base)

indices: HashMap<i64, String>

A mapping of token IDs to strings (i.e. the decoder base)

unknown_value: &'static str

The string to use for unknown (out of vocabulary) tokens

special_values: HashMap<String, i64>

A mapping of special value tokens as strings to IDs (i.e. the encoder base for special values). Special values typically include BOS/EOS markers, class markers, mask markers and padding markers.

special_indices: HashMap<i64, String>

A mapping of special value tokens as IDs to strings (i.e. the decoder base for special values)
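
The relationship between the encoder map (values) and the decoder map (indices) can be illustrated with a minimal sketch. SketchVocab, from_pairs, token_to_id and id_to_token below are hypothetical stand-ins that only mirror the documented field layout; they are not the crate's types or methods. The fallback to the unknown token on a failed lookup is an assumption made for illustration, and the IDs used with it are arbitrary.

```rust
use std::collections::HashMap;

// Hypothetical stand-in mirroring three of the documented fields.
// This is NOT the crate's RobertaVocab type.
struct SketchVocab {
    values: HashMap<String, i64>,  // token -> ID (encoder direction)
    indices: HashMap<i64, String>, // ID -> token (decoder direction)
    unknown_value: &'static str,   // string used for out-of-vocabulary tokens
}

impl SketchVocab {
    // Build both directions from (token, ID) pairs so that `indices`
    // is the exact inverse of `values`.
    fn from_pairs(pairs: &[(&str, i64)], unknown_value: &'static str) -> Self {
        let values: HashMap<String, i64> =
            pairs.iter().map(|(t, i)| (t.to_string(), *i)).collect();
        let indices: HashMap<i64, String> =
            values.iter().map(|(t, i)| (*i, t.clone())).collect();
        SketchVocab { values, indices, unknown_value }
    }

    // Encoder lookup. Assumption: an unseen token falls back to the ID of
    // the unknown token, if that token is present in the map.
    fn token_to_id(&self, token: &str) -> Option<i64> {
        self.values
            .get(token)
            .or_else(|| self.values.get(self.unknown_value))
            .copied()
    }

    // Decoder lookup. Assumption: an unseen ID decodes to the unknown string.
    fn id_to_token(&self, id: i64) -> &str {
        self.indices
            .get(&id)
            .map(String::as_str)
            .unwrap_or(self.unknown_value)
    }
}
```

With a vocabulary built from the pairs ("<unk>", 3) and ("hello", 42), token_to_id("hello") yields Some(42), while an unseen token yields Some(3) under the fallback assumed here.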

Implementations

impl RobertaVocab

pub fn pad_value() -> &'static str

Returns the PAD token for RoBERTa (<pad>)

pub fn bos_value() -> &'static str

Returns the BOS token for RoBERTa (<s>)

pub fn eos_value() -> &'static str

Returns the EOS token for RoBERTa (</s>)

pub fn sep_value() -> &'static str

Returns the SEP token for RoBERTa (</s>)

pub fn cls_value() -> &'static str

Returns the CLS token for RoBERTa (<s>)

pub fn mask_value() -> &'static str

Returns the MASK token for RoBERTa (<mask>)
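
The accessors above describe six roles but only four distinct strings: SEP shares </s> with EOS, and CLS shares <s> with BOS. The sketch below records the documented strings in a hypothetical helper (special_tokens and distinct_special_strings are illustrative names, not part of the crate) and collects the distinct ones.

```rust
use std::collections::BTreeSet;

// Role name paired with the string documented for each accessor above.
// Hypothetical helper for illustration; not a function of the crate.
fn special_tokens() -> [(&'static str, &'static str); 6] {
    [
        ("PAD", "<pad>"),
        ("BOS", "<s>"),
        ("EOS", "</s>"),
        ("SEP", "</s>"),
        ("CLS", "<s>"),
        ("MASK", "<mask>"),
    ]
}

// Collect the distinct token strings across all six roles.
fn distinct_special_strings() -> BTreeSet<&'static str> {
    special_tokens().iter().map(|(_, s)| *s).collect()
}
```

Any map keyed by token string would therefore hold one entry for each shared string rather than one per role.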

Trait Implementations

impl Clone for RobertaVocab

impl Debug for RobertaVocab

impl MultiThreadedTokenizer<RobertaVocab> for RobertaTokenizer

impl Tokenizer<RobertaVocab> for RobertaTokenizer

impl Vocab for RobertaVocab

fn from_file(path: &str) -> Result<RobertaVocab, TokenizerError>

Reads a RoBERTa-style vocab.json file.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T where
    T: 'static + ?Sized

impl<T> Borrow<T> for T where
    T: ?Sized

impl<T> BorrowMut<T> for T where
    T: ?Sized

impl<T> From<T> for T

impl<T, U> Into<U> for T where
    U: From<T>

impl<T> Pointable for T

type Init = T

The type for initializers.

impl<T> ToOwned for T where
    T: Clone

type Owned = T

The resulting type after obtaining ownership.

impl<T, U> TryFrom<U> for T where
    U: Into<T>

type Error = Infallible

The type returned in the event of a conversion error.

impl<T, U> TryInto<U> for T where
    U: TryFrom<T>

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.