pub struct BpePairVocab {
    pub values: HashMap<(String, String), i64>,
}

Byte Pair Encoding Vocab

BPE vocab containing the merges (a dictionary mapping each pair to its priority) used to merge pairs together. This vocabulary is used in BPE tokenizers such as GPT2 or RoBERTa. It is not meant to be used directly, but rather as part of a BPE tokenizer.
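The following standalone sketch illustrates how a table of merge priorities drives tokenization. It applies merges to a word that has already been split into single characters. It does not use rust_tokenizers: the HashMap stands in for the values field, the function name apply_merges is hypothetical, and it assumes (as in GPT-2-style merges files) that a lower index means the pair is merged earlier. This simple variant merges one occurrence of the best pair per step.

```rust
use std::collections::HashMap;

/// Hypothetical helper: repeatedly merges the adjacent pair with the lowest rank.
/// `ranks` plays the role of the `values` field (assumption: lower rank = merged earlier).
fn apply_merges(mut symbols: Vec<String>, ranks: &HashMap<(String, String), i64>) -> Vec<String> {
    loop {
        // Find the (rank, position) of the best mergeable adjacent pair, if any.
        let best = symbols
            .windows(2)
            .enumerate()
            .filter_map(|(i, w)| {
                ranks
                    .get(&(w[0].clone(), w[1].clone()))
                    .map(|&rank| (rank, i))
            })
            .min();
        match best {
            // Merge the winning pair in place: symbols[i] absorbs symbols[i + 1].
            Some((_, i)) => {
                let right = symbols.remove(i + 1);
                symbols[i].push_str(&right);
            }
            // No adjacent pair appears in the merges table, so we are done.
            None => return symbols,
        }
    }
}

fn main() {
    // Toy merges table, not taken from any real model.
    let mut ranks = HashMap::new();
    ranks.insert(("l".to_string(), "o".to_string()), 0);
    ranks.insert(("lo".to_string(), "w".to_string()), 1);
    ranks.insert(("e".to_string(), "r".to_string()), 2);

    let word: Vec<String> = "lower".chars().map(|c| c.to_string()).collect();
    let merged = apply_merges(word, &ranks);
    println!("{:?}", merged); // ["low", "er"]
}
```

With this toy table, "lower" first becomes lo-w-e-r, then low-e-r, then low-er, after which no adjacent pair is in the table.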

Fields

values: HashMap<(String, String), i64>

Implementations

impl BpePairVocab

pub fn from_file<P: AsRef<Path>>( path: P ) -> Result<BpePairVocab, TokenizerError>

Create a new BpePairVocab from a flat file containing merges in the format "first_element second_element", one pair per line. The index of each pair is implied by its line position in the merges file. The first line must be a header and is skipped.
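The documented file format can be illustrated with a standalone parser. This is a minimal sketch, not the crate's implementation: the function parse_merges is hypothetical, it reads from an in-memory string rather than a path, and numbering the first pair after the header as 0 is an assumption. Lines that do not contain two whitespace-separated elements are skipped, but they still count toward the line position.

```rust
use std::collections::HashMap;

/// Hypothetical parser for the merges format described above:
/// skip the header line, then map each "first second" pair to its line position.
/// Assumption: the first pair after the header gets index 0.
fn parse_merges(contents: &str) -> HashMap<(String, String), i64> {
    let mut values = HashMap::new();
    for (index, line) in contents.lines().skip(1).enumerate() {
        let mut parts = line.split_whitespace();
        // Only lines with at least two elements produce an entry.
        if let (Some(first), Some(second)) = (parts.next(), parts.next()) {
            values.insert((first.to_string(), second.to_string()), index as i64);
        }
    }
    values
}

fn main() {
    // Toy merges file: a header line followed by three pairs.
    let contents = "#version: 0.2\nt h\ni n\nth e\n";
    let merges = parse_merges(contents);
    println!("{} merges loaded", merges.len()); // 3 merges loaded
    println!("{:?}", merges.get(&("th".to_string(), "e".to_string()))); // Some(2)
}
```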

Example
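use rust_tokenizers::vocab::BpePairVocab;
// Merges file: a header line, then one "first_element second_element" pair per line.
let path = "path/to/file";

// from_file returns Result<BpePairVocab, TokenizerError>.
let bpe_vocab = BpePairVocab::from_file(path);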

pub fn from_sentencepiece_file<P: AsRef<Path>>( path: P ) -> Result<BpePairVocab, TokenizerError>

Create a new BpePairVocab from a SentencePiece file containing a BPE model.

Example
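use rust_tokenizers::vocab::BpePairVocab;
// SentencePiece model file containing a BPE model.
let path = "path/to/spiece.model";

// from_sentencepiece_file returns Result<BpePairVocab, TokenizerError>.
let bpe_vocab = BpePairVocab::from_sentencepiece_file(path);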

pub fn byte_pair_to_id(&self, byte_pair: &BpePairRef<'_>) -> Option<&i64>

Gets the id of a “byte pair” in the merges vocab. Returns Some with a reference to the pair's index if the pair is found in the vocabulary, or None otherwise.

Example
use rust_tokenizers::vocab::{BpePairRef, BpePairVocab};
let path = "path/to/file";

let bpe_vocab = BpePairVocab::from_file(path).unwrap();

// BpePairRef borrows the two elements of the pair to look up.
let query = BpePairRef {
    byte_1: &"won".to_string(),
    byte_2: &"derful".to_string(),
};
// Some(&index) if ("won", "derful") is a known merge, None otherwise.
let id = bpe_vocab.byte_pair_to_id(&query);
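The same lookup semantics can be mirrored on a plain HashMap shaped like the values field. This standalone sketch does not use rust_tokenizers or BpePairRef; the helper pair_index and the index 42 are hypothetical toy values. Because the key type is (String, String), HashMap::get needs a reference to an owned tuple, so this sketch allocates two Strings per query.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns the index of (first, second) if the pair is a known merge.
/// The temporary owned key is only borrowed for the duration of `get`.
fn pair_index<'a>(
    values: &'a HashMap<(String, String), i64>,
    first: &str,
    second: &str,
) -> Option<&'a i64> {
    values.get(&(first.to_string(), second.to_string()))
}

fn main() {
    // Toy table with a single made-up entry.
    let mut values = HashMap::new();
    values.insert(("won".to_string(), "derful".to_string()), 42);

    assert_eq!(pair_index(&values, "won", "derful"), Some(&42));
    assert_eq!(pair_index(&values, "wond", "erful"), None);
    println!("lookups behave as expected");
}
```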

Trait Implementations

impl Clone for BpePairVocab

fn clone(&self) -> BpePairVocab

Returns a copy of the value.

fn clone_from(&mut self, source: &Self) (since 1.0.0)

Performs copy-assignment from source.

impl Debug for BpePairVocab

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Pointable for T

const ALIGN: usize = _

The alignment of the pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a value with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> ToOwned for T where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.