BpeOptions

Struct BpeOptions 

pub struct BpeOptions<'a> {
    pub merges: &'a [(Cow<'a, str>, Cow<'a, str>)],
    pub vocab: Option<FxHashMap<EncodedBytes, TokenId>>,
    pub added_tokens: FxHashMap<TokenId, String>,
    pub end_of_word_suffix: Option<String>,
    pub ignore_merges: bool,
}

Configuration for a Bpe tokenization model.

Fields§

§merges: &'a [(Cow<'a, str>, Cow<'a, str>)]

Ordered entries of the merge list. Each entry is a pair of strings representing byte sequences. See also merge_pairs_from_lines, which can be used to extract pairs from the space-separated format used in, e.g., merges.txt files.
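To illustrate the shape of the data this field expects, here is a minimal sketch that turns space-separated lines into borrowed pairs. `parse_merge_lines` is a hypothetical helper written for this example; it is not the crate's merge_pairs_from_lines, whose exact signature and behavior may differ. Skipping blank lines and a leading `#version` header is an assumption about typical merges.txt files.

```rust
use std::borrow::Cow;

/// Hypothetical helper (NOT the crate's `merge_pairs_from_lines`): split each
/// "left right" line at the first space into a borrowed pair.
/// Assumption: blank lines and a `#version` header line are skipped.
fn parse_merge_lines<'a>(lines: &[&'a str]) -> Vec<(Cow<'a, str>, Cow<'a, str>)> {
    lines
        .iter()
        .copied()
        .filter(|line| !line.trim().is_empty() && !line.starts_with("#version"))
        .filter_map(|line| {
            // Lines without a space are silently dropped in this sketch.
            let (left, right) = line.split_once(' ')?;
            Some((Cow::Borrowed(left), Cow::Borrowed(right)))
        })
        .collect()
}
```

Because the pairs borrow from the input lines, the resulting `Vec` can be passed as a slice with the same lifetime as the source text, which matches the `&'a [(Cow<'a, str>, Cow<'a, str>)]` field type.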

§vocab: Option<FxHashMap<EncodedBytes, TokenId>>

Mapping between token strings and IDs. If not provided, the ID of a token is 256 plus the index of the merge-list pair that forms the token string when concatenated. For example, if the entry at index 10 in the merge list is “foo bar”, then the token ID of “foobar” is 266. Token IDs below 256 are reserved for individual bytes.
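The default ID rule can be sketched as follows. `default_token_id` is a hypothetical function for illustration only, and the use of `u32` for the ID is an assumption about `TokenId`; the crate's internal lookup is not shown here.

```rust
/// Hypothetical illustration of the default ID rule: IDs 0..=255 are reserved
/// for single bytes, so the merge at `index` yields token ID `256 + index`.
/// Assumption: token IDs fit in a `u32`.
fn default_token_id(merges: &[(&str, &str)], token: &str) -> Option<u32> {
    merges
        .iter()
        // Find the first pair whose concatenation equals the token string.
        .position(|(left, right)| format!("{left}{right}") == token)
        .map(|index| 256 + index as u32)
}
```

With “foo bar” at index 10 of the merge list, `default_token_id(&merges, "foobar")` evaluates to `Some(266)`, matching the example in the field description.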

§added_tokens: FxHashMap<TokenId, String>

Tokens that don’t appear in merges but do have a mapping in vocab, keyed by token ID. These are used for special purposes such as representing the end of output.

§end_of_word_suffix: Option<String>

A string which is implicitly appended to each substring that is tokenized, after initial splitting.
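As a rough illustration of what “implicitly appended” means, the sketch below appends a suffix to every piece produced by an initial split. `pieces_with_suffix` is hypothetical, a whitespace split stands in for whatever pre-tokenization is actually used, and the `</w>` suffix in the usage note is only an example value.

```rust
/// Hypothetical illustration: append the suffix (if any) to each piece
/// produced by the initial split. A whitespace split is an assumption made
/// for this example, not the crate's actual splitting logic.
fn pieces_with_suffix(text: &str, end_of_word_suffix: Option<&str>) -> Vec<String> {
    text.split_whitespace()
        .map(|piece| match end_of_word_suffix {
            Some(suffix) => format!("{piece}{suffix}"),
            None => piece.to_string(),
        })
        .collect()
}
```

For instance, `pieces_with_suffix("low lower", Some("</w>"))` yields `["low</w>", "lower</w>"]`, so merge rules and vocabulary entries can distinguish a word-final “low” from the prefix of “lower”.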

§ignore_merges: bool

If true, when encoding a string piece, match the entire piece against the vocabulary before applying merge rules.
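The lookup order this flag describes can be sketched as below. `whole_piece_lookup` is a hypothetical function, and the `HashMap<String, u32>` vocabulary is a simplified stand-in for the `FxHashMap<EncodedBytes, TokenId>` field; the crate's encoder may be structured differently.

```rust
use std::collections::HashMap;

/// Hypothetical illustration: returns `Some(id)` when `ignore_merges` is set
/// and the whole piece is present in the vocabulary. `None` means the caller
/// would fall back to applying merge rules to the piece.
fn whole_piece_lookup(
    vocab: &HashMap<String, u32>,
    piece: &str,
    ignore_merges: bool,
) -> Option<u32> {
    if ignore_merges {
        vocab.get(piece).copied()
    } else {
        None
    }
}
```

When the whole piece is found, it is emitted as a single token even if applying the merge list would have produced a different segmentation.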

Trait Implementations§

impl<'a> Default for BpeOptions<'a>

fn default() -> BpeOptions<'a>

Returns the “default value” for a type.

Auto Trait Implementations§

impl<'a> Freeze for BpeOptions<'a>

impl<'a> RefUnwindSafe for BpeOptions<'a>

impl<'a> Send for BpeOptions<'a>

impl<'a> Sync for BpeOptions<'a>

impl<'a> Unpin for BpeOptions<'a>

impl<'a> UnwindSafe for BpeOptions<'a>

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.