pub struct BpeOptions<'a> {
pub merges: &'a [(Cow<'a, str>, Cow<'a, str>)],
pub vocab: Option<FxHashMap<EncodedBytes, TokenId>>,
pub added_tokens: FxHashMap<TokenId, String>,
pub end_of_word_suffix: Option<String>,
pub ignore_merges: bool,
}
Configuration for a Bpe tokenization model.
Fields
merges: &'a [(Cow<'a, str>, Cow<'a, str>)]
Ordered entries of the merge list. Each entry is a pair of strings representing byte sequences. See also merge_pairs_from_lines, which can be used to extract pairs from the space-separated format used in e.g. merges.txt files.
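As a rough illustration of the space-separated merges.txt format, the following sketch parses such lines into pairs. This is an assumption-laden stand-in for what `merge_pairs_from_lines` is described to do, not the library's implementation; the function name `parse_merge_lines` and the comment-skipping rule are illustrative.

```rust
/// Hedged sketch: turn space-separated merge lines (the merges.txt format)
/// into (left, right) pairs. Illustrative only; not the library's
/// `merge_pairs_from_lines`.
fn parse_merge_lines(lines: &[&str]) -> Vec<(String, String)> {
    lines
        .iter()
        // Skip blank lines and header/comment lines such as "#version: 0.2".
        .filter(|l| !l.trim().is_empty() && !l.starts_with('#'))
        .filter_map(|l| {
            // Split on the first space only: left and right byte sequences.
            let mut parts = l.splitn(2, ' ');
            match (parts.next(), parts.next()) {
                (Some(a), Some(b)) => Some((a.to_string(), b.to_string())),
                _ => None,
            }
        })
        .collect()
}

fn main() {
    let lines = ["#version: 0.2", "f o", "fo o", "foo bar"];
    let pairs = parse_merge_lines(&lines);
    println!("{} merge pairs", pairs.len()); // prints "3 merge pairs"
}
```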
vocab: Option<FxHashMap<EncodedBytes, TokenId>>
Mapping from token strings to IDs. If not provided, the ID of a token is 256 plus the index of the merge-list pair whose concatenation forms the token string. For example, if index 10 in the merge list is “foo bar”, then the token ID of “foobar” is 266. Token IDs below 256 are reserved for individual bytes.
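The default ID scheme described above can be written out as a one-line calculation. This is a sketch of the documented rule only; the helper name `default_token_id` is illustrative, not part of the crate's API.

```rust
/// Hedged sketch of the documented default when `vocab` is None:
/// IDs 0..=255 are reserved for individual bytes, and the token formed by
/// merge entry `i` gets ID 256 + i.
fn default_token_id(merge_index: usize) -> u32 {
    256 + merge_index as u32
}

fn main() {
    // Per the docs: if index 10 in the merge list is "foo bar",
    // the merged token "foobar" gets ID 266.
    assert_eq!(default_token_id(10), 266);
    println!("ok");
}
```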
added_tokens: FxHashMap<TokenId, String>
Set of tokens which don’t appear in merges but do have a mapping in vocab. These are used for special purposes such as representing the end of output.
end_of_word_suffix: Option<String>
A string which is implicitly appended to each substring that is tokenized, after initial splitting.
ignore_merges: bool
When encoding a string piece, match the entire piece against the vocabulary before applying merge rules.
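The effect of ignore_merges can be sketched as a whole-piece vocabulary lookup that short-circuits the merge loop. The names below (`encode_piece`, the byte-level fallback, the use of std `HashMap` instead of `FxHashMap`) are assumptions for illustration, not the crate's actual encoding path.

```rust
use std::collections::HashMap;

/// Hedged sketch of what `ignore_merges` is described to do: before running
/// BPE merge rules on a piece, try to match the whole piece against the
/// vocabulary and emit a single token if it is present.
fn encode_piece(
    piece: &str,
    vocab: &HashMap<String, u32>,
    ignore_merges: bool,
) -> Vec<u32> {
    if ignore_merges {
        if let Some(&id) = vocab.get(piece) {
            // Whole piece matched: one token, no merges applied.
            return vec![id];
        }
    }
    // Byte-level fallback, standing in for the real merge loop.
    piece.bytes().map(|b| b as u32).collect()
}

fn main() {
    let vocab = HashMap::from([("hello".to_string(), 300u32)]);
    // Whole-piece match wins when ignore_merges is set.
    assert_eq!(encode_piece("hello", &vocab, true), vec![300]);
    // A piece absent from the vocab falls through to per-byte tokens.
    assert_eq!(encode_piece("hi", &vocab, true).len(), 2);
    println!("ok");
}
```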