Enum rust_tokenizers::Mask[][src]

pub enum Mask {
    None,
    Whitespace,
    Punctuation,
    CJK,
    Special,
    Begin,
    Continuation,
    Unfinished,
    Unknown,
}
Expand description

Variants

None

The token has no particular mask. This is the default situation. It may indicate that further processing can be done on a token.

Whitespace

The token represents a whitespace (in any shape or form)

Punctuation

The token represents punctuation (in any shape or form)

CJK

The token represents a single Chinese/Japanese/Korean character (including kana and hangul)

Special

The token is a special marker (such as a separator marker, a class marker, etc)

Begin

The token is the begin in a series of subtokens, the offset refers specifically to the sub-token. Subsequent tokens in this sequence will carry the ‘Continuation’ mask

Continuation

The token is the continuation of the previous token, the offset refers specifically to the sub-token. All but the first sub-token in a sequence carry this mask (the first carries ‘Begin’). (this is the reverse of Mask::Unfinished)

Unfinished

The token is the start of a token but not finished yet. All but the last sub-token in the a token sequence carry this mask. This is the reverse of Mask::Continuation.

Unknown

The token is out of vocabulary, it is unknown by the tokenizer and it will decode to unknown. Tokens that can be decoded properly (but may still be out of vocabulary) should not set this.

Trait Implementations

Returns a copy of the value. Read more

Performs copy-assignment from source. Read more

Formats the value using the given formatter. Read more

Returns the “default value” for a type. Read more

Deserialize this value from the given Serde deserializer. Read more

This method tests for self and other values to be equal, and is used by ==. Read more

This method tests for !=.

This method returns an ordering between self and other values if one exists. Read more

This method tests less than (for self and other) and is used by the < operator. Read more

This method tests less than or equal to (for self and other) and is used by the <= operator. Read more

This method tests greater than (for self and other) and is used by the > operator. Read more

This method tests greater than or equal to (for self and other) and is used by the >= operator. Read more

Serialize this value into the given Serde serializer. Read more

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self. Read more

Immutably borrows from an owned value. Read more

Mutably borrows from an owned value. Read more

Performs the conversion.

Performs the conversion.

The alignment of pointer.

The type for initializers.

Initializes a with the given initializer. Read more

Dereferences the given pointer. Read more

Mutably dereferences the given pointer. Read more

Drops the object pointed to by the given pointer. Read more

The resulting type after obtaining ownership.

Creates owned data from borrowed data, usually by cloning. Read more

🔬 This is a nightly-only experimental API. (toowned_clone_into)

recently added

Uses borrowed data to replace owned data, usually by cloning. Read more

The type returned in the event of a conversion error.

Performs the conversion.

The type returned in the event of a conversion error.

Performs the conversion.