Struct notmecab::LexerToken
pub struct LexerToken {
pub left_context: u16,
pub right_context: u16,
pub pos: u16,
pub cost: i16,
pub original_id: u32,
pub feature_offset: u32,
pub start: usize,
pub end: usize,
pub kind: TokenType,
}
Fields
left_context: u16
Used internally during lattice pathfinding.
right_context: u16
Used internally during lattice pathfinding.
pos: u16
I don’t know what this is.
cost: i16
Used internally during lattice pathfinding.
original_id: u32
Unique identifier of the specific lexeme realization this token represents, taken from the mecab dictionary. It changes between dictionary versions.
feature_offset: u32
Feed this to read_feature_string to get this token’s “feature” string.
The feature string contains almost all useful information, including things like part of speech, spelling, pronunciation, etc.
The exact format of the feature string is dictionary-specific.
feature_offset is currently !0u32 (i.e. 0xFFFFFFFF) for tokens of the kind TokenType::UNK. Feeding this value to read_feature_string will result in a blank string, not an error.
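The sentinel check described above can be sketched as follows. This is a minimal illustration, not part of notmecab: the constant `NO_FEATURE` and the helper `has_feature_string` are hypothetical names introduced here, and only the value `!0u32` comes from the note above. Since that note says "currently", the sentinel may change in later versions.

```rust
/// Sentinel stored in `feature_offset` for unknown tokens, per the note above.
/// `NO_FEATURE` is a hypothetical name, not a notmecab constant.
const NO_FEATURE: u32 = !0u32; // bitwise NOT of 0, i.e. 0xFFFF_FFFF

/// Hypothetical helper: returns true when the offset is not the sentinel,
/// i.e. when looking up the feature string should yield something non-blank.
fn has_feature_string(feature_offset: u32) -> bool {
    feature_offset != NO_FEATURE
}
```

A caller could use such a check to skip the lookup for unknown tokens instead of handling a blank string afterwards.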
start: usize
Starting location, in codepoints, of the surface of this LexerToken in the string it was parsed from.
end: usize
Corresponding ending location, in codepoints. Exclusive. (i.e. when start+1 == end, the LexerToken’s surface is one codepoint long)
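Because start and end count codepoints rather than bytes, slicing the original &str directly with them would be wrong for most Japanese text. A minimal sketch of recovering the surface from the two offsets, using only the standard library; `surface_by_codepoints` is a hypothetical helper, not a notmecab function:

```rust
/// Hypothetical helper: returns the part of `text` covering codepoints
/// `start..end` (end exclusive), matching the convention described above.
fn surface_by_codepoints(text: &str, start: usize, end: usize) -> String {
    text.chars()
        .skip(start)                          // drop codepoints before the token
        .take(end.saturating_sub(start))      // keep exactly end - start codepoints
        .collect()
}
```

For example, with the text "これはテスト", the offsets 3 and 6 select "テスト", even though the string is 18 bytes long in UTF-8.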
kind: TokenType
Origin of token. BOS and UNK are virtual origins (“beginning/ending-of-string” and “unknown”, respectively). Normal means it came from the mecab dictionary.
The BOS (beginning/ending-of-string) tokens are stripped away in parse_to_lexertokens.
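As a rough illustration of working with kind, the sketch below keeps only tokens that came from the dictionary. The `TokenType` and `Token` types here are simplified stand-ins defined for this example (they mirror the variant and field names mentioned above but are not the notmecab types), and `dictionary_tokens` is a hypothetical helper.

```rust
/// Stand-in for notmecab's `TokenType`, reduced to the three origins named above.
#[allow(dead_code, clippy::upper_case_acronyms)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TokenType {
    Normal, // came from the mecab dictionary
    UNK,    // virtual: unknown
    BOS,    // virtual: beginning/ending-of-string
}

/// Stand-in carrying only the fields this sketch needs.
#[derive(Clone, Debug, PartialEq)]
struct Token {
    start: usize,
    end: usize,
    kind: TokenType,
}

/// Hypothetical helper: keep only tokens whose origin is the dictionary.
fn dictionary_tokens(tokens: &[Token]) -> Vec<Token> {
    tokens
        .iter()
        .filter(|t| t.kind == TokenType::Normal)
        .cloned()
        .collect()
}
```

Whether unknown tokens should be dropped or kept depends on the application; many callers will want to keep them and fall back to the raw surface.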
Trait Implementations
impl Clone for LexerToken
fn clone(&self) -> LexerToken
1.0.0 · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.