pub struct Dict { /* private fields */ }
Implementations
impl Dict
pub fn load<T: Read + Seek, Y: Read + Seek>(
sysdic: &mut BufReader<T>,
matrix: &mut BufReader<Y>
) -> Result<Dict, &'static str>
Loads the sys.dic and matrix.bin files into memory and prepares the data stored in them for use by the parser.
Returns a Dict or, on error, a string describing the error that prevented the Dict from being created.
Only supports UTF-8 mecab dictionaries with a version number of 0x66.
Ensures that sys.dic and matrix.bin have compatible connection matrix sizes.
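The version restriction above can be illustrated with a minimal standalone sketch. It does not use Dict::load itself. The helper name check_version, the error strings, and the header layout (version stored as the second little-endian u32 of the file) are assumptions made for illustration, not something this page confirms.

```rust
use std::io::{BufReader, Read, Seek, SeekFrom};

// The only dictionary version the documentation says is supported.
const SUPPORTED_VERSION: u32 = 0x66;

// Hypothetical helper: reads the first 8 bytes of a dictionary and checks the
// version field. ASSUMPTION: the version is the second little-endian u32 in
// the file. The bounds and error type mirror the documented `load` signature.
fn check_version<T: Read + Seek>(sysdic: &mut BufReader<T>) -> Result<(), &'static str> {
    // Rewind so the check works regardless of the reader's current position.
    sysdic
        .seek(SeekFrom::Start(0))
        .map_err(|_| "failed to seek to start of dictionary")?;

    let mut header = [0u8; 8];
    sysdic
        .read_exact(&mut header)
        .map_err(|_| "dictionary header is too short")?;

    // Bytes 4..8 hold the (assumed) version field.
    let version = u32::from_le_bytes([header[4], header[5], header[6], header[7]]);
    if version != SUPPORTED_VERSION {
        return Err("unsupported dictionary version");
    }
    Ok(())
}
```

Because the reader only needs Read + Seek, the same call shape works for a BufReader wrapping a std::fs::File or an in-memory std::io::Cursor.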
pub fn read_feature_string(
&self,
feature_offset: u32
) -> Result<String, &'static str>
Takes an offset into an internal byte table that stores feature strings and returns the feature string starting at that offset.
This is how feature strings are stored internally in mecab dictionaries; decoding them all at load time would slow down loading dramatically.
Does not check that the given offset is actually the start of a feature string, so if you give an offset halfway into a feature string, you’ll get the tail end of that feature string.
You should only feed this function the feature_offset field of a LexerToken.
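The offset-based lookup described above can be sketched without the crate. This is an illustration only: the function name read_string_at, the error strings, and the assumption that each feature string in the table is NUL-terminated are hypothetical and not taken from this page.

```rust
// Hypothetical stand-in for an offset lookup into a feature-string table.
// ASSUMPTION: strings are stored back to back, each terminated by a 0 byte.
fn read_string_at(table: &[u8], offset: u32) -> Result<String, &'static str> {
    let start = offset as usize;

    // Reject offsets that point past the end of the table.
    let tail = table
        .get(start..)
        .ok_or("offset is past the end of the table")?;

    // The string runs up to the next NUL byte, or to the end of the table.
    let end = tail.iter().position(|&b| b == 0).unwrap_or(tail.len());

    // Decoding happens lazily, only for the string that was asked for.
    std::str::from_utf8(&tail[..end])
        .map(str::to_owned)
        .map_err(|_| "feature string is not valid UTF-8")
}
```

With a table holding "noun,common" and "verb,aux" separated by NUL bytes, offset 0 yields the first string and offset 12 the second, while offset 5 lands inside the first string and yields only its tail, "common". That is the behaviour the warning above describes, and it is why the offset should come from a LexerToken rather than being computed by hand.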