Struct vibrato::dictionary::Dictionary
pub struct Dictionary { /* private fields */ }
Dictionary for tokenization.
Implementations
impl Dictionary
pub fn word_feature(&self, word_idx: WordIdx) -> &str
Returns a reference to the feature string of the word identified by word_idx.
pub fn write<W>(&self, wtr: W) -> Result<usize> where W: Write,
Exports the dictionary data.
Examples
use std::fs::File;
use vibrato::SystemDictionaryBuilder;
let dict = SystemDictionaryBuilder::from_readers(
File::open("src/tests/resources/lex.csv")?,
File::open("src/tests/resources/matrix.def")?,
File::open("src/tests/resources/char.def")?,
File::open("src/tests/resources/unk.def")?,
)?;
let writer = File::create("path/to/system.dic")?;
dict.write(writer)?;
Errors
When bincode generates an error, it will be returned as is.
pub fn read<R>(rdr: R) -> Result<Self> where R: Read,
Creates a dictionary from raw dictionary data.
The argument must be a byte sequence exported by the Dictionary::write() function.
Examples
use std::fs::File;
use vibrato::Dictionary;
let reader = File::open("path/to/system.dic")?;
let dict = Dictionary::read(reader)?;
Errors
When bincode generates an error, it will be returned as is.
pub unsafe fn read_unchecked<R>(rdr: R) -> Result<Self> where R: Read,
Creates a dictionary from raw dictionary data.
The argument must be a byte sequence exported by the Dictionary::write() function.
Unlike the Dictionary::read() function, this function does not check the correctness of the dictionary.
Examples
use std::fs::File;
use vibrato::Dictionary;
let reader = File::open("path/to/system.dic")?;
let dict = unsafe { Dictionary::read_unchecked(reader)? };
Safety
The given reader must yield valid dictionary data exported by Dictionary::write().
Errors
When bincode generates an error, it will be returned as is.
pub fn reset_user_lexicon_from_reader<R>(self, user_lexicon_rdr: Option<R>) -> Result<Self> where R: Read,
Resets the user dictionary from a reader.
Arguments
user_lexicon_rdr: A reader of a lexicon file (*.csv) in the MeCab format. If None, the current user dictionary is cleared.
Errors
VibratoError is returned when the input format is invalid.
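As a rough illustration of the kind of data the reader is expected to supply, the sketch below parses rows laid out as surface, left id, right id, cost, and then the feature string, which is the usual column order of a MeCab lexicon CSV. The LexRow type and the parse_rows function are hypothetical names introduced here; they are not part of vibrato, and the crate's own parser may treat quoting and validation differently.

```rust
use std::io::{BufRead, BufReader, Read};

/// Hypothetical row type for illustration only; not a vibrato type.
#[derive(Debug, PartialEq)]
struct LexRow {
    surface: String,
    left_id: u16,
    right_id: u16,
    cost: i16,
    feature: String,
}

/// Naive reader (an assumption, not vibrato's parser): splits each line on
/// its first four commas, so quoted surfaces containing commas are not handled.
fn parse_rows<R: Read>(rdr: R) -> Result<Vec<LexRow>, String> {
    let mut rows = Vec::new();
    for line in BufReader::new(rdr).lines() {
        let line = line.map_err(|e| e.to_string())?;
        if line.is_empty() {
            continue;
        }
        // Everything after the fourth comma is kept as one feature string.
        let cols: Vec<&str> = line.splitn(5, ',').collect();
        if cols.len() < 5 {
            return Err(format!("expected at least 5 columns: {line}"));
        }
        rows.push(LexRow {
            surface: cols[0].to_string(),
            left_id: cols[1].parse::<u16>().map_err(|e| e.to_string())?,
            right_id: cols[2].parse::<u16>().map_err(|e| e.to_string())?,
            cost: cols[3].parse::<i16>().map_err(|e| e.to_string())?,
            feature: cols[4].to_string(),
        });
    }
    Ok(rows)
}

fn main() {
    // `&[u8]` implements `Read`, so an in-memory string can stand in for a file.
    let csv = "東京,1,2,300,名詞,固有名詞\n";
    let rows = parse_rows(csv.as_bytes()).expect("well-formed row");
    assert_eq!(rows[0].surface, "東京");
    assert_eq!((rows[0].left_id, rows[0].right_id, rows[0].cost), (1, 2, 300));
    assert_eq!(rows[0].feature, "名詞,固有名詞");
    // A row with too few columns is rejected.
    assert!(parse_rows("a,b\n".as_bytes()).is_err());
}
```

Because the argument only needs to implement Read, an in-memory byte slice works as well as a File, which can be convenient in tests.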
pub fn map_connection_ids_from_iter<L, R>(self, lmap: L, rmap: R) -> Result<Self> where L: IntoIterator<Item = u16>, R: IntoIterator<Item = u16>,
Edits connection ids with the given mappings.
Arguments
lmap/rmap: An iterator of mappings of left/right ids, where the i-th item (1-origin) indicates a new id mapped from id i.
Errors
VibratoError is returned when
- a new id of BOS_EOS_CONNECTION_ID is included,
- new ids are duplicated, or
- the set of new ids is not the same as that of old ids.
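The three conditions above amount to requiring that the mapping be a permutation of the old ids that never touches the reserved BOS/EOS id. The following is a minimal, standalone sketch of such a check, not vibrato's implementation. It assumes that BOS_EOS_CONNECTION_ID is 0 and that the old ids are 1..=n for an n-item mapping; check_mapping is a hypothetical helper name.

```rust
use std::collections::HashSet;

/// Assumed value of vibrato's `BOS_EOS_CONNECTION_ID`; the real constant
/// is defined in the crate and is not shown on this page.
const BOS_EOS_CONNECTION_ID: u16 = 0;

/// Hypothetical helper: checks the three documented error conditions for a
/// mapping whose i-th item (1-origin) is the new id for old id `i`.
fn check_mapping(map: &[u16]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for (idx, &new_id) in map.iter().enumerate() {
        let old_id = idx + 1; // 1-origin: the first item maps old id 1
        if new_id == BOS_EOS_CONNECTION_ID {
            return Err(format!("old id {old_id} maps to the reserved BOS/EOS id"));
        }
        if !seen.insert(new_id) {
            return Err(format!("new id {new_id} is duplicated"));
        }
    }
    // Old ids are assumed to be 1..=n, so every new id must fall in that range.
    if let Some(bad) = map.iter().find(|&&id| usize::from(id) > map.len()) {
        return Err(format!("new id {bad} is not among the old ids"));
    }
    Ok(())
}

fn main() {
    // Old ids 1, 2, 3 become 3, 1, 2: a valid permutation.
    assert!(check_mapping(&[3, 1, 2]).is_ok());
    assert!(check_mapping(&[0, 1, 2]).is_err()); // reserved id
    assert!(check_mapping(&[1, 1, 2]).is_err()); // duplicate
    assert!(check_mapping(&[1, 2, 4]).is_err()); // 4 is not an old id
}
```

Running a check like this on lmap and rmap before calling map_connection_ids_from_iter can surface a malformed mapping early, with a message specific to the failing condition.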