Crate furigana


§furigana


Contains functionality for correctly mapping furigana to a word given a reading, optionally using kanji reading data.

§Usage

for mapping in furigana::map_naive("物の怪", "もののけ") {
    println!("{mapping}");
}

prints out the following mappings:

のけ

もの

The second mapping is the correct one, but based only on a word and its reading there’s no way to determine that.
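The ambiguity arises because the kana の in the word can be aligned with either の in the reading. The following is a minimal sketch of that idea, not the crate's actual implementation: `naive_candidates`, `render`, and the `Segment` alias are hypothetical names, and the bracket notation is used only for illustration. It assumes kana in the word must match the reading verbatim, with no hiragana/katakana conversion.

```rust
/// A run of text paired with the reading assigned to it (hypothetical type).
type Segment = (String, String);

fn is_kana(c: char) -> bool {
    // Hiragana and katakana Unicode blocks.
    matches!(c, '\u{3041}'..='\u{309F}' | '\u{30A0}'..='\u{30FF}')
}

/// Splits the text into alternating runs of kana and non-kana characters.
fn runs(text: &str) -> Vec<(bool, String)> {
    let mut out: Vec<(bool, String)> = Vec::new();
    for c in text.chars() {
        let kana = is_kana(c);
        if let Some((last_kana, run)) = out.last_mut() {
            if *last_kana == kana {
                run.push(c);
                continue;
            }
        }
        out.push((kana, c.to_string()));
    }
    out
}

fn enumerate(
    runs: &[(bool, String)],
    reading: &[char],
    current: &mut Vec<Segment>,
    found: &mut Vec<Vec<Segment>>,
) {
    let ((is_kana_run, run), rest) = match runs.split_first() {
        Some(split) => split,
        None => {
            // All text consumed: valid only if the reading is used up too.
            if reading.is_empty() {
                found.push(current.clone());
            }
            return;
        }
    };
    if *is_kana_run {
        // Kana in the text must appear verbatim at the front of the reading.
        let run_chars: Vec<char> = run.chars().collect();
        if reading.starts_with(&run_chars) {
            current.push((run.clone(), run.clone()));
            enumerate(rest, &reading[run_chars.len()..], current, found);
            current.pop();
        }
    } else {
        // A non-kana run may take any non-empty prefix of the remaining reading.
        for len in 1..=reading.len() {
            let assigned: String = reading[..len].iter().collect();
            current.push((run.clone(), assigned));
            enumerate(rest, &reading[len..], current, found);
            current.pop();
        }
    }
}

/// Returns every way to spread `reading` over `text` under the rules above.
fn naive_candidates(text: &str, reading: &str) -> Vec<Vec<Segment>> {
    let reading: Vec<char> = reading.chars().collect();
    let mut found = Vec::new();
    enumerate(&runs(text), &reading, &mut Vec::new(), &mut found);
    found
}

/// Renders a candidate as `text[reading]` for annotated runs, plain text for kana.
fn render(mapping: &[Segment]) -> String {
    mapping
        .iter()
        .map(|(text, reading)| {
            if text == reading {
                text.clone()
            } else {
                format!("{text}[{reading}]")
            }
        })
        .collect()
}
```

With this sketch, `naive_candidates("物の怪", "もののけ")` yields two candidates, which `render` shows as `物[も]の怪[のけ]` and `物[もの]の怪[け]`. Nothing in the word or the reading alone favours one over the other.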

If given information about kanji readings (for example, from KANJIDIC2), furigana::map is able to grade the potential mappings:

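use std::collections::HashMap;

// Known readings per kanji, for example loaded from KANJIDIC2.
let mut kanji_to_readings = HashMap::new();
kanji_to_readings.insert("物".to_string(), vec!["もの".to_string()]);
kanji_to_readings.insert("怪".to_string(), vec!["け".to_string()]);
// Take the first mapping by reference instead of moving it out of the returned list.
let mappings = furigana::map("物の怪", "もののけ", &kanji_to_readings);
let mapping = &mappings[0];
println!("{mapping}");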

Here, the incorrect mapping is rejected using the knowledge given about kanji readings, so that only the correct mapping is printed:

もの
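One way to picture the grading step is to check each candidate's annotated segments against the known readings. The sketch below is an assumption about how such a score could work, not the crate's actual accuracy measure: `score`, `best`, `seg`, and the `Segment` alias are hypothetical names, and the score is simply the share of annotated segments whose reading is listed for that text.

```rust
use std::collections::HashMap;

/// A run of text paired with the reading assigned to it (hypothetical type).
type Segment = (String, String);

/// Convenience constructor for a segment.
fn seg(text: &str, reading: &str) -> Segment {
    (text.to_string(), reading.to_string())
}

/// Hypothetical score in 0.0..=1.0: the fraction of annotated segments
/// whose assigned reading appears in `kanji_to_readings` for that text.
fn score(mapping: &[Segment], kanji_to_readings: &HashMap<String, Vec<String>>) -> f64 {
    let mut annotated = 0usize;
    let mut known = 0usize;
    for (text, reading) in mapping {
        if text == reading {
            continue; // kana matched verbatim, nothing to grade
        }
        annotated += 1;
        let listed = kanji_to_readings
            .get(text)
            .map_or(false, |readings| readings.contains(reading));
        if listed {
            known += 1;
        }
    }
    if annotated == 0 {
        1.0
    } else {
        known as f64 / annotated as f64
    }
}

/// Picks the candidate with the highest score, if there is any candidate.
fn best<'a>(
    candidates: &'a [Vec<Segment>],
    kanji_to_readings: &HashMap<String, Vec<String>>,
) -> Option<&'a Vec<Segment>> {
    candidates
        .iter()
        .max_by(|a, b| score(a, kanji_to_readings).total_cmp(&score(b, kanji_to_readings)))
}
```

Under this scoring, the candidate 物→も, 怪→のけ gets 0.0 because neither reading is listed, while 物→もの, 怪→け gets 1.0, so `best` selects the latter.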

§Notes

  • The algorithm used is recursive and not optimised, so it may be inefficient for very long inputs and for certain edge cases that produce a large number of potential mappings. When using real data and dividing it into shorter segments (e.g. by word or by sentence) there should be no issue.

  • Irregular readings such as おとな for 大人 and とおか for 10日 are handled case by case, so these may be mapped incorrectly in some cases. Issues on these are appreciated.

  • If the library fails to produce the correct mapping, or if the correct mapping's accuracy is lower than that of an incorrect mapping, an issue is much appreciated!
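To give a rough sense of the first note, the sketch below counts how many candidate splits exist in a worst case. It rests on an assumption that is not taken from the crate: that a run of `kanji_count` consecutive kanji is matched by cutting `reading_len` kana into that many non-empty contiguous pieces, which gives C(n-1, k-1) possibilities. `split_count` is a hypothetical helper.

```rust
/// Number of ways to cut `reading_len` kana into `kanji_count` non-empty
/// contiguous pieces, i.e. the binomial coefficient C(n - 1, k - 1).
fn split_count(reading_len: u64, kanji_count: u64) -> u64 {
    if kanji_count == 0 {
        // Only the empty reading can be spread over zero kanji.
        return u64::from(reading_len == 0);
    }
    if reading_len < kanji_count {
        return 0; // not enough kana for every kanji to get at least one
    }
    let n = reading_len - 1;
    let k = (kanji_count - 1).min(n - (kanji_count - 1));
    let mut result = 1u64;
    for i in 0..k {
        // After this step, `result` equals C(n, i + 1); the division is exact.
        result = result * (n - i) / (i + 1);
    }
    result
}
```

Under that assumption, 2 kanji with a 4-kana reading give 3 candidates, while 10 kanji with a 20-kana reading already give 92378, which is why splitting input into words or sentences keeps the search small.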

§License

Licensed under the Mozilla Public License Version 2.0.

Structs§

Furigana
A mapping of furigana to a word.
FuriganaNode
FuriganaSegment

Functions§

map
Returns a list of all possible ways to map the reading to the text, matching the kana in the reading to the ones in the text. Uses the information in kanji_to_readings to approximate the accuracy of each mapping. Returns an empty list if the segments and readings are impossible to match.
map_naive
Returns a list of all possible ways to map the reading to the text, matching the kana in the reading to the ones in the word. Returns an empty list if the segments and readings are impossible to match.