A fast Japanese romanizer.
§Usage
use ib_romaji::HepburnRomanizer;
let romanizer = HepburnRomanizer::default();
let mut romajis = Vec::new();
romanizer.romanize_and_try_for_each("日本語", |len, romaji| {
    romajis.push((len, romaji));
    None::<()>
});
assert_eq!(romajis, vec![(9, "nippongo"), (3, "a"), (3, "aki"), (3, "bi"), (3, "chi"), (3, "he"), (3, "hi"), (3, "iru"), (3, "jitsu"), (3, "ka"), (3, "kou"), (3, "ku"), (3, "kusa"), (3, "nchi"), (3, "ni"), (3, "nichi"), (3, "nitsu"), (3, "su"), (3, "tachi")]);
assert_eq!(romanizer.romanize_vec("日本語"), vec![(9, "nippongo"), (3, "a"), (3, "aki"), (3, "bi"), (3, "chi"), (3, "he"), (3, "hi"), (3, "iru"), (3, "jitsu"), (3, "ka"), (3, "kou"), (3, "ku"), (3, "kusa"), (3, "nchi"), (3, "ni"), (3, "nichi"), (3, "nitsu"), (3, "su"), (3, "tachi")]);
§Binary size
The dictionary currently takes ~4.8 MiB of the binary (5.5 MiB without compression).
§Design
With &[&str], each str occupies 16 extra bytes (on 64-bit targets) to store the pointer and length, while a CStr only needs 1 extra byte per str for the NUL terminator.
- For words, this can save 3.14 MiB (actually 3.54 MiB).
  - Source file: 2.98 MiB -> \0+\: 2.80 MiB, \n: 2.54 MiB
  - build() time: split()/memchr +10%
- This way the str can also be compressed and then decompressed in a streaming fashion.
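The size argument above can be checked with a small, std-only sketch. It is not the crate's actual storage code: the helper names (words, table_cost, blob_cost) and the sample words are hypothetical, and a plain NUL-separated byte blob split with slice::split stands in for whatever layout the crate really uses.

```rust
use std::mem::size_of;

/// Hypothetical helper: iterate over words stored back to back in one
/// byte blob, each terminated by a single NUL byte.
fn words<'a>(blob: &'a [u8]) -> impl Iterator<Item = &'a str> + 'a {
    blob.split(|&b| b == 0)
        // The trailing NUL yields one empty slice at the end; skip it.
        .filter(|w| !w.is_empty())
        .map(|w| std::str::from_utf8(w).expect("blob must be valid UTF-8"))
}

/// Bytes needed for a `&[&str]` table: the text itself plus one fat
/// pointer (pointer + length) per entry.
fn table_cost(table: &[&str]) -> usize {
    let text: usize = table.iter().map(|w| w.len()).sum();
    text + table.len() * size_of::<&str>()
}

/// Bytes needed for the NUL-terminated layout: the text itself plus
/// one terminator byte per entry.
fn blob_cost(table: &[&str]) -> usize {
    let text: usize = table.iter().map(|w| w.len()).sum();
    text + table.len()
}

fn main() {
    // A `&str` is two machine words: 16 bytes on 64-bit targets.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    let table: &[&str] = &["nippongo", "nichi", "hi"];
    let blob: &[u8] = b"nippongo\0nichi\0hi\0";

    // Both layouts hold the same words.
    let decoded: Vec<&str> = words(blob).collect();
    assert_eq!(decoded, table);

    // The blob spends 1 byte per word instead of one fat pointer per word.
    assert_eq!(blob_cost(table), blob.len());
    println!(
        "table: {} bytes, blob: {} bytes",
        table_cost(table),
        blob_cost(table)
    );
}
```

The trade-off is that the blob has to be scanned for separators when the romanizer is built, which is where a build-time cost such as the split()/memchr figure above comes from.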
§Features
- compress-words (enabled by default): Binary size (and memory usage) -696 KiB (771 KiB if zstd is already used), romanizer build time +1.1 ms.
- cache: Enable serialization/deserialization of HepburnRomanizer for caching initialization state. When combined with std, also enables file-based caching via the builder API.
- std (enabled by default): Enable standard library support for file-based caching.
Modules§
Structs§
- HepburnRomanizer: Hepburn romanization
- HepburnRomanizerBuilder: Use builder syntax to set the inputs and finish with build().