Features
- Phoneme <-> Word mapping: Provides detailed mappings linking morphological analysis results with phonemes (g2p_pairs, g2p_mapping, g2p_mapping_prosody, g2p_mapping_detailed), which were previously unavailable. (See Advanced Features)
- Prosody Information Retrieval: Provides phoneme sequences enriched with prosodic symbols as well as near-lossless mappings to surface forms (g2p_prosody, g2p_mapping_prosody). (For more details, see Prosody Features.)
- Performance: Enables fast processing through a native Rust implementation. (See Benchmark)
- Accuracy: Improves accuracy by incorporating English pronunciation estimation via haqumei-kanalizer and other corrections, alongside various techniques from pyopenjtalk-plus. (See Accuracy)
- Output Formats: Provides results in various formats, including a simple phoneme sequence (g2p) and a detailed list including unknown word information (g2p_detailed).
- Concurrency: Enables concurrent G2P processing across multiple threads using the *_batch methods.
Examples can be found in haqumei/examples.
Install
Rust
Because of the file size limits on crates.io, the dictionary is not shipped inside the crate. Instead, it is downloaded during the initial build of haqumei and embedded into the binary.
For custom dictionaries, or for environments where network access is unavailable during the build, see Building with a Custom Embedded Dictionary.
Python
Supported Platforms
Pre-built wheels are available for the following platforms:
| OS | Architecture |
|---|---|
| Linux | x86_64, aarch64 |
| macOS | aarch64 (e.g., Apple Silicon M1/M2/M3) |
| Windows | x86_64 |
Pre-built wheels bundle the embedded dictionary and require no network access during installation.
If a wheel is unavailable for your platform, installation falls back to building from source, which requires a Rust toolchain. In that case, the dictionary is downloaded and embedded during the build (same as the Rust crate build process).
Command-Line Tool
We also provide haqumei-cli, a command-line interface for text processing from the terminal.
For detailed usage, including pipeline processing and JSON output, please see haqumei-cli/README.md.
Usage
Rust
use haqumei::Haqumei;
Python
# Initialize Haqumei (the dictionary will be automatically set up)
=
=
# Convert to a phoneme list
=
# -> Phonemes: ["k", "o", "N", "n", "i", "ch", "i", "w", "a", "pau", "s", "e", "k", "a", "i"]
# Get phoneme list with prosodic symbols
=
# -> Prosody-annotated phonemes: ^ k o [ N n i ch i w a _ s e ] k a i ! $
# Convert to katakana reading
=
# -> Katakana reading: コンニチワ、セカイ!
Advanced Features
Getting Phoneme Mapping with the Original Word String
Haqumei implements g2p_pairs to obtain the correspondence between phonemes and their original words.
This is achieved by traversing the JPCommon structure and tracking the pointers to the words to which each phoneme belongs.
use haqumei::Haqumei;
Detailed G2P Output
In Open JTalk (pyopenjtalk), unknown words are treated as pau (pauses), and Haqumei's standard g2p function follows this behavior.
However, by using the g2p_**_detailed functions, you can detect otherwise ignored unknown words and spaces as unk and sp respectively.
Please note that sp does not refer to raw space characters in the input, but rather to tokens with the "記号,空白" (symbol, space) part of speech output by MeCab, which are normally ignored in pyopenjtalk. Therefore, characters that MeCab itself ignores (e.g., \t, \n) are not included in sp.
- Known words: Regular phoneme sequence (punctuation marks become pau).
- Unknown words: unk
- Spaces, etc.: sp (Space)
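As a rough illustration of how these markers might be consumed downstream, the sketch below counts pau, unk, and sp in a detailed phoneme list. The input list is hand-written for illustration (not actual Haqumei output), and summarize_markers is a hypothetical helper, not part of the Haqumei API.

```python
from collections import Counter

# Marker tokens named in the list above.
MARKERS = ("pau", "unk", "sp")

def summarize_markers(phonemes):
    """Count pau / unk / sp tokens in a detailed phoneme list."""
    counts = Counter(p for p in phonemes if p in MARKERS)
    return {m: counts.get(m, 0) for m in MARKERS}

# Hand-written example input (an assumption, not real library output).
detailed = ["k", "o", "N", "sp", "unk", "pau", "a"]
summary = summarize_markers(detailed)
print(summary)
```

A non-zero unk count could, for example, be used to flag inputs that deserve manual review before synthesis.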
Using g2p_mapping, you can obtain the phoneme-to-word mapping along with flags indicating whether a word is unknown (is_unknown) and whether it would normally be ignored in the original pipeline (is_ignored).
In addition, using g2p_mapping_detailed allows you to retrieve not only the mapping but also part-of-speech information and accent details.
use haqumei::Haqumei;
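The is_unknown and is_ignored flags lend themselves to simple filtering. The following is a minimal sketch under stated assumptions: WordMapping is a hypothetical stand-in for one mapping entry (only the two flag names come from this README; the word and phonemes fields and the overall layout are assumed), and the sample entries are hand-written.

```python
from dataclasses import dataclass, field

@dataclass
class WordMapping:
    # Hypothetical mirror of one g2p_mapping entry; the field layout is assumed.
    word: str
    phonemes: list = field(default_factory=list)
    is_unknown: bool = False
    is_ignored: bool = False

def split_by_flags(entries):
    """Return (regular, unknown, ignored) surface forms.

    If both flags are set, the entry is reported as unknown.
    """
    regular, unknown, ignored = [], [], []
    for e in entries:
        if e.is_unknown:
            unknown.append(e.word)
        elif e.is_ignored:
            ignored.append(e.word)
        else:
            regular.append(e.word)
    return regular, unknown, ignored

# Hand-written sample entries (not real library output).
entries = [
    WordMapping("空", ["s", "o", "r", "a"]),
    WordMapping(" ", ["sp"], is_ignored=True),
    WordMapping("◆", ["unk"], is_unknown=True),
]
regular, unknown, ignored = split_by_flags(entries)
print(regular, unknown, ignored)
```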
Modifying Output with G2P Options
You can customize the behavior of Haqumei by using Haqumei::with_options.
For details on the default behavior and available options, please refer to HaqumeiOptions.
In the following example, normalize_unicode (which is disabled by default) is enabled to apply Unicode NFC normalization to the input text.
use ;
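For readers unfamiliar with NFC, the snippet below shows what the normalization itself does, using Python's standard unicodedata module. It illustrates the Unicode operation only and makes no claim about how Haqumei implements normalize_unicode internally.

```python
import unicodedata

# "か" followed by a combining voiced sound mark (U+3099): two code points.
decomposed = "\u304b\u3099"
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 2 1
print(composed == "\u304c")            # True: the precomposed "が"

# Half-width katakana is a compatibility form, so NFC leaves it unchanged.
halfwidth = "\uff76\uff9e"
print(unicodedata.normalize("NFC", halfwidth) == halfwidth)  # True
```

Text copied from some sources (for example, macOS file names) often arrives in decomposed form, which is the kind of input where this option is likely to matter.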
Prosody Features (g2p_prosody / g2p_mapping_prosody)
Specification of g2p_prosody_with_options
Converts the input text into a phoneme list annotated with prosodic symbols based on the ProsodyFormat setting.
(The g2p_prosody method behaves identically to specifying ProsodyFormat::Default.)
The output commonly includes the following prosodic symbols:
| Symbol | Meaning | Position |
|---|---|---|
| ^ | Beginning of utterance (BOS) | Sentence-initial |
| $ | End of utterance (EOS) | Sentence-final |
| ? | End of interrogative (?) | Sentence-medial |
| ! | End of exclamation (Custom extension) | Sentence-medial |
| _ | Pause / Comma (、) | Sentence-medial |
| # | Accent phrase boundary | Sentence-medial |
| {...} | Unknown word | Sentence-medial |
For more information on Japanese accents, please refer to the tdmelodic User Manual / Preliminary Knowledge (Japanese).
ProsodyFormat::Default
In addition to the above, the output includes the following prosodic symbols:
| Symbol | Meaning | Position |
|---|---|---|
| [ | Pitch rise (Phrase head) | Near the beginning of a phrase |
| ] | Pitch fall (Accent nucleus) | Right after the nuclear mora |
The symbols [ and ] are based on the accent notation commonly used in tdmelodic and similar tools.
They correspond to ^ and ! in the algorithm described by Kurihara et al. (2021) in "Prosodic Features Control by Symbols as Input of Sequence-to-Sequence Acoustic Modeling for Neural TTS".
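To make the symbol set concrete, here is a small sketch that strips the prosodic symbols from a Default-format sequence. The rules used (mapping _ to pau and dropping every other symbol) are assumptions inferred from the Usage example in this README, not a documented guarantee, and unknown-word markers ({...}) are not handled. strip_prosody is a hypothetical helper.

```python
# Prosodic symbols listed in the tables above, excluding "_" and "{...}".
DROP = {"^", "$", "?", "!", "#", "[", "]"}

def strip_prosody(tokens):
    """Remove prosodic symbols, turning "_" (pause) into "pau"."""
    plain = []
    for tok in tokens:
        if tok == "_":
            plain.append("pau")
        elif tok not in DROP:
            plain.append(tok)
    return plain

# Prosody-annotated sequence taken from the Usage section.
prosody = "^ k o [ N n i ch i w a _ s e ] k a i ! $".split()
plain = strip_prosody(prosody)
print(plain)
```

For this input the result coincides with the plain phoneme list shown in the Usage section.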
ProsodyFormat::Prefix
Instead of using pitch rise/fall symbols ([ and ]), pitch high/low is attached as a prefix to each phoneme:
- H_: High pitch
- L_: Low pitch
The pitch is explicitly indicated for each phoneme.
Example: "青い空" -> ["^", "L_a", "H_o", "L_i", "#", "H_s", "H_o", "L_r", "L_a", "$"]
ProsodyFormat::Numeric
Pitch high/low is attached as a suffix to each phoneme as a numeric value:
- :1: High pitch
- :0: Low pitch
Example: "青い空" -> ["^", "a:0", "o:1", "i:0", "#", "s:1", "o:1", "r:0", "a:0", "$"]
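The Prefix and Numeric formats carry the same information in different notations. The sketch below renders both from one list of (phoneme, is_high) pairs; the intermediate representation and the helper names are hypothetical and chosen only for illustration, while the pitch values are copied from the "青い空" examples above.

```python
def to_prefix(phoneme, high):
    """Prefix notation: H_ for high pitch, L_ for low pitch."""
    return ("H_" if high else "L_") + phoneme

def to_numeric(phoneme, high):
    """Numeric notation: :1 for high pitch, :0 for low pitch."""
    return f"{phoneme}:{1 if high else 0}"

def render(items, fmt):
    """items: (phoneme, is_high) pairs, or bare strings for prosodic symbols."""
    out = []
    for item in items:
        if isinstance(item, str):
            out.append(item)        # symbols such as ^, #, $ pass through
        else:
            out.append(fmt(*item))
    return out

# "青い空", with pitch values taken from the examples above.
aoi_sora = ["^", ("a", False), ("o", True), ("i", False), "#",
            ("s", True), ("o", True), ("r", False), ("a", False), "$"]

print(render(aoi_sora, to_prefix))
print(render(aoi_sora, to_numeric))
```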
Example
use haqumei::Haqumei;
Specification of g2p_mapping_prosody
g2p_mapping_prosody analyzes the input text and retrieves an alignment between detailed linguistic information for each morpheme (word) and phonemes with prosodic symbols.
While Haqumei::g2p_prosody and Haqumei::g2p_prosody_with_options return a flat list of strings (Vec<String>), this function returns structured data (Vec<WordPhonemeProsody>) annotated with part-of-speech, accent type, reading, and pitch information.
This is suitable for speech synthesis frontend processing when you want to maintain the correspondence between morphemes and phonemes, individually retrieve and manipulate pitch high/low (PitchAccent), or handle unknown words.
Information included in WordPhonemeProsody
The following information is included as data for each morpheme:
| Field | Description | Example |
|---|---|---|
| word | Surface form of the morpheme | "空" |
| pos, pos_group1~3 | Part-of-speech and its subdivisions | "名詞", "一般" |
| orig, read, pron | Original form, reading, pronunciation form | "空", "ソラ", "ソラ" |
| accent_nucleus | Accent nucleus position (0: Heiban type, 1~: n-th mora) | 1 |
| mora_count | Number of moras | 2 |
| is_unknown | Whether it was judged as an unknown word by MeCab | false |
| is_ignored | Whether no phoneme was assigned | false |
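To show how accent_nucleus and mora_count relate to pitch, the sketch below applies the standard Tokyo-dialect rule for a word pronounced in isolation. This is general background on Japanese pitch accent, not a description of Haqumei's internal algorithm, and it ignores interactions within longer accent phrases. mora_pitch is a hypothetical helper.

```python
def mora_pitch(accent_nucleus, mora_count):
    """Return one bool per mora (True = high) for an isolated word."""
    pitches = []
    for i in range(1, mora_count + 1):        # 1-based mora index
        if accent_nucleus == 0:
            high = i > 1                       # heiban: low, then high to the end
        elif accent_nucleus == 1:
            high = i == 1                      # nucleus on mora 1: high, then low
        else:
            high = 1 < i <= accent_nucleus     # rise after mora 1, fall after the nucleus
        pitches.append(high)
    return pitches

print(mora_pitch(1, 2))  # 空 (ソラ), values from the table: [True, False]
print(mora_pitch(2, 3))  # a 3-mora word with its nucleus on mora 2: [False, True, False]
print(mora_pitch(0, 3))  # a 3-mora heiban word: [False, True, True]
```

The first two results agree with the high/low pattern of "青い空" in the ProsodyFormat examples, assuming 青い has its nucleus on the second mora.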
Prosodic Phoneme (ProsodicPhoneme)
The phonemes field contains a list of the following elements:
| Variant | Meaning | Output symbol in g2p_prosody, etc. |
|---|---|---|
| Phoneme | The phoneme itself and its pitch (High / Low) | a, a:0, H_a, etc. |
| AccentPhraseBoundary | Accent phrase boundary | # |
| Pause | Regular pause / comma | _ |
| Interrogative | End of interrogative / Pause | ? |
| Exclamatory | End of exclamation / Pause | ! |
Example
use ;
Accuracy
We evaluated the accuracy using the haqumei-eval crate. Below are the results:
- Phoneme Error Rate (PER) evaluated on prj-beatrice/jsut-label, a fork of jsut-label providing annotations for the Basic5000 subset of the JSUT corpus.
- Katakana Error Rate (Katakana ER) evaluated on ROHAN.
jsut-label
Phoneme Error Rate (S+D+I / N_expected): 1.24% (Substitute=2244, Delete=572, Insert=889, N=297843)
HaqumeiOptions:
HaqumeiOptions
ROHAN
Katakana Error Rate (S+D+I / N_expected): 1.64% (Substitute=1689, Delete=493, Insert=288, N=150637)
HaqumeiOptions:
HaqumeiOptions
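The reported rates follow directly from the formula (S+D+I) / N_expected. As a quick check, the snippet below recomputes both figures from the counts listed above; error_rate is just an illustrative helper, not part of haqumei-eval.

```python
def error_rate(substitute, delete, insert, n_expected):
    """(S + D + I) / N_expected, as a fraction."""
    return (substitute + delete + insert) / n_expected

per = error_rate(2244, 572, 889, 297843)   # jsut-label
ker = error_rate(1689, 493, 288, 150637)   # ROHAN

print(f"{per:.2%}")  # 1.24%
print(f"{ker:.2%}")  # 1.64%
```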
Benchmark
Here are the comparison results between pyopenjtalk (Baseline) and haqumei using approximately 318,000 characters of Japanese text.
Input data: I Am a Cat (吾輩は猫である) 318,407 chars / 8,451 lines (Average 37 chars/line) (Ruby characters have been removed)
| Execution Mode | Execution Time (Mean) | Throughput | Speedup |
|---|---|---|---|
| pyopenjtalk (Baseline) | 2.358 s | 135k chars/s | 1.00x |
| haqumei (Default) | 1.303 s | 244k chars/s | 1.81x |
| haqumei (g2p_batch, Default) | 0.098 s | 3.24M chars/s | 24.04x |
| haqumei (Heavy) | 2.101 s | 151k chars/s | 1.12x |
| haqumei (g2p_batch, Heavy) | 0.268 s | 1.18M chars/s | 8.80x |
The detailed benchmark code can be found in haqumei-bench/pyopenjtalk.
Additionally, Rust-layer benchmarks for Haqumei using Criterion.rs can be run via cargo bench in the haqumei-bench crate. The comparison benchmark with pyopenjtalk-plus is located in haqumei-bench/pyopenjtalk-plus.
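For reference, throughput and speedup can be derived from the mean execution times as sketched below. Because the times in the table are rounded to milliseconds, recomputed values may differ from the published ones in the last digit (most visibly for the g2p_batch rows). The helper names are illustrative only.

```python
CHARS = 318_407        # input size from the benchmark description
BASELINE_S = 2.358     # mean time of pyopenjtalk (Baseline)

def throughput(seconds):
    """Characters processed per second."""
    return CHARS / seconds

def speedup(seconds):
    """Speedup relative to the pyopenjtalk baseline."""
    return BASELINE_S / seconds

# Mean execution times copied from the table above.
runs = {
    "pyopenjtalk (Baseline)": 2.358,
    "haqumei (Default)": 1.303,
    "haqumei (g2p_batch, Default)": 0.098,
    "haqumei (Heavy)": 2.101,
    "haqumei (g2p_batch, Heavy)": 0.268,
}
for name, secs in runs.items():
    print(f"{name}: {throughput(secs):,.0f} chars/s, {speedup(secs):.2f}x")
```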
Performance Notes
- Throughput Variation by Input Structure:
  Especially in the *_batch APIs, throughput (chars/s) tends to increase as the number of characters per line grows (up to approximately 4KB), compared with pyopenjtalk. This efficiency stems from an implementation that directly extracts labels from Open JTalk's internal structures, combined with minimal FFI overhead. When processing large volumes of text, it is most efficient to pass content in substantial chunks rather than splitting it into excessively short lines.
- Difference Between Default and Heavy:
  In the table, "Default" represents the configuration using Haqumei::new as is, while "Heavy" shows the results when predict_nani and use_unidic_yomi are enabled in HaqumeiOptions.
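One way to follow the chunking advice is to pack short lines into larger chunks before handing them to a batch API. The sketch below is a hypothetical pre-processing helper, not part of Haqumei; the 4096-byte default simply mirrors the approximate 4KB figure above, and whether joining lines affects the analysis result is not examined here.

```python
def pack_lines(lines, max_bytes=4096):
    """Greedily join consecutive lines into chunks of up to max_bytes (UTF-8).

    A single line larger than max_bytes is kept whole as its own chunk.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        n = len(line.encode("utf-8"))
        # +1 accounts for the newline used to join lines within a chunk.
        added = n if not current else n + 1
        if current and size + added > max_bytes:
            chunks.append("\n".join(current))
            current, size = [line], n
        else:
            current.append(line)
            size += added
    if current:
        chunks.append("\n".join(current))
    return chunks

# Five 30-byte lines packed into chunks of at most 64 bytes.
lines = ["あ" * 10] * 5
chunks = pack_lines(lines, max_bytes=64)
print(len(chunks), [len(c.encode("utf-8")) for c in chunks])
```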
Building with a Custom Embedded Dictionary
By default, haqumei downloads the dictionary at build time and embeds it into the binary.
This allows the crate to be published to crates.io while still producing a self-contained binary.
If you want to build with your own dictionary embedded in the binary, you can change the configuration as follows.
Change the Cargo Features
Disable the default download-dictionary feature and enable build-dictionary.
[dependencies]
haqumei = { version = "x.y.z", features = ["embed-dictionary", "build-dictionary"], default-features = false }
Prepare the Dictionary Source and Set the Environment Variable
Prepare a dictionary source directory containing .csv and .def files to be compiled at build time, then set its path to the HAQUMEI_DICT_SRC environment variable before running the build.
On Unix-like systems:
HAQUMEI_DICT_SRC="/path/to/your/dictionary" cargo build --release
On Windows (PowerShell):
& { $env:HAQUMEI_DICT_SRC="C:\path\to\your\dictionary"; cargo build --release }
Note: If the environment variable is not set, the build script falls back to the dictionary directory, relative to the crate root.
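Before starting a long build, it can help to confirm that the dictionary source directory actually contains .csv and .def files. The script below is a hypothetical pre-build check written for this README, not something shipped with haqumei. It mimics the documented fallback to dictionary, but resolves that relative path against the current working directory, which is an assumption.

```python
import os
from pathlib import Path

def resolve_dict_src(env=os.environ, default="dictionary"):
    """Path from HAQUMEI_DICT_SRC, or the documented fallback name."""
    return Path(env.get("HAQUMEI_DICT_SRC", default))

def list_dict_sources(path):
    """Return sorted (.csv names, .def names) found directly in path."""
    p = Path(path)
    csv_files = sorted(f.name for f in p.glob("*.csv"))
    def_files = sorted(f.name for f in p.glob("*.def"))
    return csv_files, def_files

if __name__ == "__main__":
    src = resolve_dict_src()
    csv_files, def_files = list_dict_sources(src)
    print(f"{src}: {len(csv_files)} .csv file(s), {len(def_files)} .def file(s)")
    if not csv_files or not def_files:
        print("warning: expected both .csv and .def files")
```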
Dictionary
Haqumei uses the dictionary included in pyopenjtalk-plus.
License
Haqumei, excluding haqumei-jlabel and haqumei-kanalizer, is distributed under the terms of the Apache License 2.0.
Licenses and Origins of Bundled Software
haqumei includes C/C++ source code and dictionary data from modified versions of Open JTalk to provide its Grapheme-to-Phoneme (G2P) functionality. The origins and licenses of this bundled code are as follows:
- Bundled Open JTalk Source Code
  - Origin: The code contained in the vendor/open_jtalk directory is based on the tsukumijima/open_jtalk repository, which integrates improvements from various community forks (e.g., VOICEVOX project) into an enhanced version of Open JTalk.
  - License: The bundled Open JTalk source code is licensed under the Modified BSD License. This license applies only to the code located in vendor/open_jtalk, and does not apply to the rest of this project. In accordance with redistribution requirements, the full text of the Modified BSD License is included in vendor/open_jtalk/src/COPYING.
- Bundled Dictionary Data
  - Origin: The dictionary data contained in the haqumei/dictionary directory is based on tsukumijima/pyopenjtalk-plus, a modified fork of r9y9/pyopenjtalk.
  - License: The dictionary data is covered by the license notices in haqumei/dictionary/COPYING.
- Bundled haqumei-jlabel Source Code
  - Origin: The code contained in the haqumei-jlabel directory is based on the jpreprocess/jlabel repository.
  - License: The bundled haqumei-jlabel source code is licensed under the BSD 3-Clause License. This license applies only to the code located in haqumei-jlabel, and does not apply to the rest of this project. In accordance with redistribution requirements, the full text of the BSD 3-Clause License is included in haqumei-jlabel/LICENSE.
- Bundled haqumei-kanalizer Crate
  - Origin: The ONNX models bundled in haqumei-kanalizer are based on VOICEVOX/kanalizer, with model weights from VOICEVOX/kanalizer-model (converted via o24s/kanalizer-onnx).
  - License: The entire haqumei-kanalizer crate (both the Rust code and the bundled model weights) is licensed under the MIT License.
Acknowledgements
The fundamental design and API of haqumei are inspired by pyopenjtalk and its highly improved fork, pyopenjtalk-plus.
In addition, some implementations are based on jlabel and kanalizer to improve usability and accuracy.
- pyopenjtalk: Copyright (c) 2018 Ryuichi Yamamoto
- pyopenjtalk-plus: Copyright (c) 2023 tsukumijima
- jlabel: Copyright (c) 2024 JPreprocess Team
- kanalizer: Copyright (c) 2025 VOICEVOX
We are deeply grateful to the authors and contributors of these foundational projects.