jntajis-rs
A Rust port of jntajis-python, providing character transliteration functionality for Japanese text processing.
What's jntajis-rs?
jntajis-rs is a transliteration library for converting text between three character sets: JIS X 0208, JIS X 0213, and Unicode. It is a native Rust implementation that provides the same functionality as the original Python library.
Features
This library provides access to three different character tables:
- MJ character table (MJ文字一覧表) - A vast set of kanji characters used in Japanese text processing, developed by the Information-technology Promotion Agency
- MJ shrink conversion map (MJ縮退マップ) - For transliterating complex, less-frequently-used character variants to commonly-used ones
- NTA shrink conversion map (国税庁JIS縮退マップ) - Developed by the Japan National Tax Agency to canonicalize user input
Usage
Add this to your Cargo.toml:
[dependencies]
= "0.2.0"
Basic Example
use ;
// Get all possible shrink candidates
let candidates: = mj_shrink_candidates
.take
.collect;
// Use specific shrink scheme
let jis_only = builder
.with;
let candidates: = mj_shrink_candidates
.take
.collect;
// Handle multiple characters
let candidates: = mj_shrink_candidates
.take
.collect;
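The multi-character case above can be pictured as combining per-character candidates into whole-string candidates, with a cap playing the role of `.take`. The following is only a sketch under that assumption: the function `combinations` and its parameters are hypothetical and are not part of the crate's API, and the crate may order or generate candidates differently.

```rust
/// Hypothetical helper (not from jntajis-rs): builds whole-string candidates
/// from per-character candidate lists, stopping after `limit` results.
/// The rightmost position varies fastest, like an odometer.
fn combinations(per_char: &[Vec<char>], limit: usize) -> Vec<String> {
    let mut out = Vec::new();
    // If any position has no candidate, no whole-string candidate exists.
    if per_char.iter().any(|c| c.is_empty()) {
        return out;
    }
    // One index per character position, all starting at the first candidate.
    let mut idx = vec![0usize; per_char.len()];
    while out.len() < limit {
        let s = idx
            .iter()
            .zip(per_char)
            .map(|(&i, cands)| cands[i])
            .collect::<String>();
        out.push(s);
        // Advance the odometer; return once every combination was produced.
        let mut pos = per_char.len();
        loop {
            if pos == 0 {
                return out;
            }
            pos -= 1;
            idx[pos] += 1;
            if idx[pos] < per_char[pos].len() {
                break;
            }
            idx[pos] = 0;
        }
    }
    out
}
```

Capping the result matters because the number of combinations grows multiplicatively with the length of the input string.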
Advanced Usage
The library supports various MJ shrink schemes:
- JISIncorporationUCSUnificationRule - JIS incorporation and UCS unification rules
- MOJNotice582 - MOJ Notice 582 transliteration rules
- MOJFamilyRegisterActRelatedNotice - Family register act related notice rules
- InferenceByReadingAndGlyph - Inference by reading and glyph rules
You can combine multiple schemes:
let combined = builder
.with
.with;
See examples/mj_shrink_example.rs for more detailed usage examples.
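To illustrate what combining schemes with a chained `with` call can mean, here is a minimal, self-contained sketch of a builder that accumulates schemes in a bit set. Only the four variant names come from the list above; the types `Scheme` and `SchemeSet` and the methods `bit` and `contains` are hypothetical and should not be read as the crate's actual API.

```rust
/// The four scheme names listed above, modeled as a plain enum.
/// The type itself is a hypothetical stand-in for the crate's own type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Scheme {
    JISIncorporationUCSUnificationRule,
    MOJNotice582,
    MOJFamilyRegisterActRelatedNotice,
    InferenceByReadingAndGlyph,
}

impl Scheme {
    /// Each variant gets its own bit, derived from its discriminant.
    fn bit(self) -> u8 {
        1 << (self as u8)
    }
}

/// Hypothetical set of enabled schemes, stored as a bit mask.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct SchemeSet(u8);

impl SchemeSet {
    /// Starts with no scheme enabled.
    fn builder() -> Self {
        SchemeSet(0)
    }

    /// Returns a new set with `scheme` enabled, so calls can be chained.
    fn with(self, scheme: Scheme) -> Self {
        SchemeSet(self.0 | scheme.bit())
    }

    /// Reports whether `scheme` is enabled in this set.
    fn contains(self, scheme: Scheme) -> bool {
        self.0 & scheme.bit() != 0
    }
}
```

With this sketch, `SchemeSet::builder().with(Scheme::MOJNotice582).with(Scheme::InferenceByReadingAndGlyph)` yields a set in which exactly those two schemes are enabled, and the order of the `with` calls does not matter.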
Examples
Run the included example:
cargo run --example mj_shrink_example
Building
# Standard build
cargo build
# Run tests
cargo test
Character Mapping Relationships
The relationship between Unicode, MJ character mappings, JIS X 0213, and JIS X 0208 follows the same structure as the original Python implementation:
- JNTA transliteration: Direct conversion using the JNTA character mappings table
- MJ transliteration: Two-phase process involving Unicode to MJ character mappings, then MJ shrink mappings
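The difference between the two paths can be sketched with toy tables. This is an illustration under stated assumptions, not the crate's implementation: the function names, the `MjCode` alias, and the table contents are all hypothetical, and the numeric codes are placeholders rather than real MJ identifiers. The only real-world fact used is that U+9AD9 (髙) is a variant form of U+9AD8 (高).

```rust
use std::collections::HashMap;

/// Placeholder for an MJ character identifier (values below are made up).
type MjCode = u32;

/// Toy phase 1 table: Unicode character -> MJ codes.
fn unicode_to_mj() -> HashMap<char, Vec<MjCode>> {
    HashMap::from([
        ('\u{9AD9}', vec![1]), // 髙, variant form
        ('\u{9AD8}', vec![2]), // 高, common form
    ])
}

/// Toy phase 2 table: MJ code -> shrink candidates.
fn mj_shrink_table() -> HashMap<MjCode, Vec<char>> {
    HashMap::from([(1, vec!['\u{9AD8}'])])
}

/// MJ-style path: two lookups, Unicode -> MJ codes -> candidates.
/// Returns an empty list when either phase has no entry.
fn mj_candidates(c: char) -> Vec<char> {
    let phase1 = unicode_to_mj();
    let phase2 = mj_shrink_table();
    let mut out = Vec::new();
    for code in phase1.get(&c).into_iter().flatten() {
        for &cand in phase2.get(code).into_iter().flatten() {
            if !out.contains(&cand) {
                out.push(cand);
            }
        }
    }
    out
}

/// JNTA-style path: a single direct table lookup.
fn jnta_lookup(c: char, table: &HashMap<char, char>) -> Option<char> {
    table.get(&c).copied()
}
```

The two-phase path can return several candidates per character because one Unicode character may correspond to more than one MJ entry, while the direct path in this sketch yields at most one result.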
License
The source code is published under the BSD 3-clause license.
The embedded character mapping data comes from:
- JIS shrink conversion mappings (国税庁: JIS縮退マップ)
  - Publisher: National Tax Agency
  - Source: https://www.houjin-bangou.nta.go.jp/download/
  - License: CC BY 4.0
- MJ character table (文字情報技術促進協議会: MJ文字一覧表)
  - Publisher: Character Information Technology Promotion Council (CITPC)
  - Author: Information-technology Promotion Agency (IPA)
  - Source: https://moji.or.jp/mojikiban/mjlist/
  - License: CC BY-SA 2.1 JP
- MJ shrink conversion mappings (文字情報技術促進協議会: MJ縮退マップ)
  - Publisher: Character Information Technology Promotion Council (CITPC)
  - Author: Information-technology Promotion Agency (IPA)
  - Source: https://moji.or.jp/mojikiban/map/
  - License: CC BY-SA 2.1 JP
Related Projects
- jntajis-python - The original Python implementation