gukhanmun-unihan 0.1.0-dev.16+395401fa8b3429703127c66d1db02c435606d87b

Generates gukhanmun-core fallback readings from Unicode Unihan data.

gukhanmun-unihan

A code generator that downloads the Unicode Unihan database and produces unihan_readings.rs, the source file that gukhanmun-core compiles in for fallback hanja phonetization.

This is a development-time tool, not a library. Normal users of Gukhanmun do not need to run it: the generated file is committed to the repository and updated only when the Unicode version changes or the extraction logic is revised.

What it generates

The tool reads the kHangul field from Unihan_Readings.txt inside Unihan.zip and emits a sorted static array mapping Unicode scalar values to their Korean readings. gukhanmun-core compiles this array into its fallback phonetizer so that characters not found in any loaded dictionary still receive a plausible reading.
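The extraction described above can be sketched roughly as follows. This is an illustrative sketch, not the tool's actual code: the names parse_khangul_line, render_table, lookup, and UNIHAN_READINGS are hypothetical, as is the (u32, &str) entry layout. It also assumes kHangul values look like `일:0N` (a reading, a colon, then source flags) and that only the first reading of each character is kept.

```rust
/// Parses one line of Unihan_Readings.txt and returns (scalar value, reading)
/// when the line carries a kHangul field. Comment and blank lines yield None.
/// Assumption: only the first listed reading is kept.
fn parse_khangul_line(line: &str) -> Option<(u32, String)> {
    let mut fields = line.split('\t');
    let code = fields.next()?.strip_prefix("U+")?;
    if fields.next()? != "kHangul" {
        return None;
    }
    let value = fields.next()?;
    let first = value.split_whitespace().next()?;
    // Drop the source flags after the colon, e.g. "일:0N" -> "일".
    let reading = first.split(':').next()?;
    let scalar = u32::from_str_radix(code, 16).ok()?;
    Some((scalar, reading.to_string()))
}

/// Renders the entries as Rust source, sorted by scalar value so that the
/// consumer can binary-search the table. The emitted layout is hypothetical.
fn render_table(mut entries: Vec<(u32, String)>) -> String {
    entries.sort_by_key(|e| e.0);
    entries.dedup_by_key(|e| e.0);
    let mut out = String::from("pub static UNIHAN_READINGS: &[(u32, &str)] = &[\n");
    for (cp, reading) in &entries {
        out.push_str(&format!("    (0x{cp:04X}, {reading:?}),\n"));
    }
    out.push_str("];\n");
    out
}

/// Hypothetical excerpt of what the generated table might contain.
static UNIHAN_READINGS: &[(u32, &str)] = &[
    (0x4E00, "일"),
    (0x4E09, "삼"),
    (0x4E8C, "이"),
];

/// Looks up a fallback reading by binary search over the sorted table.
fn lookup(table: &[(u32, &'static str)], c: char) -> Option<&'static str> {
    table
        .binary_search_by_key(&(c as u32), |&(cp, _)| cp)
        .ok()
        .map(|i| table[i].1)
}
```

Keeping the array sorted lets the lookup run in logarithmic time without building a hash map at startup, which is one plausible reason for emitting it in sorted order.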

The Unicode version and the expected SHA-256 of Unihan.zip are pinned as constants in the source. A checksum mismatch causes the tool to abort, so accidental use of a different Unicode release is caught immediately.
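A minimal sketch of the pinned-checksum comparison, under stated assumptions: the constant name EXPECTED_SHA256 and the function verify_checksum are hypothetical, and the digest shown is only a placeholder (the SHA-256 of empty input), not the real hash of Unihan.zip. Computing the digest itself is left out; it would come from a hashing crate.

```rust
/// Placeholder digest (SHA-256 of empty input). The real pinned value lives
/// in the tool's source and is not reproduced here.
const EXPECTED_SHA256: &str =
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855";

/// Compares a hex digest computed from the downloaded archive with the pinned
/// one. Hex case is ignored; any other difference is reported as an error.
fn verify_checksum(actual_hex: &str, expected_hex: &str) -> Result<(), String> {
    if actual_hex.eq_ignore_ascii_case(expected_hex) {
        Ok(())
    } else {
        Err(format!(
            "Unihan.zip checksum mismatch: expected {expected_hex}, got {actual_hex}"
        ))
    }
}
```

A caller would typically print the error and exit with a non-zero status before any parsing starts, so a mismatched archive never reaches the generator.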

Running

cargo run -p gukhanmun-unihan -- \
    --output crates/gukhanmun-core/src/generated/unihan_readings.rs

The download is cached next to the output path between runs.

License

GPL-3.0-only. See LICENSE at the repository root.