zhconv 0.4.1

Traditional, Simplified and regional Chinese variants converter powered by MediaWiki & OpenCC rulesets and the Aho-Corasick algorithm 中文简繁及地區詞轉換
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# Dataset

## Files

- `zhConversion.php` is sourced from [MediaWiki]https://github.com/wikimedia/mediawiki/blob/master/includes/languages/Data/ZhConversion.php (licensed under GPLv2.0 or later) and `*.txt` are from [OpenCC]https://github.com/BYVoid/OpenCC/tree/master/data (licensed under Apache 2.0). They are used by `build.rs` to generate dictionaries and automata to be bundled.

- `update_basic.py` updates `zhConversion.php`, `*.txt` from upstream and relevant references in `build.rs` automatically.

- `update_cgroups.py` pulls and formats CGroups (common conversion groups) from Chinese Wikipedia into `cgroups/`. ([source]https://zh.wikipedia.org/w/index.php?search=CGroup&title=Special:%E6%90%9C%E7%B4%A2&profile=advanced&fulltext=1&ns10=1&ns828=1)

- `cgroups/merge_for_web.py` combines `cgroups/*.json` into a monolithic `cgroups.json` to be placed in `:/web/public/cgroups.json`, which is included in the web app. The libs and the cli are not referencing CGroups for now. 

## Licenses

Rulesets are neither maintained nor licensed by zhconv-rs. Check their sources for licenses.