zhconv 0.1.0-alpha

Convert Traditional/Simplified Chinese and regional words of Taiwan/Hong Kong/mainland China/Singapore based on Wikipedia conversion tables 轉換中文簡體、繁體及兩岸、新馬地區詞,基於中文維基轉換表
# (WIP) zhconv-rs
zhconv-rs in a Rust lib, cli tool, and also a web app to convert Chinese text among several script or region variants (e.g. `zh-TW <-> zh-CN <-> zh-HK <-> zh-Hans <-> zh-Hant`). 

It is built on the top the [zhConversion.php](https://github.com/wikimedia/mediawiki/blob/master/includes/languages/data/ZhConversion.php#L14) conversion tables from Mediawiki, which is the one also used on Chinese Wikipedia.


## Supported variants

| Target                                 | Tag       | Script  | Description                                 |
| -------------------------------------- | --------- | ------- | ------------------------------------------- |
| **S**implified **C**hinese / 简体中文  | `zh-Hans` | SC / 简 | W/O substituing region-specific words.  |
| **T**raditional **C**hinese / 繁體中文 | `zh-Hant` | TC / 繁 | Ditto.                                      |
| Chinese (Taiwan) / 臺灣正體            | `zh-TW`   | TC / 繁 | With Taiwan-specific words adapted.         |
| Chinese (Hong Kong) / 香港繁體         | `zh-HK`   | TC / 繁 | With Hong Kong-specific words adapted.      |
| Chinese (Macau) / 澳门繁體             | `zh-MO`   | TC / 繁 | With Taiwan-specific words adapted.         |
| Chinese (Mainland China) / 大陆简体    | `zh-CN`   | SC / 简 | With mainland China-specific words adapted. |
| Chinese (Singapore) / 新加坡简体       | `zh-SG`   | SC / 简 | With Singapore-specific words adapted.      |
| Chinese (Malaysia) / 大马简体          | `zh-MY`   | SC / 简 | With Malaysia-specific words adapted.       |

*Note:*  `zh-TW`, `zh-HK` and `zh-MO` are based on `zh-Hant`. `zh-CN`, `zh-MY` and `zh-SG` are based on `zh-Hans`. Currently, `zh-MO` is based on `zh-HK` with few addional Macau-specific words; `zh-MY` and `zh-SG` are both based on `zh-CN` with with few addional Singapore/Malaysia-specific words. 

## Comparisions with other tools
- OpenCC: Dict::MatchPrefix (iterating from maxlen to minlen character by character to match) [https://github.dev/BYVoid/OpenCC/blob/21995f5ea058441423aaff3ee89b0a5d4747674c/src/Dict.cpp#L25]MatchPrefix, [segments converter]https://github.dev/BYVoid/OpenCC/blob/21995f5ea058441423aaff3ee89b0a5d4747674c/src/Conversion.cpp#L27 [segmentizer]https://github.dev/BYVoid/OpenCC/blob/21995f5ea058441423aaff3ee89b0a5d4747674c/src/MaxMatchSegmentation.cpp#L34
- zhConversion.php: strtr (iterating from maxlen to minlen for every known key length to match) [https://github.dev/php/php-src/blob/217fd932fa57d746ea4786b01d49321199a2f3d5/ext/standard/string.c#L2974]
- zhconv-rs regex-based automaton

## TODO
[] Support [Module:CGroup](https://zh.wikipedia.org/wiki/Module:CGroup)