hanzi-sort
Sort Chinese text the way Chinese readers expect.
hanzi-sort is a Rust CLI and library for sorting Hanzi by Hanyu Pinyin or by stroke count, with deterministic tie-breaking, phrase-level override rules for polyphonic characters, and terminal-friendly tabular output.
Migration note
pinyin-sort has been renamed to hanzi-sort. This is a hard rename: there is no compatibility binary alias.
Why it exists
Unicode codepoint order is not Chinese sort order.
If you want useful output for names, glossaries, study lists, or publishing workflows, you usually need more than plain lexical comparison:
- pinyin order for alphabetic-style indexes
- stroke order for dictionary-style or teaching workflows
- phrase-level override rules for polyphonic characters like 重庆 or 银行
- stable tie-breaking so the same dataset always produces the same order
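To make the gap between codepoint order and pinyin order concrete, here is a minimal sketch. It does not use hanzi-sort's API or data: `toy_pinyin` is a hypothetical, hand-written reading table covering five surnames, and both sort helpers are invented for this illustration only.

```rust
// Illustrative only: a tiny hand-written reading table,
// not hanzi-sort's data or API.
fn toy_pinyin(c: char) -> &'static str {
    match c {
        '安' => "an",
        '白' => "bai",
        '李' => "li",
        '王' => "wang",
        '张' => "zhang",
        _ => "",
    }
}

// Plain lexical comparison: `char` orders by Unicode scalar value.
fn sort_by_codepoint(mut names: Vec<char>) -> Vec<char> {
    names.sort();
    names
}

// Compare by reading first, then by the character itself so that
// ties always resolve the same way.
fn sort_by_toy_pinyin(mut names: Vec<char>) -> Vec<char> {
    names.sort_by(|a, b| toy_pinyin(*a).cmp(toy_pinyin(*b)).then(a.cmp(b)));
    names
}

fn main() {
    let surnames = vec!['王', '李', '张', '白', '安'];
    // Codepoint order: 安 张 李 王 白
    println!("{:?}", sort_by_codepoint(surnames.clone()));
    // Toy pinyin order: 安 白 李 王 张
    println!("{:?}", sort_by_toy_pinyin(surnames));
}
```

The two results differ because codepoints in the CJK Unified Ideographs block are not assigned by pronunciation, so a plain sort tells a reader nothing about where to look up a name.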
Quick examples
Install the CLI:
Sort by pinyin:
Sort by stroke count:
Resolve a polyphonic phrase with an override file:
Features
- Sort by pinyin or strokes (default), or opt in to jyutping, zhuyin, radical via cargo features
- Read from stdin when no input flags are given (cat names.txt | hanzi-sort)
- Keep unknown characters in the comparison key instead of dropping them
- Break ties by original character so output stays deterministic
- Override single characters or full phrases with TOML
- Format output with configurable columns, alignment, padding, separators, and blank-line cadence
- Reuse the same core sorter from Rust via PinyinCollator, StrokesCollator, and the Collator trait
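The idea behind phrase-level overrides can be sketched as follows. This is not hanzi-sort's implementation or its TOML schema: `default_reading`, `sort_key`, and `sort_phrases` are hypothetical names, the readings are tone-free for brevity, and the override table is a plain `HashMap` rather than a parsed file.

```rust
use std::collections::HashMap;

// Hypothetical per-character default readings (tone-free for brevity).
fn default_reading(c: char) -> &'static str {
    match c {
        '重' => "zhong",
        '庆' => "qing",
        '量' => "liang",
        '成' => "cheng",
        '都' => "du",
        _ => "?",
    }
}

// A whole-phrase override wins; otherwise fall back to per-character defaults.
fn sort_key(phrase: &str, overrides: &HashMap<&str, Vec<&'static str>>) -> Vec<&'static str> {
    match overrides.get(phrase) {
        Some(readings) => readings.clone(),
        None => phrase.chars().map(default_reading).collect(),
    }
}

// Sort by reading sequence, then by the original text as a tie-break.
fn sort_phrases<'a>(
    mut phrases: Vec<&'a str>,
    overrides: &HashMap<&str, Vec<&'static str>>,
) -> Vec<&'a str> {
    phrases.sort_by(|a, b| {
        sort_key(a, overrides)
            .cmp(&sort_key(b, overrides))
            .then(a.cmp(b))
    });
    phrases
}

fn main() {
    let phrases = vec!["重量", "重庆", "成都"];

    // Without overrides, 重 always reads "zhong".
    let no_overrides = HashMap::new();
    println!("{:?}", sort_phrases(phrases.clone(), &no_overrides));

    // With an override, 重庆 is keyed as "chong qing".
    let mut overrides = HashMap::new();
    overrides.insert("重庆", vec!["chong", "qing"]);
    println!("{:?}", sort_phrases(phrases, &overrides));
}
```

In this toy setup the override moves 重庆 ahead of 重量, because "chong" sorts before "zhong". The final comparison on the original text is what keeps the output deterministic when two entries share a reading.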
Performance
hanzi-sort is 3.8×–4.8× faster than icu_collator 2.x with zh-u-co-pinyin
on Chinese pinyin sort workloads (Apple Silicon, deterministic input):
| N | hanzi-sort | ICU zh-u-co-pinyin | speedup |
|---|---|---|---|
| 1,000 | 188 µs | 759 µs | 4.0× |
| 10,000 | 2.51 ms | 11.99 ms | 4.8× |
| 100,000 | 34.9 ms | 131.5 ms | 3.8× |
The win comes from hanzi-sort trading ICU's full-locale generality for a
domain-specific compact representation: every primary pinyin syllable fits
in a u128 after byte-packed encoding, so per-character comparison is two
integer compares instead of a multi-level collation element (CE) table
lookup. See BENCHMARKS.md for methodology, caveats, and reproduction.
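As a rough illustration of why a packed key compares cheaply, the sketch below left-aligns an ASCII syllable in a u128 so that integer order matches byte-wise lexicographic order. It is an assumption-laden toy, not hanzi-sort's actual encoding: `pack_syllable` is a hypothetical helper, and the real layout (tone handling, field widths) is described in the repository rather than here.

```rust
/// Packs up to 16 ASCII bytes into a u128, most significant byte first,
/// zero-padded on the right. Returns None if the input does not fit.
/// Hypothetical helper for illustration; not part of hanzi-sort's API.
fn pack_syllable(s: &str) -> Option<u128> {
    let bytes = s.as_bytes();
    if bytes.len() > 16 || !s.is_ascii() {
        return None;
    }
    let mut buf = [0u8; 16];
    buf[..bytes.len()].copy_from_slice(bytes);
    // Big-endian puts the first letter in the highest byte, so comparing
    // the integers compares the syllables letter by letter.
    Some(u128::from_be_bytes(buf))
}

fn main() {
    let mut syllables = vec!["zhang", "an", "wang", "ang", "bai", "li"];
    // Sort using only the packed integer keys.
    syllables.sort_by_key(|s| pack_syllable(s).expect("fits in 16 ASCII bytes"));
    println!("{:?}", syllables);
}
```

Because the padding byte is zero, a shorter syllable that is a prefix of a longer one (an, ang) sorts first, which agrees with ordinary string comparison as long as the input contains no NUL bytes.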
Opt-in collators
A source install defaults to pinyin + strokes. Add the others explicitly, or grab a prebuilt release binary — those bundle all five collators with no extra flags.
# pinyin + strokes only (default)
# add Cantonese Jyutping (Unihan kCantonese)
# add Mandarin Zhuyin / Bopomofo
# add Kangxi radical sort (Unihan kRSUnicode)
# everything
Library usage
use ;
let sorted = sort_strings_with;
use AnyCollator;
let sorted = pinyin.sort;
Learn more
- Repository: https://github.com/Acture/hanzi-sort
- Documentation: https://docs.rs/hanzi-sort
- License: https://github.com/Acture/hanzi-sort/blob/master/LICENSE