docs.rs failed to build hanzi-sort-0.2.3
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

hanzi-sort

Sort Chinese text the way Chinese readers expect.

hanzi-sort is a Rust CLI and library for sorting Hanzi by Hanyu Pinyin or by stroke count, with deterministic tie-breaking, phrase-level override rules for polyphonic characters, and terminal-friendly tabular output.

Migration note
pinyin-sort has been renamed to hanzi-sort. This is a hard rename: there is no compatibility binary alias.

Why it exists

Unicode codepoint order is not Chinese sort order.

If you want useful output for names, glossaries, study lists, or publishing workflows, you usually need more than plain lexical comparison:

pinyin order for alphabetic-style indexes
stroke order for dictionary-style or teaching workflows
phrase-level override rules for polyphonic characters like 重庆 or 银行
stable tie-breaking so the same dataset always produces the same order

Quick examples

Install the CLI:

cargo install hanzi-sort

Sort by pinyin:

hanzi-sort -t 汉字 张三 赵四

Sort by stroke count:

hanzi-sort -t 天 一 十 --sort-by strokes --columns 1 --entry-width 2 --blank-every 0

Resolve a polyphonic phrase with an override file:

hanzi-sort -t 重庆 银行 --config ./override.toml

Features

Sort by pinyin or strokes (default), or opt in to jyutping, zhuyin, radical via cargo features
Read from stdin when no input flags are given (cat names.txt | hanzi-sort)
Keep unknown characters in the comparison key instead of dropping them
Break ties by original character so output stays deterministic
Override single characters or full phrases with TOML
Format output with configurable columns, alignment, padding, separators, and blank-line cadence
Reuse the same core sorter from Rust via PinyinCollator, StrokesCollator, and the Collator trait

Performance

hanzi-sort is 3.8×–4.8× faster than icu_collator 2.x with zh-u-co-pinyin on Chinese pinyin sort workloads (Apple Silicon, deterministic input):

N	hanzi-sort	ICU `zh-u-co-pinyin`	speedup
1,000	188 µs	759 µs	4.0×
10,000	2.51 ms	11.99 ms	4.8×
100,000	34.9 ms	131.5 ms	3.8×

The win comes from hanzi-sort trading ICU's full-locale generality for a domain-specific compact representation: every primary pinyin syllable fits in a u128 after byte-packed encoding, so per-character comparison is two integer compares instead of multi-level CE table lookup. See BENCHMARKS.md for methodology, caveats, and reproduction.

Opt-in collators

A source install defaults to pinyin + strokes. Add the others explicitly, or grab a prebuilt release binary — those bundle all five collators with no extra flags.

# pinyin + strokes only (default)
cargo install hanzi-sort

# add Cantonese Jyutping (Unihan kCantonese)
cargo install hanzi-sort --features collator-jyutping

# add Mandarin Zhuyin / Bopomofo
cargo install hanzi-sort --features collator-zhuyin

# add Kangxi radical sort (Unihan kRSUnicode)
cargo install hanzi-sort --features collator-radical

# everything
cargo install hanzi-sort --all-features

Library usage

use hanzi_sort::{StrokesCollator, sort_strings_with};

let sorted = sort_strings_with(
    vec!["一".into(), "十".into(), "天".into()],
    &StrokesCollator,
);

use hanzi_sort::AnyCollator;

let sorted = AnyCollator::pinyin().sort(vec!["赵四".into(), "汉字".into()]);

Learn more

Repository: https://github.com/Acture/hanzi-sort
Documentation: https://docs.rs/hanzi-sort
License: https://github.com/Acture/hanzi-sort/blob/master/LICENSE

hanzi-sort 0.2.3