rehuman
Unicode-safe text cleaning & normalization for Rust.
Strip invisible characters, normalize typography, and enforce consistent formatting for text sourced from web scraping, user input, or LLMs.
This crate is a Rust rewrite and expansion of humanize-ai-lib by Nordth.
Install
Add the Rust library crate:
[dependencies]
rehuman = "0.1.2" # replace with the latest published version
Install CLI binaries (rehuman, ishuman):
To build the latest unreleased version instead, clone this repo and run cargo install --path . from the repository root.
Binaries will be installed to ~/.cargo/bin by default.[^1]
[^1]: You may need to add ~/.cargo/bin to your PATH if it is not already there; add export PATH="$HOME/.cargo/bin:$PATH" to your shell profile (.bashrc, .zshrc, etc.).
Quick Start
use rehuman::{clean, humanize};
let cleaned = clean; // -> "Hellothere"
let humanized = humanize; // -> "\"Quote\"-and...more"
use rehuman::clean;
// Default behavior removes emoji
let cleaned = clean; // -> "Thanks"
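To make the transformations hinted at in the comments above more concrete, here is a minimal std-only sketch of the two ideas: dropping invisible code points and mapping typographic punctuation to keyboard characters. It is not rehuman's implementation. The function names (is_invisible, strip_invisible, normalize_typography) and the character lists are assumptions chosen for illustration, and the real crate covers far more cases.

```rust
/// Returns true for a handful of well-known invisible code points.
/// Illustrative only: this list is an assumption, not rehuman's actual table.
fn is_invisible(c: char) -> bool {
    matches!(
        c,
        '\u{00AD}'   // soft hyphen
        | '\u{200B}' // zero width space
        | '\u{200C}' // zero width non-joiner
        | '\u{200D}' // zero width joiner
        | '\u{2060}' // word joiner
        | '\u{FEFF}' // byte order mark / zero width no-break space
    )
}

/// Hypothetical helper: removes the invisible characters listed above.
fn strip_invisible(input: &str) -> String {
    input.chars().filter(|c| !is_invisible(*c)).collect()
}

/// Hypothetical helper: maps common typographic punctuation to ASCII.
fn normalize_typography(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '\u{201C}' | '\u{201D}' => out.push('"'),  // curly double quotes
            '\u{2018}' | '\u{2019}' => out.push('\''), // curly single quotes
            '\u{2013}' | '\u{2014}' => out.push('-'),  // en and em dashes
            '\u{2026}' => out.push_str("..."),         // horizontal ellipsis
            '\u{00A0}' => out.push(' '),               // no-break space
            other => out.push(other),
        }
    }
    out
}
```

With these helpers, strip_invisible("Hello\u{200B}there") yields "Hellothere", and normalize_typography applied to a string with curly quotes, an em dash, and an ellipsis yields "\"Quote\"-and...more". Both inputs are assumed examples chosen to match the outputs shown in the Quick Start comments.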
By default, keyboard-only mode emits ASCII-safe output.
Non-ASCII text is normalized/transliterated when feasible; unmappable
characters are removed.
Tune this with --non-ascii-policy, --extended-keyboard, and
--preserve-joiners (details in docs/api.md
and docs/cli.md).
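The rule described above (keep ASCII, transliterate where a mapping exists, drop the rest) can be sketched with the standard library alone. This is an illustration of the idea under stated assumptions, not the crate's code: transliterate and to_ascii_lossy are hypothetical names, the mapping table is a tiny assumed sample, and the actual behavior is governed by the options documented in docs/api.md and docs/cli.md.

```rust
/// Hypothetical transliteration table covering a few precomposed Latin letters.
/// A real table would be much larger; this one is an assumption for illustration.
fn transliterate(c: char) -> Option<&'static str> {
    match c {
        'á' | 'à' | 'â' | 'ä' => Some("a"),
        'é' | 'è' | 'ê' | 'ë' => Some("e"),
        'ó' | 'ò' | 'ô' | 'ö' => Some("o"),
        'ú' | 'ù' | 'û' | 'ü' => Some("u"),
        'ñ' => Some("n"),
        'ß' => Some("ss"),
        _ => None,
    }
}

/// Hypothetical helper: produces ASCII-only output.
/// ASCII passes through, mapped characters are transliterated,
/// and anything without a mapping is removed.
fn to_ascii_lossy(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        if c.is_ascii() {
            out.push(c);
        } else if let Some(mapped) = transliterate(c) {
            out.push_str(mapped);
        }
        // Characters with no mapping fall through and are dropped.
    }
    out
}
```

Note that this sketch only matches precomposed characters; text that uses combining marks would need Unicode normalization first, which is outside the scope of this example.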
For docs/source files where Unicode glyphs matter (for example box-drawing diagrams),
use the CLI with --preset code-safe (or --keyboard-only false).
For detailed semantics and option behavior, use the API reference links below.
Documentation
Primary docs by concern:
- Rust API semantics (defaults, options, presets, stats, errors): docs/api.md
- CLI flags, modes, config, and exit behavior: docs/cli.md
- Usage recipes: docs/examples.md
- Python bindings (import rehuman): python/docs/index.md
- Roadmap and development notes: docs/development.md
For CLI help at runtime: rehuman --help and ishuman --help.
License
MIT