Pure-Rust core for textsanity. Configurable text cleanup before
the input gets near a tokenizer or an LLM.
Each operation is independent and toggleable via Options. The
defaults reflect what most LLM-app builders actually want: NFKC,
zero-width strip, control-char strip, whitespace collapse, trim.
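To make the idea of independent, toggleable steps concrete, here is a minimal sketch using only the standard library. It is not the crate's implementation. `SketchOptions`, its field names, and `sanitize_sketch` are hypothetical names invented for illustration; the real `Options` fields may differ. NFKC is left out because the standard library ships no Unicode normalization tables, and the choice to keep `\n` and `\t` when stripping control characters is an assumption.

```rust
/// Hypothetical options struct; field names are assumptions, not the crate's API.
#[derive(Clone, Copy)]
struct SketchOptions {
    strip_zero_width: bool,
    strip_control: bool,
    collapse_whitespace: bool,
    trim: bool,
}

impl Default for SketchOptions {
    /// Mirrors the described defaults (minus NFKC): every step enabled.
    fn default() -> Self {
        Self {
            strip_zero_width: true,
            strip_control: true,
            collapse_whitespace: true,
            trim: true,
        }
    }
}

/// A small, non-exhaustive set of zero-width code points.
fn is_zero_width(c: char) -> bool {
    matches!(c, '\u{200B}' | '\u{200C}' | '\u{200D}' | '\u{2060}' | '\u{FEFF}')
}

/// Applies each enabled step in a single pass over the input.
fn sanitize_sketch(text: &str, opts: SketchOptions) -> String {
    let mut out = String::with_capacity(text.len());
    let mut prev_space = false;
    for c in text.chars() {
        if opts.strip_zero_width && is_zero_width(c) {
            continue;
        }
        // Assumption: newlines and tabs survive the control-char strip.
        if opts.strip_control && c.is_control() && c != '\n' && c != '\t' {
            continue;
        }
        // Collapse runs of spaces and tabs into one space.
        if opts.collapse_whitespace && (c == ' ' || c == '\t') {
            if !prev_space {
                out.push(' ');
            }
            prev_space = true;
            continue;
        }
        prev_space = false;
        out.push(c);
    }
    if opts.trim {
        out.trim().to_string()
    } else {
        out
    }
}
```

Calling `sanitize_sketch(input, SketchOptions::default())` runs every step. Setting a field to `false` skips that step without affecting the others.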
Structs§
- Options
- Cleanup pipeline configuration.
Enums§
- SanityError
- All errors surfaced by textsanity-core.
Functions§
- normalize_newlines
- Normalize line endings to \n. Converts both \r\n (CRLF) and lone \r (CR) to \n. Idempotent. Cheap to apply before or after the main sanitize pipeline.
- sanitize
- Run the cleanup pipeline against text with the given options.
- sanitize_many
- Bulk variant. With parallel = true, distributes across rayon’s pool.
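The line-ending behavior described for normalize_newlines can be sketched with the standard library alone. This is an illustrative stand-in, not the crate's code: the name `normalize_newlines_sketch` and its `&str -> String` signature are assumptions.

```rust
/// Hypothetical stand-in for the documented behavior:
/// CRLF and lone CR both become a single LF.
fn normalize_newlines_sketch(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\r' {
            // For CRLF, consume the following LF so the pair yields one LF.
            if chars.peek() == Some(&'\n') {
                chars.next();
            }
            out.push('\n');
        } else {
            out.push(c);
        }
    }
    out
}
```

Because the output never contains `\r`, a second pass leaves it unchanged, which is what makes the operation idempotent.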
Type Aliases§
- Result
- Crate-wide result alias.