Crate textsanity_core

Pure-Rust core for textsanity. Configurable text cleanup before the input gets near a tokenizer or an LLM.

Each operation is independent and can be toggled via Options. The defaults enable NFKC normalization, zero-width character stripping, control-character stripping, whitespace collapsing, and trimming, a combination aimed at typical LLM applications.
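
To make the idea of independently toggleable steps concrete, here is a minimal std-only sketch. It is not the crate's implementation: SketchOptions, its field names, and sketch_sanitize are hypothetical stand-ins invented for illustration, the set of zero-width code points is an assumption, and the NFKC step is left out because the Rust standard library has no Unicode normalization.

```rust
/// Hypothetical stand-in for the crate's `Options`.
/// The field names are assumptions, not the real API.
#[derive(Clone, Copy)]
pub struct SketchOptions {
    pub strip_zero_width: bool,
    pub strip_control: bool,
    pub collapse_whitespace: bool,
    pub trim: bool,
}

impl Default for SketchOptions {
    /// Everything on, mirroring the "defaults enable all steps" idea.
    fn default() -> Self {
        Self {
            strip_zero_width: true,
            strip_control: true,
            collapse_whitespace: true,
            trim: true,
        }
    }
}

/// Assumed set of zero-width code points; the real crate may use a different list.
fn is_zero_width(c: char) -> bool {
    matches!(c, '\u{200B}' | '\u{200C}' | '\u{200D}' | '\u{2060}' | '\u{FEFF}')
}

/// Illustrative pipeline: each step runs only when its flag is set.
pub fn sketch_sanitize(text: &str, opts: SketchOptions) -> String {
    let mut out = String::with_capacity(text.len());
    let mut prev_space = false;
    for c in text.chars() {
        if opts.strip_zero_width && is_zero_width(c) {
            continue;
        }
        // Whitespace controls such as '\n' and '\t' are kept here so the
        // collapse step (if enabled) can fold them into a single space.
        if opts.strip_control && c.is_control() && !c.is_whitespace() {
            continue;
        }
        if opts.collapse_whitespace && c.is_whitespace() {
            if !prev_space {
                out.push(' ');
            }
            prev_space = true;
            continue;
        }
        prev_space = false;
        out.push(c);
    }
    if opts.trim {
        out.trim().to_string()
    } else {
        out
    }
}
```

Under these assumptions, calling sketch_sanitize with SketchOptions::default() removes zero-width and non-whitespace control characters, folds each whitespace run into one space, and trims the ends. Setting every flag to false returns the input unchanged.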

Structs

Options
Cleanup pipeline configuration.

Enums

SanityError
All errors surfaced by textsanity-core.

Functions

normalize_newlines
Normalize line endings to \n. Converts both \r\n (CRLF) and lone \r (CR) to \n. Idempotent. Cheap to apply before or after the main sanitize pipeline.

sanitize
Run the cleanup pipeline against text with the given options.

sanitize_many
Bulk variant of sanitize. With parallel = true, distributes the work across rayon’s thread pool.
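
The behavior described for normalize_newlines can be sketched with the standard library alone. This is an illustration of the documented contract (CRLF and lone CR both become \n, and the result is idempotent), not the crate's source; the name sketch_normalize_newlines and its signature are assumptions.

```rust
/// Illustrative re-implementation of the documented behavior.
/// The name and signature are hypothetical, not the crate's API.
pub fn sketch_normalize_newlines(text: &str) -> String {
    // Replace CRLF first so its '\r' is not turned into a second '\n'
    // by the lone-CR pass that follows.
    text.replace("\r\n", "\n").replace('\r', "\n")
}
```

Because the output contains no \r at all, running the function on its own output changes nothing, which is why the operation is idempotent.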

Type Aliases

Result
Crate-wide result alias.