Function normalize_text 

pub fn normalize_text(raw: &str) -> String

Canonicalize extracted text so output is stable across adapters:

  1. Normalize line endings to \n (drop \r).
  2. Trim trailing whitespace on each line.
  3. Collapse three-or-more consecutive blank lines to a single blank line.
  4. Trim leading/trailing blank lines, then append exactly one \n (unless the whole text is empty, which stays empty — the image-only-PDF contract).

This is layout tidy-up only; it never reorders or drops words. Word-level content is whatever the adapter recovered.
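
The four steps above can be sketched as follows. This is an illustrative re-implementation written from the description alone, not the crate's actual source; the name `normalize_text_sketch` is hypothetical. It makes two assumptions: a lone `\r` is dropped rather than turned into a line break, and step 3 is read literally, so runs of one or two blank lines are left as they are and only runs of three or more shrink to one.

```rust
/// Hypothetical sketch of the documented behaviour; not the real `normalize_text`.
fn normalize_text_sketch(raw: &str) -> String {
    // Step 1: drop every '\r', so "\r\n" becomes "\n".
    // Assumption: a lone '\r' is removed, not converted to a line break.
    let unified: String = raw.chars().filter(|&c| c != '\r').collect();

    let mut out: Vec<&str> = Vec::new();
    let mut blank_run = 0usize;

    // Step 2: trim trailing whitespace on each line while iterating.
    for line in unified.split('\n').map(str::trim_end) {
        if line.is_empty() {
            blank_run += 1;
            continue;
        }
        // Step 3: emit pending blank lines, collapsing runs of three or more to one.
        // Blank lines before the first non-blank line are skipped (step 4, leading trim).
        if !out.is_empty() {
            let keep = if blank_run >= 3 { 1 } else { blank_run };
            for _ in 0..keep {
                out.push("");
            }
        }
        blank_run = 0;
        out.push(line);
    }

    // Step 4: a trailing blank run is never emitted, so it is already trimmed.
    // Text with no non-blank lines stays empty (the image-only-PDF contract).
    if out.is_empty() {
        return String::new();
    }
    let mut result = out.join("\n");
    result.push('\n'); // exactly one trailing newline
    result
}
```

Under these assumptions, `normalize_text_sketch("a  \r\nb\r\n\r\n\r\n\r\nc\r\n\r\n")` returns `"a\nb\n\nc\n"`, and whitespace-only input returns an empty string.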