Module compress

Functions

compress_block
Collapse repeated blank lines while preserving paragraph breaks, and compress whitespace within each line.
compress_text
Semantic text reduction: strip decorative glyphs, then collapse runs of whitespace, then trim.
estimate_tokens
Fast token approximation: ~4 characters per token, matching common BPE tokenizers closely enough for budgeting.
truncate_to_tokens
Truncate text to roughly max_tokens, on a character boundary, appending an elision marker when content is dropped.
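
The summaries above do not show signatures or bodies, so the following is only a minimal sketch of how the four functions might behave, not the module's actual source. The `&str` -> `String` signatures, the `ELISION` marker (`"..."`), the `is_decorative` predicate with its Unicode ranges, rounding the token estimate up, and dropping leading and trailing blank lines in `compress_block` are all assumptions made for illustration.

```rust
/// Assumed elision marker; the real module may use a different one.
const ELISION: &str = "...";

/// Hypothetical notion of a "decorative glyph": box drawing, block elements,
/// geometric shapes, miscellaneous symbols, and dingbats.
fn is_decorative(c: char) -> bool {
    matches!(c, '\u{2500}'..='\u{25FF}' | '\u{2600}'..='\u{27BF}')
}

/// Strip decorative glyphs, then collapse whitespace runs, then trim.
pub fn compress_text(text: &str) -> String {
    let stripped: String = text.chars().filter(|c| !is_decorative(*c)).collect();
    // split_whitespace drops leading/trailing whitespace and splits on runs,
    // so joining with one space both collapses and trims.
    stripped.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Compress whitespace within each line and collapse runs of blank lines
/// into a single blank line, so paragraph breaks survive.
pub fn compress_block(text: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    for line in text.lines() {
        let squeezed = line.split_whitespace().collect::<Vec<_>>().join(" ");
        // Skip a blank line when nothing has been kept yet or the previous
        // kept line is already blank.
        if squeezed.is_empty() && out.last().map_or(true, |prev| prev.is_empty()) {
            continue;
        }
        out.push(squeezed);
    }
    // Assumption: trailing blank lines are dropped as well.
    while out.last().map_or(false, |last| last.is_empty()) {
        out.pop();
    }
    out.join("\n")
}

/// Approximate the token count as ~4 characters per token, rounded up.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Keep at most `max_tokens * 4` characters, cutting on a character
/// boundary, and append the elision marker only if something was dropped.
pub fn truncate_to_tokens(text: &str, max_tokens: usize) -> String {
    let max_chars = max_tokens.saturating_mul(4);
    // nth(max_chars) exists only when the text has more than max_chars
    // characters; its byte offset is a valid UTF-8 cut point.
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => format!("{}{}", &text[..byte_idx], ELISION),
        None => text.to_string(),
    }
}
```

In this sketch the elision marker is appended on top of the character budget, so a truncated result can slightly exceed `max_tokens` when re-estimated, which is consistent with the "roughly" in the summary. Counting `char`s rather than bytes keeps the cut from landing inside a multi-byte UTF-8 sequence.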