Module compress
- compress_block
- Collapse repeated blank lines while preserving paragraph breaks, and
compress whitespace within each line.
- compress_text
- Semantic text reduction: strip decorative glyphs, then collapse runs of
whitespace, then trim.
- estimate_tokens
- Fast token approximation at roughly 4 characters per token. This tracks
common BPE tokenizers closely enough for budgeting, though the true ratio
varies with language and content.
- truncate_to_tokens
- Truncate text to roughly max_tokens tokens, cutting on a character
boundary and appending an elision marker when content is dropped.
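A minimal sketch of what compress_block might do, assuming a signature `fn compress_block(text: &str) -> String`. The exact rules are assumptions, not the crate's implementation: each line is trimmed and its internal whitespace collapsed to single spaces (so indentation is not kept), any run of blank or whitespace-only lines becomes exactly one empty line, and leading or trailing blank lines are dropped.

```rust
/// Sketch of `compress_block` (assumed signature; not the crate's source).
pub fn compress_block(text: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    // Set when we have seen blank lines after some content; emitted lazily so
    // that trailing blank lines never produce a break.
    let mut pending_break = false;

    for line in text.lines() {
        // Collapse runs of spaces/tabs inside the line; this also trims its ends.
        let compact = line.split_whitespace().collect::<Vec<_>>().join(" ");

        if compact.is_empty() {
            // Blank (or whitespace-only) line: remember one paragraph break,
            // but ignore blank lines before the first content line.
            pending_break = !out.is_empty();
            continue;
        }

        if pending_break {
            // However many blank lines there were, emit a single empty line.
            out.push(String::new());
            pending_break = false;
        }
        out.push(compact);
    }

    out.join("\n")
}
```

For example, `"  a   b  \n\n\n\tc  d\n\n"` becomes `"a b\n\nc d"`: the two paragraphs stay separated, but only by one blank line.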
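The three steps listed for compress_text (strip decorative glyphs, collapse whitespace, trim) can be sketched as below, assuming a signature `fn compress_text(text: &str) -> String`. What counts as "decorative" is not documented here, so the hypothetical helper `is_decorative` uses an assumed set: box drawing, block elements and geometric shapes (U+2500..=U+25FF) plus dingbats (U+2700..=U+27BF).

```rust
/// Hypothetical decorative-glyph test. The real module's set is not documented;
/// this assumes box drawing, block elements, geometric shapes and dingbats.
fn is_decorative(c: char) -> bool {
    matches!(c, '\u{2500}'..='\u{25FF}' | '\u{2700}'..='\u{27BF}')
}

/// Sketch of `compress_text` (assumed signature; not the crate's source).
pub fn compress_text(text: &str) -> String {
    // Step 1: strip decorative glyphs.
    let stripped: String = text.chars().filter(|&c| !is_decorative(c)).collect();

    // Steps 2 and 3: `split_whitespace` treats any run of Unicode whitespace
    // (including newlines) as one separator and skips leading/trailing
    // whitespace, so re-joining with single spaces both collapses and trims.
    stripped.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Unlike the compress_block sketch, this flattens newlines as well, which fits a "semantic" reduction where layout no longer matters.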
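The 4-characters-per-token heuristic behind estimate_tokens reduces to a ceiling division. This sketch assumes a signature `fn estimate_tokens(text: &str) -> usize`, counts Unicode scalar values rather than bytes, and rounds up so any non-empty text costs at least one token; those choices are assumptions.

```rust
/// Sketch of `estimate_tokens` (assumed signature and rounding).
pub fn estimate_tokens(text: &str) -> usize {
    // Count chars, not bytes, so multi-byte text is not over-estimated.
    let chars = text.chars().count();
    // Ceiling division: 1..=4 chars -> 1 token, 5..=8 -> 2, and so on.
    chars.div_ceil(4)
}
```

Counting chars is O(n) but avoids running a real tokenizer; for a 10,000-character prompt the estimate is 2,500 tokens.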
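A sketch of truncate_to_tokens, assuming a signature `fn truncate_to_tokens(text: &str, max_tokens: usize) -> String` and the same 4-characters-per-token budget. The marker string `ELISION_MARKER` is hypothetical; the crate's actual marker is not given. Cutting at the byte offset returned by `char_indices` guarantees the slice never splits a multi-byte character.

```rust
/// Hypothetical elision marker; the crate's real marker is not documented here.
const ELISION_MARKER: &str = " [...]";

/// Sketch of `truncate_to_tokens` (assumed signature; not the crate's source).
pub fn truncate_to_tokens(text: &str, max_tokens: usize) -> String {
    // Same heuristic as the token estimate: about 4 chars per token.
    let max_chars = max_tokens.saturating_mul(4);

    match text.char_indices().nth(max_chars) {
        // A char exists at index `max_chars`, so the text is over budget.
        // Its byte offset is always a valid char boundary to slice at.
        Some((byte_idx, _)) => {
            let mut out = String::with_capacity(byte_idx + ELISION_MARKER.len());
            out.push_str(&text[..byte_idx]);
            out.push_str(ELISION_MARKER);
            out
        }
        // Text fits within the budget: return it unchanged, with no marker.
        None => text.to_string(),
    }
}
```

In this sketch the marker is appended after the cut rather than reserved inside the budget, so truncated output can run a few characters past `max_tokens * 4`; that is consistent with "roughly" but is a design choice, not a documented guarantee.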