
Module utils



Text normalization helpers shared by source implementations.

Functions

file_mtime
Best-effort file modification time.
file_times
Best-effort (created_at, updated_at) pair for a file.
is_text_file
True if the path has a .txt extension (case-insensitive).
make_section
Convenience helper to build a RecordSection with normalized text metadata and precomputed sentences.
normalize_inline_whitespace
Collapse runs of whitespace into single spaces and trim the result, modifying the string in place.
platform_newline
Returns the newline string for the current platform ("\n" on Unix, "\r\n" on Windows).
sentences
Split a block of text into sentences using heuristic, tokenizer-friendly rules, falling back to the whole string when no split applies.
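What "best-effort" might mean for file_mtime and file_times can be sketched with the standard library alone. This page does not show the real signatures, so the sketch assumes both helpers return `Option<SystemTime>` values rather than errors. The fallback from a missing creation time to the modification time is also an assumption, chosen because `Metadata::created` is not supported on every platform or filesystem. The `_sketch` names are hypothetical.

```rust
use std::fs;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical sketch of `file_mtime`: returns `None` instead of an error
/// when the metadata or the modification time cannot be read.
fn file_mtime_sketch(path: &Path) -> Option<SystemTime> {
    fs::metadata(path).ok()?.modified().ok()
}

/// Hypothetical sketch of `file_times`: a (created_at, updated_at) pair.
/// Assumption: when the platform cannot report a creation time, fall back
/// to the modification time so callers still get a usable timestamp.
fn file_times_sketch(path: &Path) -> (Option<SystemTime>, Option<SystemTime>) {
    match fs::metadata(path) {
        Ok(meta) => {
            let updated = meta.modified().ok();
            let created = meta.created().ok().or(updated);
            (created, updated)
        }
        // Missing file or permission error: report nothing rather than fail.
        Err(_) => (None, None),
    }
}
```

Returning `Option` keeps the caller's code free of error handling for metadata that is merely nice to have.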
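The case-insensitive `.txt` check described for is_text_file can be expressed with `Path::extension`. The following is a minimal sketch, not the module's actual code; the name `is_text_file_sketch` is hypothetical. It compares with ASCII case folding and treats a non-UTF-8 extension as "not text".

```rust
use std::path::Path;

/// Hypothetical sketch of `is_text_file`: true when the extension is
/// "txt" in any ASCII letter case ("a.txt", "A.TXT", "a.TxT").
fn is_text_file_sketch(path: &Path) -> bool {
    path.extension()
        // Extensions that are not valid UTF-8 cannot match "txt".
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case("txt"))
}
```

Note that `Path::extension` returns `None` for a dotfile such as `.txt`, so under this sketch a file named only `.txt` is not treated as a text file.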
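One way to get the behavior described for normalize_inline_whitespace (collapse runs, trim the ends, update the caller's string) is sketched below. The `&mut String` signature is an assumption based on the "in place" wording, and `normalize_inline_whitespace_sketch` is a hypothetical name. The sketch builds a fresh buffer and swaps it in, which is simpler than a byte-level rewrite because multi-byte whitespace such as U+00A0 would change the string's length.

```rust
/// Hypothetical sketch of `normalize_inline_whitespace`.
/// `split_whitespace` splits on any Unicode whitespace and skips empty
/// pieces, so leading/trailing runs disappear and inner runs become one space.
fn normalize_inline_whitespace_sketch(text: &mut String) {
    let mut out = String::with_capacity(text.len());
    for word in text.split_whitespace() {
        if !out.is_empty() {
            out.push(' ');
        }
        out.push_str(word);
    }
    // Replace the caller's string with the normalized version.
    *text = out;
}
```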
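The platform_newline behavior ("\n" on Unix, "\r\n" on Windows) maps directly onto the `cfg!(windows)` macro, which is evaluated at compile time for the target platform. The sketch below is an assumption about how such a helper could be written. It also includes a hypothetical `join_lines_sketch` to show typical use.

```rust
/// Hypothetical sketch of `platform_newline`: chosen at compile time
/// from the target OS, so there is no runtime cost.
fn platform_newline_sketch() -> &'static str {
    if cfg!(windows) { "\r\n" } else { "\n" }
}

/// Hypothetical usage helper: join lines with the platform terminator.
fn join_lines_sketch(lines: &[&str]) -> String {
    lines.join(platform_newline_sketch())
}
```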
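The listing does not spell out the rules that sentences uses, so the following is only an illustrative heuristic, not the module's actual splitter. It assumes a sentence ends at `.`, `!` or `?` when the next character is whitespace or the end of input. With that rule, "3.14" is not split, but abbreviations such as "e.g." are. The fallback returns the whole string when nothing non-empty remains. `sentences_sketch` is a hypothetical name.

```rust
/// Hypothetical heuristic sentence splitter (assumed rules, see lead-in).
fn sentences_sketch(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut start = 0; // byte offset where the current sentence begins
    let mut chars = text.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if matches!(c, '.' | '!' | '?') {
            // Only treat punctuation as a boundary if whitespace or EOF follows.
            let at_boundary = match chars.peek() {
                Some(&(_, next)) => next.is_whitespace(),
                None => true,
            };
            if at_boundary {
                let end = i + c.len_utf8();
                let piece = text[start..end].trim();
                if !piece.is_empty() {
                    out.push(piece.to_string());
                }
                start = end;
            }
        }
    }
    // Keep trailing text that has no terminal punctuation.
    let rest = text[start..].trim();
    if !rest.is_empty() {
        out.push(rest.to_string());
    }
    // Fallback: empty or whitespace-only input yields the whole string.
    if out.is_empty() {
        out.push(text.to_string());
    }
    out
}
```

A real tokenizer-friendly splitter would likely add exceptions for abbreviations, quotes, and ellipses. This sketch keeps only the boundary rule so that it stays easy to follow.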