Text normalization helpers shared by source implementations.
Functions
- file_mtime - Best-effort file modified time.
- file_times - Best-effort (created_at, updated_at) pair for a file.
- is_text_file - True if the path has a .txt extension (case-insensitive).
- make_section - Convenience helper to build a RecordSection with normalized text metadata and precomputed sentences.
- normalize_inline_whitespace - Collapse runs of whitespace into single spaces and trim.
- platform_newline - Returns the newline string for the current platform ("\n" on Unix, "\r\n" on Windows).
- sentences - Heuristic sentence splitter with tokenizer-friendly rules; falls back to the whole string when no split is found.
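The listing above gives only one-line summaries, so the following is a minimal sketch of the behaviour those summaries describe, not the module's actual code. The function names (collapse_whitespace, has_txt_extension, newline_for_platform, modified_time) and their signatures are assumptions made for illustration. The real helpers may take different argument types or return different values.

```rust
use std::fs;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical stand-in for normalize_inline_whitespace:
/// collapse runs of whitespace into single spaces and trim both ends.
fn collapse_whitespace(input: &str) -> String {
    // split_whitespace skips leading and trailing whitespace and treats
    // any run of whitespace (spaces, tabs, newlines) as one separator.
    input.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Hypothetical stand-in for is_text_file:
/// true if the path has a .txt extension, compared case-insensitively.
fn has_txt_extension(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case("txt"))
}

/// Hypothetical stand-in for platform_newline:
/// "\r\n" when compiled for Windows, "\n" otherwise.
fn newline_for_platform() -> &'static str {
    if cfg!(windows) {
        "\r\n"
    } else {
        "\n"
    }
}

/// Hypothetical stand-in for file_mtime:
/// best-effort modified time, None if the metadata cannot be read
/// or the platform does not report a modification time.
fn modified_time(path: &Path) -> Option<SystemTime> {
    fs::metadata(path).ok()?.modified().ok()
}
```

Returning Option from modified_time is one way to read "best-effort": errors are swallowed rather than propagated, so callers can treat a missing timestamp as ordinary data.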