Module file_processor

Functions

estimate_tokens
Rough token estimate: 1 token ≈ 4 characters (common approximation).
extract_content_from_bytes
Extract text from raw bytes, using the supplied file extension to select the format.
extract_file_content
Extract text content from various file formats.
is_extraction_sentinel
Returns true when file_processor returned a sentinel error string instead of real content. Sentinels always start with [ and describe a failure. Used by both stream_api (cache-hit guard) and attachment_api (preprocess guard).
truncate_to_budget
Truncate text so it fits within max_tokens, breaking at the last newline before the limit to avoid cutting mid-sentence.
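
The behavior described for estimate_tokens, is_extraction_sentinel, and truncate_to_budget can be illustrated with a minimal sketch. This is not the module's actual source: the signatures, the round-up in the estimate, counting characters rather than bytes, and the handling of text with no newline are all assumptions made for illustration.

```rust
/// Assumed ratio from the description: 1 token is roughly 4 characters.
const CHARS_PER_TOKEN: usize = 4;

/// Rough token estimate. Assumption: counts Unicode characters
/// (not bytes) and rounds up, so 5 characters count as 2 tokens.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN
}

/// Returns true when `content` looks like a sentinel error string.
/// Assumption: the leading '[' is the only check performed; the real
/// function may match specific sentinel messages instead.
pub fn is_extraction_sentinel(content: &str) -> bool {
    content.starts_with('[')
}

/// Truncates `text` to fit within `max_tokens`, cutting at the last
/// newline before the limit. Assumption: if there is no newline before
/// the limit, the text is cut at the character limit itself.
pub fn truncate_to_budget(text: &str, max_tokens: usize) -> &str {
    let max_chars = max_tokens.saturating_mul(CHARS_PER_TOKEN);

    // Byte offset of the first character past the budget, if any.
    // Using char_indices keeps the cut on a valid UTF-8 boundary.
    let cut = match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => byte_idx,
        None => return text, // already within budget
    };

    let head = &text[..cut];
    match head.rfind('\n') {
        Some(newline_idx) => &head[..newline_idx],
        None => head,
    }
}
```

Under these assumptions, truncate_to_budget("line one\nline two\nline three", 3) allows 12 characters, finds the last newline before that point, and returns "line one". A caller would typically check is_extraction_sentinel on extracted content before estimating or truncating it, so that a failure message is not treated as real file content.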