Module pdf


PDF → text extraction for the research context layer.

Delegates to the pdf-extract crate. Because PDF parsers can panic on malformed or unusual input — and ctx_url_read accepts arbitrary agent-supplied URLs — extraction is wrapped in std::panic::catch_unwind so a bad document yields an error instead of taking down the handler.
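The panic-isolation pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the module's actual code: `fragile_parse` is a hypothetical stand-in for the third-party parser, and `extract_guarded` and its `String` error type are assumptions made for the example.

```rust
use std::panic;

/// Hypothetical stand-in for a third-party parser that panics on bad input.
fn fragile_parse(bytes: &[u8]) -> String {
    if !bytes.starts_with(b"%PDF-") {
        panic!("malformed header");
    }
    // Pretend everything after the header is the document text.
    String::from_utf8_lossy(&bytes[5..]).into_owned()
}

/// Run the parser inside `catch_unwind` so a panic surfaces as `Err`
/// instead of unwinding through the caller.
fn extract_guarded(bytes: &[u8]) -> Result<String, String> {
    panic::catch_unwind(|| fragile_parse(bytes))
        .map_err(|_| "PDF extraction panicked".to_string())
}
```

Note that `catch_unwind` only intercepts unwinding panics. In a binary built with `panic = "abort"`, a panic still terminates the process, so this guard relies on the default unwind strategy.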

Functions

extract_text
Extract and normalize the text content of a PDF byte buffer.
looks_like_pdf
PDFs start with %PDF- (optionally after a small BOM/whitespace preamble).
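One way a header check like `looks_like_pdf` could work is sketched below. The name `looks_like_pdf_sketch`, the 16-byte whitespace window, and the choice to recognize only a UTF-8 BOM are assumptions for illustration; the module's real limits and signature are not shown in this page.

```rust
/// UTF-8 byte-order mark that some producers prepend to files.
const BOM: &[u8] = b"\xEF\xBB\xBF";

/// Assumed upper bound on leading whitespace bytes to skip (hypothetical value).
const PREAMBLE_LIMIT: usize = 16;

/// Return true if the buffer begins with `%PDF-`, allowing an optional
/// UTF-8 BOM followed by a short run of ASCII whitespace.
fn looks_like_pdf_sketch(bytes: &[u8]) -> bool {
    // Drop the BOM if it is present.
    let rest = if bytes.starts_with(BOM) {
        &bytes[BOM.len()..]
    } else {
        bytes
    };
    // Count leading ASCII whitespace, looking at no more than PREAMBLE_LIMIT bytes.
    let skipped = rest
        .iter()
        .take(PREAMBLE_LIMIT)
        .take_while(|b| b.is_ascii_whitespace())
        .count();
    rest[skipped..].starts_with(b"%PDF-")
}
```

Bounding the preamble keeps the check cheap and avoids classifying an arbitrary document as a PDF just because the marker appears somewhere deep in the body.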