lean_ctx::core::web

Module pdf

PDF → text extraction for the research context layer.

Delegates to the pdf-extract crate. Because PDF parsers can panic on malformed or unusual input — and ctx_url_read accepts arbitrary agent-supplied URLs — extraction is wrapped in std::panic::catch_unwind so a bad document yields an error instead of taking down the handler.
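
The panic-containment pattern described above can be sketched as follows. This is a minimal illustration rather than the module's actual code: `fragile_parse` is a hypothetical stand-in for the third-party parser call, and `extract_guarded` and the `String` error type are assumptions made for the example.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for a parser that may panic on bad input.
/// The real module calls into the pdf-extract crate instead.
fn fragile_parse(bytes: &[u8]) -> String {
    if !bytes.starts_with(b"%PDF-") {
        panic!("malformed header");
    }
    String::from_utf8_lossy(&bytes[5..]).into_owned()
}

/// Runs the parser behind `catch_unwind` and turns a panic into `Err`.
/// The name and the `String` error type are assumptions for this sketch.
fn extract_guarded(bytes: &[u8]) -> Result<String, String> {
    // AssertUnwindSafe is a conservative choice here: the closure only
    // reads `bytes`, so no partially updated state can be observed later.
    panic::catch_unwind(AssertUnwindSafe(|| fragile_parse(bytes))).map_err(|payload| {
        // Panic payloads are usually `&str` or `String`; fall back otherwise.
        let msg = payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "unknown panic".to_string());
        format!("PDF extraction panicked: {msg}")
    })
}
```

Note that `catch_unwind` only intercepts unwinding panics. If the binary is built with `panic = "abort"`, a panic in the parser still terminates the process, and the default panic hook still prints its message to stderr before the error is returned.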

Functions§

extract_text: Extract and normalize the text content of a PDF byte buffer.
looks_like_pdf: PDFs start with %PDF- (optionally after a small BOM/whitespace preamble).
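
A header check along the lines of looks_like_pdf might look like the sketch below. The function name `looks_like_pdf_sketch`, the `MAX_PREAMBLE` limit of 16 bytes, and the choice to accept only a UTF-8 BOM followed by ASCII whitespace are all assumptions for illustration; the actual function may define the preamble differently.

```rust
/// Assumed upper bound on leading whitespace; not taken from the source.
const MAX_PREAMBLE: usize = 16;

/// Returns true if `bytes` begins with `%PDF-`, optionally preceded by a
/// UTF-8 BOM and up to `MAX_PREAMBLE` bytes of ASCII whitespace.
fn looks_like_pdf_sketch(bytes: &[u8]) -> bool {
    // Drop a UTF-8 byte order mark if one is present.
    let rest = bytes.strip_prefix(b"\xEF\xBB\xBF").unwrap_or(bytes);
    // Count leading whitespace, but never look past the preamble limit.
    let skipped = rest
        .iter()
        .take(MAX_PREAMBLE)
        .take_while(|b| b.is_ascii_whitespace())
        .count();
    rest[skipped..].starts_with(b"%PDF-")
}
```

Bounding the preamble keeps the check cheap on large non-PDF responses, since it inspects only a small, fixed number of bytes before deciding.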
