Crate webfetch

webfetch — token-efficient web content fetcher.

The defining feature is reference-style URL preservation: instead of stripping links to their domain (losing the ability to cite or follow them) or expanding full URLs inline (wasting tokens), links are replaced with compact [N] markers and collected into a recoverable reference list.
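As a rough illustration of the idea, the sketch below rewrites inline Markdown links of the form `[text](url)` into `text [N]` and gathers the URLs into a numbered list. It is not the crate's implementation: the names `to_reference_style` and `render_refs` are hypothetical, reusing one number for a repeated URL is an assumption, and the naive parser does not handle URLs that contain parentheses.

```rust
/// Rewrite inline links `[text](url)` as `text [N]` and collect the URLs.
/// Hypothetical helper for illustration; not part of webfetch's public API.
fn to_reference_style(input: &str) -> (String, Vec<String>) {
    let mut out = String::with_capacity(input.len());
    let mut refs: Vec<String> = Vec::new();
    let mut rest = input;

    while let Some(open) = rest.find('[') {
        let after_open = &rest[open + 1..];
        // Try to read `text](url)` immediately after the opening bracket.
        let parsed = after_open.find("](").and_then(|mid| {
            let text = &after_open[..mid];
            let after_mid = &after_open[mid + 2..];
            after_mid
                .find(')')
                .map(|close| (text, &after_mid[..close], mid + 2 + close + 1))
        });

        match parsed {
            // A nested `[` means this bracket did not start the link itself.
            Some((text, url, consumed)) if !text.contains('[') => {
                out.push_str(&rest[..open]);
                // Assumption: a URL seen before keeps its original number.
                let n = match refs.iter().position(|u| u.as_str() == url) {
                    Some(i) => i + 1,
                    None => {
                        refs.push(url.to_string());
                        refs.len()
                    }
                };
                out.push_str(text);
                out.push_str(&format!(" [{}]", n));
                rest = &after_open[consumed..];
            }
            // Not a link: keep the bracket literally and continue after it.
            _ => {
                out.push_str(&rest[..=open]);
                rest = after_open;
            }
        }
    }
    out.push_str(rest);
    (out, refs)
}

/// Render the collected URLs as a `[N] url` list, one entry per line.
fn render_refs(refs: &[String]) -> String {
    refs.iter()
        .enumerate()
        .map(|(i, url)| format!("[{}] {}", i + 1, url))
        .collect::<Vec<_>>()
        .join("\n")
}
```

With this sketch, each link costs only a short `[N]` marker in the body text, and each full URL appears once in the list, so a reader can still map a marker back to its target.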

Re-exports§

pub use fetch::fetch_page;

Modules§

compress
convert: Output dispatcher: routes an HTML document to the requested format.
extract
fetch
guard: SSRF guard for the fetch path.
media: Decide how to treat a fetched body. The HTML extractor only makes sense for HTML; running it over a JSON API response, a raw .txt, or a Markdown file would mangle or drop the content. We classify by Content-Type when present, and sniff the body otherwise.
refs: Shared reference-style URL preservation.
types
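The classification described for the media module (use Content-Type when present, sniff the body otherwise) might look roughly like the sketch below. The `BodyKind` enum and the `classify` and `sniff` functions are hypothetical names, and the sniffing rules are deliberately crude assumptions rather than the crate's actual heuristics.

```rust
/// Hypothetical body categories; the crate's real types may differ.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum BodyKind {
    Html,
    Json,
    Text,
    Binary,
}

/// Classify by Content-Type when it is recognised, otherwise sniff the body.
fn classify(content_type: Option<&str>, body: &[u8]) -> BodyKind {
    if let Some(ct) = content_type {
        // Drop parameters such as `; charset=utf-8` and normalise case.
        let mime = ct.split(';').next().unwrap_or("").trim().to_ascii_lowercase();
        if mime == "text/html" || mime == "application/xhtml+xml" {
            return BodyKind::Html;
        }
        if mime == "application/json" || mime.ends_with("+json") {
            return BodyKind::Json;
        }
        if mime.starts_with("text/") {
            return BodyKind::Text;
        }
        // Assumption: any other declared type falls through to sniffing.
    }
    sniff(body)
}

/// Crude sniffing: inspect at most the first 512 bytes of the body.
fn sniff(body: &[u8]) -> BodyKind {
    let head = &body[..body.len().min(512)];
    // A NUL byte is treated as a sign of binary data.
    if head.contains(&0u8) {
        return BodyKind::Binary;
    }
    let lowered = String::from_utf8_lossy(head).to_ascii_lowercase();
    let start = lowered.trim_start();
    if start.starts_with("<!doctype html") || start.starts_with("<html") {
        BodyKind::Html
    } else if start.starts_with('{') || start.starts_with('[') {
        BodyKind::Json
    } else {
        BodyKind::Text
    }
}
```

Keeping the header check ahead of the sniff means a server that declares `text/markdown` is never run through the HTML extractor, which is the failure mode the module description warns about.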

Functions§

convert_body: Convert a fetched body to a FetchResult, choosing how to treat it based on its Content-Type (or a sniff of the body). HTML is extracted; JSON is pretty-printed; other text is passed through verbatim; binary is summarized.
convert_html: Convert already-fetched HTML into a FetchResult without any network I/O.
fetch_and_convert: Fetch a URL and convert it according to options.
parse_content_type: Parse a content-type string (“text” | “markdown” | “structured”).
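For orientation, parsing the three accepted strings could be modelled with `FromStr`, as in the sketch below. The `OutputKind` enum is a hypothetical stand-in for whatever type parse_content_type really returns, and the trimming and case-insensitive matching are assumptions, not documented behaviour.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the crate's content-type enum.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum OutputKind {
    Text,
    Markdown,
    Structured,
}

impl FromStr for OutputKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: surrounding whitespace and letter case are ignored.
        match s.trim().to_ascii_lowercase().as_str() {
            "text" => Ok(OutputKind::Text),
            "markdown" => Ok(OutputKind::Markdown),
            "structured" => Ok(OutputKind::Structured),
            other => Err(format!("unknown content type: {:?}", other)),
        }
    }
}
```

A caller would then write something like `"markdown".parse::<OutputKind>()` and handle the error case for any string outside the three listed values.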
