Crate webfetch

webfetch — token-efficient web content fetcher.

The defining feature is reference-style URL preservation: instead of stripping links to their domain (losing the ability to cite or follow them) or expanding full URLs inline (wasting tokens), links are replaced with compact [N] markers and collected into a recoverable reference list.
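Here is a minimal sketch of the [N] marker scheme. It assumes markers are numbered from 1 in order of first appearance and that a repeated URL reuses its existing marker. `RefList`, `marker`, `render`, and `link_text` are hypothetical names chosen for illustration. They are not the `refs` module's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical collector: gives each distinct URL a 1-based index in
/// order of first appearance, so repeated links share one marker.
#[derive(Default)]
pub struct RefList {
    urls: Vec<String>,
    index: HashMap<String, usize>,
}

impl RefList {
    /// Returns the `[N]` marker for `url`, registering it if it is new.
    pub fn marker(&mut self, url: &str) -> String {
        let n = match self.index.get(url).copied() {
            Some(n) => n,
            None => {
                self.urls.push(url.to_string());
                let n = self.urls.len();
                self.index.insert(url.to_string(), n);
                n
            }
        };
        format!("[{n}]")
    }

    /// Renders the recoverable reference list, one `[N] url` per line.
    pub fn render(&self) -> String {
        self.urls
            .iter()
            .enumerate()
            .map(|(i, u)| format!("[{}] {}", i + 1, u))
            .collect::<Vec<_>>()
            .join("\n")
    }
}

/// Hypothetical helper: replaces one link (anchor text + href) with "text [N]".
pub fn link_text(refs: &mut RefList, text: &str, href: &str) -> String {
    format!("{} {}", text, refs.marker(href))
}
```

Deduplicating by URL keeps the list short on pages that link the same target many times, such as navigation bars. The body text then pays only a few tokens per link, and the full URLs appear once, at the end.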

Re-exports

pub use fetch::fetch_page;

Modules

compress
convert
Output dispatcher: routes an HTML document to the requested format.
extract
fetch
guard
SSRF guard for the fetch path.
media
Decide how to treat a fetched body. The HTML extractor only makes sense for HTML; running it over a JSON API response, a raw .txt, or a Markdown file would mangle or drop the content. We classify by Content-Type when present, and sniff the body otherwise.
refs
Shared reference-style URL preservation.
types
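The `guard` module is documented only as an SSRF guard for the fetch path. The sketch below shows one common shape for such a guard, under that assumption. It resolves the host and rejects the request if any resolved address falls in loopback, private, link-local, or similar non-public ranges. It then returns the vetted addresses so the caller can connect to exactly those, and a second DNS lookup cannot swap in a different target. `is_blocked_ip` and `resolve_checked` are hypothetical names. The block list is illustrative and is not the crate's actual policy.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr, SocketAddr, ToSocketAddrs};

/// Hypothetical policy: true if `ip` is not a public unicast destination.
pub fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        // IPv4-mapped IPv6 (::ffff:a.b.c.d) gets the IPv4 verdict;
        // otherwise ::ffff:127.0.0.1 would slip past the IPv4 checks.
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            Some(v4) => is_blocked_v4(v4),
            None => is_blocked_v6(v6),
        },
    }
}

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    ip.is_loopback()            // 127.0.0.0/8
        || ip.is_private()      // 10/8, 172.16/12, 192.168/16
        || ip.is_link_local()   // 169.254/16, home of cloud metadata endpoints
        || ip.is_broadcast()    // 255.255.255.255
        || o[0] == 0            // 0.0.0.0/8, including the unspecified address
        || (o[0] == 100 && (o[1] & 0xc0) == 64) // 100.64.0.0/10 shared (CGNAT)
}

fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    let first = ip.segments()[0];
    ip.is_loopback()                    // ::1
        || ip.is_unspecified()          // ::
        || (first & 0xfe00) == 0xfc00   // fc00::/7 unique local
        || (first & 0xffc0) == 0xfe80   // fe80::/10 link-local
}

/// Hypothetical entry point: resolves `host` and fails if ANY resolved
/// address is blocked; on success returns the addresses to connect to.
pub fn resolve_checked(host: &str, port: u16) -> Result<Vec<SocketAddr>, String> {
    let addrs: Vec<SocketAddr> = (host, port)
        .to_socket_addrs()
        .map_err(|e| format!("resolve {host}: {e}"))?
        .collect();
    if addrs.is_empty() {
        return Err(format!("resolve {host}: no addresses"));
    }
    if let Some(bad) = addrs.iter().find(|a| is_blocked_ip(a.ip())) {
        return Err(format!("blocked address {}", bad.ip()));
    }
    Ok(addrs)
}
```

Checking every resolved address, rather than only the first, stops a host that publishes one public and one private record from reaching the private one.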
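The classification that `media` describes can be sketched as follows: trust a specific Content-Type when one is present, and sniff the body when the type is missing, generic, or unfamiliar. `MediaKind` and `classify` are hypothetical names. The sniffing heuristics are also assumptions for illustration: a NUL byte or invalid UTF-8 means binary, a leading `<!doctype html` or `<html` means HTML, and a body wrapped in `{...}` or `[...]` means JSON.

```rust
/// Hypothetical classification of a fetched body.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MediaKind {
    Html,
    Json,
    Text,
    Binary,
}

/// Classify by Content-Type when it is specific; otherwise sniff the body.
pub fn classify(content_type: Option<&str>, body: &[u8]) -> MediaKind {
    if let Some(ct) = content_type {
        // Drop parameters such as "; charset=utf-8" and normalise case.
        let mime = ct.split(';').next().unwrap_or("").trim().to_ascii_lowercase();
        match mime.as_str() {
            "text/html" | "application/xhtml+xml" => return MediaKind::Html,
            "application/json" => return MediaKind::Json,
            m if m.ends_with("+json") => return MediaKind::Json,
            m if m.starts_with("text/") => return MediaKind::Text,
            // Missing, generic, or unfamiliar types fall through to sniffing.
            _ => {}
        }
    }
    sniff(body)
}

fn sniff(body: &[u8]) -> MediaKind {
    // Inspect only a prefix; large bodies need not be scanned in full.
    let head = &body[..body.len().min(1024)];
    if head.contains(&0) {
        return MediaKind::Binary;
    }
    let text = match std::str::from_utf8(head) {
        Ok(s) => s,
        // error_len() == None: the prefix merely cut a multi-byte char in half.
        Err(e) if e.error_len().is_none() => {
            std::str::from_utf8(&head[..e.valid_up_to()]).unwrap_or("")
        }
        Err(_) => return MediaKind::Binary,
    };
    let lower = text.trim_start().to_ascii_lowercase();
    if lower.starts_with("<!doctype html") || lower.starts_with("<html") {
        return MediaKind::Html;
    }
    // JSON only if the body both opens and closes like an object or array,
    // so Markdown such as "[link](url) text" stays plain text.
    let last = body.iter().rev().find(|b| !b.is_ascii_whitespace()).copied();
    match (lower.chars().next(), last) {
        (Some('{'), Some(b'}')) | (Some('['), Some(b']')) => MediaKind::Json,
        _ => MediaKind::Text,
    }
}
```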

Functions

convert_body
Convert a fetched body to a FetchResult, choosing how to treat it based on its Content-Type (or a sniff of the body). HTML is extracted; JSON is pretty-printed; other text is passed through verbatim; binary is summarized.
convert_html
Convert already-fetched HTML into a FetchResult without any network I/O.
fetch_and_convert
Fetch a URL and convert it according to options.
parse_content_type
Parse a content-type string (“text” | “markdown” | “structured”).