Crate fetchkit

FetchKit - AI-friendly web content fetching library

This crate provides a reusable library API for fetching web content, with optional conversion of HTML to markdown or plain text, optimized for LLM consumption.

§Quick Start

use fetchkit::{FetchRequest, fetch};

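// `.await?` must run inside an async function that returns a `Result`.
// Ask for the page body converted to markdown.
let request = FetchRequest::new("https://example.com").as_markdown();
let response = fetch(request).await?;
// Print the content, or its default (empty) value if none was returned.
println!("Content: {}", response.content.unwrap_or_default());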

§Tool Builder

For more control, use the ToolBuilder to configure options:

use fetchkit::{FetchRequest, ToolBuilder};

let tool = ToolBuilder::new()
    .enable_markdown(true)
    .user_agent("MyBot/1.0")
    .block_prefix("https://blocked.example.com")
    .build();

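// Run the request through the configured tool.
// `.await?` requires an async function that returns a `Result`.
let request = FetchRequest::new("https://example.com");
let response = tool.execute(request).await?;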

§HTML Conversion

Convert HTML to markdown or plain text directly:

use fetchkit::{html_to_markdown, html_to_text};

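let html = "<h1>Hello</h1><p>World</p>";

// The <h1> element becomes a level-1 markdown heading.
let md = html_to_markdown(html);
assert!(md.contains("# Hello"));

// Plain-text output keeps the words and drops the tags.
let text = html_to_text(html);
assert!(text.contains("Hello"));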

§Fetcher System

FetchKit uses a pluggable fetcher system where specialized fetchers handle specific URL patterns. The FetcherRegistry dispatches requests to the appropriate fetcher based on URL matching.

Built-in fetchers:

ArXivFetcher - arXiv paper metadata and abstract
DefaultFetcher - General HTTP/HTTPS fetcher with HTML conversion
DocsSiteFetcher - llms.txt probe with DefaultFetcher fallback
GitHubCodeFetcher - GitHub source file content with language metadata
HackerNewsFetcher - Hacker News thread content via Firebase API
GitHubIssueFetcher - GitHub issue and PR metadata with comments
GitHubRepoFetcher - GitHub repository metadata and README
PackageRegistryFetcher - PyPI, crates.io, npm package metadata
RSSFeedFetcher - RSS/Atom feed parsing
StackOverflowFetcher - Stack Overflow Q&A content
TwitterFetcher - Twitter/X tweet content with article metadata
WikipediaFetcher - Wikipedia article content via MediaWiki API
YouTubeFetcher - YouTube video metadata via oEmbed
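
The dispatch idea described above can be illustrated with a small, self-contained sketch. It does not use fetchkit at all: `UrlHandler`, `PrefixHandler`, `FallbackHandler`, `Registry`, and `example_registry` are hypothetical names invented for this illustration, and the real `Fetcher` trait and `FetcherRegistry` may have different methods and matching rules. The sketch assumes a first-match-wins policy with a catch-all handler registered last, which the source does not confirm.

```rust
// Hypothetical sketch of URL-pattern dispatch. These types are NOT fetchkit's API.

/// Something that can claim the URLs it knows how to handle.
trait UrlHandler {
    /// Name used to report which handler was chosen.
    fn name(&self) -> &'static str;
    /// Returns true if this handler should process `url`.
    fn matches(&self, url: &str) -> bool;
}

/// Claims every URL that starts with a fixed prefix.
struct PrefixHandler {
    name: &'static str,
    prefix: &'static str,
}

impl UrlHandler for PrefixHandler {
    fn name(&self) -> &'static str {
        self.name
    }
    fn matches(&self, url: &str) -> bool {
        url.starts_with(self.prefix)
    }
}

/// Catch-all for any HTTP or HTTPS URL.
struct FallbackHandler;

impl UrlHandler for FallbackHandler {
    fn name(&self) -> &'static str {
        "default"
    }
    fn matches(&self, url: &str) -> bool {
        url.starts_with("http://") || url.starts_with("https://")
    }
}

/// Ordered list of handlers; the first match wins (an assumption of this sketch).
struct Registry {
    handlers: Vec<Box<dyn UrlHandler>>,
}

impl Registry {
    fn new() -> Self {
        Registry { handlers: Vec::new() }
    }

    fn register(&mut self, handler: Box<dyn UrlHandler>) {
        self.handlers.push(handler);
    }

    /// Returns the name of the first handler that matches `url`, if any.
    fn dispatch(&self, url: &str) -> Option<&'static str> {
        self.handlers
            .iter()
            .find(|h| h.matches(url))
            .map(|h| h.name())
    }
}

/// Builds a registry with specialized handlers first and the fallback last.
fn example_registry() -> Registry {
    let mut registry = Registry::new();
    registry.register(Box::new(PrefixHandler {
        name: "arxiv",
        prefix: "https://arxiv.org/",
    }));
    registry.register(Box::new(PrefixHandler {
        name: "wikipedia",
        prefix: "https://en.wikipedia.org/",
    }));
    registry.register(Box::new(FallbackHandler));
    registry
}
```

In this sketch, registration order decides which handler wins, so the catch-all goes last; a URL with an unsupported scheme matches nothing and `dispatch` returns `None`.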

Re-exports§

pub use client::batch_fetch;
pub use client::batch_fetch_with_options;
pub use client::fetch;
pub use client::fetch_with_options;
pub use client::FetchOptions;
pub use fetchers::ArXivFetcher;
pub use fetchers::DefaultFetcher;
pub use fetchers::DocsSiteFetcher;
pub use fetchers::Fetcher;
pub use fetchers::FetcherRegistry;
pub use fetchers::GitHubCodeFetcher;
pub use fetchers::GitHubIssueFetcher;
pub use fetchers::GitHubRepoFetcher;
pub use fetchers::HackerNewsFetcher;
pub use fetchers::PackageRegistryFetcher;
pub use fetchers::RSSFeedFetcher;
pub use fetchers::StackOverflowFetcher;
pub use fetchers::TwitterFetcher;
pub use fetchers::WikipediaFetcher;
pub use fetchers::YouTubeFetcher;
pub use file_saver::FileSaveError;
pub use file_saver::FileSaver;
pub use file_saver::LocalFileSaver;
pub use file_saver::SaveResult;

Modules§

client: HTTP client for FetchKit
fetchers: Fetcher system for specialized content fetching
file_saver: File saving abstractions for FetchKit

Structs§

DnsPolicy: Policy for DNS resolution and IP validation
FetchRequest: Request to fetch a URL
FetchResponse: Response from a fetch operation
PageLink: A link extracted from the page with its text and href.
PageMetadata: Structured metadata extracted from an HTML page.
Tool: Configured FetchKit tool
ToolBuilder: Builder for configuring the FetchKit tool
ToolExecution: Single-use runtime execution for one tool call.
ToolImage: Output image returned by the toolkit-library contract.
ToolOutput: Structured tool output for the toolkit-library contract.
ToolOutputMetadata: Consumer-only metadata returned by the toolkit-library contract.
ToolService: Generic JSON args → JSON result service.
ToolStatus: Status update during tool execution

Enums§

FetchError: Errors that can occur during fetch operations
HttpMethod: HTTP method for the request
ToolError: Errors returned by the toolkit-library contract surface.

Constants§

DEFAULT_USER_AGENT: Default User-Agent string
TOOL_DESCRIPTION: Backward-compatible full description string with file-saving enabled.

Statics§

TOOL_LLMTXT: Backward-compatible help document with file-saving enabled.

Functions§

extract_headings: Extract headings from HTML in a separate second pass. It is called after the main metadata extraction to keep that function simple, and it is cheap because headings are sparse.
extract_metadata: Extract structured metadata from HTML in a single pass.
html_to_markdown: Convert HTML to markdown
html_to_text: Convert HTML to plain text
strip_boilerplate: Strip boilerplate elements from HTML, keeping only main content.
