FetchKit - AI-friendly web content fetching library
This crate provides a reusable library API for fetching web content, with optional HTML to markdown/text conversion optimized for LLM consumption.
§Quick Start
use fetchkit::{FetchRequest, fetch};
let request = FetchRequest::new("https://example.com").as_markdown();
let response = fetch(request).await?;
println!("Content: {}", response.content.unwrap_or_default());
§Tool Builder
For more control, use the ToolBuilder to configure options:
use fetchkit::{FetchRequest, ToolBuilder};
let tool = ToolBuilder::new()
.enable_markdown(true)
.user_agent("MyBot/1.0")
.block_prefix("https://blocked.example.com")
.build();
let request = FetchRequest::new("https://example.com");
let response = tool.execute(request).await?;
§HTML Conversion
Convert HTML to markdown or plain text directly:
use fetchkit::{html_to_markdown, html_to_text};
let html = "<h1>Hello</h1><p>World</p>";
let md = html_to_markdown(html);
assert!(md.contains("# Hello"));
let text = html_to_text(html);
assert!(text.contains("Hello"));
§Fetcher System
FetchKit uses a pluggable fetcher system where specialized fetchers
handle specific URL patterns. The FetcherRegistry dispatches
requests to the appropriate fetcher based on URL matching.
Built-in fetchers:
- ArXivFetcher - arXiv paper metadata and abstract
- DefaultFetcher - General HTTP/HTTPS fetcher with HTML conversion
- DocsSiteFetcher - llms.txt probe with DefaultFetcher fallback
- GitHubCodeFetcher - GitHub source file content with language metadata
- HackerNewsFetcher - Hacker News thread content via Firebase API
- GitHubIssueFetcher - GitHub issue and PR metadata with comments
- GitHubRepoFetcher - GitHub repository metadata and README
- PackageRegistryFetcher - PyPI, crates.io, npm package metadata
- RSSFeedFetcher - RSS/Atom feed parsing
- StackOverflowFetcher - Stack Overflow Q&A content
- TwitterFetcher - Twitter/X tweet content with article metadata
- WikipediaFetcher - Wikipedia article content via MediaWiki API
- YouTubeFetcher - YouTube video metadata via oEmbed
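The dispatch idea described above can be sketched in a few lines of standalone Rust. This is only an illustration of URL-based dispatch under stated assumptions: the `Fetcher` trait shape, the `Registry` type, `WikiFetcher`, `FallbackFetcher`, and `demo_registry` are all hypothetical names invented for this sketch, and the real `fetchers::Fetcher` and `FetcherRegistry` signatures may well differ. The first-match-wins ordering, with a general fetcher registered last, is likewise an assumption.

```rust
// Hypothetical trait for illustration only; NOT the fetchkit `Fetcher` trait.
trait Fetcher {
    /// Short identifier for the fetcher.
    fn name(&self) -> &'static str;
    /// Whether this fetcher wants to handle the given URL.
    fn matches(&self, url: &str) -> bool;
}

/// Specialized fetcher: claims only Wikipedia article URLs.
struct WikiFetcher;

impl Fetcher for WikiFetcher {
    fn name(&self) -> &'static str {
        "wikipedia"
    }
    fn matches(&self, url: &str) -> bool {
        url.contains("wikipedia.org/wiki/")
    }
}

/// General fetcher: claims any HTTP/HTTPS URL.
struct FallbackFetcher;

impl Fetcher for FallbackFetcher {
    fn name(&self) -> &'static str {
        "default"
    }
    fn matches(&self, url: &str) -> bool {
        url.starts_with("http://") || url.starts_with("https://")
    }
}

/// Hypothetical registry: holds fetchers in priority order.
struct Registry {
    fetchers: Vec<Box<dyn Fetcher>>,
}

impl Registry {
    fn new() -> Self {
        Registry { fetchers: Vec::new() }
    }

    fn register(&mut self, fetcher: Box<dyn Fetcher>) {
        self.fetchers.push(fetcher);
    }

    /// Return the first registered fetcher whose pattern matches the URL.
    fn dispatch(&self, url: &str) -> Option<&dyn Fetcher> {
        self.fetchers
            .iter()
            .find(|f| f.matches(url))
            .map(|f| f.as_ref())
    }
}

/// Build a registry with the specialized fetcher ahead of the fallback.
fn demo_registry() -> Registry {
    let mut registry = Registry::new();
    registry.register(Box::new(WikiFetcher));
    registry.register(Box::new(FallbackFetcher));
    registry
}
```

In this sketch, registration order is what gives specialized fetchers priority: a Wikipedia URL also satisfies the fallback's pattern, so the fallback has to be registered last. A URL that no fetcher claims yields `None`, which a caller would turn into an error.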
Re-exports§
pub use client::batch_fetch;
pub use client::batch_fetch_with_options;
pub use client::fetch;
pub use client::fetch_with_options;
pub use client::FetchOptions;
pub use fetchers::ArXivFetcher;
pub use fetchers::DefaultFetcher;
pub use fetchers::DocsSiteFetcher;
pub use fetchers::Fetcher;
pub use fetchers::FetcherRegistry;
pub use fetchers::GitHubCodeFetcher;
pub use fetchers::GitHubIssueFetcher;
pub use fetchers::GitHubRepoFetcher;
pub use fetchers::HackerNewsFetcher;
pub use fetchers::PackageRegistryFetcher;
pub use fetchers::RSSFeedFetcher;
pub use fetchers::StackOverflowFetcher;
pub use fetchers::TwitterFetcher;
pub use fetchers::WikipediaFetcher;
pub use fetchers::YouTubeFetcher;
pub use file_saver::FileSaveError;
pub use file_saver::FileSaver;
pub use file_saver::LocalFileSaver;
pub use file_saver::SaveResult;
Modules§
- client - HTTP client for FetchKit
- fetchers - Fetcher system for specialized content fetching
- file_saver - File saving abstractions for FetchKit
Structs§
- DnsPolicy - Policy for DNS resolution and IP validation
- FetchRequest - Request to fetch a URL
- FetchResponse - Response from a fetch operation
- PageLink - A link extracted from the page with its text and href.
- PageMetadata - Structured metadata extracted from an HTML page.
- Tool - Configured FetchKit tool
- ToolBuilder - Builder for configuring the FetchKit tool
- ToolExecution - Single-use runtime execution for one tool call.
- ToolImage - Output image returned by the toolkit-library contract.
- ToolOutput - Structured tool output for the toolkit-library contract.
- ToolOutputMetadata - Consumer-only metadata returned by the toolkit-library contract.
- ToolService - Generic JSON args → JSON result service.
- ToolStatus - Status update during tool execution
Enums§
- FetchError - Errors that can occur during fetch operations
- HttpMethod - HTTP method for the request
- ToolError - Errors returned by the toolkit-library contract surface.
Constants§
- DEFAULT_USER_AGENT - Default User-Agent string
- TOOL_DESCRIPTION - Backward-compatible full description string with file-saving enabled.
Statics§
- TOOL_LLMTXT - Backward-compatible help document with file-saving enabled.
Functions§
- extract_headings - Second pass dedicated to heading extraction (cheap, since headings are sparse). Called after the main metadata extraction to keep the main function clean.
- extract_metadata - Extract structured metadata from HTML in a single pass.
- html_to_markdown - Convert HTML to markdown
- html_to_text - Convert HTML to plain text
- strip_boilerplate - Strip boilerplate elements from HTML, keeping only main content.