fetchkit
AI-friendly web content fetching tool designed for LLM consumption. Rust library with CLI, MCP server, and Python bindings.
Features
- HTTP fetching - GET and HEAD methods with streaming support
- HTML-to-Markdown - Built-in conversion optimized for LLMs
- HTML-to-Text - Plain text extraction with clean formatting
- Binary detection - Returns metadata only for images, PDFs, etc.
- Timeout handling - 1s first-byte, 30s body with partial content on timeout
- URL filtering - Allow/block lists for controlled access
- MCP server - Model Context Protocol support for AI tool integration
Installation
From Git (recommended)
From Source
CLI Usage
# Fetch URL (outputs markdown with frontmatter)
# Output as JSON instead
# Custom user agent
# Show full documentation
Default output is markdown with YAML frontmatter:
---
url: https://example.com
status_code: 200
source_content_type: text/html; charset=UTF-8
source_size: 1256
---

This domain is for use in illustrative examples in documents...
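
A consumer of this output usually wants the metadata and the markdown body separately. The following is a minimal sketch of that split, assuming the frontmatter block is delimited by `---` lines (the usual YAML frontmatter convention) and contains only flat `key: value` pairs like those shown above. The function name `split_frontmatter` is hypothetical and not part of fetchkit.

```python
def split_frontmatter(document: str) -> tuple[dict[str, str], str]:
    """Split CLI-style output into (frontmatter fields, markdown body).

    Assumes flat `key: value` lines between two `---` delimiters.
    Values are kept as strings; no YAML typing is attempted.
    """
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document  # no frontmatter block found

    fields: dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the body.
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return fields, body
        # Split on the first colon only, so URLs keep their "://".
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()

    return {}, document  # opening delimiter without a closing one


sample = """---
url: https://example.com
status_code: 200
source_content_type: text/html; charset=UTF-8
source_size: 1256
---

This domain is for use in illustrative examples in documents...
"""

fields, body = split_frontmatter(sample)
```

For anything beyond flat string fields, a real YAML parser would be the safer choice.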
JSON output (-o json):
MCP Server
Run as a Model Context Protocol server:
Exposes fetchkit tool over JSON-RPC 2.0 stdio transport. Returns markdown with frontmatter (same format as CLI). Compatible with Claude Desktop and other MCP clients.
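
To make the transport concrete, here is a sketch of the JSON-RPC 2.0 message an MCP client might write to the server's stdin to invoke the tool. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification, and the tool name `fetchkit` comes from the description above. The `url` argument name is an assumption about the tool's input schema, and `build_tool_call` is a hypothetical helper.

```python
import json


def build_tool_call(request_id: int, url: str) -> str:
    """Build one newline-delimited JSON-RPC 2.0 request for the stdio transport."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "fetchkit",
            # Assumption: the tool accepts a `url` argument.
            "arguments": {"url": url},
        },
    }
    # json.dumps emits no raw newlines by default, so each message
    # stays on a single line, followed by the delimiter.
    return json.dumps(message) + "\n"


line = build_tool_call(1, "https://example.com")
```

A real client sends an `initialize` request and completes the MCP handshake before any `tools/call`; existing MCP clients handle that step themselves.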
Library Usage
Add to Cargo.toml:
[dependencies]
fetchkit = { git = "https://github.com/everruns/fetchkit" }
Basic Fetch
use ;
async
With Tool Builder
use ;
let tool = new
.enable_markdown
.enable_text
.user_agent
.allow_prefix
.block_prefix
.build;
let request = new;
let response = tool.execute.await.unwrap;
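
The `allow_prefix` and `block_prefix` builder calls suggest prefix-based URL filtering. The sketch below shows one plausible reading of those semantics in Python. It is not fetchkit's implementation: the class `PrefixFilter`, the rule that block prefixes take precedence, and the rule that an empty allow list permits everything are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PrefixFilter:
    """Hypothetical allow/block filter based on URL prefixes."""

    allow: list[str] = field(default_factory=list)
    block: list[str] = field(default_factory=list)

    def permits(self, url: str) -> bool:
        # Assumption: a matching block prefix always wins.
        if any(url.startswith(prefix) for prefix in self.block):
            return False
        # Assumption: with no allow prefixes, every unblocked URL passes.
        if not self.allow:
            return True
        return any(url.startswith(prefix) for prefix in self.allow)


url_filter = PrefixFilter(
    allow=["https://docs.example.com/"],
    block=["https://docs.example.com/internal/"],
)
```

With this filter, `https://docs.example.com/guide` is permitted, while `https://docs.example.com/internal/keys` and `https://other.example.com/` are rejected.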
Python Bindings
# Simple fetch
=
# With configuration
=
=
Response Fields
| Field | Type | Description |
|---|---|---|
| url | string | Fetched URL |
| status_code | int | HTTP status code |
| content_type | string? | Content-Type header |
| size | int? | Content size in bytes |
| last_modified | string? | Last-Modified header |
| filename | string? | From Content-Disposition |
| format | string? | "markdown", "text", "raw", or "github_repo" |
| content | string? | Page content |
| truncated | bool? | True if content was cut off |
| method | string? | "HEAD" for HEAD requests |
| error | string? | Error message if failed |
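
Because most fields are optional, callers need to check for their presence before use. The sketch below summarizes a response that has been decoded into a plain Python dict using the field names from the table. The function `describe` is hypothetical, and treating a missing `content` field as a metadata-only response is an assumption.

```python
def describe(response: dict) -> str:
    """Summarize a decoded response dict in one line."""
    # `error` is only set when the fetch failed.
    if response.get("error"):
        return f"failed: {response['error']}"

    # Assumption: responses without `content` (binary files, HEAD
    # requests) carry metadata only.
    content = response.get("content")
    if content is None:
        return (
            f"metadata only ({response.get('content_type')}, "
            f"{response.get('size')} bytes)"
        )

    suffix = " (truncated)" if response.get("truncated") else ""
    return f"{response.get('format')} content, {len(content)} chars{suffix}"


summary = describe(
    {
        "url": "https://example.com",
        "status_code": 200,
        "format": "markdown",
        "content": "# Hi",
        "truncated": True,
    }
)
```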
Error Handling
Errors are returned in the error field:
- InvalidUrl - Malformed URL
- UrlBlocked - URL blocked by filter
- NetworkError - Connection failed
- Timeout - Request timed out
- HttpError - 4xx/5xx response
- ContentError - Failed to read body
- BinaryContent - Binary content not supported
Configuration
Timeouts
- First-byte: 1 second (connect + initial response)
- Body: 30 seconds total
Partial content is returned on body timeout with truncated: true.
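
The partial-content behavior can be illustrated with a small sketch that accumulates body chunks until a deadline passes, then returns what it has along with a truncation flag. This illustrates the idea only and does not describe fetchkit's internals. `read_body` and its parameters are hypothetical, and the deadline is checked only between chunks, whereas an async implementation would typically cancel the pending read.

```python
import time
from typing import Callable, Iterable


def read_body(
    chunks: Iterable[bytes],
    body_timeout: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[bytes, bool]:
    """Collect chunks until the deadline; return (body, truncated)."""
    deadline = clock() + body_timeout
    received = bytearray()
    for chunk in chunks:
        # A chunk that arrives at or after the deadline is discarded and
        # the partial body is returned with truncated=True.
        if clock() >= deadline:
            return bytes(received), True
        received.extend(chunk)
    return bytes(received), False


# Simulated clock: start at 0 s, then chunks arrive at 10 s, 20 s and 31 s.
ticks = iter([0.0, 10.0, 20.0, 31.0])
body, truncated = read_body([b"a", b"b", b"c"], clock=lambda: next(ticks))
```

The injectable `clock` parameter exists only so the timeout path can be exercised without waiting 30 seconds.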
Binary Content
Binary content is detected automatically, and only metadata is returned for:
- Images, audio, video, fonts
- PDFs, archives (zip, tar, rar, 7z)
- Office documents
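
One common way to implement this kind of detection is to inspect the Content-Type header. The sketch below takes that approach for the categories listed above. How fetchkit actually detects binary content is not stated here, so the header-based strategy, the specific media types, and the name `is_binary` are assumptions.

```python
# Top-level media types treated as binary.
BINARY_PREFIXES = ("image/", "audio/", "video/", "font/")

# Specific media types for PDFs, archives, and legacy Office formats.
BINARY_TYPES = {
    "application/pdf",
    "application/zip",
    "application/x-tar",
    "application/vnd.rar",
    "application/x-7z-compressed",
    "application/msword",
    "application/vnd.ms-excel",
    "application/vnd.ms-powerpoint",
}

# Modern Office formats (docx, xlsx, pptx) share this prefix.
OFFICE_OPENXML_PREFIX = "application/vnd.openxmlformats-officedocument."


def is_binary(content_type: str) -> bool:
    """Return True if the Content-Type header names a binary media type."""
    # Drop parameters such as "; charset=UTF-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return (
        media_type.startswith(BINARY_PREFIXES)
        or media_type.startswith(OFFICE_OPENXML_PREFIX)
        or media_type in BINARY_TYPES
    )
```

Header-based checks miss mislabeled responses, so a more thorough detector might also sniff the first bytes of the body.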
HTML Conversion
HTML is automatically converted to markdown:
- Headers: `h1`-`h6` → `#` to `######`
- Lists: Proper nesting with 2-space indent
- Code: Fenced blocks and inline backticks
- Links: `[text](url)` format
- Strips: scripts, styles, iframes, SVGs
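
To give a feel for these rules, here is a deliberately small sketch built on Python's standard `html.parser`. It covers only three of the rules above (heading prefixes, link syntax, and stripping of scripts, styles, iframes, and SVGs) and ignores lists and code. It is not fetchkit's converter, and `MiniMarkdown` and `to_markdown` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class MiniMarkdown(HTMLParser):
    """Toy HTML-to-Markdown converter: headings, links, stripped tags."""

    SKIPPED = {"script", "style", "iframe", "svg"}
    HEADINGS = {f"h{level}": "#" * level for level in range(1, 7)}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.skip_depth = 0  # > 0 while inside a stripped element
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in self.HEADINGS:
            self.parts.append(f"\n{self.HEADINGS[tag]} ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in self.HEADINGS:
            self.parts.append("\n")
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        # Text inside stripped elements is dropped.
        if not self.skip_depth:
            self.parts.append(data)


def to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


result = to_markdown(
    '<h2>Title</h2><script>alert(1)</script>'
    '<p>See <a href="https://example.com">the docs</a>.</p>'
)
```

Here `result` contains a `## Title` line followed by `See [the docs](https://example.com).`, with the script content removed.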
License
MIT