Fetch MCP Server
A powerful Model Context Protocol (MCP) server that provides secure web content fetching with robots.txt compliance, HTML-to-markdown conversion, content truncation, and comprehensive HTTP operations.
Features
- Web Content Fetching - Retrieve content from any HTTP/HTTPS URL
- Robots.txt Compliance - Automatic robots.txt checking for autonomous fetching
- HTML to Markdown - Intelligent conversion of HTML content to clean markdown
- Content Truncation - Configurable content length limits with continuation support
- Raw HTML Mode - Option to retrieve unprocessed HTML content
- Custom User Agents - Configurable user agent strings for different use cases
- Proxy Support - HTTP proxy configuration for network environments
- Security First - Safe URL validation and error handling
- Flexible Parameters - Configurable max length, start index, and content format
- Dual Modes - Both tool and prompt interfaces for different use cases
- Input Validation - Comprehensive parameter validation and sanitization
- SOLID Architecture - Clean, maintainable, and testable codebase
Installation & Usage
Install from Crates.io
Run the Server
# Start the MCP server (communicates via stdio)
# Use custom user agent
# Ignore robots.txt restrictions
# Use HTTP proxy
# Enable debug logging
LOG_LEVEL=debug
Test with MCP Inspector
# Install and run the MCP Inspector to test the server
Use with Claude Desktop
Add to your Claude Desktop MCP configuration:
Available Tools
fetch
Fetches a URL from the internet and optionally extracts its contents as markdown. This tool provides internet access capabilities with intelligent content processing.
Parameters:
- url (string): The URL to fetch
- max_length (optional number): Maximum number of characters to return (default: 5000, max: 1,000,000)
- start_index (optional number): Starting character index for content extraction (default: 0)
- raw (optional boolean): Return raw HTML content without markdown conversion (default: false)
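The defaults and limits listed above could be applied along these lines. This is a minimal sketch, not the server's actual code: `FetchParams`, its constructor, and the lower bound of 1 on max_length are assumptions made for illustration; only the default values and the 1,000,000 ceiling come from the parameter list.

```rust
/// Hypothetical struct mirroring the documented `fetch` arguments.
#[derive(Debug, Clone, PartialEq)]
pub struct FetchParams {
    pub url: String,
    pub max_length: usize,
    pub start_index: usize,
    pub raw: bool,
}

/// Documented default for `max_length`.
pub const DEFAULT_MAX_LENGTH: usize = 5_000;
/// Documented upper bound for `max_length`.
pub const MAX_MAX_LENGTH: usize = 1_000_000;

impl FetchParams {
    /// Fills in the documented defaults for omitted optional fields,
    /// then rejects values outside the documented range.
    pub fn new(
        url: &str,
        max_length: Option<usize>,
        start_index: Option<usize>,
        raw: Option<bool>,
    ) -> Result<Self, String> {
        let url = url.trim();
        if url.is_empty() {
            return Err("url must not be empty".to_string());
        }
        let max_length = max_length.unwrap_or(DEFAULT_MAX_LENGTH);
        // Assumption: zero is rejected. The README states only the upper bound.
        if max_length == 0 || max_length > MAX_MAX_LENGTH {
            return Err(format!(
                "max_length must be between 1 and {}",
                MAX_MAX_LENGTH
            ));
        }
        Ok(Self {
            url: url.to_string(),
            max_length,
            start_index: start_index.unwrap_or(0),
            raw: raw.unwrap_or(false),
        })
    }
}
```

Calling `FetchParams::new("https://example.com", None, None, None)` yields the documented defaults: 5000 characters, start index 0, markdown conversion enabled.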
Example Request:
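An MCP client invokes a tool with a JSON-RPC `tools/call` message. The sketch below builds such a message for the fetch tool using only the standard library; `fetch_request` and `json_escape` are hypothetical helper names, and a real client would more likely use a JSON library.

```rust
/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a JSON-RPC 2.0 `tools/call` request for the `fetch` tool.
pub fn fetch_request(
    id: u64,
    url: &str,
    max_length: usize,
    start_index: usize,
    raw: bool,
) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"fetch","arguments":{{"url":"{}","max_length":{},"start_index":{},"raw":{}}}}}}}"#,
        id,
        json_escape(url),
        max_length,
        start_index,
        raw
    )
}
```

The string returned by `fetch_request` is what a client would write to the server's stdin, one message per line.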
Example Response:
Content Truncation:
When content exceeds max_length, the response includes continuation instructions:
Content truncated. Call the fetch tool with a start_index of 5000 to get more content.
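The truncation behaviour can be sketched as a pure function over the processed text. The names `Chunk`, `truncate_content`, and `render` are hypothetical, and counting in Unicode characters rather than bytes is an assumption based on the parameter descriptions.

```rust
/// One slice of the content plus where the next slice would begin.
pub struct Chunk {
    pub text: String,
    /// Value to pass as `start_index` on the next call, if content remains.
    pub next_start: Option<usize>,
}

/// Returns up to `max_length` characters beginning at `start_index`.
pub fn truncate_content(content: &str, start_index: usize, max_length: usize) -> Chunk {
    // Work in characters so multi-byte UTF-8 sequences are never split.
    let total = content.chars().count();
    let text: String = content.chars().skip(start_index).take(max_length).collect();
    let end = start_index.saturating_add(text.chars().count());
    let next_start = if end < total { Some(end) } else { None };
    Chunk { text, next_start }
}

/// Appends the continuation notice when more content is available.
pub fn render(chunk: &Chunk) -> String {
    match chunk.next_start {
        Some(next) => format!(
            "{}\n\nContent truncated. Call the fetch tool with a start_index of {} to get more content.",
            chunk.text, next
        ),
        None => chunk.text.clone(),
    }
}
```

A caller reads a long page by repeating the request with `start_index` set to the reported value until no continuation notice is returned.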
Robots.txt Compliance:
The server automatically checks robots.txt for autonomous fetching:
- Allowed URLs proceed normally
- Disallowed URLs return an error with robots.txt information
- Use the --ignore-robots-txt flag to bypass restrictions
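As an illustration of what a robots.txt check involves, here is a deliberately simplified sketch. It considers only `User-agent: *` groups, uses plain prefix matching without wildcards, and resolves conflicts by longest match as described in RFC 9309. The function name `is_allowed` is hypothetical, and the server's real parser may behave differently.

```rust
/// Returns true if `path` may be fetched according to the `User-agent: *`
/// groups in `robots_txt`. Simplified: prefix matching only, no wildcards.
pub fn is_allowed(robots_txt: &str, path: &str) -> bool {
    let mut in_star_group = false;
    let mut last_was_agent = false;
    // Best rule so far: (matched prefix length, whether it allows).
    let mut best: Option<(usize, bool)> = None;

    for line in robots_txt.lines() {
        // Drop comments and surrounding whitespace.
        let line = line.split('#').next().unwrap_or("").trim();
        let Some((key, value)) = line.split_once(':') else {
            continue;
        };
        let key = key.trim().to_ascii_lowercase();
        let value = value.trim();
        match key.as_str() {
            "user-agent" => {
                // Consecutive User-agent lines belong to the same group.
                if !last_was_agent {
                    in_star_group = false;
                }
                if value == "*" {
                    in_star_group = true;
                }
                last_was_agent = true;
            }
            "allow" | "disallow" => {
                last_was_agent = false;
                // An empty value matches nothing.
                if in_star_group && !value.is_empty() && path.starts_with(value) {
                    let allowed = key == "allow";
                    let better = match best {
                        None => true,
                        // Longest match wins; Allow wins ties.
                        Some((len, was_allowed)) => {
                            value.len() > len
                                || (value.len() == len && allowed && !was_allowed)
                        }
                    };
                    if better {
                        best = Some((value.len(), allowed));
                    }
                }
            }
            _ => last_was_agent = false,
        }
    }
    // No matching rule means the path is allowed.
    best.map(|(_, allowed)| allowed).unwrap_or(true)
}
```

With the --ignore-robots-txt flag set, a check like this would simply be skipped.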
Available Prompts
fetch
Manual URL fetching prompt that retrieves and processes web content for immediate use in conversations.
Parameters:
- url (string): The URL to fetch
Example Usage:
Use the fetch prompt with URL: https://news.example.com/latest
Response:
Returns a prompt message containing the fetched and processed content, ready for use in the conversation context.
Configuration
Command Line Options
Environment Variables
- LOG_LEVEL: Set logging level (trace, debug, info, warn, error)
User Agent Modes
The server uses different user agents depending on the context:
- Autonomous Mode: ModelContextProtocol/1.0 (Autonomous; +https://github.com/modelcontextprotocol/servers)
- Manual Mode: ModelContextProtocol/1.0 (User-Specified; +https://github.com/modelcontextprotocol/servers)
- Custom: Your specified user agent string
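The selection between these three cases might look like the following sketch. The enum `FetchMode` and the function `select_user_agent` are hypothetical names, and the rule that a custom string overrides both built-in modes is an assumption.

```rust
/// User agent string documented for autonomous (tool) fetching.
pub const USER_AGENT_AUTONOMOUS: &str =
    "ModelContextProtocol/1.0 (Autonomous; +https://github.com/modelcontextprotocol/servers)";
/// User agent string documented for manual (prompt) fetching.
pub const USER_AGENT_MANUAL: &str =
    "ModelContextProtocol/1.0 (User-Specified; +https://github.com/modelcontextprotocol/servers)";

/// Hypothetical marker for how the fetch was initiated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FetchMode {
    Autonomous,
    Manual,
}

/// Picks the user agent for a request.
/// Assumption: a custom string, when given, overrides both built-in modes.
pub fn select_user_agent<'a>(mode: FetchMode, custom: Option<&'a str>) -> &'a str {
    match (custom, mode) {
        (Some(ua), _) => ua,
        (None, FetchMode::Autonomous) => USER_AGENT_AUTONOMOUS,
        (None, FetchMode::Manual) => USER_AGENT_MANUAL,
    }
}
```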
Security Model
The server implements several security measures:
- URL Validation: All URLs are validated before fetching
- Robots.txt Compliance: Automatic checking for autonomous operations
- Content Limits: Configurable size limits prevent abuse
- Error Sanitization: Safe error messages without sensitive information
- Proxy Support: Secure proxy configuration for network environments
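To make the URL validation step concrete, here is a minimal standard-library sketch that accepts only http and https URLs with a non-empty host. The function name `validate_url` is hypothetical. The project lists the url crate for parsing, so the real check is probably stricter than this.

```rust
/// Accepts only `http://` and `https://` URLs that have a non-empty host.
/// Returns the trimmed URL on success.
pub fn validate_url(input: &str) -> Result<&str, String> {
    let url = input.trim();
    let (scheme, rest) = url
        .split_once("://")
        .ok_or_else(|| format!("missing scheme in URL: {}", url))?;
    // Schemes are case-insensitive.
    if !scheme.eq_ignore_ascii_case("http") && !scheme.eq_ignore_ascii_case("https") {
        return Err(format!("unsupported scheme: {}", scheme));
    }
    // The authority is everything up to the first '/', '?' or '#'.
    let host = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or("");
    if host.is_empty() {
        return Err("URL has no host".to_string());
    }
    Ok(url)
}
```

Rejecting other schemes up front keeps requests such as file:// or ftp:// from ever reaching the HTTP client.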
Usage Examples
With Claude Desktop
Once configured, you can ask Claude:
"Fetch the latest news from https://news.example.com"
"Get the content from this documentation page: https://docs.example.com/api"
"Retrieve the raw HTML from https://example.com without markdown conversion"
"Fetch the first 2000 characters from this long article: https://blog.example.com/long-post"
With MCP Inspector
# Test the server interactively
# Try these operations:
# 1. Use fetch tool with different URLs
# 2. Test content truncation with max_length
# 3. Try raw HTML mode
# 4. Test robots.txt compliance
# 5. Use the fetch prompt for immediate content
Advanced Usage Examples
Fetching Large Content in Chunks:
Follow up with:
Raw HTML Extraction:
Custom Configuration:
# Production setup with custom user agent and proxy
Error Handling
The server provides detailed error messages for common issues:
- Invalid URL: Clear feedback for malformed URLs
- Network Errors: Helpful messages for connection issues
- Robots.txt Violations: Specific guidance about autonomous fetching restrictions
- Content Limits: Information about size restrictions and truncation
- Validation Errors: Specific feedback on parameter validation failures
- Proxy Errors: Clear messages for proxy configuration issues
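The categories above map naturally onto a Rust error enum. The sketch below is illustrative only: the type name `FetchError`, its variants, and the message wording are assumptions, not the server's actual error type.

```rust
use std::fmt;

/// Hypothetical error type covering the categories listed above.
#[derive(Debug, Clone, PartialEq)]
pub enum FetchError {
    InvalidUrl(String),
    Network(String),
    RobotsDisallowed { url: String },
    InvalidParameter { name: String, reason: String },
    Proxy(String),
}

impl fmt::Display for FetchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FetchError::InvalidUrl(url) => write!(f, "invalid URL: {}", url),
            FetchError::Network(msg) => write!(f, "network error: {}", msg),
            FetchError::RobotsDisallowed { url } => write!(
                f,
                "robots.txt disallows autonomous fetching of {}",
                url
            ),
            FetchError::InvalidParameter { name, reason } => {
                write!(f, "invalid parameter {}: {}", name, reason)
            }
            FetchError::Proxy(msg) => write!(f, "proxy error: {}", msg),
        }
    }
}

impl std::error::Error for FetchError {}
```

Keeping one variant per category lets callers match on the kind of failure while the Display text stays free of internal details.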
Example Error Response:
Testing
Run the comprehensive test suite:
# Run all tests
# Run specific test categories
# Run with coverage
Integration Testing
# Test with real URLs (requires internet)
# Test robots.txt compliance
# Test content processing
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Development Setup
- Clone the repository
- Install Rust (1.70+ required)
- Run cargo build
- Run cargo test
Code Style
This project follows SOLID principles and Domain-Driven Design:
- Clean Architecture: Separation of concerns with clear layers
- Dependency Injection: Testable and maintainable code
- Comprehensive Testing: Unit and integration tests
- Error Handling: Robust error types and handling
Run cargo fmt and cargo clippy before submitting.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built with rmcp - Rust MCP implementation
- HTTP client powered by reqwest
- HTML to Markdown conversion via fast_html2md
- URL parsing using url
- Async runtime provided by tokio
Support
- Documentation
- Issue Tracker
- Discussions