# Fetch MCP Server
[Crates.io](https://crates.io/crates/mcp-server-fetch)
[Documentation](https://docs.rs/mcp-server-fetch)
[MIT License](https://opensource.org/licenses/MIT)
A powerful **Model Context Protocol (MCP) server** that provides secure web content fetching with **robots.txt compliance**, HTML-to-markdown conversion, content truncation, and comprehensive HTTP operations.
## Features
- **Web Content Fetching** - Retrieve content from any HTTP/HTTPS URL
- **Robots.txt Compliance** - Automatic robots.txt checking for autonomous fetching
- **HTML to Markdown** - Intelligent conversion of HTML content to clean markdown
- **Content Truncation** - Configurable content length limits with continuation support
- **Raw HTML Mode** - Option to retrieve unprocessed HTML content
- **Custom User Agents** - Configurable user agent strings for different use cases
- **Proxy Support** - HTTP proxy configuration for network environments
- **Security First** - Safe URL validation and error handling
- **Flexible Parameters** - Configurable max length, start index, and content format
- **Dual Modes** - Both tool and prompt interfaces for different use cases
- **Input Validation** - Comprehensive parameter validation and sanitization
- **SOLID Architecture** - Clean, maintainable, and testable codebase
## Installation & Usage
### Install from Crates.io
```bash
cargo install mcp-server-fetch
```
### Run the Server
```bash
# Start the MCP server (communicates via stdio)
mcp-server-fetch
# Use custom user agent
mcp-server-fetch --user-agent "MyApp/1.0"
# Ignore robots.txt restrictions
mcp-server-fetch --ignore-robots-txt
# Use HTTP proxy
mcp-server-fetch --proxy-url "http://proxy.example.com:8080"
# Enable debug logging
LOG_LEVEL=debug mcp-server-fetch
```
### Test with MCP Inspector
```bash
# Install and run the MCP Inspector to test the server
npx @modelcontextprotocol/inspector mcp-server-fetch
```
### Use with Claude Desktop
Add to your Claude Desktop MCP configuration:
```json
{
"mcpServers": {
"fetch": {
"command": "mcp-server-fetch",
"args": ["--user-agent", "Claude-Desktop/1.0"]
}
}
}
```
## Available Tools
### `fetch`
Fetches a URL from the internet and optionally extracts its contents as markdown. This tool provides internet access capabilities with intelligent content processing.
**Parameters:**
- `url` (string): The URL to fetch
- `max_length` (optional number): Maximum number of characters to return (default: 5000, max: 1,000,000)
- `start_index` (optional number): Starting character index for content extraction (default: 0)
- `raw` (optional boolean): Return raw HTML content without markdown conversion (default: false)
**Example Request:**
```json
{
"url": "https://example.com/article",
"max_length": 10000,
"start_index": 0,
"raw": false
}
```
**Example Response:**
```json
{
"content": [
{
"type": "text",
"text": "Contents of https://example.com/article:\n\n# Article Title\n\nThis is the converted markdown content..."
}
]
}
```
**Content Truncation:**
When content exceeds the `max_length`, the response includes continuation instructions:
```
Content truncated. Call the fetch tool with a start_index of 5000 to get more content.
```
**Robots.txt Compliance:**
The server automatically checks robots.txt before autonomous fetches (a simplified sketch of this check follows the list):
- Allowed URLs proceed normally
- Disallowed URLs return an error with robots.txt information
- Use the `--ignore-robots-txt` flag to bypass these restrictions
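Conceptually, the check downloads `/robots.txt` from the target host and tests whether any `Disallow` rule for the relevant user agent matches the requested path. The sketch below is a deliberately simplified illustration of that idea (a single `User-agent: *` group, prefix matching, no `Allow` rules) rather than this crate's actual implementation; it assumes the `url`, `reqwest`, and `tokio` crates are available.
```rust
use url::Url;

// Simplified robots.txt check: returns true if no `Disallow` rule in the
// `User-agent: *` group is a prefix of `path`. Real checkers also honour
// `Allow` rules, per-agent groups, and longest-match precedence.
fn path_allowed(robots_txt: &str, path: &str) -> bool {
    let mut in_star_group = false;
    for line in robots_txt.lines() {
        let line = line.split('#').next().unwrap_or("").trim();
        if let Some(agent) = line.strip_prefix("User-agent:") {
            in_star_group = agent.trim() == "*";
        } else if in_star_group {
            if let Some(rule) = line.strip_prefix("Disallow:") {
                let rule = rule.trim();
                if !rule.is_empty() && path.starts_with(rule) {
                    return false;
                }
            }
        }
    }
    true
}

// Fetch the host's robots.txt and apply the check to the target URL's path.
// A production implementation would also treat a missing robots.txt as "allowed".
async fn robots_allows(target: &str) -> Result<bool, Box<dyn std::error::Error>> {
    let url = Url::parse(target)?;
    let host = url.host_str().ok_or("URL has no host")?;
    let robots_url = format!("{}://{}/robots.txt", url.scheme(), host);
    let body = reqwest::get(&robots_url).await?.text().await?;
    Ok(path_allowed(&body, url.path()))
}
```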
## Available Prompts
### `fetch`
Manual URL fetching prompt that retrieves and processes web content for immediate use in conversations.
**Parameters:**
- `url` (string): The URL to fetch
**Example Usage:**
```
Use the fetch prompt with URL: https://news.example.com/latest
```
**Response:**
Returns a prompt message containing the fetched and processed content, ready for use in the conversation context.
## Configuration
### Command Line Options
```bash
mcp-server-fetch [OPTIONS]
Options:
--user-agent <USER_AGENT> Custom User-Agent string to use for requests
--ignore-robots-txt Ignore robots.txt restrictions
--proxy-url <PROXY_URL> Proxy URL to use for requests (e.g., http://proxy:8080)
-h, --help Print help information
-V, --version Print version information
```
### Environment Variables
- `LOG_LEVEL`: Set logging level (trace, debug, info, warn, error)
### User Agent Modes
The server uses different user agents depending on the context:
- **Autonomous Mode**: `ModelContextProtocol/1.0 (Autonomous; +https://github.com/modelcontextprotocol/servers)`
- **Manual Mode**: `ModelContextProtocol/1.0 (User-Specified; +https://github.com/modelcontextprotocol/servers)`
- **Custom**: Your specified user agent string
### Security Model
The server implements several security measures (an illustrative URL-validation sketch follows this list):
- **URL Validation**: All URLs are validated before fetching
- **Robots.txt Compliance**: Automatic checking for autonomous operations
- **Content Limits**: Configurable size limits prevent abuse
- **Error Sanitization**: Safe error messages without sensitive information
- **Proxy Support**: Secure proxy configuration for network environments
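As an illustration of the first point, URL validation can be as simple as parsing the input with the `url` crate (listed in the acknowledgments) and rejecting anything that is not an absolute `http`/`https` URL with a host. The sketch below shows that idea; the server's actual checks may go further.
```rust
use url::Url;

// Minimal validation sketch: accept only absolute http/https URLs with a host.
// Illustrative only; not necessarily this crate's exact rule set.
fn validate_url(input: &str) -> Result<Url, String> {
    let url = Url::parse(input).map_err(|e| format!("invalid URL: {e}"))?;
    match url.scheme() {
        "http" | "https" => {}
        other => return Err(format!("unsupported scheme: {other}")),
    }
    if url.host_str().is_none() {
        return Err("URL has no host".to_string());
    }
    Ok(url)
}
```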
## Usage Examples
### With Claude Desktop
Once configured, you can ask Claude:
> "Fetch the latest news from https://news.example.com"
> "Get the content from this documentation page: https://docs.example.com/api"
> "Retrieve the raw HTML from https://example.com without markdown conversion"
> "Fetch the first 2000 characters from this long article: https://blog.example.com/long-post"
### With MCP Inspector
```bash
# Test the server interactively
npx @modelcontextprotocol/inspector mcp-server-fetch
# Try these operations:
# 1. Use fetch tool with different URLs
# 2. Test content truncation with max_length
# 3. Try raw HTML mode
# 4. Test robots.txt compliance
# 5. Use the fetch prompt for immediate content
```
### Advanced Usage Examples
**Fetching Large Content in Chunks:**
```json
{
"url": "https://example.com/large-document",
"max_length": 5000,
"start_index": 0
}
```
Follow up with:
```json
{
"url": "https://example.com/large-document",
"max_length": 5000,
"start_index": 5000
}
```
**Raw HTML Extraction:**
```json
{
"url": "https://example.com/complex-page",
"raw": true,
"max_length": 10000
}
```
**Custom Configuration:**
```bash
# Production setup with custom user agent and proxy
mcp-server-fetch \
--user-agent "MyCompany-AI/2.0 (+https://mycompany.com/bot)" \
--proxy-url "http://corporate-proxy:8080"
```
## Error Handling
The server provides detailed error messages for common issues:
- **Invalid URL**: Clear feedback for malformed URLs
- **Network Errors**: Helpful messages for connection issues
- **Robots.txt Violations**: Specific guidance about autonomous fetching restrictions
- **Content Limits**: Information about size restrictions and truncation
- **Validation Errors**: Specific feedback on parameter validation failures
- **Proxy Errors**: Clear messages for proxy configuration issues
**Example Error Response:**
```json
{
"error": {
"code": -32602,
"message": "Robots.txt disallows autonomous fetching of this URL",
"data": {
"url": "https://example.com/restricted",
"robots_txt_url": "https://example.com/robots.txt",
"user_agent": "ModelContextProtocol/1.0 (Autonomous)"
}
}
}
```
## Testing
Run the comprehensive test suite:
```bash
# Run all tests
cargo test
# Run specific test categories
cargo test fetch_service
cargo test validation
cargo test server
# Run with coverage
cargo tarpaulin --out html
```
### Integration Testing
```bash
# Test with real URLs (requires internet)
cargo test --features integration-tests
# Test robots.txt compliance
cargo test robots_txt_tests
# Test content processing
cargo test content_processing
```
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
### Development Setup
1. Clone the repository
2. Install Rust (1.70+ required)
3. Run `cargo build`
4. Run `cargo test`
### Code Style
This project follows SOLID principles and Domain-Driven Design:
- **Clean Architecture**: Separation of concerns with clear layers
- **Dependency Injection**: Testable and maintainable code
- **Comprehensive Testing**: Unit and integration tests
- **Error Handling**: Robust error types and handling
Run `cargo fmt` and `cargo clippy` before submitting.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Built with [rmcp](https://crates.io/crates/rmcp) - Rust MCP implementation
- HTTP client powered by [reqwest](https://crates.io/crates/reqwest)
- HTML to Markdown conversion via [fast_html2md](https://crates.io/crates/fast_html2md)
- URL parsing using [url](https://crates.io/crates/url)
- Async runtime provided by [tokio](https://crates.io/crates/tokio)
## Support
- [Documentation](https://docs.rs/mcp-server-fetch)
- [Issue Tracker](https://github.com/sabry-awad97/rust-mcp-servers/issues)
- [Discussions](https://github.com/sabry-awad97/rust-mcp-servers/discussions)
---
<div align="center">
<strong>Made with ❤️ for secure web content fetching in the MCP ecosystem</strong>
</div>