markdowndown
A Rust library for converting URLs to markdown with intelligent handling of different URL types.
Features
- 🌐 Universal URL Support: Convert any web page to clean markdown
- 📝 Smart Conversion: Specialized handlers for Google Docs, Office 365, GitHub Issues
- 🔧 Configurable: Flexible configuration for different use cases
- 🚀 Fast & Reliable: Built with performance and reliability in mind
- 📊 Rich Metadata: YAML frontmatter with source URL, date, and processing info
- 🔄 Async Support: Full async/await support with tokio
- 🛡️ Robust Error Handling: Comprehensive error types with recovery strategies
Quick Start
Simple Usage
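use markdowndown::convert_url;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Fetch the page and convert it with the default configuration.
    let markdown = convert_url("https://example.com/article").await?;
    println!("{markdown}");
    Ok(())
}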
With Configuration
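use markdowndown::{Config, MarkdownDown};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder values; see Configuration Options for the full builder.
    let config = Config::builder().timeout_seconds(60).max_retries(5).build();
    let md = MarkdownDown::with_config(config);
    let markdown = md.convert_url("https://example.com/article").await?;
    println!("{markdown}");
    Ok(())
}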
Supported URL Types
| URL Type | Example | Features |
|---|---|---|
| HTML Pages | https://example.com/article | Clean HTML to markdown conversion |
| Google Docs | https://docs.google.com/document/d/{id}/edit | Direct markdown export |
| Office 365 | https://company.sharepoint.com/.../document.docx | Document download and conversion |
| GitHub Issues | https://github.com/owner/repo/issues/123 | Issue + comments via API |
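To make the table concrete, here is a minimal sketch of how a URL could be sorted into these four categories using only the standard library. The `UrlKind` enum and the `classify` and `is_issue_path` functions are hypothetical names for illustration; this is not the detection logic markdowndown uses, and `detect_url_type` may apply different rules.

```rust
/// Hypothetical URL categories mirroring the table of supported URL types.
#[derive(Debug, PartialEq, Eq)]
enum UrlKind {
    GoogleDocs,
    Office365,
    GitHubIssue,
    Html,
}

/// Classify a URL by host and path. Illustrative heuristic only.
fn classify(url: &str) -> UrlKind {
    // Drop the scheme, then split the host from the path.
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))
        .unwrap_or(url);
    let (host, path) = match rest.split_once('/') {
        Some((host, path)) => (host, path),
        None => (rest, ""),
    };
    // Ignore any query string or fragment.
    let path = path.split(|c| c == '?' || c == '#').next().unwrap_or("");
    let host = host.to_ascii_lowercase();

    if host == "docs.google.com" && path.starts_with("document/d/") {
        UrlKind::GoogleDocs
    } else if host.ends_with(".sharepoint.com") {
        UrlKind::Office365
    } else if host == "github.com" && is_issue_path(path) {
        UrlKind::GitHubIssue
    } else {
        // Anything unrecognized falls back to generic HTML conversion.
        UrlKind::Html
    }
}

/// True for paths shaped like `owner/repo/issues/<number>`.
fn is_issue_path(path: &str) -> bool {
    let parts: Vec<&str> = path.split('/').filter(|p| !p.is_empty()).collect();
    parts.len() >= 4
        && parts[2] == "issues"
        && parts[3].chars().all(|c| c.is_ascii_digit())
}
```

Falling back to the HTML category for anything unrecognized matches the "any web page" promise in the feature list, so an unknown host is never an error at this stage.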
API Overview
Main Functions
- convert_url(url) - Convert any URL to markdown with default configuration
- convert_url_with_config(url, config) - Convert with custom configuration
- detect_url_type(url) - Determine URL type without conversion
Core Types
- MarkdownDown - Main library struct with configuration
- Config - Configuration builder for customizing behavior
- Markdown - Validated markdown content wrapper
- MarkdownError - Comprehensive error handling
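The idea of a "validated content wrapper" can be illustrated with the newtype pattern: a struct whose only constructor checks the content. The sketch below is an assumption about the general approach, not the crate's `Markdown` type. `ValidatedMarkdown`, `ValidationError`, and the "must not be blank" rule are all hypothetical.

```rust
use std::fmt;

/// Hypothetical stand-in for a validated markdown wrapper.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ValidatedMarkdown(String);

/// Hypothetical validation error.
#[derive(Debug, PartialEq, Eq)]
enum ValidationError {
    Empty,
}

impl ValidatedMarkdown {
    /// Accept content only if it has at least one non-whitespace character.
    fn new(content: impl Into<String>) -> Result<Self, ValidationError> {
        let content = content.into();
        if content.trim().is_empty() {
            Err(ValidationError::Empty)
        } else {
            Ok(Self(content))
        }
    }

    /// Borrow the validated text.
    fn as_str(&self) -> &str {
        &self.0
    }
}

impl fmt::Display for ValidatedMarkdown {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

Because the inner `String` is private and `new` is the only way in, any function that receives a `ValidatedMarkdown` can rely on the check having already run.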
Configuration Options
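let config = Config::builder()
    // Authentication (placeholder values)
    .github_token("ghp_your_token_here")
    .office365_token("office_token")
    .google_api_key("google_key")
    // HTTP Settings
    .timeout_seconds(60)
    .user_agent("MyApp/1.0")
    .max_retries(5)
    // Output Options
    .include_frontmatter(true)
    .custom_frontmatter_field("project", "docs")
    .max_consecutive_blank_lines(2)
    .build();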
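The output options are easier to picture with a small example of the post-processing they imply. The sketch below is an illustration, not the crate's implementation. `render_frontmatter` and `collapse_blank_lines` are hypothetical helpers, and the frontmatter key names used with them are assumptions.

```rust
/// Render a minimal YAML frontmatter block from key/value pairs.
/// Values are double-quoted, with backslashes and quotes escaped.
fn render_frontmatter(fields: &[(&str, &str)]) -> String {
    let mut out = String::from("---\n");
    for (key, value) in fields {
        let escaped = value.replace('\\', "\\\\").replace('"', "\\\"");
        out.push_str(&format!("{key}: \"{escaped}\"\n"));
    }
    out.push_str("---\n");
    out
}

/// Collapse runs of blank lines so that at most `max_blank` remain in a row.
/// Lines are rejoined with '\n', so a trailing newline is not preserved.
fn collapse_blank_lines(input: &str, max_blank: usize) -> String {
    let mut kept = Vec::new();
    let mut blank_run = 0;
    for line in input.lines() {
        if line.trim().is_empty() {
            blank_run += 1;
            if blank_run > max_blank {
                continue; // Skip blank lines beyond the allowed run length.
            }
        } else {
            blank_run = 0;
        }
        kept.push(line);
    }
    kept.join("\n")
}
```

For example, `collapse_blank_lines("a\n\n\n\nb", 1)` keeps a single blank line between `a` and `b`, which is the kind of cleanup a `max_consecutive_blank_lines` setting suggests.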
Error Handling
The library provides comprehensive error handling with specific error types:
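use markdowndown::convert_url;

match convert_url("https://example.com/article").await {
    Ok(markdown) => println!("{markdown}"),
    Err(err) => eprintln!("Conversion failed: {err}"),
}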
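One common recovery strategy is to retry transient failures and give up immediately on permanent ones. The sketch below shows that pattern in isolation. `ConvertError` and its variants are hypothetical and may not match the crate's `MarkdownError`. `with_retries` is likewise an illustrative helper, and it omits the delay between attempts that a real client would normally add.

```rust
use std::fmt;

/// Hypothetical error categories for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum ConvertError {
    Network(String),
    Auth(String),
    InvalidUrl(String),
}

impl fmt::Display for ConvertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConvertError::Network(msg) => write!(f, "network error: {msg}"),
            ConvertError::Auth(msg) => write!(f, "authentication error: {msg}"),
            ConvertError::InvalidUrl(msg) => write!(f, "invalid URL: {msg}"),
        }
    }
}

impl ConvertError {
    /// Only transient failures are worth retrying.
    fn is_retryable(&self) -> bool {
        matches!(self, ConvertError::Network(_))
    }
}

/// Call `op` until it succeeds, fails permanently, or `max_retries`
/// retries have been used. `op` receives the zero-based attempt number.
fn with_retries<T>(
    max_retries: u32,
    mut op: impl FnMut(u32) -> Result<T, ConvertError>,
) -> Result<T, ConvertError> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if err.is_retryable() && attempt < max_retries => attempt += 1,
            Err(err) => return Err(err),
        }
    }
}
```

Keeping the retry decision on the error type (`is_retryable`) means callers do not need to know which variants are transient.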
Performance
Typical conversion times on modern hardware:
| URL Type | Small Document | Medium Document | Large Document |
|---|---|---|---|
| HTML Page | < 1s | 1-3s | 3-10s |
| Google Docs | < 2s | 2-5s | 5-15s |
| GitHub Issue | < 1s | 1-2s | 2-5s |
| Office 365 | 2-5s | 5-15s | 15-60s |
Note: Performance metrics are hardware and network dependent. Actual conversion times may vary based on your system specifications, network connectivity, and document complexity.
Memory usage scales linearly with document size. Network latency is typically the limiting factor.
Examples
The repository includes comprehensive examples in the examples/ directory:
- basic_usage.rs - Simple URL conversion
- with_configuration.rs - Custom configuration usage
- batch_processing.rs - Converting multiple URLs
- async_usage.rs - Async/await patterns
- error_handling.rs - Comprehensive error handling
Run examples with:
cargo run --example basic_usage
Environment Configuration
The library can be configured via environment variables:
Then use:
let config = Config::from_env();
let md = MarkdownDown::with_config(config);
Documentation
- Getting Started Guide - Installation and first steps
- Configuration Reference - All configuration options
- URL Types Guide - Supported URL types and specifics
- Error Handling Guide - Error types and recovery
- Performance Guide - Optimization tips and benchmarks
- Troubleshooting - Common issues and solutions
- API Reference - Complete API documentation
Development
Prerequisites
- Rust 1.70+ (2021 edition)
- Cargo
Building
# Check the project
cargo check
# Build the project
cargo build
# Run tests
cargo test
# Run integration tests
cargo test --test '*'
# Generate documentation
cargo doc
# Run benchmarks
cargo bench
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines on:
- Development environment setup
- Code style and standards
- Testing requirements
- Submitting pull requests
License
MIT License - see LICENSE file for details.
Changelog
See CHANGELOG.md for version history and breaking changes.
Support
- Issues: GitHub Issues
- Documentation: docs.rs/markdowndown
- Examples: See the examples/ directory