twars-url2md
twars-url2md is a fast and robust command-line tool written in Rust that fetches web pages, cleans up their HTML content, and converts them into clean Markdown.
Give it any text that contains URLs, and it finds every URL and saves a Markdown version of each page in a logical folder structure. The converted output is not always perfect.
1. Table of Contents
2. Features
2.1. Powerful Web Content Conversion
- Extracts clean web content using Monolith
- Converts web pages to Markdown efficiently
- Handles complex URL and encoding scenarios
2.2. Smart URL Handling
- Extracts URLs from various text formats
- Resolves and validates URLs intelligently
- Supports base URL and relative link processing
- NEW: Processes local HTML files in addition to remote URLs
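As a rough illustration of what "extracts URLs from various text formats" can mean, here is a minimal sketch using only the Rust standard library. The function name `extract_urls` and the delimiter set are assumptions made for this example; they are not taken from the twars-url2md source, which may use a different and more thorough approach.

```rust
/// Hypothetical helper (not the tool's actual code): collect unique
/// http(s) URLs from free-form text such as plain text, HTML, or Markdown.
fn extract_urls(text: &str) -> Vec<String> {
    // Characters that commonly surround a URL in prose, HTML, and Markdown.
    // Splitting on parentheses means URLs that contain them get truncated.
    let is_delimiter = |c: char| {
        c.is_whitespace() || matches!(c, '<' | '>' | '"' | '\'' | '(' | ')' | '[' | ']')
    };

    let mut urls: Vec<String> = Vec::new();
    for token in text.split(is_delimiter) {
        // Locate the scheme inside the token (handles prefixes like `href=`).
        let start = match token.find("https://").or_else(|| token.find("http://")) {
            Some(index) => index,
            None => continue,
        };

        // Drop sentence punctuation that trails the URL.
        let candidate = token[start..]
            .trim_end_matches(|c: char| matches!(c, '.' | ',' | ';' | ':' | '!' | '?'));

        // Require at least one character after the scheme.
        let rest = candidate
            .strip_prefix("https://")
            .or_else(|| candidate.strip_prefix("http://"))
            .unwrap_or("");

        // Keep first occurrences only, preserving input order.
        if !rest.is_empty() && !urls.iter().any(|u| u.as_str() == candidate) {
            urls.push(candidate.to_string());
        }
    }
    urls
}
```

Calling `extract_urls` on a paragraph that mixes prose, an HTML anchor, and a Markdown link returns each distinct URL once, in the order it first appears.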
2.3. Flexible Input & Output
- Multiple input methods (file, stdin, CLI)
- Organized Markdown file generation
- Cross-platform compatibility
- NEW: Option to pack all Markdown outputs into a single combined file
2.4. Advanced Processing
- Parallel URL processing
- Robust error handling
- Exponential backoff retry mechanism for network requests
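The retry behaviour mentioned above can be pictured with the following sketch of exponential backoff. The names `backoff_delay` and `retry_with_backoff`, the doubling factor, and the delay cap are assumptions chosen for illustration; the tool's real retry parameters are not documented here.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: delay before the retry that follows failed attempt
/// number `attempt` (0-based). The delay doubles each time, capped at `max`.
fn backoff_delay(base: Duration, attempt: u32, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Hypothetical helper: run `op` until it succeeds or `max_attempts` calls
/// have failed, sleeping with exponential backoff between calls.
/// `op` is always called at least once, even if `max_attempts` is 0.
fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(error) if attempt + 1 >= max_attempts => return Err(error),
            Err(_) => {
                sleep(backoff_delay(base, attempt, max));
                attempt += 1;
            }
        }
    }
}
```

In a real client, `op` would perform the HTTP request; production implementations often add random jitter to the delay so that parallel workers do not retry in lockstep.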
3. Installation
3.1. Download Pre-compiled Binaries
The easiest way to get started is to download the pre-compiled binary for your platform.
- Visit the releases page
- Download the appropriate file for your system:
  - macOS: twars-url2md-macos-universal.tar.gz (works on both Intel and Apple Silicon)
  - Windows: twars-url2md-windows-x86_64.exe.zip
  - Linux: twars-url2md-linux-x86_64.tar.gz
- Extract the archive:
  - macOS/Linux: tar -xzf twars-url2md-*.tar.gz
  - Windows: Extract the zip file using Explorer or any archive utility
- Make the binary executable (macOS/Linux only): chmod +x twars-url2md
- Move the binary to a location in your PATH:
  - macOS/Linux: sudo mv twars-url2md /usr/local/bin/ or mv twars-url2md ~/.local/bin/
  - Windows: Move to a folder in your PATH or add the folder to your PATH
3.2. Install from Crates.io
If you have Rust installed (version 1.70.0 or later), you can install directly from crates.io:
cargo install twars-url2md
3.3. Build from Source
For the latest version or to customize the build:
# Clone the repository
# Build and install
4. Usage
4.1. Command Line Options
Usage: twars-url2md [OPTIONS]
Options:
-i, --input <FILE> Input file containing URLs or local file paths (one per line)
-o, --output <DIR> Output directory for markdown files
--stdin Read URLs from standard input
--base-url <URL> Base URL for resolving relative links
-p, --pack <FILE> Output file to pack all markdown files together
-v, --verbose Enable verbose output
-h, --help Print help
-V, --version Print version
4.2. Input Options
The tool accepts URLs and local file paths from:
- A file specified with --input
- Standard input with --stdin
- Note: Either --input or --stdin must be specified
4.3. Output Options
- --output <DIR>: Create individual Markdown files in this directory
- --pack <FILE>: Combine all Markdown files into a single output file
- You can use both options together
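To make the idea of packing concrete, the sketch below joins several converted documents into one string. The function `pack_markdown` and the layout it produces (a heading with the source URL before each document) are purely illustrative assumptions; the actual layout of the file written by --pack may differ.

```rust
/// Hypothetical helper: combine `(source, markdown)` pairs into a single
/// Markdown string. The heading-per-source layout is an assumption made
/// for this sketch, not the documented --pack format.
fn pack_markdown(documents: &[(&str, &str)]) -> String {
    let mut packed = String::new();
    for (source, markdown) in documents {
        // Separate consecutive documents with a blank line.
        if !packed.is_empty() {
            packed.push_str("\n\n");
        }
        packed.push_str("# ");
        packed.push_str(source);
        packed.push_str("\n\n");
        packed.push_str(markdown.trim());
    }
    // End a non-empty result with exactly one newline.
    if !packed.is_empty() {
        packed.push('\n');
    }
    packed
}
```

Keeping the source of each document next to its content is one way to make a combined file navigable; any similar separator would serve the same purpose.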
4.4. Processing Local Files
You can now include local HTML files in your input:
- Absolute paths: /path/to/file.html
- File URLs: file:///path/to/file.html
- Mix of local files and remote URLs in the same input
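The three input forms listed above could be told apart roughly as follows. This is a simplified sketch: `InputSource` and `classify_input` are hypothetical names, only Unix-style absolute paths are recognised, and file URLs are not percent-decoded.

```rust
use std::path::PathBuf;

/// Hypothetical classification of one input line.
#[derive(Debug, PartialEq)]
enum InputSource {
    /// An http:// or https:// URL to fetch over the network.
    Remote(String),
    /// A local HTML file, given as an absolute path or a file:// URL.
    LocalFile(PathBuf),
}

/// Hypothetical helper: decide how a line from the input should be handled.
/// Returns `None` for blank lines and for anything it does not recognise.
fn classify_input(line: &str) -> Option<InputSource> {
    let entry = line.trim();
    if entry.is_empty() {
        return None;
    }
    if entry.starts_with("http://") || entry.starts_with("https://") {
        return Some(InputSource::Remote(entry.to_string()));
    }
    // file:///path/to/file.html becomes /path/to/file.html.
    // Simplification: no percent-decoding and no host component handling.
    if let Some(path) = entry.strip_prefix("file://") {
        return Some(InputSource::LocalFile(PathBuf::from(path)));
    }
    // Simplification: treat only Unix-style absolute paths as local files.
    if entry.starts_with('/') {
        return Some(InputSource::LocalFile(PathBuf::from(entry)));
    }
    None
}
```

Because each line is classified independently, a single input can freely mix remote URLs and local files.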
5. Examples
5.1. Basic Usage
# Process a single URL and print to stdout
# Process URLs from a file with specific output directory
# Process piped URLs with base URL for relative links
# Show verbose output
5.2. Using the Pack Option
# Process URLs and create a combined Markdown file
# Both individual files and a combined file
5.3. Processing Local Files
# Create a test HTML file
# Process a local HTML file
# Mix local and remote content
5.4. Batch Processing
# Extract and process links from a webpage
# Process multiple files
6. Output Organization
The tool organizes output into a directory structure based on the URLs:
output/
├── example.com/
│ ├── index.md # from https://example.com/
│ └── articles/
│ └── page.md # from https://example.com/articles/page
└── another-site.com/
└── post/
└── article.md # from https://another-site.com/post/article
For local files, the directory structure mirrors the file path.
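The mapping shown in the tree above can be approximated with a few lines of Rust. The function `output_path_for` is a hypothetical reconstruction based only on the three examples in the tree; how the tool itself treats query strings, ports, or unusual characters is not specified here, so those choices are assumptions.

```rust
use std::path::PathBuf;

/// Hypothetical helper: derive the Markdown output path for a remote URL,
/// following the host/path layout illustrated in the directory tree.
fn output_path_for(url: &str, output_dir: &str) -> Option<PathBuf> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;

    // Assumption: the query string and fragment do not affect the file name.
    let rest = rest.split(|c: char| c == '?' || c == '#').next().unwrap_or("");

    // Skip empty segments and refuse `.`/`..` so the path stays inside output_dir.
    let mut segments = rest
        .split('/')
        .filter(|s| !s.is_empty() && *s != "." && *s != "..");

    let host = segments.next()?;
    let mut path = PathBuf::from(output_dir);
    path.push(host);

    let parts: Vec<&str> = segments.collect();
    match parts.split_last() {
        // A bare host or a trailing slash only: use index.md.
        None => path.push("index.md"),
        // Otherwise the last segment names the file, the others are folders.
        Some((last, dirs)) => {
            for dir in dirs {
                path.push(dir);
            }
            path.push(format!("{last}.md"));
        }
    }
    Some(path)
}
```

With `output` as the directory, the three URLs from the tree map to `output/example.com/index.md`, `output/example.com/articles/page.md`, and `output/another-site.com/post/article.md`.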
7. Development
7.1. Running Tests
# Run all tests
# Run with specific features
# Run specific test
7.2. Code Quality Tools
- Formatting: cargo fmt
- Linting: cargo clippy --all-targets --all-features
7.3. Publishing
To publish a new release of twars-url2md:
7.3.1. Prepare for Release
# Update version in Cargo.toml (e.g. from 1.3.6 to 1.3.7)
# Ensure everything works
7.3.2. Build Locally
# Build in release mode
# Test the binary
7.3.3. Publish to Crates.io
# Login to crates.io (if not already logged in)
# Verify the package
# Publish
7.3.4. Create GitHub Release
# Create and push a tag matching your version
The configured GitHub Actions workflow (.github/workflows/ci.yml) will automatically:
- Run tests on the tag
- Create a GitHub Release
- Build binaries for macOS, Windows, and Linux
- Upload the binaries to the release
- Publish to crates.io
7.3.5. Manual Release (Alternative)
If GitHub Actions fails, you can create the release manually:
- Go to GitHub repository → Releases → Create a new release
- Select your tag
- Build platform-specific binaries:
# macOS universal binary
# Linux
# Windows
- Upload these files to your GitHub release
7.3.6. Verify the Release
- Check that the release appears on GitHub
- Verify that binary files are attached to the release
- Confirm the new version appears on crates.io
- Try installing the new version: cargo install twars-url2md
8. License
MIT License - see LICENSE for details.
9. Author
Adam Twardoch (@twardoch)
For bug reports, feature requests, or general questions, please open an issue on the GitHub repository.