twars-url2md
A command-line tool that converts web pages to Markdown. It fetches HTML content, processes it to remove unnecessary styling, and converts it to clean Markdown format.
Features
- Fetch HTML content from URLs with proper user agent identification
- Remove styles and unnecessary formatting
- Convert HTML to clean Markdown
- Process multiple URLs in parallel
- Smart output path handling based on URL structure
- Support for input from:
  - Command line arguments
  - Input file (one URL per line)
  - Standard input (space or newline separated)
Installation
From crates.io
From Binary Releases
Pre-built binaries are available for Linux, macOS, and Windows on the Releases page.
From source
Usage
# Process a single URL and print to stdout
# Process a single URL and save to file
# Process multiple URLs and save to current directory (creates domain-based folders)
# Process multiple URLs and save to specific directory
# Process URLs from a file
# Read URLs from stdin
# Show verbose output
Output Path Structure
For URLs like scheme://username:password@host:port/path?query#fragment:
- Username, password, query parameters, port, and fragments are ignored
- Files are organized in folders based on the host and path
- For URLs ending in / or with no path, index.md is used as the filename
- For other URLs, the last path component is used as the filename (with .md extension)
Example:
$ twars-url2md https://example.com/ https://example.org/foo/bar
Created: example.com/index.md
Created: example.org/foo/bar.md
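
The rules above can be sketched in Rust. This is a minimal illustration, not the tool's actual implementation: the function name output_path is hypothetical, it uses only the standard library with naive string splitting (no IPv6 hosts, no percent-decoding), and placing index.md inside the path's folder for URLs such as https://example.com/docs/ is an assumption.

```rust
use std::path::PathBuf;

/// Hypothetical helper (not part of twars-url2md): maps a URL to a
/// relative output path following the rules described in this section.
fn output_path(url: &str) -> Option<PathBuf> {
    // Drop the scheme, e.g. "https://".
    let (_, rest) = url.split_once("://")?;
    // Drop the fragment first, then the query string.
    let rest = rest.split('#').next()?;
    let rest = rest.split('?').next()?;
    // Separate the authority (user:pass@host:port) from the path.
    let (authority, path) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, ""),
    };
    // Drop "username:password@" if present.
    let host_port = authority.rsplit('@').next()?;
    // Drop ":port" (assumption: the host is not an IPv6 literal).
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        return None;
    }

    let mut out = PathBuf::from(host);
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    if path.is_empty() || path.ends_with('/') {
        // No path or trailing slash: use index.md inside the folder.
        for seg in &segments {
            out.push(seg);
        }
        out.push("index.md");
    } else {
        // Otherwise the last path component becomes the file name.
        let (last, dirs) = segments.split_last()?;
        for seg in dirs {
            out.push(seg);
        }
        out.push(format!("{last}.md"));
    }
    Some(out)
}
```

Under these assumptions, output_path("https://user:pw@example.org:8080/foo/bar?x=1#top") yields example.org/foo/bar.md, which matches the example above.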
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Changelog
See CHANGELOG.md for a list of changes.
License
MIT License - see LICENSE for details.
Author
Adam Twardoch (@twardoch)