crawn
A utility for web crawling and scraping. crawn 0.1.1 ships as a binary-only crate: a command-line tool, not a library.
Features
- Blazing fast – Built with Rust & tokio for async I/O
- Smart filtering – URL-based keyword matching (no content fetching required)
- NDJSON output – One JSON object per line for easy streaming
- BFS crawling – Breadth-first traversal with configurable depth limits
- Rate limiting – Configurable request rate (default: 2-5 req/sec)
- Error recovery – Gracefully handles network errors and broken links
- Rich logging – Colored, timestamped logs with context chains
Installation
Install from crates.io, or build from source; both require cargo.
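Assuming the crate is published on crates.io under its own name, installation might look like:

```shell
# Install the published binary from crates.io
cargo install crawn

# Or, inside a checkout of the source tree:
cargo build --release   # the binary lands in target/release/crawn
```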
Usage
- Basic Crawling:
- With Logging:
- Verbose Mode (Log All Requests):
- Custom Depth Limit:
- Full HTML:
- Extracted text only:
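The exact command lines are not shown above; the sketch below illustrates how these modes might be invoked. The positional URL argument and the flag names `--verbose` and `--depth` are assumptions, while `--include-content` and `--include-text` are named in the output format section:

```shell
# Basic crawl, results streamed to a file (invocation shape assumed)
crawn https://example.com > results.ndjson

# Log every request (flag name assumed)
crawn --verbose https://example.com > results.ndjson

# Stop after two levels of links (flag name assumed)
crawn --depth 2 https://example.com > results.ndjson

# Include full HTML, or extracted text only, in each record
crawn --include-content https://example.com > results.ndjson
crawn --include-text https://example.com > results.ndjson
```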
Output Format
Results are written as NDJSON (newline-delimited JSON):
- With --include-text: the extracted page text is added to each record
- With --include-content: the full HTML is added to each record
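Because NDJSON is one JSON object per line, results are easy to stream and parse with standard tools. A minimal Python consumer (the `url` field name is an assumption for illustration; crawn's actual schema may differ):

```python
import json

def read_ndjson(lines):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]

# Two synthetic records, one JSON object per line
sample = '{"url": "https://example.com"}\n{"url": "https://example.com/docs"}\n'
records = read_ndjson(sample.splitlines())
print([r["url"] for r in records])
```

Streaming line by line means a consumer can start processing results before the crawl finishes, and a truncated file still yields every complete record.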
Logging
Log Levels:
- INFO (verbose mode only): Request logs
- WARN (always): Recoverable errors (404, network timeouts)
- FATAL (always): Unrecoverable errors (invalid URL, disk full)
Log Format:

```
2026-01-24 02:37:40.351 [INFO]: Sent request to URL: https://example.com
2026-01-24 02:37:41.123 [WARN]: Failed to fetch URL: https://example.com/broken-link
Caused by: HTTP 404 Not Found
```
Examples
- Crawl Documentation Site:
- Crawl with Logging:
- Limit to 2 Levels Deep:
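The commands behind these examples are not shown in the original text; under the assumption that crawn takes a positional URL and that the flag names `--verbose` and `--depth` exist, they might look like:

```shell
# Crawl a documentation site
crawn https://docs.example.com > docs.ndjson

# Crawl with verbose logging (flag name assumed)
crawn --verbose https://docs.example.com > docs.ndjson

# Limit the crawl to 2 levels deep (flag name assumed)
crawn --depth 2 https://docs.example.com > docs.ndjson
```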
Limitations
- Same-domain only (no external links, by design)
- No JavaScript rendering (static HTML only)
- No authentication (public pages only)
Notes
- crawn is licensed under the MIT license.
- For specifics about contributing to crawn, see CONTRIBUTING.md.
- For recent changes to crawn, see CHANGELOG.md.