scrape-cli-0.2.3 is not a library.
scrape-cli
10-50x faster HTML extraction from the command line. Rust-powered, shell-friendly.
Installation
Download the build for your platform from GitHub Releases:
- macOS (Apple Silicon)
- macOS (Intel)
- Linux (x86_64)
- Linux (ARM64)
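The release builds are keyed by operating system and CPU architecture. The following is a hypothetical helper, not part of scrape-cli, that maps `uname` output to one of the four platform labels listed above. The labels are copied from that list. It makes no assumption about the actual archive file names on the Releases page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which release build matches a given OS/arch.
# Defaults to the current machine via uname; labels mirror the platform list.
detect_platform() {
  local os="${1:-$(uname -s)}"
  local arch="${2:-$(uname -m)}"
  case "$os-$arch" in
    Darwin-arm64)                echo "macOS (Apple Silicon)" ;;
    Darwin-x86_64)               echo "macOS (Intel)" ;;
    Linux-x86_64)                echo "Linux (x86_64)" ;;
    Linux-aarch64 | Linux-arm64) echo "Linux (ARM64)" ;;
    *) echo "no prebuilt build listed for $os-$arch"; return 1 ;;
  esac
}

echo "Download the release for: $(detect_platform)"
```

Apple Silicon Macs report `arm64` from `uname -m`, while 64-bit ARM Linux usually reports `aarch64`, so both spellings are accepted for Linux.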
> [!IMPORTANT]
> Requires Rust 1.88 or later when building from source.
Quick start
- Extract `h1` text from a file
- Extract from stdin
- Extract links as JSON
Usage
- Extract text content. Output: `Welcome to Our Site`
- Extract an attribute value instead of text with `-a`/`--attribute`. Output, one value per line: `/home`, `/about`, `/contact`
- Return only the first match with `-1`/`--first`. Output: `First paragraph text`
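Choose an output format with `-o`/`--output`:
- `text` (default). Output: `Hello World`
- `json`. Output: `["Link 1","Link 2"]`
- Pretty JSON: add `-p`/`--pretty`
- `html`: HTML fragments
- `csv`: requires named selectors (`-s NAME=SEL`). Output: a `name,price` header row followed by data rows such as `"Product A","$10.00"`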
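Extract multiple fields at once with named selectors (`-s NAME=SEL`). Each name becomes a key whose value is the list of its matches. Output: `{"title":["Page Title"],"links":[...],"images":[...]}`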
Process multiple files (parallel by default). Each output line is prefixed with its filename: `page1.html: Welcome`, `page2.html: About Us`, `page3.html: Contact`
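With the filename prefix on, each line has the form `FILE: VALUE`, as in the example output. Below is a minimal sketch for splitting such lines in a script. It assumes the separator is exactly `": "`, as shown, and splits at the first occurrence, so values that contain a colon stay intact. The sample lines are copied from the example rather than produced by scrape, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "FILE: VALUE" at the first ": " and print
# the two parts separated by a tab.
split_prefixed_line() {
  local line="$1"
  local file="${line%%: *}"   # drop the longest suffix starting at ": "
  local value="${line#*: }"   # drop the shortest prefix ending at ": "
  printf '%s\t%s\n' "$file" "$value"
}

# Sample lines copied from the example output above.
while IFS= read -r line; do
  split_prefixed_line "$line"
done <<'EOF'
page1.html: Welcome
page2.html: About Us
page3.html: Contact
EOF
```

If you only need the values, `--no-filename` removes the prefix and makes this step unnecessary.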
Control parallelism with `-j`/`--parallel N`.
> [!TIP]
> Batch processing uses all CPU cores by default. Use `-j N` to limit the number of threads.
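Batch mode defaults to every core, so scripts on shared machines may want a smaller `-j` value. The sketch below shows one possible policy: half the online CPUs, never less than 1. The halving rule is an illustrative choice, not a scrape-cli default, and the function name is hypothetical. It uses `getconf _NPROCESSORS_ONLN` because that works on both Linux and macOS.

```bash
#!/usr/bin/env bash
# Illustrative policy (not a scrape-cli default): half the online CPUs,
# minimum 1. An explicit core count can be passed as $1, e.g. for testing.
pick_jobs() {
  local cores="${1:-$(getconf _NPROCESSORS_ONLN)}"
  local jobs=$(( cores / 2 ))
  if [ "$jobs" -lt 1 ]; then
    jobs=1
  fi
  echo "$jobs"
}

echo "Suggested value for -j: $(pick_jobs)"
```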
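Other flags useful in batch pipelines:
- `-0`/`--null`: NUL-delimited output for `xargs -0`
- `-q`/`--quiet`: suppress error messages
- `--no-filename`: disable the filename prefix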
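A NUL byte cannot occur inside a Unix file path or command-line argument. That makes NUL-delimited output safe for values containing spaces or newlines when piped to `xargs -0`. In this sketch, `printf` stands in for scrape's `-0` output, and the three values are made up. `xargs -0` then receives each value as exactly one argument. A pure-bash `read -d ''` loop is shown as an alternative.

```bash
#!/usr/bin/env bash
# Stand-in for NUL-delimited output such as -0 produces (values are made up).
emit_nul_delimited() {
  printf '%s\0' "/home" "/about us" "/contact"
}

# xargs -0 splits only on NUL, so "/about us" stays one argument.
# -n1 runs printf once per item; brackets make each item's boundaries visible.
emit_nul_delimited | xargs -0 -n1 printf '[%s]\n'

# Pure-bash alternative: read NUL-terminated records one at a time.
while IFS= read -r -d '' item; do
  echo "item: $item"
done < <(emit_nul_delimited)
```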
Options
| Option | Short | Description |
|---|---|---|
| `--output FORMAT` | `-o` | Output format: `text`, `json`, `html`, `csv` |
| `--select NAME=SEL` | `-s` | Named selector extraction |
| `--attribute ATTR` | `-a` | Extract attribute instead of text |
| `--first` | `-1` | Return only first match |
| `--pretty` | `-p` | Pretty-print JSON output |
| `--null` | `-0` | Use NUL delimiter (for `xargs`) |
| `--color MODE` | `-c` | Colorize: `auto`, `always`, `never` |
| `--parallel N` | `-j` | Parallel threads for batch processing |
| `--quiet` | `-q` | Suppress error messages |
| `--with-filename` | `-H` | Always show filename prefix |
| `--no-filename` | | Never show filename prefix |
Performance
Performance improvements:
- SIMD-accelerated parsing — 2-10x faster class selector matching on large documents
- Batch parallelization — Scales near-linearly with thread count when processing multiple files
- Zero-copy serialization — 50-70% memory reduction in output generation
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success, matches found |
| 1 | No matches found |
| 2 | Runtime error (invalid selector, I/O error) |
| 4 | Argument validation error |
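Exit code 1 means "no matches" rather than failure, the same convention `grep` uses, so scripts running under `set -e` may want to accept it explicitly. Below is a sketch built only on the codes in the table. The function names are hypothetical, and the helpers take the status as an argument instead of invoking scrape themselves.

```bash
#!/usr/bin/env bash
# Hypothetical helpers built on the exit-code table above.
describe_scrape_status() {
  case "$1" in
    0) echo "success, matches found" ;;
    1) echo "no matches found" ;;
    2) echo "runtime error (invalid selector, I/O error)" ;;
    4) echo "argument validation error" ;;
    *) echo "unknown exit code $1" ;;
  esac
}

# Succeeds for 0 and 1 (an empty result is not an error); otherwise reports
# the problem on stderr and returns the original status.
check_scrape_status() {
  local status="$1"
  case "$status" in
    0 | 1) return 0 ;;
  esac
  echo "scrape failed: $(describe_scrape_status "$status")" >&2
  return "$status"
}

describe_scrape_status 2
```

Under `set -e`, capture the status without aborting the script, for example `status=0; <your scrape command> || status=$?`. Then call `check_scrape_status "$status"`.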
Built on Servo and Cloudflare
Parsing & Selection (Servo browser engine):
Streaming Parser (Cloudflare):
- lol_html — High-performance streaming HTML parser with constant-memory event-driven API
Related packages
| Platform | Package |
|---|---|
| Rust | scrape-core |
| Python | fast-scrape |
| Node.js | @fast-scrape/node |
| WASM | @fast-scrape/wasm |
License
MIT OR Apache-2.0