scrape-core
High-performance HTML parsing library core. Pure Rust implementation with no FFI dependencies.
Installation
[dependencies]
scrape-core = "0.1"
Or with cargo:
cargo add scrape-core
[!IMPORTANT] Requires Rust 1.88 or later.
Usage
use scrape_core::Soup;
let html = r#"
<html>
<body>
<div class="content">Hello, World!</div>
<div class="content">Another div</div>
</body>
</html>
"#;
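let soup = Soup::new(html);

// Find first element by tag
if let Some(div) = soup.find("div") {
    // work with the first matching element
}

// CSS selectors
for el in soup.select("div.content") {
    // work with each matching element
}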
Features
Enable optional features in Cargo.toml:
[dependencies]
scrape-core = { version = "0.1", features = ["simd", "parallel"] }
| Feature | Description | Default |
|---|---|---|
| simd | SIMD-accelerated byte scanning (SSE4.2, AVX2, NEON, WASM SIMD128) | No |
| parallel | Parallel batch processing via Rayon | No |
[!TIP] Start with default features for fastest compile times. Add simd for production workloads.
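To illustrate what an opt-in byte-scanning feature can look like, here is a minimal sketch that selects between a scalar scan and a wider scan at compile time. It is not the crate's implementation: the function names are hypothetical, and the wide path uses portable SWAR (eight bytes per step in a plain u64) as a stand-in for the SSE4.2/AVX2/NEON/SIMD128 intrinsics listed in the table.

```rust
/// Scalar fallback: inspect one byte at a time.
pub fn find_byte_scalar(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// SWAR scan: compare eight bytes per step using ordinary u64 arithmetic.
/// Stand-in for real SIMD intrinsics; same idea, narrower lanes.
pub fn find_byte_swar(haystack: &[u8], needle: u8) -> Option<usize> {
    const LO: u64 = 0x0101_0101_0101_0101;
    const HI: u64 = 0x8080_8080_8080_8080;
    let pattern = LO * needle as u64; // needle repeated in every byte lane

    let mut chunks = haystack.chunks_exact(8);
    let mut offset = 0;
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        let word = u64::from_le_bytes(buf);
        let x = word ^ pattern; // lanes equal to the needle become 0x00
        // Classic zero-byte test: the lowest flagged lane is always exact.
        let hits = x.wrapping_sub(LO) & !x & HI;
        if hits != 0 {
            return Some(offset + (hits.trailing_zeros() / 8) as usize);
        }
        offset += 8;
    }
    // Fewer than eight bytes left: finish with the scalar scan.
    chunks
        .remainder()
        .iter()
        .position(|&b| b == needle)
        .map(|i| offset + i)
}

/// Dispatch on a Cargo feature (assumed here to be named "simd").
pub fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    if cfg!(feature = "simd") {
        find_byte_swar(haystack, needle)
    } else {
        find_byte_scalar(haystack, needle)
    }
}
```

Both paths return the same index for any input, so enabling the feature changes only throughput, not results.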
Performance
Optimized for high throughput:
- Arena-based DOM allocation (cache-friendly, zero per-node heap allocations)
- SIMD-accelerated byte scanning when the simd feature is enabled
- Parallel batch processing via Rayon when the parallel feature is enabled
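The arena idea in the first bullet can be sketched as follows. This is an illustrative layout, not the crate's actual types: `Arena`, `Node`, and `NodeId` are hypothetical names. All nodes live in one `Vec`, links are plain indices, and tag names borrow from the input, so adding a node performs no separate heap allocation beyond the vector's amortized growth.

```rust
/// Index of a node inside the arena (hypothetical name).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

/// One DOM node; links are indices rather than pointers.
#[derive(Debug)]
pub struct Node<'a> {
    pub tag: &'a str, // borrowed from the source text, not copied
    pub parent: Option<NodeId>,
    pub first_child: Option<NodeId>,
    pub last_child: Option<NodeId>,
    pub next_sibling: Option<NodeId>,
}

/// All nodes stored contiguously, which keeps traversal cache-friendly.
#[derive(Debug, Default)]
pub struct Arena<'a> {
    nodes: Vec<Node<'a>>,
}

impl<'a> Arena<'a> {
    pub fn new() -> Self {
        Self { nodes: Vec::new() }
    }

    /// Append a node and link it as the last child of `parent`, if any.
    pub fn add(&mut self, tag: &'a str, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node {
            tag,
            parent,
            first_child: None,
            last_child: None,
            next_sibling: None,
        });
        if let Some(p) = parent {
            let previous_last = self.nodes[p.0].last_child;
            match previous_last {
                Some(prev) => self.nodes[prev.0].next_sibling = Some(id),
                None => self.nodes[p.0].first_child = Some(id),
            }
            self.nodes[p.0].last_child = Some(id);
        }
        id
    }

    /// Collect the children of `parent` in document order.
    pub fn children(&self, parent: NodeId) -> Vec<NodeId> {
        let mut out = Vec::new();
        let mut cur = self.nodes[parent.0].first_child;
        while let Some(id) = cur {
            out.push(id);
            cur = self.nodes[id.0].next_sibling;
        }
        out
    }

    pub fn tag(&self, id: NodeId) -> &'a str {
        self.nodes[id.0].tag
    }
}
```

Because a `NodeId` is a small `Copy` value, callers can hold many handles into the tree without fighting the borrow checker, and the whole DOM is freed in one step when the arena is dropped.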
Benchmarks show 10x faster parsing and up to 132x faster queries compared to BeautifulSoup. See full benchmark results in the main project README.
Architecture
scrape-core/
├── dom/ # Arena-based DOM representation
├── parser/ # html5ever integration
├── query/ # CSS selector engine
├── simd/ # Platform-specific SIMD acceleration
└── parallel/ # Rayon-based parallelization
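As a rough picture of what a batch-processing layer like parallel/ does, the sketch below splits a batch of documents across worker threads and returns results in input order. It is an assumption-laden illustration: it uses `std::thread::scope` from the standard library instead of Rayon so that it runs without extra dependencies, and `process_batch` and `count_open_brackets` are hypothetical names, with bracket counting standing in for real parsing work.

```rust
use std::thread;

/// Placeholder per-document work: count '<' bytes.
fn count_open_brackets(doc: &str) -> usize {
    doc.bytes().filter(|&b| b == b'<').count()
}

/// Process `docs` on up to `workers` threads, preserving input order.
pub fn process_batch(docs: &[&str], workers: usize) -> Vec<usize> {
    if docs.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so every document lands in some chunk.
    let chunk_len = (docs.len() + workers - 1) / workers;

    thread::scope(|s| {
        // Spawn one scoped thread per chunk; collecting the handles first
        // lets all chunks run concurrently before any join.
        let handles: Vec<_> = docs
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|d| count_open_brackets(d))
                        .collect::<Vec<usize>>()
                })
            })
            .collect();

        // Joining in spawn order keeps results aligned with the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

A work-stealing pool such as Rayon's balances uneven documents better than these fixed chunks; the fixed split is used here only to keep the example short.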
MSRV policy
Minimum Supported Rust Version: 1.88. MSRV increases are minor version bumps.
Related packages
This crate is part of fast-scrape:
| Platform | Package |
|---|---|
| Python | fast-scrape |
| Node.js | @fast-scrape/node |
| WASM | @fast-scrape/wasm |
License
Licensed under either of Apache License, Version 2.0 or MIT License at your option.