scrape-core 0.1.1

High-performance HTML parsing library core

scrape-core


High-performance HTML parsing library core. Pure Rust implementation with no FFI dependencies.

Installation

[dependencies]
scrape-core = "0.1"

Or with cargo:

cargo add scrape-core

[!IMPORTANT] Requires Rust 1.88 or later.

Usage

use scrape_core::Soup;

let html = r#"
    <html>
        <body>
            <div class="content">Hello, World!</div>
            <div class="content">Another div</div>
        </body>
    </html>
"#;

let soup = Soup::new(html);

// Find first element by tag
if let Some(div) = soup.find("div") {
    println!("Text: {}", div.text());
}

// CSS selectors
for el in soup.select("div.content") {
    println!("{}", el.inner_html());
}

Features

Enable optional features in Cargo.toml:

[dependencies]
scrape-core = { version = "0.1", features = ["simd", "parallel"] }

| Feature | Description | Default |
|---------|-------------|---------|
| simd | SIMD-accelerated byte scanning (SSE4.2, AVX2, NEON, WASM SIMD128) | No |
| parallel | Parallel batch processing via Rayon | No |

[!TIP] Start with the default features for the fastest compile times, and enable the simd feature for production workloads.
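To illustrate what "parallel batch processing" means, here is a minimal sketch that uses std::thread::scope in place of Rayon so it has no dependencies. It is not scrape-core's implementation: count_tags and process_batch are hypothetical names, and counting '<' bytes stands in for real parsing.

```rust
use std::thread;

/// Hypothetical per-document work: count '<' bytes as a stand-in for parsing.
fn count_tags(html: &str) -> usize {
    html.bytes().filter(|&b| b == b'<').count()
}

/// Process a batch of independent documents across threads.
/// Results are returned in the same order as the input.
pub fn process_batch(docs: &[&str]) -> Vec<usize> {
    // One chunk per available core (falls back to a single worker).
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // `max(1)` avoids `chunks(0)`, which would panic on an empty batch.
    let chunk_size = docs.len().div_ceil(workers).max(1);

    thread::scope(|s| {
        // Spawn one scoped thread per chunk; each returns its own Vec of results.
        let handles: Vec<_> = docs
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(|d| count_tags(d)).collect::<Vec<_>>()))
            .collect();
        // Joining in spawn order preserves the input order.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```

With Rayon, the manual chunking collapses to `docs.par_iter().map(...).collect()`, and Rayon's work-stealing scheduler balances uneven documents across threads.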

Performance

Optimized for high throughput:

  • Arena-based DOM allocation (cache-friendly, zero per-node heap allocations)
  • SIMD-accelerated byte scanning when the simd feature is enabled
  • Parallel batch processing via Rayon when the parallel feature is enabled
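As a portable illustration of the idea behind accelerated byte scanning, the sketch below uses SWAR ("SIMD within a register"): it tests 8 bytes per step using plain u64 arithmetic instead of the SSE4.2/AVX2/NEON/SIMD128 instructions the simd feature targets. It is a stand-in technique, not scrape-core's code, and find_byte is a hypothetical name.

```rust
/// 0x01 in every byte lane.
const LO: u64 = 0x0101_0101_0101_0101;
/// 0x80 (high bit) in every byte lane.
const HI: u64 = 0x8080_8080_8080_8080;

/// Return the index of the first `needle` in `haystack`, checking
/// 8 bytes per iteration and finishing the tail byte by byte.
pub fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    // `needle` repeated in all 8 byte lanes.
    let pattern = LO * needle as u64;
    let mut i = 0;
    while i + 8 <= haystack.len() {
        // Little-endian load: the lowest byte of `chunk` is haystack[i].
        let chunk = u64::from_le_bytes(haystack[i..i + 8].try_into().unwrap());
        // Lanes equal to `needle` become 0x00.
        let x = chunk ^ pattern;
        // Classic "has zero byte" test: nonzero iff some lane is 0x00,
        // and the lowest flagged lane is the first zero lane.
        let found = x.wrapping_sub(LO) & !x & HI;
        if found != 0 {
            return Some(i + (found.trailing_zeros() / 8) as usize);
        }
        i += 8;
    }
    // Fewer than 8 bytes remain: plain scalar scan.
    haystack[i..].iter().position(|&b| b == needle).map(|p| i + p)
}
```

A tokenizer can use a scan like this to jump straight to the next `<` instead of inspecting text content one byte at a time; real SIMD widens the same idea to 16 or 32 bytes per step.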

Benchmarks show 10x faster parsing and up to 132x faster queries than BeautifulSoup. See the full benchmark results in the main project README.

Architecture

scrape-core/
├── dom/       # Arena-based DOM representation
├── parser/    # html5ever integration
├── query/     # CSS selector engine
├── simd/      # Platform-specific SIMD acceleration
└── parallel/  # Rayon-based parallelization
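The dom/ module is described as arena-based. The sketch below shows the general arena technique under stated assumptions; it does not reflect scrape-core's actual types. Arena, Node, and NodeId are hypothetical names, and tag names are &'static str only to keep the example short. All nodes live in one Vec, and links between them are plain indices rather than pointers or Rc handles.

```rust
/// Index of a node inside the arena; cheap to copy and compare.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

/// A node stores tree links as indices into the arena.
#[derive(Debug)]
pub struct Node {
    pub tag: &'static str,
    pub parent: Option<NodeId>,
    pub first_child: Option<NodeId>,
    pub last_child: Option<NodeId>,
    pub next_sibling: Option<NodeId>,
}

/// Every node is stored contiguously in a single Vec.
#[derive(Debug, Default)]
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    /// Add a detached node; this is a Vec push, not a separate Box allocation.
    pub fn new_node(&mut self, tag: &'static str) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node {
            tag,
            parent: None,
            first_child: None,
            last_child: None,
            next_sibling: None,
        });
        id
    }

    /// Append `child` as the last child of `parent`.
    pub fn append_child(&mut self, parent: NodeId, child: NodeId) {
        self.nodes[child.0].parent = Some(parent);
        let last = self.nodes[parent.0].last_child;
        match last {
            Some(prev) => self.nodes[prev.0].next_sibling = Some(child),
            None => self.nodes[parent.0].first_child = Some(child),
        }
        self.nodes[parent.0].last_child = Some(child);
    }

    pub fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0]
    }

    /// Iterate over the children of `parent` by following sibling links.
    pub fn children(&self, parent: NodeId) -> impl Iterator<Item = NodeId> + '_ {
        std::iter::successors(self.get(parent).first_child, move |&id| self.get(id).next_sibling)
    }
}
```

Because nodes sit next to each other in memory and links are small integers, traversals tend to be cache-friendly, and dropping the whole tree frees a single buffer.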

MSRV policy

Minimum Supported Rust Version: 1.88. Raising the MSRV is treated as a minor version bump.

Related packages

This crate is part of fast-scrape:

| Platform | Package |
|----------|---------|
| Python | fast-scrape |
| Node.js | @fast-scrape/node |
| WASM | @fast-scrape/wasm |

License

Licensed under either of Apache License, Version 2.0 or MIT License at your option.