crw-renderer 0.0.10

HTTP and CDP browser rendering engine for the CRW web scraper
Documentation

crw-renderer

HTTP and headless-browser rendering engine for the CRW web scraper.

crates.io docs.rs license

Overview

crw-renderer fetches web pages via plain HTTP and optionally re-renders them through a CDP-based headless browser when SPA content is detected.

  • FallbackRenderer — Composite renderer: tries HTTP first, falls back to JS rendering when the page looks like a SPA shell
  • HttpFetcher — Fast reqwest-based HTTP fetcher with stealth headers, gzip/brotli decompression, and proxy support
  • SPA detection — Heuristic analysis of the HTML response (empty body, framework markers like __NEXT_DATA__, ng-app, nuxt) to auto-detect pages that need JS rendering
  • CDP rendering — Chrome DevTools Protocol support for LightPanda, Playwright, and Chrome (requires cdp feature)
  • Stealth mode — User-Agent rotation from a built-in Chrome/Firefox/Safari pool and browser-like header injection

Installation

cargo add crw-renderer

With CDP (headless browser) support:

cargo add crw-renderer --features cdp

Feature flags

Flag Default Description
cdp off Enables CDP WebSocket rendering via tokio-tungstenite (LightPanda, Playwright, Chrome)

Usage

Basic HTTP fetching

use crw_core::config::{RendererConfig, StealthConfig};
use crw_renderer::FallbackRenderer;
use std::collections::HashMap;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = RendererConfig::default();
    let stealth = StealthConfig::default();
    let renderer = FallbackRenderer::new(&config, "my-app/1.0", None, &stealth);

    let result = renderer.fetch("https://example.com", &HashMap::new(), None, None).await?;
    println!("Status: {}", result.status_code);
    println!("HTML length: {}", result.html.len());
    Ok(())
}

Smart mode (auto-detect SPAs)

When a JS renderer is configured, FallbackRenderer automatically detects SPA shells and re-renders with a headless browser:

use crw_core::config::{RendererConfig, StealthConfig, CdpEndpoint};
use crw_renderer::FallbackRenderer;
use std::collections::HashMap;

let config = RendererConfig {
    mode: "auto".into(),
    lightpanda: Some(CdpEndpoint { ws_url: "ws://127.0.0.1:9222".into() }),
    ..Default::default()
};
let stealth = StealthConfig::default();
let renderer = FallbackRenderer::new(&config, "my-app/1.0", None, &stealth);

// Auto mode: HTTP first, JS rendering if SPA detected
let result = renderer.fetch("https://spa-app.com", &HashMap::new(), None, None).await?;

SPA detection

Use the detector directly to check if HTML needs JS rendering:

use crw_renderer::detector::needs_js_rendering;

let spa_html = r#"<html><body><div id="root"></div><script src="/app.js"></script></body></html>"#;
assert!(needs_js_rendering(spa_html));

let static_html = r#"<html><body><h1>Hello</h1><p>This is a static page with content.</p></body></html>"#;
assert!(!needs_js_rendering(static_html));

Stealth mode

Enable User-Agent rotation and browser-like headers to reduce bot detection:

use crw_core::config::{RendererConfig, StealthConfig};
use crw_renderer::FallbackRenderer;

let stealth = StealthConfig {
    enabled: true,
    inject_headers: true,
    user_agents: vec![], // uses built-in Chrome/Firefox/Safari pool
};
let renderer = FallbackRenderer::new(&RendererConfig::default(), "crw/1.0", None, &stealth);

Health check

let health = renderer.check_health().await;
for (name, available) in &health {
    println!("{name}: {}", if *available { "ok" } else { "down" });
}

Part of CRW

This crate is part of the CRW workspace — a fast, lightweight, Firecrawl-compatible web scraper built in Rust.

Crate Description
crw-core Core types, config, and error handling
crw-renderer HTTP + CDP browser rendering engine (this crate)
crw-extract HTML → markdown/plaintext extraction
crw-crawl Async BFS crawler with robots.txt & sitemap
crw-server Firecrawl-compatible API server
crw-cli Standalone CLI (crw binary)
crw-mcp MCP stdio proxy binary

License

AGPL-3.0 — see LICENSE.