# crw-renderer
HTTP and headless-browser rendering engine for the [CRW](https://github.com/us/crw) web scraper.
[](https://crates.io/crates/crw-renderer)
[](https://docs.rs/crw-renderer)
[](https://github.com/us/crw/blob/main/LICENSE)
## Overview
`crw-renderer` fetches web pages via plain HTTP and optionally re-renders them through a CDP-based headless browser when SPA content is detected.
- **`FallbackRenderer`** — Composite renderer: tries HTTP first, falls back to JS rendering when the page looks like a SPA shell
- **`HttpFetcher`** — Fast reqwest-based HTTP fetcher with stealth headers, gzip/brotli decompression, and proxy support
- **SPA detection** — Heuristic analysis of the HTML response (empty body, framework markers like `__NEXT_DATA__`, `ng-app`, `nuxt`) to auto-detect pages that need JS rendering
- **CDP rendering** — Chrome DevTools Protocol support for LightPanda, Playwright, and Chrome (requires `cdp` feature)
- **Stealth mode** — User-Agent rotation from a built-in Chrome/Firefox/Safari pool and browser-like header injection
## Installation
```bash
cargo add crw-renderer
```
With CDP (headless browser) support:
```bash
cargo add crw-renderer --features cdp
```
## Feature flags
| `cdp` | off | Enables CDP WebSocket rendering via `tokio-tungstenite` (LightPanda, Playwright, Chrome) |
## Usage
### Basic HTTP fetching
```rust,no_run
use crw_core::config::{RendererConfig, StealthConfig};
use crw_renderer::FallbackRenderer;
use std::collections::HashMap;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let config = RendererConfig::default();
let stealth = StealthConfig::default();
let renderer = FallbackRenderer::new(&config, "my-app/1.0", None, &stealth);
let result = renderer.fetch("https://example.com", &HashMap::new(), None, None).await?;
println!("Status: {}", result.status_code);
println!("HTML length: {}", result.html.len());
Ok(())
}
```
### Smart mode (auto-detect SPAs)
When a JS renderer is configured, `FallbackRenderer` automatically detects SPA shells and re-renders with a headless browser:
```rust,no_run
use crw_core::config::{RendererConfig, StealthConfig, CdpEndpoint};
use crw_renderer::FallbackRenderer;
use std::collections::HashMap;
let config = RendererConfig {
mode: "auto".into(),
lightpanda: Some(CdpEndpoint { ws_url: "ws://127.0.0.1:9222".into() }),
..Default::default()
};
let stealth = StealthConfig::default();
let renderer = FallbackRenderer::new(&config, "my-app/1.0", None, &stealth);
// Auto mode: HTTP first, JS rendering if SPA detected
let result = renderer.fetch("https://spa-app.com", &HashMap::new(), None, None).await?;
```
### SPA detection
Use the detector directly to check if HTML needs JS rendering:
```rust
use crw_renderer::detector::needs_js_rendering;
let spa_html = r#"<html><body><div id="root"></div><script src="/app.js"></script></body></html>"#;
assert!(needs_js_rendering(spa_html));
let static_html = r#"<html><body><h1>Hello</h1><p>This is a static page with content.</p></body></html>"#;
assert!(!needs_js_rendering(static_html));
```
### Stealth mode
Enable User-Agent rotation and browser-like headers to reduce bot detection:
```rust,no_run
use crw_core::config::{RendererConfig, StealthConfig};
use crw_renderer::FallbackRenderer;
let stealth = StealthConfig {
enabled: true,
inject_headers: true,
user_agents: vec![], // uses built-in Chrome/Firefox/Safari pool
};
let renderer = FallbackRenderer::new(&RendererConfig::default(), "crw/1.0", None, &stealth);
```
### Health check
```rust,no_run
let health = renderer.check_health().await;
for (name, available) in &health {
println!("{name}: {}", if *available { "ok" } else { "down" });
}
```
## Part of CRW
This crate is part of the [CRW](https://github.com/us/crw) workspace — a fast, lightweight, Firecrawl-compatible web scraper built in Rust.
| [crw-core](https://crates.io/crates/crw-core) | Core types, config, and error handling |
| **crw-renderer** | HTTP + CDP browser rendering engine (this crate) |
| [crw-extract](https://crates.io/crates/crw-extract) | HTML → markdown/plaintext extraction |
| [crw-crawl](https://crates.io/crates/crw-crawl) | Async BFS crawler with robots.txt & sitemap |
| [crw-server](https://crates.io/crates/crw-server) | Firecrawl-compatible API server |
| [crw-cli](https://crates.io/crates/crw-cli) | Standalone CLI (`crw` binary) |
| [crw-mcp](https://crates.io/crates/crw-mcp) | MCP stdio proxy binary |
## License
AGPL-3.0 — see [LICENSE](https://github.com/us/crw/blob/main/LICENSE).