crw-core
Core types, configuration, and error handling for the CRW web scraper.
Overview
crw-core provides the foundational building blocks shared across all CRW crates:
- Configuration: layered TOML config with environment variable overrides (AppConfig)
- Error handling: unified error type (CrwError) and result alias (CrwResult)
- Shared types: ScrapeRequest, ScrapeData, FetchResult, OutputFormat, ChunkStrategy, and more
- SSRF protection: URL validation that blocks private IPs, cloud metadata endpoints, loopback, and non-HTTP schemes
- MCP types: JSON-RPC request/response types for MCP protocol support
Installation
Usage
Configuration
CRW uses layered configuration: built-in defaults → config.local.toml → environment variables.
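use crw_core::config::AppConfig; // module path is an assumption

let config = AppConfig::load().unwrap();
// Field names are inferred from the CRW_SERVER__PORT and
// CRW_CRAWLER__MAX_CONCURRENCY variables and may differ in the actual struct.
println!("Server port: {}", config.server.port);
println!("Max concurrency: {}", config.crawler.max_concurrency);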
Override any setting with environment variables using the CRW_ prefix, with a double underscore separating the config section from the key:
CRW_SERVER__PORT=8080 CRW_CRAWLER__MAX_CONCURRENCY=20
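The override rule can be illustrated with a minimal std-only sketch. It assumes, based on the variable names above, that the CRW_ prefix is stripped, that a double underscore separates nesting levels, and that names are lowercased. The helpers env_key_to_path, env_layer, and merge_layers are hypothetical and are not part of crw-core, which may implement layering differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: map `CRW_SERVER__PORT` to the dotted path `server.port`.
/// Returns `None` for variables that do not carry the `CRW_` prefix.
fn env_key_to_path(key: &str) -> Option<String> {
    let rest = key.strip_prefix("CRW_")?;
    if rest.is_empty() {
        return None;
    }
    // Assumption: a double underscore separates nesting levels.
    Some(
        rest.split("__")
            .map(|part| part.to_ascii_lowercase())
            .collect::<Vec<_>>()
            .join("."),
    )
}

/// Build the environment layer from (name, value) pairs such as `std::env::vars()`.
fn env_layer<I>(vars: I) -> BTreeMap<String, String>
where
    I: IntoIterator<Item = (String, String)>,
{
    vars.into_iter()
        .filter_map(|(name, value)| env_key_to_path(&name).map(|path| (path, value)))
        .collect()
}

/// Merge layers in order; a key in a later layer overwrites the same key
/// in an earlier one (defaults, then file, then environment).
fn merge_layers(layers: &[BTreeMap<String, String>]) -> BTreeMap<String, String> {
    let mut merged = BTreeMap::new();
    for layer in layers {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}
```

Passing the defaults first and the result of env_layer(std::env::vars()) last gives environment variables the final say, which matches the layering order described in this section.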
Error handling
All CRW crates return CrwResult<T>, which uses the unified CrwError enum:
use crw_core::error::{CrwError, CrwResult}; // module path is an assumption
Error variants: HttpError, UrlParseError, InvalidRequest, RendererError, ExtractionError, CrawlError, Timeout, ConfigError, NotFound, Internal.
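To show how a unified error enum and a result alias fit together, here is a small std-only sketch. It redefines a three-variant subset of the names listed above purely for illustration; the String payloads and the check_scheme and host_of helpers are assumptions, and the actual definitions in crw-core may look different.

```rust
use std::fmt;

/// Illustrative subset of a unified error enum.
/// Variant names come from the list above; the `String` payloads are assumptions.
#[derive(Debug)]
enum CrwError {
    UrlParseError(String),
    InvalidRequest(String),
    Timeout(String),
}

impl fmt::Display for CrwError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CrwError::UrlParseError(msg) => write!(f, "URL parse error: {msg}"),
            CrwError::InvalidRequest(msg) => write!(f, "invalid request: {msg}"),
            CrwError::Timeout(msg) => write!(f, "timeout: {msg}"),
        }
    }
}

impl std::error::Error for CrwError {}

/// Result alias so that signatures only need to name the success type.
type CrwResult<T> = Result<T, CrwError>;

/// Hypothetical helper: accept only http:// and https:// URLs.
fn check_scheme(url: &str) -> CrwResult<&str> {
    if url.starts_with("http://") || url.starts_with("https://") {
        Ok(url)
    } else {
        Err(CrwError::InvalidRequest(format!("unsupported scheme in {url}")))
    }
}

/// Hypothetical helper: because both functions share one error type,
/// `?` propagates the failure without any conversion code.
fn host_of(url: &str) -> CrwResult<&str> {
    let url = check_scheme(url)?;
    let rest = url.split("://").nth(1).unwrap_or("");
    let host = rest.split('/').next().unwrap_or("");
    if host.is_empty() {
        return Err(CrwError::UrlParseError("missing host".to_string()));
    }
    Ok(host)
}
```

Callers can match on the variant to decide how to react, for example retrying on Timeout but failing fast on InvalidRequest.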
SSRF protection
Validate URLs before fetching to prevent server-side request forgery:
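use crw_core::url_safety::validate_safe_url; // module path is an assumption
use url::Url;

// Assumes validate_safe_url takes a &Url and returns a Result.
let url = Url::parse("https://example.com").unwrap();
assert!(validate_safe_url(&url).is_ok());
let private = Url::parse("http://169.254.169.254/latest/meta-data/").unwrap();
assert!(validate_safe_url(&private).is_err()); // blocks AWS metadata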
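The kinds of checks this validation performs can be sketched with the standard library alone. The functions is_blocked_ip and is_allowed_scheme below are hypothetical names for illustration, not the crate's API, and the exact set of ranges that validate_safe_url rejects may differ.

```rust
use std::net::IpAddr;

/// Hypothetical check: true when an address should not be fetched
/// on behalf of an untrusted caller.
fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback()           // 127.0.0.0/8
                || v4.is_private()     // 10/8, 172.16/12, 192.168/16
                || v4.is_link_local()  // 169.254.0.0/16, including 169.254.169.254
                || v4.is_unspecified() // 0.0.0.0
                || v4.is_broadcast()   // 255.255.255.255
        }
        IpAddr::V6(v6) => {
            // An IPv4-mapped address (::ffff:a.b.c.d) is judged by its IPv4 part.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_blocked_ip(IpAddr::V4(v4));
            }
            let first = v6.segments()[0];
            v6.is_loopback()                   // ::1
                || v6.is_unspecified()         // ::
                || (first & 0xfe00) == 0xfc00  // unique local, fc00::/7
                || (first & 0xffc0) == 0xfe80  // link-local, fe80::/10
        }
    }
}

/// Hypothetical check: only plain web schemes are allowed.
fn is_allowed_scheme(scheme: &str) -> bool {
    scheme.eq_ignore_ascii_case("http") || scheme.eq_ignore_ascii_case("https")
}
```

A check like this only covers literal IP addresses. For hostnames, the same test has to be applied to the addresses the name resolves to, since a public-looking name can point at a private address.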
Use safe_redirect_policy() with reqwest to block SSRF via redirects:
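use crw_core::url_safety::safe_redirect_policy; // module path is an assumption

let client = reqwest::Client::builder()
    .redirect(safe_redirect_policy())
    .build()
    .unwrap();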
Shared types
use crw_core::types::ScrapeRequest; // module path is an assumption
let request = ScrapeRequest { /* fields omitted */ };
Part of CRW
This crate is part of the CRW workspace — a fast, lightweight, Firecrawl-compatible web scraper built in Rust.
| Crate | Description |
|---|---|
| crw-core | Core types, config, and error handling (this crate) |
| crw-renderer | HTTP + CDP browser rendering engine |
| crw-extract | HTML → markdown/plaintext extraction |
| crw-crawl | Async BFS crawler with robots.txt & sitemap |
| crw-server | Firecrawl-compatible API server |
| crw-cli | Standalone CLI (crw binary) |
| crw-mcp | MCP stdio proxy binary |
License
AGPL-3.0 — see LICENSE.