Crate servo_fetch

Web content extraction library powered by Servo and Readability.

This crate provides utilities for extracting readable content from HTML:

  • extract — Convert HTML into Markdown or structured JSON using Mozilla’s Readability algorithm.
  • layout — CSS layout heuristics to detect and strip navbars, sidebars, and footers before extraction.
  • sanitize — Strip ANSI escape sequences and control characters from output strings.

Modules§

extract
Content extraction — converts raw HTML into readable Markdown or structured JSON.
layout
CSS layout heuristics — detects page structure (navbar, sidebar, footer, main) to improve content extraction accuracy.
sanitize
Strips terminal escape sequences and control characters from output.
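
To make the sanitize step concrete, here is a minimal standalone sketch of stripping ANSI escape sequences and control characters from a string. It is not taken from this crate: the function name `strip_ansi_and_controls` is hypothetical, and keeping `\n` and `\t` is an assumption about what a caller would want. The real `sanitize` module may expose a different API and handle more cases.

```rust
/// Hypothetical helper (not part of servo_fetch's documented API).
/// Removes CSI sequences (ESC [ ... final byte), OSC sequences
/// (ESC ] ... terminated by BEL or ESC \), other two-character escapes,
/// and control characters. Newline and tab are kept (an assumption).
fn strip_ansi_and_controls(input: &str) -> String {
    const ESC: char = '\u{1b}';
    const BEL: char = '\u{07}';

    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        if c == ESC {
            match chars.peek().copied() {
                // CSI: skip parameter bytes until a final byte in 0x40..=0x7E.
                Some('[') => {
                    chars.next();
                    for n in chars.by_ref() {
                        if ('\u{40}'..='\u{7e}').contains(&n) {
                            break;
                        }
                    }
                }
                // OSC: skip until BEL or the two-character terminator ESC \.
                Some(']') => {
                    chars.next();
                    while let Some(n) = chars.next() {
                        if n == BEL {
                            break;
                        }
                        if n == ESC {
                            if chars.peek() == Some(&'\\') {
                                chars.next();
                            }
                            break;
                        }
                    }
                }
                // Any other escape: drop ESC and the character after it.
                Some(_) => {
                    chars.next();
                }
                // A trailing lone ESC is simply dropped.
                None => {}
            }
        } else if c.is_control() && c != '\n' && c != '\t' {
            // Drop remaining control characters (BEL, backspace, DEL, ...).
        } else {
            out.push(c);
        }
    }
    out
}
```

Calling `strip_ansi_and_controls("\u{1b}[31mred\u{1b}[0m")` on a colored string returns the plain text `red`. Walking the input as `char`s rather than bytes keeps multi-byte UTF-8 content intact while the escape sequences are skipped.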