Crate kawat

§kawat

A Rust library for web content extraction, inspired by trafilatura.

Extracts main text, metadata, and comments from HTML documents with a multi-algorithm fallback cascade.
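As a rough illustration of what a fallback cascade means in general, the sketch below tries extractors in order and keeps the first result that is long enough to be plausible. Everything in it is hypothetical: `Extractor`, `article_tag`, `strip_tags`, `cascade`, and the `min_len` threshold are stand-ins invented for this example, not kawat's API or its actual algorithms.

```rust
// Hypothetical sketch: none of these names are part of kawat's API.

/// An extractor returns the main text it found, or None if it found nothing.
type Extractor = fn(&str) -> Option<String>;

/// Stand-in for a precise extractor: the contents of the first <article> element.
fn article_tag(html: &str) -> Option<String> {
    let start = html.find("<article>")? + "<article>".len();
    let end = start + html[start..].find("</article>")?;
    Some(html[start..end].trim().to_string())
}

/// Stand-in for a baseline extractor: drop every tag and keep the remaining text.
fn strip_tags(html: &str) -> Option<String> {
    let mut out = String::new();
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            '<' => in_tag = true,
            '>' => in_tag = false,
            _ if !in_tag => out.push(c),
            _ => {}
        }
    }
    // Collapse runs of whitespace left behind by the removed tags.
    let text = out.split_whitespace().collect::<Vec<_>>().join(" ");
    if text.is_empty() { None } else { Some(text) }
}

/// Run the extractors in order and return the first result that has at
/// least `min_len` characters; later extractors run only if earlier ones fail.
fn cascade(html: &str, extractors: &[Extractor], min_len: usize) -> Option<String> {
    extractors
        .iter()
        .filter_map(|extract| extract(html))
        .find(|text| text.chars().count() >= min_len)
}
```

Calling `cascade(html, &[article_tag, strip_tags], 5)` under these assumptions returns the `<article>` contents when present and long enough, and otherwise falls back to the tag-stripped text. Ordering the extractors from most precise to most permissive is the design choice that makes the fallback useful.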

§Usage

use kawat::{extract, fetch_url, ExtractorOptions};

// From URL
let html = fetch_url("https://example.org/article").unwrap();
let text = extract(&html, &ExtractorOptions::default()).unwrap();

// With options
let options = ExtractorOptions {
    with_metadata: true,
    ..Default::default()
};
let text = extract(&html, &options).unwrap();

§Name

Kawat is Indonesian for “wire” — the same metallurgical metaphor as trafilatura (Italian for “wire drawing”), symbolizing the refinement of raw HTML into clean, structured text.

Re-exports§

pub use htmldate_rs;

Structs§

Document: A fully extracted document with text, metadata, and comments.
ExtractorOptions: Complete extraction configuration. Equivalent to trafilatura’s Extractor class.

Enums§

ExtractionError
OutputFormat: Supported output formats.

Functions§

bare_extraction: Extract content from an HTML document.
extract: Extract and format content, equivalent to trafilatura’s extract().
fetch_url: Fetch a URL and return the HTML content.
fetch_url_async: Async version of fetch_url.
