
Crate rust_warc


A high-performance Web Archive (WARC) file parser.

The WarcReader iterates over WarcRecords from a BufRead input.
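
To illustrate what "a BufRead input" can mean in practice, here is a minimal standard-library-only sketch. The `count_lines` helper is hypothetical and not part of rust_warc; it only stands in for any consumer that is generic over `BufRead`, to show which kinds of sources satisfy that bound.

```rust
use std::io::{BufRead, BufReader, Cursor};

/// Hypothetical helper (not part of rust_warc): accepts any `BufRead`
/// source and counts the lines that could be read from it.
fn count_lines<R: BufRead>(input: R) -> usize {
    // Stop at the first I/O error instead of panicking.
    input.lines().map_while(Result::ok).count()
}

fn main() {
    // In-memory bytes: `Cursor` implements `BufRead` directly.
    let in_memory = Cursor::new(b"WARC/1.0\r\nWARC-Type: response\r\n\r\n".to_vec());
    println!("in-memory lines: {}", count_lines(in_memory));

    // Any plain `Read` (a file, a socket, a decompressor) can be
    // wrapped in `BufReader` to obtain a `BufRead`.
    let plain_reader: &[u8] = b"one\ntwo\n";
    println!("wrapped lines: {}", count_lines(BufReader::new(plain_reader)));
}
```

The same pattern should apply when handing a source to the reader: wrap unbuffered inputs such as `std::fs::File` in `BufReader` first, and pass already-buffered inputs such as a locked stdin directly.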

Performance should be quite good: roughly 500 MiB/s on a single CPU core.

§Usage

use rust_warc::WarcReader;

// we're taking input from stdin here, but any BufRead will do
let stdin = std::io::stdin();
let handle = stdin.lock();

// the reader is moved into the `for` loop below, so the binding need not be `mut`
let warc = WarcReader::new(handle);

let mut response_counter = 0;
for item in warc {
    let record = item.expect("IO/malformed error");

    // header names are case insensitive
    if record.header.get(&"WARC-Type".into()) == Some(&"response".into()) {
        response_counter += 1;
    }
}

println!("# response records: {}", response_counter);
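
The case-insensitive header lookup used above can be pictured with a small standard-library-only sketch. The `CiKey` type below is hypothetical and is not the crate's `CaseString`; it only assumes ASCII header names and shows one way a map key can compare and hash without regard to case.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a case-insensitive key; NOT rust_warc's `CaseString`.
#[derive(Debug, Clone)]
struct CiKey(String);

impl From<&str> for CiKey {
    fn from(s: &str) -> Self {
        CiKey(s.to_owned())
    }
}

impl PartialEq for CiKey {
    fn eq(&self, other: &Self) -> bool {
        // Assumption: header names are ASCII, so ASCII case folding suffices.
        self.0.eq_ignore_ascii_case(&other.0)
    }
}

impl Eq for CiKey {}

impl Hash for CiKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the lowercased bytes so that keys which compare equal
        // also hash equal, as `HashMap` requires.
        for byte in self.0.bytes() {
            state.write_u8(byte.to_ascii_lowercase());
        }
    }
}

fn main() {
    let mut header: HashMap<CiKey, String> = HashMap::new();
    header.insert(CiKey::from("WARC-Type"), "response".to_owned());

    // The lookup succeeds even though the casing differs from the inserted key.
    let found = header.get(&CiKey::from("warc-type")).map(String::as_str);
    assert_eq!(found, Some("response"));
}
```

Keeping the original string and folding case only during comparison and hashing preserves the header name exactly as it appeared in the file, which is one reasonable design choice among several.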

Structs§

CaseString
Case insensitive string
WarcReader
WARC reader instance
WarcRecord
WARC Record

Enums§

WarcError
WARC Processing error