Crate rust_warc

A high-performance Web Archive (WARC) file parser.

The WarcReader iterates over WarcRecords from a BufRead input.

Performance should be quite good: roughly 500 MiB/s on a single CPU core.

Usage

use rust_warc::WarcReader;

fn main() {
    // we're taking input from stdin here, but any BufRead will do
    let stdin = std::io::stdin();
    let handle = stdin.lock();

    // the `for` loop below consumes the reader, so no `mut` binding is needed
    let warc = WarcReader::new(handle);

    let mut response_counter = 0;
    for item in warc {
        let record = item.expect("IO/malformed error");

        // header names are case insensitive
        if record.header.get(&"WARC-Type".into()) == Some(&"response".into()) {
            response_counter += 1;
        }
    }

    println!("# response records: {}", response_counter);
}
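
The case-insensitive header lookup in the example above can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: `CiString` and `sample_headers` are hypothetical names, and the crate's own `CaseString` may be implemented quite differently. The sketch only shows one common way to make a `HashMap` key compare and hash without regard to ASCII case.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a case-insensitive string key.
/// This is NOT the crate's `CaseString`, only an illustration of the idea.
#[derive(Debug, Clone)]
pub struct CiString(String);

impl From<&str> for CiString {
    fn from(s: &str) -> Self {
        CiString(s.to_owned())
    }
}

impl PartialEq for CiString {
    // Two keys are equal when they match ignoring ASCII case.
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(&other.0)
    }
}

impl Eq for CiString {}

impl Hash for CiString {
    // Hash the lowercased bytes so that keys that compare equal also hash equal.
    fn hash<H: Hasher>(&self, state: &mut H) {
        for b in self.0.bytes() {
            state.write_u8(b.to_ascii_lowercase());
        }
    }
}

/// Builds a tiny header map (hypothetical helper) to try lookups against.
pub fn sample_headers() -> HashMap<CiString, String> {
    let mut headers = HashMap::new();
    headers.insert(CiString::from("WARC-Type"), "response".to_owned());
    headers
}
```

With this sketch, `sample_headers().get(&CiString::from("warc-type"))` finds the entry that was inserted as `WARC-Type`. The original spelling is kept in the key, and only comparison and hashing ignore case.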

Structs

CaseString

Case insensitive string

WarcReader

WARC reader instance

WarcRecord

WARC Record
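
To give a feel for what a record contains, here is a minimal standard-library sketch of reading one record from a BufRead input. It assumes the usual WARC layout: a `WARC/x.y` version line, `Name: value` header lines, a blank line, `Content-Length` bytes of content, and a trailing pair of CRLFs. `SimpleRecord` and `read_record` are hypothetical names, not part of this crate's API, and the sketch skips much of what a real parser has to handle (header continuation lines, non-UTF-8 header bytes, speed).

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical simplified record; the crate's `WarcRecord` may be shaped differently.
#[derive(Debug)]
pub struct SimpleRecord {
    pub version: String,
    pub headers: Vec<(String, String)>,
    pub content: Vec<u8>,
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Reads one record, or returns `Ok(None)` at a clean end of input.
pub fn read_record<R: BufRead>(input: &mut R) -> io::Result<Option<SimpleRecord>> {
    let mut line = String::new();

    // Version line, e.g. "WARC/1.0".
    if input.read_line(&mut line)? == 0 {
        return Ok(None);
    }
    let version = line.trim_end().to_owned();
    if !version.starts_with("WARC/") {
        return Err(invalid("missing WARC version line"));
    }

    // Header lines up to the first blank line.
    let mut headers = Vec::new();
    let mut content_length: Option<usize> = None;
    loop {
        line.clear();
        if input.read_line(&mut line)? == 0 {
            return Err(invalid("unexpected end of input in headers"));
        }
        let trimmed = line.trim_end();
        if trimmed.is_empty() {
            break;
        }
        let (name, value) = trimmed
            .split_once(':')
            .ok_or_else(|| invalid("header line without ':'"))?;
        let value = value.trim();
        if name.eq_ignore_ascii_case("Content-Length") {
            let len = value
                .parse::<usize>()
                .map_err(|_| invalid("bad Content-Length"))?;
            content_length = Some(len);
        }
        headers.push((name.to_owned(), value.to_owned()));
    }

    // Content block of exactly Content-Length bytes.
    let len = content_length.ok_or_else(|| invalid("missing Content-Length"))?;
    let mut content = vec![0u8; len];
    input.read_exact(&mut content)?;

    // Each record is terminated by two CRLFs.
    let mut trailer = [0u8; 4];
    input.read_exact(&mut trailer)?;
    if &trailer != b"\r\n\r\n" {
        return Err(invalid("missing record terminator"));
    }

    Ok(Some(SimpleRecord { version, headers, content }))
}
```

Calling `read_record` repeatedly on the same input until it returns `Ok(None)` mirrors, in a much simplified form, the way the WarcReader yields one record per iteration.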

Enums

WarcError

WARC Processing error