A high-performance Web Archive (WARC) file parser.
The WarcReader iterates over WarcRecords from a BufRead input.
Performance should be quite good: roughly 500 MiB/s on a single CPU core.
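To make concrete what reading records from a BufRead involves, here is a minimal sketch that uses only the standard library. It is not this crate's implementation and makes no attempt at the throughput quoted above. `SketchRecord` and `read_record` are hypothetical names introduced for illustration, and the layout assumed is the usual WARC one: a version line, `Name: value` header lines, an empty line, `Content-Length` bytes of content, then two CRLF pairs.

```rust
use std::io::{self, BufRead, Read};

/// Illustrative record type; hypothetical, not the crate's own struct.
struct SketchRecord {
    version: String,
    headers: Vec<(String, String)>,
    content: Vec<u8>,
}

/// Reads one record, or returns Ok(None) on a clean end of input.
fn read_record<R: BufRead>(input: &mut R) -> io::Result<Option<SketchRecord>> {
    let bad = |msg: &str| io::Error::new(io::ErrorKind::InvalidData, msg.to_string());

    // Version line, e.g. "WARC/1.0".
    let mut line = String::new();
    if input.read_line(&mut line)? == 0 {
        return Ok(None);
    }
    let version = line.trim_end().to_string();
    if !version.starts_with("WARC/") {
        return Err(bad("missing WARC version line"));
    }

    // Header lines of the form "Name: value", ended by an empty line.
    let mut headers = Vec::new();
    let mut content_length: Option<usize> = None;
    loop {
        line.clear();
        if input.read_line(&mut line)? == 0 {
            return Err(bad("unexpected end of input in headers"));
        }
        let trimmed = line.trim_end();
        if trimmed.is_empty() {
            break;
        }
        // Split on the first colon only, so values such as URLs stay intact.
        let (name, value) = trimmed
            .split_once(':')
            .ok_or_else(|| bad("malformed header"))?;
        let value = value.trim();
        if name.eq_ignore_ascii_case("Content-Length") {
            let parsed = value
                .parse::<usize>()
                .map_err(|_| bad("bad Content-Length"))?;
            content_length = Some(parsed);
        }
        headers.push((name.to_string(), value.to_string()));
    }

    // Content of exactly Content-Length bytes, followed by two CRLF pairs.
    let len = content_length.ok_or_else(|| bad("missing Content-Length"))?;
    let mut content = vec![0u8; len];
    input.read_exact(&mut content)?;
    let mut trailer = [0u8; 4];
    input.read_exact(&mut trailer)?;
    if &trailer != b"\r\n\r\n" {
        return Err(bad("missing record terminator"));
    }

    Ok(Some(SketchRecord { version, headers, content }))
}
```

Calling `read_record` in a loop until it returns `Ok(None)` gives the same record-by-record shape as an iterator over the input. Because the function is generic over BufRead, it accepts a locked stdin, a `BufReader` around a file, or an in-memory `Cursor` alike.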
§Usage
use rust_warc::WarcReader;

fn main() {
    // we're taking input from stdin here, but any BufRead will do
    let stdin = std::io::stdin();
    let handle = stdin.lock();

    // the reader is consumed by the loop below, so it need not be `mut`
    let warc = WarcReader::new(handle);

    let mut response_counter = 0;
    for item in warc {
        let record = item.expect("IO/malformed error");

        // header names are case-insensitive
        if record.header.get(&"WARC-Type".into()) == Some(&"response".into()) {
            response_counter += 1;
        }
    }

    println!("# response records: {}", response_counter);
}
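The header lookup above relies on header names comparing equal regardless of case. One way such a key type could work is sketched below using only the standard library. `HeaderName` and `sample_headers` are hypothetical names for illustration and are not the crate's own types; the sketch assumes ASCII header names.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a case-insensitive header name.
#[derive(Debug, Clone)]
struct HeaderName(String);

impl From<&str> for HeaderName {
    fn from(s: &str) -> Self {
        HeaderName(s.to_string())
    }
}

impl PartialEq for HeaderName {
    fn eq(&self, other: &Self) -> bool {
        // ASCII-only comparison, on the assumption that names are ASCII.
        self.0.eq_ignore_ascii_case(&other.0)
    }
}

impl Eq for HeaderName {}

impl Hash for HeaderName {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the lowercased bytes so that names that compare equal
        // also hash equally, as HashMap requires.
        for b in self.0.bytes() {
            state.write_u8(b.to_ascii_lowercase());
        }
    }
}

/// Builds a small header map with made-up example values.
fn sample_headers() -> HashMap<HeaderName, String> {
    let mut headers = HashMap::new();
    headers.insert(HeaderName::from("WARC-Type"), "response".to_string());
    headers.insert(HeaderName::from("Content-Length"), "42".to_string());
    headers
}
```

With this in place, `sample_headers().get(&HeaderName::from("warc-type"))` finds the entry inserted as `WARC-Type`. The important design point is that `Hash` and `Eq` agree: comparing case-insensitively while hashing the original bytes would make lookups miss.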
Structs§
- Case-insensitive string
- WARC reader instance
- WARC Record
Enums§
- WARC Processing error