rust-sequencefile
Hadoop SequenceFile library for Rust
# Cargo.toml
[dependencies]
sequencefile = "0.2.0"
Status
Prototype status! I'm in the process of learning Rust. :) Feedback appreciated.
Unfortunately that means the API will change. If you depend on this crate, please fully qualify your versions for now.
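As a small illustration of pinning an exact version, Cargo's `=` requirement operator keeps it from picking up a newer, possibly incompatible release. This fragment assumes the crate is published under the name `sequencefile`:

```toml
# Cargo.toml
[dependencies]
# "=0.2.0" matches exactly 0.2.0; a bare "0.2.0" would also accept 0.2.x updates.
sequencefile = "=0.2.0"
```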
Currently supports reading out your garden-variety sequence file. Handles uncompressed sequencefiles as well as block/record compressed files (deflate, gzip, and bzip2 only). LZO and Snappy are not (yet) handled.
There's a lot more to do:
- Varint decoding
  - Block sizes are written with Varints
- Block decompression
  - Gzip support
  - Bzip2 support
- Sequencefile metadata
- Better error handling
- Tests
- Better error handling (second pass)
- More tests
- Better documentation
- Snappy support
- CRC file support
- 'Writables', e.g. generic deserialization for common Hadoop writable types
- Writer
- Gracefully handle version 4 sequencefiles
- Zero-copy implementation.
- LZO support.
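For readers unfamiliar with the varint items in the list above, here is a minimal standalone sketch of decoding Hadoop's variable-length integer encoding (the scheme used by `WritableUtils.readVLong` on the Java side). It is not this crate's internal code; the function names `decode_vint_size`, `is_negative_vint`, and `read_vlong` are made up for illustration.

```rust
use std::io::{self, Read};

/// Total encoded length in bytes (first byte included), derived from the first byte.
fn decode_vint_size(first: i8) -> usize {
    if first >= -112 {
        1 // small values are stored directly in a single byte
    } else if first < -120 {
        (-119 - first as i32) as usize // negative multi-byte value
    } else {
        (-111 - first as i32) as usize // positive multi-byte value
    }
}

/// Whether the first byte marks a negative value.
fn is_negative_vint(first: i8) -> bool {
    first < -120 || (first >= -112 && first < 0)
}

/// Reads one Hadoop-style variable-length integer from `reader`.
fn read_vlong<R: Read>(reader: &mut R) -> io::Result<i64> {
    let mut buf = [0u8; 1];
    reader.read_exact(&mut buf)?;
    let first = buf[0] as i8;
    let len = decode_vint_size(first);
    if len == 1 {
        return Ok(first as i64);
    }
    // The remaining len - 1 bytes hold the magnitude, big-endian.
    let mut value: i64 = 0;
    for _ in 0..len - 1 {
        reader.read_exact(&mut buf)?;
        value = (value << 8) | buf[0] as i64;
    }
    // Negative values are stored as their bitwise complement.
    Ok(if is_negative_vint(first) { !value } else { value })
}
```

Because `&[u8]` implements `Read`, a byte slice can be passed directly: `read_vlong(&mut &[0x8E, 0x01, 0x2C][..])` decodes the three-byte encoding of 300.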
Benchmarks
There aren't any formal benchmarks yet. However, with deflate on my early-2012 MBP, 98.4% of CPU time was spent in miniz, producing ~125 MB/s of decompressed data.
Usage
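let path = Path::new("/path/to/seqfile");
let file = File::open(&path).unwrap();
let seqfile = sequencefile::Reader::new(file).expect("Failed to open sequence file.");
for kv in seqfile.flat_map(|x| x) {
    println!("{:?}", kv);
}
// Until there's automatic deserialization, you can do something like this instead:
// VERY hacky
let kvs = seqfile.map(|e| e.unwrap()).map(|(key, value)| {
    (BigEndian::read_i64(&key),
     String::from_utf8_lossy(&value[2..value.len()]).to_string())
});
for (k, v) in kvs {
    println!("key: {}, value: {}", k, v);
}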
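A slightly less hacky take on manual deserialization might look like the sketch below. It assumes keys are Hadoop `LongWritable` (8 bytes, big-endian) and values are Hadoop `Text` (a variable-length integer byte count followed by UTF-8 data), and that the reader hands back raw byte buffers. The helper names `decode_long_writable` and `decode_text` are hypothetical and not part of this crate.

```rust
use std::convert::TryInto;

/// Decodes a Hadoop LongWritable: exactly 8 bytes, big-endian.
fn decode_long_writable(bytes: &[u8]) -> Option<i64> {
    let arr: [u8; 8] = bytes.try_into().ok()?;
    Some(i64::from_be_bytes(arr))
}

/// Decodes a Hadoop Text: a VInt byte length followed by that many UTF-8 bytes.
fn decode_text(bytes: &[u8]) -> Option<String> {
    let first = *bytes.first()? as i8;
    // Size of the length prefix in bytes, including the first byte.
    let prefix_len = if first >= -112 {
        1
    } else if first < -120 {
        (-119 - first as i32) as usize
    } else {
        (-111 - first as i32) as usize
    };
    let len = if prefix_len == 1 {
        first as i64
    } else {
        // Multi-byte prefix: the magnitude follows, big-endian.
        let mut v: i64 = 0;
        for b in bytes.get(1..prefix_len)? {
            v = (v << 8) | *b as i64;
        }
        if first < -120 { !v } else { v }
    };
    if len < 0 {
        return None; // a negative length is never valid for Text
    }
    let end = prefix_len.checked_add(len as usize)?;
    let payload = bytes.get(prefix_len..end)?;
    String::from_utf8(payload.to_vec()).ok()
}
```

Both helpers return `None` on malformed input rather than panicking, so they can be chained with `filter_map` over an iterator of raw key/value pairs.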
License
rust-sequencefile is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.
See LICENSE-APACHE and LICENSE-MIT for details.