rust-sequencefile
Hadoop SequenceFile library for Rust
# Cargo.toml
[dependencies]
sequencefile = "0.1.4"
Status
Prototype status! I'm in the process of learning Rust. :) Feedback appreciated.
Unfortunately, that means the API will change. If you depend on this crate, please pin an exact version for now.
Currently supports reading garden-variety sequence files: uncompressed files as well as record-compressed files (deflate only). Block-compressed files, the most common type of sequence file, aren't supported yet.
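For orientation: every sequence file starts with the three magic bytes SEQ followed by a one-byte format version (recent Hadoop releases write version 6). Below is a minimal sketch of checking that preamble using only the standard library. It is independent of this crate, and the function name `read_version` is made up for illustration; it is not part of the crate's API.

```rust
use std::io::{self, Read};

/// Reads the 4-byte preamble of a sequence file and returns the version byte.
/// Hypothetical helper for illustration only.
fn read_version<R: Read>(reader: &mut R) -> io::Result<u8> {
    let mut preamble = [0u8; 4];
    reader.read_exact(&mut preamble)?;

    // The first three bytes must be the ASCII magic "SEQ".
    if !preamble.starts_with(b"SEQ") {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "missing SEQ magic, not a sequence file",
        ));
    }

    // The fourth byte is the format version.
    Ok(preamble[3])
}
```

Anything that implements `Read` works as input, so the same function can be pointed at a `File` or, for quick experiments, at a byte slice.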
There's a lot more to do:
- Varint decoding
  - Block sizes are written with Varints
- Block decompression
- Gzip support
- Bzip2 support
- Sequencefile metadata
- Better error handling
- Tests
- Better error handling, round 2
  - Iterator should return Result<(ByteString, ByteString)>
- More tests
- Better documentation
- Snappy support
- CRC file support
- 'Writables', e.g. generic deserialization for common Hadoop writable types
  - TODO: "Reflection" of some sort to allow registration of custom types.
- Writer
- Gracefully handle version 4 sequencefiles
- Zero-copy implementation.
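As background for the varint items in the list above: Hadoop writes variable-length integers in its own format (the one produced by WritableUtils.writeVLong), which is not the LEB128-style encoding used by protobuf. The sketch below is modeled on Hadoop's readVLong and is not this crate's implementation; `vint_size` and `read_vlong` are hypothetical names.

```rust
use std::io::{self, Read};

/// Total encoded length in bytes (1 to 9), derived from the first byte.
fn vint_size(first: i8) -> usize {
    if first >= -112 {
        1 // the value is stored directly in the first byte
    } else if first < -120 {
        (-119 - first as i32) as usize // negative multi-byte value
    } else {
        (-111 - first as i32) as usize // positive multi-byte value
    }
}

/// Decodes one Hadoop-style variable-length integer from `reader`.
fn read_vlong<R: Read>(reader: &mut R) -> io::Result<i64> {
    let mut byte = [0u8; 1];
    reader.read_exact(&mut byte)?;
    let first = byte[0] as i8;

    let len = vint_size(first);
    if len == 1 {
        return Ok(first as i64);
    }

    // The remaining len - 1 bytes hold the magnitude, big-endian.
    let mut value: i64 = 0;
    for _ in 0..len - 1 {
        reader.read_exact(&mut byte)?;
        value = (value << 8) | byte[0] as i64;
    }

    // Negative values are stored as their one's complement.
    Ok(if first < -120 { !value } else { value })
}
```

For example, the bytes 0x8E 0x01 0x2C decode to 300, and 0x86 0x01 0x2B decode to -300.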
Usage
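let path = std::path::Path::new("/path/to/seqfile"); // placeholder path
let file = std::fs::File::open(&path).unwrap();
let seqfile = match sequencefile::Reader::new(file) {
    Ok(val) => val,
    Err(err) => panic!("Failed to open sequence file: {:?}", err),
};
for kv in seqfile {
    println!("{:?}", kv);
}
// Until there's automatic deserialization, you can do something like this (in place of the loop above):
// VERY hacky; assumes keys and values are byte buffers holding UTF-8 text
let kvs = seqfile.map(|e| e.unwrap()).map(|(k, v)| {
    (String::from_utf8_lossy(&k).into_owned(), String::from_utf8_lossy(&v).into_owned())
});
for (k, v) in kvs {
    println!("key: {}, value: {}", k, v);
}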
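Until the 'Writables' item lands, hand-decoding the raw bytes is the workaround. Here is a rough sketch for two common Hadoop types, assuming the reader hands back each key and value as the raw serialized Writable bytes (an assumption, not something the crate guarantees). A LongWritable is 8 big-endian bytes; a Text is a varint length followed by UTF-8 bytes. Both function names are hypothetical, and the Text decoder only handles the single-byte length prefix (strings up to 127 bytes).

```rust
/// Decodes a LongWritable payload: exactly 8 bytes, big-endian.
fn decode_long_writable(bytes: &[u8]) -> Option<i64> {
    if bytes.len() != 8 {
        return None;
    }
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes);
    Some(i64::from_be_bytes(raw))
}

/// Decodes a short Text payload: one length byte (0..=127), then UTF-8 data.
/// Longer strings use a multi-byte varint length and are rejected here.
fn decode_short_text(bytes: &[u8]) -> Option<String> {
    let (&len, rest) = bytes.split_first()?;
    if len > 127 || rest.len() != len as usize {
        return None;
    }
    String::from_utf8(rest.to_vec()).ok()
}
```

Returning `Option` rather than panicking keeps malformed records from taking down the whole read loop; callers can skip or log them as they see fit.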
License
rust-sequencefile is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.
See LICENSE-APACHE and LICENSE-MIT for details.