Crate orc_format
Welcome to the orc-format documentation. Thanks for checking it out!
This Rust crate is a toolkit to read and deserialize ORC to your favourite in-memory format.
Below is an example of how to read a column from ORC into memory:
use std::fs::File;

use orc_format::{error::Error, read, read::Column};

fn get_stripe(path: &str, column: u32) -> Result<Column, Error> {
    // open the file, as expected. buffering this is not necessary - we
    // are very careful about the number of `read`s we perform.
    let mut f = File::open(path).expect("no file found");

    // read the file's metadata
    let metadata = read::read_metadata(&mut f)?;

    // and copy the compression it is using
    let compression = metadata.postscript.compression();

    // the next step is to identify which stripe we want to read. Let's say it is the first one.
    let stripe = &metadata.footer.stripes[0];

    // Each stripe has a footer - we need to read it to extract the location of each column on it.
    let stripe_footer = read::read_stripe_footer(&mut f, stripe, compression, &mut vec![])?;

    // Finally, we read the column into `Column`
    read::read_stripe_column(&mut f, stripe, stripe_footer, compression, column, vec![])
}

To deserialize the values of a column, use the decoders inside read::decode.
For example, the function below deserializes the “Present” stream into a Vec<bool>.
use orc_format::{error::Error, proto::stream::Kind, read::decode::BooleanIter, read::Column};

fn deserialize_present(column: &Column, scratch: &mut Vec<u8>) -> Result<Vec<bool>, Error> {
    let mut reader = column.get_stream(Kind::Present, std::mem::take(scratch))?;

    let mut validity = Vec::with_capacity(column.number_of_rows());
    BooleanIter::new(&mut reader, column.number_of_rows()).try_for_each(|item| {
        validity.push(item?);
        Result::<(), Error>::Ok(())
    })?;

    *scratch = std::mem::take(&mut reader.into_inner());

    Ok(validity)
}

Check out the integration tests of the crate to find deserialization of other types such as floats, integers, strings and dictionaries.
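For intuition about what a “Present” stream holds once it has been decompressed, the sketch below decodes one by hand using only the standard library. It follows the ORC specification's description of boolean streams: the bytes are byte run-length encoded, and each decoded byte carries eight flags, most significant bit first. The helper names `decode_byte_rle` and `unpack_bits` are hypothetical and are not part of orc-format; this is not how the crate is implemented internally, and in practice `BooleanIter` is the type to use.

```rust
/// Expands ORC byte run-length encoding into raw bytes.
/// Returns `None` if the input ends in the middle of a run.
/// (Hypothetical helper for illustration; not part of orc-format.)
fn decode_byte_rle(input: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        // The control byte is interpreted as a signed value.
        let header = input[i] as i8;
        i += 1;
        if header >= 0 {
            // Run: the next byte repeats `header + 3` times.
            let value = *input.get(i)?;
            i += 1;
            let len = header as usize + 3;
            out.extend(std::iter::repeat(value).take(len));
        } else {
            // Literals: the next `-header` bytes are copied verbatim.
            let len = -(header as isize) as usize;
            let literals = input.get(i..i + len)?;
            out.extend_from_slice(literals);
            i += len;
        }
    }
    Some(out)
}

/// Unpacks the first `rows` flags, most significant bit first.
/// Returns `None` if `bytes` holds fewer than `rows` bits.
/// (Hypothetical helper for illustration; not part of orc-format.)
fn unpack_bits(bytes: &[u8], rows: usize) -> Option<Vec<bool>> {
    if bytes.len() * 8 < rows {
        return None;
    }
    Some(
        (0..rows)
            .map(|row| (bytes[row / 8] & (0x80u8 >> (row % 8))) != 0)
            .collect(),
    )
}

fn main() {
    // [0xff, 0x80] is a literal run of one byte, 0x80 = 0b1000_0000:
    // the first row is present and the next seven are null.
    let bytes = decode_byte_rle(&[0xff, 0x80]).expect("truncated input");
    let validity = unpack_bits(&bytes, 8).expect("not enough bits");
    println!("{:?}", validity);
}
```

The number of rows has to be supplied separately because the last byte may be padded with unused bits, which is why `deserialize_present` above passes `column.number_of_rows()` to `BooleanIter::new`.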
Re-exports
pub use fallible_streaming_iterator;