An Apache Parquet implementation in Rust
Usage
Add this to your Cargo.toml:
[dependencies]
parquet = "2.0.0"
and this to your crate root:
extern crate parquet;
Example usage of reading data:
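use std::fs::File;
use std::path::Path;
use parquet::file::reader::{FileReader, SerializedFileReader};

// "/path/to/file" is a placeholder; point it at a real Parquet file.
let file = File::open(&Path::new("/path/to/file")).unwrap();
let reader = SerializedFileReader::new(file).unwrap();
// Passing None requests all columns (no projection).
let mut iter = reader.get_row_iter(None).unwrap();
while let Some(record) = iter.next() {
    println!("{}", record);
}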
See the crate documentation for the available API.
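The reading example unwraps every result, so a file that is not Parquet at all will panic inside the reader. As a cheap pre-check, a standard Parquet file begins and ends with the 4-byte magic PAR1. The sketch below tests for that using only the standard library. The helper name has_parquet_magic is hypothetical and not part of the parquet crate, and the check assumes a file with a plain (unencrypted) footer.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Magic bytes found at the start and end of a standard Parquet file.
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Returns true if `src` starts and ends with the Parquet magic bytes.
/// Hypothetical helper for illustration; not an API of the parquet crate.
fn has_parquet_magic<R: Read + Seek>(src: &mut R) -> io::Result<bool> {
    let len = src.seek(SeekFrom::End(0))?;
    // Lower bound: 4-byte header magic + 4-byte footer length + 4-byte footer magic.
    if len < 12 {
        return Ok(false);
    }
    let mut head = [0u8; 4];
    let mut tail = [0u8; 4];
    src.seek(SeekFrom::Start(0))?;
    src.read_exact(&mut head)?;
    src.seek(SeekFrom::End(-4))?;
    src.read_exact(&mut tail)?;
    Ok(&head == PARQUET_MAGIC && &tail == PARQUET_MAGIC)
}

fn main() -> io::Result<()> {
    // Placeholder path; pass a real file as the first argument.
    let path = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "/path/to/file".to_string());
    match File::open(&path) {
        Ok(mut file) => {
            let ok = has_parquet_magic(&mut file)?;
            println!("{}: Parquet magic present = {}", path, ok);
        }
        Err(e) => eprintln!("cannot open {}: {}", path, e),
    }
    Ok(())
}
```

A passing check only says the file is framed like Parquet; the reader can still return an error on corrupt metadata, so handling its Result remains necessary.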
Supported Parquet Version
- Parquet-format 2.4.0
To update the Parquet format to a newer version, check whether a newer
version of the parquet-format crate is available, then update the
parquet-format version in Cargo.toml.
Features
- All encodings supported
- All compression codecs supported
- Read support
  - Primitive column value readers
  - Row record reader
  - Arrow record reader
- Statistics support
- Write support
  - Primitive column value writers
  - Row record writer
  - Arrow record writer
- Predicate pushdown
- Parquet format 2.5 support
Requirements
- Rust nightly
See Working with nightly Rust to install the nightly toolchain and set it as the default.
Parquet requires LLVM. Our Windows CI image includes LLVM, but to build the libraries locally, Windows users will have to install LLVM themselves. Follow this link for info.
Build
Run cargo build, or cargo build --release to build in release mode.
Some features take advantage of SSE4.2 instructions, which can be
enabled by adding RUSTFLAGS="-C target-feature=+sse4.2" before the
cargo build command.
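To confirm whether a given build actually picked up the SSE4.2 flag, the standard library can report both the compile-time setting and what the running CPU supports. The following is a minimal sketch that does not depend on the parquet crate; the function name sse42_status is an assumption made for illustration.

```rust
/// Returns (enabled at compile time, supported by the running CPU).
/// Hypothetical helper for illustration; not an API of the parquet crate.
fn sse42_status() -> (bool, bool) {
    // True only when the binary was compiled with SSE4.2 enabled,
    // e.g. via RUSTFLAGS="-C target-feature=+sse4.2".
    let compiled_in = cfg!(target_feature = "sse4.2");

    // Runtime CPU detection exists only on x86 and x86_64 targets.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    let cpu_supports = is_x86_feature_detected!("sse4.2");
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    let cpu_supports = false;

    (compiled_in, cpu_supports)
}

fn main() {
    let (compiled_in, cpu_supports) = sse42_status();
    println!("SSE4.2 enabled at compile time: {}", compiled_in);
    println!("SSE4.2 supported by this CPU:   {}", cpu_supports);
}
```

If the first value is false while the second is true, the CPU could run the SSE4.2 code paths but the build was made without the flag.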
Test
Run cargo test for unit tests.
Binaries
The following binaries are provided (use cargo install to install them):
- parquet-schema for printing Parquet file schema and metadata.
  Usage: parquet-schema <file-path> [verbose], where file-path is the path to a Parquet file, and the optional verbose is a boolean flag that selects full metadata instead of schema only (when not specified, only the schema is printed).
- parquet-read for reading records from a Parquet file.
  Usage: parquet-read <file-path> [num-records], where file-path is the path to a Parquet file, and num-records is the number of records to read from the file (when not specified, all records are printed).
- parquet-rowcount for reporting the number of records in one or more Parquet files.
  Usage: parquet-rowcount <file-path> ..., where file-path is the path to a Parquet file, and ... indicates any number of additional Parquet files.
If you see a Library not loaded error, please make sure LD_LIBRARY_PATH is set properly:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib
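The usage line for parquet-read describes one required argument and one optional one. A minimal sketch of how such arguments could be parsed with the standard library follows. It is not the actual implementation of the binary: ReadArgs and parse_read_args are hypothetical names, and the sketch assumes num-records is a non-negative integer.

```rust
use std::env;

/// Parsed arguments for a parquet-read style invocation (hypothetical type).
#[derive(Debug, PartialEq)]
struct ReadArgs {
    file_path: String,
    /// `None` means "read all records".
    num_records: Option<usize>,
}

/// Parses `<file-path> [num-records]` from the arguments that follow
/// the program name.
fn parse_read_args(args: &[String]) -> Result<ReadArgs, String> {
    match args {
        [path] => Ok(ReadArgs {
            file_path: path.clone(),
            num_records: None,
        }),
        [path, n] => {
            let count = n
                .parse::<usize>()
                .map_err(|e| format!("invalid num-records '{}': {}", n, e))?;
            Ok(ReadArgs {
                file_path: path.clone(),
                num_records: Some(count),
            })
        }
        // Zero arguments or more than two: show the usage line.
        _ => Err("Usage: parquet-read <file-path> [num-records]".to_string()),
    }
}

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    match parse_read_args(&args) {
        Ok(parsed) => println!("{:?}", parsed),
        Err(msg) => {
            eprintln!("{}", msg);
            std::process::exit(1);
        }
    }
}
```

Modeling the optional argument as Option<usize> keeps "not specified" distinct from an explicit count of zero.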
Benchmarks
Run cargo bench for benchmarks.
Docs
To build documentation, run cargo doc --no-deps.
To compile and view in the browser, run cargo doc --no-deps --open.
License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.