# An Apache Parquet implementation in Rust
## Usage
Add this to your Cargo.toml:
```toml
[dependencies]
parquet = "5.0.0-SNAPSHOT"
```
and this to your crate root:
```rust
extern crate parquet;
```
Example usage of reading data:
```rust
use std::fs::File;
use std::path::Path;
use parquet::file::reader::{FileReader, SerializedFileReader};
let file = File::open(&Path::new("/path/to/file")).unwrap();
let reader = SerializedFileReader::new(file).unwrap();
let mut iter = reader.get_row_iter(None).unwrap();
while let Some(record) = iter.next() {
println!("{}", record);
}
```
See [crate documentation](https://docs.rs/crate/parquet/5.0.0-SNAPSHOT) on available API.
## Upgrading from versions prior to 4.0
If you are upgrading from version 3.0 or previous of this crate, you
likely need to change your code to use [`ConvertedType`] rather than
[`LogicalType`] to preserve existing behaviour in your code.
Version 2.4.0 of the Parquet format introduced a `LogicalType` to replace the existing `ConvertedType`.
This crate used `parquet::basic::LogicalType` to map to the `ConvertedType`, but this has been renamed to `parquet::basic::ConvertedType` from version 4.0 of this crate.
The `ConvertedType` is deprecated in the format, but is still written
to preserve backward compatibility.
It is preferred that `LogicalType` is used, as it supports nanosecond
precision timestamps without using the deprecated `Int96` Parquet type.
## Supported Parquet Version
- Parquet-format 2.6.0
To update Parquet format to a newer version, check if [parquet-format](https://github.com/sunchao/parquet-format-rs)
version is available. Then simply update version of `parquet-format` crate in Cargo.toml.
## Features
- [X] All encodings supported
- [X] All compression codecs supported
- [X] Read support
- [X] Primitive column value readers
- [X] Row record reader
- [X] Arrow record reader
- [ ] Statistics support
- [X] Write support
- [X] Primitive column value writers
- [ ] Row record writer
- [X] Arrow record writer
- [ ] Predicate pushdown
- [X] Parquet format 2.6.0 support
## Requirements
Parquet requires LLVM. Our windows CI image includes LLVM but to build the libraries locally windows
users will have to install LLVM. Follow [this](https://github.com/appveyor/ci/issues/2651) link for info.
## Build
Run `cargo build` or `cargo build --release` to build in release mode.
Some features take advantage of SSE4.2 instructions, which can be
enabled by adding `RUSTFLAGS="-C target-feature=+sse4.2"` before the
`cargo build` command.
## Test
Run `cargo test` for unit tests. To also run tests related to the binaries, use `cargo test --features cli`.
## Binaries
The following binaries are provided (use `cargo install --features cli` to install them):
- **parquet-schema** for printing Parquet file schema and metadata.
`Usage: parquet-schema <file-path>`, where `file-path` is the path to a Parquet file. Use `-v/--verbose` flag
to print full metadata or schema only (when not specified only schema will be printed).
- **parquet-read** for reading records from a Parquet file.
`Usage: parquet-read <file-path> [num-records]`, where `file-path` is the path to a Parquet file,
and `num-records` is the number of records to read from a file (when not specified all records will
be printed). Use `-j/--json` to print records in JSON lines format.
- **parquet-rowcount** for reporting the number of records in one or more Parquet files.
`Usage: parquet-rowcount <file-paths>...`, where `<file-paths>...` is a space separated list of one or more
files to read.
If you see `Library not loaded` error, please make sure `LD_LIBRARY_PATH` is set properly:
```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib
```
## Benchmarks
Run `cargo bench` for benchmarks.
## Docs
To build documentation, run `cargo doc --no-deps`.
To compile and view in the browser, run `cargo doc --no-deps --open`.
## License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.