Provides an API for reading/writing Arrow RecordBatches and Arrays to/from Parquet files.

Apache Arrow is a cross-language development platform for in-memory data.
Example of writing an Arrow RecordBatch to a Parquet file
```rust
use arrow_array::{Int32Array, ArrayRef};
use arrow_array::RecordBatch;
use parquet::arrow::arrow_writer::ArrowWriter;
use parquet::file::properties::WriterProperties;
use std::fs::File;
use std::sync::Arc;

let ids = Int32Array::from(vec![1, 2, 3, 4]);
let vals = Int32Array::from(vec![5, 6, 7, 8]);
let batch = RecordBatch::try_from_iter(vec![
    ("id", Arc::new(ids) as ArrayRef),
    ("val", Arc::new(vals) as ArrayRef),
]).unwrap();

let file = File::create("data.parquet").unwrap();

// Default writer properties
let props = WriterProperties::builder().build();

let mut writer = ArrowWriter::try_new(file, batch.schema(), Some(props)).unwrap();

writer.write(&batch).expect("Writing batch");

// writer must be closed to write footer
writer.close().unwrap();
```
WriterProperties can be used to set Parquet file options:
```rust
use parquet::file::properties::{WriterProperties, WriterVersion};
use parquet::basic::{Compression, Encoding};

// File compression
let props = WriterProperties::builder()
    .set_compression(Compression::SNAPPY)
    .build();
```
Example of reading a Parquet file into an Arrow RecordBatch
```rust
use std::fs::File;
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

let file = File::open("data.parquet").unwrap();

let builder = ParquetRecordBatchReaderBuilder::try_new(file).unwrap();
println!("Converted arrow schema is: {}", builder.schema());

let mut reader = builder.build().unwrap();

let record_batch = reader.next().unwrap().unwrap();

println!("Read {} records.", record_batch.num_rows());
```
Re-exports

pub use self::arrow_reader::ArrowReader; (Deprecated)
pub use self::arrow_reader::ParquetFileArrowReader; (Deprecated)
pub use self::arrow_writer::ArrowWriter;
pub use self::async_reader::ParquetRecordBatchStreamBuilder;
Modules
Contains reader which reads parquet data into arrow
RecordBatch
Contains writer which writes arrow data into parquet data.
Provides
async
API for reading parquet files as
RecordBatch
esStructs
ProjectionMask: Identifies a set of columns within a potentially nested schema to project

Constants

ARROW_SCHEMA_META_KEY: Schema metadata key used to store serialized Arrow IPC schema
Functions

arrow_to_parquet_schema: Convert arrow schema to parquet schema
parquet_to_arrow_schema: Convert Parquet schema to Arrow schema, including optional metadata. Attempts to decode any existing Arrow schema metadata, falling back to converting the Parquet schema column-wise
parquet_to_arrow_schema_by_columns: Convert parquet schema to arrow schema including optional metadata, only preserving some leaf columns