Struct parquet::arrow::arrow_writer::ArrowWriter

pub struct ArrowWriter<W: Write> { /* private fields */ }

Arrow writer

Writes Arrow RecordBatches to a Parquet writer, buffering up RecordBatches in order to produce row groups with max_row_group_size rows. Any remaining rows will be flushed on close, so the final row group in the output file may contain fewer than max_row_group_size rows.
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array};
use arrow::record_batch::RecordBatch;
use bytes::Bytes;
use parquet::arrow::{ArrowReader, ArrowWriter, ParquetFileArrowReader};

let col = Arc::new(Int64Array::from_iter_values([1, 2, 3])) as ArrayRef;
let to_write = RecordBatch::try_from_iter([("col", col)]).unwrap();

// Write the batch to an in-memory buffer
let mut buffer = Vec::new();
let mut writer = ArrowWriter::try_new(&mut buffer, to_write.schema(), None).unwrap();
writer.write(&to_write).unwrap();
writer.close().unwrap();

// Read it back and check for equality
let mut reader = ParquetFileArrowReader::try_new(Bytes::from(buffer)).unwrap();
let mut reader = reader.get_record_reader(1024).unwrap();
let read = reader.next().unwrap().unwrap();

assert_eq!(to_write, read);
Implementations

impl<W: Write> ArrowWriter<W>
pub fn try_new(
    writer: W,
    arrow_schema: SchemaRef,
    props: Option<WriterProperties>
) -> Result<Self>

Try to create a new Arrow writer

The writer will fail if:
- a SerializedFileWriter cannot be created from the ParquetWriter
- the Arrow schema contains unsupported datatypes such as Unions
pub fn flushed_row_groups(&self) -> &[RowGroupMetaDataPtr]

Returns metadata for any flushed row groups
pub fn write(&mut self, batch: &RecordBatch) -> Result<()>

Enqueues the provided RecordBatch to be written

If, after this call, more than max_row_group_size rows are buffered, this will flush out one or more row groups with max_row_group_size rows, and drop any fully written RecordBatches
pub fn into_inner(self) -> Result<W>

Flushes any outstanding data and returns the underlying writer.
pub fn close(self) -> Result<FileMetaData>

Close and finalize the underlying Parquet writer