Trait parquet::arrow::arrow_reader::ArrowReader
pub trait ArrowReader {
    type RecordReader: RecordBatchReader;
    fn get_schema(&mut self) -> Result<Schema>;
    fn get_schema_by_columns<T>(
        &mut self,
        column_indices: T,
        leaf_columns: bool
    ) -> Result<Schema>
    where
        T: IntoIterator<Item = usize>;
    fn get_record_reader(
        &mut self,
        batch_size: usize
    ) -> Result<Self::RecordReader>;
    fn get_record_reader_by_columns<T>(
        &mut self,
        column_indices: T,
        batch_size: usize
    ) -> Result<Self::RecordReader>
    where
        T: IntoIterator<Item = usize>;
}
Arrow reader API. With this API, users can obtain the Arrow schema of a Parquet file and read Parquet data into Arrow arrays.
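Associated Types
type RecordReader: RecordBatchReader
The record batch reader type returned by get_record_reader and get_record_reader_by_columns.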
Required methods
fn get_schema(&mut self) -> Result<Schema>
Reads the Parquet schema and converts it into an Arrow schema.
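To make the conversion more concrete, here is a simplified sketch of the per-column type mapping involved. It assumes flat (non-nested) columns without logical-type annotations; PhysicalType, ArrowType, to_arrow and convert_schema are hypothetical stand-ins, not the parquet or arrow crate APIs. The pairings follow the usual correspondence for unannotated Parquet physical types; annotations such as UTF8 strings, decimals or timestamps change the resulting Arrow type and are left out here.

```rust
// Hypothetical, simplified model of Parquet physical types.
// Logical-type annotations (strings, decimals, dates, timestamps, ...) are ignored.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum PhysicalType {
    Boolean,
    Int32,
    Int64,
    Float,
    Double,
    ByteArray,
    FixedLenByteArray(i32),
}

// Hypothetical stand-in for the Arrow data types used below.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Boolean,
    Int32,
    Int64,
    Float32,
    Float64,
    Binary,
    FixedSizeBinary(i32),
}

// Maps one unannotated physical type to its Arrow counterpart.
fn to_arrow(t: PhysicalType) -> ArrowType {
    match t {
        PhysicalType::Boolean => ArrowType::Boolean,
        PhysicalType::Int32 => ArrowType::Int32,
        PhysicalType::Int64 => ArrowType::Int64,
        PhysicalType::Float => ArrowType::Float32,
        PhysicalType::Double => ArrowType::Float64,
        PhysicalType::ByteArray => ArrowType::Binary,
        PhysicalType::FixedLenByteArray(n) => ArrowType::FixedSizeBinary(n),
    }
}

// Converts a flat list of (name, physical type) pairs into (name, Arrow type) pairs.
fn convert_schema(fields: &[(&str, PhysicalType)]) -> Vec<(String, ArrowType)> {
    fields
        .iter()
        .map(|(name, t)| (name.to_string(), to_arrow(*t)))
        .collect()
}

fn main() {
    let parquet_fields = [
        ("id", PhysicalType::Int64),
        ("score", PhysicalType::Double),
        ("payload", PhysicalType::ByteArray),
    ];
    for (name, t) in convert_schema(&parquet_fields) {
        println!("{}: {:?}", name, t);
    }
}
```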
fn get_schema_by_columns<T>(
    &mut self,
    column_indices: T,
    leaf_columns: bool
) -> Result<Schema> where
    T: IntoIterator<Item = usize>,
Reads the Parquet schema and converts it into an Arrow schema that includes only the columns identified by column_indices.
When leaf_columns = false, the indices refer to top-level (root) columns. To select leaf columns instead (i.e. a.b.c instead of a), set leaf_columns = true.
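The difference between the two selection modes can be illustrated with a small, hypothetical schema model. Field, leaf_paths, select and example_schema are illustrative names, not part of the crate. The sketch assumes leaf indices count primitive columns in depth-first order, keeps the selected paths in schema order, and silently ignores out-of-range indices, whereas the real method returns a Result and may report such cases as errors.

```rust
// Hypothetical schema model: a field is either a primitive leaf or a group.
#[derive(Debug, Clone)]
enum Field {
    Leaf(String),
    Group(String, Vec<Field>),
}

// Appends the dotted paths of all leaves under `field` to `out`, depth-first.
fn leaf_paths(field: &Field, prefix: &str, out: &mut Vec<String>) {
    let name = match field {
        Field::Leaf(n) | Field::Group(n, _) => n,
    };
    let path = if prefix.is_empty() {
        name.clone()
    } else {
        format!("{}.{}", prefix, name)
    };
    match field {
        Field::Leaf(_) => out.push(path),
        Field::Group(_, children) => {
            for child in children {
                leaf_paths(child, &path, out);
            }
        }
    }
}

// Returns the leaf paths kept by a selection.
// leaf_columns == false: indices refer to top-level fields, and a selected
//   field keeps all of its nested leaves.
// leaf_columns == true: indices refer to leaves in depth-first order.
fn select<T>(roots: &[Field], column_indices: T, leaf_columns: bool) -> Vec<String>
where
    T: IntoIterator<Item = usize>,
{
    let indices: Vec<usize> = column_indices.into_iter().collect();
    let mut selected = Vec::new();
    if leaf_columns {
        let mut all = Vec::new();
        for root in roots {
            leaf_paths(root, "", &mut all);
        }
        for (i, path) in all.into_iter().enumerate() {
            if indices.contains(&i) {
                selected.push(path);
            }
        }
    } else {
        for (i, root) in roots.iter().enumerate() {
            if indices.contains(&i) {
                leaf_paths(root, "", &mut selected);
            }
        }
    }
    selected
}

// Example schema: a { b { c }, d }, e
fn example_schema() -> Vec<Field> {
    vec![
        Field::Group(
            "a".to_string(),
            vec![
                Field::Group("b".to_string(), vec![Field::Leaf("c".to_string())]),
                Field::Leaf("d".to_string()),
            ],
        ),
        Field::Leaf("e".to_string()),
    ]
}

fn main() {
    let roots = example_schema();
    // Root index 0 keeps all of `a`: ["a.b.c", "a.d"]
    println!("{:?}", select(&roots, vec![0], false));
    // Leaf index 0 keeps only `a.b.c`: ["a.b.c"]
    println!("{:?}", select(&roots, vec![0], true));
}
```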
fn get_record_reader(&mut self, batch_size: usize) -> Result<Self::RecordReader>
Returns a record batch reader over the whole Parquet file.
Arguments
batch_size: The number of records in each record batch returned from this reader. Only the last batch may contain fewer records than this; every other batch returned from this reader should contain exactly batch_size records.
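The batch_size contract can be expressed as a small function over batch lengths. This is only a model of the documented behaviour: expected_batch_lengths is a hypothetical helper, and the standard slice method chunks stands in for the reader because it has the same "full batches, shorter last batch" behaviour.

```rust
// Lengths of the batches a reader honouring the documented contract would
// produce for `total_rows` records: all equal to `batch_size` except possibly
// the last, which holds the remainder.
fn expected_batch_lengths(total_rows: usize, batch_size: usize) -> Vec<usize> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let full = total_rows / batch_size;
    let rest = total_rows % batch_size;
    let mut lengths = vec![batch_size; full];
    if rest > 0 {
        lengths.push(rest);
    }
    lengths
}

fn main() {
    // Stand-in for the rows of a file.
    let rows: Vec<u32> = (0..10).collect();
    let batch_size = 4;
    let observed: Vec<usize> = rows.chunks(batch_size).map(|b| b.len()).collect();
    assert_eq!(observed, expected_batch_lengths(rows.len(), batch_size));
    println!("{:?}", observed); // [4, 4, 2]
}
```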
fn get_record_reader_by_columns<T>(
    &mut self,
    column_indices: T,
    batch_size: usize
) -> Result<Self::RecordReader> where
    T: IntoIterator<Item = usize>,
Returns a record batch reader whose record batches contain only the columns identified by column_indices.
Arguments
column_indices: The columns that should be included in the record batches.
batch_size: See get_record_reader.
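A minimal in-memory sketch of column projection combined with batching is given below. Table, Batch and batches_by_columns are hypothetical; errors are modelled as String rather than the crate's error type, and the result is a Vec of batches rather than a RecordReader. Like the real signature, it accepts any IntoIterator<Item = usize> for column_indices, so a Vec, an array or a range all work. Selected columns are kept in table order regardless of the order of column_indices, a simplification chosen for this sketch.

```rust
// Hypothetical in-memory table: one Vec<i64> per column, all of equal length.
struct Table {
    names: Vec<String>,
    columns: Vec<Vec<i64>>,
}

// Hypothetical record batch: selected column names plus one row slice per column.
#[derive(Debug, PartialEq)]
struct Batch {
    names: Vec<String>,
    columns: Vec<Vec<i64>>,
}

impl Table {
    // Same parameter shape as get_record_reader_by_columns.
    fn batches_by_columns<T>(&self, column_indices: T, batch_size: usize) -> Result<Vec<Batch>, String>
    where
        T: IntoIterator<Item = usize>,
    {
        if batch_size == 0 {
            return Err("batch_size must be non-zero".to_string());
        }
        let indices: Vec<usize> = column_indices.into_iter().collect();
        if let Some(&bad) = indices.iter().find(|&&i| i >= self.columns.len()) {
            return Err(format!("column index {} out of range", bad));
        }
        // Keep selected columns in table order; duplicate indices collapse.
        let selected: Vec<usize> = (0..self.columns.len())
            .filter(|i| indices.contains(i))
            .collect();
        let num_rows = self.columns.first().map_or(0, |c| c.len());
        let mut batches = Vec::new();
        let mut start = 0;
        while start < num_rows {
            // Every batch holds batch_size rows except possibly the last.
            let end = (start + batch_size).min(num_rows);
            batches.push(Batch {
                names: selected.iter().map(|&i| self.names[i].clone()).collect(),
                columns: selected
                    .iter()
                    .map(|&i| self.columns[i][start..end].to_vec())
                    .collect(),
            });
            start = end;
        }
        Ok(batches)
    }
}

fn main() {
    let table = Table {
        names: vec!["id".to_string(), "x".to_string(), "y".to_string()],
        columns: vec![vec![1, 2, 3], vec![10, 20, 30], vec![100, 200, 300]],
    };
    // An array, a Vec or a range can all be passed as column_indices.
    let batches = table.batches_by_columns([2, 0], 2).unwrap();
    for batch in &batches {
        println!("{:?}", batch);
    }
}
```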