Struct polars_io::parquet::ParquetWriter
pub struct ParquetWriter<W> { /* private fields */ }
Available on crate feature only.
Write a DataFrame to parquet format
Implementations
impl<W> ParquetWriter<W>
where
    W: Write,
pub fn with_compression(self, compression: ParquetCompression) -> Self
Set the compression used. Defaults to Lz4Raw.
The default compression Lz4Raw has very good performance, but may not yet be supported
by older readers. If you want stronger compatibility guarantees, consider using Snappy.
pub fn with_statistics(self, statistics: bool) -> Self
Compute and write statistics.
pub fn with_row_group_size(self, size: Option<usize>) -> Self
Set the row group size (in number of rows) during writing. This can reduce memory pressure and improve writing performance.
pub fn with_data_pagesize_limit(self, limit: Option<usize>) -> Self
Sets the maximum size of a data page in bytes. If None, the limit defaults to 1024^2 bytes (1 MiB).
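The builder methods above all take `self` by value and return `Self`, so they can be chained. The following is a minimal, standard-library-only sketch of that pattern and of how a `None` page-size limit resolves to 1024^2 bytes. `ToyCompression`, `ToyWriterOptions`, `effective_pagesize_limit`, and `example_options` are hypothetical stand-ins invented for illustration and are not part of polars; the `false` default for statistics is an arbitrary choice for the toy, not a statement about the library.

```rust
// Toy stand-in for illustration only; none of these types are part of polars.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ToyCompression {
    Lz4Raw,
    Snappy,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToyWriterOptions {
    pub compression: ToyCompression,
    pub statistics: bool,
    pub row_group_size: Option<usize>,
    pub data_pagesize_limit: Option<usize>,
}

impl ToyWriterOptions {
    pub fn new() -> Self {
        Self {
            // Mirrors the documented default compression.
            compression: ToyCompression::Lz4Raw,
            // Arbitrary default for the toy; the real default is not stated above.
            statistics: false,
            row_group_size: None,
            data_pagesize_limit: None,
        }
    }

    // Each setter consumes the builder and hands it back, which enables chaining.
    pub fn with_compression(mut self, compression: ToyCompression) -> Self {
        self.compression = compression;
        self
    }

    pub fn with_statistics(mut self, statistics: bool) -> Self {
        self.statistics = statistics;
        self
    }

    pub fn with_row_group_size(mut self, size: Option<usize>) -> Self {
        self.row_group_size = size;
        self
    }

    pub fn with_data_pagesize_limit(mut self, limit: Option<usize>) -> Self {
        self.data_pagesize_limit = limit;
        self
    }

    // Hypothetical helper: resolves `None` to the documented 1024^2 bytes.
    pub fn effective_pagesize_limit(&self) -> usize {
        self.data_pagesize_limit.unwrap_or(1024 * 1024)
    }
}

// Chained configuration: Snappy for compatibility, statistics on,
// and a row group size expressed in rows.
pub fn example_options() -> ToyWriterOptions {
    ToyWriterOptions::new()
        .with_compression(ToyCompression::Snappy)
        .with_statistics(true)
        .with_row_group_size(Some(512 * 1024))
}
```

Because every setter consumes the builder, options that are never set keep their defaults; here the page-size limit stays `None` and resolves to 1 MiB.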
pub fn batched(self, schema: &Schema) -> PolarsResult<BatchedWriter<W>>
Examples found in repository?
src/parquet/write.rs (line 184)
pub fn finish(self, df: &mut DataFrame) -> PolarsResult<u64> {
    // ensures all chunks are aligned.
    df.rechunk();
    if let Some(n) = self.row_group_size {
        let n_splits = df.height() / n;
        if n_splits > 0 {
            *df = accumulate_dataframes_vertical_unchecked(split_df(df, n_splits)?);
        }
    };
    let mut batched = self.batched(&df.schema())?;
    batched.write_batch(df)?;
    batched.finish()
}
pub fn finish(self, df: &mut DataFrame) -> PolarsResult<u64>
Write the given DataFrame to the writer W. Returns the total size of the file.
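The row-group handling in the `finish` source shown above comes down to one integer division: the frame height divided by the requested row group size gives the number of pieces to split into, and no split happens when that quotient is zero. Here is a small standard-library-only sketch of that arithmetic. `planned_splits` is a hypothetical helper, not a polars function, and the guard against a zero row group size is an addition of this sketch.

```rust
/// Returns the number of pieces the frame would be split into,
/// or `None` when it would be written without splitting.
/// Hypothetical helper for illustration; not part of polars.
pub fn planned_splits(height: usize, row_group_size: Option<usize>) -> Option<usize> {
    // No row group size configured: nothing to split.
    let n = row_group_size?;
    // Integer division, as in `df.height() / n`. `checked_div` yields `None`
    // for n == 0; this guard is an addition of the sketch.
    let n_splits = height.checked_div(n)?;
    // Frames shorter than one row group are left as they are.
    if n_splits > 0 {
        Some(n_splits)
    } else {
        None
    }
}
```

For example, a frame of 10 rows with a row group size of 3 yields `Some(3)`, while a frame of 2 rows with the same setting yields `None` and is written as a single batch.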