Struct parquet::arrow::arrow_reader::RowFilter

pub struct RowFilter { /* private fields */ }

A RowFilter allows pushing down a filter predicate to skip IO and decoding.

This consists of a list of ArrowPredicate where only the rows that satisfy all of the predicates will be returned. Any RowSelection will be applied prior to the first predicate, and each predicate in turn will then be used to compute a more refined RowSelection to use when evaluating the subsequent predicates.

Once all predicates have been evaluated, the final RowSelection is applied to the top-level ProjectionMask to produce the final output RecordBatch.
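The cascade described above can be modelled with the standard library alone. The sketch below is not the parquet API: `ToyPredicate` and `apply_filter` are hypothetical names, a selection is simplified to a list of row indices, and columns are plain `Vec<i64>`. It only illustrates how each predicate sees the rows that survived the previous ones, and how many values each predicate would have to decode.

```rust
/// Toy stand-in for an `ArrowPredicate` (hypothetical type, not the parquet one):
/// the single column it reads and the test it applies to each value.
pub struct ToyPredicate {
    pub column: usize,
    pub keep: fn(i64) -> bool,
}

/// Evaluates `predicates` in order, starting from `initial`, which plays the
/// role of a `RowSelection` simplified to a list of row indices.
///
/// Returns the final selection together with the number of values each
/// predicate had to read, i.e. the size of the selection before it ran.
pub fn apply_filter(
    columns: &[Vec<i64>],
    initial: Vec<usize>,
    predicates: &[ToyPredicate],
) -> (Vec<usize>, Vec<usize>) {
    let mut selection = initial;
    let mut decoded = Vec::with_capacity(predicates.len());
    for p in predicates {
        // Only rows still selected are read for this predicate's column.
        decoded.push(selection.len());
        // Refine the selection: keep the rows that satisfy this predicate.
        selection.retain(|&row| (p.keep)(columns[p.column][row]));
    }
    (selection, decoded)
}
```

Calling `apply_filter` with two predicates over five rows, where the first keeps three rows, means the second predicate only reads three values from its column. The returned selection corresponds to the rows that would then be decoded for the output projection.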

This design has a couple of implications:

RowFilter can be used to skip entire pages, and thus IO, in addition to CPU decode overheads
Columns may be decoded multiple times if they appear in more than one ProjectionMask
IO will be deferred until needed by a ProjectionMask

As such, there is a trade-off between a single large predicate and multiple smaller predicates, and the better choice depends on the shape of the data. Whilst multiple smaller predicates may minimise the amount of data scanned/decoded, they may not be faster overall.

For example, if a predicate that needs a single column of data filters out all but 1% of the rows, applying it as one of the early ArrowPredicateFn will likely significantly improve performance.

As a counterexample, if a predicate needs several columns of data to evaluate but leaves 99% of the rows, it may be better not to filter the data from parquet and instead apply the filter after the RecordBatch has been fully decoded.

Additionally, even if a predicate eliminates a moderate number of rows, it may still be faster to filter the data after the RecordBatch has been fully decoded, if the eliminated rows are not contiguous.
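To see why contiguity matters, it helps to view a selection as alternating runs of skipped and selected rows. The sketch below is a standard-library model, not the parquet implementation: `Run` and `mask_to_runs` are hypothetical names, loosely mirroring the skip/select run shape of a row selection. Two masks that keep the same number of rows can produce very different run counts, and more runs means more switching between skipping and decoding.

```rust
/// Toy run of consecutive rows (hypothetical type): `skip == true` means the
/// rows are skipped, otherwise they are selected for decoding.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Run {
    pub row_count: usize,
    pub skip: bool,
}

/// Collapses a per-row keep/drop mask into alternating skip/select runs.
pub fn mask_to_runs(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &keep in mask {
        let skip = !keep;
        // Extend the current run if this row has the same skip/select state.
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        // Otherwise start a new run of length one.
        runs.push(Run { row_count: 1, skip });
    }
    runs
}
```

A mask that keeps the first half of the rows collapses to two runs, while a mask that keeps every other row yields one run per row, even though both eliminate the same fraction of the data.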

Implementations§

impl RowFilter

pub fn new(predicates: Vec<Box<dyn ArrowPredicate>>) -> Self

Auto Trait Implementations§

impl Freeze for RowFilter

impl !RefUnwindSafe for RowFilter

impl Send for RowFilter

impl !Sync for RowFilter

impl Unpin for RowFilter

impl !UnwindSafe for RowFilter

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
