pub struct LazyFrame {
    pub logical_plan: LogicalPlan,
    /* private fields */
}

A lazy abstraction over an eager DataFrame. It is, in reality, an abstraction over a LogicalPlan. The methods of this struct incrementally modify the logical plan until output is requested (via collect).

Fields

logical_plan: LogicalPlan

Implementations

Get a dot language representation of the LogicalPlan.

Create a LazyFrame directly from an IPC scan.

👎Deprecated: please use concat_lf instead

Create a LazyFrame directly from a parquet scan.

Create a LazyFrame directly from a parquet scan.

Get a hold on the schema of the current LazyFrame computation.

Set allowed optimizations.

Turn off all optimizations.

Toggle projection pushdown optimization.

Toggle predicate pushdown optimization.

Toggle type coercion optimization.

Toggle expression simplification optimization.

Toggle aggregate pushdown optimization.

Toggle slice pushdown optimization.

Describe the logical plan.

Describe the optimized logical plan.

Add a sort operation to the logical plan.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

/// Sort DataFrame by 'sepal.width' column
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .sort("sepal.width", Default::default())
}

Add a sort operation to the logical plan.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

/// Sort DataFrame by 'sepal.width' column
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .sort_by_exprs(vec![col("sepal.width")], vec![false], false)
}

Reverse the DataFrame.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .reverse()
}

Rename columns in the DataFrame.

Removes columns from the DataFrame. Note that it's better to select only the columns you need and let projection pushdown optimize away the unneeded columns.

Shift the values by a given period and fill the parts that will be empty due to this operation with nulls.

See the method on Series for more info on the shift operation.

Shift the values by a given period and fill the parts that will be empty due to this operation with the result of the fill_value expression.

See the method on Series for more info on the shift operation.

Fill null values in the DataFrame.

Fill NaN values in the DataFrame.

Caches the result into a new LazyFrame. This should be used to prevent computations running multiple times.

Fetch is like a collect operation, but it overwrites the number of rows read by every scan operation. This is a utility that helps debug a query on a smaller number of rows.

Note that fetch does not guarantee the final number of rows in the DataFrame. Filters, joins, and fewer rows being available in the scanned file all influence the final number of rows.

Execute all the lazy operations and collect them into a DataFrame. Before execution, the query is optimized.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.lazy()
      .groupby([col("foo")])
      .agg([col("bar").sum(), col("ham").mean().alias("avg_ham")])
      .collect()
}

Filter by some predicate expression.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .filter(col("sepal.width").is_not_null())
        .select(&[col("sepal.width"), col("sepal.length")])
}

Select (and rename) columns from the query.

Columns can be selected with col. To select all columns, use col("*").

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

/// This function selects column "foo" and column "bar".
/// Column "bar" is renamed to "ham".
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .select(&[col("foo"), col("bar").alias("ham")])
}

/// This function selects all columns except "foo"
fn exclude_a_column(df: DataFrame) -> LazyFrame {
    df.lazy()
        .select(&[col("*").exclude(["foo"])])
}

Group by and aggregate.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
use polars_arrow::prelude::QuantileInterpolOptions;

fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .groupby([col("date")])
        .agg([
            col("rain").min(),
            col("rain").sum(),
            col("rain").quantile(0.5, QuantileInterpolOptions::Nearest).alias("median_rain"),
        ])
}

Create rolling groups based on a time column.

Also works for index values of type Int32 or Int64.

Unlike groupby_dynamic, the windows are determined by the individual values and are not of constant intervals. For constant intervals, use groupby_dynamic.

Group based on a time value (or index value of type Int32, Int64).

Time windows are calculated and rows are assigned to windows. Unlike a normal groupby, a row can be a member of multiple groups. The time/index window can be seen as a rolling window, with a window size determined by dates/times/values instead of slots in the DataFrame.

A window is defined by:

  • every: interval of the window
  • period: length of the window
  • offset: offset of the window

The by argument should be empty ([]) if you don't want to combine this with an ordinary groupby on these keys.

Similar to groupby, but order of the DataFrame is maintained.

Join query with other lazy query.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf.left_join(other, col("foo"), col("bar"))
}

Join query with other lazy query.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf.outer_join(other, col("foo"), col("bar"))
}

Join query with other lazy query.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf.inner_join(other, col("foo"), col("bar").cast(DataType::Utf8))
}

Creates the Cartesian product of both frames, preserving the order of the left keys.

Generic join function that can join on multiple columns.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf.join(other, [col("foo"), col("bar")], [col("foo"), col("bar")], JoinType::Inner)
}

Control more join options with the join builder.

Add a column to a DataFrame.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn add_column(df: DataFrame) -> LazyFrame {
    df.lazy()
        .with_column(
            when(col("sepal.length").lt(lit(5.0)))
                .then(lit(10))
                .otherwise(lit(1))
                .alias("new_column_name"),
        )
}

Add multiple columns to a DataFrame.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn add_columns(df: DataFrame) -> LazyFrame {
    df.lazy()
        .with_columns(vec![lit(10).alias("foo"), lit(100).alias("bar")])
}

Aggregate all the columns as their maximum values.

Aggregate all the columns as their minimum values.

Aggregate all the columns as their sum values.

Aggregate all the columns as their mean values.

Aggregate all the columns as their median values.

Aggregate all the columns as their quantile values.

Aggregate all the columns as their standard deviation values.

Aggregate all the columns as their variance values.

Apply explode operation. See eager explode.

Keep unique rows and maintain order.

Keep unique rows; do not maintain order.

Drop null rows.

Equal to LazyFrame::filter(col("*").is_not_null())

Slice the DataFrame.

Get the first row.

Get the last row.

Get the last n rows.

Melt the DataFrame from wide to long format.

Limit the DataFrame to the first n rows. Note that if you don’t want the rows to be scanned, use fetch.

Apply a function/closure once the logical plan gets executed.

Warning

This can blow up in your face if the schema is changed due to the operation. The optimizer relies on a correct schema.

You can toggle certain optimizations off.

Add a new column at index 0 that counts the rows.

Warning

This can have a negative effect on query performance; for instance, it may block predicate pushdown optimization.

Unnest the given Struct columns. This means that the fields of the Struct type will be inserted as columns.

Trait Implementations

Returns a copy of the value.
Performs copy-assignment from source.
Returns the “default value” for a type.
Converts to this type from the input type.
