Struct polars_lazy::frame::LazyFrame

pub struct LazyFrame { /* fields omitted */ }

Lazy abstraction over an eager DataFrame. Under the hood it is an abstraction over a logical plan: each method incrementally modifies the plan, and nothing is executed until output is requested (via collect).
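As an illustration of this design, here is a minimal toy model in plain Rust. It is a hypothetical sketch, not the polars_lazy implementation: `Plan`, `ToyLazyFrame`, and `execute` are invented names, and the plan holds a single column of i64 values. Builder methods only wrap the existing plan in a new node; data is touched only when `collect` walks the plan.

```rust
// Hypothetical toy model, not the polars_lazy API: every node wraps its
// input plan, and nothing is computed until `collect` is called.
enum Plan {
    // Source data: a single column of integers.
    Scan(Vec<i64>),
    // Keep the values for which the predicate returns true.
    Filter(Box<Plan>, fn(i64) -> bool),
    // Replace every value by the result of the function.
    Map(Box<Plan>, fn(i64) -> i64),
}

struct ToyLazyFrame {
    plan: Plan,
}

impl ToyLazyFrame {
    fn new(data: Vec<i64>) -> Self {
        ToyLazyFrame { plan: Plan::Scan(data) }
    }

    // Builder methods only extend the plan; no data is touched here.
    fn filter(self, predicate: fn(i64) -> bool) -> Self {
        ToyLazyFrame { plan: Plan::Filter(Box::new(self.plan), predicate) }
    }

    fn map(self, f: fn(i64) -> i64) -> Self {
        ToyLazyFrame { plan: Plan::Map(Box::new(self.plan), f) }
    }

    // `collect` is the point where the recorded plan is finally executed.
    fn collect(&self) -> Vec<i64> {
        execute(&self.plan)
    }
}

// Recursively evaluate a plan, innermost node (the scan) first.
fn execute(plan: &Plan) -> Vec<i64> {
    match plan {
        Plan::Scan(data) => data.clone(),
        Plan::Filter(input, predicate) => {
            execute(input).into_iter().filter(|&v| predicate(v)).collect()
        }
        Plan::Map(input, f) => execute(input).into_iter().map(|v| f(v)).collect(),
    }
}
```

For example, `ToyLazyFrame::new(vec![1, 2, 3, 4, 5]).filter(|v| v % 2 == 1).map(|v| v * 10).collect()` returns `[10, 30, 50]`.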

Implementations

Create a LazyFrame directly from a parquet scan.

Get a representation of the LogicalPlan in the Graphviz dot language.

Toggle projection pushdown optimization.
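To show what projection pushdown saves, the following toy sketch compares a scan that reads every column with one that reads only the columns a later select needs. The names `Table`, `scan_columns`, and `select_after_scan` are hypothetical and this is not the polars optimizer; both paths produce the same result, and only the amount of data read differs.

```rust
use std::collections::BTreeMap;

// Hypothetical toy, not the polars optimizer: a table maps column names to values.
type Table = BTreeMap<String, Vec<i64>>;

// Read `source`, optionally restricted to the named columns.
fn scan_columns(source: &Table, columns: Option<&[&str]>) -> Table {
    match columns {
        // Projection pushed into the scan: only the requested columns are read.
        Some(cols) => cols
            .iter()
            .filter_map(|c| source.get(*c).map(|v| (c.to_string(), v.clone())))
            .collect(),
        // No pushdown: every column is read.
        None => source.clone(),
    }
}

// A select on top of a scan. Returns the result and the number of columns
// the scan had to read, to make the saving visible.
fn select_after_scan(source: &Table, cols: &[&str], projection_pushdown: bool) -> (Table, usize) {
    let scanned = if projection_pushdown {
        scan_columns(source, Some(cols))
    } else {
        scan_columns(source, None)
    };
    let columns_read = scanned.len();
    // The projection itself still runs on whatever the scan produced.
    let result = scan_columns(&scanned, Some(cols));
    (result, columns_read)
}
```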

Toggle predicate pushdown optimization.

Toggle type coercion optimization.

Toggle expression simplification optimization.

Toggle aggregate pushdown.

Toggle global string cache.

Toggle join pruning optimization.

Describe the logical plan.

Describe the optimized logical plan.

Add a sort operation on a single column to the logical plan.

Example

use polars_core::prelude::*;
use polars_lazy::prelude::*;

/// Sort DataFrame by 'sepal.width' column
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .sort("sepal.width", false)
}

Add a sort operation on one or more expressions to the logical plan.

Example

use polars_core::prelude::*;
use polars_lazy::prelude::*;

/// Sort DataFrame by 'sepal.width' column
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .sort_by_exprs(vec![col("sepal.width")], vec![false])
}

Reverse the DataFrame.

Example

use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .reverse()
}

Rename a column in the DataFrame.

Shift the values by a given period and fill the parts that will be empty due to this operation with None values.

See the method on Series for more info on the shift operation.

Shift the values by a given period and fill the parts that will be empty due to this operation with the result of the fill_value expression.

See the method on Series for more info on the shift operation.
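The shift semantics can be sketched on a single column of optional values. This is a hypothetical helper, not the polars API, and it assumes that a positive period moves values toward the end of the column (a negative one toward the start); the vacated slots are filled with `None` or with a given fill value.

```rust
// Hypothetical helper, not the polars API: shift `values` by `periods`
// (positive = toward the end, negative = toward the start) and put `fill`
// into every slot that no original value lands in.
fn shift_and_fill(values: &[Option<i64>], periods: i64, fill: Option<i64>) -> Vec<Option<i64>> {
    let n = values.len() as i64;
    (0..n)
        .map(|i| {
            // Index of the original value that ends up at position `i`.
            let src = i - periods;
            if src >= 0 && src < n {
                values[src as usize]
            } else {
                fill
            }
        })
        .collect()
}

// Plain shift: the vacated slots become `None`.
fn shift(values: &[Option<i64>], periods: i64) -> Vec<Option<i64>> {
    shift_and_fill(values, periods, None)
}
```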

Fill None (null) values in the DataFrame.

Cache the result into a new LazyFrame. Use this to prevent the same computation from running multiple times.
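The idea behind caching can be sketched as memoization: the first request computes the result, and later requests reuse it. `CachedNode` and its run counter are hypothetical illustration names; this does not reproduce how polars caches a subplan internally.

```rust
use std::cell::{Cell, RefCell};

// Hypothetical sketch, not the polars API: a node that runs its computation
// at most once and returns clones of the stored result afterwards.
struct CachedNode {
    compute: fn() -> Vec<i64>,
    cached: RefCell<Option<Vec<i64>>>,
    // Counts how many times `compute` actually ran, for illustration.
    runs: Cell<usize>,
}

impl CachedNode {
    fn new(compute: fn() -> Vec<i64>) -> Self {
        CachedNode { compute, cached: RefCell::new(None), runs: Cell::new(0) }
    }

    fn get(&self) -> Vec<i64> {
        // Reuse the stored result if the computation already ran.
        if let Some(result) = &*self.cached.borrow() {
            return result.clone();
        }
        self.runs.set(self.runs.get() + 1);
        let result = (self.compute)();
        *self.cached.borrow_mut() = Some(result.clone());
        result
    }
}
```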

Fetch is like a collect operation, but it overwrites the number of rows read by every scan operation. This is a utility that helps debug a query on a smaller number of rows.

Note that fetch does not guarantee the final number of rows in the DataFrame. Filters, joins, and a scanned file that has fewer rows than requested all influence the final number of rows.
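A small sketch of why fetch does not fix the final row count: the row cap applies to the scan, and a filter applied afterwards can still remove rows. The functions `scan` and `run_query` are hypothetical and only model a single integer column.

```rust
// Hypothetical sketch, not the polars API: `max_rows` plays the role of the
// fetch limit and caps how many rows the scan reads.
fn scan(source: &[i64], max_rows: Option<usize>) -> Vec<i64> {
    match max_rows {
        Some(n) => source.iter().take(n).copied().collect(),
        None => source.to_vec(),
    }
}

// A query: scan the source, then keep only the even values.
fn run_query(source: &[i64], fetch_rows: Option<usize>) -> Vec<i64> {
    scan(source, fetch_rows).into_iter().filter(|v| v % 2 == 0).collect()
}
```

With the source values 1 to 10, `run_query(&source, Some(4))` scans four rows but returns only `[2, 4]`, and asking for 100 rows still yields just the 5 even values.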

Execute all the lazy operations and collect them into a DataFrame. The query is optimized before execution.

Example

use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(df: DataFrame) -> Result<DataFrame> {
    df.lazy()
        .groupby(vec![col("foo")])
        .agg(vec![
            col("bar").sum(),
            col("ham").mean().alias("avg_ham"),
        ])
        .collect()
}

Filter by some predicate expression.

Example

use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .filter(col("sepal.width").is_not_null())
        .select(&[col("sepal.width"), col("sepal.length")])
}

Select (and rename) columns from the query.

Columns can be selected with col. To select all columns, use col("*").

Example

use polars_core::prelude::*;
use polars_lazy::prelude::*;

/// This function selects column "foo" and column "bar".
/// Column "bar" is renamed to "ham".
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .select(&[col("foo"), col("bar").alias("ham")])
}

/// This function selects all columns except "foo".
fn exclude_a_column(df: DataFrame) -> LazyFrame {
    df.lazy()
        .select(&[col("*"), except("foo")])
}

Group by and aggregate.

Example

use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .groupby(vec![col("date")])
        .agg(vec![
            col("rain").min(),
            col("rain").sum(),
            col("rain").quantile(0.5).alias("median_rain"),
        ])
        .sort("date", false)
}

Left join this query with another lazy query.

Example

use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .left_join(other, col("foo"), col("bar"))
}

Outer join this query with another lazy query.

Example

use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .outer_join(other, col("foo"), col("bar"))
}

Inner join this query with another lazy query.

Example

use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .inner_join(other, col("foo"), col("bar").cast(DataType::Utf8))
}

Generic join function that can join on multiple columns.

Example

use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf.join(
        other,
        vec![col("foo"), col("bar")],
        vec![col("foo"), col("bar")],
        JoinType::Inner,
    )
}

Add a column to a DataFrame.

Example

use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn add_column(df: DataFrame) -> LazyFrame {
    df.lazy()
        .with_column(
            when(col("sepal.length").lt(lit(5.0)))
                .then(lit(10))
                .otherwise(lit(1))
                .alias("new_column_name"),
        )
}

Add multiple columns to a DataFrame.

Example

use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn add_columns(df: DataFrame) -> LazyFrame {
    df.lazy()
        .with_columns(vec![lit(10).alias("foo"), lit(100).alias("bar")])
}

Aggregate all the columns as their maximum values.

Aggregate all the columns as their minimum values.

Aggregate all the columns as their sums.

Aggregate all the columns as their mean values.

Aggregate all the columns as their median values.

Aggregate all the columns as their quantile values.

Aggregate all the columns as their standard deviation values.

Aggregate all the columns as their variance values.
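Each of these aggregations reduces every column to a single value. A hypothetical sketch over a map of named f64 columns follows; it is not the polars API, and median, quantile, standard deviation, and variance are left out because their exact definitions (interpolation, degrees of freedom) are not specified here.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch, not the polars API: apply one reduction to every
// column and return one value per column name.
fn aggregate_all(table: &BTreeMap<&str, Vec<f64>>, agg: fn(&[f64]) -> f64) -> BTreeMap<String, f64> {
    table
        .iter()
        .map(|(name, values)| (name.to_string(), agg(values.as_slice())))
        .collect()
}

fn agg_max(values: &[f64]) -> f64 {
    values.iter().copied().fold(f64::NEG_INFINITY, f64::max)
}

fn agg_sum(values: &[f64]) -> f64 {
    values.iter().sum()
}

// Arithmetic mean; an empty column yields NaN in this sketch.
fn agg_mean(values: &[f64]) -> f64 {
    agg_sum(values) / values.len() as f64
}
```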

Apply the explode operation. See the eager explode method on DataFrame.
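The explode operation can be sketched on a two-column table where one column holds lists: each list element becomes its own row and the value of the other column is repeated for it. The function below is hypothetical, not the polars API, and it does not model how empty or null lists are handled.

```rust
// Hypothetical sketch, not the polars API: `keys` is a plain column and
// `lists` is a list column of the same length.
fn explode(keys: &[&str], lists: &[Vec<i64>]) -> (Vec<String>, Vec<i64>) {
    let mut out_keys = Vec::new();
    let mut out_values = Vec::new();
    for (key, list) in keys.iter().zip(lists) {
        // One output row per list element; the key is repeated.
        for &value in list {
            out_keys.push(key.to_string());
            out_values.push(value);
        }
    }
    (out_keys, out_values)
}
```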

Drop duplicate rows. See the equivalent eager method on DataFrame.
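A hypothetical sketch of duplicate-row removal, not the polars API. Rows are any hashable values (for example tuples); keeping the first occurrence and preserving input order are choices made in this sketch, not claims about the polars defaults.

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Hypothetical sketch, not the polars API: keep the first occurrence of
// each row and drop later duplicates, preserving the original order.
fn drop_duplicates<T: Clone + Eq + Hash>(rows: &[T]) -> Vec<T> {
    let mut seen = HashSet::new();
    // `insert` returns false when the row was already seen.
    rows.iter().filter(|row| seen.insert((*row).clone())).cloned().collect()
}
```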

Drop rows that contain null values.

Equivalent to LazyFrame::filter(col("*").is_not_null()).
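The row-wise null check can be sketched with rows of optional values, keeping a row only if none of its values is `None`. This is a hypothetical helper, not the polars API.

```rust
// Hypothetical sketch, not the polars API: each row is a vector of
// optional values, and `None` stands for null.
fn drop_nulls(rows: &[Vec<Option<i64>>]) -> Vec<Vec<Option<i64>>> {
    rows.iter()
        // Keep a row only if every value in it is present.
        .filter(|row| row.iter().all(|v| v.is_some()))
        .cloned()
        .collect()
}
```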

Slice the DataFrame.

Get the first row.

Get the last row.

Get the last n rows.

Melt the DataFrame from wide to long format.
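A wide-to-long melt can be sketched as follows: each value column is unpivoted into (id, variable name, value) triples. The function and its tuple layout are hypothetical, not the polars API.

```rust
// Hypothetical sketch, not the polars API. `ids` is the identifier column
// kept on every output row; each (name, values) pair in `value_columns`
// contributes one (id, name, value) triple per input row.
fn melt(ids: &[&str], value_columns: &[(&str, Vec<i64>)]) -> Vec<(String, String, i64)> {
    let mut long = Vec::new();
    for (name, values) in value_columns {
        for (id, value) in ids.iter().zip(values) {
            long.push((id.to_string(), name.to_string(), *value));
        }
    }
    long
}
```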

Limit the DataFrame to the first n rows. If you don't want all rows to be scanned, use fetch instead.

Apply a function/closure once the logical plan gets executed.

Warning

This can fail in unexpected ways if the operation changes the schema, because the optimizer relies on a correct schema.

You can toggle certain optimizations off.

Trait Implementations

Returns a copy of the value.

Performs copy-assignment from source.

Returns the “default value” for a type.

Performs the conversion.
