//! Lazy API of Polars
//!
//! The lazy API of Polars supports a subset of the eager API. Apart from distributed compute,
//! it is very similar to [Apache Spark](https://spark.apache.org/). You write queries in a
//! domain-specific language. These queries translate to a logical plan, which represents your query steps.
//! Before execution, this logical plan is optimized: the order of operations may change if that improves performance,
//! and implicit type casts may be added so that the query does not fail with a type error (where the error can be resolved).
//!
//! # Lazy DSL
//!
//! The lazy API of polars replaces the eager [`DataFrame`] with the [`LazyFrame`], through which
//! the lazy API is exposed.
//! The [`LazyFrame`] represents a logical execution plan: a sequence of operations to perform on a concrete data source.
//! These operations are not executed until we call [`collect`].
//! This allows polars to optimize/reorder the query which may lead to faster queries or fewer type errors.
//!
//! [`DataFrame`]: polars_core::frame::DataFrame
//! [`LazyFrame`]: crate::frame::LazyFrame
//! [`collect`]: crate::frame::LazyFrame::collect
//!
//! In general, a [`LazyFrame`] requires a concrete data source (a [`DataFrame`], a file on disk, etc.), to which
//! polars-lazy applies the user-specified sequence of operations.
//! To obtain a [`LazyFrame`] from an existing [`DataFrame`], we call the [`lazy`](crate::frame::IntoLazy::lazy) method on
//! the [`DataFrame`].
//! A [`LazyFrame`] can also be obtained through the lazy versions of file readers, such as [`LazyCsvReader`](crate::frame::LazyCsvReader).
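//!
//! As a minimal sketch of this workflow, the snippet below turns a [`DataFrame`] into a [`LazyFrame`],
//! adds a filter and a projection to the plan, and runs the query only when [`collect`] is called.
//! The helper name `small_values` and the column names are made up for illustration.
//!
//! ```rust
//! use polars_core::df;
//! use polars_core::prelude::*;
//! use polars_lazy::prelude::*;
//!
//! // Hypothetical helper: keep rows where `column_a` < 3 and return only `column_b`.
//! fn small_values() -> PolarsResult<DataFrame> {
//!     let df = df!(
//!         "column_a" => [1, 2, 3, 4, 5],
//!         "column_b" => ["a", "b", "c", "d", "e"]
//!     )?;
//!
//!     // Building the plan does not run the query.
//!     let plan = df
//!         .lazy()
//!         .filter(col("column_a").lt(lit(3)))
//!         .select([col("column_b")]);
//!
//!     // The query is optimized and executed only here.
//!     plan.collect()
//! }
//! ```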
//!
//! The other major component of the polars lazy API is [`Expr`](crate::dsl::Expr), which represents an operation to be
//! performed on a [`LazyFrame`], such as mapping over a column, filtering, or a group-by aggregation.
//! [`Expr`]s and the functions that produce them can be found in the [dsl module](crate::dsl).
//!
//! [`Expr`]: crate::dsl::Expr
//!
//! Most operations on a [`LazyFrame`] consume the [`LazyFrame`] and return a new [`LazyFrame`] with the updated plan.
//! If you need to use the same [`LazyFrame`] multiple times, you should [`clone`](crate::frame::LazyFrame::clone) it, and optionally
//! [`cache`](crate::frame::LazyFrame::cache) it beforehand.
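//!
//! A sketch of reusing one plan for two queries follows. The function name `two_queries` and the data
//! are hypothetical. Cloning duplicates the query plan, and no query runs until [`collect`] is called.
//!
//! ```rust
//! use polars_core::df;
//! use polars_core::prelude::*;
//! use polars_lazy::prelude::*;
//!
//! // Hypothetical helper that runs two different queries on the same source plan.
//! fn two_queries() -> PolarsResult<(DataFrame, DataFrame)> {
//!     let lf = df!("column_a" => [1, 2, 3, 4, 5])?.lazy();
//!
//!     // `filter` consumes its receiver, so clone `lf` to keep it usable afterwards.
//!     let small = lf.clone().filter(col("column_a").lt(lit(3))).collect()?;
//!     // The last use can consume the original.
//!     let total = lf.select([col("column_a").sum()]).collect()?;
//!
//!     Ok((small, total))
//! }
//! ```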
//!
//! ## Examples
//!
//! #### Adding a new column to a lazy DataFrame
//!
//! ```rust
//! #[macro_use] extern crate polars_core;
//! use polars_core::prelude::*;
//! use polars_lazy::prelude::*;
//!
//! let df = df! {
//!     "column_a" => &[1, 2, 3, 4, 5],
//!     "column_b" => &["a", "b", "c", "d", "e"]
//! }.unwrap();
//!
//! let new = df.lazy()
//!     // Note the reverse here!!
//!     .reverse()
//!     .with_column(
//!         // always rename a new column
//!         (col("column_a") * lit(10)).alias("new_column")
//!     )
//!     .collect()
//!     .unwrap();
//!
//! assert!(new.column("new_column")
//!     .unwrap()
//!     .equals(
//!         &Column::new("new_column".into(), &[50, 40, 30, 20, 10])
//!     )
//! );
//! ```
//! #### Modifying a column based on some predicate
//!
//! ```rust
//! #[macro_use] extern crate polars_core;
//! use polars_core::prelude::*;
//! use polars_lazy::prelude::*;
//!
//! let df = df! {
//!     "column_a" => &[1, 2, 3, 4, 5],
//!     "column_b" => &["a", "b", "c", "d", "e"]
//! }.unwrap();
//!
//! let new = df.lazy()
//!     .with_column(
//!         // value = 100 if x < 3 else x
//!         when(
//!             col("column_a").lt(lit(3))
//!         ).then(
//!             lit(100)
//!         ).otherwise(
//!             col("column_a")
//!         ).alias("new_column")
//!     )
//!     .collect()
//!     .unwrap();
//!
//! assert!(new.column("new_column")
//!     .unwrap()
//!     .equals(
//!         &Column::new("new_column".into(), &[100, 100, 3, 4, 5])
//!     )
//! );
//! ```
//! #### Groupby + Aggregations
//!
//! ```rust
//! use polars_core::prelude::*;
//! use polars_core::df;
//! use polars_lazy::prelude::*;
//! use arrow::legacy::prelude::QuantileMethod;
//!
//! fn example() -> PolarsResult<DataFrame> {
//!     let df = df!(
//!         "date" => ["2020-08-21", "2020-08-21", "2020-08-22", "2020-08-23", "2020-08-22"],
//!         "temp" => [20, 10, 7, 9, 1],
//!         "rain" => [0.2, 0.1, 0.3, 0.1, 0.01]
//!     )?;
//!
//!     df.lazy()
//!         .group_by([col("date")])
//!         .agg([
//!             col("rain").min().alias("min_rain"),
//!             col("rain").sum().alias("sum_rain"),
//!             col("rain").quantile(lit(0.5), QuantileMethod::Nearest).alias("median_rain"),
//!         ])
//!         .sort(["date"], Default::default())
//!         .collect()
//! }
//! ```
//!
//! #### Calling any function
//!
//! Below we lazily call a custom closure that maps the input column to a new `Column`. Because the closure
//! changes the data type of the column, we also declare the output type. This matters because a lazy
//! query needs to know its types before it runs. Note that inside these custom
//! functions you have access to the whole **eager API** of the Series/ChunkedArrays.
//!
//! ```rust
//! #[macro_use] extern crate polars_core;
//! use polars_core::prelude::*;
//! use polars_lazy::prelude::*;
//!
//! let df = df! {
//!     "column_a" => &[1, 2, 3, 4, 5],
//!     "column_b" => &["a", "b", "c", "d", "e"]
//! }.unwrap();
//!
//! let new = df.lazy()
//!     .with_column(
//!         col("column_a")
//!             // apply a custom closure; this one ignores its input and returns a constant f32 column
//!             .map(|_s| {
//!                 Ok(Some(Column::new("".into(), &[6.0f32, 6.0, 6.0, 6.0, 6.0])))
//!             },
//!             // return type of the closure; must match the f32 values produced above
//!             GetOutput::from_type(DataType::Float32)).alias("new_column")
//!     )
//!     .collect()
//!     .unwrap();
//! ```
//!
//! #### Joins, filters and projections
//!
//! In the query below we do a lazy join and afterwards filter rows based on the predicate `a < 2`.
//! We then group by `"b"` and finally select the columns `"b"` and `"c_first"`. In an eager API this query
//! would be suboptimal because we would join DataFrames with more columns and rows than needed. In this case
//! the query optimizer will do the selection of the columns (projection) and the filtering of the
//! rows (selection) before the join, thereby reducing the amount of work done by the query.
//!
//! ```rust
//! # use polars_core::prelude::*;
//! # use polars_lazy::prelude::*;
//!
//! fn example(df_a: DataFrame, df_b: DataFrame) -> LazyFrame {
//!     df_a.lazy()
//!         .left_join(df_b.lazy(), col("b_left"), col("b_right"))
//!         .filter(
//!             col("a").lt(lit(2))
//!         )
//!         .group_by([col("b")])
//!         .agg(
//!             // the alias "c_first" must match the name used in `select` below
//!             vec![col("b").first().alias("first_b"), col("c").first().alias("c_first")]
//!         )
//!         .select(&[col("b"), col("c_first")])
//! }
//! ```
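//!
//! One way to check what the optimizer did with a query like this is to compare the plan before and after
//! optimization. The sketch below assumes the `explain` method on [`LazyFrame`]. The function name
//! `show_plans` and the column names are hypothetical, and the exact plan text may differ between versions.
//!
//! ```rust
//! use polars_core::df;
//! use polars_core::prelude::*;
//! use polars_lazy::prelude::*;
//!
//! // Hypothetical helper returning the plan as written and the optimized plan.
//! fn show_plans() -> PolarsResult<(String, String)> {
//!     let lf = df!(
//!         "a" => [1, 2, 3],
//!         "b" => ["x", "y", "z"]
//!     )?
//!     .lazy()
//!     .filter(col("a").lt(lit(2)))
//!     .select([col("b")]);
//!
//!     // `false` describes the plan as written, `true` the plan after optimization.
//!     let naive = lf.explain(false)?;
//!     let optimized = lf.explain(true)?;
//!     Ok((naive, optimized))
//! }
//! ```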
//!
//! If we want to do an aggregation on all columns we can use the wildcard operator `*` to achieve this.
//!
//! ```rust
//! # use polars_core::prelude::*;
//! # use polars_lazy::prelude::*;
//!
//! fn aggregate_all_columns(df_a: DataFrame) -> LazyFrame {
//!     df_a.lazy()
//!         .group_by([col("b")])
//!         .agg(
//!             vec![col("*").first()]
//!         )
//! }
//! ```
#![allow(ambiguous_glob_reexports)]
#![cfg_attr(docsrs, feature(doc_auto_cfg))]
extern crate core;

#[cfg(feature = "dot_diagram")]
mod dot;
pub mod dsl;
pub mod frame;
pub mod physical_plan;
pub mod prelude;

mod scan;
#[cfg(test)]
mod tests;