Struct polars::frame::groupby::GroupBy
pub struct GroupBy<'df, 'selection_str> { /* fields omitted */ }
Returned by a groupby operation on a DataFrame. This struct supports several aggregations.
Unless stated otherwise, the examples for this struct use the following DataFrame:
use polars_core::prelude::*;
let dates = &[
    "2020-08-21",
    "2020-08-21",
    "2020-08-22",
    "2020-08-23",
    "2020-08-22",
];
// date format
let fmt = "%Y-%m-%d";
// create date series
let s0 = Date32Chunked::parse_from_str_slice("date", dates, fmt)
    .into_series();
// create temperature series
let s1 = Series::new("temp", [20, 10, 7, 9, 1].as_ref());
// create rain series
let s2 = Series::new("rain", [0.2, 0.1, 0.3, 0.1, 0.01].as_ref());
// create a new DataFrame
let df = DataFrame::new(vec![s0, s1, s2]).unwrap();
println!("{:?}", df);
Outputs:
+------------+------+------+
| date | temp | rain |
| --- | --- | --- |
| date32 | i32 | f64 |
+============+======+======+
| 2020-08-21 | 20 | 0.2 |
+------------+------+------+
| 2020-08-21 | 10 | 0.1 |
+------------+------+------+
| 2020-08-22 | 7 | 0.3 |
+------------+------+------+
| 2020-08-23 | 9 | 0.1 |
+------------+------+------+
| 2020-08-22 | 1 | 0.01 |
+------------+------+------+
Implementations
Pivot a column of the current DataFrame and perform one of the following aggregations:
- first
- sum
- min
- max
- mean
- median
A pivot operation consists of a groupby on one or more columns (these become the new y-axis), a column that is pivoted (its values become the new x-axis), and an aggregation.
Panics
This method panics if the values column is not of a numerical type.
Example
use polars_core::prelude::*;
use polars_core::df;
fn example() -> Result<DataFrame> {
    let df = df!("foo" => &["A", "A", "B", "B", "C"],
        "N" => &[1, 2, 2, 4, 2],
        "bar" => &["k", "l", "m", "n", "o"]
    )?;
    df.groupby("foo")?
        .pivot("bar", "N")
        .first()
}
Transforms:
+-----+-----+-----+
| foo | N | bar |
| --- | --- | --- |
| str | i32 | str |
+=====+=====+=====+
| "A" | 1 | "k" |
+-----+-----+-----+
| "A" | 2 | "l" |
+-----+-----+-----+
| "B" | 2 | "m" |
+-----+-----+-----+
| "B" | 4 | "n" |
+-----+-----+-----+
| "C" | 2 | "o" |
+-----+-----+-----+
Into:
+-----+------+------+------+------+------+
| foo | o | n | m | l | k |
| --- | --- | --- | --- | --- | --- |
| str | i32 | i32 | i32 | i32 | i32 |
+=====+======+======+======+======+======+
| "A" | null | null | null | 2 | 1 |
+-----+------+------+------+------+------+
| "B" | null | 4 | 2 | null | null |
+-----+------+------+------+------+------+
| "C" | 2 | null | null | null | null |
+-----+------+------+------+------+------+
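To make the pivot above concrete, here is a minimal sketch in plain Rust (standard library only, not the polars API) of what `groupby("foo").pivot("bar", "N").first()` computes: for each group key and each distinct value of the pivot column, keep the first value seen. The function name `pivot_first` and the use of `Option<i32>` to stand in for a null cell are illustrative assumptions.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not part of polars): pivot `values` by `pivot`
/// within each `key` group, keeping the first value seen for each
/// (key, pivot) pair. Returns the sorted new column names and one row
/// per key, where `None` plays the role of a null cell.
fn pivot_first(
    keys: &[&str],
    pivot: &[&str],
    values: &[i32],
) -> (Vec<String>, BTreeMap<String, Vec<Option<i32>>>) {
    // The distinct pivot values become the new columns (x-axis).
    let mut columns: Vec<String> = pivot.iter().map(|s| s.to_string()).collect();
    columns.sort();
    columns.dedup();

    // One row per distinct key (y-axis), initially all null.
    let mut rows: BTreeMap<String, Vec<Option<i32>>> = BTreeMap::new();
    for ((k, p), v) in keys.iter().zip(pivot).zip(values) {
        let row = rows
            .entry(k.to_string())
            .or_insert_with(|| vec![None; columns.len()]);
        let col = columns.binary_search(&p.to_string()).unwrap();
        // "first" aggregation: only fill a cell that is still empty.
        if row[col].is_none() {
            row[col] = Some(*v);
        }
    }
    (columns, rows)
}
```

The sketch sorts the new columns alphabetically so its result is deterministic, whereas the table above lists them in a different order.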
pub fn new(
df: &'df DataFrame,
by: Vec<Series>,
groups: Vec<(u32, Vec<u32>)>,
selected_agg: Option<Vec<&'selection_str str>>
) -> GroupBy<'df, 'selection_str>
Select the column(s) to aggregate. You can select a single column or a slice of columns.
Making a selection with this method is optional. If you skip it, all columns except the keys are selected for aggregation.
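The default selection rule can be expressed in a few lines. This is a sketch in plain Rust, not polars code; `default_selection` is a hypothetical helper, and keeping the columns in their original order is an assumption of the sketch.

```rust
/// Hypothetical helper (not part of polars): the columns that would be
/// aggregated when no selection is made, i.e. every column that is not
/// a groupby key, in their original order.
fn default_selection<'a>(columns: &[&'a str], keys: &[&'a str]) -> Vec<&'a str> {
    columns
        .iter()
        .copied()
        // Drop any column that is used as a groupby key.
        .filter(|c| !keys.contains(c))
        .collect()
}
```

With the date example, grouping by `"date"` leaves `"temp"` and `"rain"` to be aggregated.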
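Get the internal representation of the GroupBy operation.
The returned Vec contains tuples of the form:
(first_idx, Vec<u32>), where first_idx is the index of the first row of a group and the Vec holds the indexes of all rows in that group.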
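The tuple layout matches the `groups: Vec<(u32, Vec<u32>)>` parameter of `new`. As a sketch under stated assumptions, here is one way to build that layout from a single key column in plain Rust; `group_indices` is hypothetical, and the `HashMap` lookup is a simplification, not a description of how polars computes its groups.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of polars): build groups in the
/// (first_idx, row indexes) layout. Groups are ordered by the first
/// appearance of their key.
fn group_indices(keys: &[&str]) -> Vec<(u32, Vec<u32>)> {
    // Maps a key to its position in `groups`.
    let mut position: HashMap<&str, usize> = HashMap::new();
    let mut groups: Vec<(u32, Vec<u32>)> = Vec::new();
    for (idx, &key) in keys.iter().enumerate() {
        let idx = idx as u32;
        match position.get(key) {
            // Known key: append this row to its group.
            Some(&g) => groups[g].1.push(idx),
            // New key: this row is the group's first index.
            None => {
                position.insert(key, groups.len());
                groups.push((idx, vec![idx]));
            }
        }
    }
    groups
}
```

Applied to the `date` column of the example DataFrame, this yields the same row indexes that `groups()` reports ([0, 1], [2, 4] and [3]), although in first-appearance order.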
Aggregate grouped series and compute the mean per group.
Example
fn example(df: DataFrame) -> Result<DataFrame> { df.groupby("date")?.select(&["temp", "rain"]).mean() }
Returns:
+------------+-----------+-----------+
| date | temp_mean | rain_mean |
| --- | --- | --- |
| date32 | f64 | f64 |
+============+===========+===========+
| 2020-08-23 | 9 | 0.1 |
+------------+-----------+-----------+
| 2020-08-22 | 4 | 0.155 |
+------------+-----------+-----------+
| 2020-08-21 | 15 | 0.15 |
+------------+-----------+-----------+
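The per-group mean is a sum divided by a count. A minimal plain-Rust sketch (not the polars implementation) that reproduces the `temp_mean` and `rain_mean` values above; `mean_per_group` is a hypothetical helper, and the temperatures are passed as `f64`, matching the `f64` type of the mean columns.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not part of polars): mean of `values` per
/// distinct `key`. A BTreeMap keeps the output sorted by key.
fn mean_per_group(keys: &[&str], values: &[f64]) -> BTreeMap<String, f64> {
    // Accumulate (sum, count) per key.
    let mut acc: BTreeMap<String, (f64, usize)> = BTreeMap::new();
    for (k, v) in keys.iter().zip(values) {
        let entry = acc.entry(k.to_string()).or_insert((0.0, 0));
        entry.0 += v;
        entry.1 += 1;
    }
    // Divide each group's sum by its number of values.
    acc.into_iter()
        .map(|(k, (sum, n))| (k, sum / n as f64))
        .collect()
}
```

For 2020-08-22, for instance, the temperatures 7 and 1 average to 4, and the rain values 0.3 and 0.01 average to 0.155.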
Aggregate grouped series and compute the sum per group.
Example
fn example(df: DataFrame) -> Result<DataFrame> { df.groupby("date")?.select("temp").sum() }
Returns:
+------------+----------+
| date | temp_sum |
| --- | --- |
| date32 | i32 |
+============+==========+
| 2020-08-23 | 9 |
+------------+----------+
| 2020-08-22 | 8 |
+------------+----------+
| 2020-08-21 | 30 |
+------------+----------+
Aggregate grouped series and compute the minimum value per group.
Example
fn example(df: DataFrame) -> Result<DataFrame> { df.groupby("date")?.select("temp").min() }
Returns:
+------------+----------+
| date | temp_min |
| --- | --- |
| date32 | i32 |
+============+==========+
| 2020-08-23 | 9 |
+------------+----------+
| 2020-08-22 | 1 |
+------------+----------+
| 2020-08-21 | 10 |
+------------+----------+
Aggregate grouped series and compute the maximum value per group.
Example
fn example(df: DataFrame) -> Result<DataFrame> { df.groupby("date")?.select("temp").max() }
Returns:
+------------+----------+
| date | temp_max |
| --- | --- |
| date32 | i32 |
+============+==========+
| 2020-08-23 | 9 |
+------------+----------+
| 2020-08-22 | 7 |
+------------+----------+
| 2020-08-21 | 20 |
+------------+----------+
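Sum, min and max all fold a group's values pairwise, so one sketch covers the three tables above. This is plain Rust, not polars code; `reduce_per_group` is a hypothetical helper that takes the combining function as a parameter.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not part of polars): reduce the values of each
/// group with `f`, e.g. `|a, b| a + b` for sum, `i32::min` for min and
/// `i32::max` for max.
fn reduce_per_group<F>(keys: &[&str], values: &[i32], f: F) -> BTreeMap<String, i32>
where
    F: Fn(i32, i32) -> i32,
{
    let mut out: BTreeMap<String, i32> = BTreeMap::new();
    for (k, &v) in keys.iter().zip(values) {
        out.entry(k.to_string())
            // Combine with the running value, or start the group with `v`.
            .and_modify(|acc| *acc = f(*acc, v))
            .or_insert(v);
    }
    out
}
```

Passing `|a, b| a + b`, `i32::min` or `i32::max` with the example temperatures gives 30, 10 and 20 for 2020-08-21, matching the tables.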
Aggregate grouped Series and find the first value per group.
Example
fn example(df: DataFrame) -> Result<DataFrame> { df.groupby("date")?.select("temp").first() }
Returns:
+------------+------------+
| date | temp_first |
| --- | --- |
| date32 | i32 |
+============+============+
| 2020-08-23 | 9 |
+------------+------------+
| 2020-08-22 | 7 |
+------------+------------+
| 2020-08-21 | 20 |
+------------+------------+
Aggregate grouped Series and return the last value per group.
Example
fn example(df: DataFrame) -> Result<DataFrame> { df.groupby("date")?.select("temp").last() }
Returns:
+------------+------------+
| date | temp_last |
| --- | --- |
| date32 | i32 |
+============+============+
| 2020-08-23 | 9 |
+------------+------------+
| 2020-08-22 | 1 |
+------------+------------+
| 2020-08-21 | 10 |
+------------+------------+
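"First" and "last" refer to the original row order within each group. A plain-Rust sketch (hypothetical `first_last_per_group`, not a polars function) that tracks both at once:

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not part of polars): the (first, last) value of
/// each group, following the original row order.
fn first_last_per_group(keys: &[&str], values: &[i32]) -> BTreeMap<String, (i32, i32)> {
    let mut out: BTreeMap<String, (i32, i32)> = BTreeMap::new();
    for (k, &v) in keys.iter().zip(values) {
        out.entry(k.to_string())
            // Keep the first value, overwrite the last one.
            .and_modify(|(_, last)| *last = v)
            .or_insert((v, v));
    }
    out
}
```

For 2020-08-22 (rows 2 and 4) this gives (7, 1), matching `temp_first` and `temp_last` above; a single-row group such as 2020-08-23 has the same first and last value.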
Aggregate grouped Series by counting the number of unique values.
Example
fn example(df: DataFrame) -> Result<DataFrame> { df.groupby("date")?.select("temp").n_unique() }
Returns:
+------------+---------------+
| date | temp_n_unique |
| --- | --- |
| date32 | u32 |
+============+===============+
| 2020-08-23 | 1 |
+------------+---------------+
| 2020-08-22 | 2 |
+------------+---------------+
| 2020-08-21 | 2 |
+------------+---------------+
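Counting unique values amounts to collecting each group into a set and taking its size. A plain-Rust sketch with a hypothetical `n_unique_per_group` helper (not the polars implementation):

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical helper (not part of polars): number of distinct values
/// per group.
fn n_unique_per_group(keys: &[&str], values: &[i32]) -> BTreeMap<String, u32> {
    // Collect the distinct values of each group into a set.
    let mut sets: BTreeMap<String, HashSet<i32>> = BTreeMap::new();
    for (k, &v) in keys.iter().zip(values) {
        sets.entry(k.to_string()).or_default().insert(v);
    }
    // The group's result is the size of its set.
    sets.into_iter().map(|(k, s)| (k, s.len() as u32)).collect()
}
```

In the date example every temperature within a group is distinct, so `n_unique` happens to equal `count`. With a repeated value they differ: `n_unique_per_group(&["a", "a", "a"], &[1, 1, 2])` gives 2 for group `"a"`, while the group has 3 values.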
Aggregate grouped Series and compute the given quantile per group.
Example
fn example(df: DataFrame) -> Result<DataFrame> { df.groupby("date")?.select("temp").quantile(0.2) }
Aggregate grouped Series and determine the median per group.
Example
fn example(df: DataFrame) -> Result<DataFrame> { df.groupby("date")?.select("temp").median() }
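No output is shown for the median example. As a sketch only, the plain-Rust helper below (hypothetical `median_per_group`) sorts each group and takes the middle value; for groups with an even number of values it averages the two middle values, which is an assumption of the sketch, since interpolation rules differ between libraries.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not part of polars): median per group. For an
/// even number of values this sketch averages the two middle values.
fn median_per_group(keys: &[&str], values: &[f64]) -> BTreeMap<String, f64> {
    // Gather each group's values.
    let mut groups: BTreeMap<String, Vec<f64>> = BTreeMap::new();
    for (k, &v) in keys.iter().zip(values) {
        groups.entry(k.to_string()).or_default().push(v);
    }
    groups
        .into_iter()
        .map(|(k, mut vs)| {
            // Sort the group's values; total_cmp gives a total order on f64.
            vs.sort_by(|a, b| a.total_cmp(b));
            let mid = vs.len() / 2;
            let m = if vs.len() % 2 == 0 {
                (vs[mid - 1] + vs[mid]) / 2.0
            } else {
                vs[mid]
            };
            (k, m)
        })
        .collect()
}
```

Under this convention the example temperatures give 15 for 2020-08-21, 4 for 2020-08-22 and 9 for 2020-08-23.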
Aggregate grouped Series and determine the variance per group.
Aggregate grouped Series and determine the standard deviation per group.
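Neither description says which variance estimator is used. The sketch below assumes the sample variance (dividing by n - 1) and computes it for the values of a single group, for example the temperatures 20 and 10 of 2020-08-21. `sample_variance` and `sample_std` are hypothetical helpers, and returning `None` for groups with fewer than two values is a choice of this sketch, not documented polars behavior.

```rust
/// Hypothetical helper (not part of polars): sample variance, i.e. the
/// sum of squared deviations divided by n - 1. Returns None when the
/// group has fewer than two values.
fn sample_variance(values: &[f64]) -> Option<f64> {
    let n = values.len();
    if n < 2 {
        return None;
    }
    let mean = values.iter().sum::<f64>() / n as f64;
    let ss: f64 = values.iter().map(|v| (v - mean).powi(2)).sum();
    Some(ss / (n - 1) as f64)
}

/// Hypothetical helper: the standard deviation is the square root of
/// the variance.
fn sample_std(values: &[f64]) -> Option<f64> {
    sample_variance(values).map(f64::sqrt)
}
```

For 2020-08-21 the mean is 15, the squared deviations sum to 50, and the sample variance is therefore 50 (standard deviation about 7.07).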
Aggregate grouped series and compute the number of values per group.
Example
fn example(df: DataFrame) -> Result<DataFrame> { df.groupby("date")?.select("temp").count() }
Returns:
+------------+------------+
| date | temp_count |
| --- | --- |
| date32 | u32 |
+============+============+
| 2020-08-23 | 1 |
+------------+------------+
| 2020-08-22 | 2 |
+------------+------------+
| 2020-08-21 | 2 |
+------------+------------+
Get the row indexes of each group.
Example
fn example(df: DataFrame) -> Result<DataFrame> { df.groupby("date")?.groups() }
Returns:
+--------------+------------+
| date | groups |
| --- | --- |
| date32(days) | list [u32] |
+==============+============+
| 2020-08-23 | "[3]" |
+--------------+------------+
| 2020-08-22 | "[2, 4]" |
+--------------+------------+
| 2020-08-21 | "[0, 1]" |
+--------------+------------+
Combine different aggregations on columns
Operations
- count
- first
- last
- n_unique
- sum
- min
- max
- mean
- median
Example
fn example(df: DataFrame) -> Result<DataFrame> { df.groupby("date")?.agg(&[("temp", &["n_unique", "sum", "min"])]) }
Returns:
+--------------+---------------+----------+----------+
| date | temp_n_unique | temp_sum | temp_min |
| --- | --- | --- | --- |
| date32(days) | u32 | i32 | i32 |
+==============+===============+==========+==========+
| 2020-08-23 | 1 | 9 | 9 |
+--------------+---------------+----------+----------+
| 2020-08-22 | 2 | 8 | 1 |
+--------------+---------------+----------+----------+
| 2020-08-21 | 2 | 30 | 10 |
+--------------+---------------+----------+----------+
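A sketch of how the string operation names passed to `agg` could map onto per-group computations, with output columns named `<column>_<op>` as in the table above. `apply_named` and `agg_columns` are hypothetical helpers, not polars functions. Results are widened to `f64` for brevity (the table keeps `i32` for `temp_sum` and `temp_min`), unsupported names yield `None`, and the groups are passed in the row order of that table.

```rust
/// Hypothetical helper (not part of polars): apply one named
/// aggregation to a group's values. Unknown names, and mean/min/max of
/// an empty group, return None instead of panicking.
fn apply_named(op: &str, values: &[i32]) -> Option<f64> {
    let sum: i32 = values.iter().sum();
    match op {
        "count" => Some(values.len() as f64),
        "first" => values.first().map(|&v| v as f64),
        "last" => values.last().map(|&v| v as f64),
        "sum" => Some(sum as f64),
        "min" => values.iter().min().map(|&v| v as f64),
        "max" => values.iter().max().map(|&v| v as f64),
        "mean" if !values.is_empty() => Some(sum as f64 / values.len() as f64),
        "n_unique" => {
            // Sort and deduplicate a copy, then count what is left.
            let mut distinct = values.to_vec();
            distinct.sort_unstable();
            distinct.dedup();
            Some(distinct.len() as f64)
        }
        _ => None,
    }
}

/// Hypothetical helper: one output column per operation, named
/// "<column>_<op>", holding one result per group.
fn agg_columns(column: &str, ops: &[&str], groups: &[Vec<i32>]) -> Vec<(String, Vec<Option<f64>>)> {
    ops.iter()
        .map(|op| {
            let results: Vec<Option<f64>> = groups.iter().map(|g| apply_named(op, g)).collect();
            (format!("{}_{}", column, op), results)
        })
        .collect()
}
```

Calling `agg_columns("temp", &["n_unique", "sum", "min"], ...)` with the groups [9], [7, 1] and [20, 10] reproduces the three value columns shown above.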
Aggregate the groups of the groupby operation into lists.
Example
fn example(df: DataFrame) -> Result<DataFrame> {
    // GroupBy and aggregate to Lists
    df.groupby("date")?.select("temp").agg_list()
}
Returns:
+------------+------------------------+
| date | temp_agg_list |
| --- | --- |
| date32 | list [i32] |
+============+========================+
| 2020-08-23 | "[Some(9)]" |
+------------+------------------------+
| 2020-08-22 | "[Some(7), Some(1)]" |
+------------+------------------------+
| 2020-08-21 | "[Some(20), Some(10)]" |
+------------+------------------------+
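Aggregating into lists keeps every value of a group, in row order, instead of reducing it. A plain-Rust sketch with a hypothetical `agg_list` helper; it stores plain `i32` values because the example has no nulls, whereas the polars output above wraps each element in `Some(...)` since list elements may be null.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not part of polars): gather the values of each
/// group into a list, preserving the original row order within the group.
fn agg_list(keys: &[&str], values: &[i32]) -> BTreeMap<String, Vec<i32>> {
    let mut out: BTreeMap<String, Vec<i32>> = BTreeMap::new();
    for (k, &v) in keys.iter().zip(values) {
        out.entry(k.to_string()).or_default().push(v);
    }
    out
}
```

For 2020-08-21 this yields [20, 10], matching `[Some(20), Some(10)]` in the table.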
Trait Implementations
Auto Trait Implementations
impl<'df, 'selection_str> !RefUnwindSafe for GroupBy<'df, 'selection_str>
impl<'df, 'selection_str> !UnwindSafe for GroupBy<'df, 'selection_str>