Struct polars::frame::groupby::GroupBy

pub struct GroupBy<'df, 'selection_str> { /* fields omitted */ }

Returned by a groupby operation on a DataFrame. This struct supports several aggregations.

Unless stated otherwise, the examples for this struct are performed on the following DataFrame:

use polars_core::prelude::*;

let dates = &[
"2020-08-21",
"2020-08-21",
"2020-08-22",
"2020-08-23",
"2020-08-22",
];
// date format
let fmt = "%Y-%m-%d";
// create date series
let s0 = Date32Chunked::parse_from_str_slice("date", dates, fmt)
        .into_series();
// create temperature series
let s1 = Series::new("temp", [20, 10, 7, 9, 1].as_ref());
// create rain series
let s2 = Series::new("rain", [0.2, 0.1, 0.3, 0.1, 0.01].as_ref());
// create a new DataFrame
let df = DataFrame::new(vec![s0, s1, s2]).unwrap();
println!("{:?}", df);

Outputs:

+------------+------+------+
| date       | temp | rain |
| ---        | ---  | ---  |
| date32     | i32  | f64  |
+============+======+======+
| 2020-08-21 | 20   | 0.2  |
+------------+------+------+
| 2020-08-21 | 10   | 0.1  |
+------------+------+------+
| 2020-08-22 | 7    | 0.3  |
+------------+------+------+
| 2020-08-23 | 9    | 0.1  |
+------------+------+------+
| 2020-08-22 | 1    | 0.01 |
+------------+------+------+

Implementations

Pivot a column of the current DataFrame and perform one of the following aggregations:

  • first
  • sum
  • min
  • max
  • mean
  • median

The pivot operation consists of a group by on one or more columns (these form the new y-axis), a column that will be pivoted (its distinct values form the new x-axis), and an aggregation.

Panics

If the values column is not a numerical type, the code will panic.

Example

use polars_core::prelude::*;
use polars_core::df;

fn example() -> Result<DataFrame> {
    // "bar" holds the labels that become the new column names
    let df = df!("foo" => &["A", "A", "B", "B", "C"],
        "N" => &[1, 2, 2, 4, 2],
        "bar" => &["k", "l", "m", "n", "o"]
        )?;

    df.groupby("foo")?
        .pivot("bar", "N")
        .first()
}

Transforms:

+-----+-----+-----+
| foo | N   | bar |
| --- | --- | --- |
| str | i32 | str |
+=====+=====+=====+
| "A" | 1   | "k" |
+-----+-----+-----+
| "A" | 2   | "l" |
+-----+-----+-----+
| "B" | 2   | "m" |
+-----+-----+-----+
| "B" | 4   | "n" |
+-----+-----+-----+
| "C" | 2   | "o" |
+-----+-----+-----+

Into:

+-----+------+------+------+------+------+
| foo | o    | n    | m    | l    | k    |
| --- | ---  | ---  | ---  | ---  | ---  |
| str | i32  | i32  | i32  | i32  | i32  |
+=====+======+======+======+======+======+
| "A" | null | null | null | 2    | 1    |
+-----+------+------+------+------+------+
| "B" | null | 4    | 2    | null | null |
+-----+------+------+------+------+------+
| "C" | 2    | null | null | null | null |
+-----+------+------+------+------+------+
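
The reshaping shown in the tables above can be sketched in plain Rust without polars. This is only an illustration of the semantics of a pivot with a `first` aggregation, not the library's implementation; the function name `pivot_first`, the sorted column order, and the use of `Option<i32>` for null cells are assumptions made for the sketch.

```rust
use std::collections::BTreeMap;

/// Pivot with a `first` aggregation: one output row per group key,
/// one output column per distinct pivot value. Missing cells are `None`.
/// Hypothetical helper, not part of polars.
fn pivot_first(
    keys: &[&str],
    pivot: &[&str],
    values: &[i32],
) -> (Vec<String>, BTreeMap<String, Vec<Option<i32>>>) {
    // Distinct pivot values become the column names (sorted here).
    let mut columns: Vec<String> = pivot.iter().map(|s| s.to_string()).collect();
    columns.sort();
    columns.dedup();

    let mut rows: BTreeMap<String, Vec<Option<i32>>> = BTreeMap::new();
    for ((key, piv), value) in keys.iter().zip(pivot).zip(values) {
        let row = rows
            .entry(key.to_string())
            .or_insert_with(|| vec![None; columns.len()]);
        // Every pivot value is in `columns`, so the lookup cannot fail.
        let col = columns.iter().position(|c| c.as_str() == *piv).unwrap();
        // `first` keeps the earliest value seen for a cell.
        if row[col].is_none() {
            row[col] = Some(*value);
        }
    }
    (columns, rows)
}
```

Called with the `foo`, `bar`, and `N` columns from the example, the sketch yields the same cells as the output table, with the columns in ascending rather than descending order.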

Select the column(s) that should be aggregated. You can select a single column or a slice of columns.

Note that making a selection with this method is not required. If you skip it, all columns (except for the keys) will be selected for aggregation.
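
The default described above can be pictured with a small plain-Rust helper. The name `columns_to_aggregate` and its signature are hypothetical and exist only for this sketch; it mirrors the rule in the text (an explicit selection wins, otherwise every non-key column is used) and says nothing about how polars stores the selection internally.

```rust
/// Decide which columns get aggregated.
/// Hypothetical helper, not part of polars.
fn columns_to_aggregate<'a>(
    all: &[&'a str],
    keys: &[&str],
    selection: Option<&[&'a str]>,
) -> Vec<&'a str> {
    match selection {
        // An explicit selection is used as given.
        Some(cols) => cols.to_vec(),
        // No selection: every column that is not a group key.
        None => all
            .iter()
            .copied()
            .filter(|c| !keys.iter().any(|k| *k == *c))
            .collect(),
    }
}
```

For the example DataFrame grouped by `date`, skipping the selection would aggregate `temp` and `rain`.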

Get the internal representation of the GroupBy operation. The returned Vec contains (first_idx, Vec) tuples, where the second value in each tuple is a vector with all matching indexes.
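
To make the tuple layout concrete, here is a minimal plain-Rust sketch that builds such tuples from a slice of keys. It is not the polars implementation: the name `group_tuples`, the `u32` index type (assumed from the `list [u32]` column in the `groups` output further down), and the first-appearance ordering are all assumptions of the sketch.

```rust
use std::collections::HashMap;

/// Build `(first_idx, all_idx)` tuples for a slice of group keys.
/// Groups are returned in order of first appearance, which is a
/// choice made for this sketch only. Hypothetical helper.
fn group_tuples(keys: &[&str]) -> Vec<(u32, Vec<u32>)> {
    // Maps each key to its position in `groups`.
    let mut slot: HashMap<&str, usize> = HashMap::new();
    let mut groups: Vec<(u32, Vec<u32>)> = Vec::new();

    for (row, key) in keys.iter().enumerate() {
        let row = row as u32;
        match slot.get(key).copied() {
            // Known key: append this row to the existing group.
            Some(i) => groups[i].1.push(row),
            // New key: this row is the group's first index.
            None => {
                slot.insert(*key, groups.len());
                groups.push((row, vec![row]));
            }
        }
    }
    groups
}
```

For the `date` column of the example DataFrame this gives `(0, [0, 1])`, `(2, [2, 4])`, and `(3, [3])`.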

Aggregate grouped series and compute the mean per group.

Example

fn example(df: DataFrame) -> Result<DataFrame> {
    df.groupby("date")?.select(&["temp", "rain"]).mean()
}

Returns:

+------------+-----------+-----------+
| date       | temp_mean | rain_mean |
| ---        | ---       | ---       |
| date32     | f64       | f64       |
+============+===========+===========+
| 2020-08-23 | 9         | 0.1       |
+------------+-----------+-----------+
| 2020-08-22 | 4         | 0.155     |
+------------+-----------+-----------+
| 2020-08-21 | 15        | 0.15      |
+------------+-----------+-----------+

Aggregate grouped series and compute the sum per group.

Example

fn example(df: DataFrame) -> Result<DataFrame> {
    df.groupby("date")?.select("temp").sum()
}

Returns:

+------------+----------+
| date       | temp_sum |
| ---        | ---      |
| date32     | i32      |
+============+==========+
| 2020-08-23 | 9        |
+------------+----------+
| 2020-08-22 | 8        |
+------------+----------+
| 2020-08-21 | 30       |
+------------+----------+

Aggregate grouped series and compute the minimum value per group.

Example

fn example(df: DataFrame) -> Result<DataFrame> {
    df.groupby("date")?.select("temp").min()
}

Returns:

+------------+----------+
| date       | temp_min |
| ---        | ---      |
| date32     | i32      |
+============+==========+
| 2020-08-23 | 9        |
+------------+----------+
| 2020-08-22 | 1        |
+------------+----------+
| 2020-08-21 | 10       |
+------------+----------+

Aggregate grouped series and compute the maximum value per group.

Example

fn example(df: DataFrame) -> Result<DataFrame> {
    df.groupby("date")?.select("temp").max()
}

Returns:

+------------+----------+
| date       | temp_max |
| ---        | ---      |
| date32     | i32      |
+============+==========+
| 2020-08-23 | 9        |
+------------+----------+
| 2020-08-22 | 7        |
+------------+----------+
| 2020-08-21 | 20       |
+------------+----------+

Aggregate grouped Series and find the first value per group.

Example

fn example(df: DataFrame) -> Result<DataFrame> {
    df.groupby("date")?.select("temp").first()
}

Returns:

+------------+------------+
| date       | temp_first |
| ---        | ---        |
| date32     | i32        |
+============+============+
| 2020-08-23 | 9          |
+------------+------------+
| 2020-08-22 | 7          |
+------------+------------+
| 2020-08-21 | 20         |
+------------+------------+

Aggregate grouped Series and return the last value per group.

Example

fn example(df: DataFrame) -> Result<DataFrame> {
    df.groupby("date")?.select("temp").last()
}

Returns:

+------------+------------+
| date       | temp_last  |
| ---        | ---        |
| date32     | i32        |
+============+============+
| 2020-08-23 | 9          |
+------------+------------+
| 2020-08-22 | 1          |
+------------+------------+
| 2020-08-21 | 10         |
+------------+------------+

Aggregate grouped Series by counting the number of unique values.

Example

fn example(df: DataFrame) -> Result<DataFrame> {
    df.groupby("date")?.select("temp").n_unique()
}

Returns:

+------------+---------------+
| date       | temp_n_unique |
| ---        | ---           |
| date32     | u32           |
+============+===============+
| 2020-08-23 | 1             |
+------------+---------------+
| 2020-08-22 | 2             |
+------------+---------------+
| 2020-08-21 | 2             |
+------------+---------------+

Aggregate grouped Series and determine the quantile per group.

Example

fn example(df: DataFrame) -> Result<DataFrame> {
    df.groupby("date")?.select("temp").quantile(0.2)
}

Aggregate grouped Series and determine the median per group.

Example

fn example(df: DataFrame) -> Result<DataFrame> {
    df.groupby("date")?.select("temp").median()
}
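
No output is shown for this example, so the following plain-Rust sketch illustrates what a per-group median computes. It assumes the usual definition (for an even count, the mean of the two middle values) and returns `f64`; whether polars uses exactly this definition and return type is not stated here, and `group_median` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Median of each group; for an even count the two middle values
/// are averaged. Hypothetical helper, not part of polars.
fn group_median(keys: &[&str], values: &[i32]) -> BTreeMap<String, f64> {
    // Collect the values belonging to each key.
    let mut groups: BTreeMap<String, Vec<i32>> = BTreeMap::new();
    for (key, value) in keys.iter().zip(values) {
        groups.entry(key.to_string()).or_default().push(*value);
    }
    groups
        .into_iter()
        .map(|(key, mut vals)| {
            vals.sort_unstable();
            // Groups are never empty, so `mid - 1` is safe for even counts.
            let mid = vals.len() / 2;
            let median = if vals.len() % 2 == 0 {
                (vals[mid - 1] as f64 + vals[mid] as f64) / 2.0
            } else {
                vals[mid] as f64
            };
            (key, median)
        })
        .collect()
}
```

Under this definition the `temp` medians for the example data are 15 for 2020-08-21, 4 for 2020-08-22, and 9 for 2020-08-23.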

Aggregate grouped Series and determine the variance per group.

Aggregate grouped Series and determine the standard deviation per group.
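
Neither of the two aggregations above comes with an example, so here is a plain-Rust sketch of a per-group variance; the standard deviation is its square root. The text does not say whether the population or the sample estimator is used, so the sketch takes a `ddof` parameter (0 for population, 1 for sample) instead of guessing. The names `variance` and `group_var` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Variance of one group with `ddof` delta degrees of freedom
/// (0 = population, 1 = sample). `None` when the divisor would be zero.
/// Hypothetical helper, not part of polars.
fn variance(vals: &[f64], ddof: usize) -> Option<f64> {
    if vals.len() <= ddof {
        return None;
    }
    let n = vals.len() as f64;
    let mean = vals.iter().sum::<f64>() / n;
    // Sum of squared deviations from the group mean.
    let sq_dev: f64 = vals.iter().map(|v| (v - mean).powi(2)).sum();
    Some(sq_dev / (n - ddof as f64))
}

/// Variance per group key. Hypothetical helper.
fn group_var(keys: &[&str], values: &[f64], ddof: usize) -> BTreeMap<String, Option<f64>> {
    let mut groups: BTreeMap<String, Vec<f64>> = BTreeMap::new();
    for (key, value) in keys.iter().zip(values) {
        groups.entry(key.to_string()).or_default().push(*value);
    }
    groups
        .into_iter()
        .map(|(k, v)| (k, variance(&v, ddof)))
        .collect()
}
```

With the example `temp` values, the 2020-08-21 group has a population variance of 25 (standard deviation 5) and a sample variance of 50. A single-row group such as 2020-08-23 has no sample variance, which the sketch reports as `None`.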

Aggregate grouped series and compute the number of values per group.

Example

fn example(df: DataFrame) -> Result<DataFrame> {
    df.groupby("date")?.select("temp").count()
}

Returns:

+------------+------------+
| date       | temp_count |
| ---        | ---        |
| date32     | u32        |
+============+============+
| 2020-08-23 | 1          |
+------------+------------+
| 2020-08-22 | 2          |
+------------+------------+
| 2020-08-21 | 2          |
+------------+------------+

Get the groupby group indexes.

Example

fn example(df: DataFrame) -> Result<DataFrame> {
    df.groupby("date")?.groups()
}

Returns:

+--------------+------------+
| date         | groups     |
| ---          | ---        |
| date32(days) | list [u32] |
+==============+============+
| 2020-08-23   | "[3]"      |
+--------------+------------+
| 2020-08-22   | "[2, 4]"   |
+--------------+------------+
| 2020-08-21   | "[0, 1]"   |
+--------------+------------+

Combine different aggregations on columns. Each entry pairs a column name with the names of the aggregations to apply to it.

Operations

  • count
  • first
  • last
  • sum
  • min
  • max
  • mean
  • median
  • n_unique

Example

fn example(df: DataFrame) -> Result<DataFrame> {
    df.groupby("date")?.agg(&[("temp", &["n_unique", "sum", "min"])])
}

Returns:

+--------------+---------------+----------+----------+
| date         | temp_n_unique | temp_sum | temp_min |
| ---          | ---           | ---      | ---      |
| date32(days) | u32           | i32      | i32      |
+==============+===============+==========+==========+
| 2020-08-23   | 1             | 9        | 9        |
+--------------+---------------+----------+----------+
| 2020-08-22   | 2             | 8        | 1        |
+--------------+---------------+----------+----------+
| 2020-08-21   | 2             | 30       | 10       |
+--------------+---------------+----------+----------+
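
The name-based dispatch in this example can be sketched in plain Rust as follows. This is an illustration only: the helpers `aggregate` and `agg` are hypothetical, only the three aggregations used in the example are handled, and all results are widened to `i64` for simplicity, whereas the output above shows per-column types.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Apply one named aggregation to a group's values.
/// Only three names are handled here; unknown names yield `None`.
/// Hypothetical helper, not part of polars.
fn aggregate(name: &str, vals: &[i32]) -> Option<i64> {
    match name {
        "sum" => Some(vals.iter().map(|&v| v as i64).sum()),
        "min" => vals.iter().min().map(|&v| v as i64),
        "n_unique" => Some(vals.iter().collect::<BTreeSet<_>>().len() as i64),
        _ => None,
    }
}

/// For each group, compute every requested aggregation of `column`,
/// naming the outputs `<column>_<aggregation>`. Hypothetical helper.
fn agg(
    keys: &[&str],
    column: &str,
    values: &[i32],
    ops: &[&str],
) -> BTreeMap<String, Vec<(String, Option<i64>)>> {
    let mut groups: BTreeMap<String, Vec<i32>> = BTreeMap::new();
    for (key, value) in keys.iter().zip(values) {
        groups.entry(key.to_string()).or_default().push(*value);
    }
    groups
        .into_iter()
        .map(|(key, vals)| {
            let cells: Vec<(String, Option<i64>)> = ops
                .iter()
                .map(|op| (format!("{}_{}", column, op), aggregate(op, &vals)))
                .collect();
            (key, cells)
        })
        .collect()
}
```

For the 2020-08-22 group the sketch produces `temp_n_unique = 2`, `temp_sum = 8`, and `temp_min = 1`, matching the corresponding row of the table.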

Aggregate the groups of the groupby operation into lists.

Example

fn example(df: DataFrame) -> Result<DataFrame> {
    // GroupBy and aggregate to Lists
    df.groupby("date")?.select("temp").agg_list()
}

Returns:

+------------+------------------------+
| date       | temp_agg_list          |
| ---        | ---                    |
| date32     | list [i32]             |
+============+========================+
| 2020-08-23 | "[Some(9)]"            |
+------------+------------------------+
| 2020-08-22 | "[Some(7), Some(1)]"   |
+------------+------------------------+
| 2020-08-21 | "[Some(20), Some(10)]" |
+------------+------------------------+

Apply a closure over the groups as a new DataFrame.
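
The split, apply, and combine pattern behind this method can be sketched in plain Rust. The `Rows` type below is only a stand-in for a DataFrame with a `date` and a `temp` column, and the `apply` function is a hypothetical helper; the actual closure signature expected by polars is not given in this text and is not reproduced here.

```rust
use std::collections::BTreeMap;

/// A tiny stand-in for a DataFrame: rows of (date, temp).
type Rows = Vec<(String, i32)>;

/// Split `rows` by the date key, run `f` on each group's rows,
/// and concatenate whatever each call returns.
/// Hypothetical helper, not part of polars.
fn apply<F>(rows: &Rows, f: F) -> Rows
where
    F: Fn(Rows) -> Rows,
{
    // Split: collect the rows of each group.
    let mut groups: BTreeMap<String, Rows> = BTreeMap::new();
    for (date, temp) in rows {
        groups
            .entry(date.clone())
            .or_default()
            .push((date.clone(), *temp));
    }
    // Apply and combine: one call per group, results concatenated.
    groups.into_values().flat_map(f).collect()
}
```

For example, a closure that keeps only the row with the highest `temp` in each group reduces the five example rows to one row per date.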

Trait Implementations

Returns a copy of the value.

Performs copy-assignment from source.

Formats the value using the given formatter.
