pub struct DataFrame { /* private fields */ }
Expand description

A contiguous growable collection of Series that have the same length.

Use declarations

All the common tools can be found in crate::prelude (or in polars::prelude).

use polars_core::prelude::*; // if the crate polars-core is used directly
// use polars::prelude::*;      if the crate polars is used

Initialization

Default

A DataFrame can be initialized empty:

let df = DataFrame::default();
assert!(df.is_empty());

Wrapping a Vec<Series>

A DataFrame is built upon a Vec<Series> where the Series have the same length.

let s1 = Series::new("Fruit", &["Apple", "Apple", "Pear"]);
let s2 = Series::new("Color", &["Red", "Yellow", "Green"]);

let df: Result<DataFrame> = DataFrame::new(vec![s1, s2]);

Using a macro

The df! macro is a convenient method:

let df: Result<DataFrame> = df!("Fruit" => &["Apple", "Apple", "Pear"],
                                "Color" => &["Red", "Yellow", "Green"]);

Using a CSV file

See the polars_io::csv::CsvReader.

Indexing

By a number

The Index<usize> is implemented for the DataFrame.

let df = df!("Fruit" => &["Apple", "Apple", "Pear"],
             "Color" => &["Red", "Yellow", "Green"])?;

assert_eq!(df[0], Series::new("Fruit", &["Apple", "Apple", "Pear"]));
assert_eq!(df[1], Series::new("Color", &["Red", "Yellow", "Green"]));

By a Series name

let df = df!("Fruit" => &["Apple", "Apple", "Pear"],
             "Color" => &["Red", "Yellow", "Green"])?;

assert_eq!(df["Fruit"], Series::new("Fruit", &["Apple", "Apple", "Pear"]));
assert_eq!(df["Color"], Series::new("Color", &["Red", "Yellow", "Green"]));

Implementations

Create a 2D ndarray::Array from this DataFrame. This requires all columns in the DataFrame to be non-null and numeric. They will be casted to the same data type (if they aren’t already).

For floating point data we implicitly convert None to NaN without failure.

use polars_core::prelude::*;
let a = UInt32Chunked::new("a", &[1, 2, 3]).into_series();
let b = Float64Chunked::new("b", &[10., 8., 6.]).into_series();

let df = DataFrame::new(vec![a, b]).unwrap();
let ndarray = df.to_ndarray::<Float64Type>().unwrap();
println!("{:?}", ndarray);

Outputs:

[[1.0, 10.0],
 [2.0, 8.0],
 [3.0, 6.0]], shape=[3, 2], strides=[2, 1], layout=C (0x1), const ndim=2/

Sample n datapoints from this DataFrame.

Sample a fraction between 0.0-1.0 of this DataFrame.

This is similar to a left-join except that we match on nearest key rather than equal keys. The keys must be sorted to perform an asof join. This is a special implementation of an asof join that searches for the nearest keys within a subgroup set by by.

This is similar to a left-join except that we match on nearest key rather than equal keys. The keys must be sorted to perform an asof join

Creates the cartesian product from both frames, preserves the order of the left keys.

Explode DataFrame to long format by exploding a column with Lists.

Example
let s0 = Series::new("a", &[1i64, 2, 3]);
let s1 = Series::new("b", &[1i64, 1, 1]);
let s2 = Series::new("c", &[2i64, 2, 2]);
let list = Series::new("foo", &[s0, s1, s2]);

let s0 = Series::new("B", [1, 2, 3]);
let s1 = Series::new("C", [1, 1, 1]);
let df = DataFrame::new(vec![list, s0, s1])?;
let exploded = df.explode(["foo"])?;

println!("{:?}", df);
println!("{:?}", exploded);

Outputs:

 +-------------+-----+-----+
 | foo         | B   | C   |
 | ---         | --- | --- |
 | list [i64]  | i32 | i32 |
 +=============+=====+=====+
 | "[1, 2, 3]" | 1   | 1   |
 +-------------+-----+-----+
 | "[1, 1, 1]" | 2   | 1   |
 +-------------+-----+-----+
 | "[2, 2, 2]" | 3   | 1   |
 +-------------+-----+-----+

 +-----+-----+-----+
 | foo | B   | C   |
 | --- | --- | --- |
 | i64 | i32 | i32 |
 +=====+=====+=====+
 | 1   | 1   | 1   |
 +-----+-----+-----+
 | 2   | 1   | 1   |
 +-----+-----+-----+
 | 3   | 1   | 1   |
 +-----+-----+-----+
 | 1   | 2   | 1   |
 +-----+-----+-----+
 | 1   | 2   | 1   |
 +-----+-----+-----+
 | 1   | 2   | 1   |
 +-----+-----+-----+
 | 2   | 3   | 1   |
 +-----+-----+-----+
 | 2   | 3   | 1   |
 +-----+-----+-----+
 | 2   | 3   | 1   |
 +-----+-----+-----+

Unpivot a DataFrame from wide to long format.

Example
Arguments
  • id_vars - String slice that represent the columns to use as id variables.
  • value_vars - String slice that represent the columns to use as value variables.

If value_vars is empty all columns that are not in id_vars will be used.

let df = df!("A" => &["a", "b", "a"],
             "B" => &[1, 3, 5],
             "C" => &[10, 11, 12],
             "D" => &[2, 4, 6]
    )?;

let melted = df.melt(&["A", "B"], &["C", "D"])?;
println!("{:?}", df);
println!("{:?}", melted);

Outputs:

 +-----+-----+-----+-----+
 | A   | B   | C   | D   |
 | --- | --- | --- | --- |
 | str | i32 | i32 | i32 |
 +=====+=====+=====+=====+
 | "a" | 1   | 10  | 2   |
 +-----+-----+-----+-----+
 | "b" | 3   | 11  | 4   |
 +-----+-----+-----+-----+
 | "a" | 5   | 12  | 6   |
 +-----+-----+-----+-----+

 +-----+-----+----------+-------+
 | A   | B   | variable | value |
 | --- | --- | ---      | ---   |
 | str | i32 | str      | i32   |
 +=====+=====+==========+=======+
 | "a" | 1   | "C"      | 10    |
 +-----+-----+----------+-------+
 | "b" | 3   | "C"      | 11    |
 +-----+-----+----------+-------+
 | "a" | 5   | "C"      | 12    |
 +-----+-----+----------+-------+
 | "a" | 1   | "D"      | 2     |
 +-----+-----+----------+-------+
 | "b" | 3   | "D"      | 4     |
 +-----+-----+----------+-------+
 | "a" | 5   | "D"      | 6     |
 +-----+-----+----------+-------+

Similar to melt, but without generics. This may be easier if you want to pass an empty id_vars or empty value_vars.

Do a pivot operation based on the group key, a pivot column and an aggregation function on the values column.

Note

Polars’/arrow memory is not ideal for transposing operations like pivots. If you have a relatively large table, consider using a groupby over a pivot.

Group DataFrame using a Series column.

Example
use polars_core::prelude::*;
fn groupby_sum(df: &DataFrame) -> Result<DataFrame> {
    df.groupby(["column_name"])?
    .select(["agg_column_name"])
    .sum()
}

Group DataFrame using a Series column. The groups are ordered by their smallest row index.

Generic join method. Can be used to join on multiple columns.

Example
let df1: DataFrame = df!("Fruit" => &["Apple", "Banana", "Pear"],
                         "Phosphorus (mg/100g)" => &[11, 22, 12])?;
let df2: DataFrame = df!("Name" => &["Apple", "Banana", "Pear"],
                         "Potassium (mg/100g)" => &[107, 358, 115])?;

let df3: DataFrame = df1.join(&df2, ["Fruit"], ["Name"], JoinType::Inner, None)?;
assert_eq!(df3.shape(), (3, 3));
println!("{}", df3);

Output:

shape: (3, 3)
+--------+----------------------+---------------------+
| Fruit  | Phosphorus (mg/100g) | Potassium (mg/100g) |
| ---    | ---                  | ---                 |
| str    | i32                  | i32                 |
+========+======================+=====================+
| Apple  | 11                   | 107                 |
+--------+----------------------+---------------------+
| Banana | 22                   | 358                 |
+--------+----------------------+---------------------+
| Pear   | 12                   | 115                 |
+--------+----------------------+---------------------+

Perform an inner join on two DataFrames.

Example
fn join_dfs(left: &DataFrame, right: &DataFrame) -> Result<DataFrame> {
    left.inner_join(right, ["join_column_left"], ["join_column_right"])
}

Perform a left join on two DataFrames

Example
let df1: DataFrame = df!("Wavelength (nm)" => &[480.0, 650.0, 577.0, 1201.0, 100.0])?;
let df2: DataFrame = df!("Color" => &["Blue", "Yellow", "Red"],
                         "Wavelength nm" => &[480.0, 577.0, 650.0])?;

let df3: DataFrame = df1.left_join(&df2, ["Wavelength (nm)"], ["Wavelength nm"])?;
println!("{:?}", df3);

Output:

shape: (5, 2)
+-----------------+--------+
| Wavelength (nm) | Color  |
| ---             | ---    |
| f64             | str    |
+=================+========+
| 480             | Blue   |
+-----------------+--------+
| 650             | Red    |
+-----------------+--------+
| 577             | Yellow |
+-----------------+--------+
| 1201            | null   |
+-----------------+--------+
| 100             | null   |
+-----------------+--------+

Perform an outer join on two DataFrames

Example
fn join_dfs(left: &DataFrame, right: &DataFrame) -> Result<DataFrame> {
    left.outer_join(right, ["join_column_left"], ["join_column_right"])
}

Get a row from a DataFrame. Use of this is discouraged as it will likely be slow.

Amortize allocations by reusing a row. The caller is responsible to make sure that the row has at least the capacity for the number of columns in the DataFrame

Amortize allocations by reusing a row. The caller is responsible to make sure that the row has at least the capacity for the number of columns in the DataFrame

Safety

Does not do any bounds checking.

Create a new DataFrame from rows. This should only be used when you have row wise data, as this is a lot slower than creating the Series in a columnar fashion

Create a new DataFrame from an iterator over rows. This should only be used when you have row wise data, as this is a lot slower than creating the Series in a columnar fashion

Create a new DataFrame from rows. This should only be used when you have row wise data, as this is a lot slower than creating the Series in a columnar fashion

Transpose a DataFrame. This is a very expensive operation.

Returns an estimation of the total (heap) allocated size of the DataFrame in bytes.

Implementation

This estimation is the sum of the size of its buffers, validity, including nested arrays. Multiple arrays may share buffers and bitmaps. Therefore, the size of 2 arrays is not the sum of the sizes computed from this function. In particular, StructArray’s size is an upper bound.

When an array is sliced, its allocated size remains constant because the buffer unchanged. However, this function will yield a smaller number. This is because this function returns the visible size of the buffer, not its total capacity.

FFI buffers are included in this estimation.

Create a DataFrame from a Vector of Series.

Example
let s0 = Series::new("days", [0, 1, 2].as_ref());
let s1 = Series::new("temp", [22.1, 19.9, 7.].as_ref());

let df = DataFrame::new(vec![s0, s1])?;

Removes the last Series from the DataFrame and returns it, or None if it is empty.

Example
let s1 = Series::new("Ocean", &["Atlantic", "Indian"]);
let s2 = Series::new("Area (km²)", &[106_460_000, 70_560_000]);
let mut df = DataFrame::new(vec![s1.clone(), s2.clone()])?;

assert_eq!(df.pop(), Some(s2));
assert_eq!(df.pop(), Some(s1));
assert_eq!(df.pop(), None);
assert!(df.is_empty());

Add a new column at index 0 that counts the rows.

Example
let df1: DataFrame = df!("Name" => &["James", "Mary", "John", "Patricia"])?;
assert_eq!(df1.shape(), (4, 1));

let df2: DataFrame = df1.with_row_count("Id", None)?;
assert_eq!(df2.shape(), (4, 2));
println!("{}", df2);

Output:

 shape: (4, 2)
 +-----+----------+
 | Id  | Name     |
 | --- | ---      |
 | u32 | str      |
 +=====+==========+
 | 0   | James    |
 +-----+----------+
 | 1   | Mary     |
 +-----+----------+
 | 2   | John     |
 +-----+----------+
 | 3   | Patricia |
 +-----+----------+

Add a row count in place.

Create a new DataFrame but does not check the length or duplicate occurrence of the Series.

It is advised to use Series::new in favor of this method.

Panic

It is the callers responsibility to uphold the contract of all Series having an equal length, if not this may panic down the line.

Aggregate all chunks to contiguous memory.

Shrink the capacity of this DataFrame to fit it’s length.

Aggregate all the chunks in the DataFrame to a single chunk.

Aggregate all the chunks in the DataFrame to a single chunk in parallel. This may lead to more peak memory consumption.

Estimates of the DataFrames columns consist of the same chunk sizes

Ensure all the chunks in the DataFrame are aligned.

Get the DataFrame schema.

Example
let df: DataFrame = df!("Thing" => &["Observable universe", "Human stupidity"],
                        "Diameter (m)" => &[8.8e26, f64::INFINITY])?;

let f1: Field = Field::new("Thing", DataType::Utf8);
let f2: Field = Field::new("Diameter (m)", DataType::Float64);
let sc: Schema = Schema::from(vec![f1, f2]);

assert_eq!(df.schema(), sc);

Get a reference to the DataFrame columns.

Example
let df: DataFrame = df!("Name" => &["Adenine", "Cytosine", "Guanine", "Thymine"],
                        "Symbol" => &["A", "C", "G", "T"])?;
let columns: &Vec<Series> = df.get_columns();

assert_eq!(columns[0].name(), "Name");
assert_eq!(columns[1].name(), "Symbol");

Iterator over the columns as Series.

Example
let s1: Series = Series::new("Name", &["Pythagoras' theorem", "Shannon entropy"]);
let s2: Series = Series::new("Formula", &["a²+b²=c²", "H=-Σ[P(x)log|P(x)|]"]);
let df: DataFrame = DataFrame::new(vec![s1.clone(), s2.clone()])?;

let mut iterator = df.iter();

assert_eq!(iterator.next(), Some(&s1));
assert_eq!(iterator.next(), Some(&s2));
assert_eq!(iterator.next(), None);
Example
let df: DataFrame = df!("Language" => &["Rust", "Python"],
                        "Designer" => &["Graydon Hoare", "Guido van Rossum"])?;

assert_eq!(df.get_column_names(), &["Language", "Designer"]);

Get the Vec<String> representing the column names.

Set the column names.

Example
let mut df: DataFrame = df!("Mathematical set" => &["ℕ", "ℤ", "𝔻", "ℚ", "ℝ", "ℂ"])?;
df.set_column_names(&["Set"])?;

assert_eq!(df.get_column_names(), &["Set"]);

Get the data types of the columns in the DataFrame.

Example
let venus_air: DataFrame = df!("Element" => &["Carbon dioxide", "Nitrogen"],
                               "Fraction" => &[0.965, 0.035])?;

assert_eq!(venus_air.dtypes(), &[DataType::Utf8, DataType::Float64]);

The number of chunks per column

Get a reference to the schema fields of the DataFrame.

Example
let earth: DataFrame = df!("Surface type" => &["Water", "Land"],
                           "Fraction" => &[0.708, 0.292])?;

let f1: Field = Field::new("Surface type", DataType::Utf8);
let f2: Field = Field::new("Fraction", DataType::Float64);

assert_eq!(earth.fields(), &[f1, f2]);

Get (height, width) of the DataFrame.

Example
let df0: DataFrame = DataFrame::default();
let df1: DataFrame = df!("1" => &[1, 2, 3, 4, 5])?;
let df2: DataFrame = df!("1" => &[1, 2, 3, 4, 5],
                         "2" => &[1, 2, 3, 4, 5])?;

assert_eq!(df0.shape(), (0 ,0));
assert_eq!(df1.shape(), (5, 1));
assert_eq!(df2.shape(), (5, 2));

Get the width of the DataFrame which is the number of columns.

Example
let df0: DataFrame = DataFrame::default();
let df1: DataFrame = df!("Series 1" => &[0; 0])?;
let df2: DataFrame = df!("Series 1" => &[0; 0],
                         "Series 2" => &[0; 0])?;

assert_eq!(df0.width(), 0);
assert_eq!(df1.width(), 1);
assert_eq!(df2.width(), 2);

Get the height of the DataFrame which is the number of rows.

Example
let df0: DataFrame = DataFrame::default();
let df1: DataFrame = df!("Currency" => &["€", "$"])?;
let df2: DataFrame = df!("Currency" => &["€", "$", "¥", "£", "₿"])?;

assert_eq!(df0.height(), 0);
assert_eq!(df1.height(), 2);
assert_eq!(df2.height(), 5);

Check if the DataFrame is empty.

Example
let df1: DataFrame = DataFrame::default();
assert!(df1.is_empty());

let df2: DataFrame = df!("First name" => &["Forever"],
                         "Last name" => &["Alone"])?;
assert!(!df2.is_empty());

Add multiple Series to a DataFrame. The added Series are required to have the same length.

Example
fn stack(df: &mut DataFrame, columns: &[Series]) {
    df.hstack_mut(columns);
}

Add multiple Series to a DataFrame. The added Series are required to have the same length.

Example
let df1: DataFrame = df!("Element" => &["Copper", "Silver", "Gold"])?;
let s1: Series = Series::new("Proton", &[29, 47, 79]);
let s2: Series = Series::new("Electron", &[29, 47, 79]);

let df2: DataFrame = df1.hstack(&[s1, s2])?;
assert_eq!(df2.shape(), (3, 3));
println!("{}", df2);

Output:

shape: (3, 3)
+---------+--------+----------+
| Element | Proton | Electron |
| ---     | ---    | ---      |
| str     | i32    | i32      |
+=========+========+==========+
| Copper  | 29     | 29       |
+---------+--------+----------+
| Silver  | 47     | 47       |
+---------+--------+----------+
| Gold    | 79     | 79       |
+---------+--------+----------+

Concatenate a DataFrame to this DataFrame and return as newly allocated DataFrame.

If many vstack operations are done, it is recommended to call DataFrame::rechunk.

Example
let df1: DataFrame = df!("Element" => &["Copper", "Silver", "Gold"],
                         "Melting Point (K)" => &[1357.77, 1234.93, 1337.33])?;
let df2: DataFrame = df!("Element" => &["Platinum", "Palladium"],
                         "Melting Point (K)" => &[2041.4, 1828.05])?;

let df3: DataFrame = df1.vstack(&df2)?;

assert_eq!(df3.shape(), (5, 2));
println!("{}", df3);

Output:

shape: (5, 2)
+-----------+-------------------+
| Element   | Melting Point (K) |
| ---       | ---               |
| str       | f64               |
+===========+===================+
| Copper    | 1357.77           |
+-----------+-------------------+
| Silver    | 1234.93           |
+-----------+-------------------+
| Gold      | 1337.33           |
+-----------+-------------------+
| Platinum  | 2041.4            |
+-----------+-------------------+
| Palladium | 1828.05           |
+-----------+-------------------+

Concatenate a DataFrame to this DataFrame

If many vstack operations are done, it is recommended to call DataFrame::rechunk.

Example
let mut df1: DataFrame = df!("Element" => &["Copper", "Silver", "Gold"],
                         "Melting Point (K)" => &[1357.77, 1234.93, 1337.33])?;
let df2: DataFrame = df!("Element" => &["Platinum", "Palladium"],
                         "Melting Point (K)" => &[2041.4, 1828.05])?;

df1.vstack_mut(&df2)?;

assert_eq!(df1.shape(), (5, 2));
println!("{}", df1);

Output:

shape: (5, 2)
+-----------+-------------------+
| Element   | Melting Point (K) |
| ---       | ---               |
| str       | f64               |
+===========+===================+
| Copper    | 1357.77           |
+-----------+-------------------+
| Silver    | 1234.93           |
+-----------+-------------------+
| Gold      | 1337.33           |
+-----------+-------------------+
| Platinum  | 2041.4            |
+-----------+-------------------+
| Palladium | 1828.05           |
+-----------+-------------------+

Extend the memory backed by this DataFrame with the values from other.

Different from vstack which adds the chunks from other to the chunks of this DataFrame extent appends the data from other to the underlying memory locations and thus may cause a reallocation.

If this does not cause a reallocation, the resulting data structure will not have any extra chunks and thus will yield faster queries.

Prefer extend over vstack when you want to do a query after a single append. For instance during online operations where you add n rows and rerun a query.

Prefer vstack over extend when you want to append many times before doing a query. For instance when you read in multiple files and when to store them in a single DataFrame. In the latter case, finish the sequence of append operations with a rechunk.

Remove a column by name and return the column removed.

Example
let mut df: DataFrame = df!("Animal" => &["Tiger", "Lion", "Great auk"],
                            "IUCN" => &["Endangered", "Vulnerable", "Extinct"])?;

let s1: Result<Series> = df.drop_in_place("Average weight");
assert!(s1.is_err());

let s2: Series = df.drop_in_place("Animal")?;
assert_eq!(s2, Series::new("Animal", &["Tiger", "Lion", "Great auk"]));

Return a new DataFrame where all null values are dropped.

Example
let df1: DataFrame = df!("Country" => ["Malta", "Liechtenstein", "North Korea"],
                        "Tax revenue (% GDP)" => [Some(32.7), None, None])?;
assert_eq!(df1.shape(), (3, 2));

let df2: DataFrame = df1.drop_nulls(None)?;
assert_eq!(df2.shape(), (1, 2));
println!("{}", df2);

Output:

shape: (1, 2)
+---------+---------------------+
| Country | Tax revenue (% GDP) |
| ---     | ---                 |
| str     | f64                 |
+=========+=====================+
| Malta   | 32.7                |
+---------+---------------------+

Drop a column by name. This is a pure method and will return a new DataFrame instead of modifying the current one in place.

Example
let df1: DataFrame = df!("Ray type" => &["α", "β", "X", "γ"])?;
let df2: DataFrame = df1.drop("Ray type")?;

assert!(df2.is_empty());

Insert a new column at a given index.

Add a new column to this DataFrame or replace an existing one.

Add a new column to this DataFrame or replace an existing one. Uses an existing schema to amortize lookups. If the schema is incorrect, we will fallback to linear search.

Get a row in the DataFrame. Beware this is slow.

Example
fn example(df: &mut DataFrame, idx: usize) -> Option<Vec<AnyValue>> {
    df.get(idx)
}

Select a Series by index.

Example
let df: DataFrame = df!("Star" => &["Sun", "Betelgeuse", "Sirius A", "Sirius B"],
                        "Absolute magnitude" => &[4.83, -5.85, 1.42, 11.18])?;

let s1: Option<&Series> = df.select_at_idx(0);
let s2: Series = Series::new("Star", &["Sun", "Betelgeuse", "Sirius A", "Sirius B"]);

assert_eq!(s1, Some(&s2));

Select column(s) from this DataFrame by range and return a new DataFrame

Examples
let df = df! {
    "0" => &[0, 0, 0],
    "1" => &[1, 1, 1],
    "2" => &[2, 2, 2]
}?;

assert!(df.select(&["0", "1"])?.frame_equal(&df.select_by_range(0..=1)?));
assert!(df.frame_equal(&df.select_by_range(..)?));

Get column index of a Series by name.

Example
let df: DataFrame = df!("Name" => &["Player 1", "Player 2", "Player 3"],
                        "Health" => &[100, 200, 500],
                        "Mana" => &[250, 100, 0],
                        "Strength" => &[30, 150, 300])?;

assert_eq!(df.find_idx_by_name("Name"), Some(0));
assert_eq!(df.find_idx_by_name("Health"), Some(1));
assert_eq!(df.find_idx_by_name("Mana"), Some(2));
assert_eq!(df.find_idx_by_name("Strength"), Some(3));
assert_eq!(df.find_idx_by_name("Haste"), None);

Select a single column by name.

Example
let s1: Series = Series::new("Password", &["123456", "[]B$u$g$s$B#u#n#n#y[]{}"]);
let s2: Series = Series::new("Robustness", &["Weak", "Strong"]);
let df: DataFrame = DataFrame::new(vec![s1.clone(), s2])?;

assert_eq!(df.column("Password")?, &s1);

Selected multiple columns by name.

Example
let df: DataFrame = df!("Latin name" => &["Oncorhynchus kisutch", "Salmo salar"],
                        "Max weight (kg)" => &[16.0, 35.89])?;
let sv: Vec<&Series> = df.columns(&["Latin name", "Max weight (kg)"])?;

assert_eq!(&df[0], sv[0]);
assert_eq!(&df[1], sv[1]);

Select column(s) from this DataFrame and return a new DataFrame.

Examples
fn example(df: &DataFrame) -> Result<DataFrame> {
    df.select(["foo", "bar"])
}

Select column(s) from this DataFrame and return them into a Vec.

Example
let df: DataFrame = df!("Name" => &["Methane", "Ethane", "Propane"],
                        "Carbon" => &[1, 2, 3],
                        "Hydrogen" => &[4, 6, 8])?;
let sv: Vec<Series> = df.select_series(&["Carbon", "Hydrogen"])?;

assert_eq!(df["Carbon"], sv[0]);
assert_eq!(df["Hydrogen"], sv[1]);

Take the DataFrame rows by a boolean mask.

Example
fn example(df: &DataFrame) -> Result<DataFrame> {
    let mask = df.column("sepal.width")?.is_not_null();
    df.filter(&mask)
}

Take DataFrame value by indexes from an iterator.

Example
fn example(df: &DataFrame) -> Result<DataFrame> {
    let iterator = (0..9).into_iter();
    df.take_iter(iterator)
}

Take DataFrame values by indexes from an iterator.

Safety

This doesn’t do any bound checking but checks null validity.

Take DataFrame values by indexes from an iterator that may contain None values.

Safety

This doesn’t do any bound checking. Out of bounds may access uninitialized memory. Null validity is checked

Take DataFrame rows by index values.

Example
fn example(df: &DataFrame) -> Result<DataFrame> {
    let idx = IdxCa::new("idx", &[0, 1, 9]);
    df.take(&idx)
}

Rename a column in the DataFrame.

Example
fn example(df: &mut DataFrame) -> Result<&mut DataFrame> {
    let original_name = "foo";
    let new_name = "bar";
    df.rename(original_name, new_name)
}

Sort DataFrame in place by a column.

This is the dispatch of Self::sort, and exists to reduce compile bloat by monomorphization.

Return a sorted clone of this DataFrame.

Example
fn sort_example(df: &DataFrame, reverse: bool) -> Result<DataFrame> {
    df.sort(["a"], reverse)
}

fn sort_by_multiple_columns_example(df: &DataFrame) -> Result<DataFrame> {
    df.sort(&["a", "b"], vec![false, true])
}

Sort the DataFrame by a single column with extra options.

Replace a column with a Series.

Example
let mut df: DataFrame = df!("Country" => &["United States", "China"],
                        "Area (km²)" => &[9_833_520, 9_596_961])?;
let s: Series = Series::new("Country", &["USA", "PRC"]);

assert!(df.replace("Nation", s.clone()).is_err());
assert!(df.replace("Country", s).is_ok());

Replace or update a column. The difference between this method and DataFrame::with_column is that now the value of column: &str determines the name of the column and not the name of the Series passed to this method.

Replace column at index idx with a Series.

Example
let s0 = Series::new("foo", &["ham", "spam", "egg"]);
let s1 = Series::new("ascii", &[70, 79, 79]);
let mut df = DataFrame::new(vec![s0, s1])?;

// Add 32 to get lowercase ascii values
df.replace_at_idx(1, df.select_at_idx(1).unwrap() + 32);

Apply a closure to a column. This is the recommended way to do in place modification.

Example
let s0 = Series::new("foo", &["ham", "spam", "egg"]);
let s1 = Series::new("names", &["Jean", "Claude", "van"]);
let mut df = DataFrame::new(vec![s0, s1])?;

fn str_to_len(str_val: &Series) -> Series {
    str_val.utf8()
        .unwrap()
        .into_iter()
        .map(|opt_name: Option<&str>| {
            opt_name.map(|name: &str| name.len() as u32)
         })
        .collect::<UInt32Chunked>()
        .into_series()
}

// Replace the names column by the length of the names.
df.apply("names", str_to_len);

Results in:

+--------+-------+
| foo    |       |
| ---    | names |
| str    | u32   |
+========+=======+
| "ham"  | 4     |
+--------+-------+
| "spam" | 6     |
+--------+-------+
| "egg"  | 3     |
+--------+-------+

Apply a closure to a column at index idx. This is the recommended way to do in place modification.

Example
let s0 = Series::new("foo", &["ham", "spam", "egg"]);
let s1 = Series::new("ascii", &[70, 79, 79]);
let mut df = DataFrame::new(vec![s0, s1])?;

// Add 32 to get lowercase ascii values
df.apply_at_idx(1, |s| s + 32);

Results in:

+--------+-------+
| foo    | ascii |
| ---    | ---   |
| str    | i32   |
+========+=======+
| "ham"  | 102   |
+--------+-------+
| "spam" | 111   |
+--------+-------+
| "egg"  | 111   |
+--------+-------+

Apply a closure that may fail to a column at index idx. This is the recommended way to do in place modification.

Example

This is the idomatic way to replace some values a column of a DataFrame given range of indexes.

let s0 = Series::new("foo", &["ham", "spam", "egg", "bacon", "quack"]);
let s1 = Series::new("values", &[1, 2, 3, 4, 5]);
let mut df = DataFrame::new(vec![s0, s1])?;

let idx = vec![0, 1, 4];

df.try_apply("foo", |s| {
    s.utf8()?
    .set_at_idx_with(idx, |opt_val| opt_val.map(|string| format!("{}-is-modified", string)))
});

Results in:

+---------------------+--------+
| foo                 | values |
| ---                 | ---    |
| str                 | i32    |
+=====================+========+
| "ham-is-modified"   | 1      |
+---------------------+--------+
| "spam-is-modified"  | 2      |
+---------------------+--------+
| "egg"               | 3      |
+---------------------+--------+
| "bacon"             | 4      |
+---------------------+--------+
| "quack-is-modified" | 5      |
+---------------------+--------+

Apply a closure that may fail to a column. This is the recommended way to do in place modification.

Example

This is the idomatic way to replace some values a column of a DataFrame given a boolean mask.

let s0 = Series::new("foo", &["ham", "spam", "egg", "bacon", "quack"]);
let s1 = Series::new("values", &[1, 2, 3, 4, 5]);
let mut df = DataFrame::new(vec![s0, s1])?;

// create a mask
let values = df.column("values")?;
let mask = values.lt_eq(1)? | values.gt_eq(5_i32)?;

df.try_apply("foo", |s| {
    s.utf8()?
    .set(&mask, Some("not_within_bounds"))
});

Results in:

+---------------------+--------+
| foo                 | values |
| ---                 | ---    |
| str                 | i32    |
+=====================+========+
| "not_within_bounds" | 1      |
+---------------------+--------+
| "spam"              | 2      |
+---------------------+--------+
| "egg"               | 3      |
+---------------------+--------+
| "bacon"             | 4      |
+---------------------+--------+
| "not_within_bounds" | 5      |
+---------------------+--------+

Slice the DataFrame along the rows.

Example
let df: DataFrame = df!("Fruit" => &["Apple", "Grape", "Grape", "Fig", "Fig"],
                        "Color" => &["Green", "Red", "White", "White", "Red"])?;
let sl: DataFrame = df.slice(2, 3);

assert_eq!(sl.shape(), (3, 2));
println!("{}", sl);

Output:

shape: (3, 2)
+-------+-------+
| Fruit | Color |
| ---   | ---   |
| str   | str   |
+=======+=======+
| Grape | White |
+-------+-------+
| Fig   | White |
+-------+-------+
| Fig   | Red   |
+-------+-------+

Get the head of the DataFrame.

Example
let countries: DataFrame =
    df!("Rank by GDP (2021)" => &[1, 2, 3, 4, 5],
        "Continent" => &["North America", "Asia", "Asia", "Europe", "Europe"],
        "Country" => &["United States", "China", "Japan", "Germany", "United Kingdom"],
        "Capital" => &["Washington", "Beijing", "Tokyo", "Berlin", "London"])?;
assert_eq!(countries.shape(), (5, 4));

println!("{}", countries.head(Some(3)));

Output:

shape: (3, 4)
+--------------------+---------------+---------------+------------+
| Rank by GDP (2021) | Continent     | Country       | Capital    |
| ---                | ---           | ---           | ---        |
| i32                | str           | str           | str        |
+====================+===============+===============+============+
| 1                  | North America | United States | Washington |
+--------------------+---------------+---------------+------------+
| 2                  | Asia          | China         | Beijing    |
+--------------------+---------------+---------------+------------+
| 3                  | Asia          | Japan         | Tokyo      |
+--------------------+---------------+---------------+------------+

Get the tail of the DataFrame.

Example
let countries: DataFrame =
    df!("Rank (2021)" => &[105, 106, 107, 108, 109],
        "Apple Price (€/kg)" => &[0.75, 0.70, 0.70, 0.65, 0.52],
        "Country" => &["Kosovo", "Moldova", "North Macedonia", "Syria", "Turkey"])?;
assert_eq!(countries.shape(), (5, 3));

println!("{}", countries.tail(Some(2)));

Output:

shape: (2, 3)
+-------------+--------------------+---------+
| Rank (2021) | Apple Price (€/kg) | Country |
| ---         | ---                | ---     |
| i32         | f64                | str     |
+=============+====================+=========+
| 108         | 0.63               | Syria   |
+-------------+--------------------+---------+
| 109         | 0.63               | Turkey  |
+-------------+--------------------+---------+

Iterator over the rows in this DataFrame as Arrow RecordBatches.

Panics

Panics if the DataFrame that is passed is not rechunked.

This responsibility is left to the caller as we don’t want to take mutable references here, but we also don’t want to rechunk here, as this operation is costly and would benefit the caller as well.

Get a DataFrame with all the columns in reversed order.

Shift the values by a given period and fill the parts that will be empty due to this operation with Nones.

See the method on Series for more info on the shift operation.

Replace None values with one of the following strategies:

  • Forward fill (replace None with the previous value)
  • Backward fill (replace None with the next value)
  • Mean fill (replace None with the mean of the whole array)
  • Min fill (replace None with the minimum of the whole array)
  • Max fill (replace None with the maximum of the whole array)

See the method on Series for more info on the fill_null operation.

Aggregate the columns to their maximum values.

Example
let df1: DataFrame = df!("Die n°1" => &[1, 3, 1, 5, 6],
                         "Die n°2" => &[3, 2, 3, 5, 3])?;
assert_eq!(df1.shape(), (5, 2));

let df2: DataFrame = df1.max();
assert_eq!(df2.shape(), (1, 2));
println!("{}", df2);

Output:

shape: (1, 2)
+---------+---------+
| Die n°1 | Die n°2 |
| ---     | ---     |
| i32     | i32     |
+=========+=========+
| 6       | 5       |
+---------+---------+

Aggregate the columns to their standard deviation values.

Example
let df1: DataFrame = df!("Die n°1" => &[1, 3, 1, 5, 6],
                         "Die n°2" => &[3, 2, 3, 5, 3])?;
assert_eq!(df1.shape(), (5, 2));

let df2: DataFrame = df1.std();
assert_eq!(df2.shape(), (1, 2));
println!("{}", df2);

Output:

shape: (1, 2)
+-------------------+--------------------+
| Die n°1           | Die n°2            |
| ---               | ---                |
| f64               | f64                |
+===================+====================+
| 2.280350850198276 | 1.0954451150103321 |
+-------------------+--------------------+

Aggregate the columns to their variation values.

Example
let df1: DataFrame = df!("Die n°1" => &[1, 3, 1, 5, 6],
                         "Die n°2" => &[3, 2, 3, 5, 3])?;
assert_eq!(df1.shape(), (5, 2));

let df2: DataFrame = df1.var();
assert_eq!(df2.shape(), (1, 2));
println!("{}", df2);

Output:

shape: (1, 2)
+---------+---------+
| Die n°1 | Die n°2 |
| ---     | ---     |
| f64     | f64     |
+=========+=========+
| 5.2     | 1.2     |
+---------+---------+

Aggregate the columns to their minimum values.

Example
let df1: DataFrame = df!("Die n°1" => &[1, 3, 1, 5, 6],
                         "Die n°2" => &[3, 2, 3, 5, 3])?;
assert_eq!(df1.shape(), (5, 2));

let df2: DataFrame = df1.min();
assert_eq!(df2.shape(), (1, 2));
println!("{}", df2);

Output:

shape: (1, 2)
+---------+---------+
| Die n°1 | Die n°2 |
| ---     | ---     |
| i32     | i32     |
+=========+=========+
| 1       | 2       |
+---------+---------+

Aggregate the columns to their sum values.

Example
let df1: DataFrame = df!("Die n°1" => &[1, 3, 1, 5, 6],
                         "Die n°2" => &[3, 2, 3, 5, 3])?;
assert_eq!(df1.shape(), (5, 2));

let df2: DataFrame = df1.sum();
assert_eq!(df2.shape(), (1, 2));
println!("{}", df2);

Output:

shape: (1, 2)
+---------+---------+
| Die n°1 | Die n°2 |
| ---     | ---     |
| i32     | i32     |
+=========+=========+
| 16      | 16      |
+---------+---------+

Aggregate the columns to their mean values.

Example
let df1: DataFrame = df!("Die n°1" => &[1, 3, 1, 5, 6],
                         "Die n°2" => &[3, 2, 3, 5, 3])?;
assert_eq!(df1.shape(), (5, 2));

let df2: DataFrame = df1.mean();
assert_eq!(df2.shape(), (1, 2));
println!("{}", df2);

Output:

shape: (1, 2)
+---------+---------+
| Die n°1 | Die n°2 |
| ---     | ---     |
| f64     | f64     |
+=========+=========+
| 3.2     | 3.2     |
+---------+---------+

Aggregate the columns to their median values.

Example
let df1: DataFrame = df!("Die n°1" => &[1, 3, 1, 5, 6],
                         "Die n°2" => &[3, 2, 3, 5, 3])?;
assert_eq!(df1.shape(), (5, 2));

let df2: DataFrame = df1.median();
assert_eq!(df2.shape(), (1, 2));
println!("{}", df2);

Output:

shape: (1, 2)
+---------+---------+
| Die n°1 | Die n°2 |
| ---     | ---     |
| i32     | i32     |
+=========+=========+
| 3       | 3       |
+---------+---------+

Aggregate the columns to their quantile values.

Aggregate the column horizontally to their min values.

Aggregate the column horizontally to their max values.

Aggregate the column horizontally to their sum values.

Aggregate the column horizontally to their mean values.

Pipe different functions/ closure operations that work on a DataFrame together.

Pipe different functions/ closure operations that work on a DataFrame together.

Pipe different functions/ closure operations that work on a DataFrame together.

👎 Deprecated:

use distinct

Drop duplicate rows from a DataFrame. This fails when there is a column of type List in DataFrame

Example
let df = df! {
              "flt" => [1., 1., 2., 2., 3., 3.],
              "int" => [1, 1, 2, 2, 3, 3, ],
              "str" => ["a", "a", "b", "b", "c", "c"]
          }?;

println!("{}", df.drop_duplicates(true, None)?);

Returns

+-----+-----+-----+
| flt | int | str |
| --- | --- | --- |
| f64 | i32 | str |
+=====+=====+=====+
| 1   | 1   | "a" |
+-----+-----+-----+
| 2   | 2   | "b" |
+-----+-----+-----+
| 3   | 3   | "c" |
+-----+-----+-----+

Drop duplicate rows from a DataFrame. This fails when there is a column of type List in DataFrame

Stable means that the order is maintained. This has a higher cost than an unstable distinct.

Example
let df = df! {
              "flt" => [1., 1., 2., 2., 3., 3.],
              "int" => [1, 1, 2, 2, 3, 3, ],
              "str" => ["a", "a", "b", "b", "c", "c"]
          }?;

println!("{}", df.unique_stable(None, UniqueKeepStrategy::First)?);

Returns

+-----+-----+-----+
| flt | int | str |
| --- | --- | --- |
| f64 | i32 | str |
+=====+=====+=====+
| 1   | 1   | "a" |
+-----+-----+-----+
| 2   | 2   | "b" |
+-----+-----+-----+
| 3   | 3   | "c" |
+-----+-----+-----+

Unstable distinct. See [DataFrame::distinct_stable].

Get a mask of all the unique rows in the DataFrame.

Example
let df: DataFrame = df!("Company" => &["Apple", "Microsoft"],
                        "ISIN" => &["US0378331005", "US5949181045"])?;
let ca: ChunkedArray<BooleanType> = df.is_unique()?;

assert!(ca.all());

Get a mask of all the duplicated rows in the DataFrame.

Example
let df: DataFrame = df!("Company" => &["Alphabet", "Alphabet"],
                        "ISIN" => &["US02079K3059", "US02079K1079"])?;
let ca: ChunkedArray<BooleanType> = df.is_duplicated()?;

assert!(!ca.all());

Create a new DataFrame that shows the null counts per column.

Get the supertype of the columns in this DataFrame

Unnest the given Struct columns. This means that the fields of the Struct type will be inserted as columns.

Check if DataFrames are equal. Note that None == None evaluates to false

Example
let df1: DataFrame = df!("Atomic number" => &[1, 51, 300],
                        "Element" => &[Some("Hydrogen"), Some("Antimony"), None])?;
let df2: DataFrame = df!("Atomic number" => &[1, 51, 300],
                        "Element" => &[Some("Hydrogen"), Some("Antimony"), None])?;

assert!(!df1.frame_equal(&df2));

Check if all values in DataFrames are equal where None == None evaluates to true.

Example
let df1: DataFrame = df!("Atomic number" => &[1, 51, 300],
                        "Element" => &[Some("Hydrogen"), Some("Antimony"), None])?;
let df2: DataFrame = df!("Atomic number" => &[1, 51, 300],
                        "Element" => &[Some("Hydrogen"), Some("Antimony"), None])?;

assert!(df1.frame_equal_missing(&df2));

Checks if the Arc ptrs of the Series are equal

Example
let df1: DataFrame = df!("Atomic number" => &[1, 51, 300],
                        "Element" => &[Some("Hydrogen"), Some("Antimony"), None])?;
let df2: &DataFrame = &df1;

assert!(df1.ptr_equal(df2));

Trait Implementations

Returns a copy of the value. Read more

Performs copy-assignment from source. Read more

Formats the value using the given formatter. Read more

Returns the “default value” for a type. Read more

Formats the value using the given formatter. Read more

Converts to this type from the input type.

Converts to this type from the input type.

Panics

Panics if Series have different lengths.

The returned type after indexing.

Performs the indexing (container[index]) operation. Read more

The returned type after indexing.

Performs the indexing (container[index]) operation. Read more

The returned type after indexing.

Performs the indexing (container[index]) operation. Read more

The returned type after indexing.

Performs the indexing (container[index]) operation. Read more

The returned type after indexing.

Performs the indexing (container[index]) operation. Read more

The returned type after indexing.

Performs the indexing (container[index]) operation. Read more

The returned type after indexing.

Performs the indexing (container[index]) operation. Read more

The returned type after indexing.

Performs the indexing (container[index]) operation. Read more

Convert the DataFrame into a lazy DataFrame

This method tests for self and other values to be equal, and is used by ==. Read more

This method tests for !=.

Upsample a DataFrame at a regular frequency. Read more

Upsample a DataFrame at a regular frequency. Read more

The type returned in the event of a conversion error.

Performs the conversion.

The type returned in the event of a conversion error.

Performs the conversion.

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self. Read more

Immutably borrows from an owned value. Read more

Mutably borrows from an owned value. Read more

Create dummy variables. Read more

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

The alignment of pointer.

The type for initializers.

Initializes a with the given initializer. Read more

Dereferences the given pointer. Read more

Mutably dereferences the given pointer. Read more

Drops the object pointed to by the given pointer. Read more

The resulting type after obtaining ownership.

Creates owned data from borrowed data, usually by cloning. Read more

🔬 This is a nightly-only experimental API. (toowned_clone_into)

Uses borrowed data to replace owned data, usually by cloning. Read more

Converts the given value to a String. Read more

The type returned in the event of a conversion error.

Performs the conversion.

The type returned in the event of a conversion error.

Performs the conversion.