Crate polars
Polars: DataFrames in Rust
Polars is a DataFrame library for Rust. It is based on Apache Arrow's memory model.
This means that operations on Polars arrays (called Series, or ChunkedArray<T> if the type T is known) are cache friendly, optimally aligned, and can use SIMD. Sadly, Apache Arrow needs nightly Rust,
which means that Polars cannot run on stable.
Polars supports an eager and a lazy API. The eager API is similar to pandas; the lazy API is similar to Spark.
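To make the eager/lazy distinction concrete, here is a minimal analogy using plain Rust iterators. It does not use the Polars API; the function names `eager_pipeline` and `lazy_pipeline` are hypothetical and exist only for illustration. The eager version materializes a result after every step, while the lazy version only describes the steps and runs them once a result is requested.

```rust
// Analogy only: plain Rust iterators, not the Polars eager or lazy API.

/// Eager style: every step runs immediately and materializes a new Vec.
fn eager_pipeline(values: &[i32]) -> Vec<i32> {
    let doubled: Vec<i32> = values.iter().map(|v| v * 2).collect();
    let filtered: Vec<i32> = doubled.into_iter().filter(|v| *v > 4).collect();
    filtered
}

/// Lazy style: the steps are only described; nothing runs until `collect`.
fn lazy_pipeline(values: &[i32]) -> Vec<i32> {
    // Building `plan` does no work yet.
    let plan = values.iter().map(|v| v * 2).filter(|v| *v > 4);
    // The whole chain is executed in a single pass here.
    plan.collect()
}
```

Both functions return the same values; the difference is when the work happens and how many intermediate buffers are allocated.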
Eager
Read more in the pages of the following data structures/traits.
Lazy
Read more in the lazy module.
Read and write CSV/JSON
use polars::prelude::*;
use std::fs::File;

fn example() -> Result<DataFrame> {
    let file = File::open("iris.csv")
        .expect("could not read file");

    CsvReader::new(file)
        .infer_schema(None)
        .has_header(true)
        .finish()
}
For more IO examples see:
Joins
use polars::prelude::*;

fn join() -> Result<DataFrame> {
    // Create first df.
    let temp = df!("days" => &[0, 1, 2, 3, 4],
                   "temp" => &[22.1, 19.9, 7., 2., 3.])?;

    // Create second df.
    let rain = df!("days" => &[1, 2],
                   "rain" => &[0.1, 0.2])?;

    // Left join on days column.
    temp.left_join(&rain, "days", "days")
}

println!("{}", join().unwrap())
+------+------+------+
| days | temp | rain |
| --- | --- | --- |
| i32 | f64 | f64 |
+======+======+======+
| 0 | 22.1 | null |
+------+------+------+
| 1 | 19.9 | 0.1 |
+------+------+------+
| 2 | 7 | 0.2 |
+------+------+------+
| 3 | 2 | null |
+------+------+------+
| 4 | 3 | null |
+------+------+------+
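The null entries in the rain column follow from left-join semantics: every row of the left frame is kept, and a row without a match on the right gets a null. The following std-only sketch reproduces that behavior on plain tuples. It is not how Polars implements joins; the function name `left_join_on_days` is hypothetical, and it assumes the keys on the right side are unique.

```rust
use std::collections::HashMap;

/// Left join of (day, temp) rows with (day, rain) rows on the day key.
/// Every left row is kept; a missing match yields `None` (null).
/// Assumption: each day occurs at most once on the right-hand side.
fn left_join_on_days(
    temp: &[(i32, f64)],
    rain: &[(i32, f64)],
) -> Vec<(i32, f64, Option<f64>)> {
    // Build a lookup table from the right-hand side.
    let lookup: HashMap<i32, f64> = rain.iter().copied().collect();

    // Walk the left-hand side in order and attach the match, if any.
    temp.iter()
        .map(|&(day, t)| (day, t, lookup.get(&day).copied()))
        .collect()
}
```

Called with the `temp` and `rain` values from the example above, this yields five rows, with `None` for days 0, 3, and 4.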
Groupbys | aggregations | pivots | melts
use polars::prelude::*;

fn groupby_sum(df: &DataFrame) -> Result<DataFrame> {
    df.groupby("column_name")?
        .select("agg_column_name")
        .sum()
}
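As a rough picture of what a groupby followed by a sum computes, the sketch below groups (key, value) rows by key and sums the values per group using only the standard library. The function name `groupby_sum_sketch` and the row layout are assumptions made for illustration; the actual Polars implementation works on columnar data and is not shown here.

```rust
use std::collections::BTreeMap;

/// Group rows by their key and sum the values within each group.
/// A BTreeMap is used so the groups come out in sorted key order.
fn groupby_sum_sketch(rows: &[(&str, i64)]) -> BTreeMap<String, i64> {
    let mut sums = BTreeMap::new();
    for &(key, value) in rows {
        // Start each new group at 0, then accumulate.
        *sums.entry(key.to_string()).or_insert(0) += value;
    }
    sums
}
```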
Arithmetic
use polars::prelude::*;

let s = Series::new("foo", [1, 2, 3]);
let s_squared = &s * &s;
Rust iterators
use polars::prelude::*;

let s: Series = [1, 2, 3].iter().collect();

let s_squared: Series = s.i32()
    .expect("datatype mismatch")
    .into_iter()
    .map(|optional_v| {
        match optional_v {
            Some(v) => Some(v * v),
            None => None, // null value
        }
    }).collect();
Apply custom closures
Besides running custom iterators, custom closures can be applied to the values of a ChunkedArray
by using the apply method. This method accepts
a closure that is applied to all values of Option<T>
that are non-null; null values are skipped. Note that this is the
fastest way to apply a custom closure to a ChunkedArray.
use polars::prelude::*;

let s: Series = Series::new("values", [Some(1.0), None, Some(3.0)]);

// null values are ignored automatically
let squared = s.f64()
    .unwrap()
    .apply(|value| value.powf(2.0))
    .into_series();

assert_eq!(Vec::from(squared.f64().unwrap()), &[Some(1.0), None, Some(9.0)]);
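The "non-null values only" behavior described above can be modeled with a slice of `Option<T>`: the closure sees the inner value, and `None` passes through untouched. This is a std-only sketch of the semantics, not the Polars `apply` implementation; the helper name `apply_non_null` is hypothetical.

```rust
/// Apply `f` to every non-null value; nulls (`None`) are passed through.
fn apply_non_null<T, F>(values: &[Option<T>], f: F) -> Vec<Option<T>>
where
    T: Copy,
    F: Fn(T) -> T,
{
    // `Option::map` only calls the closure for `Some`, so `None` is skipped.
    values.iter().map(|v| v.map(&f)).collect()
}
```

Because the closure takes a plain `T` rather than an `Option<T>`, the caller does not have to handle the null case at all, which mirrors the ergonomics of the example above.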
Comparisons
use polars::prelude::*;

let s = Series::new("dollars", &[1, 2, 3]);
let mask = s.eq(1);

assert_eq!(Vec::from(mask), &[Some(true), Some(false), Some(false)]);
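A comparison produces a boolean mask with one entry per value, and such a mask is typically used to select rows. The std-only sketch below shows both steps on plain slices. The names `eq_mask` and `filter_by_mask` are hypothetical, null handling is left out for brevity, and this is not the Polars API.

```rust
/// Compare every value with `rhs` and return one boolean per value.
fn eq_mask(values: &[i32], rhs: i32) -> Vec<bool> {
    values.iter().map(|v| *v == rhs).collect()
}

/// Keep only the values whose mask entry is `true`.
/// If the slices differ in length, the extra entries are ignored.
fn filter_by_mask(values: &[i32], mask: &[bool]) -> Vec<i32> {
    values
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v)
        .collect()
}
```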
Temporal data types
use polars::prelude::*;

let dates = &[
    "2020-08-21",
    "2020-08-21",
    "2020-08-22",
    "2020-08-23",
    "2020-08-22",
];
// date format
let fmt = "%Y-%m-%d";
// create date series
let s0 = Date32Chunked::parse_from_str_slice("date", dates, fmt)
    .into_series();
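Assuming Date32Chunked follows Arrow's Date32 representation (days since the UNIX epoch, stored as a 32-bit integer), the sketch below shows what parsing a "%Y-%m-%d" string into such a value involves. It uses only the standard library and a well-known civil-date algorithm; the function names `days_from_civil` and `parse_date32` are hypothetical, and the sketch does not validate the day against the month length.

```rust
/// Days between 1970-01-01 and the given proleptic Gregorian date.
fn days_from_civil(year: i32, month: u32, day: u32) -> i32 {
    // Treat January and February as the end of the previous year,
    // so that a leap day is always the last day of the shifted year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400); // 400-year Gregorian cycle
    let yoe = y.rem_euclid(400); // year within the cycle, 0..=399
    let mp = (month as i32 + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + day as i32 - 1; // day of shifted year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day within the cycle
    era * 146_097 + doe - 719_468
}

/// Parse "YYYY-MM-DD" into days since the UNIX epoch.
/// Returns `None` for malformed input (a null in Polars terms).
fn parse_date32(s: &str) -> Option<i32> {
    let mut parts = s.splitn(3, '-');
    let year: i32 = parts.next()?.parse().ok()?;
    let month: u32 = parts.next()?.parse().ok()?;
    let day: u32 = parts.next()?.parse().ok()?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some(days_from_civil(year, month, day))
}
```

Under that assumption, consecutive calendar days map to consecutive integers, which is what makes comparisons and arithmetic on date columns cheap.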
And more...
Features
Additional cargo features:
temporal (default)
- Conversions between Chrono and Polars for temporal data
simd (default)
- SIMD operations
parquet
- Read Apache Parquet format
json
- Json serialization
ipc
- Arrow's IPC format serialization
random
- Generate arrays with randomly sampled values
ndarray
- Convert from DataFrame to ndarray
parallel
- Parallel variants of operations
lazy
- Lazy API
strings
- String utilities for Utf8Chunked
object
- Support for generic ChunkedArrays called ObjectChunked<T> (generic over T). These will be downcastable from Series through the Any trait.
Re-exports
pub use polars_io as io;
pub use polars_lazy as lazy;
Modules
chunked_array | The typed heart of every Series column. |
datatypes | Data types supported by Polars. |
doc | Other documentation |
error | |
frame | DataFrame module. |
functions | |
prelude | |
series | Type agnostic columnar data structure. |
testing | Testing utilities. |
Macros
apply_method_all_arrow_series | |
df |
Functions
toggle_string_cache |