//! # Polars: *<small>DataFrames in Rust</small>*
//!
//! Polars is a DataFrame library for Rust. It is based on [Apache Arrow](https://arrow.apache.org/)'s memory model.
//! Apache Arrow provides cache-efficient columnar data structures and is becoming the de facto
//! standard for columnar data.
//!
//! This means that Polars data structures can be shared zero-copy with processes in many different
//! languages.
//!
//! ## Table of Contents
//!
//! * [Cookbooks](#cookbooks)
//! * [Data structures](#data-structures)
//!     - [DataFrame](#dataframe)
//!     - [Series](#series)
//!     - [ChunkedArray](#chunkedarray)
//! * [SIMD](#simd)
//! * [API](#api)
//! * [Compile times](#compile-times)
//! * [Performance](#performance-and-string-data)
//!     - [Custom allocator](#custom-allocator)
//! * [Config](#config-with-env-vars)
//! * [WASM target](#compile-for-wasm)
//!
//! ## Cookbooks
//! See examples in the cookbooks:
//!
//! * [Eager](crate::docs::eager)
//! * [Lazy](crate::docs::lazy)
//!
//! ## Data Structures
//! The base data structures provided by Polars are `DataFrame`, `Series`, and `ChunkedArray<T>`.
//! Below is a short, top-down view of each of them.
//!
//! ### DataFrame
//! A `DataFrame` is a two-dimensional data structure backed by `Series`; it can be seen as an
//! abstraction over `Vec<Series>`. Operations that can be executed on a `DataFrame` are very
//! similar to those in a SQL-like query: you can `GROUP`, `JOIN`, `PIVOT`, etc. The
//! closest Arrow equivalent to a `DataFrame` is a [RecordBatch](https://docs.rs/arrow/4.0.0/arrow/record_batch/struct.RecordBatch.html),
//! and Polars provides zero-copy coercion between the two.
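//!
//! To make the `Vec<Series>` picture concrete, here is a toy model. It is a sketch, not the Polars
//! implementation. A frame is a list of named columns, and all columns must have the same length.
//! The `ToyFrame` and `ToyColumn` types are hypothetical. Every column is a `Vec<f64>` so that the
//! example stays self-contained.
//!
//! ```rust
//! /// A named column; a stand-in for a Polars `Series` (hypothetical type).
//! #[derive(Debug, Clone)]
//! struct ToyColumn {
//!     name: String,
//!     values: Vec<f64>,
//! }
//!
//! /// A frame is just a list of columns that all have the same length.
//! #[derive(Debug)]
//! struct ToyFrame {
//!     columns: Vec<ToyColumn>,
//! }
//!
//! impl ToyFrame {
//!     /// Build a frame, rejecting columns whose lengths differ.
//!     fn new(columns: Vec<ToyColumn>) -> Result<ToyFrame, String> {
//!         if let Some(first) = columns.first() {
//!             let n = first.values.len();
//!             for c in &columns {
//!                 if c.values.len() != n {
//!                     return Err(format!(
//!                         "column '{}' has length {}, expected {}",
//!                         c.name,
//!                         c.values.len(),
//!                         n
//!                     ));
//!                 }
//!             }
//!         }
//!         Ok(ToyFrame { columns })
//!     }
//!
//!     /// (rows, columns), as for a 2D table.
//!     fn shape(&self) -> (usize, usize) {
//!         let rows = self.columns.first().map_or(0, |c| c.values.len());
//!         (rows, self.columns.len())
//!     }
//!
//!     /// Look up a column by name.
//!     fn column(&self, name: &str) -> Option<&ToyColumn> {
//!         self.columns.iter().find(|c| c.name == name)
//!     }
//! }
//!
//! fn main() {
//!     let df = ToyFrame::new(vec![
//!         ToyColumn { name: "a".to_string(), values: vec![1.0, 2.0, 3.0] },
//!         ToyColumn { name: "b".to_string(), values: vec![4.0, 5.0, 6.0] },
//!     ])
//!     .unwrap();
//!     assert_eq!(df.shape(), (3, 2));
//!     assert_eq!(df.column("b").unwrap().values[0], 4.0);
//! }
//! ```
//!
//! The equal-length check is what lets a `Vec` of columns behave like a 2D table.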
//!
//! ### Series
//! `Series` are the type-agnostic columnar data representation of Polars. They provide many
//! operations out of the box, many via the [Series struct](crate::prelude::Series) and
//! [SeriesTrait trait](crate::series::SeriesTrait). Whether an operation is available on a
//! `Series` depends on the operation. If it can be done without knowing the underlying columnar
//! type, it is probably provided by the `Series`. If not, you must downcast to the typed data
//! structure wrapped by the `Series`: the `ChunkedArray<T>`.
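//!
//! The split between a type-agnostic wrapper and a typed array can be sketched with a trait object
//! plus `std::any::Any`. This simplified model uses hypothetical names (`ToySeriesTrait`,
//! `ToyChunked<T>`, `ToySeries`) and does not reproduce the actual Polars definitions. In the model,
//! `len` works without knowing `T`. Typed access needs a downcast, which returns `None` if `T` is
//! the wrong type.
//!
//! ```rust
//! use std::any::Any;
//!
//! /// Typed column; a stand-in for `ChunkedArray<T>` (hypothetical).
//! struct ToyChunked<T> {
//!     values: Vec<T>,
//! }
//!
//! /// Operations that do not need to know `T` (hypothetical trait).
//! trait ToySeriesTrait {
//!     fn len(&self) -> usize;
//!     fn as_any(&self) -> &dyn Any;
//! }
//!
//! impl<T: 'static> ToySeriesTrait for ToyChunked<T> {
//!     fn len(&self) -> usize {
//!         self.values.len()
//!     }
//!     fn as_any(&self) -> &dyn Any {
//!         self
//!     }
//! }
//!
//! /// Type-erased wrapper; a stand-in for `Series`.
//! struct ToySeries(Box<dyn ToySeriesTrait>);
//!
//! impl ToySeries {
//!     fn new<T: 'static>(values: Vec<T>) -> ToySeries {
//!         ToySeries(Box::new(ToyChunked { values }))
//!     }
//!     /// Type-agnostic: works for any `T`.
//!     fn len(&self) -> usize {
//!         self.0.len()
//!     }
//!     /// Typed access: `None` if the series does not hold `T`.
//!     fn downcast<T: 'static>(&self) -> Option<&ToyChunked<T>> {
//!         self.0.as_any().downcast_ref::<ToyChunked<T>>()
//!     }
//! }
//!
//! fn main() {
//!     let s = ToySeries::new(vec![1i32, 2, 3]);
//!     assert_eq!(s.len(), 3);
//!     // A typed operation (summing i32 values) needs the concrete type.
//!     let sum: i32 = s.downcast::<i32>().unwrap().values.iter().sum();
//!     assert_eq!(sum, 6);
//!     assert!(s.downcast::<f64>().is_none());
//! }
//! ```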
//!
//! ### ChunkedArray
//! A `ChunkedArray<T>` is a wrapper around Arrow arrays that can contain multiple chunks, conceptually
//! a `Vec<dyn ArrowArray>`. These are the root data structures of Polars and implement many operations.
//! Most operations are implemented by traits defined in [chunked_array::ops](crate::chunked_array::ops),
//! or on the [ChunkedArray struct](crate::chunked_array::ChunkedArray).
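//!
//! The data may be split over several chunks, so a logical index must first be mapped to a chunk
//! and an offset within that chunk. The sketch below models chunks as plain `Vec<T>` rather than
//! Arrow arrays. `ToyChunkedArray` and its methods are hypothetical names. In this model, appending
//! only adds a chunk and does not copy existing data.
//!
//! ```rust
//! /// Hypothetical model: one logical array stored as several contiguous chunks.
//! struct ToyChunkedArray<T> {
//!     chunks: Vec<Vec<T>>,
//! }
//!
//! impl<T: Copy> ToyChunkedArray<T> {
//!     fn new(first: Vec<T>) -> Self {
//!         ToyChunkedArray { chunks: vec![first] }
//!     }
//!
//!     /// Appending adds a chunk; the existing chunks are not copied.
//!     fn append(&mut self, chunk: Vec<T>) {
//!         self.chunks.push(chunk);
//!     }
//!
//!     /// Logical length: the sum of all chunk lengths.
//!     fn len(&self) -> usize {
//!         self.chunks.iter().map(|c| c.len()).sum()
//!     }
//!
//!     /// Walk the chunks to turn a logical index into (chunk, offset).
//!     fn get(&self, index: usize) -> Option<T> {
//!         let mut offset = index;
//!         for chunk in &self.chunks {
//!             if offset < chunk.len() {
//!                 return Some(chunk[offset]);
//!             }
//!             offset -= chunk.len();
//!         }
//!         None
//!     }
//! }
//!
//! fn main() {
//!     let mut ca = ToyChunkedArray::new(vec![1, 2, 3]);
//!     ca.append(vec![4, 5]);
//!     assert_eq!(ca.chunks.len(), 2);
//!     assert_eq!(ca.len(), 5);
//!     // Logical index 3 is offset 0 in the second chunk.
//!     assert_eq!(ca.get(3), Some(4));
//!     assert_eq!(ca.get(5), None);
//! }
//! ```
//!
//! The linear walk over the chunks keeps the sketch simple. With many chunks, a binary search over
//! cumulative chunk lengths would be faster.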
//!
//! ## SIMD
//! Polars/Arrow use `packed_simd` to speed up kernels with SIMD operations. SIMD is behind the optional
//! `simd` feature (`feature = "simd"`) and requires a nightly compiler. If you don't need SIMD, **Polars runs on stable!**
//!
//! ## API
//! Polars supports an eager and a lazy API, and strives to make them both equally capable.
//! The eager API is similar to [pandas](https://pandas.pydata.org/) and is easy to get started with.
//! The lazy API is similar to [Spark](https://spark.apache.org/) and builds a query plan that is
//! optimized before execution. This may be less intuitive, but it can improve performance.
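//!
//! The following toy example illustrates the difference. It is unrelated to the real Polars query
//! engine. The eager function materializes an intermediate vector after every step. The lazy
//! version only records the steps and, when collected, runs them fused in a single pass. The `Plan`
//! and `Step` types are hypothetical, and real query optimizers do considerably more than fuse steps.
//!
//! ```rust
//! /// One recorded step of a toy query plan (hypothetical).
//! enum Step {
//!     Map(fn(i64) -> i64),
//!     Filter(fn(i64) -> bool),
//! }
//!
//! /// Eager: every step allocates and fills a new intermediate Vec.
//! fn eager(data: &[i64]) -> Vec<i64> {
//!     let doubled: Vec<i64> = data.iter().map(|x| x * 2).collect();
//!     let filtered: Vec<i64> = doubled.into_iter().filter(|x| *x > 4).collect();
//!     filtered
//! }
//!
//! /// Lazy: steps are only recorded; nothing runs until `collect`.
//! struct Plan {
//!     steps: Vec<Step>,
//! }
//!
//! impl Plan {
//!     fn new() -> Plan {
//!         Plan { steps: Vec::new() }
//!     }
//!     fn map(mut self, f: fn(i64) -> i64) -> Plan {
//!         self.steps.push(Step::Map(f));
//!         self
//!     }
//!     fn filter(mut self, p: fn(i64) -> bool) -> Plan {
//!         self.steps.push(Step::Filter(p));
//!         self
//!     }
//!     /// Execute all recorded steps fused, in one pass over the input.
//!     fn collect(&self, data: &[i64]) -> Vec<i64> {
//!         let mut out = Vec::new();
//!         'rows: for &x in data {
//!             let mut v = x;
//!             for step in &self.steps {
//!                 match *step {
//!                     Step::Map(f) => v = f(v),
//!                     Step::Filter(p) => {
//!                         if !p(v) {
//!                             continue 'rows; // row dropped, skip remaining steps
//!                         }
//!                     }
//!                 }
//!             }
//!             out.push(v);
//!         }
//!         out
//!     }
//! }
//!
//! fn double(x: i64) -> i64 { x * 2 }
//! fn gt4(x: i64) -> bool { x > 4 }
//!
//! fn main() {
//!     let data = [1, 2, 3, 4];
//!     let plan = Plan::new().map(double).filter(gt4);
//!     // Both strategies give [6, 8]; only the lazy one avoids intermediates.
//!     assert_eq!(plan.collect(&data), eager(&data));
//! }
//! ```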
//!
//! ### Eager
//! Read more in the documentation of the following data structures and traits:
//!
//! * [DataFrame struct](crate::frame::DataFrame)
//! * [Series struct](crate::series::Series)
//! * [Series trait](crate::series::SeriesTrait)
//! * [ChunkedArray struct](crate::chunked_array::ChunkedArray)
//! * [ChunkedArray operations traits](crate::chunked_array::ops)
//!
//! ### Lazy
//! Lazy computation unlocks the full potential of Polars. It allows query optimizations and gives
//! Polars the full query context, so that the fastest algorithm can be chosen.
//!
//! **[Read more in the lazy module.](polars_lazy)**
//!
//! ## Compile times
//! A DataFrame library typically consists of
//!
//! * Tons of features
//! * A lot of datatypes
//!
//! Both of these put a real strain on compile times. To keep Polars lean, we make both **opt-in**,
//! meaning that you only pay the compilation cost if you need it.
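//!
//! Opt-in functionality like this is usually built on Cargo features and conditional compilation.
//! The sketch below only illustrates that mechanism. `cross_join_enabled` is a hypothetical
//! function, and the feature name `cross_join` is borrowed from the list below as an example. A user
//! enables such a feature through the `features` list of the dependency in `Cargo.toml`.
//!
//! ```rust
//! /// Compiled only when the crate is built with the `cross_join` feature
//! /// (hypothetical function, used to illustrate the mechanism).
//! #[cfg(feature = "cross_join")]
//! fn cross_join_enabled() -> bool {
//!     true
//! }
//!
//! /// Fallback when the feature is off. The gated item above is then dropped
//! /// before type checking and code generation, so it adds almost nothing to
//! /// compile times.
//! #[cfg(not(feature = "cross_join"))]
//! fn cross_join_enabled() -> bool {
//!     false
//! }
//!
//! fn main() {
//!     // `cfg!` evaluates the same condition to a `bool` at compile time.
//!     assert_eq!(cross_join_enabled(), cfg!(feature = "cross_join"));
//! }
//! ```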
//!
//! ## Compile times and opt-in features
//! The opt-in features are (not including dtype features):
//!
//! * `lazy` - Lazy API
//!     - `lazy_regex` - Use regexes in [column selection](crate::lazy::dsl::col)
//! * `random` - Generate arrays with randomly sampled values
//! * `ndarray` - Convert from `DataFrame` to `ndarray`
//! * `temporal` - Conversions between [Chrono](https://docs.rs/chrono/) and Polars for temporal data types
//! * `strings` - Extra string utilities for `Utf8Chunked`
//! * `object` - Support for generic ChunkedArrays called `ObjectChunked<T>` (generic over `T`).
//!              These are downcastable from Series through the [Any](https://doc.rust-lang.org/std/any/index.html) trait.
//! * Performance related:
//!     - `simd` - SIMD operations _(nightly only)_
//!     - `performant` - ~40% faster `ChunkedArray` creation, but may lead to an unexpected panic if an iterator incorrectly sets a `size_hint`
//! * IO related:
//!     - `serde` - Support for [serde](https://crates.io/crates/serde) serialization and deserialization.
//!                 Can be used for JSON and other serde-supported serialization formats.
//!     - `parquet` - Read Apache Parquet format
//!     - `json` - JSON serialization
//!     - `ipc` - Arrow's IPC format serialization
//!     - `decompress` - Automatically infer the compression of CSV files and decompress them.
//!                      Supported compressions: zip and gzip.
//! * `DataFrame` operations:
//!     - `pivot` - [pivot operation](crate::frame::groupby::GroupBy::pivot) on `DataFrame`s
//!     - `sort_multiple` - Allow sorting a `DataFrame` on multiple columns
//!     - `rows` - Create `DataFrame`s from rows and extract rows from `DataFrame`s.
//!     - `downsample` - [downsample operation](crate::frame::DataFrame::downsample) on `DataFrame`s
//!     - `asof_join` - As-of join: join on the nearest keys instead of an exact equality match.
//!     - `cross_join` - Create the Cartesian product of two `DataFrame`s.
//!     - `groupby_list` - Allow groupby operation on keys of type List.
//! * `Series` operations:
//!     - `is_in` - [Check for membership in `Series`](crate::chunked_array::ops::IsIn)
//!     - `zip_with` - [Zip two Series/ChunkedArrays](crate::chunked_array::ops::ChunkZip)
//!     - `round_series` - Round the underlying float types of a `Series`.
//!     - `repeat_by` - Repeat each element in an array N times, where N is given by another array.
//!     - `is_first` - Check if an element is the first unique value.
//!     - `is_last` - Check if an element is the last unique value.
//!     - `checked_arithmetic` - Checked arithmetic, returning `None` on invalid operations.
//!     - `dot_product` - Dot/inner product on Series and Expressions.
//!     - `concat_str` - Concat string data in linear time.
//!     - `reinterpret` - Utility to reinterpret bits to signed/unsigned
//!     - `take_opt_iter` - Take from a Series with `Iterator<Item=Option<usize>>`
//!     - `mode` - [Return the most frequently occurring value(s)](crate::chunked_array::ops::ChunkUnique::mode)
//!     - `cum_agg` - [cumsum, cummin, cummax aggregation](crate::chunked_array::ops::CumAgg)
//!     - `rolling_window` - [rolling window functions, like rolling_mean](crate::chunked_array::ops::ChunkWindow)
//!     - `interpolate` - [interpolate None values](crate::chunked_array::ops::Interpolate)
//!     - `extract_jsonpath` - [Run jsonpath queries on Utf8Chunked](https://goessner.net/articles/JsonPath/)
//!     - `list` - [List utils](crate::chunked_array::list::namespace)
//!     - `rank` - Ranking algorithms.
//!     - `moment` - kurtosis and skew statistics
//! * `DataFrame` pretty printing (choose one or none, but not both):
//!     - `plain_fmt` - no overflowing (shorter compile times)
//!     - `pretty_fmt` - cell overflow (longer compile times)
//! * `row_hash` - Utility to hash `DataFrame` rows to a `UInt64Chunked`
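//!
//! The `asof_join` entry above describes joining on the nearest key instead of an exact match. The
//! sketch below illustrates that idea under two assumptions. First, it uses a *backward* search:
//! each left key matches the last right key that is less than or equal to it. Second, the right
//! keys are sorted in ascending order. `asof_match` is a hypothetical helper, and the strategy and
//! options Polars actually uses may differ.
//!
//! ```rust
//! /// For each left key, return the index of the last right key <= it, or
//! /// `None` if every right key is larger. `right` must be sorted ascending.
//! /// (Hypothetical helper assuming a backward as-of strategy.)
//! fn asof_match(left: &[i64], right: &[i64]) -> Vec<Option<usize>> {
//!     left.iter()
//!         .map(|&l| {
//!             // Count of right keys <= l, found by binary search.
//!             let n = right.partition_point(|&r| r <= l);
//!             if n == 0 { None } else { Some(n - 1) }
//!         })
//!         .collect()
//! }
//!
//! fn main() {
//!     let left = [1, 5, 10];
//!     let right = [2, 4, 10];
//!     // 1 has no key <= it; 5 matches 4 (index 1); 10 matches 10 exactly (index 2).
//!     assert_eq!(asof_match(&left, &right), vec![None, Some(1), Some(2)]);
//! }
//! ```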
//!
//! ## Compile times and opt-in data types
//! As mentioned above, Polars `Series` are wrappers around
//! `ChunkedArray<T>` without the generic parameter `T`.
//! To get rid of the generic parameter, all possible values of `T` are compiled
//! for `Series`. This gets more expensive the more types a `Series` has to support. To reduce
//! compile times, we have decided to default to a minimal set of types and make more `Series` types
//! opt-in.
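//!
//! The cost described above can be illustrated with a rough sketch that uses hypothetical names. It
//! is a simplified model, not how Polars defines `Series`. A type-erased column is modeled as an
//! enum with one variant per supported dtype. Every generic kernel called from a `match` over that
//! enum is monomorphized once per variant. Each dtype added to the enum therefore adds another
//! compiled copy of every such kernel.
//!
//! ```rust
//! /// Hypothetical type-erased column: one enum variant per supported dtype.
//! enum AnyColumn {
//!     U8(Vec<u8>),
//!     I32(Vec<i32>),
//!     F64(Vec<f64>),
//! }
//!
//! /// Generic kernel: the compiler emits a separate copy for every `T` it is used with.
//! fn sum_as_f64<T: Copy>(values: &[T]) -> f64
//! where
//!     f64: From<T>,
//! {
//!     values.iter().map(|&v| f64::from(v)).sum()
//! }
//!
//! impl AnyColumn {
//!     /// Runtime dispatch: each arm instantiates `sum_as_f64` for a different `T`,
//!     /// so this one method already produces three monomorphized copies.
//!     fn sum(&self) -> f64 {
//!         match *self {
//!             AnyColumn::U8(ref v) => sum_as_f64(&v[..]),
//!             AnyColumn::I32(ref v) => sum_as_f64(&v[..]),
//!             AnyColumn::F64(ref v) => sum_as_f64(&v[..]),
//!         }
//!     }
//! }
//!
//! fn main() {
//!     let cols = vec![
//!         AnyColumn::U8(vec![1, 2]),
//!         AnyColumn::I32(vec![-1, 4]),
//!         AnyColumn::F64(vec![0.5, 0.25]),
//!     ];
//!     let sums: Vec<f64> = cols.iter().map(|c| c.sum()).collect();
//!     assert_eq!(sums, vec![3.0, 3.0, 0.75]);
//! }
//! ```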
//!
//! Note that if you get strange compile-time errors, you probably need to opt in to that `Series` dtype.
//! The opt-in dtypes are:
//!
//! | data type               | feature flag      |
//! |-------------------------|-------------------|
//! | Time64NanosecondType    | dtype-time64-ns   |
//! | DurationNanosecondType  | dtype-duration-ns |
//! | DurationMillisecondType | dtype-duration-ms |
//! | Date32Type              | dtype-date32      |
//! | Date64Type              | dtype-date64      |
//! | Int8Type                | dtype-i8          |
//! | Int16Type               | dtype-i16         |
//! | UInt8Type               | dtype-u8          |
//! | UInt16Type              | dtype-u16         |
//! | UInt64Type              | dtype-u64         |
//!
//! Alternatively, you can choose one of the preconfigured presets:
//!
//! * `dtype-full` - all opt-in dtypes.
//! * `dtype-slim` - slim preset of opt-in dtypes.
//!
//! ## Performance and string data
//! Large string data can significantly slow down your queries.
//! Read more in the [performance section](crate::docs::performance).
//!
//! ### Custom allocator
//! A DataFrame library naturally does a lot of heap allocations, so it is recommended to use a custom
//! allocator. [Mimalloc](https://docs.rs/mimalloc/0.1.25/mimalloc/), for instance, shows a significant
//! performance gain in runtime as well as in memory usage.
//!
//! #### Usage
//! ```ignore
//! use mimalloc::MiMalloc;
//!
//! #[global_allocator]
//! static GLOBAL: MiMalloc = MiMalloc;
//! ```
//!
//! #### Cargo.toml
//! ```toml
//! [dependencies]
//! mimalloc = { version = "*", default-features = false }
//! ```
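//!
//! The `#[global_allocator]` attribute above accepts any static whose type implements
//! `std::alloc::GlobalAlloc`. The sketch below illustrates that mechanism using only `std`, with no
//! Polars or mimalloc involved. It wraps the system allocator and counts allocation calls. A counter
//! like this can help gauge how allocation-heavy a workload is before and after switching
//! allocators. `CountingAlloc` and `allocation_count` are hypothetical names.
//!
//! ```rust
//! use std::alloc::{GlobalAlloc, Layout, System};
//! use std::hint::black_box;
//! use std::sync::atomic::{AtomicUsize, Ordering};
//!
//! /// Hypothetical wrapper: forwards to the system allocator and counts allocations.
//! /// The default `realloc` and `alloc_zeroed` call `alloc`, so they are counted too.
//! struct CountingAlloc;
//!
//! static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);
//!
//! unsafe impl GlobalAlloc for CountingAlloc {
//!     unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
//!         ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
//!         System.alloc(layout)
//!     }
//!
//!     unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
//!         System.dealloc(ptr, layout)
//!     }
//! }
//!
//! // Same attribute as in the mimalloc snippet, pointing at the counting wrapper.
//! #[global_allocator]
//! static GLOBAL: CountingAlloc = CountingAlloc;
//!
//! /// Allocations made so far by the whole program (all threads).
//! fn allocation_count() -> usize {
//!     ALLOCATIONS.load(Ordering::Relaxed)
//! }
//!
//! fn main() {
//!     let before = allocation_count();
//!     // `black_box` keeps the optimizer from removing the allocation entirely.
//!     let v: Vec<u64> = black_box((0..1000).collect());
//!     assert_eq!(v.len(), 1000);
//!     assert!(allocation_count() > before);
//! }
//! ```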
//!
//! ## Config with ENV vars
//!
//! * `POLARS_PAR_SORT_BOUND` -> sets the lower bound of rows at which Polars will use a parallel sorting algorithm.
//!                              Default is 1M rows.
//! * `POLARS_FMT_NO_UTF8` -> use ASCII tables instead of UTF-8 tables.
//! * `POLARS_FMT_MAX_COLS` -> maximum number of columns shown when formatting DataFrames.
//! * `POLARS_FMT_MAX_ROWS` -> maximum number of rows shown when formatting DataFrames.
//! * `POLARS_TABLE_WIDTH` -> width of the tables used during DataFrame formatting.
//! * `POLARS_MAX_THREADS` -> maximum number of threads used to initialize the thread pool (on startup).
//! * `POLARS_VERBOSE` -> print logging info to stderr
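//!
//! These settings are read from the process environment, so set them before the program starts,
//! e.g. `POLARS_PAR_SORT_BOUND=5000 cargo run`. The sketch below shows the usual parsing pattern for
//! the `POLARS_PAR_SORT_BOUND` entry, using its documented default of 1M rows. The helpers
//! `parse_bound` and `par_sort_bound` are hypothetical and are not the Polars implementation.
//!
//! ```rust
//! use std::env;
//!
//! /// Documented default: use the parallel sort from 1M rows.
//! const DEFAULT_PAR_SORT_BOUND: usize = 1_000_000;
//!
//! /// Parse a raw setting; a missing or unparsable value falls back to the
//! /// default (an assumption of this sketch, not documented Polars behavior).
//! fn parse_bound(raw: Option<&str>) -> usize {
//!     raw.and_then(|s| s.trim().parse().ok())
//!         .unwrap_or(DEFAULT_PAR_SORT_BOUND)
//! }
//!
//! /// Hypothetical helper reading `POLARS_PAR_SORT_BOUND` from the environment.
//! fn par_sort_bound() -> usize {
//!     let raw = env::var("POLARS_PAR_SORT_BOUND").ok();
//!     parse_bound(raw.as_deref())
//! }
//!
//! fn main() {
//!     assert_eq!(parse_bound(None), DEFAULT_PAR_SORT_BOUND);
//!     assert_eq!(parse_bound(Some("5000")), 5000);
//!     // Reads the real environment: prints 1000000 when the variable is unset.
//!     println!("parallel sort bound: {}", par_sort_bound());
//! }
//! ```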
//!
//! ## Compile for WASM
//! To pretty print a `DataFrame` in `wasm32-wasi`, you need to patch the `prettytable-rs`
//! dependency. Adding this snippet to your `Cargo.toml` lets you compile and pretty print for the
//! `wasm32-wasi` target.
//!
//! ```toml
//! [patch.crates-io]
//! prettytable-rs = { git = "https://github.com/phsym/prettytable-rs", branch = "master"}
//! ```
//!
//! ## User Guide
//! If you want to read more, [check the User Guide](https://pola-rs.github.io/polars-book/).
pub mod docs;
pub mod prelude;

pub use polars_core::{
    chunked_array, datatypes, doc, error, frame, functions, series, testing, toggle_string_cache,
};

pub use polars_core::apply_method_all_arrow_series;
pub use polars_core::df;

#[cfg(feature = "polars-io")]
pub use polars_io as io;
#[cfg(feature = "lazy")]
pub use polars_lazy as lazy;