Rowboat 🛶
Dataframe in Rust 🦀
+-----------+------+-----------+
| strangs   | nums | null nums |
+-----------+------+-----------+
| sugar     | 0    | -10       |
| sweets    | 1    | Null      |
| candy pop | 2    | 200       |
| caramel   | 3    | 400       |
| chocolate | 4    | 777       |
+-----------+------+-----------+
Import
use rowboat::dataframe::*;
Create
From rows
Using the row! macro
let df = from_rows
.unwrap;
From csv
With ToRow proc-macro
let df = .unwrap;
Or implement ToRow manually
From structs
Create from a Vec<T> where T implements ToRow
let df = from_structs
.unwrap;
With null values
let df = from_rows
.unwrap;
With timestamp
let df = from_rows
.unwrap;
Supported types
Int(i64)
Uint(u64)
Str(String)
Bool(bool)
Float(f64)
DateTime(chrono::NaiveDateTime)
Null(Box<Cell>)
Display
All
df.print();
Head
df.head;
Tail
df.tail;
Metadata
Info
Print shape and types
df.info();
// DF Info
// Shape: 3_col x 5_row
// Columns: strangs <Str>, nums <Int>, null nums <Int>
Describe
df.describe().print();
Creates a describe df and prints it:
+---------+---------+------+-----------+
| ::      | strangs | nums | null nums |
+---------+---------+------+-----------+
| count   | 5       | 5    | 5         |
| mean    | Null    | 2    | 341.75    |
| std     | Null    | 1.41 | 301.15    |
| min     | Null    | 0    | -10       |
| 25%     | Null    | 0.5  | 95        |
| 50%     | Null    | 2    | 300       |
| 75%     | Null    | 3.5  | 588.5     |
| max     | Null    | 4    | 777       |
| unique  | 5       | Null | Null      |
| top idx | 0       | Null | Null      |
| freq    | 1       | Null | Null      |
+---------+---------+------+-----------+
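To make the numeric rows of the describe table concrete, here is a minimal standalone sketch in plain Rust that does not use the rowboat API. It assumes null values are skipped, a population standard deviation (dividing by n), and quartiles taken as medians of the lower and upper halves. Those choices are inferred from the numbers shown (they reproduce the whole nums column, plus the mean and quartiles of null nums); they are not a statement about how the crate computes them. The names `summarize` and `Summary` are hypothetical.

```rust
/// Median of a sorted, non-empty slice.
fn median(sorted: &[f64]) -> f64 {
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}

/// Summary of the non-null values of a numeric column (hypothetical type).
#[derive(Debug, PartialEq)]
struct Summary {
    mean: f64,
    std_dev: f64, // population standard deviation (divides by n)
    q25: f64,
    q50: f64,
    q75: f64,
}

/// Returns `None` when the column holds no non-null values.
fn summarize(column: &[Option<f64>]) -> Option<Summary> {
    // Assumption: nulls are ignored rather than counted as zero.
    let mut xs: Vec<f64> = column.iter().flatten().copied().collect();
    if xs.is_empty() {
        return None;
    }
    xs.sort_by(|a, b| a.total_cmp(b));
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let variance = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    // Quartiles as medians of the lower and upper halves
    // (the middle value is left out of both halves when n is odd).
    let half = xs.len() / 2;
    let (q25, q75) = if half == 0 {
        (xs[0], xs[0])
    } else {
        (median(&xs[..half]), median(&xs[xs.len() - half..]))
    };
    Some(Summary { mean, std_dev: variance.sqrt(), q25, q50: median(&xs), q75 })
}
```

Calling `summarize` on the values 0 through 4 gives a mean of 2, a standard deviation of about 1.41, and quartiles 0.5, 2 and 3.5, in line with the nums column above.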
Column names
df.col_names();
Extend
Add column
df.add_col.unwrap;
+----+-------+--------+ +-------+
| id | name  | active |   value |
+----+-------+--------+ +-------+
| 0  | Jake  | true   |   -10   |
| 1  | Jane  | true   |   30    |
| 2  | Sally | false  |   20    |
| 3  | Sam   | false  |   4     |
+----+-------+--------+ +-------+
Add row
df.add_row.unwrap;
+----+-------+--------+-------+
| id | name  | active | value |
+----+-------+--------+-------+
| 0  | Jake  | true   | -10   |
| 1  | Jane  | true   | Null  |
| 2  | Sally | false  | 200   |
| 3  | Sam   | false  | 400   |
+    +       +        +       +
| 4  | Susan | false  | 7     |
+----+-------+--------+-------+
Concat
Extend vertically by appending the rows of another dataframe, essentially a union
df.concat.unwrap;
+-----------+------+-----------+
| strangs   | nums | null nums |
+-----------+------+-----------+
| sugar     | 0    | -10       |
| sweets    | 1    | Null      |
| candy pop | 2    | 200       |
| caramel   | 3    | 400       |
+           +      +           +
| chocolate | 4    | 777       |
| cinnamon  | 5    | 300       |
| syrup     | 6    | Null      |
| sprinkles | 7    | -500      |
+-----------+------+-----------+
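The idea of a vertical concat can be sketched in plain Rust without the rowboat API. The `Table` type and `concat` function below are hypothetical stand-ins. The sketch assumes that concatenation fails when the column names differ; the `.unwrap` in the call above suggests the real method can fail, but the exact failure conditions are not stated here.

```rust
/// A tiny stand-in table: column names plus rows of nullable integers.
#[derive(Debug, Clone, PartialEq)]
struct Table {
    columns: Vec<String>,
    rows: Vec<Vec<Option<i64>>>,
}

/// Appends the rows of `other` to `base`.
/// Assumption: mismatched column names are an error.
fn concat(mut base: Table, other: Table) -> Result<Table, String> {
    if base.columns != other.columns {
        return Err(format!(
            "column mismatch: {:?} vs {:?}",
            base.columns, other.columns
        ));
    }
    // Row order is preserved: all of `base`, then all of `other`.
    base.rows.extend(other.rows);
    Ok(base)
}
```

Taking both tables by value avoids cloning rows; a version that borrows `other` would need to clone them instead.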
Join
Extend horizontally by matching values in a left column against values in a right column
Inner join
// join(other_df, left_column, right_column)
let result_df = df.join.unwrap;
+----+-------+-------- + -----+---------+
| id | name  | active     uid | balance |
+----+-------+-------- + -----+---------+
| 0  | Jake  | true       0   | -10     |
| 1  | Jane  | true       1   | Null    |
| 2  | Sally | false      2   | 200     |
| 3  | Sam   | false      3   | 400     |
| 4  | Susan | false      4   | 777     |
+----+-------+-------- + -----+---------+
Left join
let result_df = df.left_join.unwrap;
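The difference between the two join kinds can be illustrated with a standalone sketch in plain Rust that does not use the rowboat API. The function names and tuple shapes are hypothetical, and the sketch assumes each uid appears at most once on the right side. An inner join drops left rows without a match; a left join keeps them and fills the missing side with a null (`None` here).

```rust
use std::collections::HashMap;

type User = (u64, &'static str); // (id, name)
type Account = (u64, i64); // (uid, balance)

/// Inner join: keep only users whose id has a matching uid.
fn inner_join(users: &[User], accounts: &[Account]) -> Vec<(u64, &'static str, i64)> {
    // Index the right side by key so each lookup is O(1).
    let by_uid: HashMap<u64, i64> = accounts.iter().copied().collect();
    users
        .iter()
        .filter_map(|&(id, name)| by_uid.get(&id).map(|&balance| (id, name, balance)))
        .collect()
}

/// Left join: keep every user; unmatched users get `None` for the balance.
fn left_join(users: &[User], accounts: &[Account]) -> Vec<(u64, &'static str, Option<i64>)> {
    let by_uid: HashMap<u64, i64> = accounts.iter().copied().collect();
    users
        .iter()
        .map(|&(id, name)| (id, name, by_uid.get(&id).copied()))
        .collect()
}
```

With users Jake (0), Jane (1) and Sally (2) and accounts for uids 0 and 2 only, `inner_join` returns two rows while `left_join` returns three, with `None` as Jane's balance.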
More on columns
Copy/update an existing column into a new column
df.add_col
.unwrap;
Create a column derived from multiple source column values
df.add_col
.unwrap;
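Deriving a column from several source columns amounts to combining them row by row. The sketch below is plain Rust, not the rowboat API; `derive_col` is a hypothetical helper. It assumes a row of the new column is null whenever either input is null, which is one possible policy rather than the crate's documented behavior.

```rust
/// Builds a new column by combining two source columns row by row.
/// A row yields `None` (null) when either input is null.
/// If the columns differ in length, the result stops at the shorter one.
fn derive_col<A, B, C>(
    a: &[Option<A>],
    b: &[Option<B>],
    f: impl Fn(&A, &B) -> C,
) -> Vec<Option<C>> {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(f(x, y)),
            _ => None,
        })
        .collect()
}
```

For example, a price column of 2.5, null, 4.0 and a quantity column of 2, 1, 3 combined with multiplication gives 5.0, null, 12.0.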
Slice
By index
// to_dataframe copies DataSlice into new Dataframe
df.slice.unwrap.to_dataframe;
+     +       +       +      +
| 100 | Jane  | true  | Null |
| 200 | Sally | false | 200  |
| 300 | Sam   | false | 400  |
+     +       +       +      +
By column
df.col_slice
.unwrap
.to_dataframe;
+--------+-----+
  name   | age
+--------+-----+
  Jane   | 24
  Sally  | 56
  Susan  | 43
  Jasper | 78
  Sam    | 37
+--------+-----+
Get cell
// (row_index, col_name)
let cell = df.cell.unwrap;
Reshape
Drop columns
Drop specified columns
df.drop_cols;
Retain columns
Drop all columns other than those specified
df.retain_cols;
Rename column
df.rename_col.unwrap;
Filter
Operation enum variants:
Eq: equal
Neq: not equal
Gt: greater than
Lt: less than
GtEq: greater than or equal to
LtEq: less than or equal to
Mod(i64): mod i is
Regex: matches regex
Simple
// where age val is not null
let df = df.filter.unwrap;
Before                        After
+--------+------+-------+     +--------+------+-------+
| name   | age  | value |     | name   | age  | value |
+--------+------+-------+     +--------+------+-------+
| Jane   | Null | -10   |     | Sally  | 56   | Null  |
| Sally  | 56   | Null  |     | Susan  | 43   | 200   |
| Susan  | 43   | 200   |     | Sam    | 37   | 777   |
| Jasper | Null | 400   |     +--------+------+-------+
| Sam    | 37   | 777   |
+--------+------+-------+
Complex
Nest as many and/or/not/exp as needed
let df = df
.filter
.unwrap;
Negate
Wrap any expression in not() to invert the result
// filter odd values
let df = df.filter.unwrap;
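Nested and/or/not filters can be pictured as a small expression tree evaluated once per row. The sketch below is plain Rust and does not use rowboat's own types; the `Expr` enum, its variant names and the `row` helper are hypothetical. It assumes that a comparison against a null cell is false, so a negated comparison on a null cell is true.

```rust
use std::collections::HashMap;

/// One row: column name -> nullable integer value.
type Row = HashMap<&'static str, Option<i64>>;

/// Hypothetical expression tree; variant names are illustrative only.
enum Expr {
    NotNull(&'static str),
    Gt(&'static str, i64),
    /// Column value modulo the divisor equals the remainder.
    ModIs(&'static str, i64, i64),
    And(Vec<Expr>),
    Or(Vec<Expr>),
    Not(Box<Expr>),
}

/// Looks up a column, treating a missing column like a null cell.
fn value(row: &Row, col: &str) -> Option<i64> {
    row.get(col).copied().flatten()
}

/// Evaluates the expression against a single row.
fn eval(expr: &Expr, row: &Row) -> bool {
    match expr {
        Expr::NotNull(c) => value(row, c).is_some(),
        // Assumption: comparisons against a null cell are false.
        Expr::Gt(c, x) => value(row, c).map_or(false, |v| v > *x),
        // Note: a divisor of 0 would panic here.
        Expr::ModIs(c, d, r) => value(row, c).map_or(false, |v| v.rem_euclid(*d) == *r),
        Expr::And(es) => es.iter().all(|e| eval(e, row)),
        Expr::Or(es) => es.iter().any(|e| eval(e, row)),
        Expr::Not(e) => !eval(e, row),
    }
}

/// Keeps only the rows for which the expression holds.
fn filter(rows: Vec<Row>, expr: &Expr) -> Vec<Row> {
    rows.into_iter().filter(|r| eval(expr, r)).collect()
}

/// Convenience constructor for a two-column row.
fn row(age: Option<i64>, value: Option<i64>) -> Row {
    HashMap::from([("age", age), ("value", value)])
}
```

Applied to the Before table above, `Expr::NotNull("age")` keeps the Sally, Susan and Sam rows, matching the After table.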
Mutate
By column
df.col_mut
.unwrap
.apply
.unwrap;
By cell
Directly
// index, column, new_value
df.set_val.unwrap;
Via function
// index, column, function
df.update_val
.unwrap;
Sort
Simple
// sort by, sort dir [Asc | Desc]
df.sort.unwrap;
Complex
Use this method for multi column sorting
let sorted = df
.into_sort
.sort
.sort
.sort
.collect
.unwrap;
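Multi-column sorting means that later sort keys only break ties left by earlier ones. In plain Rust (standard library only, not the rowboat API) the same idea is expressed by chaining comparisons with `Ordering::then_with`. The `Employee` struct and the comparator name are hypothetical.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, PartialEq)]
struct Employee {
    name: &'static str,
    department: &'static str,
    salary: i64,
}

/// Orders by department ascending, then salary descending, then name ascending.
fn by_dept_salary_name(a: &Employee, b: &Employee) -> Ordering {
    a.department
        .cmp(b.department)
        // Swapping the operands gives descending order for this key.
        .then_with(|| b.salary.cmp(&a.salary))
        .then_with(|| a.name.cmp(b.name))
}
```

Pass the comparator to `sort_by`, for example `employees.sort_by(by_dept_salary_name);`. Each `then_with` closure runs only when all earlier keys compare equal.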
Iterate
Iter
let unames = df
.iter
.map
.;
Into iter
A consuming df.into_iter() is also available
Iter chunk
df.iter_chunk.for_each;
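Chunked iteration processes rows in fixed-size batches. As a rough analogy in plain Rust (not the rowboat API), the standard `slice::chunks` method behaves this way on a slice; the `chunk_sums` helper is hypothetical, and whether the crate's last chunk may be shorter is an assumption.

```rust
/// Sums each fixed-size chunk of values; the last chunk may be shorter.
fn chunk_sums(values: &[i64], chunk_size: usize) -> Vec<i64> {
    values
        .chunks(chunk_size) // panics if chunk_size is 0
        .map(|chunk| chunk.iter().sum())
        .collect()
}
```

With five values and a chunk size of 2, the closure sees chunks of 2, 2 and 1 elements.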
Group by
Reducer enum variants
Count
Sum
Prod
Mean
Min
Max
Top
Unique
Coalesce
NonNull
Query
Group the df by common group_by values, then chain selects to reduce each group into one row of a new dataframe
// Source column, reducer, new alias name
let grouped_df = df
.group_by
.select
.select
.select
.select
.select
.to_dataframe
.unwrap;
Above query transforms this raw data:
+--------+-------------+--------+-----+
| name   | department  | salary | age |
+--------+-------------+--------+-----+
| Jasper | Sales       | 100    | 29  |
| James  | Marketing   | 200    | 44  |
| Susan  | Sales       | 300    | 65  |
| Jane   | Marketing   | 400    | 47  |
| Sam    | Sales       | 100    | 55  |
| Sally  | Engineering | 200    | 30  |
+--------+-------------+--------+-----+
Into this new dataframe:
+-------------+-------+---------+---------+---------+
| department  | count | max sal | min sal | avg age |
+-------------+-------+---------+---------+---------+
| Sales       | 3     | 300     | 100     | 49.67   |
| Marketing   | 2     | 400     | 200     | 45.5    |
| Engineering | 1     | 200     | 200     | 30      |
+-------------+-------+---------+---------+---------+
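The reduction shown in these two tables can be reproduced by hand in plain Rust, which may help when checking what each reducer contributes. This is a standalone sketch, not the rowboat API: the struct and function names are hypothetical, and keeping groups in first-seen order is an assumption based on the row order of the result table.

```rust
use std::collections::HashMap;

struct Employee {
    department: &'static str,
    salary: i64,
    age: i64,
}

#[derive(Debug, PartialEq)]
struct DeptStats {
    department: &'static str,
    count: usize,
    max_sal: i64,
    min_sal: i64,
    avg_age: f64,
}

/// Groups rows by department and reduces each group with count, max, min and mean.
fn group_by_department(rows: &[Employee]) -> Vec<DeptStats> {
    // Maps each department to its slot in `groups`, preserving first-seen order.
    let mut index: HashMap<&str, usize> = HashMap::new();
    // Running (department, count, max salary, min salary, age total) per group.
    let mut groups: Vec<(&'static str, usize, i64, i64, i64)> = Vec::new();
    for r in rows {
        let i = *index.entry(r.department).or_insert_with(|| {
            groups.push((r.department, 0, i64::MIN, i64::MAX, 0));
            groups.len() - 1
        });
        let g = &mut groups[i];
        g.1 += 1;
        g.2 = g.2.max(r.salary);
        g.3 = g.3.min(r.salary);
        g.4 += r.age;
    }
    groups
        .into_iter()
        .map(|(department, count, max_sal, min_sal, age_total)| DeptStats {
            department,
            count,
            max_sal,
            min_sal,
            avg_age: age_total as f64 / count as f64,
        })
        .collect()
}
```

Fed the six raw rows above, the sketch yields Sales (3, 300, 100, about 49.67), Marketing (2, 400, 200, 45.5) and Engineering (1, 200, 200, 30), in that order.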
Grouped chunks
Group df by common chunk_by values into a Vec<Dataframe>
df.to_slice
.chunk_by
.unwrap
.iter
.for_each;
Store
To csv
df.to_csv.unwrap;
To SQL
Convert the df into chunks of SQL insert statements, each with its corresponding Vec<String> of args. Meant to be compatible with the sqlx library.
df.iter_sql.for_each;
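To show what a chunk of SQL plus args might look like, here is a standalone sketch in plain Rust that does not use rowboat or sqlx. The `insert_chunks` function, its parameters and the exact statement text are hypothetical. It assumes `?` placeholders (as used by MySQL and SQLite; Postgres uses `$1`, `$2`, ...) and trusted table and column names, since identifiers are interpolated without escaping.

```rust
/// Builds one parameterized INSERT statement per chunk of rows.
/// Returns (sql, args) pairs; args are flattened row by row.
fn insert_chunks(
    table: &str,
    columns: &[&str],
    rows: &[Vec<String>],
    chunk_size: usize,
) -> Vec<(String, Vec<String>)> {
    // One "(?, ?, ...)" group per row, one placeholder per column.
    let group = format!("({})", vec!["?"; columns.len()].join(", "));
    rows.chunks(chunk_size) // panics if chunk_size is 0
        .map(|chunk| {
            let values = vec![group.as_str(); chunk.len()].join(", ");
            let sql = format!(
                "INSERT INTO {} ({}) VALUES {}",
                table,
                columns.join(", "),
                values
            );
            // Values travel separately from the SQL text so the driver can bind them.
            let args: Vec<String> = chunk.iter().flatten().cloned().collect();
            (sql, args)
        })
        .collect()
}
```

Each pair can then be handed to a database client by binding the args to the statement in order. Keeping values out of the SQL string is what makes the statements safe to execute with untrusted data.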
DataSlice type also has a to_sql method.
Examples
For more examples, see ./tests/integration_test.rs, ./tests/example/example.rs, and ./tests/example/example_from_sql.rs