dfsql 0.15.1

SQL REPL for Data Frames
Documentation
# `dfsql`

![](img/terminal.png)

- Revision: the standalone `count` command is replaced with `len`, so make sure to replace `(count)` and `col "count"` with `len` and `col "len"` respectively.
  - the unary `count <col>` command is unaffected.

## Install

```bash
cargo install dfsql
```

## How to run

```bash
dfsql --input your.csv --output a-new.csv
# ...or
dfsql -i your.csv -o a-new.csv
```

## REPL

- `exit`/`quit`: exit the REPL loop.
  ```bash
  exit
  ```
- `undo`: undo the previous successful operation.
  ```bash
  undo
  ```
- `reset`: reset all the changes and go back to the original data frame.
  ```bash
  reset
  ```
- `schema`: show column names and types of the data frame.
  ```bash
  schema
  ```
- `save`: save the current data frame to a file.
  ```bash
  save a-new.csv
  ```

## Statements

- `select`
  ```py
  select <expr>*
  ```
  ```sql
  select last_name first_name
  ```
  - Select columns "last_name" and "first_name" and collect them into a data frame.
- Group by
  ```py
  group (<col> | <var>)* agg <expr>*
  ```
  ```sql
  group first_name agg (count)
  ```
  - Group the data frame by column "first_name" and then aggregate each group with the count of the members.
- `filter`
  ```py
  filter <expr>
  ```
  ```sql
  filter first_name = "John"
  ```
- `limit`
  ```py
  limit <int>
  ```
  ```sql
  limit 5
  ```
- `reverse`
  ```sql
  reverse
  ```
- `sort`
  ```py
  sort ((asc | desc | ()) <col>)*
  ```
  ```sql
  sort icpsr_id
  ```
- `use`
  ```py
  use <var>
  ```
  ```py
  use other
  ```
  - Switch to the data frame called `other`.
- join
  ```py
  (left | right | inner | full) join <var> on <col> <col>?
  ```
  ```py
  left join other on id ID
  ```
  - left join the data frame called `other` on my column `id` and its column `ID`

## Expressions

- `col`: reference to a column.
  ```py
  col : (<str> | <var>) -> <expr>
  ```
  ```sql
  select col first_name
  ```
- `exclude`: remove columns from the data frame.
  ```py
  exclude : <expr>* -> <expr>
  ```
  ```sql
  select exclude last_name first_name
  ```
- literal: literal values like `42`, `"John"`, `1.0`, and `null`.
- binary operations
  ```sql
  select a * b
  ```
  - Calculate the product of columns "a" and "b" and collect the result.
- unary operations
  ```sql
  select -a
  ```
  ```sql
  select sum a
  ```
  - Sum all values in column "a" and collect the scalar result.
- `alias`: assign a name to a column.
  ```py
  alias : (<col> | <var>) <expr> -> <expr>
  ```
  ```sql
  select alias product a * b
  ```
  - Assign the name "product" to the product and collect the new column.
- conditional
  ```py
  <conditional> : if <expr> then <expr> (if <expr> then <expr>)* otherwise <expr> -> <expr>
  ```
  ```sql
  select if class = 0 then "A" if class = 1 then "B" else null
  ```
- `cast`: cast a column to either type `str`, `int`, or `float`.
  ```py
  cast : <type> <expr> -> <expr>
  ```
  ```sql
  select cast str id
  ```
  - Cast the column "id" to type `str` and collect the result.