datafusion-table-providers 0.6.1

Extend the capabilities of DataFusion to support additional data sources via implementations of the `TableProvider` trait.
Documentation
# DataFusion Table Providers

Note: This is not an official Apache Software Foundation project.

The goal of this repo is to extend the capabilities of DataFusion to support additional data sources via implementations of the `TableProvider` trait.

Many of the table providers in this repo are for querying data from other database systems. Those providers also integrate with the [`datafusion-federation`](https://github.com/datafusion-contrib/datafusion-federation/) crate to allow for more efficient query execution, such as pushing down joins between multiple tables from the same database system, or efficiently implementing TopK style queries (`SELECT * FROM table ORDER BY foo LIMIT 10`).

To use these table providers with efficient federation push-down, add the `datafusion-federation` crate and create a DataFusion `SessionContext` using the Federation optimizer rule and query planner with:

```rust
use datafusion::prelude::SessionContext;

let state = datafusion_federation::default_session_state();
let ctx = SessionContext::with_state(state);

// Register the specific table providers into ctx
// queries will now automatically be federated
```

## Table Providers

- PostgreSQL
- MySQL
- SQLite
- DuckDB
- Flight SQL
- ODBC

## Examples (in Rust)

Run the included examples to see how to use the table providers:

### DuckDB

```bash
# Read from a table in a DuckDB file
cargo run --example duckdb --features duckdb
# Create an external table backed by DuckDB directly in DataFusion
cargo run --example duckdb_external_table --features duckdb
# Use the result of a DuckDB function call as the source of a table
cargo run --example duckdb_function --features duckdb
```

### SQLite

```bash
# Run from repo folder
cargo run --example sqlite --features sqlite
```

### Postgres

In order to run the Postgres example, you need to have a Postgres server running. You can use the following command to start a Postgres server in a Docker container the example can use:

```bash
docker run --name postgres -e POSTGRES_PASSWORD=password -e POSTGRES_DB=postgres_db -p 5432:5432 -d postgres:16-alpine
# Wait for the Postgres server to start
sleep 30

# Create a table in the Postgres server and insert some data
docker exec -i postgres psql -U postgres -d postgres_db <<EOF
CREATE TABLE companies (
   id INT PRIMARY KEY,
  name VARCHAR(100)
);

INSERT INTO companies (id, name) VALUES
    (1, 'Acme Corporation'),
    (2, 'Widget Inc.'),
    (3, 'Gizmo Corp.'),
    (4, 'Tech Solutions'),
    (5, 'Data Innovations');

CREATE VIEW companies_view AS
  SELECT id, name FROM companies;

CREATE MATERIALIZED VIEW companies_materialized_view AS
  SELECT id, name FROM companies;
EOF
```

```bash
# Run from repo folder
cargo run -p datafusion-table-providers --example postgres --features postgres

```

### MySQL

In order to run the MySQL example, you need to have a MySQL server running. You can use the following command to start a MySQL server in a Docker container the example can use:

```bash
docker run --name mysql -e MYSQL_ROOT_PASSWORD=password -e MYSQL_DATABASE=mysql_db -p 3306:3306 -d mysql:9.0
# Wait for the MySQL server to start
sleep 30

# Create a table in the MySQL server and insert some data
docker exec -i mysql mysql -uroot -ppassword mysql_db <<EOF
CREATE TABLE companies (
   id INT PRIMARY KEY,
  name VARCHAR(100)
);

INSERT INTO companies (id, name) VALUES (1, 'Acme Corporation');
EOF
```

```bash
# Run from repo folder
cargo run -p datafusion-table-providers --example mysql --features mysql
```

### Flight SQL

```bash
brew install roapi
# or
# cargo install --locked --git https://github.com/roapi/roapi --branch main --bins roapi
roapi -t taxi=https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-01.parquet &

# Run from repo folder
cargo run -p datafusion-table-providers --example flight-sql --features flight
```

### ODBC
```bash
apt-get install unixodbc-dev libsqliteodbc
# or
# brew install unixodbc & brew install sqliteodbc

cargo run --example odbc_sqlite --features odbc
```

#### ARM Mac

Please see https://github.com/pacman82/odbc-api#os-x-arm--mac-m1 for reference.

Steps:
1. Install unixodbc and sqliteodbc by `brew install unixodbc sqliteodbc`.
2. Find local sqliteodbc driver path by running `brew info sqliteodbc`. The path might look like `/opt/homebrew/Cellar/sqliteodbc/0.99991`.
3. Set up odbc config file at `~/.odbcinst.ini` with your local sqliteodbc path.
Example config file:
```
[SQLite3]
Description = SQLite3 ODBC Driver
Driver      = /opt/homebrew/Cellar/sqliteodbc/0.99991/lib/libsqlite3odbc.dylib
```
4. Test configuration by running `odbcinst -q -d -n SQLite3`. If the path is printed out correctly, then you are all set.

## Examples (in Python)
1. Start a Python venv
2. Enter into venv
3. Inside python/ folder, run `maturin develop`.
4. Inside python/examples/ folder, run the corresponding test using `python3 [file_name]`.