exarrow-rs 0.12.0

ADBC-compatible driver for Exasol with Arrow data format support
[Home](index.md) · [Setup & Connect](setup-and-connect.md) · [Queries](queries.md) · [Prepared Statements](prepared-statements.md) · [Import/Export](import-export.md) · [Types](type-mapping.md) · [Driver Manager](driver-manager.md)

---

# Import / Export

High-performance bulk data transfer over Exasol's HTTP transport, with streaming support for large datasets.

## Which import path should I use?

The right path depends on your data source and how much schema control you need:

| Situation | Recommended path |
|-----------|-----------------|
| You already know your target schema and table structure | Connection methods + explicit DDL: create the table yourself, then call `connection.import_csv_from_files()` or `connection.import_parquet_from_files()` |
| You want the table created from file metadata | Parquet/CSV auto-create: pass `ParquetImportOptions::default().with_create_table_if_not_exists(true)` — the driver reads Parquet metadata and issues `CREATE TABLE` before importing |
| You already have Arrow `RecordBatch`es in memory | Arrow import via the connection method `connection.import_from_record_batch()` or the ADBC `Statement` API |
| You are calling from Python, Polars, or another language | The driver-manager path — see [Driver Manager](driver-manager.md) |

Use the connection convenience methods (`connection.import_*` and `connection.export_*`) as the default choice. They automatically propagate the connection's host and port into the HTTP transport, so you only need to set format-specific options, and they reuse whatever control transport you configured: native TCP by default (best throughput), or WebSocket if you set `?transport=websocket` for proxy or compatibility reasons.

Lower-level free functions and direct `HttpTransportClient` usage are available for advanced cases where you need full control over transport parameters, but they require you to supply host and port manually and are not the recommended starting point.
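
A sketch of the default convenience path, here with the control connection forced onto WebSocket; the bulk-data tunnel still inherits host and port automatically:

```rust
use exarrow_rs::adbc::Driver;
use exarrow_rs::import::CsvImportOptions;

// Control connection over WebSocket (proxy-friendly); the HTTP transport
// for bulk data picks up host/port from the connection automatically.
let driver = Driver::new();
let database = driver.open(
    "exasol://user:password@host:8563/my_schema?transport=websocket"
)?;
let mut connection = database.connect().await?;

let options = CsvImportOptions::default();
connection.import_csv_from_files("my_table", &["data.csv"], options).await?;
```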

## Control Connection TLS vs HTTP Transport TLS

Exasol import and export use **two independent TLS layers**. Enabling TLS on the main control connection (native TCP or WebSocket) does **not** automatically enable TLS on the HTTP transport tunnel used for bulk data transfer. Each layer must be configured separately.

| Layer | What it protects | Where it is configured |
|-------|-----------------|------------------------|
| Control connection | Login, SQL, session | `?tls=true` / `?validateservercertificate=0` in the connection URI |
| HTTP transport tunnel | Bulk data (CSV / Parquet / Arrow) | `.use_tls(bool)` on the import/export options struct |

### Docker / local development

The Exasol Docker container uses a self-signed certificate for the control connection. Use `?validateservercertificate=0` in the URI to accept it. For the HTTP transport tunnel, use `.use_tls(false)` — Exasol Docker's ad-hoc per-session certificate handling for the data channel is separate from the control-channel certificate and does not integrate with standard TLS validation.

```rust
use exarrow_rs::adbc::Driver;
use exarrow_rs::import::CsvImportOptions;

// Local Docker / self-signed certificate
let driver = Driver::new();
let database = driver.open(
    "exasol://user:password@localhost:8563/my_schema?validateservercertificate=0"
)?;
let mut connection = database.connect().await?;

// HTTP transport does NOT use TLS against Docker:
let options = CsvImportOptions::default().use_tls(false);
connection.import_csv_from_files("my_table", &["data.csv"], options).await?;
```

### Exasol SaaS / production

In production and SaaS environments both layers should use TLS. The connection URI does not need `validateservercertificate=0` because the server presents a valid certificate. The HTTP transport uses `.use_tls(true)`, which is the correct setting for any Exasol host that supports TLS on the data channel.

```rust
use exarrow_rs::adbc::Driver;
use exarrow_rs::import::CsvImportOptions;

// Exasol SaaS / production (both layers TLS)
let driver = Driver::new();
let database = driver.open(
    "exasol://user:password@your-exasol-host.exasol.com:8563/my_schema"
)?;
let mut connection = database.connect().await?;

// HTTP transport uses TLS in production:
let options = CsvImportOptions::default().use_tls(true);
connection.import_csv_from_files("my_table", &["data.csv"], options).await?;
```

**Default**: `.use_tls(false)`, which is safe for local Docker; switch to `.use_tls(true)` for any production or SaaS Exasol host. The same `.use_tls(bool)` method is available on all six option builders: `CsvImportOptions`, `CsvExportOptions`, `ParquetImportOptions`, `ParquetExportOptions`, `ArrowImportOptions`, and `ArrowExportOptions`.
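
For instance, the same toggle on an export builder:

```rust
use exarrow_rs::export::ParquetExportOptions;

// Same TLS switch on an export options builder (production host).
let options = ParquetExportOptions::default().use_tls(true);
```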

## Supported Formats

| Format    | Import | Export | Notes                         |
|-----------|--------|--------|-------------------------------|
| CSV       | Yes    | Yes    | Native Exasol format, fastest |
| Parquet   | Yes    | Yes    | Columnar with compression     |
| Arrow IPC | Yes    | Yes    | Direct RecordBatch transfer   |
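
Arrow IPC is the one format without a dedicated section below. Here is a minimal sketch of the in-memory path via `connection.import_from_record_batch()` (named in the table at the top of this page); the exact signature, and whether the `arrow` types are re-exported by `exarrow_rs`, are assumptions — check the crate docs:

```rust
use std::sync::Arc;
use arrow::array::{ArrayRef, Int64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use exarrow_rs::import::ArrowImportOptions;

// Build a small batch in memory.
let schema = Arc::new(Schema::new(vec![
    Field::new("id", DataType::Int64, false),
    Field::new("name", DataType::Utf8, true),
]));
let columns: Vec<ArrayRef> = vec![
    Arc::new(Int64Array::from(vec![1, 2])),
    Arc::new(StringArray::from(vec!["alice", "bob"])),
];
let batch = RecordBatch::try_new(schema, columns)?;

// Assumed signature: target table, batch, format options.
connection
    .import_from_record_batch("my_table", batch, ArrowImportOptions::default())
    .await?;
```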

## CSV Import

```rust
use exarrow_rs::import::CsvImportOptions;
use std::path::Path;

let options = CsvImportOptions::default()
    .column_separator(',')
    .skip_rows(1);  // Skip header row

let rows = connection.import_csv_from_file(
    "my_schema.my_table",
    Path::new("/path/to/data.csv"),
    options,
).await?;

println!("Imported {} rows", rows);
```

### CSV Options

| Option | Description |
|--------|-------------|
| `column_separator` | Field delimiter (default: `,`) |
| `skip_rows` | Number of header rows to skip |
| `row_separator` | Line ending (default: `\n`) |
| `null_string` | String representing NULL values |
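
Putting the table together — a sketch combining all four options; the `row_separator` and `null_string` builder names (and parameter types) are assumed to mirror the option names above:

```rust
use exarrow_rs::import::CsvImportOptions;

// Semicolon-delimited file with one header row, '\n' line endings,
// and "NULL" as the null marker.
let options = CsvImportOptions::default()
    .column_separator(';')
    .skip_rows(1)
    .row_separator('\n')
    .null_string("NULL");
```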

## Parquet Import

```rust
use exarrow_rs::import::ParquetImportOptions;
use std::path::Path;

let rows = connection.import_from_parquet(
    "my_table",
    Path::new("/path/to/data.parquet"),
    ParquetImportOptions::default().with_batch_size(1024),
).await?;
```

### Auto Table Creation

Parquet imports can automatically create the target table by inferring the schema from the Parquet file metadata. This is useful when importing data without pre-defining the table structure.

```rust
use exarrow_rs::import::{ParquetImportOptions, ColumnNameMode};
use std::path::Path;

let options = ParquetImportOptions::default()
    .with_create_table_if_not_exists(true)
    .column_name_mode(ColumnNameMode::Sanitize);

let rows = connection.import_from_parquet(
    "my_schema.new_table",  // Table will be created if it doesn't exist
    Path::new("/path/to/data.parquet"),
    options,
).await?;
```

The schema is inferred by reading only the Parquet metadata (not the full data), making it efficient even for large files.

### Column Name Modes

When auto-creating tables, column names can be handled in two ways:

| Mode | Behavior |
|------|----------|
| `Quoted` | Preserves original names exactly, wrapped in double quotes |
| `Sanitize` | Converts to uppercase, replaces invalid characters with `_`, quotes reserved words |

```rust
// Preserve exact column names (e.g., "customerId", "Order Date")
.column_name_mode(ColumnNameMode::Quoted)

// Sanitize for Exasol compatibility (e.g., CUSTOMERID, ORDER_DATE)
.column_name_mode(ColumnNameMode::Sanitize)
```

### Multi-File Schema Inference

When importing multiple Parquet files with auto table creation, the system computes a **union schema** that accommodates all files:

```rust
let files = vec![
    Path::new("/data/part-001.parquet"),
    Path::new("/data/part-002.parquet"),
];

let options = ParquetImportOptions::default()
    .with_create_table_if_not_exists(true);

let rows = connection.import_parquet_from_files(
    "combined_table",
    &files,
    options,
).await?;
```

Type widening rules when fields differ across files (see the sketch after this list):
- Identical types remain unchanged
- `DECIMAL` types widen to max(precision), max(scale)
- `VARCHAR` types widen to max(size)
- `DECIMAL` + `DOUBLE` widens to `DOUBLE`
- Incompatible types fall back to `VARCHAR(2000000)`
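
To make the `DECIMAL` rule concrete, here is a tiny self-contained sketch of the documented behavior; the `Decimal` struct is a hypothetical illustration, not a type exported by `exarrow_rs`:

```rust
// Hypothetical stand-in for the driver's internal decimal metadata.
#[derive(Debug, PartialEq)]
struct Decimal {
    precision: u8,
    scale: u8,
}

// The documented rule: widen to max(precision), max(scale).
fn widen(a: &Decimal, b: &Decimal) -> Decimal {
    Decimal {
        precision: a.precision.max(b.precision),
        scale: a.scale.max(b.scale),
    }
}

fn main() {
    // part-001.parquet: DECIMAL(10,2); part-002.parquet: DECIMAL(18,0)
    let widened = widen(
        &Decimal { precision: 10, scale: 2 },
        &Decimal { precision: 18, scale: 0 },
    );
    // The auto-created column would be DECIMAL(18,2).
    assert_eq!(widened, Decimal { precision: 18, scale: 2 });
}
```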

### Native Parquet Import (Exasol 2025.1.11+)

When connected to Exasol 2025.1.11 or newer, the driver automatically uses **native Parquet import**: raw Parquet bytes are served to the server over HTTP range requests, and the generated SQL uses `IMPORT INTO ... FROM PARQUET AT '...;MaxConcurrentReads=1' FILE 'NNN.parquet'` with no CSV format clauses. This eliminates client-side decoding and re-encoding and reduces bytes on the wire by 5–30x for typed columnar data. On older Exasol versions (7.x, 8.x) the driver continues to convert Parquet to CSV transparently, so existing deployments see no change in behavior.

The server version is detected automatically at connection time. To override the auto-detected behavior, use `ParquetImportOptions::with_native_parquet`:

```rust
use exarrow_rs::import::ParquetImportOptions;

// Force CSV conversion path (e.g. for testing or compatibility):
let options = ParquetImportOptions::default().with_native_parquet(Some(false));

// Force native mode (errors on pre-2025.1.11 servers):
let options = ParquetImportOptions::default().with_native_parquet(Some(true));

// Default: auto-detect from server version (recommended):
let options = ParquetImportOptions::default().with_native_parquet(None);
```

The same override applies to `import_from_parquet`, `import_from_parquet_stream`, and `import_parquet_from_files`.

## Parquet Export

```rust
use exarrow_rs::export::{ExportSource, ParquetExportOptions, ParquetCompression};
use std::path::Path;

let rows = connection.export_to_parquet(
    ExportSource::Table {
        schema: None,
        name: "my_table".into(),
        columns: vec![]
    },
    Path::new("/tmp/export.parquet"),
    ParquetExportOptions::default().with_compression(ParquetCompression::Snappy),
).await?;
```

### Export Sources

Export from tables or queries:

```rust
// Export entire table
ExportSource::Table { schema: None, name: "users".into(), columns: vec![] }

// Export specific columns
ExportSource::Table {
    schema: Some("prod".into()),
    name: "users".into(),
    columns: vec!["id".into(), "name".into()]
}

// Export query results
ExportSource::Query("SELECT * FROM users WHERE active = true".into())
```
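
For example, combining a `Query` source with the Parquet export shown above:

```rust
use exarrow_rs::export::{ExportSource, ParquetExportOptions};
use std::path::Path;

// Export a query result rather than a whole table.
let rows = connection.export_to_parquet(
    ExportSource::Query("SELECT id, name FROM users WHERE active = true".into()),
    Path::new("/tmp/active_users.parquet"),
    ParquetExportOptions::default(),
).await?;
```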

## Parallel Import

For large datasets, import multiple files in parallel:

```rust
use exarrow_rs::import::{CsvImportOptions, ParallelImportOptions};
use std::path::Path;

let files = vec![
    Path::new("/data/part-001.csv"),
    Path::new("/data/part-002.csv"),
    Path::new("/data/part-003.csv"),
];

let rows = connection.import_csv_parallel(
    "my_table",
    &files,
    CsvImportOptions::default(),
    ParallelImportOptions::default().with_max_connections(4),
).await?;
```

This uses multiple HTTP connections to Exasol for higher throughput.