arrow-tiberius 0.1.0

Apache Arrow and SQL Server bridge through Tiberius

arrow-tiberius

arrow-tiberius is a Rust library for bridging Apache Arrow and Microsoft SQL Server through the Tiberius TDS driver.

The crate is designed around a bidirectional boundary:

Arrow Schema + RecordBatch values
    -> SQL Server write plan and DDL
    -> SQL Server bulk load through Tiberius

SQL Server metadata and rows through Tiberius
    -> Arrow schema and RecordBatch values

The v0.1 release implements the Arrow-to-SQL Server write path first. The public API is nonetheless deliberately organized around Arrow, SQL Server profiles, structured diagnostics, and directional modules, so that a SQL Server-to-Arrow read path can be added later without renaming the crate or replacing the core model.

Note: v0.1 implements the Arrow-to-SQL Server direction only. SQL Server-to-Arrow reading is reserved for a later release.

Scope

In v0.1, arrow-tiberius provides:

  • Arrow-to-SQL Server schema planning.
  • SQL Server identifiers, type metadata, compatibility profile, and DDL helpers.
  • Structured planning and runtime diagnostics.
  • Arrow RecordBatch bulk writing through Tiberius.
  • Baseline and optimized writer backend selection.
  • SQL Server integration tests and writer benchmark harnesses.

It does not provide SQL Server-to-Arrow reads yet.

Quick Start

Add the crate:

[dependencies]
arrow-tiberius = "0.1"

Plan an Arrow schema and render deterministic CREATE TABLE SQL:

use arrow_schema::{DataType, Field, Schema};
use arrow_tiberius::{
    MssqlProfile, PlanOptions, TableName, create_table_sql_from_mappings,
    plan_arrow_schema_to_mssql_mappings,
};

fn main() -> arrow_tiberius::Result<()> {
    let schema = Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, true),
    ]);

    // Plan SQL Server column mappings for the Arrow schema.
    let outcome = plan_arrow_schema_to_mssql_mappings(
        &schema,
        MssqlProfile::sql_server_2016_compat_100(),
        PlanOptions::default(),
    )?;

    // Render CREATE TABLE SQL for the planned mappings.
    let table = TableName::new("dbo", "people")?;
    let ddl = create_table_sql_from_mappings(&table, outcome.value());

    assert!(ddl.contains("CREATE TABLE [dbo].[people]"));
    Ok(())
}

Write batches to an existing SQL Server table with BulkWriter:

use arrow_array::RecordBatch;
use arrow_tiberius::{
    BulkWriter, MssqlProfile, PlanOptions, TableName, WriteBackend, WriteOptions,
    plan_arrow_schema_to_mssql_mappings,
};
use futures_util::io::{AsyncRead, AsyncWrite};

async fn write_batch<S>(
    client: &mut tiberius::Client<S>,
    batch: &RecordBatch,
) -> arrow_tiberius::Result<()>
where
    S: AsyncRead + AsyncWrite + Unpin + Send,
{
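    // Plan SQL Server column mappings from the batch's Arrow schema.
    let outcome = plan_arrow_schema_to_mssql_mappings(
        batch.schema().as_ref(),
        MssqlProfile::sql_server_2016_compat_100(),
        PlanOptions::default(),
    )?;

    // The target table must already exist; BulkWriter does not create it.
    let table = TableName::new("dbo", "people")?;
    let mut writer = BulkWriter::new(
        client,
        table,
        outcome.value().to_vec(),
        WriteOptions {
            // Select the optimized backend explicitly; keep other options at their defaults.
            backend: WriteBackend::DirectRawBulk,
            ..WriteOptions::default()
        },
    )
    .await?;

    // Send the batch, then finish the writer.
    writer.write_batch(batch).await?;
    writer.finish().await?;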
    Ok(())
}

BulkWriter validates the target table metadata before sending rows. It does not create the target table automatically; callers can use the DDL helpers when they want this crate to produce the table definition.

Diagnostics

Planning and write failures return structured diagnostics instead of relying on string parsing. Callers can inspect each diagnostic's severity, machine-readable code, field, row, and message.

use arrow_schema::{DataType, Field, Schema};
use arrow_tiberius::{
    Error, MssqlProfile, PlanOptions, plan_arrow_schema_to_mssql_mappings,
};

let schema = Schema::new(vec![Field::new("raw", DataType::UInt64, false)]);
let err = plan_arrow_schema_to_mssql_mappings(
    &schema,
    MssqlProfile::sql_server_2016_compat_100(),
    PlanOptions::default(),
)
.expect_err("UInt64 requires an explicit policy by default");

if let Error::Planning { diagnostics } = err {
    for diagnostic in diagnostics.all() {
        println!("{:?}: {}", diagnostic.code(), diagnostic.message());
    }
}

See Arrow to SQL Server Type Mapping for the full supported and unsupported mapping surface.

Writer Backends

WriteBackend controls how planned Arrow rows are sent to SQL Server:

Backend           Purpose
Auto              Default selection. Currently resolves to DirectRawBulk.
BaselineTokenRow  Compatibility and reference path using Tiberius TokenRow bulk load.
DirectFramedBulk  Direct Arrow-to-TDS row encoding through Tiberius framed writes.
DirectRawBulk     Optimized direct encoder plus raw bulk packet writes from the Tiberius fork.

The direct raw backend is the optimized production path for currently supported mappings. The baseline backend remains useful for compatibility checks, debugging, and parity tests.
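As a rough illustration of what "Auto currently resolves to DirectRawBulk" means, the sketch below uses a local stand-in enum that only mirrors the variant names from the table. The `Backend` type and the `resolve` helper are hypothetical; they are not the crate's `WriteBackend` or its internal selection logic, which this README does not show.

```rust
/// Local stand-in mirroring the variant names in the table above.
/// This is not the crate's WriteBackend type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Backend {
    Auto,
    BaselineTokenRow,
    DirectFramedBulk,
    DirectRawBulk,
}

/// Hypothetical helper: maps Auto to the concrete backend it currently
/// stands for and leaves explicit choices unchanged.
fn resolve(requested: Backend) -> Backend {
    match requested {
        Backend::Auto => Backend::DirectRawBulk,
        explicit => explicit,
    }
}

fn main() {
    let all = [
        Backend::Auto,
        Backend::BaselineTokenRow,
        Backend::DirectFramedBulk,
        Backend::DirectRawBulk,
    ];
    for requested in all {
        println!("{:?} -> {:?}", requested, resolve(requested));
    }
}
```

One consequence of this shape is that callers who need a fixed backend, for example in parity tests, would name it explicitly rather than rely on Auto, since the table describes the Auto mapping as current behavior rather than a guarantee.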

Examples

Compile-checked examples are available under examples/ and do not require SQL Server:

cargo run --example schema_to_ddl
cargo run --example planning_diagnostics
cargo run --example backend_selection
cargo run --example policy_dependent_planning

The examples cover schema-to-DDL rendering, planning diagnostics, backend selection, and policy-dependent planning.

An environment-gated SQL Server write example is also available:

ARROW_TIBERIUS_EXAMPLE_MSSQL_URL='server=tcp:localhost,1433;user=sa;password=...;TrustServerCertificate=true' \
  cargo run --example sqlserver_batch_write

By default it creates, writes to, and drops [dbo].[arrow_tiberius_example_write]. Set ARROW_TIBERIUS_EXAMPLE_KEEP_TABLE=1 to keep the disposable table, or set ARROW_TIBERIUS_EXAMPLE_MSSQL_SCHEMA, ARROW_TIBERIUS_EXAMPLE_MSSQL_TABLE, and ARROW_TIBERIUS_EXAMPLE_EXISTING_TABLE=1 to write to an existing table explicitly.
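The environment gating described above could look roughly like the following standard-library sketch. The `ExampleSettings` struct and `settings_from` function are hypothetical names, and the behavior (skip when the URL variable is unset, treat `1` as "keep the table") is an assumption drawn from the description, not the example's actual source.

```rust
use std::env;

/// Hypothetical settings struct; the field names are illustrative only.
#[derive(Debug, PartialEq)]
struct ExampleSettings {
    url: String,
    keep_table: bool,
}

/// Builds settings from a variable lookup. Returns None when the URL
/// variable is absent, so the caller can skip the SQL Server work.
fn settings_from(lookup: impl Fn(&str) -> Option<String>) -> Option<ExampleSettings> {
    let url = lookup("ARROW_TIBERIUS_EXAMPLE_MSSQL_URL")?;
    // Assumption: only the literal value "1" opts in to keeping the table.
    let keep_table = lookup("ARROW_TIBERIUS_EXAMPLE_KEEP_TABLE").as_deref() == Some("1");
    Some(ExampleSettings { url, keep_table })
}

fn main() {
    match settings_from(|key| env::var(key).ok()) {
        Some(settings) => println!(
            "connection string of {} characters found; keep_table = {}",
            settings.url.len(),
            settings.keep_table
        ),
        None => println!("ARROW_TIBERIUS_EXAMPLE_MSSQL_URL is not set; skipping"),
    }
}
```

Taking the lookup as a closure keeps the parsing logic testable without mutating the process environment.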

SQL Server Compatibility

The v0.1 profile targets SQL Server 2016 with database compatibility level 100:

use arrow_tiberius::MssqlProfile;

let profile = MssqlProfile::sql_server_2016_compat_100();

See Integration Tests for the SQL Server validation path used by this repository.

Tiberius Dependency Model

arrow-tiberius depends on the published tiberius-raw-bulk package, renamed in Cargo.toml so that it is used under the crate name tiberius:

tiberius = { package = "tiberius-raw-bulk", version = "=0.12.3-raw-bulk.13", default-features = false, features = [
    "tds73",
    "winauth",
    "native-tls",
] }

If a downstream crate also constructs the SQL Server client passed to BulkWriter, it must use the same package identity:

[dependencies]
arrow-tiberius = "0.1"
tiberius = { package = "tiberius-raw-bulk", version = "=0.12.3-raw-bulk.13", default-features = false, features = [
    "tds73",
    "winauth",
    "native-tls",
] }

Depending on upstream tiberius separately pulls in a distinct crate with its own types. A client from upstream tiberius is not the same type as a client from tiberius-raw-bulk, so it will not be accepted by the BulkWriter API.
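The type-identity point can be illustrated without any dependencies. In the sketch below, two modules stand in for two separately resolved packages that each export a type named `Client`. The module names and the `describe_target` function are hypothetical stand-ins, not part of either Tiberius package or of arrow-tiberius.

```rust
// Hypothetical stand-ins: each module plays the role of one package
// that exports a type named Client.
mod upstream_tiberius {
    pub struct Client {
        pub name: &'static str,
    }
}

mod raw_bulk_tiberius {
    pub struct Client {
        pub name: &'static str,
    }
}

// Stand-in for an API that, like BulkWriter, is written against one
// specific package's Client type.
fn describe_target(client: &raw_bulk_tiberius::Client) -> String {
    format!("writing through {}", client.name)
}

fn main() {
    let forked = raw_bulk_tiberius::Client { name: "tiberius-raw-bulk" };
    println!("{}", describe_target(&forked));

    let upstream = upstream_tiberius::Client { name: "tiberius" };
    // Uncommenting the next line fails to compile with a mismatched-types
    // error: the two Client types are identical in shape but distinct.
    // describe_target(&upstream);
    println!("{} has its own, separate Client type", upstream.name);
}
```

The same rule applies across crates: identical names and fields do not make two types interchangeable, which is why the downstream `tiberius` dependency has to point at the same package.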

The fork exists because upstream Tiberius does not expose the raw bulk packet APIs needed by the optimized direct writer. The baseline writer and direct writer use the same forked package dependency; only the optimized backend calls the raw-row APIs.

Feature Flags

Feature            Default  Purpose
bench-profile      no       Enables benchmark-only direct write profiling hooks and forwards to tiberius/bulk-load-profile.
integration-tests  no       Enables SQL Server integration tests that require explicit environment setup or the xtask runner.

Docs.rs is configured to build with all features so feature-gated public items are documented. Normal library use does not require either feature.

Validation

Default local validation does not require SQL Server:

cargo fmt --check
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo test --workspace

Run SQL Server integration tests through the xtask harness:

cargo xtask sqlserver-test

The harness starts SQL Server when possible, configures compatibility level 100, runs feature-gated integration tests, and cleans up managed resources. See Integration Tests for container runtime and existing-server options.

Writer benchmark commands and interpretation guidance are in Writer Benchmarks. The curated direct raw benchmark summary is in Direct Raw Benchmark Comparison.

Related Crates

arrow-odbc is the broader Arrow/ODBC crate. It targets ODBC data sources generally and supports reading and writing Arrow arrays through ODBC drivers. Use it when you need a database-agnostic ODBC path or SQL-to-Arrow reads today.

arrow-tiberius is narrower: it targets Microsoft SQL Server through Tiberius and focuses v0.1 on Arrow-to-SQL Server bulk writes. That narrower scope lets the direct raw backend use SQL Server-specific TDS bulk-load encoding instead of going through ODBC.

For the SQL Server write workloads this crate is built around, the local benchmark data generally favors DirectRawBulk over arrow-odbc on primitive and mixed nullable rows, in both throughput and memory. Representative runs show about 3.05x the throughput of arrow-odbc on primitive numeric rows with about 20 MiB peak RSS versus about 998 MiB, and about 1.66x the throughput on mixed nullable rows with about 21 MiB peak RSS versus about 157 MiB. The main exception is some large variable-width text/binary workloads, where arrow-odbc can write about 1.28x to 1.37x faster, but with roughly 1.4 GiB peak RSS versus about 100 MiB for DirectRawBulk. See primitive direct raw comparison and variable-width direct raw comparison for the measured numbers and setup.
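To put the memory figures on one scale, the short calculation below divides the arrow-odbc peak RSS by the DirectRawBulk peak RSS for each workload. It uses only the approximate numbers quoted above; converting 1.4 GiB to MiB as 1.4 × 1024 is an assumption about how that rounded figure should be read, and the `memory_ratio` helper is purely illustrative.

```rust
/// Ratio of two peak-RSS figures, both given in MiB.
fn memory_ratio(arrow_odbc_mib: f64, direct_raw_mib: f64) -> f64 {
    arrow_odbc_mib / direct_raw_mib
}

fn main() {
    // (workload, arrow-odbc peak RSS in MiB, DirectRawBulk peak RSS in MiB)
    let runs = [
        ("primitive numeric", 998.0, 20.0),
        ("mixed nullable", 157.0, 21.0),
        // Assumption: 1.4 GiB is read as 1.4 * 1024 MiB.
        ("large variable-width", 1.4 * 1024.0, 100.0),
    ];
    for (workload, arrow_odbc, direct_raw) in runs {
        println!(
            "{workload}: arrow-odbc peak RSS is about {:.1}x that of DirectRawBulk",
            memory_ratio(arrow_odbc, direct_raw)
        );
    }
}
```

On these rounded inputs the ratios come out near 50x, 7.5x, and 14x respectively, so even in the variable-width case where arrow-odbc writes faster, the memory gap remains large.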

Project Status

arrow-tiberius is preparing its first release, v0.1, which focuses on Arrow-to-SQL Server writing. SQL Server-to-Arrow reading is reserved for a later release.

See v0.1 Release Boundary for the maintainer release scope, gates, and publication checklist.