arrow-tiberius
arrow-tiberius is a Rust library for bridging Apache Arrow and Microsoft SQL
Server through the Tiberius TDS driver.
The crate is designed around a bidirectional boundary:
    Arrow Schema + RecordBatch values
      -> SQL Server write plan and DDL
      -> SQL Server bulk load through Tiberius

    SQL Server metadata and rows through Tiberius
      -> Arrow schema and RecordBatch values
The v0.1 release implements the Arrow-to-SQL Server write path first. The public API is still intentionally shaped around Arrow, SQL Server profiles, structured diagnostics, and directional modules so a SQL Server-to-Arrow read path can be added without renaming the crate or replacing the core model.
[!NOTE] v0.1 implements the Arrow-to-SQL Server direction only. SQL Server-to-Arrow reading is reserved for a later release.
Scope
In v0.1, arrow-tiberius provides:
- Arrow-to-SQL Server schema planning.
- SQL Server identifiers, type metadata, compatibility profile, and DDL helpers.
- Structured planning and runtime diagnostics.
- Arrow RecordBatch bulk writing through Tiberius.
- Baseline and optimized writer backend selection.
- SQL Server integration tests and writer benchmark harnesses.
It does not provide SQL Server-to-Arrow reads yet.
Quick Start
Add the crate:
[dependencies]
arrow-tiberius = "0.1"
Plan an Arrow schema and render deterministic CREATE TABLE SQL:
use ;
use ;
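To illustrate what "deterministic" DDL rendering means here, the following is a minimal standalone sketch and not the crate's API. `PlannedColumn`, `quote_ident`, and `render_create_table` are hypothetical names introduced for illustration, and the SQL type strings are supplied by the caller rather than derived from Arrow types. Columns are emitted in input order with fixed formatting, so identical input always produces identical SQL.

```rust
/// Hypothetical planned column: a name, a SQL Server type string, and nullability.
/// This is an illustration only; the crate's real planning types may differ.
pub struct PlannedColumn {
    pub name: String,
    pub sql_type: String,
    pub nullable: bool,
}

/// Quote a SQL Server identifier with brackets, doubling any closing bracket.
pub fn quote_ident(name: &str) -> String {
    format!("[{}]", name.replace(']', "]]"))
}

/// Render CREATE TABLE text. Output depends only on the arguments:
/// no timestamps, no map iteration order, no environment lookups.
pub fn render_create_table(schema: &str, table: &str, columns: &[PlannedColumn]) -> String {
    let body: Vec<String> = columns
        .iter()
        .map(|c| {
            let null = if c.nullable { "NULL" } else { "NOT NULL" };
            format!("    {} {} {}", quote_ident(&c.name), c.sql_type, null)
        })
        .collect();
    format!(
        "CREATE TABLE {}.{} (\n{}\n);",
        quote_ident(schema),
        quote_ident(table),
        body.join(",\n")
    )
}
```

Rendering the same column list twice yields byte-identical text, which makes the output easy to snapshot-test or diff in review.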
Write batches to an existing SQL Server table with BulkWriter:
use RecordBatch;
use ;
use ;
async
BulkWriter validates the target table metadata before sending rows. It does
not create the target table automatically; callers can use the DDL helpers when
they want this crate to produce the table definition.
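The validate-before-write idea can be sketched as follows. This is a simplified model under stated assumptions, not the checks BulkWriter actually performs: `ColumnMeta`, `Mismatch`, and `validate_target` are hypothetical names, and the comparison assumes strict equality of name, type string, and nullability, which may be stricter or looser than the real rules.

```rust
/// Hypothetical column description used on both sides of the comparison.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ColumnMeta {
    pub name: String,
    pub sql_type: String,
    pub nullable: bool,
}

/// Hypothetical reasons a target table can be rejected before any row is sent.
#[derive(Debug, PartialEq, Eq)]
pub enum Mismatch {
    ColumnCount { planned: usize, target: usize },
    Column { index: usize, planned: ColumnMeta, target: ColumnMeta },
}

/// Compare the planned columns against the target table's columns.
/// Assumption: columns must match position by position and field by field.
pub fn validate_target(planned: &[ColumnMeta], target: &[ColumnMeta]) -> Result<(), Mismatch> {
    if planned.len() != target.len() {
        return Err(Mismatch::ColumnCount {
            planned: planned.len(),
            target: target.len(),
        });
    }
    for (index, (p, t)) in planned.iter().zip(target).enumerate() {
        if p != t {
            return Err(Mismatch::Column {
                index,
                planned: p.clone(),
                target: t.clone(),
            });
        }
    }
    Ok(())
}
```

Failing early like this means a mismatched table is reported before any bulk-load traffic is generated.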
Diagnostics
Planning and write failures return structured diagnostics instead of relying on string parsing. Callers can inspect severity, machine-readable code, field, row, and message.
use ;
use ;
let schema = new;
let err = plan_arrow_schema_to_mssql_mappings
.expect_err;
if let Planning = err
See Arrow to SQL Server Type Mapping for the full supported and unsupported mapping surface.
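As a rough illustration of why structured diagnostics are easier to consume than message strings, here is a standalone sketch. The `Severity` and `Diagnostic` types, the `fields_with_code` helper, and the `UNSUPPORTED_TYPE` code used with it are all hypothetical; they only mirror the attributes listed above (severity, code, field, row, message) and are not the crate's definitions.

```rust
use std::fmt;

/// Hypothetical severity levels; the crate's real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warning,
    Error,
}

/// Hypothetical diagnostic record carrying the attributes described in this section.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Diagnostic {
    pub severity: Severity,
    pub code: &'static str,    // machine-readable identifier
    pub field: Option<String>, // Arrow field name, when known
    pub row: Option<usize>,    // row index, for runtime failures
    pub message: String,       // human-readable text
}

impl fmt::Display for Diagnostic {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?} [{}]", self.severity, self.code)?;
        if let Some(field) = &self.field {
            write!(f, " field={}", field)?;
        }
        if let Some(row) = self.row {
            write!(f, " row={}", row)?;
        }
        write!(f, ": {}", self.message)
    }
}

/// Names of fields that produced an error with the given code.
/// Callers match on `code` and `severity`, never on the message text.
pub fn fields_with_code<'a>(diags: &'a [Diagnostic], code: &str) -> Vec<&'a str> {
    diags
        .iter()
        .filter(|d| d.severity == Severity::Error && d.code == code)
        .filter_map(|d| d.field.as_deref())
        .collect()
}
```

With this shape, a caller can group failures by code or field and still print a readable line through `Display` when needed.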
Writer Backends
WriteBackend controls how planned Arrow rows are sent to SQL Server:
| Backend | Purpose |
|---|---|
| Auto | Default selection. Currently resolves to DirectRawBulk. |
| BaselineTokenRow | Compatibility and reference path using Tiberius TokenRow bulk load. |
| DirectFramedBulk | Direct Arrow-to-TDS row encoding through Tiberius framed writes. |
| DirectRawBulk | Optimized direct encoder plus raw bulk packet writes from the Tiberius fork. |
The direct raw backend is the optimized production path for currently supported mappings. The baseline backend remains useful for compatibility checks, debugging, and parity tests.
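The selection rule in the table can be modeled in a few lines. This is a sketch only: `BackendChoice` is a hypothetical stand-in for the crate's WriteBackend, and the `resolve` and `uses_raw_bulk_apis` methods are illustrative names, not confirmed API.

```rust
/// Hypothetical mirror of the four backends listed in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendChoice {
    Auto,
    BaselineTokenRow,
    DirectFramedBulk,
    DirectRawBulk,
}

impl BackendChoice {
    /// Resolve `Auto` to a concrete backend; explicit choices pass through unchanged.
    /// The `Auto -> DirectRawBulk` rule reflects the current behavior described above.
    pub fn resolve(self) -> BackendChoice {
        match self {
            BackendChoice::Auto => BackendChoice::DirectRawBulk,
            other => other,
        }
    }

    /// Whether the resolved backend relies on raw bulk packet writes.
    pub fn uses_raw_bulk_apis(self) -> bool {
        matches!(self.resolve(), BackendChoice::DirectRawBulk)
    }
}
```

Keeping `Auto` as a separate variable choice, rather than aliasing it to one backend, leaves room for the default to change later without breaking callers that picked a backend explicitly.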
Examples
Compile-checked examples are available under examples/ and do not require SQL
Server:
The examples cover schema to DDL, planning diagnostics, backend selection, and policy-dependent planning.
An environment-gated SQL Server write example is also available:
ARROW_TIBERIUS_EXAMPLE_MSSQL_URL='server=tcp:localhost,1433;user=sa;password=...;TrustServerCertificate=true' \
By default it creates, writes to, and drops [dbo].[arrow_tiberius_example_write].
Set ARROW_TIBERIUS_EXAMPLE_KEEP_TABLE=1 to keep the disposable table, or set
ARROW_TIBERIUS_EXAMPLE_MSSQL_SCHEMA, ARROW_TIBERIUS_EXAMPLE_MSSQL_TABLE,
and ARROW_TIBERIUS_EXAMPLE_EXISTING_TABLE=1 to write to an existing table
explicitly.
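The environment variables above could be interpreted along these lines. This is a sketch under assumptions, not the example's actual code: `ExampleTarget` and `example_target_from` are hypothetical names, the defaults are taken from the table name quoted above, and treating only the literal value `1` as "set" is an assumption.

```rust
/// Hypothetical settings derived from the environment variables described above.
#[derive(Debug, PartialEq, Eq)]
pub struct ExampleTarget {
    pub schema: String,
    pub table: String,
    pub keep_table: bool,
    pub existing_table: bool,
}

/// Build settings from any key -> value lookup.
/// In a real program the lookup would typically be `|k| std::env::var(k).ok()`.
pub fn example_target_from<F>(lookup: F) -> ExampleTarget
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: a flag counts as enabled only when its value is exactly "1".
    let flag = |key: &str| lookup(key).as_deref() == Some("1");
    ExampleTarget {
        schema: lookup("ARROW_TIBERIUS_EXAMPLE_MSSQL_SCHEMA")
            .unwrap_or_else(|| "dbo".to_string()),
        table: lookup("ARROW_TIBERIUS_EXAMPLE_MSSQL_TABLE")
            .unwrap_or_else(|| "arrow_tiberius_example_write".to_string()),
        keep_table: flag("ARROW_TIBERIUS_EXAMPLE_KEEP_TABLE"),
        existing_table: flag("ARROW_TIBERIUS_EXAMPLE_EXISTING_TABLE"),
    }
}
```

Taking the lookup as a closure keeps the parsing logic testable without mutating the process environment.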
SQL Server Compatibility
The v0.1 profile targets SQL Server 2016 with database compatibility level 100:
use MssqlProfile;
let profile = sql_server_2016_compat_100;
See Integration Tests for the SQL Server validation path used by this repository.
Tiberius Dependency Model
arrow-tiberius depends on the published tiberius-raw-bulk package as the
crate name tiberius:
tiberius = { package = "tiberius-raw-bulk", version = "=0.12.3-raw-bulk.13", default-features = false, features = [
    "tds73",
    "winauth",
    "native-tls",
] }
If a downstream crate also constructs the SQL Server client passed to
BulkWriter, it must use the same package identity:
[dependencies]
arrow-tiberius = "0.1"
tiberius = { package = "tiberius-raw-bulk", version = "=0.12.3-raw-bulk.13", default-features = false, features = [
    "tds73",
    "winauth",
    "native-tls",
] }
Depending on upstream tiberius separately creates a distinct crate type. A
client from upstream tiberius is not the same type as a client from
tiberius-raw-bulk and will not match the BulkWriter API.
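The type-identity point can be demonstrated without either package. In this sketch the two modules are hypothetical stand-ins for the two crates, and `accepts_fork_client` stands in for an API that, like BulkWriter, is written against one specific client type. Two types with the same name and layout are still distinct types when they come from different crates (modeled here as different modules).

```rust
use std::any::TypeId;

/// Stand-in for upstream `tiberius`.
pub mod upstream_tiberius {
    pub struct Client;
}

/// Stand-in for `tiberius-raw-bulk`.
pub mod raw_bulk_tiberius {
    pub struct Client;
}

/// Stand-in for an API that only accepts the fork's client type.
pub fn accepts_fork_client(_client: &raw_bulk_tiberius::Client) -> &'static str {
    "accepted"
}

/// Returns whether the compiler considers the two `Client` types identical.
pub fn same_type() -> bool {
    TypeId::of::<upstream_tiberius::Client>() == TypeId::of::<raw_bulk_tiberius::Client>()
}

// The following call would fail to compile with a mismatched-types error,
// which is the same class of error a mixed dependency graph produces:
// accepts_fork_client(&upstream_tiberius::Client);
```

Here `same_type()` returns `false`, which is why the dependency entry has to name the same package and version that arrow-tiberius itself uses.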
The fork exists because upstream Tiberius does not expose the raw bulk packet APIs needed by the optimized direct writer. The baseline writer and direct writer use the same forked package dependency; only the optimized backend calls the raw-row APIs.
Feature Flags
| Feature | Default | Purpose |
|---|---|---|
| bench-profile | no | Enables benchmark-only direct write profiling hooks and forwards to tiberius/bulk-load-profile. |
| integration-tests | no | Enables SQL Server integration tests that require explicit environment setup or the xtask runner. |
Docs.rs is configured to build with all features so feature-gated public items are documented. Normal library use does not require either feature.
Validation
Default local validation does not require SQL Server:
Run SQL Server integration tests through the xtask harness:
The harness starts SQL Server when possible, configures compatibility level 100, runs feature-gated integration tests, and cleans up managed resources. See Integration Tests for container runtime and existing-server options.
Writer benchmark commands and interpretation guidance are in Writer Benchmarks. The curated direct raw benchmark summary is in Direct Raw Benchmark Comparison.
Related Crates
arrow-odbc is the broader Arrow/ODBC crate. It
targets ODBC data sources generally and supports reading and writing Arrow
arrays through ODBC drivers. Use it when you need a database-agnostic ODBC path
or SQL-to-Arrow reads today.
arrow-tiberius is narrower: it targets Microsoft SQL Server through Tiberius
and focuses v0.1 on Arrow-to-SQL Server bulk writes. That narrower scope lets
the direct raw backend use SQL Server-specific TDS bulk-load encoding instead of
going through ODBC.
For the SQL Server write workloads this crate is built around, the local
benchmark data generally favors DirectRawBulk: it is much faster than
arrow-odbc on primitive and mixed nullable rows while using far less memory.
Representative runs show about 3.05x throughput on primitive numeric rows
with about 20 MiB peak RSS versus about 998 MiB, and about 1.66x throughput on
mixed nullable rows with about 21 MiB peak RSS versus about 157 MiB. The main
exception is some large variable-width text/binary workloads, where arrow-odbc
can write about 1.28x to 1.37x faster but with roughly 1.4 GiB peak RSS versus
about 100 MiB for DirectRawBulk. See the primitive direct raw comparison and
the variable-width direct raw comparison for the measured numbers and setup.
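To put the quoted figures on a common scale, the short sketch below derives memory ratios and relative run times from them. The helper names are hypothetical, the inputs are the approximate values quoted above, and 1.4 GiB is converted as 1.4 x 1024 MiB; the results are therefore rough.

```rust
/// How many times more peak memory one writer used than another, both in MiB.
pub fn memory_ratio(other_mib: f64, baseline_mib: f64) -> f64 {
    other_mib / baseline_mib
}

/// Fraction of the original run time needed for the same rows at a given speedup.
pub fn relative_duration(speedup: f64) -> f64 {
    1.0 / speedup
}

/// Convert GiB to MiB.
pub fn gib_to_mib(gib: f64) -> f64 {
    gib * 1024.0
}
```

With the quoted numbers, `memory_ratio(998.0, 20.0)` is about 50 and `memory_ratio(157.0, 21.0)` is about 7.5, while `relative_duration(3.05)` is about 0.33, so the primitive numeric case finishes in roughly a third of the time. For the variable-width case, `memory_ratio(gib_to_mib(1.4), 100.0)` is about 14, which is the memory cost paid for the 1.28x to 1.37x throughput advantage of arrow-odbc on those workloads.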
Project Status
arrow-tiberius is preparing its first release, v0.1. That release focuses on
Arrow-to-SQL Server writing. SQL Server-to-Arrow reading is reserved for a later
release.
See v0.1 Release Boundary for the maintainer release scope, gates, and publication checklist.