arrow-odbc 0.23.0

Read/Write Apache Arrow arrays from/to ODBC data sources.
Documentation

arrow-odbc

Docs Licence Crates.io

Fill Apache Arrow arrays from ODBC data sources. This crate is build on top of the arrow and odbc-api crate and enables you to read the data of an ODBC data source as sequence of Apache Arrow record batches. This crate can be used to insert the contens of Arrow record batches into a database table, too.

About Arrow

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.

About ODBC

ODBC (Open DataBase Connectivity) is a standard which enables you to access data from a wide variaty of data sources using SQL.

Usage

use arrow_odbc::{odbc_api::Environment, OdbcReader};

const CONNECTION_STRING: &str = "\
    Driver={ODBC Driver 17 for SQL Server};\
    Server=localhost;\
    UID=SA;\
    PWD=My@Test@Password1;\
";

fn main() -> Result<(), anyhow::Error> {

    let odbc_environment = Environment::new()?;
    
    // Connect with database.
    let connection = odbc_environment.connect_with_connection_string(CONNECTION_STRING)?;

    // This SQL statement does not require any arguments.
    let parameters = ();

    // Execute query and create result set
    let cursor = connection
        .execute("SELECT * FROM MyTable", parameters)?
        .expect("SELECT statement must produce a cursor");

    // Each batch shall only consist of maximum 10.000 rows.
    let max_batch_size = 10_000;

    // Read result set as arrow batches. Infer Arrow types automatically using the meta
    // information of `cursor`.
    let arrow_record_batches = OdbcReader::new(cursor, max_batch_size)?;

    for batch in arrow_record_batches {
        // ... process batch ...
    }
    Ok(())
}

Matching of ODBC to Arrow types then querying

ODBC Arrow
Numeric(p <= 38) Decimal128
Decimal(p <= 38) Decimal128
Integer Int32
SmallInt Int16
Real Float32
Float(p <=24) Float32
Double Float64
Float(p > 24) Float64
Date Date32
LongVarbinary Binary
Timestamp(p = 0) TimestampSecond
Timestamp(p: 1..3) TimestampMilliSecond
Timestamp(p: 4..6) TimestampMicroSecond
Timestamp(p >= 7 ) TimestampNanoSecond
BigInt Int64
TinyInt Int8
Bit Boolean
Varbinary Binary
Binary FixedSizedBinary
All others Utf8

Matching of Arrow to ODBC types then inserting

Arrow ODBC
Utf8 VarChar
Decimal128(p, s = 0) VarChar(p + 1)
Decimal128(p, s != 0) VarChar(p + 2)
Decimal256(p, s = 0) VarChar(p + 1)
Decimal256(p, s != 0) VarChar(p + 2)
Int8 TinyInt
Int16 SmallInt
Int32 Integer
Int64 BigInt
Float16 Real
Float32 Real
Float64 Double
Timestamp s Timestamp(7)
Timestamp ms Timestamp(7)
Timestamp us Timestamp(7)
Timestamp ns Timestamp(7)
Date32 Date
Date64 Date
Time32 s Time
Time32 ms VarChar(12)
Time64 us VarChar(15)
Time64 ns VarChar(16)
Binary Varbinary
FixedBinary(l) Varbinary(l)
All others Unsupported

The mapping for insertion is not the optimal yet, but before spending a lot of work on improving it I was curious that usecase would pop up for users. So if something does not work, but maybe could provided a better mapping of Arrow to ODBC types, feel free to open an issue. If you do so please give a lot of context of what you are trying to do.

Supported Arrow types

Appart from the afformentioned Arrow types Uint8 is also supported if specifying the Arrow schema directly.