Connector Arrow

A database client for many databases, exposing an interface that produces Apache Arrow data.

Documentation

Inspired by ConnectorX, but focused on being a Rust library rather than a Python library.

To be more specific, this crate:

  • does not support multiple destinations, only Arrow,
  • does not include parallelism, but allows downstream crates to implement it themselves,
  • does not include connection pooling, but allows downstream crates to implement it themselves,
  • uses minimal dependencies (it even disables default features).

API features

  • Querying that returns Vec<RecordBatch> (see the sketch after this list)
  • Record batch streaming
  • Query parameters
  • Writing to the data store
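
As a sketch of the querying API: the example below assumes the src_sqlite feature is enabled and that the crate exposes a top-level query function returning Vec<RecordBatch>. The exact entry point and error type may differ between versions, so treat this as an illustration rather than a reference.

```rust
use arrow::record_batch::RecordBatch;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // rusqlite is the driver behind the src_sqlite feature.
    let mut conn = rusqlite::Connection::open_in_memory()?;

    // Assumed entry point: run a query and collect the entire result
    // as a vector of Arrow record batches.
    let batches: Vec<RecordBatch> = connector_arrow::query(&mut conn, "SELECT 1 AS a")?;

    for batch in &batches {
        println!("rows: {}, columns: {}", batch.num_rows(), batch.num_columns());
    }
    Ok(())
}
```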

Sources

None of the sources are enabled by default; use the src_* features to enable them (see the Cargo.toml example after this list):

  • SQLite (src_sqlite, using rusqlite)
  • DuckDB (src_duckdb)
  • PostgreSQL (src_postgres)
  • Redshift (through postgres protocol, untested)
  • MySQL
  • MariaDB (through mysql protocol)
  • ClickHouse (through mysql protocol)
  • SQL Server
  • Azure SQL Database (through mssql protocol)
  • Oracle
  • BigQuery
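
For example, a Cargo.toml sketch enabling the SQLite and PostgreSQL sources (the feature names are taken from the list above; adjust the version to the one you use):

```toml
[dependencies]
# No src_* feature is enabled by default; list only the sources you need.
connector_arrow = { version = "0.1.0", features = ["src_sqlite", "src_postgres"] }
```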

Types

When converting from non-Arrow data sources (everything except DuckDB), only a subset of all possible Arrow types is produced. The types that can currently be produced are:

  • Null
  • Boolean
  • Int8
  • Int16
  • Int32
  • Int64
  • UInt8
  • UInt16
  • UInt32
  • UInt64
  • Float16
  • Float32
  • Float64
  • Timestamp
  • Date32
  • Date64
  • Time32
  • Time64
  • Duration
  • Interval
  • Binary
  • FixedSizeBinary
  • LargeBinary
  • Utf8
  • LargeUtf8
  • List
  • FixedSizeList
  • LargeList
  • Struct
  • Union
  • Dictionary
  • Decimal128
  • Decimal256
  • Map
  • RunEndEncoded

This restriction mostly stems from the non-trivial mapping of Arrow types onto native Rust types.
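
To check which Arrow types a particular source actually produces, the schema of the returned batches can be inspected with the standard arrow-rs API (a sketch; the batches are assumed to come from a query like the one shown earlier):

```rust
use arrow::record_batch::RecordBatch;

// Print the name and concrete Arrow type of every column in the result.
fn print_schema(batches: &[RecordBatch]) {
    if let Some(batch) = batches.first() {
        for field in batch.schema().fields() {
            // data_type() reports the Arrow type chosen during conversion,
            // e.g. Int64 for a 64-bit integer column.
            println!("{}: {:?}", field.name(), field.data_type());
        }
    }
}
```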