# hdbconnect-arrow
Apache Arrow integration for the hdbconnect SAP HANA driver. Converts HANA result sets to Arrow RecordBatch format, enabling zero-copy interoperability with the entire Arrow ecosystem.
## Why Arrow?
Apache Arrow is the universal columnar data format for analytics. By converting SAP HANA data to Arrow, you unlock seamless integration with:
| Category | Tools |
|---|---|
| DataFrames | Polars, pandas, Vaex, Dask |
| Query engines | DataFusion, DuckDB, ClickHouse, Ballista |
| ML/AI | Ray, Hugging Face Datasets, PyTorch, TensorFlow |
| Data lakes | Delta Lake, Apache Iceberg, Lance |
| Visualization | Perspective, Graphistry, Falcon |
| Languages | Rust, Python, R, Julia, Go, Java, C++ |
> [!TIP]
> Arrow's columnar format enables vectorized processing: operations can run 10-100x faster than row-by-row iteration.
## Installation

```toml
[dependencies]
hdbconnect-arrow = "0.1"
```
Or with cargo-add:
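```sh
cargo add hdbconnect-arrow
```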
> [!IMPORTANT]
> Requires Rust 1.88 or later.
## Usage

### Basic batch processing

```rust
// Import paths are illustrative; check the crate docs for the exact re-exports.
use hdbconnect_arrow::{BatchConfig, HanaBatchProcessor};
use arrow::record_batch::RecordBatch;
use std::sync::Arc;
```
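The body of the original example is not shown above, so the following is only a rough sketch of the intended flow. `HanaBatchProcessor`, `BatchConfig`, and `flush()` appear elsewhere in this README; the constructor arguments, the `process_row` method name, and the `resultset` variable are hypothetical placeholders, not the crate's confirmed API.

```rust
use std::num::NonZeroUsize;

// Hypothetical sketch: build a processor, feed HANA rows, collect Arrow batches.
let config = BatchConfig::new(NonZeroUsize::new(1_024).unwrap()); // constructor shape assumed
let mut processor = HanaBatchProcessor::new(schema, config);      // `schema` from result-set metadata (assumed)
for row in resultset {
    let row = row?;                                                // hdbconnect yields HdbResult<Row>
    processor.process_row(&row)?;                                  // hypothetical method name
}
// `flush()` returning Option<RecordBatch> matches its use later in this README.
if let Some(batch) = processor.flush()? {
    println!("final batch has {} rows", batch.num_rows());
}
```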
### Schema mapping

```rust
// Import paths and argument lists are illustrative; `hana_type_to_arrow` and
// `hana_field_to_arrow` are the crate's conversion helpers.
use hdbconnect_arrow::{hana_field_to_arrow, hana_type_to_arrow};
use hdbconnect::TypeId;

// Convert an individual HANA type (here: DECIMAL with precision 18, scale 2)
let arrow_type = hana_type_to_arrow(TypeId::DECIMAL, 18, 2);
// Returns: DataType::Decimal128(18, 2)

// Convert entire field metadata (`field_metadata` comes from the result set)
let arrow_field = hana_field_to_arrow(&field_metadata);
```
### Custom batch size

```rust
use hdbconnect_arrow::BatchConfig;
use std::num::NonZeroUsize;

// Constructor shape is illustrative; the crate uses NonZeroUsize for type-safe batch sizes.
let config = BatchConfig::new(NonZeroUsize::new(65_536).unwrap());
```
## Ecosystem integration
Query HANA data with SQL using Apache DataFusion:
```rust
use datafusion::prelude::*;
// Path and signature of this helper are illustrative.
use hdbconnect_arrow::collect_batches_from_hana;

let batches = collect_batches_from_hana(resultset)?;
let ctx = SessionContext::new();
// Concatenate into one batch for simple registration; table name and query are illustrative.
let batch = arrow::compute::concat_batches(&batches[0].schema(), &batches)?;
ctx.register_batch("hana_data", batch)?;
let df = ctx.sql("SELECT * FROM hana_data LIMIT 10").await?;
df.show().await?;
```
Load Arrow data directly into DuckDB:
```rust
use duckdb::Connection;

let conn = Connection::open_in_memory()?;
// `register_arrow` mirrors the original example; the exact registration API depends
// on the duckdb crate version and enabled features (e.g. `vtab-arrow`).
conn.register_arrow("hana_data", &batches)?;
// Table name and query are illustrative.
let mut stmt = conn.prepare("SELECT * FROM hana_data")?;
let result = stmt.query_arrow([])?;
```
Convert to Polars DataFrame:
```rust
use polars::prelude::*;

let batch = processor.flush()?.unwrap();
// `try_from` mirrors the original example; the exact conversion path from an
// arrow-rs RecordBatch into a Polars DataFrame depends on your Arrow interop setup.
let df = DataFrame::try_from(batch)?;

// Column names are illustrative.
let result = df
    .lazy()
    .filter(col("amount").gt(lit(100)))
    .group_by([col("region")])
    .agg([col("amount").sum()])
    .collect()?;
```
Serialize Arrow data for storage or network transfer:
```rust
use arrow::ipc::writer::FileWriter;
use parquet::arrow::ArrowWriter;
use std::fs::File;

// `batch` is a RecordBatch produced by the processor; file names are illustrative.

// Arrow IPC (Feather) format
let file = File::create("data.arrow")?;
let mut writer = FileWriter::try_new(file, &batch.schema())?;
writer.write(&batch)?;
writer.finish()?;

// Parquet format
let file = File::create("data.parquet")?;
let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?;
writer.write(&batch)?;
writer.close()?;
```
Export Arrow data to Python without copying (requires pyo3):
```rust
use arrow::pyarrow::PyArrowType;
use pyo3::prelude::*;

// Python: df = pl.from_arrow(get_hana_data())
```
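The exported function itself is not shown above. Below is a minimal sketch of what such an export can look like with arrow-rs's `pyarrow` feature; `get_hana_data` matches the Python comment above, while `fetch_batch()` is a hypothetical placeholder for code that produces a `RecordBatch` (for example via `HanaBatchProcessor`).

```rust
use arrow::pyarrow::PyArrowType;
use arrow::record_batch::RecordBatch;
use pyo3::prelude::*;

/// Hand a HANA result set to Python as an Arrow RecordBatch via the Arrow C Data Interface.
#[pyfunction]
fn get_hana_data() -> PyResult<PyArrowType<RecordBatch>> {
    // `fetch_batch()` is a hypothetical helper returning PyResult<RecordBatch>;
    // it is not part of this crate's documented API.
    let batch = fetch_batch()?;
    Ok(PyArrowType(batch))
}
```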
## Features
Enable optional features in Cargo.toml:
```toml
[dependencies]
hdbconnect-arrow = { version = "0.1", features = ["async", "test-utils"] }
```
| Feature | Description | Default |
|---|---|---|
| `async` | Async support via `hdbconnect_async` | No |
| `test-utils` | Expose `MockRow`/`MockRowBuilder` for testing | No |
> [!TIP]
> Enable `test-utils` in dev-dependencies for unit testing without a HANA connection.
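For example, in a consumer crate's `Cargo.toml` (version shown matches the rest of this README):

```toml
[dev-dependencies]
hdbconnect-arrow = { version = "0.1", features = ["test-utils"] }
```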
## Type mapping
| HANA Type | Arrow Type | Notes |
|---|---|---|
| TINYINT | UInt8 | Unsigned in HANA |
| SMALLINT | Int16 | |
| INT | Int32 | |
| BIGINT | Int64 | |
| REAL | Float32 | |
| DOUBLE | Float64 | |
| DECIMAL(p,s) | Decimal128(p,s) | Full precision preserved |
| CHAR, VARCHAR | Utf8 | |
| NCHAR, NVARCHAR | Utf8 | Unicode strings |
| CLOB, NCLOB | LargeUtf8 | Large text |
| BLOB | LargeBinary | Large binary |
| DATE | Date32 | Days since epoch |
| TIME | Time64(Nanosecond) | |
| TIMESTAMP | Timestamp(Nanosecond) | |
| BOOLEAN | Boolean | |
| GEOMETRY, POINT | Binary | WKB format |
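To make the DECIMAL row concrete: a HANA `DECIMAL(18,2)` column arrives as an Arrow `Decimal128(18, 2)` array whose backing integers are scaled by 10^2. The snippet below is plain arrow-rs, independent of this crate, and the sample values are illustrative.

```rust
use arrow::array::{Array, Decimal128Array};

// 123.45 and 67.89 stored as integers scaled by 10^2; the third value is NULL.
let amounts = Decimal128Array::from(vec![Some(12_345_i128), Some(6_789_i128), None])
    .with_precision_and_scale(18, 2)
    .unwrap();

assert_eq!(amounts.value_as_string(0), "123.45");
assert!(amounts.is_null(2));
```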
## API overview

- `HanaBatchProcessor` — Converts HANA rows to Arrow `RecordBatch` with configurable batch sizes
- `BatchConfig` — Configuration for batch processing (uses `NonZeroUsize` for type-safe batch size)
- `SchemaMapper` — Maps HANA result set metadata to Arrow schemas
- `BuilderFactory` — Creates appropriate Arrow array builders for HANA types
- `TypeCategory` — Centralized HANA type classification enum
- `HanaCompatibleBuilder` — Trait for Arrow builders that accept HANA values
- `FromHanaValue` — Sealed trait for type-safe value conversion
- `BatchProcessor` — Core batch processing interface
- `LendingBatchIterator` — GAT-based streaming iterator for large result sets
- `RowLike` — Row abstraction for testing without a HANA connection
When the `test-utils` feature is enabled:

```rust
// Import path is illustrative; MockRow/MockRowBuilder are gated behind `test-utils`.
use hdbconnect_arrow::{MockRow, MockRowBuilder};

// Argument types and values are illustrative; the method names come from this README.
let row = MockRowBuilder::new()
    .push_i64(42)
    .push_string("hello")
    .push_null()
    .build();
```
## Performance
The crate is optimized for high-throughput data transfer:
- Zero-copy conversion — Arrow builders write values directly into Arrow buffers without intermediate allocations
- Batch processing — Configurable batch sizes to balance memory usage and throughput
- Decimal optimization — Direct BigInt arithmetic avoids string parsing overhead
- Builder reuse — Builders reset between batches, eliminating repeated allocations
> [!NOTE]
> For large result sets, use `LendingBatchIterator` to stream data with constant memory usage.
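A rough sketch of how a lending iterator is typically driven; only the `LendingBatchIterator` name comes from this README, while the constructor, the `next_batch` method name, and the `write_downstream` sink are hypothetical placeholders.

```rust
// A lending iterator cannot be driven by a `for` loop, so batches are pulled explicitly.
let mut batches = LendingBatchIterator::new(resultset, config); // constructor shape assumed
while let Some(batch) = batches.next_batch()? {                 // hypothetical method name
    // One RecordBatch at a time: memory stays bounded by the configured batch size.
    write_downstream(&batch)?;                                   // placeholder sink
}
```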
## Part of pyhdb-rs
This crate is part of the pyhdb-rs workspace, providing the Arrow integration layer for the Python SAP HANA driver.
Related crates:
- `hdbconnect-py` — PyO3 bindings exposing Arrow data to Python
## Resources

- [Apache Arrow](https://arrow.apache.org/) — Official Arrow project
- [Arrow Rust](https://github.com/apache/arrow-rs) — Rust implementation
- [DataFusion](https://datafusion.apache.org/) — Query engine built on Arrow
- [Powered by Arrow](https://arrow.apache.org/powered_by/) — Projects using Arrow
## MSRV policy

> [!NOTE]
> Minimum Supported Rust Version: 1.88. MSRV increases are treated as minor version bumps.
## License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.