# hdbconnect-arrow
Apache Arrow integration for the hdbconnect SAP HANA driver. Converts HANA result sets to Arrow RecordBatch format, enabling zero-copy interoperability with the entire Arrow ecosystem.
## Why Arrow?
Apache Arrow is the universal columnar data format for analytics. By converting SAP HANA data to Arrow, you unlock seamless integration with:
| Category | Tools |
|---|---|
| DataFrames | Polars, pandas, Vaex, Dask |
| Query engines | DataFusion, DuckDB, ClickHouse, Ballista |
| ML/AI | Ray, Hugging Face Datasets, PyTorch, TensorFlow |
| Data lakes | Delta Lake, Apache Iceberg, Lance |
| Visualization | Perspective, Graphistry, Falcon |
| Languages | Rust, Python, R, Julia, Go, Java, C++ |
> [!TIP]
> Arrow's columnar format enables vectorized processing — operations run 10-100x faster than row-by-row iteration.
## Installation

Add the crate to your `Cargo.toml`:

```toml
[dependencies]
hdbconnect-arrow = "0.3"
```

Or with cargo-add:

```sh
cargo add hdbconnect-arrow
```

> [!IMPORTANT]
> Requires Rust 1.88 or later.
## Usage

### Basic batch processing
```rust
use hdbconnect::Connection;
use hdbconnect_arrow::{BatchConfig, HanaBatchProcessor};
use std::sync::Arc;
```
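A fuller sketch of the flow, assuming `HanaBatchProcessor` is built from the result set metadata plus a `BatchConfig` and fed row by row with an `append_row`-style method; the connection URL, the query, and every name except `flush()` (which also appears in the Polars example below) should be treated as placeholders:

```rust
use hdbconnect::Connection;
use hdbconnect_arrow::{BatchConfig, HanaBatchProcessor};
use std::num::NonZeroUsize;

fn fetch_sales_batch() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder URL and query.
    let mut connection = Connection::new("hdbsql://USER:PASSWORD@hana-host:39017")?;
    let result_set = connection.query("SELECT ID, NAME, AMOUNT FROM SALES")?;

    // Constructor and append_row are assumed names; flush() returns
    // Result<Option<RecordBatch>> as in the Polars example below.
    let config = BatchConfig::new(NonZeroUsize::new(10_000).unwrap());
    let mut processor = HanaBatchProcessor::new(result_set.metadata(), config)?;
    for row in result_set {
        processor.append_row(&row?)?;
    }
    let batch = processor.flush()?.expect("at least one batch");
    println!("rows: {}, columns: {}", batch.num_rows(), batch.num_columns());
    Ok(())
}
```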
### Schema mapping
```rust
use hdbconnect_arrow::{hana_field_to_arrow, hana_type_to_arrow};
use hdbconnect::TypeId;

// Convert individual types (argument shape shown schematically)
let arrow_type = hana_type_to_arrow(TypeId::DECIMAL, 18, 2);
// Returns: DataType::Decimal128(18, 2)

// Convert entire field metadata (`field_metadata` is a HANA result set field)
let arrow_field = hana_field_to_arrow(&field_metadata);
```
### Custom batch size
```rust
use hdbconnect_arrow::BatchConfig;
use std::num::NonZeroUsize;

// 10_000 rows per batch is an illustrative value; tune it for your workload.
let config = BatchConfig::new(NonZeroUsize::new(10_000).unwrap());
```
## Performance
The crate is optimized for high-throughput data transfer with several performance enhancements in v0.3.2:
### Optimization Techniques
- Enum-based dispatch — Eliminates vtable overhead by replacing `Box<dyn HanaCompatibleBuilder>` with `BuilderEnum`, resulting in ~10-20% performance improvement through better cache locality and monomorphized dispatch
- Homogeneous loop hoisting — Detects schemas where all columns share the same type and hoists the dispatch match outside the row loop for +4-8% throughput on wide tables (100+ columns)
- Zero-copy decimal conversion — Uses `Cow::Borrowed` to avoid cloning `BigInt` during decimal conversion, improving decimal throughput by +222% (55 → 177 Melem/s) and saving 8 MB per 1M decimals
- String capacity pre-sizing — Extracts `max_length` from HANA field metadata to pre-allocate optimal buffer sizes, reducing reallocation overhead by 2-3x per string column
- Batch processing — Configurable batch sizes to balance memory usage and throughput
- Builder reuse — Builders reset between batches, eliminating repeated allocations
> [!TIP]
> For large result sets, use `LendingBatchIterator` to stream data with constant memory usage.
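A sketch of that streaming pattern; the constructor, the shape of `next()`, and the batch size are assumptions about the crate's GAT-based iterator, so check the crate docs for the real entry point:

```rust
use hdbconnect::{HdbResult, ResultSet};
use hdbconnect_arrow::{BatchConfig, LendingBatchIterator};
use std::num::NonZeroUsize;

// Hypothetical entry point: wrap a HANA result set in the lending iterator.
// Builders are reused between batches, so memory stays flat regardless of
// how many rows the result set contains.
fn stream(result_set: ResultSet) -> HdbResult<()> {
    let config = BatchConfig::new(NonZeroUsize::new(65_536).unwrap());
    let mut batches = LendingBatchIterator::new(result_set, config);

    // A lending iterator cannot be driven by a plain `for` loop: each borrowed
    // batch must go out of scope before `next()` can be called again.
    while let Some(batch) = batches.next() {
        println!("streamed {} rows", batch.num_rows());
    }
    Ok(())
}
```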
### Profiling Support

Enable the optional `profiling` feature flag to integrate the dhat heap profiler:

```toml
[dependencies]
hdbconnect-arrow = { version = "0.3", features = ["profiling"] }
```

This enables allocation tracking with zero impact on release builds, since the instrumentation is compiled out when the feature is disabled. See `src/profiling.rs` for usage examples.
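A minimal sketch of what that wiring typically looks like with plain dhat, assuming the `profiling` feature simply gates the allocator and profiler behind `cfg(feature = "profiling")`; the actual setup lives in `src/profiling.rs`:

```rust
// Installed only when the crate is built with `--features profiling` (assumed gate).
#[cfg(feature = "profiling")]
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    // Start heap profiling; stats are written to dhat-heap.json when this drops.
    #[cfg(feature = "profiling")]
    let _profiler = dhat::Profiler::new_heap();

    // ... run the HANA-to-Arrow workload you want to profile ...
}
```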
## Ecosystem integration
Query HANA data with SQL using Apache DataFusion:
```rust
use datafusion::prelude::*;
use hdbconnect_arrow::collect_batches_from_hana;

// Table name, query, and the helper's argument list are shown schematically.
let batches = collect_batches_from_hana(&mut connection, "SELECT * FROM SALES")?;
let ctx = SessionContext::new();
ctx.register_batch("sales", batches[0].clone())?;
let df = ctx.sql("SELECT REGION, SUM(AMOUNT) FROM sales GROUP BY REGION").await?;
df.show().await?;
```
Load Arrow data directly into DuckDB:
```rust
use duckdb::Connection;

let conn = Connection::open_in_memory()?;
// Table name and argument shape are illustrative; `register_arrow` exposes
// the batch to SQL under the given name.
conn.register_arrow("hana_data", &batch)?;
let mut stmt = conn.prepare("SELECT * FROM hana_data WHERE AMOUNT > 100")?;
let result = stmt.query_arrow([])?;
```
Convert to Polars DataFrame:
```rust
use polars::prelude::*;

let batch = processor.flush()?.unwrap();
// Column names are illustrative; depending on your polars/arrow versions the
// RecordBatch conversion may need an interop step.
let df = DataFrame::try_from(batch)?;
let result = df
    .lazy()
    .filter(col("AMOUNT").gt(lit(100)))
    .group_by([col("REGION")])
    .agg([col("AMOUNT").sum()])
    .collect()?;
```
Serialize Arrow data for storage or network transfer:
```rust
use arrow::ipc::writer::FileWriter;
use parquet::arrow::ArrowWriter;
use std::fs::File;

// Arrow IPC (Feather) format; file names are placeholders
let file = File::create("hana_data.arrow")?;
let mut writer = FileWriter::try_new(file, &batch.schema())?;
writer.write(&batch)?;
writer.finish()?;

// Parquet format
let file = File::create("hana_data.parquet")?;
let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?;
writer.write(&batch)?;
writer.close()?;
```
Export Arrow data to Python without copying (requires pyo3):
```rust
use arrow::pyarrow::PyArrowType;
use pyo3::prelude::*;

// Python: df = pl.from_arrow(get_hana_data())
```
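A sketch of the exporting side, assuming a `#[pyfunction]` named `get_hana_data` as in the Python call above; the tiny stand-in batch keeps the example self-contained, whereas the real function would return batches produced by `HanaBatchProcessor`:

```rust
use arrow::array::Int64Array;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::pyarrow::PyArrowType;
use arrow::record_batch::RecordBatch;
use pyo3::prelude::*;
use std::sync::Arc;

// PyArrowType hands the batch across the FFI boundary via the Arrow C Data
// Interface, so Python sees the same buffers without a copy.
#[pyfunction]
fn get_hana_data() -> PyResult<PyArrowType<RecordBatch>> {
    let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int64, false)]));
    let batch = RecordBatch::try_new(schema, vec![Arc::new(Int64Array::from(vec![1, 2, 3]))])
        .map_err(|e| pyo3::exceptions::PyValueError::new_err(e.to_string()))?;
    Ok(PyArrowType(batch))
}
```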
## Features

Enable optional features in `Cargo.toml`:

```toml
[dependencies]
hdbconnect-arrow = { version = "0.3", features = ["async", "test-utils", "profiling"] }
```
| Feature | Description | Default |
|---|---|---|
| `async` | Async support via `hdbconnect_async` | No |
| `test-utils` | Expose `MockRow`/`MockRowBuilder` for testing | No |
| `profiling` | dhat heap profiler integration for performance analysis | No |

> [!TIP]
> Enable `test-utils` in dev-dependencies for unit testing without a HANA connection.
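A minimal dev-dependencies entry for that setup (the version shown matches the one used above):

```toml
[dev-dependencies]
hdbconnect-arrow = { version = "0.3", features = ["test-utils"] }
```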
## Type mapping
| HANA Type | Arrow Type | Notes |
|---|---|---|
| TINYINT | UInt8 | Unsigned in HANA |
| SMALLINT | Int16 | |
| INT | Int32 | |
| BIGINT | Int64 | |
| REAL | Float32 | |
| DOUBLE | Float64 | |
| DECIMAL(p,s) | Decimal128(p,s) | Full precision preserved |
| CHAR, VARCHAR | Utf8 | |
| NCHAR, NVARCHAR | Utf8 | Unicode strings |
| CLOB, NCLOB | LargeUtf8 | Large text |
| BLOB | LargeBinary | Large binary |
| DATE | Date32 | Days since epoch |
| TIME | Time64(Nanosecond) | |
| TIMESTAMP | Timestamp(Nanosecond) | |
| BOOLEAN | Boolean | |
| GEOMETRY, POINT | Binary | WKB format |
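As a concrete example of the DECIMAL mapping, a DECIMAL(18,2) column arrives as an Arrow `Decimal128Array` whose values are unscaled integers; this small sketch uses plain arrow-rs APIs, and the column index and scale are assumptions about the query at hand:

```rust
use arrow::array::{Array, Decimal128Array};
use arrow::record_batch::RecordBatch;

fn print_amounts(batch: &RecordBatch) {
    // Column 0 is assumed to be a HANA DECIMAL(18, 2) mapped to Decimal128(18, 2).
    let amounts = batch
        .column(0)
        .as_any()
        .downcast_ref::<Decimal128Array>()
        .expect("expected a Decimal128 column");
    for i in 0..amounts.len() {
        if !amounts.is_null(i) {
            // value(i) is the unscaled i128; divide by 10^scale to interpret it.
            println!("{}", amounts.value(i) as f64 / 100.0);
        }
    }
}
```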
## API overview

- `HanaBatchProcessor` — Converts HANA rows to Arrow `RecordBatch` with configurable batch sizes
- `BatchConfig` — Configuration for batch processing (uses `NonZeroUsize` for type-safe batch size)
- `SchemaMapper` — Maps HANA result set metadata to Arrow schemas
- `BuilderFactory` — Creates appropriate Arrow array builders for HANA types
- `TypeCategory` — Centralized HANA type classification enum
- `BuilderEnum` — Enum-wrapped builder for static dispatch (eliminates vtable overhead)
- `BuilderKind` — Discriminant identifying builder type for schema profiling
- `SchemaProfile` — Classifies schemas as homogeneous or mixed for optimized processing paths
- `HanaCompatibleBuilder` — Trait for Arrow builders that accept HANA values
- `FromHanaValue` — Sealed trait for type-safe value conversion
- `BatchProcessor` — Core batch processing interface
- `LendingBatchIterator` — GAT-based streaming iterator for large result sets
- `RowLike` — Row abstraction for testing without a HANA connection
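The dispatch-related items follow a common Rust pattern: instead of boxing a trait object per column, each concrete builder becomes an enum variant, so per-row dispatch is a `match` rather than a vtable call. A minimal illustration of that pattern, not the crate's actual definitions:

```rust
use arrow::array::{ArrayRef, Int64Builder, StringBuilder};
use std::sync::Arc;

// Illustrative stand-in for BuilderEnum: one variant per concrete Arrow builder.
enum ColumnBuilder {
    Int64(Int64Builder),
    Utf8(StringBuilder),
}

impl ColumnBuilder {
    // The hot row loop dispatches with a monomorphized match instead of
    // calling through a Box<dyn Trait>, keeping builders inline in memory.
    fn append_null(&mut self) {
        match self {
            ColumnBuilder::Int64(b) => b.append_null(),
            ColumnBuilder::Utf8(b) => b.append_null(),
        }
    }

    fn finish(&mut self) -> ArrayRef {
        match self {
            ColumnBuilder::Int64(b) => Arc::new(b.finish()),
            ColumnBuilder::Utf8(b) => Arc::new(b.finish()),
        }
    }
}
```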
When the `test-utils` feature is enabled:

```rust
use hdbconnect_arrow::{MockRow, MockRowBuilder};

// Builder entry point and column values shown schematically.
let row = MockRowBuilder::new()
    .push_i64(42)
    .push_string("hello")
    .push_null()
    .build();
```
## Part of pyhdb-rs
This crate is part of the pyhdb-rs workspace, providing the Arrow integration layer for the Python SAP HANA driver.
Related crates:
- `hdbconnect-py` — PyO3 bindings exposing Arrow data to Python
## Resources

- [Apache Arrow](https://arrow.apache.org/) — Official Arrow project
- [Arrow Rust](https://github.com/apache/arrow-rs) — Rust implementation
- [DataFusion](https://datafusion.apache.org/) — Query engine built on Arrow
- [Powered by Arrow](https://arrow.apache.org/powered_by/) — Projects using Arrow
## MSRV policy

> [!NOTE]
> Minimum Supported Rust Version: 1.88. MSRV increases are minor version bumps.
## License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.