hdbconnect-arrow 0.2.4

Apache Arrow integration for the hdbconnect SAP HANA driver. Converts HANA result sets to Arrow RecordBatch format, enabling zero-copy interoperability with the entire Arrow ecosystem.

Why Arrow?

Apache Arrow is the universal columnar data format for analytics. By converting SAP HANA data to Arrow, you unlock seamless integration with:

| Category      | Tools                                           |
|---------------|-------------------------------------------------|
| DataFrames    | Polars, pandas, Vaex, Dask                      |
| Query engines | DataFusion, DuckDB, ClickHouse, Ballista        |
| ML/AI         | Ray, Hugging Face Datasets, PyTorch, TensorFlow |
| Data lakes    | Delta Lake, Apache Iceberg, Lance               |
| Visualization | Perspective, Graphistry, Falcon                 |
| Languages     | Rust, Python, R, Julia, Go, Java, C++           |

[!TIP] Arrow's columnar format enables vectorized processing: analytical operations can run 10-100x faster than equivalent row-by-row iteration.
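
As an illustration of what vectorized processing looks like on the Rust side, here is a minimal sketch using the arrow umbrella crate's compute kernels (the arrow dependency and the sample values are assumptions for the example, not part of this crate):

use arrow::array::Int64Array;
use arrow::compute::sum;

// One vectorized pass over the column; nulls are skipped via the validity bitmap.
let amounts = Int64Array::from(vec![Some(10), None, Some(32)]);
assert_eq!(sum(&amounts), Some(42));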

Installation

[dependencies]
hdbconnect-arrow = "0.1"

Or with cargo-add:

cargo add hdbconnect-arrow

[!IMPORTANT] Requires Rust 1.88 or later.

Usage

Basic batch processing

use hdbconnect_arrow::{HanaBatchProcessor, BatchConfig, Result};
use arrow_schema::{Schema, Field, DataType};
use std::sync::Arc;

fn process_results(result_set: hdbconnect::ResultSet) -> Result<()> {
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, true),
    ]));

    let config = BatchConfig::default();
    let mut processor = HanaBatchProcessor::new(Arc::clone(&schema), config);

    for row in result_set {
        if let Some(batch) = processor.process_row(&row?)? {
            println!("Batch with {} rows", batch.num_rows());
        }
    }

    // Flush remaining rows
    if let Some(batch) = processor.flush()? {
        println!("Final batch with {} rows", batch.num_rows());
    }

    Ok(())
}

Schema mapping

use hdbconnect_arrow::{hana_type_to_arrow, hana_field_to_arrow};
use hdbconnect::TypeId;

// Convert individual types
let arrow_type = hana_type_to_arrow(TypeId::DECIMAL, Some(18), Some(2));
// Returns: DataType::Decimal128(18, 2)

// Convert entire field metadata
let arrow_field = hana_field_to_arrow(&hana_field_metadata);

Custom batch size

use hdbconnect_arrow::BatchConfig;
use std::num::NonZeroUsize;

let config = BatchConfig::new(NonZeroUsize::new(10_000).unwrap());

Ecosystem integration

Query HANA data with SQL using Apache DataFusion:

use datafusion::prelude::*;

let batches = collect_batches_from_hana(result_set)?;
let ctx = SessionContext::new();
ctx.register_batch("hana_data", batches[0].clone())?;

let df = ctx.sql("SELECT * FROM hana_data WHERE amount > 1000").await?;
df.show().await?;
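
The collect_batches_from_hana helper above is not part of this crate's API; below is a minimal sketch of such a helper using only the processor API shown earlier (here named collect_batches, taking the Arrow schema explicitly, and assuming the arrow_array crate for RecordBatch):

use arrow_array::RecordBatch;
use arrow_schema::SchemaRef;
use hdbconnect_arrow::{BatchConfig, HanaBatchProcessor, Result};

// Sketch: drain the result set into in-memory batches.
// Only sensible when the result set fits comfortably in RAM; stream otherwise.
fn collect_batches(
    result_set: hdbconnect::ResultSet,
    schema: SchemaRef,
) -> Result<Vec<RecordBatch>> {
    let mut processor = HanaBatchProcessor::new(schema, BatchConfig::default());
    let mut batches = Vec::new();
    for row in result_set {
        if let Some(batch) = processor.process_row(&row?)? {
            batches.push(batch);
        }
    }
    if let Some(batch) = processor.flush()? {
        batches.push(batch);
    }
    Ok(batches)
}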

Load Arrow data directly into DuckDB:

use duckdb::Connection;

let conn = Connection::open_in_memory()?;
conn.register_arrow("sales", batches)?;

let mut stmt = conn.prepare("SELECT region, SUM(amount) FROM sales GROUP BY region")?;
let result = stmt.query_arrow([])?;

Convert to Polars DataFrame:

use polars::prelude::*;

let batch = processor.flush()?.unwrap();
let df = DataFrame::try_from(batch)?;

let result = df.lazy()
    .filter(col("status").eq(lit("active")))
    .group_by([col("region")])
    .agg([col("amount").sum()])
    .collect()?;

Serialize Arrow data for storage or network transfer:

use arrow_ipc::writer::FileWriter;
use parquet::arrow::ArrowWriter;
use std::fs::File;

// Arrow IPC (Feather) format
let file = File::create("data.arrow")?;
let mut writer = FileWriter::try_new(file, &schema)?;
writer.write(&batch)?;
writer.finish()?;

// Parquet format
let file = File::create("data.parquet")?;
let mut writer = ArrowWriter::try_new(file, schema.clone(), None)?;
writer.write(&batch)?;
writer.close()?;
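
To read the Parquet file back as Arrow batches, a sketch using the parquet crate's Arrow reader (the file name matches the example above):

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use std::fs::File;

// Stream the file back batch by batch.
let file = File::open("data.parquet")?;
let reader = ParquetRecordBatchReaderBuilder::try_new(file)?.build()?;
for batch in reader {
    println!("read {} rows", batch?.num_rows());
}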

Export Arrow data to Python without copying (requires pyo3 and the arrow crate's pyarrow feature):

use arrow::pyarrow::PyArrowType;
use arrow_array::RecordBatch;
use pyo3::prelude::*;

#[pyfunction]
fn get_hana_data(_py: Python<'_>) -> PyResult<PyArrowType<RecordBatch>> {
    // fetch_from_hana stands in for your own HANA query and batch collection.
    let batch = fetch_from_hana()?;
    Ok(PyArrowType(batch))
}

// Python: df = pl.from_arrow(get_hana_data())

Features

Enable optional features in Cargo.toml:

[dependencies]
hdbconnect-arrow = { version = "0.2", features = ["async", "test-utils"] }

| Feature    | Description                               | Default |
|------------|-------------------------------------------|---------|
| async      | Async support via hdbconnect_async        | No      |
| test-utils | Expose MockRow/MockRowBuilder for testing | No      |

[!TIP] Enable test-utils in dev-dependencies for unit testing without a HANA connection.
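
For example, the corresponding Cargo.toml entry could look like this:

[dev-dependencies]
hdbconnect-arrow = { version = "0.2", features = ["test-utils"] }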

Type mapping

| HANA Type       | Arrow Type            | Notes                    |
|-----------------|-----------------------|--------------------------|
| TINYINT         | UInt8                 | Unsigned in HANA         |
| SMALLINT        | Int16                 |                          |
| INT             | Int32                 |                          |
| BIGINT          | Int64                 |                          |
| REAL            | Float32               |                          |
| DOUBLE          | Float64               |                          |
| DECIMAL(p,s)    | Decimal128(p,s)       | Full precision preserved |
| CHAR, VARCHAR   | Utf8                  |                          |
| NCHAR, NVARCHAR | Utf8                  | Unicode strings          |
| CLOB, NCLOB     | LargeUtf8             | Large text               |
| BLOB            | LargeBinary           | Large binary             |
| DATE            | Date32                | Days since epoch         |
| TIME            | Time64(Nanosecond)    |                          |
| TIMESTAMP       | Timestamp(Nanosecond) |                          |
| BOOLEAN         | Boolean               |                          |
| GEOMETRY, POINT | Binary                | WKB format               |

API overview

  • HanaBatchProcessor — Converts HANA rows to Arrow RecordBatch with configurable batch sizes
  • BatchConfig — Configuration for batch processing (uses NonZeroUsize for type-safe batch size)
  • SchemaMapper — Maps HANA result set metadata to Arrow schemas
  • BuilderFactory — Creates appropriate Arrow array builders for HANA types
  • TypeCategory — Centralized HANA type classification enum
  • HanaCompatibleBuilder — Trait for Arrow builders that accept HANA values
  • FromHanaValue — Sealed trait for type-safe value conversion
  • BatchProcessor — Core batch processing interface
  • LendingBatchIterator — GAT-based streaming iterator for large result sets
  • RowLike — Row abstraction for testing without HANA connection

When test-utils feature is enabled:

use hdbconnect_arrow::{MockRow, MockRowBuilder};

let row = MockRowBuilder::new()
    .push_i64(42)
    .push_string("test")
    .push_null()
    .build();

Error handling

Conversion failures are reported through a single error type:

use hdbconnect_arrow::{ArrowConversionError, Result};

fn convert_data() -> Result<()> {
    // ArrowConversionError covers:
    // - Type mismatches
    // - Decimal overflow
    // - Schema incompatibilities
    // - Invalid batch configuration
    Ok(())
}
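
A sketch of reporting conversion failures instead of aborting the whole transfer, using only the processor API shown above (it assumes the error types implement Display, which is conventional):

use arrow_schema::SchemaRef;
use hdbconnect_arrow::{BatchConfig, HanaBatchProcessor};

fn report_failures(result_set: hdbconnect::ResultSet, schema: SchemaRef) {
    let mut processor = HanaBatchProcessor::new(schema, BatchConfig::default());
    for row in result_set {
        // Driver-level errors and conversion errors are reported separately.
        let row = match row {
            Ok(row) => row,
            Err(e) => {
                eprintln!("driver error: {e}");
                continue;
            }
        };
        match processor.process_row(&row) {
            Ok(Some(batch)) => println!("batch with {} rows", batch.num_rows()),
            Ok(None) => {} // row buffered; batch not yet full
            Err(e) => eprintln!("conversion error: {e}"),
        }
    }
    if let Ok(Some(batch)) = processor.flush() {
        println!("final batch with {} rows", batch.num_rows());
    }
}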

Performance

The crate is optimized for high-throughput data transfer:

  • Zero-copy conversion — Arrow builders write directly to memory without intermediate allocations
  • Batch processing — Configurable batch sizes to balance memory usage and throughput
  • Decimal optimization — Direct BigInt arithmetic avoids string parsing overhead
  • Builder reuse — Builders reset between batches, eliminating repeated allocations

[!NOTE] For large result sets, use LendingBatchIterator to stream data with constant memory usage.

Part of pyhdb-rs

This crate is part of the pyhdb-rs workspace, providing the Arrow integration layer for the Python SAP HANA driver.

MSRV policy

[!NOTE] Minimum Supported Rust Version: 1.88. MSRV increases are minor version bumps.

License

Licensed under either of:

at your option.