Crate arrow_udf_js

Source
Expand description

§JavaScript UDF for Apache Arrow

Crate Docs

§Usage

Add the following lines to your Cargo.toml:

[dependencies]
arrow-udf-js = "0.6"

Create a Runtime and define your JS functions in string form. Note that the function must be exported and its name must match the one you pass to add_function.

use arrow_udf_js::{FunctionOptions, Runtime};

let mut runtime = Runtime::new().await?;
runtime
    .add_function(
        "gcd",
        arrow_schema::DataType::Int32,
        r#"
        export function gcd(a, b) {
            while (b != 0) {
                let t = b;
                b = a % b;
                a = t;
            }
            return a;
        }
        "#,
        FunctionOptions::default().return_null_on_null_input(),
    )
    .await?;

You can then call the JS function on a RecordBatch:

let input: RecordBatch = ...;
let output: RecordBatch = runtime.call("gcd", &input).await?;

If you print the input and output batch, it will be like this:

 input     output
+----+----+-----+
| a  | b  | gcd |
+----+----+-----+
| 15 | 25 | 5   |
|    | 1  |     |
+----+----+-----+

For set-returning functions (or so-called table functions), define the function as a generator:

use arrow_udf_js::{FunctionOptions, Runtime};

let mut runtime = Runtime::new().await?;
runtime
    .add_function(
        "range",
        arrow_schema::DataType::Int32,
        r#"
        export function* range(n) {
            for (let i = 0; i < n; i++) {
                yield i;
            }
        }
        "#,
        FunctionOptions::default().return_null_on_null_input(),
    )
    .await?;

You can then call the table function via call_table_function:

let chunk_size = 1024;
let input: RecordBatch = ...;
let outputs = runtime.call_table_function("range", &input, chunk_size)?;
while let Some(output) = outputs.next().await {
    // do something with the output
}

If you print the output batch, it will be like this:

+-----+-------+
| row | range |
+-----+-------+
| 0   | 0     |
| 2   | 0     |
| 2   | 1     |
| 2   | 2     |
+-----+-------+

The JS code will be run in an embedded QuickJS interpreter.

See the example for more details.

§Type Mapping

The following table shows the type mapping between Arrow and JavaScript:

Arrow TypeJS Type
Nullnull
Booleanboolean
Int8number
Int16number
Int32number
Int64number
UInt8number
UInt16number
UInt32number
UInt64number
Float32number
Float64number
Stringstring
LargeStringstring
Date32Date
TimestampDate
Decimal128BigDecimal
Decimal256BigDecimal
BinaryUint8Array
LargeBinaryUint8Array
List(Int8)Int8Array
List(Int16)Int16Array
List(Int32)Int32Array
List(Int64)BigInt64Array
List(UInt8)Uint8Array
List(UInt16)Uint16Array
List(UInt32)Uint32Array
List(UInt64)BigUint64Array
List(Float32)Float32Array
List(Float64)Float64Array
List(others)Array
Structobject

This crate also supports the following Arrow extension types:

Extension TypePhysical TypeARROW:extension:nameJS Type
JSONString, Binary, LargeBinaryarrowudf.jsonany (parsed by JSON.parse(string))
DecimalStringarrowudf.decimalBigDecimal

§Async Functions and Fetch API

An async function is a JavaScript function that returns a promise. If the function involves IO operations, it’s usually more efficient to use async functions.

We provide a Fetch API to allow making HTTP requests from JavaScript UDFs in async way. To use it, you need to enable it in the Runtime:

runtime.enable_fetch();

To enable async functions, you need to set async_mode() when adding the function. Then you can use the async fetch() function in your JavaScript code.

runtime
    .add_function(
        "echo",
        DataType::Utf8View,
        r#"
export async function my_fetch_udf(id) {
    const response = await fetch("https://api.example.com/" + id);
    const data = await response.json();
    return data.value;
}
"#,
        FunctionOptions::default().return_null_on_null_input().async_mode(),
    )
    .await
    .unwrap();

See the README of the fetch module for more details.

§Batched Function

When a function is batched, it will be called once for all rows in the input RecordBatch. The input arguments will be an array of values.

runtime
    .add_function(
        "echo",
        DataType::Utf8View,
        r#"
export function echo(vals) {
    return vals.map(v => v + "!")
}
"#,
        FunctionOptions::default().return_null_on_null_input().batched(),
    )
    .await
    .unwrap();

Currently, table functions can not be batched.

Structs§

AggregateOptions
Options for configuring user-defined aggregate functions.
FunctionOptions
Options for configuring user-defined functions.
RecordBatchIter
An iterator over the result of a table function.
Runtime
A runtime to execute user defined functions in JavaScript.

Enums§

CallMode
Whether the function will be called when some of its arguments are null.

Traits§

IntoField
Converts a type into a Field. Implementors are DataType and Field.

Type Aliases§

MemoryUsage
A struct with information about the runtimes memory usage.