Crate arrow_udf_js
§JavaScript UDF for Apache Arrow
§Usage
Add the following lines to your Cargo.toml:
    [dependencies]
    arrow-udf-js = "0.1"
Create a Runtime and define your JS functions in string form.
Note that the function must be exported and its name must match the one you pass to add_function.
    use arrow_udf_js::{Runtime, CallMode};

    let mut runtime = Runtime::new().unwrap();
    runtime
        .add_function(
            "gcd",
            arrow_schema::DataType::Int32,
            CallMode::ReturnNullOnNullInput,
            r#"
            export function gcd(a, b) {
                while (b != 0) {
                    let t = b;
                    b = a % b;
                    a = t;
                }
                return a;
            }
            "#,
        )
        .unwrap();

You can then call the JS function on a RecordBatch:

    use arrow_array::RecordBatch;

    let input: RecordBatch = ...;
    let output: RecordBatch = runtime.call("gcd", &input).unwrap();

If you print the input and output batches, they look like this:
     input     output
    +----+----+-----+
    | a  | b  | gcd |
    +----+----+-----+
    | 15 | 25 | 5   |
    |    | 1  |     |
    +----+----+-----+
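
The second row of the table shows the effect of CallMode::ReturnNullOnNullInput: a is null, so the output is null as well. The following is a minimal plain-Rust sketch of that behaviour, not the crate's implementation. It models nullable Int32 columns as slices of Option<i32>, and the helper name gcd_column is hypothetical.

```rust
/// Plain-Rust equivalent of the JS `gcd` body above (Euclid's algorithm).
fn gcd(mut a: i32, mut b: i32) -> i32 {
    while b != 0 {
        let t = b;
        b = a % b;
        a = t;
    }
    a
}

/// Hypothetical helper modelling `CallMode::ReturnNullOnNullInput`:
/// if any argument in a row is null (`None`), the function is not
/// called for that row and the output is null.
fn gcd_column(a: &[Option<i32>], b: &[Option<i32>]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(gcd(*x, *y)),
            _ => None,
        })
        .collect()
}
```

Calling gcd_column with the columns from the table, a = [15, null] and b = [25, 1], gives [5, null].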
For set-returning functions (or so-called table functions), define the function as a generator:
    use arrow_udf_js::{Runtime, CallMode};

    let mut runtime = Runtime::new().unwrap();
    runtime
        .add_function(
            "range",
            arrow_schema::DataType::Int32,
            CallMode::ReturnNullOnNullInput,
            r#"
            export function* range(n) {
                for (let i = 0; i < n; i++) {
                    yield i;
                }
            }
            "#,
        )
        .unwrap();

You can then call the table function via call_table_function:

    let chunk_size = 1024;
    let input: RecordBatch = ...;
    let outputs = runtime.call_table_function("range", &input, chunk_size).unwrap();
    for result in outputs {
        let output: RecordBatch = result?;
        // do something with the output
    }

If you print the output batch, it looks like this:
    +-----+-------+
    | row | range |
    +-----+-------+
    | 0   | 0     |
    | 2   | 0     |
    | 2   | 1     |
    | 2   | 2     |
    +-----+-------+
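
The input batch for this output is not shown. One input consistent with the table is a single column holding 1, 0 and 3, with the row column holding the index of the input row that produced each value. The sketch below models that reading in plain Rust. It is an illustration under those assumptions, not the crate's code: call_range is a hypothetical helper, and grouping rows into chunks of at most chunk_size is an assumed interpretation of that parameter.

```rust
/// Rust stand-in for the JS generator `range(n)`: yields 0, 1, ..., n - 1.
fn range(n: i32) -> impl Iterator<Item = i32> {
    0..n
}

/// Hypothetical model of a table-function call: every value yielded for
/// input row `i` becomes one output row `(i, value)`, and the output rows
/// are grouped into chunks of at most `chunk_size` rows.
/// `chunk_size` must be greater than zero.
fn call_range(input: &[i32], chunk_size: usize) -> Vec<Vec<(usize, i32)>> {
    let rows: Vec<(usize, i32)> = input
        .iter()
        .enumerate()
        .flat_map(|(i, &n)| range(n).map(move |v| (i, v)))
        .collect();
    rows.chunks(chunk_size).map(|c| c.to_vec()).collect()
}
```

With input [1, 0, 3] and a chunk size of 1024, this produces one chunk with the rows (0, 0), (2, 0), (2, 1), (2, 2), matching the table. Row 1 contributes nothing because range(0) yields no values.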
The JS code runs in an embedded QuickJS interpreter.
See the example for more details.
§Type Mapping
The following table shows the type mapping between Arrow and JavaScript:
| Arrow Type | JS Type | Note |
|---|---|---|
| Null | null | |
| Boolean | boolean | |
| Int8 | number | |
| Int16 | number | |
| Int32 | number | |
| Int64 | number | |
| UInt8 | number | |
| UInt16 | number | |
| UInt32 | number | |
| UInt64 | number | |
| Float32 | number | |
| Float64 | number | |
| Utf8 | string | |
| Binary | Uint8Array | |
| LargeString (json) | null, boolean, number, string, array or object | JSON.parse(string) |
| LargeBinary (decimal) | BigDecimal | |
| List(Int8) | Int8Array | |
| List(Int16) | Int16Array | |
| List(Int32) | Int32Array | |
| List(Int64) | BigInt64Array | |
| List(UInt8) | Uint8Array | |
| List(UInt16) | Uint16Array | |
| List(UInt32) | Uint32Array | |
| List(UInt64) | BigUint64Array | |
| List(Float32) | Float32Array | |
| List(Float64) | Float64Array | |
| List(others) | Array | |
| Struct | object | |
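
The table maps Int64 and UInt64 to the JS number type. A JS number is an IEEE-754 double, so integers outside the range ±(2^53 − 1) cannot all be represented exactly. The sketch below illustrates that limit in plain Rust. It says nothing about how the crate itself treats such values, and is_safe_for_js_number is a hypothetical helper.

```rust
/// Largest integer magnitude that an IEEE-754 double represents exactly
/// for every value up to it (JavaScript's Number.MAX_SAFE_INTEGER, 2^53 - 1).
const MAX_SAFE_INTEGER: i64 = (1 << 53) - 1;

/// Hypothetical check: true if `v` can be stored in a JS `number`
/// without losing precision.
fn is_safe_for_js_number(v: i64) -> bool {
    (-MAX_SAFE_INTEGER..=MAX_SAFE_INTEGER).contains(&v)
}

/// Converts through `f64`, the storage format of a JS `number`.
/// Values outside the safe range may come back changed.
fn through_f64(v: i64) -> i64 {
    (v as f64) as i64
}
```

For example, through_f64((1 << 53) + 1) does not return its input, because 2^53 + 1 has no exact double representation.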
§Structs
- An iterator over the result of a table function.
- The JS UDF runtime.
§Enums
- Whether the function will be called when some of its arguments are null.