Rust UDF for Apache Arrow
Usage
Add the following lines to your Cargo.toml:
[dependencies]
arrow-udf = "0.2"
Define your functions with the #[function] macro:
use arrow_udf::function;
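As a minimal sketch, a function suitable for the macro could look like the following. The attribute is shown as a comment so that the block compiles without the crate; the signature string is an assumption inferred from the names used in this section (`gcd`, `eval_gcd`), not a confirmed form.

```rust
// Sketch only: with the crate in scope, the attribute would sit directly
// above the function. The signature string is an assumption:
// #[function("gcd(int, int) -> int", output = "eval_gcd")]
fn gcd(mut a: i32, mut b: i32) -> i32 {
    // Euclid's algorithm: replace (a, b) with (b, a mod b) until b is zero.
    while b != 0 {
        (a, b) = (b, a % b);
    }
    a
}
```

The function body is ordinary scalar Rust operating on a single row's values.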
The macro will generate a function that takes a RecordBatch as input and returns a RecordBatch as output.
The generated function can be named with the optional output parameter.
If not specified, it will be given a generated name such as gcd_int4_int4_int4_eval.
You can then call the generated function on a RecordBatch:
let input: RecordBatch = ...;
let output: RecordBatch = eval_gcd(&input).unwrap();
If you print the input and output batches, they will look like this:
 input     output
+----+----+-----+
| a  | b  | gcd |
+----+----+-----+
| 15 | 25 | 5   |
|    | 1  |     |
+----+----+-----+
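The second row of the table suggests that a NULL input yields a NULL output. The sketch below models that behaviour with plain `Option` columns from the standard library. `eval_gcd_model` is a hypothetical stand-in for illustration, not the code the macro generates, and it works on slices rather than a RecordBatch.

```rust
/// Greatest common divisor by Euclid's algorithm.
fn gcd(mut a: i32, mut b: i32) -> i32 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

/// Hypothetical model of a generated batch function: evaluates `gcd`
/// row by row over two nullable columns of equal length.
fn eval_gcd_model(a: &[Option<i32>], b: &[Option<i32>]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(gcd(*x, *y)),
            _ => None, // any NULL input makes the output row NULL
        })
        .collect()
}
```

Calling `eval_gcd_model(&[Some(15), None], &[Some(25), Some(1)])` reproduces the two rows of the table above.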
Fallible Functions
If your function returns a Result:
use arrow_udf::function;
The output batch will contain an additional error column. For rows that fail, the output column is NULL and the error message is stored in the error column.
 input     output
+----+----+-----+------------------+
| a  | b  | div | error            |
+----+----+-----+------------------+
| 15 | 25 | 0   |                  |
| 5  | 0  |     | division by zero |
+----+----+-----+------------------+
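The following sketch shows a fallible function of the kind the table implies, together with a standard-library model of how its results could be split into an output column and an error column. The commented attribute and the helper `eval_div_model` are assumptions for illustration; the real generated function works on a RecordBatch.

```rust
// Assumed attribute, shown as a comment so the block compiles on its own:
// #[function("div(int, int) -> int", output = "eval_div")]
fn div(x: i32, y: i32) -> Result<i32, &'static str> {
    if y == 0 {
        return Err("division by zero");
    }
    Ok(x / y)
}

/// Hypothetical model of the generated batch function: returns the output
/// column and the error column side by side.
fn eval_div_model(a: &[i32], b: &[i32]) -> (Vec<Option<i32>>, Vec<Option<String>>) {
    let mut out = Vec::with_capacity(a.len());
    let mut err = Vec::with_capacity(a.len());
    for (x, y) in a.iter().zip(b.iter()) {
        match div(*x, *y) {
            // Success: value in the output column, NULL in the error column.
            Ok(v) => {
                out.push(Some(v));
                err.push(None);
            }
            // Failure: NULL in the output column, message in the error column.
            Err(e) => {
                out.push(None);
                err.push(Some(e.to_string()));
            }
        }
    }
    (out, err)
}
```

Keeping errors in a separate column lets one failing row be reported without discarding the results for the rest of the batch.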
Struct Types
You can define a struct type with the StructType trait:
use arrow_udf::types::StructType;
Then you can use the struct type in function signatures:
use arrow_udf::function;
Currently struct types are only supported as return types.
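A minimal sketch of a struct used as a return type might look like this. The `Point` struct, the `point` function, and both commented attributes are hypothetical; in particular, deriving `StructType` and the signature string are assumptions, shown as comments so the block compiles without the crate.

```rust
// Assumed derive provided by the crate: #[derive(StructType)]
#[derive(Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

// Assumed attribute: #[function("point(float8, float8) -> struct Point")]
// The struct appears only in return position, matching the restriction above.
fn point(x: f64, y: f64) -> Point {
    Point { x, y }
}
```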
Function Registry
If you want to look up functions by signature, you can enable the global_registry feature:
[dependencies]
arrow-udf = { version = "0.2", features = ["global_registry"] }
Each function will be registered in a global registry when it is defined.
Then you can look up functions from the REGISTRY:
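use arrow_schema::DataType::Int32;
use arrow_udf::sig::REGISTRY;
let sig = REGISTRY.get("gcd", &[Int32, Int32], &Int32).expect("gcd function");
let output = sig.function.as_scalar().unwrap()(&input).unwrap();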
See the example for more details.
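To illustrate the idea of looking up a function by signature, here is a conceptual model built on a standard-library `HashMap`. Every name in it (`Ty`, `Key`, `ScalarFn`, `build_registry`, `lookup`) is hypothetical, and it is not how the crate's REGISTRY is implemented; it only shows a lookup keyed by name, argument types, and return type.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the data types used in a lookup.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Ty {
    Int32,
}

/// Lookup key: function name, argument types, return type.
type Key = (String, Vec<Ty>, Ty);
/// Simplified scalar function over one row of two nullable ints.
type ScalarFn = fn(Option<i32>, Option<i32>) -> Option<i32>;

/// Row-level gcd that returns NULL (None) if either input is NULL.
fn gcd_row(a: Option<i32>, b: Option<i32>) -> Option<i32> {
    let (mut a, mut b) = (a?, b?);
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    Some(a)
}

/// Builds a registry containing a single entry for gcd(int32, int32) -> int32.
fn build_registry() -> HashMap<Key, ScalarFn> {
    let mut reg: HashMap<Key, ScalarFn> = HashMap::new();
    reg.insert(
        ("gcd".to_string(), vec![Ty::Int32, Ty::Int32], Ty::Int32),
        gcd_row,
    );
    reg
}

/// Finds a function whose name, argument types, and return type all match.
fn lookup(reg: &HashMap<Key, ScalarFn>, name: &str, args: &[Ty], ret: &Ty) -> Option<ScalarFn> {
    reg.get(&(name.to_string(), args.to_vec(), ret.clone())).copied()
}
```

Because the full signature is part of the key, a lookup for a name that was never registered, or for the right name with different types, returns `None`.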