Attribute Macro arrow_udf_macros::function
#[function]
Defining a function on Arrow arrays.
§Table of Contents
- SQL Function Signature
- Rust Function Signature
- Table Function
- Registration and Invocation
- Appendix: Type Matrix
The following example demonstrates a simple usage:
#[function("add(int, int) -> int")]
fn add(x: i32, y: i32) -> i32 {
    x + y
}
§SQL Function Signature
Each function must have a signature, specified in the function("...") part of the macro invocation. The signature follows this pattern:
name ( [arg_types],* [...] ) [ -> [setof] return_type ]
Where name is the function name.
arg_types is a comma-separated list of argument types. The allowed data types are those listed in the data type and Aliases columns of the appendix’s type matrix. Wildcards or auto can also be used, as explained below. If the function is variadic, the last argument can be written as three dots (...).
When setof appears before the return type, the function is a set-returning function (table function), meaning it can return multiple values instead of just one. For more details, see the section on table functions.
If no return type is specified, the function returns null.
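To make the pattern concrete, here is a minimal sketch of how a signature string breaks down into its parts. It is not the macro's actual parser: the Signature struct and parse_signature function are hypothetical names introduced for illustration, argument types are split naively on commas, and an omitted return type is simply recorded as null.

```rust
/// Hypothetical, simplified view of a parsed SQL function signature.
#[derive(Debug, PartialEq)]
struct Signature {
    name: String,
    args: Vec<String>,
    variadic: bool,
    set_returning: bool,
    return_type: String,
}

/// Illustrative parser for `name(arg, ...) [-> [setof] return_type]`.
/// Not the macro's real implementation; types are split naively on commas.
fn parse_signature(sig: &str) -> Option<Signature> {
    let open = sig.find('(')?;
    let close = sig.rfind(')')?;
    if close < open {
        return None;
    }
    let name = sig[..open].trim().to_string();
    if name.is_empty() {
        return None;
    }

    let mut args: Vec<String> = sig[open + 1..close]
        .split(',')
        .map(|a| a.trim().to_string())
        .filter(|a| !a.is_empty())
        .collect();
    // A trailing `...` marks the function as variadic.
    let variadic = args.last().map_or(false, |a| a.as_str() == "...");
    if variadic {
        args.pop();
    }

    // Everything after `->` is the optional return type.
    let rest = sig[close + 1..].trim();
    let (set_returning, return_type) = match rest.strip_prefix("->") {
        Some(ret) => {
            let ret = ret.trim();
            match ret.strip_prefix("setof ") {
                Some(inner) => (true, inner.trim().to_string()),
                None => (false, ret.to_string()),
            }
        }
        // Assumption: an omitted return type is recorded as `null`.
        None if rest.is_empty() => (false, "null".to_string()),
        None => return None,
    };
    if return_type.is_empty() {
        return None;
    }

    Some(Signature { name, args, variadic, set_returning, return_type })
}
```

Under this sketch, a signature with setof before the return type sets set_returning, and a trailing ... is removed from the argument list and recorded in the variadic flag.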
§Multiple Function Definitions
Multiple #[function] macros can be applied to a single generic Rust function to define multiple SQL functions of different types. For example:
#[function("add(int16, int16) -> int16")]
#[function("add(int32, int32) -> int32")]
#[function("add(int64, int64) -> int64")]
// The Output = T bound is needed so that x + y has type T.
fn add<T: std::ops::Add<Output = T>>(x: T, y: T) -> T {
    x + y
}
§Rust Function Signature
The #[function] macro can handle various types of Rust functions.
Each argument corresponds to the Rust type T in the type matrix.
The return value type can be any type that implements AsRef<T>.
§Nullable Arguments
The functions above are only called when all arguments are non-null. To handle null arguments inside the function, use the Option type for those arguments:
#[function("add(int, int) -> int")]
fn add(x: Option<i32>, y: i32) -> i32 {...}
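The difference between the two argument styles can be modelled in plain Rust. The sketch below is a conceptual model rather than the code the macro generates: columns are represented as slices of Option<i32> instead of Arrow arrays, eval_add and eval_add_nullable are hypothetical driver names, and it assumes that a row skipped because of a null argument produces a null result.

```rust
/// All-arguments-non-null version: the body never sees a null.
fn add(x: i32, y: i32) -> i32 {
    x + y
}

/// Version that opts in to seeing a null first argument (treated as 0 here).
fn add_nullable(x: Option<i32>, y: i32) -> i32 {
    x.unwrap_or(0) + y
}

/// Hypothetical row-wise driver for `add`: the function is skipped for any
/// row with a null argument. Assumption: such rows produce a null result.
fn eval_add(xs: &[Option<i32>], ys: &[Option<i32>]) -> Vec<Option<i32>> {
    xs.iter()
        .zip(ys)
        .map(|(&x, &y)| match (x, y) {
            (Some(x), Some(y)) => Some(add(x, y)),
            _ => None,
        })
        .collect()
}

/// Hypothetical driver for `add_nullable`: a null `x` is passed through as
/// `None`, while a null `y` still skips the call.
fn eval_add_nullable(xs: &[Option<i32>], ys: &[Option<i32>]) -> Vec<Option<i32>> {
    xs.iter()
        .zip(ys)
        .map(|(&x, &y)| y.map(|y| add_nullable(x, y)))
        .collect()
}
```

In this model, only the argument declared as Option ever reaches the function body as a null; the other argument keeps the skip-on-null behaviour.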
§Return Value
Similarly, the return value type can be one of the following:
- T: Indicates that a non-null value is always returned, and errors will not occur.
- Option<T>: Indicates that a null value may be returned, but errors will not occur.
- Result<T>: Indicates that an error may occur, but a null value will not be returned.
- Result<Option<T>>: Indicates that a null value may be returned, and an error may also occur.
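The four return shapes can be compared side by side in plain Rust. This is a sketch without the macro attribute: the function names are hypothetical, and Result<_, String> stands in for whichever Result alias and error type a real function would use, which the text above does not specify.

```rust
/// `T`: always a value, never an error.
fn abs_diff(x: i32, y: i32) -> i64 {
    (x as i64 - y as i64).abs()
}

/// `Option<T>`: may be null, never an error.
fn safe_div(x: i32, y: i32) -> Option<i32> {
    x.checked_div(y)
}

/// `Result<T>`: may fail, never null.
/// Assumption: `String` is a stand-in error type.
fn strict_div(x: i32, y: i32) -> Result<i32, String> {
    x.checked_div(y)
        .ok_or_else(|| format!("cannot divide {} by {}", x, y))
}

/// `Result<Option<T>>`: null for a zero divisor, error on overflow.
fn div_or_null(x: i32, y: i32) -> Result<Option<i32>, String> {
    if y == 0 {
        return Ok(None);
    }
    x.checked_div(y)
        .map(Some)
        .ok_or_else(|| "division overflow".to_string())
}
```

Choosing the narrowest shape that fits keeps the null and error cases explicit in the function's type.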
§Optimization
When all input and output types of the function are primitive types (int16, int32, int64, float32, float64) and do not contain any Option or Result, the #[function] macro will automatically generate SIMD vectorized execution code.
Therefore, try to avoid returning Option and Result whenever possible.
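As a rough intuition for this advice, compare the two hand-written kernels below. They are an illustrative sketch, not the code the macro emits: add_columns and checked_add_columns are hypothetical names, and columns are plain slices rather than Arrow arrays. The first is a branch-free element-wise loop, the shape that compilers can usually auto-vectorize; the second has to inspect and wrap every row.

```rust
/// Infallible, non-null kernel: a plain element-wise loop with no
/// per-row branching.
fn add_columns(xs: &[i32], ys: &[i32]) -> Vec<i32> {
    xs.iter().zip(ys).map(|(x, y)| x.wrapping_add(*y)).collect()
}

/// Nullable kernel: each row yields an Option, so every element
/// carries an extra check and a wider result.
fn checked_add_columns(xs: &[i32], ys: &[i32]) -> Vec<Option<i32>> {
    xs.iter().zip(ys).map(|(x, y)| x.checked_add(*y)).collect()
}
```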
§Functions Returning Strings
For functions that return string types, you can also use the writer style function signature to avoid memory copying and dynamic memory allocation:
#[function("trim(string) -> string")]
// Write here is std::fmt::Write, which provides write_str.
fn trim(s: &str, writer: &mut impl Write) {
    writer.write_str(s.trim()).unwrap();
}
If errors may be returned, then the return value should be Result<()>:
#[function("trim(string) -> string")]
fn trim(s: &str, writer: &mut impl Write) -> Result<()> {
    writer.write_str(s.trim()).unwrap();
    Ok(())
}
If null values may be returned, then the return value should be Option<()>:
#[function("trim(string) -> string")]
fn trim(s: &str, writer: &mut impl Write) -> Option<()> {
    if s.is_empty() {
        None
    } else {
        writer.write_str(s.trim()).unwrap();
        Some(())
    }
}
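To see why the writer style can avoid a per-row allocation, the sketch below calls a writer-style trim directly, without the macro. It relies on String implementing std::fmt::Write; the trim_all helper and its shared-buffer layout are hypothetical and only assume that the generated code hands the function a writer backed by a reusable output buffer.

```rust
use std::fmt::Write;

/// Same body as the writer-style `trim`, without the macro attribute.
fn trim(s: &str, writer: &mut impl Write) {
    writer.write_str(s.trim()).unwrap();
}

/// Hypothetical caller: appends every trimmed row to one shared String
/// and records the end offset of each row inside that buffer.
fn trim_all(rows: &[&str], buffer: &mut String) -> Vec<usize> {
    let mut ends = Vec::with_capacity(rows.len());
    for row in rows {
        trim(row, buffer);
        ends.push(buffer.len());
    }
    ends
}
```

Each call appends straight into the shared buffer, so no intermediate String is created per row.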
§Table Function
A table function is a special kind of function that can return multiple values instead of just one. Its function signature must include the setof keyword, and the Rust function should return an iterator of the form impl Iterator<Item = T> or one of the Option and Result variants listed below.
For example:
#[function("generate_series(int32, int32) -> setof int32")]
fn generate_series(start: i32, stop: i32) -> impl Iterator<Item = i32> {
    start..=stop
}
Likewise, the returned iterator can include Option or Result, either internally or externally. For instance:
- impl Iterator<Item = Result<T>>
- Result<impl Iterator<Item = T>>
- Result<impl Iterator<Item = Result<Option<T>>>>
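The placement of Result decides when a failure can surface. The following plain-Rust sketch, written without the macro attribute, contrasts the two: series_with_step and parse_all are hypothetical functions, and Result<_, String> is a stand-in for whatever Result alias a real table function would use.

```rust
/// `Result` outside the iterator: argument validation fails before any
/// row is produced.
fn series_with_step(
    start: i32,
    stop: i32,
    step: i32,
) -> Result<impl Iterator<Item = i32>, String> {
    if step <= 0 {
        return Err(format!("step must be positive, got {}", step));
    }
    Ok((start..=stop).step_by(step as usize))
}

/// `Result` inside the iterator: each row can fail on its own while the
/// remaining rows are still produced.
fn parse_all<'a>(input: &'a str) -> impl Iterator<Item = Result<i32, String>> + 'a {
    input
        .split(',')
        .map(|part| part.trim().parse::<i32>().map_err(|e| e.to_string()))
}
```

An outer Result suits up-front validation of the arguments, while an inner Result suits inputs where only some rows may be bad.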
§Registration and Invocation
Every function defined by #[function] is automatically registered in the global function registry.
You can look up the function by name and types:
use arrow_udf::sig::REGISTRY;
use arrow_schema::DataType::Int32;
let sig = REGISTRY.get("add", &[Int32, Int32], &Int32).unwrap();
§Appendix: Type Matrix
§Base Types
Arrow data type | Aliases | Rust type as argument | Rust type as return value |
---|---|---|---|
boolean | bool | bool | bool |
int8 | | i8 | i8 |
int16 | smallint | i16 | i16 |
int32 | int | i32 | i32 |
int64 | bigint | i64 | i64 |
float32 | real | f32 | f32 |
float64 | double precision | f64 | f64 |
date32 | date | chrono::NaiveDate | chrono::NaiveDate |
time64 | time | chrono::NaiveTime | chrono::NaiveTime |
timestamp | | chrono::NaiveDateTime | chrono::NaiveDateTime |
timestamptz | | not supported yet | not supported yet |
interval | | arrow_udf::types::Interval | arrow_udf::types::Interval |
string | varchar | &str | impl AsRef<str>, e.g. String, Box<str>, &str |
binary | bytea | &[u8] | impl AsRef<[u8]>, e.g. Vec<u8>, Box<[u8]>, &[u8] |
§Extension Types
We also support the following extension types that are not part of the Arrow data types:
Data type | Metadata | Rust type as argument | Rust type as return value |
---|---|---|---|
decimal | arrowudf.decimal | rust_decimal::Decimal | rust_decimal::Decimal |
json | arrowudf.json | serde_json::Value | serde_json::Value |
§Array Types
SQL type | Rust type as argument | Rust type as return value |
---|---|---|
int8[] | &[i8] | impl Iterator<Item = i8> |
int16[] | &[i16] | impl Iterator<Item = i16> |
int32[] | &[i32] | impl Iterator<Item = i32> |
int64[] | &[i64] | impl Iterator<Item = i64> |
float32[] | &[f32] | impl Iterator<Item = f32> |
float64[] | &[f64] | impl Iterator<Item = f64> |
string[] | &StringArray | impl Iterator<Item = &str> |
binary[] | &BinaryArray | impl Iterator<Item = &[u8]> |
largestring[] | &LargeStringArray | impl Iterator<Item = &str> |
largebinary[] | &LargeBinaryArray | impl Iterator<Item = &[u8]> |
others[] | not supported yet | not supported yet |
§Composite Types
SQL type | Rust type as argument | Rust type as return value |
---|---|---|
struct<..> | UserDefinedStruct | UserDefinedStruct |