Expand description
ndatafusion provides linear algebra and machine learning scalar and aggregate UDFs for
DataFusion.
Register the scalar and aggregate catalog with register_all, or register the full SQL
surface including table functions with register_all_session. Call the functions from SQL or
by constructing expressions with helpers from functions.
The current catalog supports Float32 and Float64 across dense vector, dense matrix, sparse
CSR, fixed-shape tensor, variable-shape tensor, grouped statistics/model fits, and selected
solver routines. The current complex-valued slice covers dense vector, dense matrix,
complex PCA, fixed-shape tensor, and variable-shape tensor operations over canonical
ndarrow.complex64 columns.
Use the make_* constructor family when SQL starts from ordinary List values. If a table
already stores canonical FixedSizeList or extension-backed Arrow values, call the numerical
UDFs directly. Selected constructor, aggregate, and control-parameter UDFs also support named
arguments in SQL. For numerical UDFs, prefer positional data arguments first and named trailing
control arguments after.
§Quick Start
use datafusion::prelude::SessionContext;
#[tokio::main]
async fn main() -> datafusion::common::Result<()> {
let mut ctx = SessionContext::new();
ndatafusion::register_all(&mut ctx)?;
let batches = ctx
.sql(
"SELECT
vector_dot(make_vector(left_values, 2), make_vector(right_values, 2)) AS dot,
matrix_determinant(make_matrix(matrix_values, 2, 2)) AS det
FROM (
SELECT
[3.0, 4.0] AS left_values,
[4.0, 0.0] AS right_values,
[9.0, 0.0, 0.0, 4.0] AS matrix_values
)",
)
.await?
.collect()
.await?;
assert_eq!(batches[0].num_rows(), 1);
Ok(())
}§Constructors
The constructor UDFs convert ordinary nested List values into the canonical Arrow contracts
used by the numerical catalog. They are not required when input columns already use those
canonical contracts:
make_vectormake_matrixmake_tensormake_variable_tensormake_csr_matrix_batch
§Included UDF Groups
The registered catalog includes:
- constructors for canonical numerical values
- dense vector operations, including the current complex-vector subset
- complex dense matrix products, statistics, iterative solvers, matrix functions, and the current complex eigen / Schur / polar subset
- dense matrix operations, decompositions, direct solvers, and Sylvester matrix equations
- sparse CSR operations
- fixed-shape and variable-shape tensor operations, including the current complex tensor subset
- differentiation, optimization, and matrix-equation helpers
- statistics, real and complex PCA, iterative solvers, and linear regression
- grouped aggregate fits for covariance, correlation, PCA, and linear regression
- sparse factorization, tensor decomposition, and the
unpack_structtable function viaregister_all_session
For the complete SQL function inventory and notes on result contracts, see CATALOG.md in the
repository root. For small copy-paste query examples, see EXERCISES.md.
§Features
Feature forwarding follows nabled directly:
blaslapack-provideropenblas-systemopenblas-staticnetlib-systemnetlib-staticmagma-systemaccelerator-rayonaccelerator-wgpu
Re-exports§
pub use register::register_all;pub use register::register_all_session;