Skip to main content

Crate ndatafusion

Crate ndatafusion 

Source
Expand description

ndatafusion provides linear algebra and machine learning scalar and aggregate UDFs for DataFusion.

Register the scalar and aggregate catalog with register_all, or register the full SQL surface including table functions with register_all_session. Call the functions from SQL or by constructing expressions with helpers from functions.

The current catalog supports Float32 and Float64 across dense vector, dense matrix, sparse CSR, fixed-shape tensor, variable-shape tensor, grouped statistics/model fits, and selected solver routines. The current complex-valued slice covers dense vector, dense matrix, complex PCA, fixed-shape tensor, and variable-shape tensor operations over canonical ndarrow.complex64 columns.

Use the make_* constructor family when SQL starts from ordinary List values. If a table already stores canonical FixedSizeList or extension-backed Arrow values, call the numerical UDFs directly. Selected constructor, aggregate, and control-parameter UDFs also support named arguments in SQL. For numerical UDFs, prefer positional data arguments first and named trailing control arguments after.

§Quick Start

use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() -> datafusion::common::Result<()> {
    let mut ctx = SessionContext::new();
    ndatafusion::register_all(&mut ctx)?;

    let batches = ctx
        .sql(
            "SELECT
                vector_dot(make_vector(left_values, 2), make_vector(right_values, 2)) AS dot,
                matrix_determinant(make_matrix(matrix_values, 2, 2)) AS det
             FROM (
                SELECT
                    [3.0, 4.0] AS left_values,
                    [4.0, 0.0] AS right_values,
                    [9.0, 0.0, 0.0, 4.0] AS matrix_values
             )",
        )
        .await?
        .collect()
        .await?;

    assert_eq!(batches[0].num_rows(), 1);
    Ok(())
}

§Constructors

The constructor UDFs convert ordinary nested List values into the canonical Arrow contracts used by the numerical catalog. They are not required when input columns already use those canonical contracts:

  • make_vector
  • make_matrix
  • make_tensor
  • make_variable_tensor
  • make_csr_matrix_batch

§Included UDF Groups

The registered catalog includes:

  • constructors for canonical numerical values
  • dense vector operations, including the current complex-vector subset
  • complex dense matrix products, statistics, iterative solvers, matrix functions, and the current complex eigen / Schur / polar subset
  • dense matrix operations, decompositions, direct solvers, and Sylvester matrix equations
  • sparse CSR operations
  • fixed-shape and variable-shape tensor operations, including the current complex tensor subset
  • differentiation, optimization, and matrix-equation helpers
  • statistics, real and complex PCA, iterative solvers, and linear regression
  • grouped aggregate fits for covariance, correlation, PCA, and linear regression
  • sparse factorization, tensor decomposition, and the unpack_struct table function via register_all_session

For the complete SQL function inventory and notes on result contracts, see CATALOG.md in the repository root. For small copy-paste query examples, see EXERCISES.md.

§Features

Feature forwarding follows nabled directly:

  • blas
  • lapack-provider
  • openblas-system
  • openblas-static
  • netlib-system
  • netlib-static
  • magma-system
  • accelerator-rayon
  • accelerator-wgpu

Re-exports§

pub use register::register_all;
pub use register::register_all_session;

Modules§

error
functions
Expression builders for the ndatafusion scalar and aggregate UDF catalog.
register
udafs
Public aggregate-UDF constructors and catalog assembly helpers.
udf
Internal UDF family modules grouped by SQL surface area.
udfs
Public scalar-UDF constructors and catalog assembly helpers.