ndatafusion 0.0.1-reserve.0

Extensions and support for linear algebra in DataFusion
docs.rs failed to build ndatafusion-0.0.1-reserve.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

🪐 ndatafusion

ndatafusion is a DataFusion extension providing nabled powered linear algebra and ML UDFs.

Install

[dependencies]
ndatafusion = { version = "0.0.1", features = ["openblas-system"] }

nabled/arrow is part of the base ndatafusion contract and is enabled unconditionally.

Feature forwarding follows nabled directly:

  • blas
  • lapack-provider
  • openblas-system
  • openblas-static
  • netlib-system
  • netlib-static
  • magma-system
  • accelerator-rayon
  • accelerator-wgpu

Quick Start

use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() -> datafusion::common::Result<()> {
    let mut ctx = SessionContext::new();
    ndatafusion::register_all(&mut ctx)?;

    let batches = ctx
        .sql(
            "SELECT
                vector_dot(make_vector(left_values, 2), make_vector(right_values, 2)) AS dot,
                matrix_determinant(make_matrix(matrix_values, 2, 2)) AS det
             FROM (
                SELECT
                    [3.0, 4.0] AS left_values,
                    [4.0, 0.0] AS right_values,
                    [9.0, 0.0, 0.0, 4.0] AS matrix_values
             )",
        )
        .await?
        .collect()
        .await?;

    assert_eq!(batches[0].num_rows(), 1);
    Ok(())
}

The make_* constructors convert ordinary nested List values into the canonical numerical contracts used by the linalg/ml UDFs. They are not required when input columns already use the expected Arrow contract, such as FixedSizeList<Float32|Float64>(D) for dense vectors or the extension-backed matrix, tensor, and sparse batch layouts emitted by ndarrow:

  1. make_vector
  2. make_matrix
  3. make_tensor
  4. make_variable_tensor
  5. make_csr_matrix_batch

Selected constructor and control-parameter UDFs also support named arguments in SQL. For numerical UDFs, prefer positional data arguments first and named trailing control arguments after. For example:

SELECT matrix_exp(
  make_matrix(values => matrix_values, rows => 2, cols => 2),
  max_terms => 32,
  tolerance => 1e-6
)

Status

For the user-facing SQL catalog, see CATALOG.md. For quick copy-paste queries, see EXERCISES.md. For runnable examples, see cargo run --example hello_sql, cargo run --example direct_arrow_vectors, and cargo run --example pca_pipeline.

ndatafusion currently registers:

  1. 147 scalar UDFs through register_all
  2. 4 aggregate UDFs through register_all
  3. 1 table function through register_all_session

The current surface includes:

  1. canonical SQL constructors for dense vector, dense matrix, fixed-shape tensor, variable-shape tensor, and CSR sparse-matrix batches
  2. dense vector row ops plus complex-vector Hermitian dot / norm / cosine-similarity / normalization helpers over canonical ndarrow.complex64 vectors, plus named-function differentiation helpers (jacobian, jacobian_central, gradient, hessian)
  3. complex dense matrix matvec / matmat, complex matrix statistics, and complex dense iterative solvers over canonical arrow.fixed_shape_tensor<ndarrow.complex64> matrix batches
  4. dense matrix matvec, batched matmul, lower/upper triangular solves, and LU/Cholesky/QR least-squares solves
  5. struct-valued LU, Cholesky, QR, reduced QR, pivoted QR, SVD, truncated SVD, tolerance-thresholded SVD, symmetric/generalized eigen, real-input complex nonsymmetric eigen plus bi-eigen, complex nonsymmetric eigen, Schur, polar, their current complex Schur/polar counterparts, and PCA workflows
  6. matrix inverse, determinant, log-determinant, QR condition number / reconstruction, SVD null-space / pseudo-inverse / condition number / rank / reconstruction, non-symmetric matrix balancing, Gram-Schmidt variants, zero-config matrix functions (matrix_exp_eigen, matrix_log_eigen, matrix_log_svd, matrix_sign), and configurable matrix functions (matrix_exp, matrix_log_taylor, matrix_power) plus the current complex matrix-function slice (matrix_exp_complex, matrix_exp_eigen_complex, matrix_log_eigen_complex, matrix_log_svd_complex, matrix_power_complex, matrix_sign_complex)
  7. sparse batch matvec, sparse-dense matmat, sparse transpose, and sparse-sparse matmat
  8. fixed-shape tensor last-axis reductions, normalization, batched products, row-wise permutation/contraction, variable-shape tensor last-axis workflows, and the first complex fixed-shape / variable-shape tensor norm / normalization surface
  9. matrix column means, real and complex PCA fit plus PCA transform/inverse-transform, dense iterative solvers, linear regression, and grouped aggregate fits for vector covariance/correlation/PCA and linear regression
  10. sparse direct solve via sparse_lu_solve
  11. Sylvester matrix-equation solvers for real and complex batches, including mixed-precision result contracts that report refinement iterations when magma-system is enabled
  12. complex optimization helpers, sparse factorization/preconditioner UDFs, and tensor decomposition / tensor-train / Tucker UDFs over canonical batch contracts
  13. retractable grouped aggregates that can also be used in ordered window frames, and the generic unpack_struct table function for turning struct-valued results into one-row tables

The current real-valued surface supports Float32 and Float64 across the implemented catalog. The current complex-valued surface covers canonical ndarrow.complex64 dense vector, dense matrix, complex PCA, fixed-shape tensor, and variable-shape tensor inputs. The crate also has end-to-end SQL integration coverage for:

  1. literal-backed constructor pipelines
  2. list-column-backed vector and matrix queries
  3. triangular solve plus matrix-function queries
  4. parameterized matrix-function queries
  5. decomposition-variant queries
  6. spectral / orthogonalization queries
  7. iterative solver queries
  8. sparse plus variable-shape tensor composition
  9. fixed-shape tensor constructor plus reduction pipelines
  10. fixed-shape tensor axis permutation / contraction queries
  11. direct canonical complex vector, matrix, and tensor queries without make_*
  12. direct sparse factorization, tensor decomposition, differentiation, optimization, grouped aggregate, and ordered window queries

If you want the table-function surface in SQL, register with ndatafusion::register_all_session instead of register_all. The current table-function catalog exposes:

  1. unpack_struct, which expands a scalar struct-valued expression into a one-row relation

License

Apache License, Version 2.0