Module substrait_expr::builder


§Builders to create expressions and schemas

This module contains utilities for programmatically creating expressions, schemas, and types.

§Overview

To create expressions, first create a schema, then create an expression builder, add the expressions you want, and finally build a message. The example below creates an ExtendedExpression message containing a single expression, location.x + 3.

use substrait_expr::builder::schema::SchemaBuildersExt;
use substrait_expr::helpers::schema::SchemaInfo;
use substrait_expr::helpers::types;
use substrait_expr::{
    builder::{BuilderParams, ExpressionsBuilder},
    functions::functions_arithmetic::FunctionsArithmeticExt,
    helpers::literals::literal,
};

let schema = SchemaInfo::new_full()
    .field("score", types::i32(false))
    .nested("location", false, |builder| {
        builder
            .field("x", types::fp32(false))
            .field("y", types::fp64(true))
    })
    .build();

let builder = ExpressionsBuilder::new(schema, BuilderParams::default());

builder
    .add_expression(
        "sum",
        builder
            .functions()
            .add(
                builder.fields().resolve_by_name("location.x").unwrap(),
                literal(3.0_f32),
            )
            .build()
            .unwrap(),
    )
    .unwrap();

let expressions = builder.build();

§Creating a Schema

Before you can create any expressions you will need a schema. There are four kinds of schema: unknown (empty), names only, types only, and full. Which one you create depends on how much you know about the input fields. For more information see the docs on schema resolution.
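As a rough guide to that choice, the sketch below maps what you know about the input fields to a schema kind. `SchemaKind` and `schema_kind_for` are hypothetical names invented for this illustration and are not part of substrait_expr; the crate's own type is `SchemaInfo`.

```rust
/// The four schema kinds named in the text.
/// Hypothetical enum for illustration only; not part of substrait_expr.
#[derive(Debug, PartialEq)]
enum SchemaKind {
    Unknown,
    NamesOnly,
    TypesOnly,
    Full,
}

/// Picks a schema kind from the information available about the input.
/// Hypothetical helper; the crate does not provide this function.
fn schema_kind_for(knows_names: bool, knows_types: bool) -> SchemaKind {
    match (knows_names, knows_types) {
        // Nothing is known about the fields.
        (false, false) => SchemaKind::Unknown,
        // Field names are known, types are not.
        (true, false) => SchemaKind::NamesOnly,
        // Field types are known, names are not.
        (false, true) => SchemaKind::TypesOnly,
        // Both names and types are known.
        (true, true) => SchemaKind::Full,
    }
}
```

For instance, a caller that knows column names but not their types would get `SchemaKind::NamesOnly` from `schema_kind_for(true, false)`.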

Creating an empty schema is simple.


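// Assumption: EmptySchema is exported from the same module as SchemaInfo.
use substrait_expr::helpers::schema::{EmptySchema, SchemaInfo};

let schema = SchemaInfo::Empty(EmptySchema::default());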

The rest of the schema types have builders.

use substrait_expr::builder::schema::SchemaBuildersExt;
use substrait_expr::helpers::schema::SchemaInfo;
use substrait_expr::helpers::types;

// Constructing a schema for
// {
//   "score": fp32?,
//   "location": {
//     "x": fp64,
//     "y": fp64
//   }
// }

// Names only
let schema = SchemaInfo::new_names()
    .field("score")
    .nested("location", |builder| builder.field("x").field("y"))
    .build();

// Types only
let schema = SchemaInfo::new_types()
    .field(types::fp32(true))
    .nested(false, |builder| {
        builder.field(types::fp64(false)).field(types::fp64(false))
    })
    .build();

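// Full schema (names and types)
let schema = SchemaInfo::new_full()
    .field("score", types::fp32(true))
    .nested("location", false, |builder| {
        builder
            .field("x", types::fp64(false))
            .field("y", types::fp64(false))
    })
    .build();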

If you need parameterized types or user-defined types, you can create those with the schema builder as well. This works because every schema builder has a type registry that is passed through to the created schema.

use substrait_expr::builder::schema::SchemaBuildersExt;
use substrait_expr::helpers::schema::SchemaInfo;

let builder = SchemaInfo::new_types();
let complex_number = builder
    .types()
    .user_defined("https://imaginary.com/types", "complex-number");
let schema = builder.field(complex_number.with_nullability(true)).build();

There are also utility macros for creating schemas. These are mainly used in unit tests since they require you to know the fields in the schema at compile time.

use substrait_expr::macros::names_schema;

// Names only
let schema = names_schema!({
    score: {},
    location: {
       x: {},
       y: {}
    }
});

// Types only
// TODO

// Full
// TODO

§Creating Expressions

Once you have a schema you can create an expression builder and start creating expressions. Note that expressions in Substrait cannot stand alone: they must be part of either a Plan or an ExtendedExpression. An ExtendedExpression is a collection of expressions plus schema, type, and function metadata, and it is what the expression builder creates.

The example in the Overview above covers the entire process.

§Referencing fields

To reference a field in the schema you can use crate::builder::ExpressionsBuilder::fields.

You can reference fields by name:


let reference = builder.fields().resolve_by_name("location.x").unwrap();

The syntax for referencing fields by name is simple. The . character selects a subfield. To select a list item, use [].


let list_item = builder.fields().resolve_by_name("genres[3]").unwrap();

If you have a map column whose key type is string, you can also reference its entries with [].


let map_item = builder.fields().resolve_by_name("metadata[size]").unwrap();
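To make the path syntax above concrete, here is a minimal standalone sketch of how a path such as location.x, genres[3], or metadata[size] could be split into steps. It is not the crate's parser: `PathSegment` and `parse_path` are hypothetical names, and the sketch guesses list versus map access from the bracket contents alone, whereas a real resolver would presumably consult the schema (a map key that looks like a number would be misread here).

```rust
/// One step in a field path.
/// Hypothetical type for illustration only; not part of substrait_expr.
#[derive(Debug, PartialEq)]
enum PathSegment {
    /// A leading name or `.name`: select a struct subfield.
    Field(String),
    /// `[3]`: select a list item by position.
    ListIndex(usize),
    /// `[size]`: select a map entry by string key.
    MapKey(String),
}

/// Splits a path such as `location.x` or `genres[3]` into segments.
/// Returns `None` for malformed input (empty names, unclosed brackets,
/// or a path that does not start with a field name).
fn parse_path(path: &str) -> Option<Vec<PathSegment>> {
    let mut segments = Vec::new();
    let mut rest = path;
    while !rest.is_empty() {
        if let Some(after) = rest.strip_prefix('[') {
            // A bracket must follow an earlier segment.
            if segments.is_empty() {
                return None;
            }
            let end = after.find(']')?;
            let inner = &after[..end];
            if inner.is_empty() {
                return None;
            }
            // Numeric contents are treated as a list index, anything else as a map key.
            segments.push(match inner.parse::<usize>() {
                Ok(index) => PathSegment::ListIndex(index),
                Err(_) => PathSegment::MapKey(inner.to_string()),
            });
            rest = &after[end + 1..];
        } else {
            // Every name after the first segment must be introduced by `.`.
            let body = if segments.is_empty() {
                rest
            } else {
                rest.strip_prefix('.')?
            };
            let end = body
                .find(|c: char| c == '.' || c == '[')
                .unwrap_or(body.len());
            if end == 0 {
                return None;
            }
            segments.push(PathSegment::Field(body[..end].to_string()));
            rest = &body[end..];
        }
    }
    if segments.is_empty() {
        None
    } else {
        Some(segments)
    }
}
```

Under these assumptions, `parse_path("genres[3]")` yields a `Field` segment for genres followed by `ListIndex(3)`, and a malformed path such as a..b yields `None`.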
