
Crate ogc_cql2


OGC CQL2 parser and runtime interpreter.

The next paragraphs explain in more detail the elements of this project, as well as the rationale behind some of the decisions that shaped its components.

§Expressions

The kernel of this project is the OGC CQL2 Expression, represented by the Expression enumeration. Its two variants, TextEncoded and JsonEncoded, represent the text-based and JSON-based representations mandated by the standard.

Parsing user-provided input is done by invoking one of two methods, Expression::try_from_text() or Expression::try_from_json(), as shown in the following example:

use ogc_cql2::prelude::*;
use std::error::Error;

let expr = Expression::try_from_text(r#""name" NOT LIKE 'foo%' AND "value" > 10"#)?;
// ...
let expr = Expression::try_from_json(r#"
{
 "op": "t_finishes",
 "args": [
   { "interval": [ { "property": "starts_at" }, { "property": "ends_at" } ] },
   { "interval": [ "1991-10-07", "2010-02-10T05:29:20.073225Z" ] }
 ]
}"#)?;

An Ok result implies a syntactically correct parsed expression!

For convenience, a standalone tool is included that can be used from the command line to quickly test the validity of candidate expressions.

Once the library is built (cargo b), it can be invoked by calling:

cargo r --bin repl

Read more about it here

§Evaluators

An OGC CQL2 Expression on its own is close to useless unless it is evaluated against what the CQL2 standard refers to as Resources. A Resource here is essentially a map of property names (i.e. strings) to queryable values. More on that later.

This library represents those objects by the Evaluator trait. A simple example of an implementation of this trait is provided; see ExEvaluator.

In an earlier incarnation an Evaluator used to have a teardown() hook. Not anymore. Rust’s Drop trait sort of makes that method superfluous.

§Data sources

Data sources represent providers of data to be processed by Evaluators to filter (i.e. include or exclude) items based on the result of Expressions.

The DataSource (marker) trait represents those objects. Currently the library provides two implementations: CSVDataSource and GPkgDataSource. The first represents Comma Separated Values (CSV) tabular data, where each row is mapped to a Feature containing one geometry (spatial) property and other non-geometry attributes. The second represents GeoPackage files. A GeoPackage is

an open, standards-based, platform-independent, portable, self-describing, compact format for transferring geospatial information. It is a platform-independent SQLite database file

Coding concrete implementations of those data source traits is facilitated by the library providing two macros, gen_csv_ds! and gen_gpkg_ds!: the first for the CSV variety, the second for the GeoPackage one.

I intend to provide two additional implementations: one for ESRI Shapefiles and another for PostGIS enabled tables.

§Features and Resources

I frequently mention the term Feature in the documentation to refer to an abstract type that closely relates to its data source. For a CSV data source, it’s a structure that is serde deserializable. For example, the tests/samples/data folder provides a CSV file named ne_110m_rivers_lake_centerlines, one of the 3 data sets the standard refers to for compliance testing. The Feature for that data source looks like this:

use serde::Deserialize;
use std::marker::PhantomData;

#[derive(Debug, Default, Deserialize)]
pub(crate) struct ZRiver {
    /* 0 */ fid: i32,
    /* 1 */ geom: String,
    /* 2 */ name: String,
    #[serde(skip)] ignored: PhantomData<String>
}

This makes sense b/c the csv crate used for reading the CSV data works smoothly with deserializable structures. Worth noting here that the spatial data (the geom field) is expected to be encoded as WKT (Well Known Text).

When dealing w/ a GeoPackage version of the same data, we use this structure:

use sqlx::FromRow;

#[derive(Debug, FromRow)]
pub(crate) struct TRiver {
    fid: i32,
    geom: Vec<u8>,
    name: String,
}

As one can see, this best suits the sqlx crate used for reading GeoPackage data. In this type of Feature the same geom spatial attribute is now expected to be a byte array containing the WKB (Well Known Binary) encoded value of the vector geometry.
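The two geometry encodings are easy to confuse. The following std-only sketch (point_to_wkb is a hypothetical helper, not a crate API) contrasts them for a 2D point, following the WKB layout: a byte-order flag, a u32 geometry type, then the coordinates as IEEE-754 doubles.

```rust
// Hypothetical helper, NOT a crate API: encode a 2D point as WKB.
fn point_to_wkb(x: f64, y: f64) -> Vec<u8> {
    let mut buf = Vec::with_capacity(21);
    buf.push(0x01);                              // byte-order flag: 0x01 = little-endian
    buf.extend_from_slice(&1u32.to_le_bytes());  // geometry type 1 = Point
    buf.extend_from_slice(&x.to_le_bytes());     // X as IEEE-754 double
    buf.extend_from_slice(&y.to_le_bytes());     // Y as IEEE-754 double
    buf
}

fn main() {
    let wkt = "POINT (1 2)";           // what a CSV `geom` field holds (text)
    let wkb = point_to_wkb(1.0, 2.0);  // what a GeoPackage `geom` field holds (bytes)
    println!("{wkt} <-> {} WKB bytes", wkb.len());
    assert_eq!(wkb.len(), 21);         // 1 + 4 + 8 + 8
    assert_eq!(wkb[0], 0x01);          // little-endian marker
}
```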

Finally on that note, a Feature implementation must provide a way of converting an instance of Self to a Resource. Here it is for the above rivers CSV version:

use ogc_cql2::prelude::*;
use std::error::Error;

impl TryFrom<ZRiver> for Resource {
    type Error = MyError;

    fn try_from(value: ZRiver) -> Result<Self, Self::Error> {
        Ok(HashMap::from([
            ("fid".into(), Q::try_from(value.fid)?),
            ("geom".into(), Q::try_from_wkt(&value.geom)?),
            ("name".into(), Q::new_plain_str(&value.name)),
        ]))
    }
}

A Resource on the other hand, as mentioned earlier, is generic in the sense that it’s a simple map of property names to values, in a similar vein to how JSON objects are handled. And in the same vein as how serde models JSON values, the types of values a resource’s queryables (properties or attributes) can take are embodied by the Queryable enumeration.
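As a rough mental model, a resource boils down to a map keyed by property name. The sketch below uses deliberately simplified stand-in types (MiniQ and MiniResource are hypothetical, not the crate’s actual Q and Resource):

```rust
use std::collections::HashMap;

// Simplified stand-in for the crate's queryable values; the real enum
// has more variants (temporal, geometry, ...).
#[derive(Debug, PartialEq)]
enum MiniQ {
    Num(f64),
    Str(String),
    Bool(bool),
}

// A resource is conceptually just a map from property names to values,
// much like a JSON object.
type MiniResource = HashMap<String, MiniQ>;

fn main() {
    let mut r: MiniResource = HashMap::new();
    r.insert("fid".into(), MiniQ::Num(42.0));
    r.insert("name".into(), MiniQ::Str("Nile".into()));
    r.insert("navigable".into(), MiniQ::Bool(true));

    // A filter like `"fid" > 10` would look up the queryable by name...
    match r.get("fid") {
        Some(MiniQ::Num(n)) => assert!(*n > 10.0),
        _ => panic!("missing or mistyped queryable"),
    }
}
```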

Note though that this resource genericity is expensive in terms of performance; see §Relative performance below.

§Iterable and Streamable

Access to the contents of a DataSource is possible by implementing one or both of the two traits: IterableDS and StreamableDS.

The first exposes a method (iter()) that returns an Iterator over the Features of the data source.

Considering that the CSVDataSource related macro gen_csv_ds! does exactly that, one can easily write something like this…

use ogc_cql2::prelude::*;
use std::error::Error;

// somewhere the macro is invoked to generate module-private artifacts...
gen_csv_ds!(pub(crate), "River", "...ne_110m_rivers_lake_centerlines.csv", ZRiver);

// now we collect all the "rivers" in the collection...
let csv = RiverCSV::new();
let it: Result<Vec<ZRiver>, MyError> = csv.iter()?.collect();
// ...

The StreamableDS trait is more versatile. It exposes methods to asynchronously stream the contents as Features (fetch() and fetch_where()) and as Resources (stream() and stream_where()). The methods with the _where suffix expect an Expression argument that is delegated to the data source itself, to use for filtering the contents in the best way it can; e.g. a SQL WHERE clause for a GeoPackage file or a PostGIS DB table.

Similar to the CSV data source, the gen_gpkg_ds! macro does the heavy lifting, generating the necessary artifacts for a GeoPackage data source.

§Relative performance

With the introduction of the DataSource, IterableDS and StreamableDS traits and the provided CSV and GeoPackage implementations, a user can process the data in 3 ways:

  • as Features using the IterableDS trait –from a CSV table;
  • as either Features or Resources using the StreamableDS trait through the fetch() or stream() hooks –from a GeoPackage database file;
  • as Features or Resources using the StreamableDS trait through the fetch_where() or stream_where() hooks –from a GeoPackage DB.

The last approach is by far the most efficient since it delegates to a DB engine the job of filtering the records, while the 2nd one is the worst b/c it involves converting every Feature to a Resource even when we may not need all the queryables from that newly created Resource.

As an example of the relative performance of those approaches, consider the timing of test_points, test_points_gpkg and test_points_sql in a9::test_37, which correspond to those 3 strategies respectively when processing a data set of 243 records. On my development laptop, w/ the [unoptimized + debuginfo] profile, I get…

+---+--------------------+-------+
| # | test               | time  |
+---+--------------------+-------+
| 1 | test_points()      | 0.10s |
| 2 | test_points_gpkg() | 5.91s |
| 3 | test_points_sql()  | 0.08s |
+---+--------------------+-------+

§Third-party crates

This project, in addition to the external software mentioned in the README, relies on a few 3rd-party crates. In addition to the csv and sqlx crates already mentioned, here are the most important ones…

  1. PEG

    • peg: Provides a Rust macro that builds a recursive descent parser from a concise definition of a grammar.
  2. JSON Deserialization:

  3. Date + Time:

    • jiff: for time-zone-aware date and timestamp handling.
  4. Case + Accent Insensitive Strings:

  5. CRS Transformation:

    • proj: for coordinate transformation via bindings to the PROJ API.

§Functions

The CQL2 Standard makes provisions for externally defined Functions in addition to a few specific ones defined in the specs.

This project implements all the required standard functions. It also offers:

  • Support for a few functions, called builtins, that can be used in Filter Expressions.
  • A mechanism for externally defined functions, implemented as Rust Closures, that can be registered in a Context which is then passed to an Evaluator implementation (such as ExEvaluator) for processing an Expression against one or more Resources.

Examples of both types, and the plumbing to wire them, abound in the tests folder. Here is a simple working example:

use ogc_cql2::prelude::*;

// define a function that adds 2 numbers together...
let sum = |x: f64, y: f64| x + y;

// create a Context and register that function and its metadata...
let mut ctx = Context::new();
ctx.register(
    "sum",
    vec![ExtDataType::Num, ExtDataType::Num],
    ExtDataType::Num,
    move |args| {
        let a1 = args.first()?.downcast_ref::<f64>()?;
        let a2 = args.get(1)?.downcast_ref::<f64>()?;
        Some(Box::new(sum(*a1, *a2)))
    },
 );

// freeze the Context (make it read-only) so we can share it safely... 
let shared_ctx = ctx.freeze();

// parse an Expression from a text string...
let expression = Expression::try_from_text("3 = sum(1, 2)")?;

// instantiate an Evaluator instance and feed it the Context...
let mut evaluator = ExEvaluator::new(shared_ctx);

// now set up that Evaluator for evaluating Resources...
evaluator.setup(expression)?;

// since our Expression does not need any queryable Resource property,
// use an empty one...
let feature = Resource::new();

// evaluate the Expression...
let res = evaluator.evaluate(&feature)?;

// assert the outcome is TRUE...
assert!(matches!(res, Outcome::T));
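The downcast_ref calls in the registered closure rely on Rust’s standard std::any::Any machinery: arguments arrive type-erased in boxes and the closure recovers the concrete types at runtime. A std-only sketch of that same mechanism, outside the library:

```rust
use std::any::Any;

fn main() {
    // Arguments arrive type-erased, one box per argument...
    let args: Vec<Box<dyn Any>> = vec![Box::new(1.0_f64), Box::new(2.0_f64)];

    // ...and downcast_ref recovers the concrete types, yielding None
    // when the runtime type does not match.
    let a1 = args.first().and_then(|a| a.downcast_ref::<f64>());
    let a2 = args.get(1).and_then(|a| a.downcast_ref::<f64>());
    let result: Option<Box<dyn Any>> = match (a1, a2) {
        (Some(x), Some(y)) => Some(Box::new(x + y)),
        _ => None,
    };

    // The caller unboxes the result the same way.
    let sum = result.and_then(|r| r.downcast_ref::<f64>().copied());
    assert_eq!(sum, Some(3.0));
}
```

Returning Option from the closure lets a type mismatch surface as a clean failure rather than a panic.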

§Data types

This library supports a subset of data types available in a Rust environment for use with function arguments and results. The ExtDataType variants embody those types. Each variant maps to a specific yet opaque Rust type…

+---------------------+--------+----------------------------------+
| ExtDataType variant | Symbol | inner type                       |
+---------------------+--------+----------------------------------+
| Num                 | N      | f64                              |
| Str                 | S      | QString (only the plain variant) |
| Bool                | B      | bool                             |
| Timestamp           | Z      | jiff::Zoned                      |
| Date                | Z      | jiff::Zoned                      |
| Geom                | G      | G                                |
+---------------------+--------+----------------------------------+

§Numeric (Num) builtins

+-------+-------------+--------+------------------------------------------------------+
| Name  | Argument(s) | Result | Description                                          |
+-------+-------------+--------+------------------------------------------------------+
| abs   | x: N        | N      | Compute absolute value of x.                         |
| acos  | x: N        | N      | Compute arccosine of x. Result is in radians.        |
| asin  | x: N        | N      | Compute arcsine of x. Result is in radians.          |
| atan  | x: N        | N      | Compute arctangent of x. Result is in radians.       |
| cbrt  | x: N        | N      | Compute cube root of x.                              |
| ceil  | x: N        | N      | Compute smallest integer greater than or equal to x. |
| cos   | x: N        | N      | Compute cosine of x (in radians).                    |
| floor | x: N        | N      | Compute largest integer less than or equal to x.     |
| ln    | x: N        | N      | Compute natural logarithm of x.                      |
| sin   | x: N        | N      | Compute sine of x (in radians).                      |
| sqrt  | x: N        | N      | Compute square root of x.                            |
| tan   | x: N        | N      | Compute tangent of x (in radians).                   |
| max   | x: N, y: N  | N      | Compute maximum of x and y.                          |
| avg   | x: N, y: N  | N      | Compute midpoint (average) between x and y.          |
| min   | x: N, y: N  | N      | Compute minimum of x and y.                          |
+-------+-------------+--------+------------------------------------------------------+
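Assuming these builtins mirror the corresponding f64 methods from Rust’s standard library (an assumption about the implementation, not something the table states), their behavior can be spot-checked directly:

```rust
fn main() {
    // Unary builtins, assuming they mirror Rust's own f64 methods...
    assert_eq!((-3.5_f64).abs(), 3.5);
    assert!((27.0_f64.cbrt() - 3.0).abs() < 1e-12);
    assert_eq!(2.3_f64.ceil(), 3.0);
    assert_eq!(2.7_f64.floor(), 2.0);
    assert_eq!(9.0_f64.sqrt(), 3.0);

    // Binary builtins; avg is the midpoint between its two arguments.
    assert_eq!(4.0_f64.max(7.0), 7.0);
    assert_eq!(4.0_f64.min(7.0), 4.0);
    let avg = |x: f64, y: f64| x + (y - x) / 2.0;
    assert_eq!(avg(4.0, 7.0), 5.5);
    println!("all numeric spot-checks passed");
}
```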

§String (Str) builtins

+-------------+-------------+--------+-----------------------------------------------------+
| Name        | Argument(s) | Result | Description                                         |
+-------------+-------------+--------+-----------------------------------------------------+
| trim        | x: S        | S      | Remove leading and trailing whitespaces from x.     |
| len         | x: S        | N      | Compute length of x in bytes.                       |
| concat      | x: S, y: S  | S      | Append y to the end of x.                           |
| starts_with | x: S, y: S  | B      | Return TRUE if y is a prefix of x. FALSE otherwise. |
| ends_with   | x: S, y: S  | B      | Return TRUE if y is a suffix of x. FALSE otherwise. |
+-------------+-------------+--------+-----------------------------------------------------+
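Likewise, assuming the string builtins behave like their Rust std counterparts (again an assumption), note in particular that len counts bytes, not characters:

```rust
fn main() {
    // trim removes leading and trailing whitespace...
    assert_eq!("  café  ".trim(), "café");

    // len counts BYTES, as the table says, not characters:
    assert_eq!("café".len(), 5);           // 'é' takes 2 bytes in UTF-8
    assert_eq!("café".chars().count(), 4); // the character count differs

    // concat / starts_with / ends_with...
    let concat = |x: &str, y: &str| format!("{x}{y}");
    assert_eq!(concat("foo", "bar"), "foobar");
    assert!("foobar".starts_with("foo"));
    assert!("foobar".ends_with("bar"));
    println!("all string spot-checks passed");
}
```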

§Temporal builtins

+-------+-------------+--------+------------------------------------------------+
| Name  | Argument(s) | Result | Description                                    |
+-------+-------------+--------+------------------------------------------------+
| now   |             | Z      | Return the current timestamp in UTC time-zone. |
| today |             | Z      | Return today’s date in UTC time-zone.          |
+-------+-------------+--------+------------------------------------------------+

§Geometry (Geom) builtins

+-------------+-------------+--------+---------------------------------------------------------------------------------------------+
| Name        | Argument(s) | Result | Description                                                                                 |
+-------------+-------------+--------+---------------------------------------------------------------------------------------------+
| boundary    | x: G        | G      | Return the closure of combinatorial boundary of x.                                          |
| buffer      | x: G, y: N  | G      | Return a geometry representing all points whose distance from x is less than or equal to y. |
| envelope    | x: G        | G      | Return the minimum bounding box of x.                                                       |
| centroid    | x: G        | G      | Return the geometric centre of x.                                                           |
| convex_hull | x: G        | G      | Return minimum convex geometry that encloses all geometries within x.                       |
| get_x       | x: G        | N      | Return the X coordinate of x if it’s a Point.                                               |
| get_y       | x: G        | N      | Return the Y coordinate of x if it’s a Point.                                               |
| get_z       | x: G        | N      | Return the Z coordinate of x if it’s a Point and is 3D.                                     |
| wkt         | x: G, p: N  | S      | Return a WKT representation of x w/ p precision.                                            |
+-------------+-------------+--------+---------------------------------------------------------------------------------------------+

§Configuring this library

This library, so far, relies on 3 environment variables: DEFAULT_CRS, DEFAULT_PRECISION, and RUST_LOG.

The file .env.template contains those variables w/ their defaults. To adapt it to your environment, make a copy, rename it .env, and change the values as required.

§DEFAULT_CRS

This environment variable defines the implicit Coordinate Reference System (CRS) code to use when checking if coordinates fall w/in a geometry’s CRS validity extent (a.k.a Area Of Use). It defaults to EPSG:4326 if undefined.

The standard mentions this in…

Since WKT and GeoJSON do not provide a capability to specify the CRS of a geometry literal, the server has to determine the CRS of the geometry literals in a filter expression through another mechanism.

This value is fed to a Context when created using the new() constructor and will trickle down and be used when parsing Expressions containing geometry queryables and literals. For example…

    let shared_ctx = Context::new().freeze();

Because the Conformance Tests expect EPSG:4326 to be indeed the implicit CRS when using the test data included in the standard, this library allows overriding the global implicit CRS when constructing a Context, before freezing and handing it over to Evaluators. Here’s an example, as used in most of the Conformance Tests:

    let shared_ctx = Context::try_with_crs("EPSG:4326")?.freeze();

§DEFAULT_PRECISION

By Precision I mean the number of digits after the decimal point.

This environment variable controls 3 things: (a) the precision to keep when ingesting geometry coordinates, (b) the precision to use when rendering geometry WKT output using the to_wkt() generic method, and (c) the Precision to use when invoking certain spatial ST functions such as ST_Within and others.

The default value of 7 ensures that coordinates in WGS 84 (which is the default implicit CRS) are compared w/ an accuracy of 1.11 cm.
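A quick back-of-the-envelope check of that 1.11 cm figure, using the common approximation of roughly 111.32 km per degree of latitude on WGS 84:

```rust
fn main() {
    // One degree of latitude spans roughly 111.32 km on WGS 84
    // (an approximation; the exact length varies slightly with latitude).
    const METERS_PER_DEGREE: f64 = 111_320.0;

    // With DEFAULT_PRECISION = 7 the smallest representable step is 1e-7 degree.
    let step_deg = 1e-7_f64;
    let step_cm = step_deg * METERS_PER_DEGREE * 100.0;

    println!("precision 7 => steps of ~{step_cm:.2} cm");
    assert!((step_cm - 1.11).abs() < 0.01); // matches the ~1.11 cm claim
}
```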

For now only integers greater than or equal to 0 and less than or equal to 32 are allowed.

The GTrait, made public in version 0.2.0 and implemented for all geometry variants, allows for fine-tuning the WKT output by offering the following method…

    fn to_wkt_fmt(&self, precision: usize) -> String;
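Under the hood such a formatter presumably boils down to Rust’s runtime precision formatting. A hypothetical illustration (fmt_coord is an assumed helper for this sketch, not a crate API):

```rust
// Hypothetical helper, NOT the crate's implementation: render a coordinate
// with a precision chosen at runtime via the `{:.*}` format specifier.
fn fmt_coord(value: f64, precision: usize) -> String {
    format!("{:.*}", precision, value)
}

fn main() {
    let (x, y) = (12.3456789012_f64, -45.5_f64);
    let wkt = format!("POINT ({} {})", fmt_coord(x, 3), fmt_coord(y, 3));
    assert_eq!(wkt, "POINT (12.346 -45.500)");
    println!("{wkt}");
}
```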

§RUST_LOG

See https://docs.rs/env_logger/latest/env_logger/#enabling-logging for details.

Modules§

prelude
Group imports of many common traits and types by adding a glob import for use by clients of this library.

Macros§

gen_csv_ds
Macro to generate a concrete CSVDataSource.
gen_gpkg_ds
Macro to generate a concrete GPkgDataSource.
gen_pg_ds
Macro to generate a concrete PGDataSource.

Structs§

BBox
2D or 3D bounding box.
CRS
Representation of a Coordinate Reference System
CSVDataSource
DataSource of Features and Resources mapped from CSV rows/records.
Context
A Context object we will be handing to evaluators so they are aware of externally registered Functions.
ExEvaluator
A concrete evaluator that does the work w/o relying on any external source or capability that may be available in high-level data sources such as a database engine endowed w/ spatial and other operators.
FnInfo
A struct that holds metadata about a Function.
GPkgDataSource
GeoPackage DataSource binding a .gpkg database file + a layer name that maps rows to Features and Resources.
Geometries
Collection of mixed geometries.
JsonEncoded
JSON-encoded CQL2 Expression.
Line
2D or 3D line-string geometry.
Lines
Collection of line-string geometries.
PGDataSource
DataSource binding a PostGIS enabled database + a table name that maps rows to Features and Resources.
PgDate
Our representation of a PostgreSQL DATE type to ease w/ sqlx and jiff.
PgTimestamp
Our representation of a PostgreSQL TIMESTAMP type to use w/ sqlx and jiff.
Point
2D or 3D point geometry.
Points
Collection of point geometries.
Polygon
2D or 3D polygon geometry.
Polygons
Collection of polygon geometries.
QString
String-based type used by Queryables to represent a plain string, plus a set of flags indicating how to use it in case- and/or accent-insensitive contexts.
SRID
Representation of a Spatial Reference IDentifier. For now the Authority is implied to be EPSG.
TextEncoded
Text-encoded CQL2 Expression.

Enums§

Bound
Possible variants of a CQL2 Instant and Interval limit.
DataType
Queryable type variants.
Expression
An instance of an OGC CQL2 filter.
ExtDataType
Externally visible data type variants for arguments and result types used and referenced by user-defined and registered functions invoked in filter expressions.
G
Geometry type variants handled by this library.
MyError
Variants of error raised from this library.
Outcome
Possible outcome values when evaluating an Expression against an individual Resource from a collection.
Q
A Resource queryable property possible concrete value variants.

Constants§

EPSG_4326
The constant representing the ubiquitous EPSG:4326 or WGS 84 SRID.

Traits§

DataSource
Trait for a type that can act as a data source provider of Features and Resources, including a Geometry attribute, in the context of processing CQL2 filter expressions.
Evaluator
Capability of processing OGC CQL2 expressions, both text- and json-encoded.
GTrait
Geometry Trait implemented by all geometry types in this library.
IterableDS
Capability of a DataSource to provide an iterator over a collection of Features or Resources.
StreamableDS
Capability of a DataSource to asynchronously stream Features or Resources.

Type Aliases§

Resource
A dictionary of queryable property names (strings) to Queryable values.
SharedContext
What we share between Evaluators.