OGC CQL2 parser and runtime interpreter.
The following paragraphs explain the elements of this project in more detail, as well as the rationale behind some of the decisions that shaped its components.
§Expressions
The kernel of this project is the OGC CQL2 Expression, represented by the
Expression enumeration. Its two variants, TextEncoded and JsonEncoded,
represent the mandated text-based and JSON-based encodings respectively.
Parsing user-provided input is done by invoking one of the following two
methods: Expression::try_from_text() and Expression::try_from_json()
as shown in the following example:
use ogc_cql2::prelude::*;
use std::error::Error;
let expr = Expression::try_from_text(r#""name" NOT LIKE 'foo%' AND "value" > 10"#)?;
// ...
let expr = Expression::try_from_json(r#"
{
"op": "t_finishes",
"args": [
{ "interval": [ { "property": "starts_at" }, { "property": "ends_at" } ] },
{ "interval": [ "1991-10-07", "2010-02-10T05:29:20.073225Z" ] }
]
}"#)?;
An Ok result implies a syntactically correct parsed expression!
For convenience, a standalone tool is included that can be used from the command line to quickly test the validity of candidate expressions.
Once the library is built (cargo b), it can be invoked by calling:
cargo r --bin repl
Read more about it here.
§Evaluators
An OGC CQL2 Expression on its own is close to useless unless it is evaluated
against what the CQL2 standard refers to as Resources. A Resource
here is essentially a map of property names (i.e. strings) to queryable
values. More on that later.
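The idea can be sketched with plain std types. Note that Value below is a hypothetical stand-in for illustration only; the crate's actual Queryable/Q enum has more variants (geometries, timestamps, etc.):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the library's Queryable/Q enum; the real type
// has more variants (geometries, timestamps, intervals, ...).
#[derive(Debug, PartialEq)]
enum Value {
    Num(f64),
    Str(String),
    Bool(bool),
}

// A Resource is conceptually just a map of property names to values.
type Resource = HashMap<String, Value>;

fn main() {
    let mut r: Resource = HashMap::new();
    r.insert("name".into(), Value::Str("Nile".into()));
    r.insert("length_km".into(), Value::Num(6650.0));
    r.insert("navigable".into(), Value::Bool(true));

    // An evaluator looks properties up by name when applying a filter.
    assert_eq!(r.get("length_km"), Some(&Value::Num(6650.0)));
    println!("{} properties", r.len()); // 3 properties
}
```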
This library represents those objects by the Evaluator trait. A simple
example of an implementation of this trait is provided; see ExEvaluator.
In an earlier incarnation an Evaluator used to have a teardown() hook.
Not anymore: Rust's Drop trait makes that method superfluous.
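A minimal sketch of why Drop subsumes an explicit teardown() hook. ToyEvaluator is hypothetical, not a type from this crate:

```rust
use std::cell::Cell;
use std::rc::Rc;

// A toy evaluator whose cleanup runs automatically when it goes out of
// scope, which is why an explicit teardown() hook is unnecessary.
struct ToyEvaluator {
    released: Rc<Cell<bool>>,
}

impl Drop for ToyEvaluator {
    fn drop(&mut self) {
        // Whatever teardown() used to do (release caches, close handles...)
        // now lives here and runs deterministically at end of scope.
        self.released.set(true);
    }
}

fn main() {
    let released = Rc::new(Cell::new(false));
    {
        let _ev = ToyEvaluator { released: Rc::clone(&released) };
        assert!(!released.get()); // still alive inside the scope
    } // _ev dropped here; Drop::drop runs
    assert!(released.get());
}
```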
§Data sources
Data sources represent providers of data to be processed by Evaluators to
filter (i.e. include or exclude) items based on the result of Expressions.
The DataSource (marker) trait represents those objects. Currently the
library provides two implementations: CSVDataSource and GPkgDataSource.
The first represents Comma Separated Values (CSV) sourced from tabular data
where each row is mapped to a Feature containing one geometry (spatial)
property and other non-geometry attributes. The second represents GeoPackage
files. A GeoPackage is
… an open, standards-based, platform-independent, portable, self-describing, compact format for transferring geospatial information. It is a platform-independent SQLite database file…
Coding concrete implementations of those data source traits is facilitated by two macros provided by the library: gen_csv_ds! and gen_gpkg_ds!, the first for the CSV variety and the second for the GeoPackage one.
I intend to provide an additional implementation for ESRI Shapefiles; an implementation for PostGIS-enabled tables is already available as PGDataSource.
§Features and Resources
I frequently mention the term Feature in the documentation to refer to
an abstract type that closely relates to its data source. For a CSV data
source, it’s a structure that is serde deserializable. For example, in the
tests/samples/data folder, a CSV file named ne_110m_rivers_lake_centerlines
representing one of the 3 data sets referred to in the standard for testing
compliance is provided. The Feature for that data source looks like this:
use serde::Deserialize;
use std::marker::PhantomData;
#[derive(Debug, Default, Deserialize)]
pub(crate) struct ZRiver {
/* 0 */ fid: i32,
/* 1 */ geom: String,
/* 2 */ name: String,
#[serde(skip)] ignored: PhantomData<String>
}
This makes sense because the csv crate used for
reading the CSV data works smoothly with deserializable structures.
Worth noting here that the spatial data (the geom field) is expected to
be encoded as WKT (Well Known Text).
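To make the expected encoding concrete, here is a toy parser for the simplest WKT form only ("POINT (x y)"). The crate's Q::try_from_wkt handles the full WKT grammar; this sketch just shows what the text in a CSV geom column looks like:

```rust
// Toy parser for the simplest WKT form only ("POINT (x y)"); real WKT
// covers line-strings, polygons, collections, Z/M coordinates, etc.
fn parse_wkt_point(wkt: &str) -> Option<(f64, f64)> {
    let body = wkt
        .trim()
        .strip_prefix("POINT")?
        .trim()
        .strip_prefix('(')?
        .strip_suffix(')')?;
    let mut coords = body.split_whitespace();
    let x = coords.next()?.parse().ok()?;
    let y = coords.next()?.parse().ok()?;
    Some((x, y))
}

fn main() {
    assert_eq!(parse_wkt_point("POINT (30.0 10.0)"), Some((30.0, 10.0)));
    // Anything but a POINT is rejected by this toy version.
    assert_eq!(parse_wkt_point("LINESTRING (0 0, 1 1)"), None);
}
```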
The GeoPackage version of the same data uses this structure:
use sqlx::FromRow;
#[derive(Debug, FromRow)]
pub(crate) struct TRiver {
fid: i32,
geom: Vec<u8>,
name: String,
}
As one can see, this best suits the sqlx crate
used for reading GeoPackage data. In this type of Feature the same
geom spatial attribute is now expected to be a byte array containing the
WKB (Well Known Binary) encoded value of the vector geometry.
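The WKB byte layout can be illustrated with a minimal decoder for a 2D little-endian Point: one byte-order byte, a u32 geometry type (1 = Point), then two f64 coordinates. This is a sketch of the layout only, not the crate's decoder; note also that GeoPackage BLOBs prepend a GeoPackage-specific "GP" header before the WKB payload, which is ignored here:

```rust
// Minimal decode of a 2D little-endian WKB Point (layout illustration only;
// a real GeoPackage reader must also skip the leading "GP" binary header).
fn decode_wkb_point(buf: &[u8]) -> Option<(f64, f64)> {
    if buf.len() < 21 || buf[0] != 1 {
        return None; // only little-endian (byte-order marker 1) handled here
    }
    let geom_type = u32::from_le_bytes(buf[1..5].try_into().ok()?);
    if geom_type != 1 {
        return None; // 1 == Point in WKB
    }
    let x = f64::from_le_bytes(buf[5..13].try_into().ok()?);
    let y = f64::from_le_bytes(buf[13..21].try_into().ok()?);
    Some((x, y))
}

fn main() {
    // Build the 21-byte buffer for POINT (30 10), little-endian.
    let mut buf = vec![1u8];
    buf.extend_from_slice(&1u32.to_le_bytes());
    buf.extend_from_slice(&30.0f64.to_le_bytes());
    buf.extend_from_slice(&10.0f64.to_le_bytes());
    assert_eq!(decode_wkb_point(&buf), Some((30.0, 10.0)));
}
```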
Finally on that note, a Feature implementation must provide a way of
converting an instance of Self to a Resource. Here it is for the
above rivers CSV version:
use ogc_cql2::prelude::*;
use std::error::Error;
impl TryFrom<ZRiver> for Resource {
type Error = MyError;
fn try_from(value: ZRiver) -> Result<Self, Self::Error> {
Ok(HashMap::from([
("fid".into(), Q::try_from(value.fid)?),
("geom".into(), Q::try_from_wkt(&value.geom)?),
("name".into(), Q::new_plain_str(&value.name)),
]))
}
}
A Resource, on the other hand, as mentioned earlier, is generic in the
sense that it's a simple map of property names to values, in a similar vein
to how JSON objects are handled. In the same vein as how serde models
JSON values, the types of value a resource's queryables (properties, or
attributes) may take are embodied by the Queryable enumeration.
Note though that this resource genericity comes at a significant performance cost (see Relative performance below).
§Iterable and Streamable
Access to the contents of a DataSource is possible by implementing
one or both of the two traits: IterableDS and StreamableDS.
The first exposes a method (iter()) that returns an
Iterator over
the Features of the data source.
Considering that the CSVDataSource related macro gen_csv_ds! does
exactly that, one can easily write something like this…
use ogc_cql2::prelude::*;
use std::error::Error;
// somewhere the macro is invoked to generate module-private artifacts...
gen_csv_ds!(pub(crate), "River", "...ne_110m_rivers_lake_centerlines.csv", ZRiver);
// now we collect all the "rivers" in the collection...
let csv = RiverCSV::new();
let it: Result<Vec<ZRiver>, MyError> = csv.iter()?.collect();
// ...
The StreamableDS trait is more versatile. It exposes methods to stream
asynchronously the contents as Features (fetch()
and fetch_where()) and Resources
(stream() and stream_where()).
The methods with the _where suffix expect an Expression argument that
is delegated to the data source itself, which uses it to filter the
contents in the best way it can; e.g. as a SQL WHERE clause for a GeoPackage
file or a PostGIS table.
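To illustrate what delegating an expression to a SQL backend means, here is a toy comparison AST rendered as a WHERE clause. MiniExpr and to_sql are hypothetical; the crate's real translation covers the full CQL2 grammar:

```rust
// Toy expression AST and SQL rendering; illustration of the delegation
// idea only, not the crate's translator.
enum MiniExpr {
    Gt(String, f64),
    Like(String, String),
    And(Box<MiniExpr>, Box<MiniExpr>),
}

fn to_sql(e: &MiniExpr) -> String {
    match e {
        MiniExpr::Gt(col, v) => format!("\"{}\" > {}", col, v),
        MiniExpr::Like(col, pat) => format!("\"{}\" LIKE '{}'", col, pat),
        MiniExpr::And(a, b) => format!("({} AND {})", to_sql(a), to_sql(b)),
    }
}

fn main() {
    let e = MiniExpr::And(
        Box::new(MiniExpr::Gt("value".into(), 10.0)),
        Box::new(MiniExpr::Like("name".into(), "foo%".into())),
    );
    // A string like this is what the data source would append to the
    // SELECT it issues against the SQLite/PostGIS engine.
    assert_eq!(to_sql(&e), r#"("value" > 10 AND "name" LIKE 'foo%')"#);
}
```

Pushing the filter down this way lets the engine use its indexes and spatial operators instead of materializing every row.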
Similar to the CSV data source, the gen_gpkg_ds! macro does the heavy
lifting of generating the necessary artifacts for a GeoPackage data source.
§Relative performance
With the introduction of the DataSource, IterableDS and StreamableDS
traits and the provided CSV and GeoPackage implementations, a User can
effectively process the data in 3 ways:
- as Features using the IterableDS trait (from a CSV table);
- as either Features or Resources using the StreamableDS trait through the fetch() or stream() hooks (from a GeoPackage database file);
- as Features or Resources using the StreamableDS trait through the fetch_where() or stream_where() hooks (from a GeoPackage DB).
The last approach is by far the most effective since it delegates the job
of filtering the records to a DB engine, while the second one
is the worst because it involves converting every Feature to a Resource
even when we may not need all the queryables from that newly created
Resource.
As an example of relative performance of those approaches, consider the
timing of test_points, test_points_gpkg and test_points_sql in
a9::test_37 which correspond to those 3 strategies respectively when
processing a data set of 243 records. On my development laptop, with the
[unoptimized + debuginfo] profile, I get…
+---+--------------------+-------+
| # | test | time |
+---+--------------------+-------+
| 1 | test_points() | 0.10s |
| 2 | test_points_gpkg() | 5.91s |
| 3 | test_points_sql() | 0.08s |
+---+--------------------+-------+
§Third-party crates
This project, in addition to the external software mentioned in the README,
relies on a few third-party crates. Besides the csv and sqlx
crates already mentioned, here are the most important ones…
- PEG:
  - peg: provides a Rust macro that builds a recursive descent parser from a concise definition of a grammar.
- JSON Deserialization:
  - serde: for the basic capabilities.
  - serde_json: for the JSON format bindings.
  - serde_with: for custom helpers.
- Date + Time:
  - jiff: for time-zone-aware date and timestamp handling.
- Case + Accent Insensitive Strings:
  - unicase: for comparing strings when case is not important.
  - unicode-normalization: for un-accenting strings with Unicode decomposition.
- CRS Transformation:
§Functions
§Functions
The CQL2 Standard makes provisions for externally defined Functions in addition to a few specific ones defined in the specs.
This project implements all the required standard functions. It also offers:
- Support for a few functions called builtins that can be used in Filter Expressions.
- A mechanism for externally defined functions, implemented as Rust Closures, that can be registered in a Context which is then passed to an Evaluator implementation (such as ExEvaluator) for processing an Expression against one or more Resources.
Examples of both types, and the plumbing to wire them up, abound in the tests folder. Here is a simple working example:
use ogc_cql2::prelude::*;
// define a function that adds 2 numbers together...
let sum = |x: f64, y: f64| x + y;
// create a Context and register that function and its metadata...
let mut ctx = Context::new();
ctx.register(
"sum",
vec![ExtDataType::Num, ExtDataType::Num],
ExtDataType::Num,
move |args| {
let a1 = args.first()?.downcast_ref::<f64>()?;
let a2 = args.get(1)?.downcast_ref::<f64>()?;
Some(Box::new(sum(*a1, *a2)))
},
);
// freeze the Context (make it read-only) so we can share it safely...
let shared_ctx = ctx.freeze();
// parse an Expression from a text string...
let expression = Expression::try_from_text("3 = sum(1, 2)")?;
// instantiate an Evaluator instance and feed it the Context...
let mut evaluator = ExEvaluator::new(shared_ctx);
// now set up that Evaluator for evaluating Resources...
evaluator.setup(expression)?;
// since our Expression does not need any queryable Resource property,
// use an empty one...
let feature = Resource::new();
// evaluate the Expression...
let res = evaluator.evaluate(&feature)?;
// assert the outcome is TRUE...
assert!(matches!(res, Outcome::T));
§Data types
This library supports a subset of data types available in a Rust environment for use with function arguments and results. The ExtDataType variants embody those types. Each variant maps to a specific yet opaque Rust type…
| ExtDataType variant | Symbol | Inner type |
|---|---|---|
| Num | N | f64 |
| Str | S | QString (only the plain variant) |
| Bool | B | bool |
| Timestamp | Z | jiff::Zoned |
| Date | Z | jiff::Zoned |
| Geom | G | G |
§Numeric (Num) builtins
| Name | Argument(s) | Result | Description |
|---|---|---|---|
| abs | x: N | N | Compute absolute value of x. |
| acos | x: N | N | Compute arccosine of x. Result is in radians. |
| asin | x: N | N | Compute arcsine of x. Result is in radians. |
| atan | x: N | N | Compute arctangent of x. Result is in radians. |
| cbrt | x: N | N | Compute cube root of x. |
| ceil | x: N | N | Compute smallest integer greater than or equal to x. |
| cos | x: N | N | Compute cosine of x (in radians). |
| floor | x: N | N | Compute largest integer less than or equal to x. |
| ln | x: N | N | Compute natural logarithm of x. |
| sin | x: N | N | Compute sine of x (in radians). |
| sqrt | x: N | N | Compute square root of x. |
| tan | x: N | N | Compute tangent of x (in radians). |
| max | x: N, y: N | N | Compute maximum of x and y. |
| avg | x: N, y: N | N | Compute midpoint (average) of x and y. |
| min | x: N, y: N | N | Compute minimum of x and y. |
§String (Str) builtins
| Name | Argument(s) | Result | Description |
|---|---|---|---|
| trim | x: S | S | Remove leading and trailing whitespace from x. |
| len | x: S | N | Compute length of x in bytes. |
| concat | x: S, y: S | S | Append y to the end of x. |
| starts_with | x: S, y: S | B | Return TRUE if y is a prefix of x, FALSE otherwise. |
| ends_with | x: S, y: S | B | Return TRUE if y is a suffix of x, FALSE otherwise. |
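Since len is documented to count bytes, its result differs from the character count for non-ASCII input, exactly as Rust's own str::len does; worth keeping in mind when filtering accented data:

```rust
// str::len counts UTF-8 bytes, not characters; the len builtin is
// documented to behave the same way.
fn main() {
    let s = "café";
    assert_eq!(s.len(), 5);           // 'é' is 2 bytes in UTF-8
    assert_eq!(s.chars().count(), 4); // 4 Unicode scalar values
}
```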
§Temporal builtins
| Name | Argument(s) | Result | Description |
|---|---|---|---|
| now | | Z | Return the current timestamp in the UTC time-zone. |
| today | | Z | Return today's date in the UTC time-zone. |
§Geometry (Geom) builtins
| Name | Argument(s) | Result | Description |
|---|---|---|---|
| boundary | x: G | G | Return the closure of the combinatorial boundary of x. |
| buffer | x: G, y: N | G | Return a geometry representing all points whose distance from x is less than or equal to y. |
| envelope | x: G | G | Return the minimum bounding box of x. |
| centroid | x: G | G | Return the geometric centre of x. |
| convex_hull | x: G | G | Return the minimum convex geometry that encloses all geometries within x. |
| get_x | x: G | N | Return the X coordinate of x if it's a Point. |
| get_y | x: G | N | Return the Y coordinate of x if it's a Point. |
| get_z | x: G | N | Return the Z coordinate of x if it's a Point and is 3D. |
| wkt | x: G, p: N | S | Return a WKT representation of x with precision p. |
§Configuring this library
This library, so far, relies on three environment variables: DEFAULT_CRS, DEFAULT_PRECISION, and RUST_LOG.
The file .env.template contains those variables with their defaults. To adapt it to your environment, make a copy, rename it .env, and change the values as required.
§DEFAULT_CRS
This environment variable defines the implicit Coordinate Reference System (CRS) code to use when checking if coordinates fall within a geometry's CRS validity extent (a.k.a. Area Of Use). It defaults to EPSG:4326 if undefined.
The standard mentions this in…
Since WKT and GeoJSON do not provide a capability to specify the CRS of a geometry literal, the server has to determine the CRS of the geometry literals in a filter expression through another mechanism.
This value is fed to a Context when created using the new() constructor and will trickle down and be used when parsing Expressions containing geometry queryables and literals. For example…
let shared_ctx = Context::new().freeze();
Because the Conformance Tests expect EPSG:4326 to indeed be the implicit CRS when using the test data included in the standard, this library allows overriding the global implicit CRS when constructing a Context, before freezing it and handing it over to Evaluators. Here's an example as used in most of the Conformance Tests…
let shared_ctx = Context::try_with_crs("EPSG:4326")?.freeze();
§DEFAULT_PRECISION
By Precision I mean the number of digits after the decimal point.
This environment variable controls three things: (a) the precision to keep when ingesting geometry coordinates, (b) the precision to use when rendering geometry WKT output using the to_wkt() generic method, and (c) the precision to use when invoking certain spatial ST functions such as ST_Within.
The default value of 7 ensures that coordinates in WGS 84 (the default implicit CRS) are compared with an accuracy of about 1.11 cm.
For now, only integers between 0 and 32 inclusive are allowed.
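Rounding to a fixed number of decimal places, as DEFAULT_PRECISION implies, can be sketched like this. Note round_to is a hypothetical helper; the crate's actual ingestion code may round differently (e.g. via string formatting):

```rust
// Hypothetical decimal-place rounding helper; precision 7 on WGS 84
// degrees keeps roughly centimetre-level accuracy.
fn round_to(value: f64, precision: u32) -> f64 {
    let factor = 10f64.powi(precision as i32);
    (value * factor).round() / factor
}

fn main() {
    assert_eq!(round_to(2.123456789, 7), 2.1234568);
    assert_eq!(round_to(1.0, 0), 1.0);
}
```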
The GTrait trait, public since version 0.2.0 and implemented for all geometry variants, allows fine-tuning the WKT output by offering the following method…
fn to_wkt_fmt(&self, precision: usize) -> String;
§RUST_LOG
See https://docs.rs/env_logger/latest/env_logger/#enabling-logging for details.
Modules§
- prelude
- Group imports of many common traits and types by adding a glob import for use by clients of this library.
Macros§
- gen_csv_ds
- Macro to generate a concrete CSVDataSource.
- gen_gpkg_ds
- Macro to generate a concrete GPkgDataSource.
- gen_pg_ds
- Macro to generate a concrete PGDataSource.
Structs§
- BBox
- 2D or 3D bounding box.
- CRS
- Representation of a Coordinate Reference System
- CSVDataSource
- DataSource of Features and Resources mapped from CSV rows/records.
- Context
- A Context object we hand to evaluators so they are aware of externally registered Functions.
- ExEvaluator
- A concrete evaluator that does the work without relying on any external source or capability that may be available in high-level data sources such as a database engine endowed with spatial and other operators.
- FnInfo
- A struct that holds metadata about a Function.
- GPkgDataSource
- GeoPackage DataSource binding a .gpkg database file and a layer name that maps rows to Features and Resources.
- Geometries
- Collection of mixed geometries.
- JsonEncoded
- JSON-encoded CQL2 Expression.
- Line
- 2D or 3D line-string geometry.
- Lines
- Collection of line-string geometries.
- PGDataSource
- DataSource binding a PostGIS-enabled database and a table name that maps rows to Features and Resources.
- PgDate
- Our representation of a PostgreSQL DATE type for use with sqlx and jiff.
- PgTimestamp
- Our representation of a PostgreSQL TIMESTAMP type for use with sqlx and jiff.
- Point
- 2D or 3D point geometry.
- Points
- Collection of point geometries.
- Polygon
- 2D or 3D polygon geometry.
- Polygons
- Collection of polygon geometries.
- QString
- String-based type used by Queryables to represent a plain string, plus a set of flags indicating how to use it in case- and/or accent-insensitive contexts.
- SRID
- Representation of a Spatial Reference IDentifier. For now the Authority is implied to be EPSG.
- TextEncoded
- Text-encoded CQL2 Expression.
Enums§
- Bound
- Possible variants of a CQL2 Instant and Interval limit.
- DataType
- Queryable type variants.
- Expression
- An instance of an OGC CQL2 filter.
- ExtDataType
- Externally visible data type variants for arguments and results used and referenced by user-defined, registered functions invoked in filter expressions.
- G
- Geometry type variants handled by this library.
- MyError
- Variants of errors raised by this library.
- Outcome
- Possible outcome values when evaluating an Expression against an individual Resource from a collection.
- Q
- The possible concrete value variants of a Resource queryable property.
Constants§
- EPSG_4326
- The constant representing the ubiquitous EPSG:4326 (a.k.a. WGS 84) SRID.
Traits§
- DataSource
- Trait for a type that can act as a data source provider of Features and Resources, including a Geometry attribute, in the context of processing CQL2 filter expressions.
- Evaluator
- Capability of processing OGC CQL2 expressions, both text- and json-encoded.
- GTrait
- Geometry Trait implemented by all geometry types in this library.
- IterableDS
- Capability of a DataSource to provide an iterator over a collection of Features or Resources.
- StreamableDS
- Capability of a DataSource to asynchronously stream Features or Resources.
Type Aliases§
- Resource
- A dictionary of queryable property names (strings) to Queryable values.
- SharedContext
- What we share between Evaluators.