ogc_cql2/
lib.rs

// SPDX-License-Identifier: Apache-2.0

#![warn(missing_docs)]

//! OGC CQL2 parser and runtime interpreter.
//!
//! The next paragraphs explain in more detail the elements of this project
//! as well as the rationale behind some of the decisions that shaped its
//! components.
//!
//! # Expressions
//!
//! The kernel of this project is OGC CQL2 Expressions, represented by the
//! [`Expression`] enumeration. Its two variants wrap [`TextEncoded`] and
//! [`JsonEncoded`], which respectively represent the text-based and JSON-based
//! mandated encodings.
//!
//! Parsing user-provided input is done by invoking one of two methods,
//! [`Expression::try_from_text()`] and [`Expression::try_from_json()`],
//! as shown in the following example:
//! ```rust
//! use ogc_cql2::prelude::*;
//! use std::error::Error;
//!
//! # fn test() -> Result<(), Box<dyn Error>> {
//! let expr = Expression::try_from_text(r#""name" NOT LIKE 'foo%' AND "value" > 10"#)?;
//! // ...
//! let expr = Expression::try_from_json(r#"
//! {
//!  "op": "t_finishes",
//!  "args": [
//!    { "interval": [ { "property": "starts_at" }, { "property": "ends_at" } ] },
//!    { "interval": [ "1991-10-07", "2010-02-10T05:29:20.073225Z" ] }
//!  ]
//! }"#)?;
//! #    Ok(())
//! # }
//! ```
//! An `Ok` result implies a syntactically correct parsed expression!
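//!
//! Conversely, a malformed input is reported as an `Err` carrying a [`MyError`].
//! A quick sketch (the inputs below are simply examples of invalid text and
//! invalid JSON):
//! ```rust
//! use ogc_cql2::prelude::*;
//!
//! // An unbalanced parenthesis fails the text parse...
//! assert!(Expression::try_from_text(r#"("name" = 'foo'"#).is_err());
//! // ...and input that is not valid JSON fails the JSON parse.
//! assert!(Expression::try_from_json("{ not json").is_err());
//! ```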
//!
//! For convenience, a standalone tool is included that can be used from the
//! command line to quickly test the validity of candidate expressions.
//!
//! Once the library is built (`cargo build`), it can be invoked by calling:
//! ```bash
//! cargo r --bin repl
//! ```
//! Read more about it [here](../repl/index.html).
//!
//! # Evaluators
//!
//! An OGC CQL2 _Expression_ on its own is close to useless unless it is evaluated
//! against what the CQL2 standard refers to as [`Resource`]s. A [`Resource`]
//! here is essentially a _Map_ of property names (i.e. strings) to [queryable][Q]
//! values. More on that later.
//!
//! This library represents those objects by the [`Evaluator`] trait. A simple
//! example implementation of this trait is provided; see [`ExEvaluator`].
//!
//! In an earlier incarnation an [`Evaluator`] used to have a `teardown()` hook.
//! Not anymore: Rust's [`Drop` trait](https://doc.rust-lang.org/std/ops/trait.Drop.html)
//! makes that method superfluous.
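//!
//! As a quick illustration of why (a self-contained sketch, not an actual
//! [`Evaluator`] implementation):
//! ```rust
//! use std::cell::Cell;
//! use std::rc::Rc;
//!
//! struct NeedsCleanup {
//!     released: Rc<Cell<bool>>,
//! }
//!
//! impl Drop for NeedsCleanup {
//!     // Runs automatically when the value goes out of scope; no explicit
//!     // `teardown()` call needed.
//!     fn drop(&mut self) {
//!         self.released.set(true);
//!     }
//! }
//!
//! let flag = Rc::new(Cell::new(false));
//! {
//!     let _e = NeedsCleanup { released: Rc::clone(&flag) };
//! } // `_e` is dropped here
//! assert!(flag.get());
//! ```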
//!
//! # Data sources
//!
//! Data sources represent providers of data to be processed by [`Evaluator`]s to
//! filter (i.e. include or exclude) items based on the result of [`Expression`]s.
//!
//! The [`DataSource`] (marker) trait represents those objects. Currently the
//! library provides two implementations: [`CSVDataSource`] and [`GPkgDataSource`].
//! The first represents _Comma Separated Values_ (CSV) sourced from tabular data
//! where each row is mapped to a _Feature_ containing one geometry (spatial)
//! property and other non-geometry attributes. The second represents [GeoPackage][gpkg]
//! files. A _GeoPackage_ is
//! > ... _an open, standards-based, platform-independent, portable,
//! > self-describing, compact format for transferring geospatial information.
//! > It is a platform-independent SQLite database file_...
//!
//! Coding concrete implementations of those data source traits is facilitated
//! by two macros the library provides: [`gen_csv_ds!`] and [`gen_gpkg_ds!`],
//! the first for the _CSV_ variety and the second for the _GeoPackage_ one.
//!
//! I intend to provide two additional implementations: one for [ESRI
//! Shapefiles][shapefile] and another for [PostGIS enabled tables][pgis].
//!
//! # Features and Resources
//!
//! I frequently mention the term _Feature_ in the documentation to refer to
//! an abstract type that closely relates to its data source. For a CSV data
//! source, it's a structure that is `serde` deserializable. For example, the
//! `tests/samples/data` folder contains a CSV file named `ne_110m_rivers_lake_centerlines`
//! representing one of the 3 data sets the standard refers to for testing
//! compliance. The _Feature_ for that data source looks like this:
//! ```rust
//! use serde::Deserialize;
//! use std::marker::PhantomData;
//!
//! #[derive(Debug, Default, Deserialize)]
//! pub(crate) struct ZRiver {
//!     /* 0 */ fid: i32,
//!     /* 1 */ geom: String,
//!     /* 2 */ name: String,
//!     #[serde(skip)] ignored: PhantomData<String>
//! }
//! ```
//! This makes sense because the [csv crate](https://crates.io/crates/csv) used for
//! reading the _CSV_ data works smoothly with deserializable structures.
//! Worth noting here that the spatial data (the `geom` field) is expected to
//! be encoded as WKT (Well Known Text).
//!
//! When dealing with a _GeoPackage_ version of the same data, this
//! structure is used instead:
//! ```rust
//! use sqlx::FromRow;
//!
//! #[derive(Debug, FromRow)]
//! pub(crate) struct TRiver {
//!     fid: i32,
//!     geom: Vec<u8>,
//!     name: String,
//! }
//! ```
//! As one can see, this best suits the [sqlx crate](https://crates.io/crates/sqlx)
//! used for reading _GeoPackage_ data. In this type of _Feature_ the same
//! `geom` spatial attribute is now expected to be a byte array containing the
//! WKB (Well Known Binary) encoded value of the vector geometry.
//!
//! Finally on that note, a _Feature_ implementation must provide a way of
//! converting an instance of `Self` to a [`Resource`]. Here it is for the
//! above _rivers_ CSV version:
//! ```rust
//! use ogc_cql2::prelude::*;
//! use std::error::Error;
//! # use serde::Deserialize;
//! # use std::marker::PhantomData;
//! # use std::collections::HashMap;
//! # #[derive(Debug, Default, Deserialize)]
//! # pub(crate) struct ZRiver {
//! #    /* 0 */ fid: i32,
//! #    /* 1 */ geom: String,
//! #    /* 2 */ name: String,
//! #    #[serde(skip)] ignored: PhantomData<String>
//! # }
//!
//! impl TryFrom<ZRiver> for Resource {
//!     type Error = MyError;
//!
//!     fn try_from(value: ZRiver) -> Result<Self, Self::Error> {
//!         Ok(HashMap::from([
//!             ("fid".into(), Q::try_from(value.fid)?),
//!             ("geom".into(), Q::try_from_wkt(&value.geom)?),
//!             ("name".into(), Q::new_plain_str(&value.name)),
//!         ]))
//!     }
//! }
//! ```
//!
//! A [`Resource`], on the other hand, as mentioned earlier, is generic in the
//! sense that it's a simple map of property names to values, in a similar vein
//! to how JSON objects are handled. In the same vein as how `serde` models
//! JSON values, the types of _value_ a _resource's_ queryable, property, or
//! attribute can take are embodied by the [Queryable][Q] enumeration.
//!
//! Note though that this _resource_ genericity comes at a noticeable
//! performance cost (see _Relative performance_ below).
//!
//! # Iterable and Streamable
//!
//! Access to the contents of a [`DataSource`] is possible by implementing
//! one or both of the two traits: [`IterableDS`] and [`StreamableDS`].
//!
//! The first exposes a method ([`iter()`][IterableDS::iter()]) that returns an
//! [_Iterator_](https://doc.rust-lang.org/std/iter/trait.Iterator.html) over
//! the _Features_ of the data source.
//!
//! Considering that the [`CSVDataSource`]-related macro [`gen_csv_ds!`] does
//! exactly that, one can easily write something like this...
//! ```rust
//! use ogc_cql2::prelude::*;
//! use std::error::Error;
//! # use std::fs::File;
//! # use std::collections::HashMap;
//! # use serde::Deserialize;
//! # use std::marker::PhantomData;
//! # #[derive(Debug, Default, Deserialize)]
//! # pub(crate) struct ZRiver {
//! #    /* 0 */ fid: i32,
//! #    /* 1 */ geom: String,
//! #    /* 2 */ name: String,
//! #    #[serde(skip)] ignored: PhantomData<String>
//! # }
//! # impl TryFrom<ZRiver> for Resource {
//! #    type Error = MyError;
//! #
//! #    fn try_from(value: ZRiver) -> Result<Self, Self::Error> {
//! #        Ok(HashMap::from([
//! #            ("fid".into(), Q::try_from(value.fid)?),
//! #            ("geom".into(), Q::try_from_wkt(&value.geom)?),
//! #            ("name".into(), Q::new_plain_str(&value.name)),
//! #        ]))
//! #    }
//! # }
//!
//! // somewhere the macro is invoked to generate module-private artifacts...
//! gen_csv_ds!(pub(crate), "River", "...ne_110m_rivers_lake_centerlines.csv", ZRiver);
//!
//! # fn test() -> Result<(), Box<dyn Error>> {
//! // now we collect all the "rivers" in the collection...
//! let csv = RiverCSV::new();
//! let it: Result<Vec<ZRiver>, MyError> = csv.iter()?.collect();
//! // ...
//! #     Ok(())
//! # }
//! ```
//! The [`StreamableDS`] trait is more versatile. It exposes methods to stream
//! the contents asynchronously as _Features_ ([`fetch()`][StreamableDS::fetch()]
//! and [`fetch_where()`][StreamableDS::fetch_where()]) or as _Resources_
//! ([`stream()`][StreamableDS::stream()] and [`stream_where()`][StreamableDS::stream_where()]).
//! The methods with the `_where` suffix expect an [`Expression`] argument that
//! is delegated to the data source itself to use for _filtering_ the
//! contents in the best way it can; e.g. a SQL WHERE clause for a _GeoPackage_
//! file, a _PostGIS_ DB table, etc.
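//!
//! As a rough sketch of how the streaming hooks are meant to be used (the
//! `RiverGPKG` name below is hypothetical, mirroring the `RiverCSV` naming of
//! the CSV example above, and the exact signatures may differ):
//! ```rust,ignore
//! // Delegate the filtering to the GeoPackage's SQLite engine...
//! let expr = Expression::try_from_text(r#""name" LIKE 'A%'"#)?;
//! let gpkg = RiverGPKG::new();
//! let mut features = gpkg.fetch_where(&expr);
//! while let Some(river) = features.next().await {
//!     // ...process each matching `TRiver` feature
//! }
//! ```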
//!
//! Similar to the CSV data source, the [`gen_gpkg_ds!`] macro does the heavy
//! lifting of generating the necessary artifacts for a _GeoPackage_ data source.
//!
//! # Relative performance
//!
//! With the introduction of the [`DataSource`], [`IterableDS`] and [`StreamableDS`]
//! traits and the provided _CSV_ and _GeoPackage_ implementations, a user can
//! effectively process the data in 3 ways:
//!
//! * as _Features_ using the [`IterableDS`] trait, from a _CSV_ table,
//! * as either _Features_ or [`Resource`]s using the [`StreamableDS`] trait through
//!   the `fetch()` or `stream()` hooks, from a _GeoPackage_ database file,
//! * as _Features_ or [`Resource`]s using the [`StreamableDS`] trait through the
//!   `fetch_where()` or `stream_where()` hooks, from a _GeoPackage_ DB.
//!
//! The last approach is by far the most efficient since it delegates to the
//! DB engine the job of filtering the records, while the 2<sup>nd</sup> one
//! is the worst because it involves converting every _Feature_ to a [`Resource`]
//! even when we may not need all the queryables from that newly created
//! [`Resource`].
//!
//! As an example of the relative performance of those approaches, consider the
//! timings of `test_points`, `test_points_gpkg` and `test_points_sql` in
//! `a9::test_37`, which correspond to those 3 strategies respectively when
//! processing a data set of 243 records. On my development laptop, I get...
//!
//! 1. `test_points()` -> `0.10s`
//! 2. `test_points_gpkg()` -> `5.91s`
//! 3. `test_points_sql()` -> `0.08s`
//!
//! # Third-party crates
//!
//! This project, in addition to the external software mentioned in the [README][readme],
//! relies on a few 3<sup>rd</sup> party crates. Besides the `csv` and `sqlx`
//! crates already mentioned, here are the most important ones...
//!
//! 1. PEG
//!    * [`peg`](https://crates.io/crates/peg): Provides a Rust macro that builds
//!      a recursive descent parser from a concise definition of a grammar.
//!
//! 2. JSON Deserialization:
//!    * [serde][3]: for the basic capabilities.
//!    * [serde_json][4]: for the JSON format bindings.
//!    * [serde_with][5]: for custom helpers.
//!
//! 3. Date + Time:
//!    * [jiff][6]: for time-zone-aware date and timestamp handling.
//!
//! 4. Case + Accent Insensitive Strings:
//!    * [unicase][7]: for comparing strings when case is not important.
//!    * [unicode-normalization][8]: for un-accenting strings with Unicode
//!      decomposition.
//!
//! 5. CRS Transformation:
//!    * [proj][9]: for coordinate transformation via bindings to the [PROJ][10]
//!      API.
//!
//!
//! [1]: https://crates.io/crates/geos
//! [2]: https://libgeos.org/
//! [3]: https://crates.io/crates/serde
//! [4]: https://crates.io/crates/serde_json
//! [5]: https://crates.io/crates/serde_with
//! [6]: https://crates.io/crates/jiff
//! [7]: https://crates.io/crates/unicase
//! [8]: https://crates.io/crates/unicode-normalization
//! [9]: https://crates.io/crates/proj
//! [10]: https://proj.org/
//!
//! [gpkg]: https://www.geopackage.org/spec140/index.html
//! [shapefile]: https://en.wikipedia.org/wiki/Shapefile
//! [pgis]: https://en.wikipedia.org/wiki/PostGIS
//! [sqlx]: https://crates.io/crates/sqlx
//! [readme]: https://crates.io/crates/xapi-rs
//!

#![doc = include_str!("../doc/FUNCTION.md")]
#![doc = include_str!("../doc/CONFIGURATION.md")]

mod bound;
mod config;
mod context;
mod crs;
mod ds;
mod error;
mod evaluator;
mod expr;
mod function;
mod geom;
mod json;
mod op;
mod qstring;
mod queryable;
mod srid;
mod text;
mod wkb;

pub use bound::*;
pub use context::*;
pub use crs::*;
pub use ds::*;
pub use evaluator::*;
pub use function::*;
pub use geom::*;
pub use qstring::QString;
pub use queryable::*;
pub use srid::*;

pub mod prelude;

use crate::{expr::E, text::cql2::expression};
use core::fmt;
pub use error::MyError;

/// An instance of an OGC CQL2 filter.
#[derive(Debug)]
pub enum Expression {
    /// Instance generated from a successfully parsed text-encoded input string.
    Text(TextEncoded),
    /// Instance generated from a successfully parsed JSON-encoded input string.
    Json(Box<JsonEncoded>),
}

impl Expression {
    /// Try to construct from a text-encoded string.
    pub fn try_from_text(s: &str) -> Result<Self, MyError> {
        let x = expression(s).map_err(MyError::Text)?;
        Ok(Expression::Text(TextEncoded(x)))
    }

    /// Try to construct from a JSON-encoded string.
    pub fn try_from_json(s: &str) -> Result<Self, MyError> {
        let x = serde_json::from_str::<json::Expression>(s).map_err(MyError::Json)?;
        Ok(Expression::Json(Box::new(JsonEncoded(x))))
    }

    /// Return a reference to the text-encoded variant as an `Option`.
    pub fn as_text_encoded(&self) -> Option<&TextEncoded> {
        match self {
            Expression::Text(x) => Some(x),
            Expression::Json(_) => None,
        }
    }

    // Convert either variant to the common `E` intermediary form.
    pub(crate) fn to_inner(&self) -> Result<E, MyError> {
        match self {
            Expression::Text(x) => Ok(x.0.to_owned()),
            Expression::Json(x) => {
                let s = &x.0.to_string();
                let te = Self::try_from_text(s)?;
                let it = te
                    .as_text_encoded()
                    .ok_or_else(|| MyError::Runtime("Failed converting to TE".into()))?;
                Ok(it.0.to_owned())
            }
        }
    }
}

impl fmt::Display for Expression {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Expression::Text(x) => write!(f, "{}", x.0),
            Expression::Json(x) => write!(f, "{}", x.0),
        }
    }
}

/// Text-encoded CQL2 Expression.
#[derive(Debug, PartialEq)]
pub struct TextEncoded(expr::E);

/// JSON-encoded CQL2 Expression.
#[derive(Debug)]
pub struct JsonEncoded(json::Expression);

/// Possible outcome values when evaluating an [Expression] against an
/// individual _Resource_ from a collection.
///
/// From [OGC CQL2][1]:
/// > _Each resource instance in the source collection is evaluated against
/// > a filtering expression. The net effect of evaluating a filter
/// > [Expression] is a subset of resources that satisfy the predicate(s)
/// > in the [Expression]._
///
/// Logically connected predicates are evaluated according to the following
/// truth table, where `T` is TRUE, `F` is FALSE and `N` is NULL.
/// ```text
/// +-----+-----+---------+---------+
/// | P1  | P2  | P1 & P2 | P1 | P2 |
/// +-----+-----+---------+---------+
/// |  T  |  T  |    T    |    T    |
/// |  T  |  F  |    F    |    T    |
/// |  F  |  T  |    F    |    T    |
/// |  F  |  F  |    F    |    F    |
/// |  T  |  N  |    N    |    T    |
/// |  F  |  N  |    F    |    N    |
/// |  N  |  T  |    N    |    T    |
/// |  N  |  F  |    F    |    N    |
/// |  N  |  N  |    N    |    N    |
/// +-----+-----+---------+---------+
/// ```
/// [1]: https://docs.ogc.org/is/21-065r2/21-065r2.html
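/// The table matches SQL-style three-valued logic and can be sketched with
/// `Option<bool>`, letting `None` play the role of NULL (an illustrative
/// snippet, not part of this library's API):
/// ```rust
/// // Three-valued AND: FALSE dominates; otherwise NULL propagates.
/// fn and3(p1: Option<bool>, p2: Option<bool>) -> Option<bool> {
///     match (p1, p2) {
///         (Some(false), _) | (_, Some(false)) => Some(false),
///         (Some(true), Some(true)) => Some(true),
///         _ => None,
///     }
/// }
/// // Three-valued OR: TRUE dominates; otherwise NULL propagates.
/// fn or3(p1: Option<bool>, p2: Option<bool>) -> Option<bool> {
///     match (p1, p2) {
///         (Some(true), _) | (_, Some(true)) => Some(true),
///         (Some(false), Some(false)) => Some(false),
///         _ => None,
///     }
/// }
/// // Rows 5 and 6 of the table: `T & N = N` while `F & N = F`.
/// assert_eq!(and3(Some(true), None), None);
/// assert_eq!(and3(Some(false), None), Some(false));
/// ```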
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// The input satisfies the [Expression] and should be marked as being in
    /// the result set.
    T,
    /// The input does not satisfy the filter [Expression] and should not be
    /// included in the result set.
    F,
    /// The evaluation came out NULL (unknown); like `F`, the input should not
    /// be included in the result set.
    N,
}

impl fmt::Display for Outcome {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Outcome::T => write!(f, "T"),
            Outcome::F => write!(f, "F"),
            Outcome::N => write!(f, "N"),
        }
    }
}

impl Outcome {
    /// Constructor from an optional boolean.
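    ///
    /// For example (a doc sketch; `Outcome` derives `PartialEq`):
    /// ```rust
    /// use ogc_cql2::Outcome;
    ///
    /// assert_eq!(Outcome::new(Some(&true)), Outcome::T);
    /// assert_eq!(Outcome::new(Some(&false)), Outcome::F);
    /// assert_eq!(Outcome::new(None), Outcome::N);
    /// ```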
    pub fn new(flag: Option<&bool>) -> Self {
        match flag {
            Some(true) => Self::T,
            Some(false) => Self::F,
            None => Self::N,
        }
    }
}