// ogc_cql2/lib.rs
// SPDX-License-Identifier: Apache-2.0

#![warn(missing_docs)]

//! OGC CQL2 parser and runtime interpreter.
//!
//! The next paragraphs explain in more detail the elements of this project
//! as well as the rationale behind some of the decisions that shaped its
//! components.
//!
//! # Expressions
//!
//! The kernel of this project is OGC CQL2 Expressions represented by the
//! [`Expression`] enumeration. Its two variants wrap a [`TextEncoded`] and a
//! [`JsonEncoded`] payload, representing the mandated text-based and
//! JSON-based encodings respectively.
//!
//! Parsing user-provided input is done by invoking one of the following two
//! methods: [`Expression::try_from_text()`] and [`Expression::try_from_json()`]
//! as shown in the following example:
//! ```rust
//! use ogc_cql2::prelude::*;
//! use std::error::Error;
//!
//! # fn test() -> Result<(), Box<dyn Error>> {
//! let expr = Expression::try_from_text(r#""name" NOT LIKE 'foo%' AND "value" > 10"#)?;
//! // ...
//! let expr = Expression::try_from_json(r#"
//! {
//!     "op": "t_finishes",
//!     "args": [
//!         { "interval": [ { "property": "starts_at" }, { "property": "ends_at" } ] },
//!         { "interval": [ "1991-10-07", "2010-02-10T05:29:20.073225Z" ] }
//!     ]
//! }"#)?;
//! # Ok(())
//! # }
//! ```
//! An `Ok` result implies a syntactically correct parsed expression!
//!
//! For convenience, a standalone tool is included that can be used from the
//! command line to quickly test the validity of candidate expressions.
//!
//! Once the library is built (`cargo b`), it can be invoked by calling:
//! ```bash
//! cargo r --bin repl
//! ```
//! Read more about it [here](https://github.com/raif-s-naffah/ogc-cql2/blob/master/doc/REPL.md).
//!
//! # Evaluators
//!
//! An OGC CQL2 _Expression_ on its own is close to useless unless it is evaluated
//! against what the (CQL2) standard refers to as [`Resource`]s. A [`Resource`]
//! here is essentially a _Map_ of property names (i.e. strings) to [queryable][Q]
//! values. More on that later.
//!
//! This library represents those objects by the [`Evaluator`] trait. A simple
//! example of an implementation of this trait is provided --see [`ExEvaluator`].
//!
//! In an earlier incarnation an [`Evaluator`] used to have a `teardown()` hook.
//! Not anymore. Rust's [`Drop` trait](https://doc.rust-lang.org/std/ops/trait.Drop.html)
//! sort of makes that method superfluous.
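//!
//! A minimal std-only sketch (not this crate's API) of the idea --cleanup
//! placed in `Drop::drop()` runs automatically when the value goes out of
//! scope, so no explicit `teardown()` call is needed:
//! ```rust
//! use std::{cell::Cell, rc::Rc};
//!
//! struct Session {
//!     closed: Rc<Cell<bool>>,
//! }
//!
//! impl Drop for Session {
//!     fn drop(&mut self) {
//!         // cleanup an explicit `teardown()` hook would otherwise perform...
//!         self.closed.set(true);
//!     }
//! }
//!
//! let flag = Rc::new(Cell::new(false));
//! {
//!     let _s = Session { closed: flag.clone() };
//!     assert!(!flag.get()); // still "open" inside the scope...
//! } // ...`_s` dropped here; `drop()` runs automatically
//! assert!(flag.get());
//! ```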
//!
//! # Data sources
//!
//! Data sources represent providers of data to be processed by [`Evaluator`]s to
//! filter (i.e. include or exclude) items based on the result of [`Expression`]s.
//!
//! The [`DataSource`] (marker) trait represents those objects. Currently the
//! library provides two implementations: [`CSVDataSource`] and [`GPkgDataSource`].
//! The first represents _Comma Separated Values_ (CSV) sourced from tabular data
//! where each row is mapped to a _Feature_ containing one geometry (spatial)
//! property and other non-geometry attributes. The second represents [GeoPackage][gpkg]
//! files. A _GeoPackage_ is
//! > ... _an open, standards-based, platform-independent, portable,
//! > self-describing, compact format for transferring geospatial information.
//! > It is a platform-independent SQLite database file_...
//!
//! Coding concrete implementations of those data source traits is facilitated
//! by two macros the library provides: [`gen_csv_ds!`] and [`gen_gpkg_ds!`] --the
//! first for the _CSV_ variety, the second for the _GeoPackage_ one.
//!
//! I intend to provide two additional implementations: one for [ESRI
//! Shapefiles][shapefile] and another for [PostGIS enabled tables][pgis].
//!
//! # Features and Resources
//!
//! I frequently mention the term _Feature_ in the documentation to refer to
//! an abstract type that closely relates to its data source. For a CSV data
//! source, it's a structure that is `serde` deserializable. For example, the
//! `tests/samples/data` folder provides a CSV file named `ne_110m_rivers_lake_centerlines`
//! representing one of the 3 data sets referred to in the standard for testing
//! compliance. The _Feature_ for that data source looks like this:
//! ```rust
//! use serde::Deserialize;
//! use std::marker::PhantomData;
//!
//! #[derive(Debug, Default, Deserialize)]
//! pub(crate) struct ZRiver {
//!     /* 0 */ fid: i32,
//!     /* 1 */ geom: String,
//!     /* 2 */ name: String,
//!     #[serde(skip)] ignored: PhantomData<String>
//! }
//! ```
//! This makes sense b/c the [csv crate](https://crates.io/crates/csv) used for
//! reading the _CSV_ data works smoothly with deserializable structures.
//! Worth noting here that the spatial data (the `geom` field) is expected to
//! be encoded as WKT (Well Known Text).
//!
//! When dealing w/ a _GeoPackage_ version of the same data, this structure
//! is used instead:
//! ```rust
//! use sqlx::FromRow;
//!
//! #[derive(Debug, FromRow)]
//! pub(crate) struct TRiver {
//!     fid: i32,
//!     geom: Vec<u8>,
//!     name: String,
//! }
//! ```
//! As one can see this best suits the [sqlx crate](https://crates.io/crates/sqlx)
//! used for reading _GeoPackage_ data. In this type of _Feature_ the same
//! `geom` spatial attribute is now expected to be a byte array containing the
//! WKB (Well Known Binary) encoded value of the vector geometry.
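//!
//! To make the two encodings concrete, here's the same 2D point expressed both
//! ways in a std-only sketch (independent of this crate's types); the byte
//! layout follows the WKB spec: a byte-order marker, a `u32` geometry type,
//! then the coordinates as `f64`s:
//! ```rust
//! // WKT: human-readable text...
//! let wkt = "POINT(1 2)";
//! assert!(wkt.starts_with("POINT"));
//!
//! // WKB: binary -- 1 + 4 + 8 + 8 = 21 bytes for a 2D point.
//! let mut wkb: Vec<u8> = vec![0x01]; // 0x01 = little-endian
//! wkb.extend_from_slice(&1u32.to_le_bytes()); // geometry type 1 = Point
//! wkb.extend_from_slice(&1f64.to_le_bytes()); // x
//! wkb.extend_from_slice(&2f64.to_le_bytes()); // y
//! assert_eq!(wkb.len(), 21);
//! ```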
//!
//! Finally on that note, a _Feature_ implementation must provide a way of
//! converting an instance of `Self` to a [`Resource`]. Here it is for the
//! above _rivers_ CSV version:
//! ```rust
//! use ogc_cql2::prelude::*;
//! use std::error::Error;
//! # use serde::Deserialize;
//! # use std::marker::PhantomData;
//! # use std::collections::HashMap;
//! # #[derive(Debug, Default, Deserialize)]
//! # pub(crate) struct ZRiver {
//! #     /* 0 */ fid: i32,
//! #     /* 1 */ geom: String,
//! #     /* 2 */ name: String,
//! #     #[serde(skip)] ignored: PhantomData<String>
//! # }
//!
//! impl TryFrom<ZRiver> for Resource {
//!     type Error = MyError;
//!
//!     fn try_from(value: ZRiver) -> Result<Self, Self::Error> {
//!         Ok(HashMap::from([
//!             ("fid".into(), Q::try_from(value.fid)?),
//!             ("geom".into(), Q::try_from_wkt(&value.geom)?),
//!             ("name".into(), Q::new_plain_str(&value.name)),
//!         ]))
//!     }
//! }
//! ```
//!
//! A [`Resource`] on the other hand, as mentioned earlier, is generic in the
//! sense that it's a simple map of property names to values, in a similar vein
//! to how JSON objects are handled. And in the same vein as how `serde` models
//! JSON values, the possible types of a _resource's_ queryable (property or
//! attribute) values are embodied by the [Queryable][Q] enumeration.
//!
//! Note though that this _resource_ genericity comes at a non-trivial cost in
//! performance.
//!
//! # Iterable and Streamable
//!
//! Access to the contents of a [`DataSource`] is possible by implementing
//! one or both of the two traits: [`IterableDS`] and [`StreamableDS`].
//!
//! The first exposes a method ([`iter()`][IterableDS::iter()]) that returns an
//! [_Iterator_](https://doc.rust-lang.org/std/iter/trait.Iterator.html) over
//! the _Features_ of the data source.
//!
//! Considering that the [`CSVDataSource`] related macro [`gen_csv_ds!`] does
//! exactly that, one can easily write something like this...
//! ```rust
//! use ogc_cql2::prelude::*;
//! use std::error::Error;
//! # use std::fs::File;
//! # use std::collections::HashMap;
//! # use serde::Deserialize;
//! # use std::marker::PhantomData;
//! # #[derive(Debug, Default, Deserialize)]
//! # pub(crate) struct ZRiver {
//! #     /* 0 */ fid: i32,
//! #     /* 1 */ geom: String,
//! #     /* 2 */ name: String,
//! #     #[serde(skip)] ignored: PhantomData<String>
//! # }
//! # impl TryFrom<ZRiver> for Resource {
//! #     type Error = MyError;
//! #
//! #     fn try_from(value: ZRiver) -> Result<Self, Self::Error> {
//! #         Ok(HashMap::from([
//! #             ("fid".into(), Q::try_from(value.fid)?),
//! #             ("geom".into(), Q::try_from_wkt(&value.geom)?),
//! #             ("name".into(), Q::new_plain_str(&value.name)),
//! #         ]))
//! #     }
//! # }
//!
//! // somewhere the macro is invoked to generate module-private artifacts...
//! gen_csv_ds!(pub(crate), "River", "...ne_110m_rivers_lake_centerlines.csv", ZRiver);
//!
//! # fn test() -> Result<(), Box<dyn Error>> {
//! // now we collect all the "rivers" in the collection...
//! let csv = RiverCSV::new();
//! let it: Result<Vec<ZRiver>, MyError> = csv.iter()?.collect();
//! // ...
//! # Ok(())
//! # }
//! ```
//! The [`StreamableDS`] trait is more versatile. It exposes methods to
//! asynchronously stream the contents as _Features_ ([`fetch()`][StreamableDS::fetch()]
//! and [`fetch_where()`][StreamableDS::fetch_where()]) and as _Resources_
//! ([`stream()`][StreamableDS::stream()] and [`stream_where()`][StreamableDS::stream_where()]).
//! The methods with the `_where` suffix expect an [`Expression`] argument that
//! will be delegated to the data source itself to use for _filtering_ the
//! contents in the best way it can; e.g. an SQL WHERE clause for a _GeoPackage_
//! file or a _PostGIS_ DB table.
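//!
//! For instance, a text-encoded filter could plausibly be pushed down to a
//! _GeoPackage_'s SQLite engine along these lines (the table name here is
//! hypothetical and the exact translation is up to the data source
//! implementation):
//! ```text
//! CQL2: "name" LIKE 'foo%' AND "value" > 10
//! SQL : SELECT * FROM rivers WHERE name LIKE 'foo%' AND value > 10
//! ```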
//!
//! Similar to the CSV data source, the [`gen_gpkg_ds!`] macro does the heavy
//! lifting generating the necessary artifacts for a _GeoPackage_ data source.
//!
//! # Relative performance
//!
//! With the introduction of the [`DataSource`], [`IterableDS`] and [`StreamableDS`]
//! traits and the provided _CSV_ and _GeoPackage_ implementations, a user can
//! effectively process the data in 3 ways:
//!
//! * as _Features_ using the [`IterableDS`] trait --from a _CSV_ table,
//! * as either _Features_ or [`Resource`]s using the [`StreamableDS`] trait through
//!   the `fetch()` or `stream()` hooks --from a _GeoPackage_ database file,
//! * as _Features_ or [`Resource`]s using the [`StreamableDS`] trait through the
//!   `fetch_where()` or `stream_where()` hooks --from a _GeoPackage_ DB.
//!
//! The last approach is by far the most efficient since it delegates to a
//! DB engine the job of filtering the records, while the 2<sup>nd</sup> one
//! is the worst b/c it involves converting every _Feature_ to a [`Resource`]
//! even when we may not need all the queryables from that newly created
//! [`Resource`].
//!
//! As an example of the relative performance of those approaches, consider the
//! timing of `test_points`, `test_points_gpkg` and `test_points_sql` in
//! `a9::test_37` which correspond to those 3 strategies respectively when
//! processing a data set of 243 records. On my development laptop, w/ the
//! `profile [unoptimized + debuginfo]` I get...
//!
//! ```text
//! +---+--------------------+-------+
//! | # | test               | time  |
//! +---+--------------------+-------+
//! | 1 | test_points()      | 0.10s |
//! | 2 | test_points_gpkg() | 5.91s |
//! | 3 | test_points_sql()  | 0.08s |
//! +---+--------------------+-------+
//! ```
//!
//! # Third-party crates
//!
//! This project, in addition to the external software mentioned in the [README][readme],
//! relies on a few 3<sup>rd</sup> party crates. In addition to the `csv` and `sqlx`
//! crates already mentioned, here are the most important ones...
//!
//! 1. PEG:
//!    * [`peg`](https://crates.io/crates/peg): Provides a Rust macro that builds
//!      a recursive descent parser from a concise definition of a grammar.
//!
//! 2. JSON Deserialization:
//!    * [serde][3]: for the basic capabilities.
//!    * [serde_json][4]: for the JSON format bindings.
//!    * [serde_with][5]: for custom helpers.
//!
//! 3. Date + Time:
//!    * [jiff][6]: for time-zone-aware date and timestamp handling.
//!
//! 4. Case + Accent Insensitive Strings:
//!    * [unicase][7]: for comparing strings when case is not important.
//!    * [unicode-normalization][8]: for un-accenting strings w/ Unicode
//!      decomposition.
//!
//! 5. CRS Transformation:
//!    * [proj][9]: for coordinate transformation via bindings to the [PROJ][10]
//!      API.
//!
//! [1]: https://crates.io/crates/geos
//! [2]: https://libgeos.org/
//! [3]: https://crates.io/crates/serde
//! [4]: https://crates.io/crates/serde_json
//! [5]: https://crates.io/crates/serde_with
//! [6]: https://crates.io/crates/jiff
//! [7]: https://crates.io/crates/unicase
//! [8]: https://crates.io/crates/unicode-normalization
//! [9]: https://crates.io/crates/proj
//! [10]: https://proj.org/
//!
//! [gpkg]: https://www.geopackage.org/spec140/index.html
//! [shapefile]: https://en.wikipedia.org/wiki/Shapefile
//! [pgis]: https://en.wikipedia.org/wiki/PostGIS
//! [sqlx]: https://crates.io/crates/sqlx
//! [readme]: https://github.com/raif-s-naffah/ogc-cql2/blob/master/README.md
//!

#![doc = include_str!("../doc/FUNCTION.md")]
#![doc = include_str!("../doc/CONFIGURATION.md")]

mod bound;
mod config;
mod context;
mod crs;
mod ds;
mod error;
mod evaluator;
mod expr;
mod function;
mod geom;
mod json;
mod op;
mod qstring;
mod queryable;
mod srid;
mod text;
mod wkb;

pub use bound::*;
pub use context::*;
pub use crs::*;
pub use ds::*;
pub use evaluator::*;
pub use function::*;
pub use geom::*;
pub use qstring::QString;
pub use queryable::*;
pub use srid::*;

pub mod prelude;

use crate::{expr::E, text::cql2::expression};
use core::fmt;
pub use error::MyError;
/// An instance of an OGC CQL2 filter.
#[derive(Debug)]
pub enum Expression {
    /// Instance generated from a successfully parsed text-encoded input string.
    Text(TextEncoded),
    /// Instance generated from a successfully parsed JSON-encoded input string.
    Json(Box<JsonEncoded>),
}

impl Expression {
    /// Try to construct from a text-encoded string.
    pub fn try_from_text(s: &str) -> Result<Self, MyError> {
        let x = expression(s).map_err(MyError::Text)?;
        Ok(Expression::Text(TextEncoded(x)))
    }

    /// Try to construct from a JSON-encoded string.
    pub fn try_from_json(s: &str) -> Result<Self, MyError> {
        let x = serde_json::from_str::<json::Expression>(s).map_err(MyError::Json)?;
        Ok(Expression::Json(Box::new(JsonEncoded(x))))
    }

    /// Return a reference to the text-encoded variant as an `Option`.
    pub fn as_text_encoded(&self) -> Option<&TextEncoded> {
        match self {
            Expression::Text(x) => Some(x),
            Expression::Json(_) => None,
        }
    }

    // Convert both variants to the common `E` intermediary form.
    pub(crate) fn to_inner(&self) -> Result<E, MyError> {
        match self {
            Expression::Text(x) => Ok(x.0.to_owned()),
            Expression::Json(x) => {
                let s = &x.0.to_string();
                let te = Self::try_from_text(s)?;
                let it = te
                    .as_text_encoded()
                    .ok_or_else(|| MyError::Runtime("Failed converting to TE".into()))?;
                Ok(it.0.to_owned())
            }
        }
    }
}

impl fmt::Display for Expression {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Expression::Text(x) => write!(f, "{}", x.0),
            Expression::Json(x) => write!(f, "{}", x.0),
        }
    }
}

/// Text-encoded CQL2 [`Expression`].
#[derive(Debug, PartialEq)]
pub struct TextEncoded(expr::E);

/// JSON-encoded CQL2 [`Expression`].
#[derive(Debug)]
pub struct JsonEncoded(json::Expression);

/// Possible outcome values when evaluating an [`Expression`] against an
/// individual [`Resource`] from a collection.
///
/// From [OGC CQL2][1]:
/// > _Each resource instance in the source collection is evaluated against
/// > a filtering expression. The net effect of evaluating a filter
/// > [`Expression`] is a subset of resources that satisfy the predicate(s)
/// > in the [`Expression`]._
///
/// Logically connected predicates are evaluated according to the following
/// truth table, where `T` is TRUE, `F` is FALSE and `N` is NULL.
/// ```text
/// +-----+-----+---------+---------+
/// | P1  | P2  | P1 & P2 | P1 | P2 |
/// +-----+-----+---------+---------+
/// |  T  |  T  |    T    |    T    |
/// |  T  |  F  |    F    |    T    |
/// |  F  |  T  |    F    |    T    |
/// |  F  |  F  |    F    |    F    |
/// |  T  |  N  |    N    |    T    |
/// |  F  |  N  |    F    |    N    |
/// |  N  |  T  |    N    |    T    |
/// |  N  |  F  |    F    |    N    |
/// |  N  |  N  |    N    |    N    |
/// +-----+-----+---------+---------+
/// ```
/// [1]: https://docs.ogc.org/is/21-065r2/21-065r2.html
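/// The NULL-propagating rows above follow SQL-style three-valued logic, which
/// maps naturally onto `Option<bool>`. A std-only sketch (not this crate's
/// internal implementation):
/// ```rust
/// // Three-valued AND/OR where `None` plays the role of NULL.
/// fn and3(p: Option<bool>, q: Option<bool>) -> Option<bool> {
///     match (p, q) {
///         (Some(false), _) | (_, Some(false)) => Some(false), // F dominates AND
///         (Some(true), Some(true)) => Some(true),
///         _ => None, // any remaining N makes the result N
///     }
/// }
///
/// fn or3(p: Option<bool>, q: Option<bool>) -> Option<bool> {
///     match (p, q) {
///         (Some(true), _) | (_, Some(true)) => Some(true), // T dominates OR
///         (Some(false), Some(false)) => Some(false),
///         _ => None,
///     }
/// }
///
/// assert_eq!(and3(Some(true), None), None);         // T & N = N
/// assert_eq!(and3(Some(false), None), Some(false)); // F & N = F
/// assert_eq!(or3(None, Some(true)), Some(true));    // N | T = T
/// assert_eq!(or3(None, None), None);                // N | N = N
/// ```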
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// The input satisfies the [`Expression`] and should be marked as being in
    /// the result set.
    T,
    /// The input does not satisfy the filter [`Expression`] and should not be
    /// included in the result set.
    F,
    /// The evaluation result is NULL (unknown); like `F`, the input should not
    /// be included in the result set.
    N,
}

impl fmt::Display for Outcome {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Outcome::T => write!(f, "T"),
            Outcome::F => write!(f, "F"),
            Outcome::N => write!(f, "N"),
        }
    }
}

impl Outcome {
    /// Constructor from an optional boolean.
    pub fn new(flag: Option<&bool>) -> Self {
        match flag {
            Some(b) => match b {
                true => Self::T,
                false => Self::F,
            },
            None => Self::N,
        }
    }
}