(including
expansions of any referenced YAML files) to [another](output::tree::Node),
using the facilities provided by the [parse module](mod@parse). This process is
documented in much more detail [here](mod@parse). Once constructed, the
resulting tree can then be [further converted](export) to a few export formats,
or the validation [diagnostics](output::diagnostic) can simply be
[extracted](ParseResult::iter_diagnostics()).

This crate only supports the binary protobuf serialization format as input;
that conversion is ultimately done [here](parse::traversal::parse_proto())
using a combination of [prost] and some unfortunate magic in
[substrait_validator_derive]. That is to say: it does *NOT* support JSON format
or variations thereof. This is because support for protobuf JSON is flaky
beyond the official bindings, likely in no small part due to all the case
conversion magic and special cases crammed into that format. Since there are no
official protobuf bindings for Rust, there is no way to do this from within the
crate without reinventing the wheel as a square.

Instead, the Python bindings, generated using
[maturin](https://github.com/PyO3/maturin), include the user-facing logic for
this. This is also the primary reason why the CLI is written in Python, rather
than in Rust. When a format other than binary protobuf is passed to the Python
package, it uses the official protobuf bindings for Python to (re)serialize to
the binary format, before handing control to the Rust crate. For the return
trip, the protobuf export format (using the message tree defined in the
[substrait.validator](https://github.com/substrait-io/substrait-validator/blob\
/main/proto/substrait/validator/validator.proto) protobuf namespace) is used to
pass the parse result to Python.

C bindings also exist. These are of the not-very-user-friendly sort, however;
they exist primarily to allow the validator to be used from within the testing
frameworks of whatever language you want, provided they support calling into
C-like libraries.

## Testing strategy

Currently, this crate has (almost) no test cases of its own. This is primarily
to do with the fact that validating only part of a plan would require complex
context setup and that, ideally, the (bits of) plan for the test cases are
written in either JSON or a yet-more user-friendly variant thereof. For the
reasons given above, this can't really be done from within Rust.

Instead, tests are run using the [test-runner crate](https://github.com/\
 substrait-io/substrait-validator/tree/main/tests) and its associated Python
frontend. The Python frontend pre-parses YAML test description files into a
JSON file that's easy to read from within Rust via serde-json, after which the
Rust crate takes over to run the test cases. The pre-parsing involves
converting the JSON-as-YAML protobuf tree into the binary serialization format,
but also allows diagnostic presence checks to be inserted in the plan where
they are expected (rather than having to link up the tree paths manually) and
allows YAML extensions to be specified inline (they'll be extracted and
replaced with a special URI that the test runner understands).

The APIs for the bindings on top of the Rust crate are tested using
[pytest](https://docs.pytest.org/) (Python) and
[googletest](https://google.github.io/googletest/) (C).

## Resolving extension URIs

URI resolution deserves an honorable mention here, because it unfortunately
can't easily be hidden away in some private module: anything that uses HTTPS
must either link into the operating system's certificate store or ship its own
root certificates. The latter is sure to be a security issue, so let's restrict
ourselves to the former solution.

The problem with this is that it pollutes the Rust crate with runtime linking
shenanigans that are not at all compatible from one system to another. In
particular, we can't build universal Python packages around crates that do
this. Since we rely on Python for the CLI, this is a bit of an issue.

For this reason, URI resolution is guarded behind the `curl` feature. When the
feature is enabled, `libcurl` will be used to resolve URIs, using the system's
certificate store for HTTPS. When disabled, the crate will fall back to
resolving only `file://` URIs, unless a more capable resolver is
[installed](Config::add_uri_resolver()). The Python bindings will do just that:
they install a resolver based on Python's own
[urllib](https://docs.python.org/3/library/urllib.html).
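
As a rough illustration of the fallback behavior described above, the sketch
below consults installed resolvers first and only then falls back to a built-in
resolver that handles nothing but file URIs. It is a simplified model, not this
crate's implementation: `Scheme`, `Resolution`, `ResolverChain`, and
`builtin_file_resolver` are hypothetical names, URIs are reduced to their
scheme, and the order in which resolvers are consulted is an assumption.

```rust
/// Hypothetical stand-in for a URI, reduced to its scheme for brevity.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Scheme {
    File,
    Https,
}

/// Outcome of asking a single resolver to fetch a URI.
#[derive(Debug, PartialEq)]
pub enum Resolution {
    /// The resolver fetched the data.
    Data(Vec<u8>),
    /// The resolver does not handle this kind of URI.
    Unsupported,
}

/// Any function from a URI to a resolution can act as a resolver.
pub type Resolver = Box<dyn Fn(Scheme) -> Resolution>;

/// Hypothetical container for user-installed resolvers.
#[derive(Default)]
pub struct ResolverChain {
    installed: Vec<Resolver>,
}

impl ResolverChain {
    /// Installs an additional resolver, consulted before the fallback.
    pub fn add_resolver(&mut self, resolver: Resolver) {
        self.installed.push(resolver);
    }

    /// Tries each installed resolver in order, then the built-in fallback.
    pub fn resolve(&self, scheme: Scheme) -> Resolution {
        for resolver in &self.installed {
            if let Resolution::Data(data) = resolver(scheme) {
                return Resolution::Data(data);
            }
        }
        builtin_file_resolver(scheme)
    }
}

/// Fallback that only understands file URIs.
fn builtin_file_resolver(scheme: Scheme) -> Resolution {
    match scheme {
        // Placeholder payload; a real resolver would read the file here.
        Scheme::File => Resolution::Data(Vec::new()),
        _ => Resolution::Unsupported,
    }
}
```

In this model, an empty chain resolves only `Scheme::File`, while a chain with
a resolver that understands `Scheme::Https` installed also resolves HTTPS
lookups, which mirrors what the Python bindings achieve by installing their
urllib-based resolver.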

## Build process

The build process for the crates and Python module also involves some
not-so-obvious magic, to do with shipping the Substrait protobuf and YAML
schema as appropriate. The problem is that Cargo and Python's packaging logic
require that all files shipped with the package be located within the package
source tree, which is not the case here due to the common submodule and proto
directories.

### Rust

If the [`in-git-repo` file](https://github.com/substrait-io/\
substrait-validator/blob/main/rs/in-git-repo) exists, the
[build.rs file for this crate](https://github.com/substrait-io/\
substrait-validator/blob/main/rs/build.rs) will copy the proto and schema files
from their respective source locations into `src/resources`, thus keeping them
in sync. The `in-git-repo` file is not included in the crate manifest, so this
step is skipped when the crate is compiled after being downloaded from
crates.io. Note however, that in order to release this crate, it must always
first be built: the only time during the packaging process when build.rs is
called is already on the user's machine, so the resource files won't be
synchronized by `cargo package`.
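
The marker-file check can be sketched roughly as follows. This is a minimal
model of the behavior described above, not the actual build.rs:
`sync_resources` and its signature are hypothetical, and the real script may
well do more than this.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copies each `(source, destination)` pair, but only when `marker` exists.
/// Returns the number of files that were copied.
pub fn sync_resources(marker: &Path, files: &[(&Path, &Path)]) -> io::Result<usize> {
    if !marker.exists() {
        // No marker: assume we are building a packaged crate, so keep the
        // resource files that were shipped with it.
        return Ok(0);
    }
    for (source, destination) in files {
        // Make sure the target directory exists before copying into it.
        if let Some(parent) = destination.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::copy(source, destination)?;
    }
    Ok(files.len())
}
```

Because the copy is skipped when the marker is absent, a build from a
downloaded package never touches the shipped resources. This is also why the
crate has to be built inside the repository before it is packaged.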

### Python

The process for Python is much the same, but handled by a
[wrapper around maturin](https://github.com/substrait-io/substrait-validator/\
blob/main/py/substrait_validator_build/__init__.py), as maturin does not expose
pre-build hooks of its own. The `in-git-repo` file isn't necessary here; we can
use the `local_dependencies` file that will be generated by the packaging tools
as part of a source distribution as a marker.

Here, too, it's important that the synchronization logic is run manually prior
to various release-like operations. This can be done by running
[prepare_build.py](https://github.com/substrait-io/substrait-validator/blob/\
main/py/prepare_build.py).

### Protobuf

Protobuf code generation is done via `prost`, which requires access to a
`protoc` executable. This will need to be installed on your system while
developing (e.g. via a package manager). In CI, it is installed as part of the
GitHub Actions workflow.
"
)]

#[macro_use]
pub mod output;

#[macro_use]
mod parse;

pub mod export;
pub mod input;

mod util;

use std::str::FromStr;

use input::proto::substrait::Plan;
use strum::IntoEnumIterator;

pub use input::config::glob::Pattern;
pub use input::config::Config;
pub use output::comment::Comment;
pub use output::diagnostic::Classification;
pub use output::diagnostic::Diagnostic;
pub use output::diagnostic::Level;
pub use output::parse_result::ParseResult;
pub use output::parse_result::Validity;
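
/// Parses and validates a plan from the given buffer, which must contain the
/// binary protobuf serialization, using the given configuration.
pub fn parse<B: prost::bytes::Buf + Clone>(buffer: B, config: &Config) -> ParseResult {
    parse::parse(buffer, config)
}

/// Validates an already-deserialized [`Plan`] using the given configuration.
pub fn validate(plan: &Plan, config: &Config) -> ParseResult {
    parse::validate(plan, config)
}

/// Returns an iterator over all diagnostic classifications.
pub fn iter_diagnostics() -> impl Iterator<Item = Classification> {
    Classification::iter()
}

/// Returns the version of this crate, as recorded by Cargo at compile time.
pub fn version() -> semver::Version {
    semver::Version::from_str(env!("CARGO_PKG_VERSION")).expect("invalid embedded crate version")
}

/// Returns the Substrait version embedded in `resources/substrait-version`.
pub fn substrait_version() -> semver::Version {
    semver::Version::from_str(include_str!("resources/substrait-version"))
        .expect("invalid embedded Substrait version")
}

/// Returns a version requirement derived from [`substrait_version()`]: for
/// 0.x versions both the major and minor version must match, otherwise only
/// the major version must match.
pub fn substrait_version_req() -> semver::VersionReq {
    let version = substrait_version();
    if version.major == 0 {
        semver::VersionReq::parse(&format!("={}.{}", version.major, version.minor)).unwrap()
    } else {
        semver::VersionReq::parse(&format!("={}", version.major)).unwrap()
    }
}

/// Returns a looser version requirement derived from [`substrait_version()`]
/// that only requires the major version to match.
pub fn substrait_version_req_loose() -> semver::VersionReq {
    let version = substrait_version();
    semver::VersionReq::parse(&format!("={}", version.major)).unwrap()
}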