# `pschema-rs`
[](https://github.com/angelip2303/pschema-rs/actions/workflows/ci.yml)
[](https://codecov.io/gh/angelip2303/pschema-rs)
[](https://crates.io/crates/pschema-rs)
[](https://docs.rs/pschema-rs/latest/pschema_rs/)
`pschema-rs` is a Rust library that provides a Pregel-based schema validation algorithm for generating subsets of data
from Wikidata. It is designed to be efficient, scalable, and easy to use, making it suitable for a wide range of applications
that involve processing large amounts of data from Wikidata.
## Features
- **Pregel-based schema validation**: `pschema-rs` uses the Pregel model, a graph-based computation model, to perform
schema validation on Wikidata entities. This allows for efficient and scalable processing of large datasets.
- **Rust implementation**: `pschema-rs` is implemented in Rust, a systems programming language known for its performance,
memory safety, and concurrency features. This ensures that the library is fast, reliable, and safe to use.
- **Wikidata subset generation**: `pschema-rs` provides functionality to generate subsets of data from Wikidata based on
schema validation rules. This allows users to filter and extract relevant data from Wikidata based on their specific
requirements.
- **Customizable validation rules**: `pschema-rs` allows users to define their own validation rules using a simple and
flexible syntax. This makes it easy to customize the schema validation process according to the specific needs of a given
application.
- **Easy-to-use API**: `pschema-rs` provides a user-friendly API that makes it easy to integrate the library into any Rust
project. The API provides a high-level interface for performing schema validation and generating Wikidata subsets, with
comprehensive documentation and examples to help users get started quickly.
## Installation
To use `pschema-rs` in your Rust project, you can add it as a dependency in your `Cargo.toml` file:
```toml
[dependencies]
pschema = "0.0.2"
```
## Usage
Here's an example of how you can use `pschema-rs` to perform schema validation and generate a subset of data from Wikidata:
```rust
use pregel_rs::graph_frame::GraphFrame;
use pschema_rs::backends::duckdb::DuckDB;
use pschema_rs::backends::Backend;
use pschema_rs::pschema::PSchema;
use pschema_rs::shape::shex::Shape;
use pschema_rs::shape::shex::TripleConstraint;
use wikidata_rs::id::Id;
fn main() -> Result<(), String> {
// Define validation rules
let start = Shape::TripleConstraint(TripleConstraint::new(
1,
Id::from("P31").into(),
Id::from("Q515").into(),
));
// Load Wikidata entities
let edges = DuckDB::import("./examples/from_duckdb/3000lines.duckdb")?;
// Perform schema validation
match GraphFrame::from_edges(edges) {
Ok(graph) => match PSchema::new(start).validate(graph) {
Ok(result) => {
println!("Schema validation result:");
println!("{:?}", result);
Ok(())
}
Err(error) => Err(error.to_string()),
},
Err(error) => Err(format!("Cannot create a GraphFrame: {}", error)),
}
}
```
For more information on how to define validation rules, load entities from Wikidata, and process subsets of data, refer
to the documentation.
## Related projects
1. [wdsub](https://github.com/weso/wdsub) is an application for generating Wikidata subsets written in Scala.
2. [pschema](https://github.com/weso/pschema) is a Scala-based library which is equivalent to this.
## License
Copyright © 2023 Ángel Iglesias Préstamo (<angel.iglesias.prestamo@gmail.com>)
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
**By contributing to this project, you agree to release your
contributions under the same license.**