Crate pschema_rs
source ·Expand description
pschema-rs
pschema-rs
is a Rust library that provides a Pregel-based schema validation algorithm for generating subsets of data
from Wikidata. It is designed to be efficient, scalable, and easy to use, making it suitable for a wide range of applications
that involve processing large amounts of data from Wikidata.
Features
-
Pregel-based schema validation:
pschema-rs
uses the Pregel model, a graph-based computation model, to perform schema validation on Wikidata entities. This allows for efficient and scalable processing of large datasets. -
Rust implementation:
pschema-rs
is implemented in Rust, a systems programming language known for its performance, memory safety, and concurrency features. This ensures that the library is fast, reliable, and safe to use. -
Wikidata subset generation:
pschema-rs
provides functionality to generate subsets of data from Wikidata based on schema validation rules. This allows users to filter and extract relevant data from Wikidata based on their specific requirements. -
Customizable validation rules:
pschema-rs
allows users to define their own validation rules using a simple and flexible syntax. This makes it easy to customize the schema validation process according to the specific needs of a given application. -
Easy-to-use API:
pschema-rs
provides a user-friendly API that makes it easy to integrate the library into any Rust project. The API provides a high-level interface for performing schema validation and generating Wikidata subsets, with comprehensive documentation and examples to help users get started quickly.
Installation
To use pschema-rs
in your Rust project, you can add it as a dependency in your Cargo.toml
file:
[dependencies]
pschema = "0.0.2"
Usage
Here’s an example of how you can use pschema-rs
to perform schema validation and generate a subset of data from Wikidata:
use pregel_rs::graph_frame::GraphFrame;
use pschema_rs::backends::duckdb::DuckDB;
use pschema_rs::backends::Backend;
use pschema_rs::pschema::PSchema;
use pschema_rs::shape::shex::Shape;
use pschema_rs::shape::shex::TripleConstraint;
use wikidata_rs::id::Id;
fn main() -> Result<(), String> {
// Define validation rules
let start = Shape::TripleConstraint(TripleConstraint::new(
1,
Id::from("P31").into(),
Id::from("Q515").into(),
));
// Load Wikidata entities
let edges = DuckDB::import("./examples/from_duckdb/3000lines.duckdb")?;
// Perform schema validation
match GraphFrame::from_edges(edges) {
Ok(graph) => match PSchema::new(start).validate(graph) {
Ok(result) => {
println!("Schema validation result:");
println!("{:?}", result);
Ok(())
}
Err(error) => Err(error.to_string()),
},
Err(error) => Err(format!("Cannot create a GraphFrame: {}", error)),
}
}
For more information on how to define validation rules, load entities from Wikidata, and process subsets of data, refer to the documentation.
Related projects
- wdsub is an application for generating Wikidata subsets written in Scala.
- pschema is a Scala-based library which is equivalent to this.
License
Copyright © 2023 Ángel Iglesias Préstamo (angel.iglesias.prestamo@gmail.com)
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
By contributing to this project, you agree to release your contributions under the same license.
Modules
pub mod backends;
is creating a public module namedbackends
. This module contains code related to different backends or databases that the program can use to store and retrieve data.pub mod pschema;
is creating a public module namedpschema
. This module contains code related to creating knowledge graphs from Wikibase data.pub mod shape;
is creating a public module namedshape
. This module contains code related to defining and manipulating shapes or structures of data in the codebase.pub mod utils;
is creating a public module namedutils
. This module contains utility functions and helper code that can be used throughout the codebase.