peprs - A spicy 🌶️ library for managing biological sample metadata to enable reproducible and scalable bioinformatics
Don't let sample metadata parsing bottleneck your pipelines!
About this project
peprs is a Rust implementation of the PEP specification and its expanded ecosystem. In short, PEP is a framework for managing biological sample metadata. PEP is a community-driven effort to create a fast, reliable, reusable, and scalable library for handling biological sample metadata.
PEP and its ecosystem are developed and maintained by the Databio team. As a challenge and a learning experience, we have been rewriting the core components of the PEP ecosystem in Rust for better performance and reliability.
We are starting with the core PEP specification for metadata management and will expand to include the full ecosystem (looper, pephub-client, pipestat). The core PEP specification is implemented in the peprs-core crate. The Python bindings are implemented in the peprs-py crate.
📦 Modules
- peprs-core: Core library implementing the PEP specification. With the core module, users can create PEP projects and manipulate them in a variety of ways.
- peprs-eido: Schema-based validation of PEP projects against JSON schemas, with eido-specific extensions (imports, tangible file checks).
- peprs-cli: Command-line interface with `inspect`, `validate`, and `convert` subcommands.
- peprs-py: Python bindings via PyO3. Exposes the `Project` class with full Polars/Pandas DataFrame interoperability.
- pephub-client: Work in progress.
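The tangible file checks in peprs-eido follow, as we understand it, eido's `tangible` schema extension, which marks sample attributes whose values must point to files that exist. The plain-Python sketch below only illustrates that idea; it is not the peprs-eido API, and the `missing_tangible_files` helper and the list-of-dicts sample shape are hypothetical.

```python
from pathlib import Path


def missing_tangible_files(samples, tangible_attrs):
    """Return (sample_name, attribute, path) for every tangible file that is missing.

    Hypothetical stand-in for an eido-style tangible check: `samples` is a list of
    dicts keyed by attribute name, `tangible_attrs` lists the attributes whose
    values must point to existing files.
    """
    missing = []
    for sample in samples:
        for attr in tangible_attrs:
            value = sample.get(attr)
            # An attribute may hold a single path or a list of paths.
            paths = value if isinstance(value, list) else [value]
            for path in paths:
                # An absent attribute (None) or a non-existent file both fail.
                if path is None or not Path(path).is_file():
                    missing.append((sample["sample_name"], attr, path))
    return missing
```

Reporting every missing file, rather than stopping at the first one, lets a user fix all broken paths in a sample table in one pass.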
⚙️ Installation
Python (recommended)
Python (from source)
To build and install the Python package from source (requires maturin and a Rust toolchain):
Rust
Add to your Cargo.toml:
[]
= { = "https://github.com/pepkit/peprs" }
CLI
Using source
Using Python
🐍 Quick Python example
This example covers:
- loading a PEP from a YAML config file (or with an alternative loading call);
- inspecting the project, e.g. the number of samples;
- getting samples as a Polars DataFrame;
- getting samples as a Pandas DataFrame;
- looking up a single sample by name;
- iterating over samples;
- converting projects.
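Looking up samples by name and iterating over them both operate on a PEP's sample table, a CSV whose rows are keyed by a sample-name column (`sample_name` by default in the PEP specification). The stdlib-only sketch below shows that data structure; it does not use peprs, and the table contents and the `index_samples` helper are made up for illustration.

```python
import csv
import io

# A tiny, made-up sample table; only the `sample_name` column has PEP meaning here.
SAMPLE_TABLE = """sample_name,protocol,file
frog_1,RNA-seq,data/frog_1.fastq
frog_2,RNA-seq,data/frog_2.fastq
"""


def index_samples(table_text, index_col="sample_name"):
    """Parse a CSV sample table and map each sample name to its row (a dict)."""
    rows = csv.DictReader(io.StringIO(table_text))
    # Assumes unique sample names; the PEP spec's handling of duplicates is not modeled.
    return {row[index_col]: row for row in rows}


samples = index_samples(SAMPLE_TABLE)
print(samples["frog_2"]["file"])  # look up one sample by name: data/frog_2.fastq
for name, row in samples.items():  # iterate in table order
    print(name, row["protocol"])
```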
Benchmarks
Comparison of peppy (pure Python) and peprs (Rust bindings). Times are in seconds, averaged over 3 runs per sample size; column headers give the number of samples.
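The benchmark scripts themselves are not shown here. As a minimal sketch of the "averaged over 3 runs" methodology, assuming each measurement wraps a zero-argument callable, a timing helper could look like this (`mean_runtime` is a hypothetical name):

```python
import statistics
import time


def mean_runtime(fn, runs=3):
    """Call `fn` `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)
```

For example, `mean_runtime(lambda: load_project("project_config.yaml"))`, where `load_project` stands in for whichever peppy or peprs loading call is being measured.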
Initialization Time (seconds)
| Library | 5 | 20 | 100 | 500 | 1,000 | 5,000 | 10,000 | 50,000 | 100,000 | 600,000 |
|---|---|---|---|---|---|---|---|---|---|---|
| peppy | 0.019 | 0.026 | 0.096 | 0.428 | 0.851 | 4.226 | 8.700 | 44.017 | 87.613 | 297.433 |
| peprs | 0.003 | 0.002 | 0.002 | 0.003 | 0.004 | 0.014 | 0.036 | 0.043 | 0.068 | 0.339 |
| speedup | 7x | 15x | 50x | 149x | 196x | 306x | 244x | 1,021x | 1,288x | 877x |
Validation Time (seconds)
| Library | 5 | 20 | 100 | 500 | 1,000 | 5,000 | 10,000 | 50,000 | 100,000 | 600,000 |
|---|---|---|---|---|---|---|---|---|---|---|
| peppy | 0.004 | 0.006 | 0.017 | 0.070 | 0.166 | 0.685 | 1.380 | 6.928 | 14.208 | 84.452 |
| peprs | 0.012 | 0.001 | 0.002 | 0.008 | 0.008 | 0.038 | 0.079 | 0.423 | 0.794 | 4.339 |
| speedup | 0.4x | 9x | 10x | 9x | 20x | 18x | 17x | 16x | 18x | 19x |
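The speedup rows appear to be the peppy time divided by the peprs time. Recomputing that ratio from the rounded table entries reproduces the reported speedups closely at large sample sizes but drifts at small ones (for example 0.026 / 0.002 = 13 against the reported 15x), presumably because the published times are rounded to three decimals. A short check on the three largest initialization columns:

```python
def speedups(baseline, candidate):
    """Per-column speedup: baseline time divided by candidate time."""
    return [b / c for b, c in zip(baseline, candidate)]


# Initialization times for 50,000, 100,000 and 600,000 samples, from the table above.
peppy_init = [44.017, 87.613, 297.433]
peprs_init = [0.043, 0.068, 0.339]
print([round(s) for s in speedups(peppy_init, peprs_init)])  # prints [1024, 1288, 877]
```

The table reports 1,021x, 1,288x and 877x for these columns.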
🚀 Afterword
We are looking forward to integrating this project with WDL, Snakemake, and Nextflow. All contributions are welcome. Please open an issue or submit a pull request.