peprs - A spicy 🌶️ library for managing biological sample metadata to enable reproducible and scalable bioinformatics
Don't let sample metadata parsing bottleneck your pipelines!
About this project
peprs is a Rust implementation of the PEP specification and its expanded ecosystem. In short, PEP is a framework for managing biological sample metadata. PEP is a community-driven effort to create a fast, reliable, reusable, and scalable library for handling biological sample metadata.
PEP and its ecosystem are developed and maintained by the Databio team. As a challenge and learning experience, we have been rewriting the core components of the PEP ecosystem in Rust for performance and reliability.
We are starting with the core PEP specification for metadata management and will expand to include the full ecosystem (looper, pephub-client, pipestat). The core PEP specification is implemented in the peprs-core crate. The Python bindings are implemented in the peprs-py crate.
📦 Modules
- peprs-core: Core library implementing the PEP specification. With the core module, users can create PEP objects and manipulate them.
- peprs-eido: Schema-based validation of PEP projects against JSON schemas with eido-specific extensions (imports, tangible file checks).
- peprs-cli: Command-line interface with inspect, validate, and convert subcommands.
- peprs-py: Python bindings via PyO3. Exposes the Project class with full Polars/Pandas DataFrame interoperability.
- pephub-client: Work in progress.
⚙️ Installation
Python (recommended)
or with uv
Python (from source)
To build and install the Python package from source (requires maturin and a Rust toolchain):
Rust
Add peprs to your Cargo.toml as a git dependency pointing at https://github.com/pepkit/peprs.
CLI
Prebuilt binaries are published to GitHub Releases for Linux, macOS, Windows, and FreeBSD (x86_64 and aarch64).
Using ubi (cross-platform, no Rust required)
ubi auto-detects your platform, downloads the right archive, and installs peprs:
Using cargo-binstall
Manual download
Grab the archive for your platform from the releases page, extract it, and place the peprs binary on your PATH. For example on Linux x86_64:
From source
Using Python
🔧 Environment Variables
| Variable | Default | Description |
|---|---|---|
| PH_HOME | ~/.pephubclient/ | Directory where the PEPHub auth cache (jwt.toml) is stored. |
| PEPHUB_BASE_URL | https://pephub-api.databio.org | Overrides the PEPHub API endpoint used for login and Project.from_pephub(...). |
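As a rough illustration of how the defaults in the table above interact with the environment, here is a minimal Python sketch. It is not peprs' own implementation: the function name `resolve_pephub_settings` is hypothetical, and it assumes an unset variable falls back to the documented default and that jwt.toml sits directly inside the PH_HOME directory.

```python
import os
from pathlib import Path

# Defaults copied from the environment variable table.
DEFAULT_PH_HOME = "~/.pephubclient/"
DEFAULT_PEPHUB_BASE_URL = "https://pephub-api.databio.org"


def resolve_pephub_settings(env=None):
    """Hypothetical helper: resolve PEPHub settings from an environment mapping.

    Assumption: an unset variable means "use the documented default".
    """
    env = os.environ if env is None else env
    # Expand "~" so the default resolves to the current user's home directory.
    ph_home = Path(env.get("PH_HOME", DEFAULT_PH_HOME)).expanduser()
    base_url = env.get("PEPHUB_BASE_URL", DEFAULT_PEPHUB_BASE_URL)
    return {"auth_cache": ph_home / "jwt.toml", "base_url": base_url}


if __name__ == "__main__":
    settings = resolve_pephub_settings()
    print(settings["auth_cache"])
    print(settings["base_url"])
```

Passing a plain dictionary instead of `os.environ` makes it easy to check which values would apply before exporting anything, for example when pointing PEPHUB_BASE_URL at a self-hosted instance.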
🐍 Quick Python example
The Python Project class covers the following workflow:
- Load a PEP from a YAML config file
- Inspect the project, including the number of samples
- Get samples as a Polars DataFrame
- Get samples as a Pandas DataFrame
- Look up a single sample by name
- Iterate over samples
- Convert projects
Benchmarks
Comparison of peppy (pure Python) vs peprs (Rust bindings). Averaged over 3 runs per sample size.
Initialization Time (seconds)
| Library | 5 | 20 | 100 | 500 | 1,000 | 5,000 | 10,000 | 50,000 | 100,000 | 600,000 |
|---|---|---|---|---|---|---|---|---|---|---|
| peppy | 0.019 | 0.026 | 0.096 | 0.428 | 0.851 | 4.226 | 8.700 | 44.017 | 87.613 | 297.433 |
| peprs | 0.003 | 0.002 | 0.002 | 0.003 | 0.004 | 0.014 | 0.036 | 0.043 | 0.068 | 0.339 |
| speedup | 7x | 15x | 50x | 149x | 196x | 306x | 244x | 1,021x | 1,288x | 877x |
Validation Time (seconds)
| Library | 5 | 20 | 100 | 500 | 1,000 | 5,000 | 10,000 | 50,000 | 100,000 | 600,000 |
|---|---|---|---|---|---|---|---|---|---|---|
| peppy | 0.004 | 0.006 | 0.017 | 0.070 | 0.166 | 0.685 | 1.380 | 6.928 | 14.208 | 84.452 |
| peprs | 0.012 | 0.001 | 0.002 | 0.008 | 0.008 | 0.038 | 0.079 | 0.423 | 0.794 | 4.339 |
| speedup | 0.4x | 9x | 10x | 9x | 20x | 18x | 17x | 16x | 18x | 19x |
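The speedup rows are the ratio of the peppy time to the peprs time. The short sketch below recomputes that ratio from the rounded timings in the tables for the two largest sample sizes. The helper name `speedup` is illustrative only. For small sample sizes the recomputed ratio can differ from the published one, presumably because the published speedups were derived from unrounded timings.

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds


# (peppy, peprs) timings in seconds, copied from the benchmark tables.
initialization = {100_000: (87.613, 0.068), 600_000: (297.433, 0.339)}
validation = {100_000: (14.208, 0.794), 600_000: (84.452, 4.339)}

for label, timings in (("initialization", initialization), ("validation", validation)):
    for n_samples, (peppy_s, peprs_s) in timings.items():
        ratio = speedup(peppy_s, peprs_s)
        print(f"{label:>14} {n_samples:>7,} samples: {ratio:.0f}x")
```

A ratio below 1, as in the 5-sample validation column (0.4x), means peprs was slower than peppy for that measurement.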
🚀 Afterword
We are looking forward to integrating this project with WDL, Snakemake, and Nextflow. All contributions are welcome. Please open an issue or submit a pull request.