HPO
HPO, the Human Phenotype Ontology, is a standard vocabulary of phenotypic abnormalities in human diseases. It is an ontology, so all terms are connected to each other, similar to a directed graph.
This library provides convenient APIs to work with the ontology. The main goals are to compare terms (or sets of terms) to each other and to run statistics for enrichment analysis.
This library is essentially a Rust implementation of PyHPO, with some additional features.
Features
- Identify patient cohorts based on clinical features
- Cluster patients or other clinical information for GWAS
- Phenotype to Genotype studies
- HPO similarity analysis
- Graph based analysis of phenotypes, genes and diseases
- Enrichment analysis of genes and diseases in sets of HPO terms
- Completely written in Rust, so it's blazingly fast™ (Benchmarks)
What is the current state?
The library is pretty much feature-complete, at least for my use cases. If you have any feature requests, please open an issue or get in touch. I'm very interested in feedback and new ideas about what to improve.
The API is mostly stable, but I might refactor some parts for easier use and better performance.
If you find this project interesting and want to contribute, please get in touch. I could definitely use some help.
Documentation
The public API is fully documented on docs.rs.
The main structs used in hpo are:
- Ontology is the main struct and entrypoint in hpo.
- HpoTerm represents a single HPO term and contains plenty of functionality around it.
- HpoSet is a collection of HpoTerms, like a patient's clinical information.
- Gene represents a single gene, including information about associated HpoTerms.
- OmimDisease represents a single OMIM disease, including information about associated HpoTerms.
- OrphaDisease represents a single ORPHA disease, including information about associated HpoTerms.
The most relevant modules are:
- annotations contains the Gene, OmimDisease and OrphaDisease structs, and some related important types.
- similarity contains structs and helper functions for similarity comparisons for HpoTerm and HpoSet.
- stats contains functions to calculate the hypergeometric enrichment score of genes or diseases.
Examples
Some (more or less random) examples are included in the examples folder.
Ontology
use ;
use ;
HPO term
use Ontology;
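To illustrate what "terms connected like a directed graph" means in practice, here is a minimal, self-contained sketch that uses only the standard library. It does not use the hpo API: ToyOntology, example_ontology and the numeric term ids are made up for illustration. Each term stores its direct parents, and ancestors are collected by walking those edges upwards.

```rust
use std::collections::{BTreeSet, HashMap};

/// A toy ontology (hypothetical, not the `hpo` crate's type):
/// each term id maps to the ids of its direct parents.
pub struct ToyOntology {
    parents: HashMap<u32, Vec<u32>>,
}

impl ToyOntology {
    pub fn new() -> Self {
        Self { parents: HashMap::new() }
    }

    /// Registers `child` together with its direct parents ("is a" edges).
    pub fn add_term(&mut self, child: u32, parents: &[u32]) {
        self.parents.insert(child, parents.to_vec());
    }

    /// Collects every ancestor of `term` by following parent edges upwards.
    /// The term itself is not part of the result.
    pub fn ancestors(&self, term: u32) -> BTreeSet<u32> {
        let mut seen = BTreeSet::new();
        let mut stack = vec![term];
        while let Some(current) = stack.pop() {
            for &parent in self.parents.get(&current).into_iter().flatten() {
                // Only descend into parents we have not visited yet.
                if seen.insert(parent) {
                    stack.push(parent);
                }
            }
        }
        seen
    }

    /// Ancestors shared by both terms.
    pub fn common_ancestors(&self, a: u32, b: u32) -> BTreeSet<u32> {
        self.ancestors(a)
            .intersection(&self.ancestors(b))
            .copied()
            .collect()
    }
}

/// Builds a five-term example graph with made-up ids.
pub fn example_ontology() -> ToyOntology {
    let mut ontology = ToyOntology::new();
    ontology.add_term(1, &[]); // root term
    ontology.add_term(2, &[1]);
    ontology.add_term(3, &[1]);
    ontology.add_term(4, &[2, 3]); // two parents: a DAG, not a tree
    ontology.add_term(5, &[3]);
    ontology
}
```

With this example graph, ancestors(4) yields the terms 1, 2 and 3, and common_ancestors(4, 5) yields 1 and 3. Because a term can have several parents, the walk tracks visited terms so that shared ancestors are only collected once.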
Similarity
use Ontology;
use GraphIc;
use InformationContentKind;
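As a rough illustration of an information-content-based similarity score, the following sketch uses only the standard library and does not call the hpo API. It rests on two assumptions that are not stated in this README: that the information content of a term is the negative log of the fraction of genes or diseases annotated to it, and that a graph-based IC score divides the summed IC of shared ancestors by the summed IC of all ancestors of either term. The function names and the convention that each ancestor set includes the term itself are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// Information content under the assumption IC = -ln(annotated / total).
/// Terms without annotations get an IC of 0.0 here (an illustrative choice).
pub fn information_content(annotated: usize, total: usize) -> f64 {
    if annotated == 0 || total == 0 {
        return 0.0;
    }
    -((annotated as f64) / (total as f64)).ln()
}

/// Graph-based IC similarity sketch:
/// sum of IC over shared ancestors / sum of IC over all ancestors.
///
/// `a` and `b` are the ancestor sets of the two terms, each including
/// the term itself. `ic` maps a term id to its information content;
/// ids missing from the map count as 0.0.
pub fn graph_ic_similarity(
    a: &BTreeSet<u32>,
    b: &BTreeSet<u32>,
    ic: &HashMap<u32, f64>,
) -> f64 {
    let ic_of = |id: &u32| ic.get(id).copied().unwrap_or(0.0);
    let shared: f64 = a.intersection(b).map(&ic_of).sum();
    let all: f64 = a.union(b).map(&ic_of).sum();
    // Avoid dividing by zero when no ancestor carries any information.
    if all == 0.0 {
        0.0
    } else {
        shared / all
    }
}
```

Under these assumptions the score is 1.0 for two identical terms and approaches 0.0 when the terms only share uninformative ancestors such as the root, which is annotated to everything and therefore has an IC of zero.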
Enrichment
Identify which genes (or diseases) are enriched in a set of HpoTerms, e.g. in the clinical information of a patient or patient cohort.
use Ontology;
use ;
use gene_enrichment;
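The statistics behind a hypergeometric enrichment score can be sketched without the library. The code below is a standard-library-only illustration, not the hpo implementation: hypergeom_sf and its parameter names are hypothetical, and how the counts are derived from the ontology is an assumption. For one gene, population would be the number of background terms, successes the number of those terms annotated to the gene, draws the size of the term set, and observed the number of terms in the set that are annotated to the gene.

```rust
/// Natural log of n!, computed by summing logs.
/// Linear in n, which is acceptable for a sketch.
fn ln_factorial(n: u64) -> f64 {
    (2..=n).map(|i| (i as f64).ln()).sum()
}

/// Natural log of the binomial coefficient C(n, k), for k <= n.
fn ln_choose(n: u64, k: u64) -> f64 {
    ln_factorial(n) - ln_factorial(k) - ln_factorial(n - k)
}

/// Probability of seeing `observed` or more hits when drawing `draws`
/// items without replacement from `population` items, of which
/// `successes` count as hits (hypergeometric survival function).
pub fn hypergeom_sf(population: u64, successes: u64, draws: u64, observed: u64) -> f64 {
    assert!(successes <= population && draws <= population);
    let failures = population - successes;
    // At most this many hits are possible.
    let max_hits = successes.min(draws);
    // At least this many hits are forced when draws exceed the failures.
    let min_hits = draws.saturating_sub(failures);
    let start = observed.max(min_hits);
    let ln_total = ln_choose(population, draws);
    (start..=max_hits)
        .map(|k| (ln_choose(successes, k) + ln_choose(failures, draws - k) - ln_total).exp())
        .sum::<f64>()
        .min(1.0) // guard against rounding slightly above 1.0
}
```

A small value means the overlap is unlikely to arise by chance. For example, hypergeom_sf(10, 5, 5, 5) is 1/252, because only one of the 252 possible draws of five items contains all five hits. Working with log-factorials keeps the binomial coefficients from overflowing for ontology-sized inputs.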
Benchmarks
As the saying goes: "Make it work, make it good, make it fast". The "work" and "good" parts are realized in PyHPO. Even though I tried my best to make it fast, I was still hungry for more, so I started developing the hpo Rust library in December 2022. Even without the micro-benchmarking and performance tuning that went into PyHPO, hpo is already much faster.
The benchmarks below were not run scientifically, so your mileage may vary. I used a MacBook Air M1, rustc 1.68.0, Python 3.9 and /usr/bin/time for timing.
| Benchmark | PyHPO | hpo (single-threaded) | hpo (multi-threaded) |
|---|---|---|---|
| Read and Parse Ontology | 6.4 s | 0.22 s | 0.22 s |
| Similarity of 17,245 x 1,000 terms | 98.5 s | 4.6 s | 1.0 s |
| Similarity of GBA1 to all Diseases | 380 s | 15.8 s | 3.0 s |
| Disease enrichment in all Genes | 11.8 s | 0.4 s | 0.3 s |
| Common ancestors of 17,245 x 10,000 terms | 225.2 s | 10.5 s | 2.1 s |
Technical design
Some information about the implementation plans is available in the Technical Design document.