HPO
This library is a Rust implementation of PyHPO.
What is this?
HPO, the Human Phenotype Ontology, is a standard vocabulary of phenotypic abnormalities in human diseases. It is an ontology, so all terms are connected to each other, similar to a directed graph.
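To make the "directed graph" idea concrete, here is a minimal sketch of a toy ontology in which every term points to its direct parents. `ToyOntology`, `add_term` and `ancestors` are hypothetical names made up for this illustration, and the integer ids are placeholders; this is not the data model of this library.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy ontology: each term id maps to the ids of its direct parents.
/// Hypothetical type for illustration only, not the library's data model.
pub struct ToyOntology {
    parents: HashMap<u32, Vec<u32>>,
}

impl ToyOntology {
    pub fn new() -> Self {
        Self { parents: HashMap::new() }
    }

    /// Registers `child` together with its direct parents.
    pub fn add_term(&mut self, child: u32, parents: &[u32]) {
        self.parents.insert(child, parents.to_vec());
    }

    /// Collects every ancestor of `term` by walking the parent links upwards.
    pub fn ancestors(&self, term: u32) -> BTreeSet<u32> {
        let mut seen = BTreeSet::new();
        let mut stack = vec![term];
        while let Some(current) = stack.pop() {
            for &parent in self.parents.get(&current).into_iter().flatten() {
                // `insert` returns false for terms that were already visited,
                // so shared ancestors are expanded only once.
                if seen.insert(parent) {
                    stack.push(parent);
                }
            }
        }
        seen
    }
}
```

Because a term can have more than one parent, the structure is a directed acyclic graph rather than a tree, which is why the traversal tracks visited terms.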
This library provides convenient APIs to work with the ontology. The main goals are to compare terms (or sets of terms) to each other and run statistics for enrichment analysis.
For example, the terms "Migraine without aura" and "Migraine with aura" are more similar to each other than "Migraine" and "Seizure". To add more complexity, patients usually have more than one phenotypical abnormality. So in order to compare two patients to each other, we must cross-compare all individual terms. Eventually we might want to cluster hundreds or thousands of patients based on phenotypical similarity to predict diseases based on phenotypes and run statistical analyses.
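The cross-comparison of two patients can be sketched as follows. This is only an illustration under stated assumptions: each term is represented by the set of its ancestor ids (including itself), term similarity is a plain Jaccard index standing in for the real similarity algorithms, and the per-term scores are combined with a symmetric best-match average. The function names are hypothetical and not part of this library.

```rust
use std::collections::BTreeSet;

/// Jaccard index of two ancestor sets: |A ∩ B| / |A ∪ B|.
/// A simple stand-in for a real term similarity measure.
pub fn jaccard(a: &BTreeSet<u32>, b: &BTreeSet<u32>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// For each term in `from`, takes its best score against any term in `to`
/// and averages those best scores. Assumes scores are non-negative.
fn directed_best_match<T>(from: &[T], to: &[T], sim: &impl Fn(&T, &T) -> f64) -> f64 {
    if from.is_empty() || to.is_empty() {
        return 0.0;
    }
    let total: f64 = from
        .iter()
        .map(|a| to.iter().map(|b| sim(a, b)).fold(0.0, f64::max))
        .sum();
    total / from.len() as f64
}

/// Symmetric best-match average: the mean of both directed scores.
pub fn best_match_average<T>(a: &[T], b: &[T], sim: impl Fn(&T, &T) -> f64) -> f64 {
    (directed_best_match(a, b, &sim) + directed_best_match(b, a, &sim)) / 2.0
}
```

The cost grows with the product of the two set sizes, so clustering thousands of patients means a very large number of term-to-term comparisons.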
The PyHPO Python library provides functionality for these comparisons, including several different similarity and grouping algorithms. However, since it is written in Python, it is very slow. Unfortunately, the design of PyHPO does not allow multithreading or parallel processing, which makes scaling rather difficult.
I want to overcome these limitations here with a Rust library.
There is some info about the plans for the implementation in the Technical Design document.
If you find this project interesting and want to contribute, please get in touch. I could definitely use some help.
What is the current state?
At the moment, this library provides most of the functionality of PyHPO, and it does so much, much faster (blazingly fast). For example, to calculate the GraphIC similarity for 400 x 400 terms, PyHPO runs for about 40 seconds on my MacBook Air M1. This Rust-based hpo library finishes in less than 1 second. I can run a pairwise comparison of all 17,059 terms to each other in 13 seconds. PyHPO would need several hours for the same task.
hpo also allows multithreading, e.g. using rayon.
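As a rough illustration of why pairwise comparisons parallelize well, the sketch below splits the rows of a score matrix across worker threads. It deliberately uses only scoped threads from the standard library instead of rayon so that it has no dependencies; `pairwise_matrix` and its parameters are hypothetical names, and `sim` is a placeholder for any term similarity function.

```rust
use std::thread;

/// Computes the full pairwise score matrix for `items`,
/// splitting the rows across up to `n_threads` worker threads.
pub fn pairwise_matrix<T, F>(items: &[T], n_threads: usize, sim: F) -> Vec<Vec<f64>>
where
    T: Sync,
    F: Fn(&T, &T) -> f64 + Sync,
{
    let mut matrix = vec![vec![0.0; items.len()]; items.len()];
    if items.is_empty() {
        return matrix;
    }
    // Ceiling division, so every row lands in exactly one chunk.
    let workers = n_threads.max(1);
    let rows_per_chunk = (items.len() + workers - 1) / workers;
    let sim = &sim;
    thread::scope(|scope| {
        // Each chunk of rows is a disjoint mutable slice, so no locking is needed.
        for (chunk_idx, rows) in matrix.chunks_mut(rows_per_chunk).enumerate() {
            let first_row = chunk_idx * rows_per_chunk;
            scope.spawn(move || {
                for (offset, row) in rows.iter_mut().enumerate() {
                    let a = &items[first_row + offset];
                    for (cell, b) in row.iter_mut().zip(items) {
                        *cell = sim(a, b);
                    }
                }
            });
        }
    });
    matrix
}
```

Every cell depends only on read-only input, so the rows can be filled independently. With rayon the same idea is typically expressed through parallel iterators rather than manual chunking.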
You can check out some examples, including benchmarks, in the examples folder. I sometimes also include the corresponding Python code. As with all benchmarks, your mileage may vary, depending on your computer. But the overall trend stays the same.
API
The API is quite similar to the PyHPO functionality, but has several adaptations for more idiomatic Rust code and performance improvements.
Some examples are below; more examples can be found in the examples subfolder.
Most parts of the public API are documented inline and on docs.rs.
Ontology
use ;
use ;
HPO term
use hpo::Ontology;
Similarity
use hpo::Ontology;
use hpo::similarity::GraphIc;
use hpo::term::InformationContentKind;
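For readers unfamiliar with information-content-based similarity, here is a minimal sketch of the idea. It assumes that GraphIC is computed as the summed information content of the ancestors two terms share, divided by the summed information content of all ancestors of either term, and that information content is the negative log of a term's annotation frequency. The functions below are hypothetical helpers written for this illustration; they are not the library's implementation.

```rust
use std::collections::{BTreeSet, HashMap};

/// Information content of a term annotated to `annotated` out of `total`
/// entities (e.g. diseases): -ln(annotated / total).
/// Returning 0.0 for unannotated terms is a convention chosen for this sketch.
pub fn information_content(annotated: usize, total: usize) -> f64 {
    if annotated == 0 || total == 0 {
        return 0.0;
    }
    -(annotated as f64 / total as f64).ln()
}

/// Sums the information content of the given term ids; unknown ids count as 0.
fn summed_ic<'a>(ids: impl Iterator<Item = &'a u32>, ic: &HashMap<u32, f64>) -> f64 {
    ids.map(|id| ic.get(id).copied().unwrap_or(0.0)).sum()
}

/// GraphIC-style score: IC of shared ancestors divided by IC of all ancestors.
/// `a` and `b` are the ancestor sets of the two terms, including the terms themselves.
pub fn graph_ic(a: &BTreeSet<u32>, b: &BTreeSet<u32>, ic: &HashMap<u32, f64>) -> f64 {
    let union = summed_ic(a.union(b), ic);
    if union == 0.0 {
        return 0.0;
    }
    summed_ic(a.intersection(b), ic) / union
}
```

Rare, specific terms carry more information than general ones, so two terms that share a specific ancestor score higher than two terms that only share the root.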
Enrichment
Identify which genes (or diseases) are enriched in a set of HpoTerms, e.g. in the clinical information of a patient or patient cohort
use hpo::Ontology;
use ;
use hpo::stats::hypergeom::gene_enrichment;
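One common way to score enrichment is a one-sided hypergeometric test: given how many terms in the whole ontology are linked to a gene, how surprising is it to see at least the observed number of linked terms in the patient's set? The sketch below shows that calculation with the standard library only. Whether the library computes enrichment exactly this way is an assumption, and `hypergeom_sf` and its helpers are hypothetical names.

```rust
/// ln(n!) computed as a plain sum; fine for a sketch, slow for very large `n`.
fn ln_factorial(n: u64) -> f64 {
    (2..=n).map(|i| (i as f64).ln()).sum()
}

/// ln of the binomial coefficient "n choose k"; requires k <= n.
fn ln_choose(n: u64, k: u64) -> f64 {
    ln_factorial(n) - ln_factorial(k) - ln_factorial(n - k)
}

/// P(X >= observed) for a hypergeometric distribution:
/// `population` items in total, `successes` of them linked to the gene,
/// `draws` items in the sample, `observed` linked items seen in the sample.
pub fn hypergeom_sf(population: u64, successes: u64, draws: u64, observed: u64) -> f64 {
    assert!(successes <= population && draws <= population);
    let failures = population - successes;
    // The number of hits is bounded by the sample size and the number of successes.
    let max_hits = successes.min(draws);
    let min_hits = draws.saturating_sub(failures);
    let ln_total = ln_choose(population, draws);
    (observed.max(min_hits)..=max_hits)
        .map(|k| (ln_choose(successes, k) + ln_choose(failures, draws - k) - ln_total).exp())
        .sum()
}
```

Working in log space avoids overflowing the factorials. A small result means the gene's terms are over-represented in the sample compared to chance.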