§HPO
HPO, the Human Phenotype Ontology, is a standard vocabulary of phenotypic abnormalities in human diseases. It is an ontology, so all terms are connected to each other, similar to a directed graph.
This library provides convenient APIs to work with the ontology. The main goals are to compare terms - or sets of terms - to each other and run statistics for enrichment analysis.
This library is basically a Rust implementation of PyHPO, but contains some additional features as well.
§Features
- 👫 Identify patient cohorts based on clinical features
- 👨‍👧‍👦 Cluster patients or other clinical information for GWAS
- 🩻→🧬 Phenotype to Genotype studies
- 🍎🍊 HPO similarity analysis
- 🕸️ Graph based analysis of phenotypes, genes and diseases
- 🔬 Enrichment analysis of genes and diseases in sets of HPO terms
- Completely written in Rust, so it’s 🚀blazingly fast🚀™ (Benchmarks)
§What is the current state?
The library is largely feature-complete, at least for my use cases. If you have any feature requests, please open an issue or get in touch. I’m very interested in feedback and new ideas for what to improve.
The API is mostly stable, but I might refactor some parts for easier use and better performance.
If you find this project interesting and want to contribute, please get in touch; I could definitely use some help.
§Documentation
The public API is fully documented on docs.rs.
The main structs used in hpo are:
- The Ontology is the main struct and entrypoint in hpo.
- HpoTerm represents a single HPO term and contains plenty of functionality around it.
- HpoSet is a collection of HpoTerms, like a patient’s clinical information.
- Gene represents a single gene, including information about associated HpoTerms.
- OmimDisease represents a single OMIM disease, including information about associated HpoTerms.
- OrphaDisease represents a single ORPHA disease, including information about associated HpoTerms.
The most relevant modules are:
- annotations contains the Gene, OmimDisease and OrphaDisease structs, and some related important types.
- similarity contains structs and helper functions for similarity comparisons for HpoTerm and HpoSet.
- stats contains functions to calculate the hypergeometric enrichment score of genes or diseases.
§Examples
Some (more or less random) examples are included in the examples folder.
§Ontology
use hpo::{Ontology, HpoTermId};
use hpo::annotations::{GeneId, OmimDiseaseId, OrphaDiseaseId};

fn example() {
    let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap();

    // iterate HPO terms
    for term in &ontology {
        // do something with term
    }

    // iterate Genes
    for gene in ontology.genes() {
        // do something with gene
    }

    // iterate omim diseases
    for disease in ontology.omim_diseases() {
        // do something with disease
    }

    // iterate orpha diseases
    for disease in ontology.orpha_diseases() {
        // do something with disease
    }

    // get a single HPO term using HPO ID
    let hpo_id = HpoTermId::try_from("HP:0000123").unwrap();
    let term = ontology.hpo(hpo_id);

    // get a single HPO term using `u32` part of HPO ID
    let term = ontology.hpo(123u32);

    // get a single Omim disease
    let disease_id = OmimDiseaseId::from(12345u32);
    let disease = ontology.omim_disease(&disease_id);

    // get a single Orpha disease
    let disease_id = OrphaDiseaseId::from(12345u32);
    let disease = ontology.orpha_disease(&disease_id);

    // get a single Gene
    let hgnc_id = GeneId::from(12345u32);
    let gene = ontology.gene(&hgnc_id);

    // get a single Gene by its symbol
    let gene = ontology.gene_by_name("GBA");
}
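The example above looks up the same term from the string `HP:0000123` and from the number `123`. The following is a minimal, std-only sketch of how such an ID can map between the two forms. `TermId` is a hypothetical stand-in and not the crate's `HpoTermId`, and the strict "exactly seven digits" check is an assumption of this sketch; the crate's own parsing rules may differ.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical stand-in for an HPO term ID; not the crate's `HpoTermId`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct TermId(u32);

impl TryFrom<&str> for TermId {
    type Error = String;

    /// Parses `HP:` followed by exactly seven ASCII digits (assumed format).
    fn try_from(s: &str) -> Result<Self, Self::Error> {
        let digits = s
            .strip_prefix("HP:")
            .ok_or_else(|| format!("missing `HP:` prefix in {s:?}"))?;
        if digits.len() != 7 || !digits.bytes().all(|b| b.is_ascii_digit()) {
            return Err(format!("expected seven digits in {s:?}"));
        }
        digits
            .parse::<u32>()
            .map(TermId)
            .map_err(|e| format!("invalid number in {s:?}: {e}"))
    }
}

impl From<u32> for TermId {
    fn from(n: u32) -> Self {
        TermId(n)
    }
}

impl fmt::Display for TermId {
    /// Formats the numeric part zero-padded to seven digits.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "HP:{:07}", self.0)
    }
}
```

With this sketch, `TermId::try_from("HP:0000123")` and `TermId::from(123u32)` compare equal, and both print as `HP:0000123`.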
§HPO term
use hpo::Ontology;

fn example() {
    let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap();

    let term = ontology.hpo(123u32).unwrap();

    // HP:0000123 is the term "Nephritis"
    assert_eq!("Nephritis", term.name());
    // HPO IDs are printed with a seven-digit numeric part
    assert_eq!("HP:0000123".to_string(), term.id().to_string());

    // iterate all parents
    for p in term.parents() {
        println!("{}", p.name())
    }

    // iterate all children
    for c in term.children() {
        println!("{}", c.name())
    }

    // HP:0000001 is the root term of the ontology
    let term2 = ontology.hpo(1u32).unwrap();

    assert!(term2.parent_of(&term));
    assert!(term.child_of(&term2));
}
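The assertions above treat the root term as a parent of HP:0000123, which suggests that the parent/child checks follow the hierarchy transitively. The following is a minimal, std-only sketch of that idea on a toy graph that stores both edge directions. `ToyOntology` and its methods are hypothetical names, the term IDs and edges are arbitrary, and none of this is the crate's actual data structure.

```rust
use std::collections::{HashMap, HashSet};

/// Toy ontology storing both edge directions; a hypothetical stand-in,
/// not the crate's `Ontology`.
#[derive(Default)]
struct ToyOntology {
    parents: HashMap<u32, Vec<u32>>,
    children: HashMap<u32, Vec<u32>>,
}

impl ToyOntology {
    /// Records the `child -> parent` edge and the reverse `parent -> child` edge.
    fn add_edge(&mut self, child: u32, parent: u32) {
        self.parents.entry(child).or_default().push(parent);
        self.children.entry(parent).or_default().push(child);
    }

    /// Collects every ancestor of `id` by walking parent edges.
    /// The term itself is not included.
    fn ancestors(&self, id: u32) -> HashSet<u32> {
        let mut seen = HashSet::new();
        let mut stack = vec![id];
        while let Some(current) = stack.pop() {
            for &parent in self.parents.get(&current).into_iter().flatten() {
                // `insert` returns false for already visited terms,
                // so shared ancestors are only expanded once.
                if seen.insert(parent) {
                    stack.push(parent);
                }
            }
        }
        seen
    }

    /// True if `a` is a direct or indirect parent of `b`.
    fn is_ancestor_of(&self, a: u32, b: u32) -> bool {
        self.ancestors(b).contains(&a)
    }
}
```

Because a term can have several parents, the traversal tracks visited terms instead of assuming a tree.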
§Similarity
use hpo::Ontology;
use hpo::similarity::GraphIc;
use hpo::term::InformationContentKind;

fn example() {
    let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap();
    let term1 = ontology.hpo(123u32).unwrap();
    let term2 = ontology.hpo(1u32).unwrap();

    let ic = GraphIc::new(InformationContentKind::Omim);
    let similarity = term1.similarity_score(&term2, &ic);
}
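To give an intuition for information-content-based similarity, here is a minimal, std-only sketch of the classic Resnik measure: the similarity of two terms is the highest information content among their common ancestors. This is not the GraphIc algorithm used above and not the crate's implementation; all function names, the toy graph layout, and the convention of returning `0.0` for terms without annotations are assumptions of this sketch.

```rust
use std::collections::{HashMap, HashSet};

/// Information content of a term annotated to `annotated` out of `total`
/// items: -ln(annotated / total). Returns 0.0 when there is no annotation
/// (a convention chosen for this sketch).
fn information_content(annotated: usize, total: usize) -> f64 {
    if annotated == 0 || total == 0 {
        return 0.0;
    }
    -((annotated as f64) / (total as f64)).ln()
}

/// Returns `id` together with all of its ancestors.
fn self_and_ancestors(parents: &HashMap<u32, Vec<u32>>, id: u32) -> HashSet<u32> {
    let mut seen = HashSet::from([id]);
    let mut stack = vec![id];
    while let Some(current) = stack.pop() {
        for &p in parents.get(&current).into_iter().flatten() {
            if seen.insert(p) {
                stack.push(p);
            }
        }
    }
    seen
}

/// Resnik similarity: the highest IC among the common ancestors of `a` and `b`.
/// Terms missing from `ic` are ignored.
fn resnik(
    parents: &HashMap<u32, Vec<u32>>,
    ic: &HashMap<u32, f64>,
    a: u32,
    b: u32,
) -> f64 {
    let anc_a = self_and_ancestors(parents, a);
    let anc_b = self_and_ancestors(parents, b);
    anc_a
        .intersection(&anc_b)
        .filter_map(|id| ic.get(id).copied())
        .fold(0.0, f64::max)
}
```

Rare terms carry a high information content, so two terms that share a specific ancestor score higher than two terms that only share the root.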
§Enrichment
Identify which genes (or diseases) are enriched in a set of HpoTerms, e.g. in the clinical information of a patient or patient cohort.
use hpo::Ontology;
use hpo::{HpoSet, term::HpoGroup};
use hpo::stats::hypergeom::gene_enrichment;

fn example() {
    let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap();

    let mut hpos = HpoGroup::new();
    hpos.insert(2943u32);
    hpos.insert(8458u32);
    hpos.insert(100884u32);
    hpos.insert(2944u32);
    hpos.insert(2751u32);
    let patient_ci = HpoSet::new(&ontology, hpos);

    let mut enrichments = gene_enrichment(&ontology, &patient_ci);

    // the results are not sorted by default
    enrichments.sort_by(|a, b| {
        a.pvalue().partial_cmp(&b.pvalue()).unwrap()
    });

    for gene in enrichments {
        println!("{}\t{}\t({})", gene.id(), gene.pvalue(), gene.enrichment());
    }
}
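For readers unfamiliar with the hypergeometric test behind such an enrichment, the following is a minimal, std-only sketch of the upper-tail probability P(X >= observed). It is not the crate's implementation; the function names and the way the four counts are defined are assumptions of this sketch. It also shows sorting p-values with `f64::total_cmp`, which does not panic on NaN the way `partial_cmp(..).unwrap()` would.

```rust
/// ln(n!) computed by direct summation; fine for the small counts of a sketch.
fn ln_factorial(n: u64) -> f64 {
    (2..=n).map(|i| (i as f64).ln()).sum()
}

/// ln of the binomial coefficient C(n, k); requires k <= n.
fn ln_choose(n: u64, k: u64) -> f64 {
    ln_factorial(n) - ln_factorial(k) - ln_factorial(n - k)
}

/// Upper-tail hypergeometric probability P(X >= observed).
///
/// `population`: total number of items, `successes`: items carrying the
/// annotation, `draws`: size of the sample, `observed`: annotated items
/// found in the sample. Requires `successes <= population` and
/// `draws <= population`.
fn hypergeom_sf(population: u64, successes: u64, draws: u64, observed: u64) -> f64 {
    let max_hits = successes.min(draws);
    let denominator = ln_choose(population, draws);
    (observed..=max_hits)
        // skip impossible outcomes where more failures are drawn than exist
        .filter(|&i| draws - i <= population - successes)
        .map(|i| {
            (ln_choose(successes, i)
                + ln_choose(population - successes, draws - i)
                - denominator)
                .exp()
        })
        .sum()
}

/// Sorts (label, p-value) pairs ascending by p-value.
/// `total_cmp` defines a total order on f64, so NaN cannot cause a panic.
fn sort_by_pvalue(results: &mut [(&str, f64)]) {
    results.sort_by(|a, b| a.1.total_cmp(&b.1));
}
```

As a usage example, `hypergeom_sf(50, 5, 10, 1)` gives the probability of seeing at least one annotated item when drawing 10 out of 50 items, 5 of which are annotated.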
§Benchmarks
As the saying goes: “Make it work, make it good, make it fast”. The work and good parts are realized in PyHPO. And even though I tried my best to make it fast, I was still hungry for more. So I started developing the hpo Rust library in December 2022. Even without micro-benchmarking and tuning performance as much as I did for PyHPO, hpo is already much faster.
The benchmarks below were not run scientifically, and your mileage may vary. I used a MacBook Air M1, rustc 1.68.0, Python 3.9 and /usr/bin/time for timing.
| Benchmark | PyHPO | hpo (single-threaded) | hpo (multi-threaded) |
|---|---|---|---|
| Read and Parse Ontology | 6.4 s | 0.22 s | 0.22 s |
| Similarity of 17,245 x 1,000 terms | 98.5 s | 4.6 s | 1.0 s |
| Similarity of GBA1 to all Diseases | 380 s | 15.8 s | 3.0 s |
| Disease enrichment in all Genes | 11.8 s | 0.4 s | 0.3 s |
| Common ancestors of 17,245 x 10,000 terms | 225.2 s | 10.5 s | 2.1 s |
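The table distinguishes single-threaded and multi-threaded runs, but does not say how the work was parallelized. As one possible illustration, the following std-only sketch splits the rows of a pairwise comparison across scoped threads. The function name `pairwise_parallel`, its parameters, and the chunking strategy are all assumptions of this sketch and say nothing about how the benchmarks above were actually implemented.

```rust
use std::thread;

/// Applies `score` to every (row, column) pair, splitting the rows across
/// up to `n_threads` scoped threads. Output rows keep the input order.
fn pairwise_parallel<F>(
    rows: &[u32],
    cols: &[u32],
    n_threads: usize,
    score: F,
) -> Vec<Vec<f64>>
where
    F: Fn(u32, u32) -> f64 + Sync,
{
    let n = n_threads.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_size = ((rows.len() + n - 1) / n).max(1);
    let score = &score;

    thread::scope(|s| {
        // Spawn one worker per chunk of rows before joining any of them.
        let handles: Vec<_> = rows
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|&r| cols.iter().map(|&c| score(r, c)).collect::<Vec<f64>>())
                        .collect::<Vec<_>>()
                })
            })
            .collect();

        // Joining in spawn order keeps the output rows in input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Each pair is independent of all others, which is why this kind of workload scales well with the number of threads.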
§Technical design
There is some information about the implementation plans in the Technical Design document.
Modules§
- annotations: Genes and Diseases are linked to HPO terms and make up secondary annotations
- builder: Builder can be used to manually create custom Ontologies
- comparison: Compare two versions of the HPO Ontology to each other
- matrix: A custom matrix for quick row and column-based data access
- similarity: Methods to calculate the Similarity between two terms or sets of terms
- stats: Statistical analyses for HpoTerm annotations and enrichment
- term: HpoTerms are the main building block of the Ontology. Each term is a descendant (child) of at least one other term (except for the root term HP:0000001 | All). The relationship is modeled bidirectionally in hpo, so that every term also has one or several children (except for all leaf terms).
- utils: Utility structs and methods
Structs§
- HpoSet: A set of unique HPO terms
- HpoTerm: The HpoTerm represents a single term from the HP Ontology
- HpoTermId: The ID of an HPO-Term (e.g. HP:0000123)
- Ontology: Ontology is the main interface of the hpo crate and contains all data
Enums§
- HpoError: Main Error type for this crate
Constants§
- PHENOTYPE_ID: The HpoTermId of HP:0000118 | Phenotypic abnormality
Type Aliases§
- HpoResult: Shortcut for Result<T, HpoError>