hpo 0.6.2

Human Phenotype Ontology Similarity

HPO

This library is a Rust implementation of PyHPO.

What is this?

HPO, the Human Phenotype Ontology, is a standard vocabulary of phenotypic abnormalities in human diseases. It is an ontology, so all terms are connected to each other, similar to a directed graph.
This library provides convenient APIs to work with the ontology. The main goals are to compare terms (or sets of terms) to each other and run statistics for enrichment analysis.

For example, the terms "Migraine without aura" and "Migraine with aura" are more similar to each other than "Migraine" and "Seizure". To add more complexity, patients usually have more than one phenotypical abnormality. So in order to compare two patients to each other, we must cross-compare all individual terms. Eventually we might want to cluster hundreds or thousands of patients based on phenotypical similarity to predict diseases based on phenotypes and run statistical analyses.
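The cross-comparison described above can be sketched without the library. The following is a minimal, dependency-free illustration and not the implementation used by this crate: terms are plain `u32` IDs, `toy_similarity` is a hypothetical stand-in for a real term similarity, and the set score uses a best-match average, which is one common way to combine pairwise scores.

```rust
/// Hypothetical stand-in for a real term-to-term similarity in (0, 1]:
/// identical IDs score 1.0, everything else decays with the ID distance.
fn toy_similarity(a: u32, b: u32) -> f64 {
    1.0 / (1.0 + a.abs_diff(b) as f64)
}

/// For every term in `from`, take its best match in `to`, then average.
fn best_match_average_one_way(from: &[u32], to: &[u32]) -> f64 {
    if from.is_empty() || to.is_empty() {
        return 0.0;
    }
    let total: f64 = from
        .iter()
        .map(|&a| {
            to.iter()
                .map(|&b| toy_similarity(a, b))
                .fold(0.0, f64::max)
        })
        .sum();
    total / from.len() as f64
}

/// Symmetric set similarity: the mean of both directions.
fn set_similarity(patient_a: &[u32], patient_b: &[u32]) -> f64 {
    let a_to_b = best_match_average_one_way(patient_a, patient_b);
    let b_to_a = best_match_average_one_way(patient_b, patient_a);
    (a_to_b + b_to_a) / 2.0
}

fn main() {
    // Two hypothetical patients, each described by a few term IDs
    let patient_a = [10u32, 20, 30];
    let patient_b = [10u32, 21];
    println!("{:.3}", set_similarity(&patient_a, &patient_b));
}
```

Every term of one patient is compared to every term of the other, so the cost grows with the product of the two set sizes. This is why clustering thousands of patients quickly becomes expensive.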

The PyHPO Python library provides functionality for these comparisons, including several different similarity and grouping algorithms. However, since it's written in Python, it is very slow. Unfortunately, the design of PyHPO does not allow multithreading or parallel processing, which makes scaling rather difficult.

I want to overcome these limitations here with a Rust library.

There is some info about the plans for the implementation in the Technical Design document.

If you find this project interesting and want to contribute, please get in touch. I could definitely use some help.

What is the current state?

At the moment, this library provides most of the functionality of PyHPO, and it does so much faster. For example, calculating the GraphIC similarity for 400 x 400 terms takes PyHPO about 40 seconds on my MacBook Air M1. This Rust-based hpo library finishes in less than 1 second. I can run a pairwise comparison of all 17,059 terms to each other in 13 seconds; PyHPO would need several hours for the same task. hpo also allows multithreading, e.g. using rayon.
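To illustrate why pairwise comparisons parallelize well, here is a minimal sketch that splits the rows of a similarity matrix across threads. It uses scoped threads from the standard library instead of rayon to stay dependency-free, and `toy_similarity` and `pairwise_parallel` are hypothetical names that are not part of this crate. With rayon, the outer loop would typically become a parallel iterator instead.

```rust
use std::thread;

/// Hypothetical stand-in for a real term similarity.
fn toy_similarity(a: u32, b: u32) -> f64 {
    1.0 / (1.0 + a.abs_diff(b) as f64)
}

/// Compute the full pairwise matrix, splitting the rows across `n_threads`.
fn pairwise_parallel(terms: &[u32], n_threads: usize) -> Vec<Vec<f64>> {
    let mut matrix = vec![vec![0.0; terms.len()]; terms.len()];
    if terms.is_empty() {
        return matrix;
    }
    // Rows per worker, rounded up; `max(1)` guards against `n_threads == 0`
    let workers = n_threads.max(1);
    let chunk_size = (terms.len() + workers - 1) / workers;

    thread::scope(|scope| {
        // Each worker gets exclusive access to its own block of rows
        for (chunk_idx, rows) in matrix.chunks_mut(chunk_size).enumerate() {
            scope.spawn(move || {
                for (offset, row) in rows.iter_mut().enumerate() {
                    let a = terms[chunk_idx * chunk_size + offset];
                    for (cell, &b) in row.iter_mut().zip(terms) {
                        *cell = toy_similarity(a, b);
                    }
                }
            });
        }
    });
    matrix
}

fn main() {
    let terms: Vec<u32> = (1..=400).collect();
    let matrix = pairwise_parallel(&terms, 4);
    println!("computed {} x {} scores", matrix.len(), matrix[0].len());
}
```

Each row depends only on read-only input, so the workers need no locks or shared mutable state.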

You can check out some examples, including benchmarks, in the examples folder. I sometimes also include the corresponding Python code. As with all benchmarks, your mileage may vary, depending on your computer. But the overall trend stays the same.

API

The API is quite similar to the PyHPO functionality, but has several adaptations for more idiomatic Rust code and performance improvements. Some examples are below; more can be found in the examples subfolder. Most parts of the public API are documented inline and on docs.rs.

Ontology

use hpo::{Ontology, HpoTermId};
use hpo::annotations::{GeneId, OmimDiseaseId};

fn example() {
    let ontology = Ontology::from_standard("/path/to/master-data/").unwrap();

    // iterate HPO terms
    for term in &ontology {
        // do something with term
    }

    // iterate Genes
    for gene in ontology.genes() {
        // do something with gene
    }

    // iterate omim diseases
    for disease in ontology.omim_diseases() {
        // do something with disease
    }

    // get a single HPO term using HPO ID
    let hpo_id = HpoTermId::try_from("HP:0000123").unwrap();
    let term = ontology.hpo(hpo_id);

    // get a single HPO term using `u32` part of HPO ID
    let hpo_id = HpoTermId::from(123u32);
    let term = ontology.hpo(hpo_id);

    // get a single Omim disease
    let disease_id = OmimDiseaseId::from(12345u32);
    let disease = ontology.omim_disease(&disease_id);

    // get a single Gene
    let hgnc_id = GeneId::from(12345u32);
    let gene = ontology.gene(&hgnc_id);

    // get a single Gene by its symbol
    let gene = ontology.gene_by_name("GBA");

}

HPO term

use hpo::Ontology;

fn example() {
    let ontology = Ontology::from_binary("/path/to/binary.hpo").unwrap();

    let term = ontology.hpo(123u32.into()).unwrap();

    assert_eq!("Abnormality of the nervous system", term.name());
    // HPO IDs are zero-padded to 7 digits
    assert_eq!("HP:0000123".to_string(), term.id().to_string());

    // iterate all parents
    for p in term.parents() {
        println!("{}", p.name())
    }

    // iterate all children
    for p in term.children() {
        println!("{}", p.name())
    }

    let term2 = ontology.hpo(1u32.into()).unwrap();

    assert!(term2.parent_of(&term));
    assert!(term.child_of(&term2));
}

Similarity

use hpo::Ontology;
use hpo::similarity::GraphIc;
use hpo::term::InformationContentKind;

fn example() {
    let ontology = Ontology::from_binary("/path/to/binary.hpo").unwrap();
    let term1 = ontology.hpo(123u32.into()).unwrap();
    let term2 = ontology.hpo(1u32.into()).unwrap();

    let ic = GraphIc::new(InformationContentKind::Omim);
    let similarity = term1.similarity_score(&term2, &ic);
}
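GraphIC is usually described as the summed information content (IC) of the ancestors two terms share, divided by the summed IC of all ancestors of either term. The sketch below illustrates that ratio on a hand-built toy ontology. It is an assumption-laden illustration and not this crate's implementation: the `ToyOntology` type, the term IDs and the IC values are all invented.

```rust
use std::collections::{HashMap, HashSet};

/// Toy ontology: each term ID maps to its direct parents and to an
/// information content (IC) value. All numbers are made up.
struct ToyOntology {
    parents: HashMap<u32, Vec<u32>>,
    ic: HashMap<u32, f64>,
}

impl ToyOntology {
    /// The term itself plus all of its ancestors.
    fn ancestors_and_self(&self, term: u32) -> HashSet<u32> {
        let mut seen = HashSet::new();
        let mut stack = vec![term];
        while let Some(current) = stack.pop() {
            if seen.insert(current) {
                if let Some(direct) = self.parents.get(&current) {
                    stack.extend(direct);
                }
            }
        }
        seen
    }

    /// Sum the IC of the given terms; unknown terms count as 0.0.
    fn ic_sum<'a>(&self, terms: impl Iterator<Item = &'a u32>) -> f64 {
        terms.map(|t| self.ic.get(t).copied().unwrap_or(0.0)).sum()
    }

    /// IC of the shared ancestors divided by IC of all ancestors.
    fn graph_ic(&self, a: u32, b: u32) -> f64 {
        let anc_a = self.ancestors_and_self(a);
        let anc_b = self.ancestors_and_self(b);
        let shared = self.ic_sum(anc_a.intersection(&anc_b));
        let all = self.ic_sum(anc_a.union(&anc_b));
        if all == 0.0 { 0.0 } else { shared / all }
    }
}

/// Root 1, child 2, and two siblings 3 and 4 below term 2.
fn toy_ontology() -> ToyOntology {
    ToyOntology {
        parents: HashMap::from([(2, vec![1]), (3, vec![2]), (4, vec![2])]),
        ic: HashMap::from([(1, 0.0), (2, 1.0), (3, 2.0), (4, 3.0)]),
    }
}

fn main() {
    let ontology = toy_ontology();
    println!("siblings:  {:.3}", ontology.graph_ic(3, 4));
    println!("identical: {:.3}", ontology.graph_ic(3, 3));
}
```

In this toy model, terms that share specific (high-IC) ancestors score close to 1, while terms that only share the root score close to 0.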

Enrichment

Identify which genes (or diseases) are enriched in a set of HPO terms, e.g. in the clinical information of a patient or patient cohort.

use hpo::Ontology;
use hpo::{HpoSet, term::HpoGroup};
use hpo::stats::hypergeom::gene_enrichment;

fn example() {
    let ontology = Ontology::from_binary("/path/to/binary.hpo").unwrap();

    let mut hpos = HpoGroup::new();
    hpos.insert(2943u32.into());
    hpos.insert(8458u32.into());
    hpos.insert(100884u32.into());
    hpos.insert(2944u32.into());
    hpos.insert(2751u32.into());
    let patient_ci = HpoSet::new(&ontology, hpos);

    let mut enrichments = gene_enrichment(&ontology, &patient_ci);

    // the results are not sorted by default
    enrichments.sort_by(|a, b| {
        a.pvalue().partial_cmp(&b.pvalue()).unwrap()
    });

    for gene in enrichments {
        println!("{}\t{}\t({})", gene.id(), gene.pvalue(), gene.enrichment());
    }
}
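The module path `stats::hypergeom` points to a hypergeometric test. As an illustration of that statistic, and not of this crate's code, the following sketch computes the probability of drawing at least `observed` annotated terms when `draws` terms are taken from a population that contains `successes` annotated ones. All function and parameter names are hypothetical.

```rust
/// Natural log of the binomial coefficient C(n, k).
/// Returns negative infinity when k > n, so that `exp` yields 0.
fn ln_choose(n: u64, k: u64) -> f64 {
    if k > n {
        return f64::NEG_INFINITY;
    }
    let k = k.min(n - k);
    (0..k)
        .map(|i| ((n - i) as f64).ln() - ((i + 1) as f64).ln())
        .sum()
}

/// P(X >= observed) for a hypergeometric distribution.
///
/// - `population`: total number of items (e.g. all terms)
/// - `successes`: items in the population with the property of interest
/// - `draws`: size of the sample (e.g. the patient's terms)
/// - `observed`: sampled items that have the property
fn hypergeom_sf(population: u64, successes: u64, draws: u64, observed: u64) -> f64 {
    assert!(successes <= population && draws <= population);
    let ln_total = ln_choose(population, draws);
    let upper = successes.min(draws);
    let p: f64 = (observed..=upper)
        .map(|i| {
            let ln_ways =
                ln_choose(successes, i) + ln_choose(population - successes, draws - i);
            (ln_ways - ln_total).exp()
        })
        .sum();
    // Guard against rounding slightly above 1.0
    p.min(1.0)
}

fn main() {
    // 10 terms overall, 4 linked to a gene, patient has 3 terms, 2 of them linked
    println!("{:.4}", hypergeom_sf(10, 4, 3, 2));
}
```

Working with logarithms keeps the binomial coefficients from overflowing for ontology-sized populations. A small p-value means the overlap is unlikely to occur by chance.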