hpo 0.2.0

Human Phenotype Ontology Similarity

HPO

This crate is a draft for a Rust implementation of PyHPO.

Warning: The library is a work in progress and many function signatures will change. Many functions can still panic in one way or another, and I used unwrap a lot. This made prototyping easier, but I'm slowly refactoring those panics out.

If you find this project interesting and want to contribute, please get in touch; I could definitely use some help. The code is not yet well documented and does not yet have many tests. At the moment, I'm primarily trying to get a working PoC. Once I'm there, I will adjust many method names and functionality and add more documentation and tests. The library does not contain any error handling and uses unwrap a lot - I plan to change this once I am ready to stabilize the overall API a bit more.

If you have another use case for the hpo crate namespace and would like to use it, please let me know. I don't want to block the crate name if there are better use cases for it.

What is this?

HPO, the Human Phenotype Ontology, is a standard vocabulary of phenotypic abnormalities in human diseases. It is an ontology, so all terms are connected to each other, similar to a directed graph.
This crate aims to provide convenient APIs for working with the ontology. The main goals are to compare terms to each other and to compare groups of terms to each other. For example, the terms Migraine without aura and Migraine with aura are more similar to each other than Migraine and Seizure. To add more complexity, patients usually have more than one phenotypic abnormality. So in order to compare two patients to each other, we must cross-compare all of their individual terms. Eventually we might want to cluster hundreds or thousands of patients based on phenotypic similarity.
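
To make the idea of comparing terms and groups of terms concrete, here is a minimal, self-contained sketch. It does not use this crate's API. The toy ontology, the Jaccard-style term score and the one-directional best-match average are all assumptions chosen for illustration, and the names `ToyOntology`, `toy_ontology`, `ancestors`, `term_similarity` and `group_similarity` are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Toy ontology: maps each term to its direct parents.
/// The terms and edges are made up for illustration, they are not real HPO data.
type ToyOntology = HashMap<&'static str, Vec<&'static str>>;

fn toy_ontology() -> ToyOntology {
    HashMap::from([
        ("Phenotypic abnormality", vec![]),
        ("Headache", vec!["Phenotypic abnormality"]),
        ("Seizure", vec!["Phenotypic abnormality"]),
        ("Migraine", vec!["Headache"]),
        ("Migraine with aura", vec!["Migraine"]),
        ("Migraine without aura", vec!["Migraine"]),
    ])
}

/// Collect a term together with all of its ancestors by walking the parent edges.
fn ancestors<'a>(onto: &ToyOntology, term: &'a str) -> HashSet<&'a str> {
    let mut seen = HashSet::new();
    let mut stack = vec![term];
    while let Some(current) = stack.pop() {
        if seen.insert(current) {
            if let Some(parents) = onto.get(current) {
                for parent in parents {
                    stack.push(*parent);
                }
            }
        }
    }
    seen
}

/// Assumed term score: shared ancestors divided by all ancestors of either term.
fn term_similarity<'a>(onto: &ToyOntology, a: &'a str, b: &'a str) -> f64 {
    let (x, y) = (ancestors(onto, a), ancestors(onto, b));
    let shared = x.intersection(&y).count() as f64;
    let total = x.union(&y).count() as f64;
    shared / total
}

/// Cross-compare two groups of terms: for every term in `a`, take its best
/// match in `b`, then average those best scores (one direction only).
fn group_similarity(onto: &ToyOntology, a: &[&str], b: &[&str]) -> f64 {
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    let total: f64 = a
        .iter()
        .map(|x| {
            b.iter()
                .map(|y| term_similarity(onto, x, y))
                .fold(0.0, f64::max)
        })
        .sum();
    total / a.len() as f64
}
```

With this toy data, `term_similarity` gives 0.6 for the two migraine subtypes and 0.25 for Migraine versus Seizure, which matches the intuition described above. A real implementation would weight terms by information content rather than counting ancestors.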

The PyHPO Python library provides functionality for these comparisons, including several different similarity and grouping algorithms. However, since it is written in Python, it is rather slow. Unfortunately, the design of PyHPO does not allow multithreading or parallel processing, which makes scaling rather difficult.
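
As a rough illustration of why parallel processing helps here, the following sketch fills a pairwise score matrix using scoped threads from the standard library. It is not part of this crate: `pairwise_parallel` is a hypothetical helper, the scoring closure is a stand-in for any term or patient similarity, and it assumes the compared items can be shared between threads (`Sync`).

```rust
use std::thread;

/// Compute all pairwise scores between `items`, splitting the rows of the
/// result matrix over up to `workers` threads (hypothetical helper).
fn pairwise_parallel<T, F>(items: &[T], workers: usize, score: F) -> Vec<Vec<f64>>
where
    T: Sync,
    F: Fn(&T, &T) -> f64 + Sync,
{
    let n = items.len();
    let mut matrix = vec![vec![0.0; n]; n];
    if n == 0 {
        return matrix;
    }
    // Round up so that every row is assigned to some worker.
    let workers = workers.max(1);
    let rows_per_worker = (n + workers - 1) / workers;
    let score = &score;

    thread::scope(|s| {
        for (chunk_idx, rows) in matrix.chunks_mut(rows_per_worker).enumerate() {
            // Each thread owns a disjoint block of rows, so no locking is needed.
            s.spawn(move || {
                for (offset, row) in rows.iter_mut().enumerate() {
                    let i = chunk_idx * rows_per_worker + offset;
                    for (j, cell) in row.iter_mut().enumerate() {
                        *cell = score(&items[i], &items[j]);
                    }
                }
            });
        }
    });
    matrix
}
```

All threads are joined when `thread::scope` returns, so the matrix is complete before it is handed back. For thousands of patients, a work-stealing pool would likely balance load better than fixed row blocks.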

I want to overcome these limitations here with a Rust library.

There is some info about the plans for the implementation in the Technical Design document.

API suggestions

At the moment, not all functionality in here is working, but most of it is. Check out the examples folder in the GitHub repository for some ways to use it.

Ontology

use hpo::Ontology;
use hpo::annotations::{GeneId, OmimDiseaseId};

# fn foobar() {
let ontology = Ontology::from_standard("/path/to/master-data").unwrap();

// iterate HPO terms
for term in &ontology {
    // do something with term
}

// iterate Genes
for gene in ontology.genes() {
    // do something with gene
}

// iterate omim diseases
for disease in ontology.omim_diseases() {
    // do something with disease
}

// get a single HPO term
let term = ontology.hpo("HP:0000123".try_into().unwrap());

// get a single Gene
let hgnc_id = GeneId::try_from("12345").unwrap();
let gene = ontology.gene(&hgnc_id);

// get a single Omim disease
let disease_id = OmimDiseaseId::try_from("12345").unwrap();
let disease = ontology.omim_disease(&disease_id);
# }

HPO term

let term = some_function();

assert_eq!("Abnormality of the nervous system", term.name());
assert_eq!("HP:0000123", term.id());

// iterate all parents
for p in term.parents() {
    println!("{}", p.name())
}

// iterate all children
for p in term.children() {
    println!("{}", p.name())
}

let term2 = some_other_function();

assert!(term.parent_of(term2));
assert!(term2.child_of(term));

assert!(!term2.parent_of(term));
assert!(!term.child_of(term2));

Similarity

let term1 = some_function();
let term2 = some_other_function();
let ic = GraphIc::new(hpo::InformationContentKind::Omim);
let similarity = term1.similarity_score(&term2, &ic);
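
For readers unfamiliar with information content, the sketch below shows the general idea behind such a score. It is standalone and does not use this crate. It assumes the common definition IC = -ln(annotated / total) and a graph-based score that divides the IC of the shared ancestors by the IC of all ancestors. The crate's `GraphIc` may be implemented differently, and `information_content`, `ic_of` and `graph_ic_similarity` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Assumed definition: rarer terms (fewer annotated diseases) carry more information.
/// Terms without any annotation are given an IC of 0.0 here.
fn information_content(annotated: usize, total: usize) -> f64 {
    if annotated == 0 || total == 0 {
        return 0.0;
    }
    -(annotated as f64 / total as f64).ln()
}

/// Look up the IC of a term, treating unknown terms as carrying no information.
fn ic_of(ic: &HashMap<&str, f64>, term: &str) -> f64 {
    ic.get(term).copied().unwrap_or(0.0)
}

/// Assumed graph-based score: summed IC of the ancestors both terms share,
/// divided by the summed IC of the ancestors of either term.
fn graph_ic_similarity<'a>(
    ic: &HashMap<&str, f64>,
    ancestors_a: &HashSet<&'a str>,
    ancestors_b: &HashSet<&'a str>,
) -> f64 {
    let shared: f64 = ancestors_a
        .intersection(ancestors_b)
        .map(|t| ic_of(ic, t))
        .sum();
    let all: f64 = ancestors_a.union(ancestors_b).map(|t| ic_of(ic, t)).sum();
    if all == 0.0 {
        0.0
    } else {
        shared / all
    }
}
```

Here each ancestor set is expected to include the term itself, so identical terms score 1.0 whenever they carry any information at all.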