Function parse_hgvs 

```rust
pub fn parse_hgvs(input: &str) -> Result<HgvsVariant, ParseHgvsError>
```

Parses an HGVS string into the Rust HgvsVariant model.

Leading and trailing whitespace is ignored before parsing.

The returned model splits the HGVS expression into three parts:

  • reference: the reference sequence the variant is described against, for example NM_004006.2.
  • coordinate_system: the one-letter coordinate type that follows the colon, for example c, r, or p.
  • description: the nucleotide or protein variant description.
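
As a rough illustration of that three-part split, the std-only sketch below cuts the text at the first colon and then at the one-letter prefix. `split_hgvs` is a hypothetical helper, not part of tinyhgvs; it does no validation of the description and only shows which part of the string each field corresponds to.

```rust
/// Hypothetical helper (not tinyhgvs API): splits an HGVS string into
/// (reference, coordinate-system letter, description) without validating it.
fn split_hgvs(input: &str) -> Option<(&str, char, &str)> {
    // Like parse_hgvs, ignore leading and trailing whitespace.
    let trimmed = input.trim();
    // The reference is everything before the first ':'.
    let (reference, rest) = trimmed.split_once(':')?;
    // The coordinate system is a single letter followed by '.'.
    let mut chars = rest.chars();
    let letter = chars.next()?;
    if !letter.is_ascii_alphabetic() || chars.next()? != '.' {
        return None;
    }
    // The description is whatever follows the "x." prefix.
    let description = chars.as_str();
    if reference.is_empty() || description.is_empty() {
        return None;
    }
    Some((reference, letter, description))
}
```

The real parser goes much further and builds a typed description; this sketch stops at the textual boundaries.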

Examples

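A substitution at the first intronic nucleotide after c.357, next to the splice donor site. The surrounding whitespace is trimmed before parsing: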

```rust
use tinyhgvs::{NucleotideAnchor, NucleotideEdit, VariantDescription, parse_hgvs};

let variant = parse_hgvs("  NM_004006.2:c.357+1G>A  ").unwrap();

match variant.description {
    VariantDescription::Nucleotide(nucleotide) => {
        assert_eq!(nucleotide.location.start.anchor, NucleotideAnchor::Absolute);
        assert_eq!(nucleotide.location.start.coordinate, 357);
        assert_eq!(nucleotide.location.start.offset, 1);
        assert!(matches!(
            nucleotide.edit,
            NucleotideEdit::Substitution { ref reference, ref alternate }
                if reference == "G" && alternate == "A"
        ));
    }
    _ => unreachable!("expected nucleotide variant"),
}
```
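
In this example the position 357+1 is stored as coordinate 357 with offset 1. A minimal sketch of that split follows, assuming a bare `c.` position with at most one intronic offset. `parse_offset_position` is a hypothetical helper; it does not handle `*` positions or ranges.

```rust
/// Hypothetical helper (not tinyhgvs API): parses a c. position such as
/// "357", "357+1", or "358-2" into (coordinate, intronic offset).
fn parse_offset_position(text: &str) -> Option<(i64, i64)> {
    // Look for the offset sign after the first character, so that a
    // leading '-' (a 5' UTR position) is not mistaken for an offset.
    let split_at = text
        .char_indices()
        .skip(1)
        .find(|&(_, c)| c == '+' || c == '-')
        .map(|(i, _)| i);
    match split_at {
        // No offset: the whole text is the coordinate.
        None => Some((text.parse::<i64>().ok()?, 0)),
        Some(i) => {
            let coordinate: i64 = text[..i].parse().ok()?;
            // Integer parsing accepts a leading '+' or '-', so the sign is kept.
            let offset: i64 = text[i..].parse().ok()?;
            Some((coordinate, offset))
        }
    }
}
```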

A 5’ UTR substitution keeps the signed coordinate from the HGVS string:

```rust
use tinyhgvs::{NucleotideAnchor, VariantDescription, parse_hgvs};

let variant = parse_hgvs("NM_007373.4:c.-1C>T").unwrap();

match variant.description {
    VariantDescription::Nucleotide(nucleotide) => {
        assert_eq!(nucleotide.location.start.anchor, NucleotideAnchor::RelativeCdsStart);
        assert_eq!(nucleotide.location.start.coordinate, -1);
        assert_eq!(nucleotide.location.start.offset, 0);
    }
    _ => unreachable!("expected nucleotide variant"),
}
```
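
The anchor records which landmark the signed coordinate is counted from. The sketch below classifies a bare `c.` coordinate along those lines. `SketchAnchor` and `classify_coordinate` are hypothetical; the `CdsEnd` case for `*` positions follows the HGVS convention for 3’ UTR positions and is an assumption about how such an anchor might be modelled, not a description of tinyhgvs.

```rust
/// Hypothetical anchor type for this sketch only; it is not tinyhgvs's
/// `NucleotideAnchor`, and `CdsEnd` is an assumed extra case.
#[derive(Debug, PartialEq)]
enum SketchAnchor {
    Absolute,
    CdsStart,
    CdsEnd,
}

/// Classifies a c. coordinate (without any intronic offset) and returns the
/// anchor together with the signed coordinate as written.
fn classify_coordinate(text: &str) -> Option<(SketchAnchor, i64)> {
    if let Some(rest) = text.strip_prefix('*') {
        // "*N" counts N nucleotides 3' of the stop codon (3' UTR).
        let n: i64 = rest.parse().ok()?;
        return if n > 0 { Some((SketchAnchor::CdsEnd, n)) } else { None };
    }
    let n: i64 = text.parse().ok()?;
    match n {
        // HGVS c. numbering has no position 0.
        0 => None,
        // Negative positions lie 5' of the start codon (5' UTR); keep the sign.
        n if n < 0 => Some((SketchAnchor::CdsStart, n)),
        // Positive positions count from the A of the ATG start codon.
        n => Some((SketchAnchor::Absolute, n)),
    }
}
```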

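A nonsense variant at the protein level, where tryptophan 24 is changed to a translation stop (Ter). The consequence is written without parentheses, so it is not marked as predicted: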

```rust
use tinyhgvs::{ProteinEffect, VariantDescription, parse_hgvs};

let variant = parse_hgvs("NP_003997.1:p.Trp24Ter").unwrap();

match variant.description {
    VariantDescription::Protein(protein) => {
        assert!(!protein.is_predicted);
        assert!(matches!(protein.effect, ProteinEffect::Edit { .. }));
    }
    _ => unreachable!("expected protein variant"),
}
```
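
The example only checks that the effect is an edit. The sketch below shows how a simple three-letter substitution such as Trp24Ter could be split, and how a nonsense change could be recognised by its Ter alternate. Both functions are hypothetical, and they ignore the `*` stop notation and predicted forms in parentheses.

```rust
/// Hypothetical helper (not tinyhgvs API): splits a simple protein
/// substitution such as "Trp24Ter" into (reference, position, alternate).
fn split_protein_substitution(text: &str) -> Option<(&str, u32, &str)> {
    // The reference residue is a three-letter code at the start.
    let reference = text.get(..3)?;
    let rest = &text[3..];
    // The position is the run of ASCII digits that follows it.
    let digits_end = rest.find(|c: char| !c.is_ascii_digit())?;
    let position: u32 = rest[..digits_end].parse().ok()?;
    // Whatever remains must be a three-letter alternate residue.
    let alternate = &rest[digits_end..];
    let is_code = |s: &str| s.len() == 3 && s.chars().all(|c| c.is_ascii_alphabetic());
    if !is_code(reference) || !is_code(alternate) {
        return None;
    }
    Some((reference, position, alternate))
}

/// A nonsense change replaces the residue with a stop, written "Ter".
fn is_nonsense(text: &str) -> bool {
    matches!(split_protein_substitution(text), Some((_, _, "Ter")))
}
```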

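A repeat with an exact copy number is returned as a repeat edit. Here the repeated sequence is given only by its location, so the block has a count of 14 and no explicit unit: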

```rust
use tinyhgvs::{NucleotideEdit, VariantDescription, parse_hgvs};

let variant = parse_hgvs("NM_004006.3:r.-124_-123[14]").unwrap();

match variant.description {
    VariantDescription::Nucleotide(nucleotide) => {
        let NucleotideEdit::Repeat { blocks } = nucleotide.edit else {
            unreachable!("expected repeat edit");
        };
        assert_eq!(blocks[0].count, 14);
        assert_eq!(blocks[0].unit, None);
    }
    _ => unreachable!("expected nucleotide variant"),
}
```
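
The copy number sits in square brackets after the location. A minimal sketch of extracting it follows, assuming a single exact count at the end of the description. `split_repeat` is hypothetical and does not separate a repeat unit that is written out before the bracket.

```rust
/// Hypothetical helper (not tinyhgvs API): splits a repeat description such
/// as "-124_-123[14]" into its location text and exact copy number.
fn split_repeat(description: &str) -> Option<(&str, u32)> {
    // The description must end with the closing bracket.
    let body = description.strip_suffix(']')?;
    // The copy number sits between the last '[' and that bracket.
    let open = body.rfind('[')?;
    let count: u32 = body[open + 1..].parse().ok()?;
    // Everything before the bracket is the location (and any unit).
    let location = &body[..open];
    if location.is_empty() {
        return None;
    }
    Some((location, count))
}
```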

Unsupported syntax is reported as a structured ParseHgvsError whose code() identifies the kind of failure:

```rust
use tinyhgvs::parse_hgvs;

let error = parse_hgvs("NM_004006.3:r.spl").unwrap_err();
assert_eq!(error.code(), "unsupported.rna_special_state");
```
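
Because the error carries a code string rather than only a message, callers can branch on it. The std-only sketch below illustrates that pattern. `SketchParseError` and `reject_rna_spl` are hypothetical names, not tinyhgvs types, and the message format is an assumption; only the code string is taken from the example.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical structured error sketching the pattern behind
/// `ParseHgvsError::code()`; this is not the tinyhgvs type.
#[derive(Debug, Clone, PartialEq)]
struct SketchParseError {
    code: &'static str,
    message: String,
}

impl SketchParseError {
    /// Machine-readable identifier that callers can match on.
    fn code(&self) -> &'static str {
        self.code
    }
}

impl fmt::Display for SketchParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} ({})", self.message, self.code)
    }
}

impl Error for SketchParseError {}

/// Hypothetical pre-check that rejects the RNA "spl" state using the same
/// code string as the example; every other input is accepted.
fn reject_rna_spl(input: &str) -> Result<(), SketchParseError> {
    let trimmed = input.trim();
    match trimmed.split_once(':') {
        Some((_, "r.spl")) => Err(SketchParseError {
            code: "unsupported.rna_special_state",
            message: format!("RNA special state is not supported: {}", trimmed),
        }),
        _ => Ok(()),
    }
}
```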