Crate tinyhgvs

Lightweight HGVS variant parser.

tinyhgvs parses an HGVS variant into explicit Rust structs and enums that describe:

  • the reference sequence context, such as NM_004006.2 or NP_003997.1
  • the coordinate type, such as coding DNA (c.), genomic DNA (g.), RNA (r.), or protein (p.)
  • the biological description itself, represented as either a nucleotide variant, a nucleotide allele, or a protein consequence
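As a rough illustration of the reference part, the sketch below splits a versioned accession such as NM_004006.2 into its identifier and optional version. `split_accession` is a hypothetical helper, not a tinyhgvs function; the crate's own Accession struct is the supported way to read this information.

```rust
/// Hypothetical helper (not part of tinyhgvs): splits "NM_004006.2" into
/// ("NM_004006", Some(2)). An unversioned accession yields None for the version.
/// Returns None when the input is empty or the version is not a number.
fn split_accession(accession: &str) -> Option<(&str, Option<u32>)> {
    match accession.rsplit_once('.') {
        // "NM_004006.2" -> identifier "NM_004006", version "2".
        Some((id, version)) if !id.is_empty() => {
            let version: u32 = version.parse().ok()?;
            Some((id, Some(version)))
        }
        // A leading dot (empty identifier) is treated as malformed.
        Some(_) => None,
        // No dot at all: an accession without a version.
        None if !accession.is_empty() => Some((accession, None)),
        None => None,
    }
}
```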

The crate is intentionally small. It aims to represent common, high-value HGVS syntax clearly, and it returns structured errors for the syntax families listed in its unsupported-syntax inventory.

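The main entry points are:

  • parse_hgvs, which parses an HGVS string into an HgvsVariant
  • HgvsVariant, the parsed model described below
  • ParseHgvsError, the structured error returned for invalid or unsupported input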

Reading the Parsed Model

HgvsVariant splits an HGVS expression into three top-level parts:

  • reference: the reference source for a variant.
  • coordinate_system: the one-letter HGVS coordinate type.
  • description: the nucleotide or protein variant description, including location and base edits or effects.
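The three-way split can be sketched by hand. The function below is a simplified, hypothetical stand-in (`split_hgvs` is not part of tinyhgvs) that assumes a plain accession before the colon. The accepted prefix letters are the HGVS coordinate types g., o., m., c., n., r., and p.; this says nothing about which of them tinyhgvs itself supports.

```rust
/// Hypothetical, simplified split of an HGVS string into
/// (reference, one-letter coordinate type, description).
/// Nested or compound references are not handled.
fn split_hgvs(input: &str) -> Option<(&str, char, &str)> {
    // Everything before the first ':' is treated as the reference, e.g. "NM_004006.2".
    let (reference, rest) = input.split_once(':')?;
    // The coordinate type is one letter followed by '.', e.g. "c." or "p.".
    let mut chars = rest.chars();
    let coordinate = chars.next()?;
    let is_known = matches!(coordinate, 'g' | 'o' | 'm' | 'c' | 'n' | 'r' | 'p');
    if !is_known || chars.next()? != '.' {
        return None;
    }
    // Both the letter and '.' are ASCII, so byte index 2 is a valid boundary.
    let description = &rest[2..];
    if reference.is_empty() || description.is_empty() {
        return None;
    }
    Some((reference, coordinate, description))
}
```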

Examples

An intronic substitution at the first intronic nucleotide after coding position 357 (c.357+1):

use tinyhgvs::{NucleotideAnchor, NucleotideEdit, VariantDescription, parse_hgvs};

let variant = parse_hgvs("NM_004006.2:c.357+1G>A").unwrap();
let description = variant.description;

match description {
    VariantDescription::Nucleotide(nucleotide) => {
        assert_eq!(nucleotide.location.start().unwrap().anchor().unwrap(), NucleotideAnchor::Absolute);
        assert_eq!(nucleotide.location.start().unwrap().coordinate().unwrap(), 357);
        assert_eq!(nucleotide.location.start().unwrap().offset().unwrap(), 1);
        assert!(matches!(
            nucleotide.edit,
            NucleotideEdit::Substitution { ref reference, ref alternate }
                if reference == "G" && alternate == "A"
        ));
    }
    _ => unreachable!("expected nucleotide variant"),
}
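The position c.357+1 combines a coordinate (357) with an intronic offset (+1), which is what the coordinate() and offset() assertions check. A hypothetical sketch of that decomposition follows; `parse_position` and `SketchAnchor` are illustrative names only, and the anchor variants are assumptions based on HGVS c. numbering, not tinyhgvs's NucleotideAnchor.

```rust
/// Hypothetical anchor kinds for a c. position, named for this sketch only.
#[derive(Debug, PartialEq)]
enum SketchAnchor {
    Absolute,    // e.g. c.357: counted from the A of the ATG start codon (c.1)
    BeforeStart, // e.g. c.-14: upstream of the start codon (5' UTR)
    AfterStop,   // e.g. c.*12: downstream of the stop codon (3' UTR)
}

/// Hypothetical parser for one c. position such as "357+1" or "*12-2".
/// Returns (anchor, coordinate, intronic offset); the offset is 0 when absent.
fn parse_position(pos: &str) -> Option<(SketchAnchor, u32, i32)> {
    // 1. Optional anchor prefix.
    let (anchor, rest) = if let Some(r) = pos.strip_prefix('-') {
        (SketchAnchor::BeforeStart, r)
    } else if let Some(r) = pos.strip_prefix('*') {
        (SketchAnchor::AfterStop, r)
    } else {
        (SketchAnchor::Absolute, pos)
    };
    // 2. The coordinate is the leading run of digits.
    let digits_end = rest.find(|c: char| !c.is_ascii_digit()).unwrap_or(rest.len());
    let coordinate: u32 = rest[..digits_end].parse().ok()?;
    // 3. An optional signed offset into the intron, e.g. "+1" or "-2".
    let tail = &rest[digits_end..];
    let offset = if tail.is_empty() {
        0
    } else {
        let (sign, magnitude) = match tail.as_bytes()[0] {
            b'+' => (1, &tail[1..]),
            b'-' => (-1, &tail[1..]),
            _ => return None,
        };
        // Reject "++1" or "+-1": Rust's integer parser would accept a sign here.
        if magnitude.is_empty() || !magnitude.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        sign * magnitude.parse::<i32>().ok()?
    };
    Some((anchor, coordinate, offset))
}
```

For example, "-14+3" splits into an upstream anchor, coordinate 14, and offset +3. The explicit digit check on the offset exists because `str::parse` for integers accepts a leading sign.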

A nucleotide allele holds a first allele, an optional second established allele, and any further unphased alleles:

use tinyhgvs::{AllelePhase, VariantDescription, parse_hgvs};

let variant = parse_hgvs("NM_004006.2:c.[2376G>C];[2376=]").unwrap();

match variant.description {
    VariantDescription::NucleotideAllele(allele) => {
        assert_eq!(allele.allele_one.variants.len(), 1);
        assert!(allele.allele_two.is_some());
        assert_eq!(allele.phase, Some(AllelePhase::Trans));
    }
    _ => unreachable!("expected nucleotide allele"),
}
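HGVS encodes phase in the bracket layout: variants inside one pair of brackets lie on the same allele, while "[a];[b]" places them on different alleles. The hypothetical sketch below (`split_alleles` and `SketchPhase` are not tinyhgvs items) recovers that layout and the phase it implies. It does not handle the "(;)" unknown-phase notation and is not a description of how tinyhgvs assigns AllelePhase internally.

```rust
/// Phase labels for this sketch only, following HGVS allele notation.
#[derive(Debug, PartialEq)]
enum SketchPhase {
    Cis,   // "[a;b]": both variants on one allele
    Trans, // "[a];[b]": variants on two different alleles
}

/// Hypothetical splitter for a bracketed allele description (the part after "c.").
/// Returns the variants of each allele plus the phase implied by the layout.
fn split_alleles(desc: &str) -> Option<(Vec<Vec<&str>>, Option<SketchPhase>)> {
    let mut alleles: Vec<Vec<&str>> = Vec::new();
    let mut depth = 0usize;
    let mut open = 0usize; // byte index just after the current top-level '['
    let mut need_separator = false; // consecutive alleles must be separated by ';'
    for (i, c) in desc.char_indices() {
        match (c, depth) {
            ('[', 0) => {
                if need_separator {
                    return None; // "[a][b]" lacks the ';' separator
                }
                open = i + 1;
                depth = 1;
            }
            ('[', _) => depth += 1, // nested bracket, e.g. a repeat such as N[12]
            (']', 0) => return None, // unbalanced closing bracket
            (']', 1) => {
                depth = 0;
                alleles.push(desc[open..i].split(';').collect());
                need_separator = true;
            }
            (']', _) => depth -= 1,
            (';', 0) if need_separator => need_separator = false,
            (_, 0) => return None, // stray text or extra ';' outside brackets
            _ => {}                // any character inside brackets
        }
    }
    if depth != 0 || !need_separator {
        return None; // unclosed bracket, trailing ';', or empty input
    }
    let phase = match alleles.len() {
        1 if alleles[0].len() > 1 => Some(SketchPhase::Cis),
        2 => Some(SketchPhase::Trans),
        _ => None,
    };
    Some((alleles, phase))
}
```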

A nonsense variant, in which the tryptophan at position 24 is replaced by a premature stop codon (Ter) at the protein level:

use tinyhgvs::{CoordinateSystem, ProteinEffect, VariantDescription, parse_hgvs};

let variant = parse_hgvs("NP_003997.1:p.Trp24Ter").unwrap();
assert_eq!(variant.coordinate_system, CoordinateSystem::Protein);

match variant.description {
    VariantDescription::Protein(protein) => {
        assert!(!protein.is_predicted);
        assert!(matches!(protein.effect, ProteinEffect::Edit { .. }));
    }
    _ => unreachable!("expected protein variant"),
}
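Trp24Ter reads as "tryptophan at position 24 becomes a stop codon". A hypothetical parser for this three-letter substitution form, including the parenthesized form HGVS uses for predicted consequences (the source of an is_predicted-style flag), might look like this. `parse_protein_substitution` is an illustrative name, not a tinyhgvs function, and one-letter forms such as p.W24* are deliberately rejected.

```rust
/// True for the shape of a three-letter amino-acid code such as "Trp" or "Ter".
fn is_three_letter(code: &str) -> bool {
    let b = code.as_bytes();
    b.len() == 3
        && b[0].is_ascii_uppercase()
        && b[1].is_ascii_lowercase()
        && b[2].is_ascii_lowercase()
}

/// Hypothetical parse of "p.Trp24Ter" or the predicted form "p.(Trp24Ter)".
/// Returns (reference residue, position, alternate residue, is_predicted).
fn parse_protein_substitution(s: &str) -> Option<(&str, u32, &str, bool)> {
    let body = s.strip_prefix("p.")?;
    // HGVS wraps predicted (not experimentally confirmed) consequences in parentheses.
    let (body, is_predicted) = match body.strip_prefix('(').and_then(|b| b.strip_suffix(')')) {
        Some(inner) => (inner, true),
        None => (body, false),
    };
    // The first three characters are the reference residue.
    let reference = body.get(..3)?;
    let rest = &body[3..];
    // The position is the digit run; whatever follows is the alternate residue.
    let digits_end = rest.find(|c: char| !c.is_ascii_digit())?;
    let position: u32 = rest[..digits_end].parse().ok()?;
    let alternate = &rest[digits_end..];
    if !is_three_letter(reference) || !is_three_letter(alternate) {
        return None;
    }
    Some((reference, position, alternate, is_predicted))
}
```

A caller could then treat `alternate == "Ter"` as the nonsense case.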

Unsupported syntax is reported with a stable diagnostic code:

use tinyhgvs::parse_hgvs;

let error = parse_hgvs("NM_004006.2:c.[2376G>C];[?]").unwrap_err();
assert_eq!(error.code(), "unsupported.allele_unknown_variant");
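A stable code lets callers branch on the failure kind without matching human-readable messages, which may be reworded between releases. The sketch below shows that design in isolation. SketchError, check_supported, and the "invalid.missing_colon" code are invented for illustration; only "unsupported.allele_unknown_variant" comes from the example above.

```rust
use std::fmt;

/// Hypothetical error kinds for this sketch (not the tinyhgvs ParseHgvsError).
#[derive(Debug, PartialEq)]
enum SketchError {
    MissingColon,
    AlleleUnknownVariant,
}

impl SketchError {
    /// Machine-readable code intended to stay fixed across releases.
    fn code(&self) -> &'static str {
        match self {
            SketchError::MissingColon => "invalid.missing_colon",
            SketchError::AlleleUnknownVariant => "unsupported.allele_unknown_variant",
        }
    }
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Human-readable text: free to change, unlike code().
        match self {
            SketchError::MissingColon => write!(f, "expected ':' after the reference"),
            SketchError::AlleleUnknownVariant => {
                write!(f, "the unknown allele variant '[?]' is not supported")
            }
        }
    }
}

/// Hypothetical pre-check that rejects the "[?]" allele form with a stable code.
fn check_supported(input: &str) -> Result<(), SketchError> {
    let (_, description) = input.split_once(':').ok_or(SketchError::MissingColon)?;
    if description.contains("[?]") {
        return Err(SketchError::AlleleUnknownVariant);
    }
    Ok(())
}
```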

Structs

Accession
A parsed accession with optional version.
Allele
One allele containing one or more inner variants.
AlleleVariant
Allele container holding an initial allele, an optional second established allele, and any later unphased alleles.
CopiedSequenceItem
Sequence copied from the same or another reference.
HgvsVariant
A parsed HGVS variant.
Interval
Inclusive interval used for nucleotide and protein locations.
LiteralSequenceItem
Literal inserted or replacement bases such as A or AGGG.
NucleotideRepeatBlock
One repeated block/unit in a nucleotide repeat variant description.
NucleotideVariant
Parsed nucleotide location and edit.
ParseHgvsError
A structured error returned when an HGVS string cannot be parsed.
ProteinCoordinate
Protein coordinate written as an amino-acid symbol plus its ordinal position.
ProteinExtensionEdit
Model describing a protein extension consequence.
ProteinFrameshiftStop
Model describing stop codon information in a protein frameshift edit.
ProteinSequence
Ordered protein insertion or replacement sequence.
ProteinVariant
Parsed protein consequence.
ReferenceSpec
Reference metadata preceding the : in an HGVS expression.
RepeatSequenceItem
Repeated inserted or replacement sequence such as N[12].
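Because Interval is inclusive, a range such as c.10_12 covers three positions (10, 11, and 12). A hypothetical sketch of that arithmetic follows; SketchInterval is illustrative and is not the tinyhgvs Interval type.

```rust
/// Hypothetical inclusive interval, e.g. the "10_12" in "c.10_12del".
/// Both ends belong to the interval.
#[derive(Debug, PartialEq, Clone, Copy)]
struct SketchInterval {
    start: u32,
    end: u32,
}

impl SketchInterval {
    /// Parses "start_end" (HGVS uses '_' as the range separator) or a single "pos".
    fn parse(s: &str) -> Option<Self> {
        let (start, end) = match s.split_once('_') {
            Some((a, b)) => (a.parse::<u32>().ok()?, b.parse::<u32>().ok()?),
            None => {
                let p: u32 = s.parse().ok()?;
                (p, p)
            }
        };
        // This sketch requires the lower position first.
        if start > end {
            return None;
        }
        Some(SketchInterval { start, end })
    }

    /// Number of positions covered; the +1 is what makes the interval inclusive.
    /// Widened to u64 so a full u32 range cannot overflow.
    fn len(&self) -> u64 {
        u64::from(self.end - self.start) + 1
    }

    fn contains(&self, pos: u32) -> bool {
        (self.start..=self.end).contains(&pos)
    }
}
```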

Enums

AllelePhase
Phase relationship between two established alleles.
CoordinateSystem
HGVS coordinate system.
Location
Main edited location on a nucleotide or protein variant/effect.
NucleotideAnchor
Anchor used by nucleotide coordinates.
NucleotideCoordinate
Nucleotide coordinate written as a known position or ?.
NucleotideEdit
Supported nucleotide edit families.
NucleotideSequenceItem
A single sequence item inside a nucleotide insertion or deletion-insertion.
ParseHgvsErrorKind
High-level classes of parse failures exposed by tinyhgvs.
ProteinEdit
Supported protein edit families in the first release.
ProteinEffect
Supported protein consequence forms.
ProteinExtensionTerminal
Protein terminus toward which an extension variant extends.
ProteinFrameshiftStopKind
Describes whether the stop codon produced by a frameshift is stated (long form), omitted (short form), or unknown (not encountered).
VariantDescription
Top-level variant description for nucleotide or protein syntax.
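The three cases listed for ProteinFrameshiftStopKind correspond to HGVS frameshift forms such as p.Arg97ProfsTer23 (long form), p.Arg97fs (short form), and p.Arg97ProfsTer? (no stop encountered). A hypothetical classifier over the text after p. might look like this; `classify_frameshift_stop` and `SketchStop` are illustrative and not tinyhgvs items.

```rust
/// Hypothetical stop-codon classification for a three-letter frameshift description.
#[derive(Debug, PartialEq)]
enum SketchStop {
    Known(u32), // long form, e.g. "Arg97ProfsTer23": stop at position 23, counting Pro97 as 1
    Omitted,    // short form, e.g. "Arg97fs"
    Unknown,    // e.g. "Arg97ProfsTer?": no stop codon encountered
}

/// Classifies the text that follows "fs" in a frameshift description.
fn classify_frameshift_stop(desc: &str) -> Option<SketchStop> {
    let (_, after_fs) = desc.split_once("fs")?;
    match after_fs {
        "" => Some(SketchStop::Omitted),
        "Ter?" => Some(SketchStop::Unknown),
        _ => {
            // Long form: "Ter" followed by a plain decimal count.
            let count = after_fs.strip_prefix("Ter")?;
            if count.is_empty() || !count.bytes().all(|b| b.is_ascii_digit()) {
                return None;
            }
            count.parse().ok().map(SketchStop::Known)
        }
    }
}
```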

Functions

parse_hgvs
Parses an HGVS string into the Rust HgvsVariant model.