Module genepredext

Convert from/to GenePredExt

The GenePred(ext) format is described by UCSC.

§Schema for NCBI RefSeq - RefSeq genes from NCBI

| Column | Type | Example | Description |
| --- | --- | --- | --- |
| name | str | NR_046018.2 | Name of gene (usually transcript_id from GTF) |
| chrom | str | chr1 | Reference sequence chromosome or scaffold |
| strand | enum(“+”, “-”) | + | + or - for strand |
| txStart | int | 11873 | Transcription start position (or end position for minus strand item) |
| txEnd | int | 14409 | Transcription end position (or start position for minus strand item) |
| cdsStart | int | 14409 | Coding region start (or end position for minus strand item) |
| cdsEnd | int | 14409 | Coding region end (or start position for minus strand item) |
| exonCount | int | 3 | Number of exons |
| exonStarts | List of int | 11873,12612,13220, | Exon start positions (or end positions for minus strand item) (with trailing comma) |
| exonEnds | List of int | 12227,12721,14409, | Exon end positions (or start positions for minus strand item) (with trailing comma) |
| score | int | 0 | The score field indicates a degree of confidence in the feature’s existence and coordinates |
| name2 | str | DDX11L1 | Alternate name (e.g. gene_id from GTF) |
| cdsStartStat | enum(“none”, “unk”, “incmpl”, “cmpl”) | cmpl | Status of CDS start annotation (none, unknown, incomplete, or complete) |
| cdsEndStat | enum(“none”, “unk”, “incmpl”, “cmpl”) | cmpl | Status of CDS end annotation (none, unknown, incomplete, or complete) |
| exonFrames | List of enum(-1, 0, 1, 2) | -1,0,2,1 | Exon frame {0,1,2}, or -1 if no frame for exon |
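
To make the column layout concrete, the following is a minimal sketch of how one tab-separated GenePredExt line could be split into typed fields. It is not this module's `Reader`: the `GenePredExtRecord` struct and the `parse_line`, `parse_list`, and `parse_num` functions are hypothetical names introduced only for illustration, and the sketch assumes exactly 15 tab-separated columns in the order of the schema above.

```rust
/// Hypothetical record type for illustration; not the crate's own Transcript type.
#[derive(Debug, PartialEq)]
pub struct GenePredExtRecord {
    pub name: String,
    pub chrom: String,
    pub strand: char,
    pub tx_start: u64,
    pub tx_end: u64,
    pub cds_start: u64,
    pub cds_end: u64,
    pub exon_count: usize,
    pub exon_starts: Vec<u64>,
    pub exon_ends: Vec<u64>,
    pub score: i64,
    pub name2: String,
    pub cds_start_stat: String,
    pub cds_end_stat: String,
    pub exon_frames: Vec<i8>,
}

/// Parses a comma-separated list, tolerating the trailing comma used by the format.
fn parse_list<T: std::str::FromStr>(field: &str) -> Result<Vec<T>, String> {
    field
        .split(',')
        .filter(|item| !item.is_empty())
        .map(|item| item.parse::<T>().map_err(|_| format!("invalid list item: {:?}", item)))
        .collect()
}

/// Parses a single numeric column and names the column in the error message.
fn parse_num<T: std::str::FromStr>(field: &str, column: &str) -> Result<T, String> {
    field.parse::<T>().map_err(|_| format!("invalid {}: {:?}", column, field))
}

/// Splits one line into the 15 columns of the schema (assumed order, no header line).
pub fn parse_line(line: &str) -> Result<GenePredExtRecord, String> {
    let line = line.trim_end_matches(&['\r', '\n'][..]);
    let cols: Vec<&str> = line.split('\t').collect();
    if cols.len() != 15 {
        return Err(format!("expected 15 columns, found {}", cols.len()));
    }
    let strand = match cols[2] {
        "+" => '+',
        "-" => '-',
        other => return Err(format!("invalid strand: {:?}", other)),
    };
    let record = GenePredExtRecord {
        name: cols[0].to_string(),
        chrom: cols[1].to_string(),
        strand,
        tx_start: parse_num(cols[3], "txStart")?,
        tx_end: parse_num(cols[4], "txEnd")?,
        cds_start: parse_num(cols[5], "cdsStart")?,
        cds_end: parse_num(cols[6], "cdsEnd")?,
        exon_count: parse_num(cols[7], "exonCount")?,
        exon_starts: parse_list(cols[8])?,
        exon_ends: parse_list(cols[9])?,
        score: parse_num(cols[10], "score")?,
        name2: cols[11].to_string(),
        cds_start_stat: cols[12].to_string(),
        cds_end_stat: cols[13].to_string(),
        exon_frames: parse_list(cols[14])?,
    };
    // exonStarts and exonEnds must each hold one entry per exon.
    if record.exon_starts.len() != record.exon_count
        || record.exon_ends.len() != record.exon_count
    {
        return Err("exonCount does not match exonStarts/exonEnds".to_string());
    }
    Ok(record)
}
```

The sketch keeps the status columns as plain strings and does not check the number of `exonFrames` entries; a stricter parser would likely map the statuses to an enum and validate the frames as well.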

The format is almost identical to RefGene; it is only missing the bin column. So, instead of reinventing the wheel, we copied most of the refgene Reader and Writer code and just added/removed the bin column.
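
The relationship between the two formats can be sketched at the line level. The snippet below is an illustration only, not the code of this module: `refgene_to_genepredext` and `genepredext_to_refgene` are hypothetical helpers, and the sketch assumes that `bin` is the first tab-separated column of a refGene line, as in the UCSC refGene table. The bin value is passed in by the caller here; computing it from the coordinates is out of scope for this sketch.

```rust
/// Drops the leading `bin` column from a refGene line.
/// Returns None if the line has no tab or the first column is not an integer.
pub fn refgene_to_genepredext(line: &str) -> Option<&str> {
    let (bin, rest) = line.split_once('\t')?;
    bin.parse::<u32>().ok()?;
    Some(rest)
}

/// Prepends a caller-supplied `bin` value to a GenePredExt line.
pub fn genepredext_to_refgene(line: &str, bin: u32) -> String {
    format!("{}\t{}", bin, line)
}
```

Round-tripping a line through both helpers should give back the original GenePredExt line, which is the property that lets the two formats share most of their parsing and writing logic.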

Structs§

Reader
Parses GenePredExt data and creates Transcripts.
Writer
Writes Transcripts into a BufWriter.