Module parsing

Parsers for extracting sequence dictionaries from various file formats.

This module provides parsers for:

  • SAM/BAM/CRAM files: Extract @SQ lines from alignment file headers
  • Picard .dict files: Parse sequence dictionary files
  • FASTA index (.fai) files: Parse FASTA index files
  • NCBI assembly reports: Parse NCBI assembly reports with multiple naming conventions
  • VCF headers: Extract ##contig lines from VCF files
  • TSV/CSV files: Parse tabular contig definitions
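To make one of the formats above concrete, here is a minimal sketch of reading a single `.fai` record. A FASTA index line has five tab-separated columns (name, length, offset, line bases, line width), and only the first two matter for a sequence dictionary. The `FaiRecord` type and `parse_fai_line` function are hypothetical names used for illustration; they are not part of this crate's API, whose `fai` module relies on noodles instead.

```rust
/// One record of a FASTA index (.fai) file.
/// Hypothetical type for illustration only; not part of this crate.
#[derive(Debug, PartialEq)]
pub struct FaiRecord {
    pub name: String,
    pub length: u64,
}

/// Parse one tab-separated .fai line, keeping only the name and length
/// columns. The remaining columns (offset, line bases, line width) are ignored.
pub fn parse_fai_line(line: &str) -> Result<FaiRecord, String> {
    // Strip a trailing newline (LF or CRLF) before splitting into columns.
    let trimmed = line.trim_end_matches(|c| c == '\r' || c == '\n');
    let mut cols = trimmed.split('\t');

    let name = cols
        .next()
        .filter(|s| !s.is_empty())
        .ok_or("missing sequence name")?;
    let length = cols
        .next()
        .ok_or("missing sequence length")?
        .parse::<u64>()
        .map_err(|e| format!("invalid length: {}", e))?;

    Ok(FaiRecord {
        name: name.to_string(),
        length,
    })
}
```

Calling `parse_fai_line("chr1\t248956422\t6\t60\t61")` would yield a record named `chr1` with length 248956422, while a line with a non-numeric length returns an error.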

§Example

use ref_solver::parsing::sam::{parse_file, parse_header_text};
use std::path::Path;

// Parse from a BAM file
let query = parse_file(Path::new("sample.bam")).unwrap();

// Or parse from raw header text
let header = "@SQ\tSN:chr1\tLN:248956422\tM5:6aef897c3d6ff0c78aff06ac189178dd\n";
let query = parse_header_text(header).unwrap();

§Supported Tags

From SAM @SQ lines, the following tags are extracted:

Tag   Description                  Required
SN    Sequence name                Yes
LN    Sequence length              Yes
M5    MD5 checksum                 No
AS    Assembly identifier          No
UR    URI for sequence             No
SP    Species                      No
AN    Alternate names (aliases)    No
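The tag handling above can be sketched with the standard library alone. This is an illustrative approximation, not the crate's implementation: `SqRecord` and `parse_sq_line` are hypothetical names, and the real `sam` module may validate more strictly. The sketch treats SN and LN as required, the other tags as optional, and splits AN on commas as the SAM specification describes.

```rust
use std::collections::HashMap;

/// Hypothetical record holding the @SQ tags; not this crate's own type.
#[derive(Debug, PartialEq)]
pub struct SqRecord {
    pub name: String,             // SN (required)
    pub length: u64,              // LN (required)
    pub md5: Option<String>,      // M5
    pub assembly: Option<String>, // AS
    pub uri: Option<String>,      // UR
    pub species: Option<String>,  // SP
    pub aliases: Vec<String>,     // AN, comma-separated
}

/// Parse a single tab-separated @SQ header line into an `SqRecord`.
pub fn parse_sq_line(line: &str) -> Result<SqRecord, String> {
    let mut fields = line.trim_end().split('\t');
    if fields.next() != Some("@SQ") {
        return Err("not an @SQ line".to_string());
    }

    // Collect TAG:VALUE pairs. Split at the first ':' only, because values
    // such as UR may themselves contain colons.
    let mut tags: HashMap<&str, &str> = HashMap::new();
    for field in fields {
        if let Some((tag, value)) = field.split_once(':') {
            tags.insert(tag, value);
        }
    }

    let name = tags.get("SN").ok_or("missing required SN tag")?.to_string();
    let length = tags
        .get("LN")
        .ok_or("missing required LN tag")?
        .parse::<u64>()
        .map_err(|e| format!("invalid LN value: {}", e))?;

    // Helper that copies an optional tag value into an owned String.
    let owned = |tag: &str| tags.get(tag).map(|v| v.to_string());

    Ok(SqRecord {
        name,
        length,
        md5: owned("M5"),
        assembly: owned("AS"),
        uri: owned("UR"),
        species: owned("SP"),
        aliases: tags
            .get("AN")
            .map(|v| v.split(',').map(str::to_string).collect())
            .unwrap_or_default(),
    })
}
```

Because LN is parsed with `str::parse::<u64>`, the value must be a plain decimal integer; digit separators such as underscores are rejected.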

Modules§

dict
fai
Parser for FASTA index (.fai) files using noodles.
fasta
Parser for FASTA files using noodles.
ncbi_report
Parser for NCBI assembly report files.
sam
tsv
vcf
Parser for VCF header contig lines.
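
As a rough illustration of what extracting a VCF contig line involves, the sketch below pulls the `ID` and optional `length` out of a `##contig=<...>` header line. `parse_contig_line` is a hypothetical helper, not a function exported by the `vcf` module, and the comma split is deliberately naive: it assumes no quoted value contains a comma.

```rust
/// Extract (ID, length) from a VCF `##contig=<...>` header line.
/// Hypothetical helper for illustration. Returns `None` when the line is
/// not a contig line or carries no ID. Naive: splits on ',' and therefore
/// does not handle quoted values that contain commas.
pub fn parse_contig_line(line: &str) -> Option<(String, Option<u64>)> {
    let body = line
        .trim_end()
        .strip_prefix("##contig=<")?
        .strip_suffix('>')?;

    let mut id = None;
    let mut length = None;
    for pair in body.split(',') {
        match pair.split_once('=') {
            Some(("ID", v)) => id = Some(v.to_string()),
            // An unparseable length is treated as absent rather than fatal.
            Some(("length", v)) => length = v.parse::<u64>().ok(),
            _ => {}
        }
    }

    id.map(|id| (id, length))
}
```

The length is optional here because the VCF specification only requires `ID` on contig lines; callers that need lengths for matching would have to handle the `None` case.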