
Crate rsomics_read_nvc

Per-cycle nucleotide composition (NVC) from a BAM file.

For each read cycle position (0-based, 0..read_length-1), counts how many reads have each nucleotide (A, C, G, T, N, X) at that position, where X is any base not in {A, C, G, T, N}.
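A minimal sketch of the per-cycle counting follows. It assumes each read is already in sequencing orientation and uses a hypothetical Vec<[u64; 6]> table that grows to the longest read seen. Column order follows the NvcTable index (A=0, C=1, G=2, T=3, N=4, X=5). The helper names base_index and count_read are illustrative and are not this crate's API. BAM stores bases in a 4-bit encoding over "=ACMGRSVTWYHKDBN", so the bytes that land in X are IUPAC ambiguity codes and '='.

```rust
/// Map a base byte to its column index: A=0, C=1, G=2, T=3, N=4, X=5.
/// Any byte outside {A, C, G, T, N} (e.g. IUPAC codes such as R or Y,
/// or '=') falls into the X column.
fn base_index(base: u8) -> usize {
    match base {
        b'A' => 0,
        b'C' => 1,
        b'G' => 2,
        b'T' => 3,
        b'N' => 4,
        _ => 5,
    }
}

/// Add one read (already in sequencing orientation) to a per-cycle table.
/// Hypothetical helper: the table grows so it always has at least
/// `seq.len()` rows, one row per cycle position.
fn count_read(counts: &mut Vec<[u64; 6]>, seq: &[u8]) {
    if counts.len() < seq.len() {
        counts.resize(seq.len(), [0; 6]);
    }
    for (pos, &base) in seq.iter().enumerate() {
        counts[pos][base_index(base)] += 1;
    }
}
```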

Reverse-strand reads (FLAG 0x0010) are reverse-complemented before counting. BAM stores these reads as the reverse complement of the sequenced read, so undoing that step means position 0 always corresponds to the first sequenced base.
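The orientation step can be sketched as follows. This is an illustrative version, not the crate's code. The function names are hypothetical. As an assumption, complement swaps A/T and C/G and leaves every other byte unchanged, which keeps N in the N column and any other code in X.

```rust
/// Complement a single base. Assumption: bytes other than A, C, G, T are
/// returned unchanged, so N stays N and other codes still count as X.
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'T' => b'A',
        b'C' => b'G',
        b'G' => b'C',
        other => other,
    }
}

/// Return the read in sequencing orientation. For reverse-strand records
/// (FLAG 0x0010) the stored SEQ is reversed and complemented; forward-strand
/// records are copied as-is.
fn sequencing_orientation(seq: &[u8], flag: u16) -> Vec<u8> {
    const FLAG_REVERSE: u16 = 0x0010;
    if flag & FLAG_REVERSE != 0 {
        seq.iter().rev().map(|&b| complement(b)).collect()
    } else {
        seq.to_vec()
    }
}
```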

Filters applied

  • Skip unmapped reads (FLAG 0x0004).
  • Skip QC-fail reads (FLAG 0x0200).
  • Skip reads with MAPQ < mapq_cut (default 30).
  • Secondary (FLAG 0x0100) and supplementary (FLAG 0x0800) alignments are not filtered out, so they are counted.
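The filter rules above reduce to one predicate over the FLAG and MAPQ fields. A minimal sketch follows. The name passes_filters is hypothetical, and the flag constants are the standard SAM bit meanings.

```rust
const FLAG_UNMAPPED: u16 = 0x0004;
const FLAG_QC_FAIL: u16 = 0x0200;

/// Decide whether a record contributes to the NVC table.
/// Secondary (0x0100) and supplementary (0x0800) bits are deliberately
/// not checked, matching the rule that those records are not filtered.
fn passes_filters(flag: u16, mapq: u8, mapq_cut: u8) -> bool {
    (flag & FLAG_UNMAPPED) == 0 && (flag & FLAG_QC_FAIL) == 0 && mapq >= mapq_cut
}
```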

Output format

<prefix>.NVC.xls: a plain-text, tab-separated file (despite the .xls extension) with columns Position, A, C, G, T, N, X. Each count value is written with a leading space (e.g. " 10991"), matching the exact byte format of RSeQC read_NVC.py. Each data row ends with a trailing tab before the newline. The header row has no leading spaces.
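The following sketch shows the row layout described above. The data-row byte pattern follows the text. Two details are assumptions not stated here: the Position value carries no leading space, and the header line ends without a trailing tab. format_header and format_row are hypothetical names, not the signature of write_nvc_xls.

```rust
use std::fmt::Write;

/// Header row: plain tab-separated labels with no leading spaces.
/// Assumption: the header ends directly with '\n' (no trailing tab).
fn format_header() -> String {
    "Position\tA\tC\tG\tT\tN\tX\n".to_string()
}

/// One data row for cycle `pos` with counts in A, C, G, T, N, X order.
fn format_row(pos: usize, counts: &[u64; 6]) -> String {
    let mut row = String::new();
    // Position column: no leading space (assumption).
    write!(row, "{}\t", pos).expect("writing to a String cannot fail");
    for count in counts {
        // Each count gets one leading space and is followed by a tab,
        // so the last field leaves a trailing tab before the newline.
        write!(row, " {}\t", count).expect("writing to a String cannot fail");
    }
    row.push('\n');
    row
}
```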

Origin

This crate is an independent Rust reimplementation of RSeQC read_NVC.py based on:

  • The published method: Wang et al. 2012 https://doi.org/10.1093/bioinformatics/bts356
  • The public SAM/BAM format specification
  • Black-box behaviour testing against RSeQC 5.0.4 (read_NVC.py, licensed GPL-v3; its source was not read, making this a clean-room implementation)

No source code from the GPL upstream was used as reference during implementation. Test fixtures are independently generated.

License: MIT OR Apache-2.0. Upstream credit: RSeQC https://rseqc.sourceforge.net/ (GPL-v3).

Structs

NvcTable
Per-cycle NVC table: counts[pos][base] where base index is A=0, C=1, G=2, T=3, N=4, X=5.

Functions

compute_nvc
Scan bam_path and compute the NVC table.
run_nvc
Run the full NVC analysis and write the .NVC.xls output file.
write_nvc_xls
Write <prefix>.NVC.xls matching the exact byte format of read_NVC.py.