Per-cycle nucleotide composition (NVC) from a BAM file.
For each read cycle position (0-based, 0..read_length-1), counts how many reads have each nucleotide (A, C, G, T, N, X) at that position, where X is any base not in {A, C, G, T, N}.
Reverse-strand reads (FLAG 0x10) are reverse-complemented before counting, so position 0 always corresponds to the first sequenced base.
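The counting rule above can be sketched as follows. This is a minimal illustration, not the crate's actual code: the helper names `base_index`, `complement`, and `add_read` are hypothetical, the sequence is assumed to be uppercase ASCII bytes, and growing the table to the longest read seen is an assumption about how variable read lengths might be handled.

```rust
/// Column index for a base: A=0, C=1, G=2, T=3, N=4, anything else (X)=5.
/// (Hypothetical helper; assumes uppercase ASCII input.)
fn base_index(b: u8) -> usize {
    match b {
        b'A' => 0,
        b'C' => 1,
        b'G' => 2,
        b'T' => 3,
        b'N' => 4,
        _ => 5,
    }
}

/// Complement of a single base; N and unknown bytes are returned unchanged.
fn complement(b: u8) -> u8 {
    match b {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        other => other,
    }
}

/// Add one read to the per-cycle table.
/// `seq` is the sequence as stored in the BAM record; `is_reverse` is
/// true when FLAG bit 0x10 is set. The table grows to fit longer reads
/// (an assumption made for this sketch).
fn add_read(counts: &mut Vec<[u64; 6]>, seq: &[u8], is_reverse: bool) {
    if counts.len() < seq.len() {
        counts.resize(seq.len(), [0; 6]);
    }
    if is_reverse {
        // Walk the stored sequence backwards and complement each base,
        // so cycle 0 is the first base that came off the sequencer.
        for (pos, &b) in seq.iter().rev().enumerate() {
            counts[pos][base_index(complement(b))] += 1;
        }
    } else {
        for (pos, &b) in seq.iter().enumerate() {
            counts[pos][base_index(b)] += 1;
        }
    }
}
```

For example, a reverse-strand read stored as `TTG` is counted as `CAA`: a C at cycle 0 and an A at cycles 1 and 2.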
§Filters applied
- Skip unmapped reads (FLAG 0x0004).
- Skip QC-fail reads (FLAG 0x0200).
- Skip reads with MAPQ < mapq_cut (default 30).
- Secondary and supplementary reads are not filtered.
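The filters listed above amount to two FLAG bit tests and one MAPQ comparison. The sketch below shows one way to express them on raw SAM fields; the function name `passes_filters` and the constant names are hypothetical, and the crate itself may obtain these values through a BAM library rather than plain integers.

```rust
/// SAM FLAG bits used by the filters (values from the SAM specification).
const FLAG_UNMAPPED: u16 = 0x0004;
const FLAG_QC_FAIL: u16 = 0x0200;

/// Returns true if a record should be counted.
/// Hypothetical helper: `flag` and `mapq` are the raw SAM fields,
/// `mapq_cut` is the minimum mapping quality (default 30 in this crate).
fn passes_filters(flag: u16, mapq: u8, mapq_cut: u8) -> bool {
    if (flag & FLAG_UNMAPPED) != 0 {
        return false; // unmapped read
    }
    if (flag & FLAG_QC_FAIL) != 0 {
        return false; // failed vendor/platform quality checks
    }
    // Secondary (0x100) and supplementary (0x800) bits are deliberately
    // not tested, so those records are kept.
    mapq >= mapq_cut
}
```

A read with MAPQ exactly equal to `mapq_cut` is kept, since only reads strictly below the cutoff are skipped.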
§Output format
<prefix>.NVC.xls: tab-separated, columns Position, A, C, G, T,
N, X. Each count value has a leading space (e.g. " 10991"), matching
the exact byte format of RSeQC read_NVC.py. Each data row ends with a
trailing tab before the newline. The header row has no leading spaces.
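To make the byte layout concrete, here is a rough sketch of a formatter that follows the rules above. It is not the crate's `write_nvc_xls`: the function name `format_nvc` is hypothetical, it builds a `String` instead of writing a file, and it assumes that the Position value itself carries no leading space and that the header line has no trailing tab.

```rust
use std::fmt::Write;

/// Render a per-cycle table (columns A, C, G, T, N, X) as text.
/// Hypothetical helper for illustration only.
fn format_nvc(counts: &[[u64; 6]]) -> String {
    // Header: plain tab-separated names, no leading spaces.
    let mut out = String::from("Position\tA\tC\tG\tT\tN\tX\n");
    for (pos, row) in counts.iter().enumerate() {
        // Assumption: the position is written without a leading space.
        write!(out, "{}", pos).unwrap();
        for c in row {
            // Each count is preceded by a tab and then a single space.
            write!(out, "\t {}", c).unwrap();
        }
        // Every data row ends with a trailing tab before the newline.
        out.push_str("\t\n");
    }
    out
}
```

Under these assumptions, a single row with counts 10991, 1, 2, 3, 0, 0 is rendered as `0`, then six fields of the form tab-space-count, then a final tab and newline.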
§Origin
This crate is an independent Rust reimplementation of RSeQC
read_NVC.py based on:
- The published method: Wang et al. 2012 https://doi.org/10.1093/bioinformatics/bts356
- The public SAM/BAM format specification
- Black-box behaviour testing against RSeQC 5.0.4 (read_NVC.py; GPL-v3; source not read; clean-room implementation)
No source code from the GPL upstream was used as reference during implementation. Test fixtures are independently generated.
License: MIT OR Apache-2.0.
Upstream credit: RSeQC https://rseqc.sourceforge.net/ (GPL-v3).
Structs§
- NvcTable
- Per-cycle NVC table: counts[pos][base] where base index is A=0, C=1, G=2, T=3, N=4, X=5.
Functions§
- compute_nvc
- Scan bam_path and compute the NVC table.
- run_nvc
- Run the full NVC analysis and write the .NVC.xls output file.
- write_nvc_xls
- Write <prefix>.NVC.xls matching the exact byte format of read_NVC.py.