rsomics-tabix 0.1.1

Build a coordinate index (.tbi/.csi) for a bgzipped, position-sorted tab-delimited file and query regions — Rust port of htslib tabix

rsomics-tabix

Build a coordinate index (.tbi / .csi) for a bgzip-compressed, position-sorted tab-delimited file, and query it by region. This is a Rust port of htslib tabix. It enables fast region-restricted access to large GFF/BED/SAM/VCF files and to arbitrary tab-delimited files.

Install

cargo install rsomics-tabix

Usage

rsomics-tabix -p bed regions.bed.gz        # writes regions.bed.gz.tbi
rsomics-tabix -C -p vcf calls.vcf.gz       # writes calls.vcf.gz.csi
rsomics-tabix calls.vcf.gz chr1:1000-2000  # query a region
rsomics-tabix -l calls.vcf.gz              # list sequence names

| flag | meaning | default |
|---|---|---|
| -p, --preset gff\|bed\|sam\|vcf | input format preset | gff |
| -s, --seq-col INT | 1-based sequence-name column | from preset |
| -b, --begin-col INT | 1-based region-begin column | from preset |
| -e, --end-col INT | 1-based region-end column (0 = same as begin) | from preset |
| -S, --skip-lines INT | header lines to skip | 0 |
| -c, --comment CHAR | comment line marker | # |
| -0, --zero-based | coordinates are 0-based (BED-style) | off |
| -C, --csi | emit a .csi index instead of .tbi | tbi |
| -l, --list-chroms | list sequence names in the index | off |
| -f, --force | overwrite an existing index | off |
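
To make the region argument in the usage lines concrete, here is a minimal sketch of how a string such as chr1:1000-2000 could be split into a sequence name and a 1-based, inclusive range. The names `Region` and `parse_region` are hypothetical and are not part of this crate's API; the real parser may accept more forms than this sketch does.

```rust
/// A parsed region: sequence name plus an optional 1-based, inclusive range.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct Region {
    name: String,
    /// 1-based inclusive start; None means "from the start of the sequence".
    start: Option<u64>,
    /// 1-based inclusive end; None means "to the end of the sequence".
    end: Option<u64>,
}

/// Parse "chr1", "chr1:1000" or "chr1:1000-2000".
/// Returns None for malformed input.
fn parse_region(s: &str) -> Option<Region> {
    // Split on the last ':'. A bare name that itself contains ':' is not
    // handled by this sketch.
    match s.rsplit_once(':') {
        None => {
            if s.is_empty() {
                return None;
            }
            Some(Region { name: s.to_string(), start: None, end: None })
        }
        Some((name, range)) => {
            let (start, end) = match range.split_once('-') {
                Some((b, e)) => (b.parse::<u64>().ok()?, Some(e.parse::<u64>().ok()?)),
                None => (range.parse::<u64>().ok()?, None),
            };
            // Reject empty names, a 1-based start of 0, and reversed ranges.
            if name.is_empty() || start == 0 || matches!(end, Some(e) if e < start) {
                return None;
            }
            Some(Region { name: name.to_string(), start: Some(start), end })
        }
    }
}

fn main() {
    let region = parse_region("chr1:1000-2000").expect("valid region");
    println!("{:?}", region);
}
```

A 1-based inclusive range [1000, 2000] corresponds to the 0-based, half-open interval [999, 2000), which is the form index lookups typically work with.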

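The preset constants match htslib's tbx_conf_* values. Here sc, bc and ec are the 1-based sequence-name, begin and end columns, and the trailing character is the comment marker: gff {sc=1,bc=4,ec=5,#}, bed {sc=1,bc=2,ec=3,#, 0-based}, sam {sc=3,bc=4,@}, vcf {sc=1,bc=2,#}.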
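
The preset layouts above can be written down as plain data, together with a simplified rule for turning one line into a 0-based, half-open interval. This is a sketch under stated assumptions: `Preset` and `interval` are hypothetical names, not this crate's types, and the end coordinate for SAM and VCF is simplified to begin + 1, whereas htslib derives it from the CIGAR string (SAM) or the REF and INFO fields (VCF).

```rust
/// Column layout for one preset (hypothetical struct for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Preset {
    sc: usize,        // 1-based sequence-name column
    bc: usize,        // 1-based begin column
    ec: usize,        // 1-based end column; 0 means "same as begin"
    meta: char,       // comment / header line marker
    zero_based: bool, // true for BED-style 0-based, half-open coordinates
}

const GFF: Preset = Preset { sc: 1, bc: 4, ec: 5, meta: '#', zero_based: false };
const BED: Preset = Preset { sc: 1, bc: 2, ec: 3, meta: '#', zero_based: true };
const SAM: Preset = Preset { sc: 3, bc: 4, ec: 0, meta: '@', zero_based: false };
const VCF: Preset = Preset { sc: 1, bc: 2, ec: 0, meta: '#', zero_based: false };

/// Return (name, begin, end) as a 0-based, half-open interval.
/// Returns None for comment lines and for lines that do not parse.
/// Simplification: with ec == 0 the end is begin + 1.
fn interval<'a>(line: &'a str, p: &Preset) -> Option<(&'a str, u64, u64)> {
    if line.starts_with(p.meta) {
        return None;
    }
    let cols: Vec<&str> = line.split('\t').collect();
    let name = *cols.get(p.sc.checked_sub(1)?)?;
    let raw_begin: u64 = cols.get(p.bc.checked_sub(1)?)?.parse().ok()?;
    // 1-based begin becomes 0-based; BED-style begin is already 0-based.
    let begin = if p.zero_based { raw_begin } else { raw_begin.checked_sub(1)? };
    // A 1-based inclusive end equals a 0-based exclusive end, so the raw
    // end value is used unchanged in both coordinate systems.
    let end = if p.ec == 0 {
        begin + 1
    } else {
        cols.get(p.ec - 1)?.parse::<u64>().ok()?
    };
    Some((name, begin, end))
}

fn main() {
    let lines = [
        ("chr1\t100\t200\tfeature", BED),
        ("chr1\tsrc\tgene\t101\t200\t.\t+\t.\tID=g1", GFF),
        ("chr1\t1000\t.\tA\tG", VCF),
        ("r1\t0\tchr1\t500\t60\t10M", SAM),
    ];
    for (line, preset) in lines.iter() {
        println!("{:?}", interval(line, preset));
    }
}
```

Under these rules the BED line and the GFF line in `main` describe the same interval, which is the point of normalizing everything to one coordinate system before indexing.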

Origin

Independent Rust reimplementation of htslib tabix based on the public BED/GFF/SAM/VCF formats, the CSI/TBI index format specifications, htslib's MIT-licensed tbx.c / tbx.h (preset column layouts and the tbx_parse1 coordinate logic), and black-box testing against the tabix binary.

Index construction uses noodles (csi / tabix / bgzf, pure Rust, Quadrant ①). BGZF inflate uses the bundled libdeflate that htslib also uses.
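
As background on what a .tbi index stores, the binning function published in the SAM and tabix specifications can be transcribed into Rust as below. This is not code from this crate or from noodles, which handles index construction here; it is only a sketch of the fixed scheme (minimum shift 14, depth 5) that the .tbi format uses. The .csi format generalizes it with configurable parameters.

```rust
/// Bin number for the 0-based, half-open interval [beg, end), following the
/// reg2bin function in the SAM/tabix specifications.
/// Precondition: beg < end, and both are below 2^29.
fn reg2bin(beg: u32, end: u32) -> u32 {
    debug_assert!(beg < end);
    // Work with the last covered position rather than the exclusive end.
    let end = end - 1;
    // Try the smallest bins (16 kb) first, then progressively larger ones.
    if beg >> 14 == end >> 14 {
        return ((1 << 15) - 1) / 7 + (beg >> 14);
    }
    if beg >> 17 == end >> 17 {
        return ((1 << 12) - 1) / 7 + (beg >> 17);
    }
    if beg >> 20 == end >> 20 {
        return ((1 << 9) - 1) / 7 + (beg >> 20);
    }
    if beg >> 23 == end >> 23 {
        return ((1 << 6) - 1) / 7 + (beg >> 23);
    }
    if beg >> 26 == end >> 26 {
        return ((1 << 3) - 1) / 7 + (beg >> 26);
    }
    // The interval spans more than one 64 Mb bin: use the root bin.
    0
}

fn main() {
    // chr1:1000-2000 as a 0-based, half-open interval is [999, 2000).
    println!("bin = {}", reg2bin(999, 2000));
}
```

An interval lands in the smallest bin that fully contains it, so a query only needs to inspect the bins that can overlap the requested region.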

License: MIT OR Apache-2.0. Upstream credit: htslib / tabix (MIT/Expat), Li, Bioinformatics 2011, doi:10.1093/bioinformatics/btq671.