Crate chromsize


chromsize: fast chromosome size extraction from FASTA and 2bit.

chromsize reads sequence data (FASTA or 2bit) from a file or stdin, detects the input format by content, and returns chromosome sizes as (name, size) pairs. Gzip-compressed input is auto-detected and decompressed for both files and stdin.
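Detection by content can be pictured as a check on the first few bytes of the stream. The sketch below is illustrative only and is not taken from the chromsize source: `Format` and `sniff_format` are hypothetical names, and the crate's actual detection logic may differ. It relies on three well-known signatures: gzip streams begin with `0x1f 0x8b`, 2bit files begin with the 32-bit signature `0x1A412743` (in either byte order), and FASTA records begin with `>`.

```rust
/// Input kinds that can be told apart from the first bytes of a stream.
/// (Hypothetical type, not part of the chromsize API.)
#[derive(Debug, PartialEq, Eq)]
enum Format {
    Gzip,
    TwoBit,
    Fasta,
    Unknown,
}

/// Classify a stream by its leading bytes.
/// (Hypothetical helper, not part of the chromsize API.)
fn sniff_format(head: &[u8]) -> Format {
    // gzip members start with the two magic bytes 0x1f 0x8b.
    if head.starts_with(&[0x1f, 0x8b]) {
        return Format::Gzip;
    }
    // 2bit files start with the signature 0x1A412743, stored in either
    // byte order depending on the machine that wrote the file.
    if head.len() >= 4 {
        let sig = [head[0], head[1], head[2], head[3]];
        if u32::from_le_bytes(sig) == 0x1A41_2743 || u32::from_be_bytes(sig) == 0x1A41_2743 {
            return Format::TwoBit;
        }
    }
    // FASTA records begin with a '>' header line; leading whitespace is
    // tolerated in this sketch.
    match head.iter().copied().find(|b| !b.is_ascii_whitespace()) {
        Some(b'>') => Format::Fasta,
        _ => Format::Unknown,
    }
}
```

Under this scheme a gzip stream would be decompressed first and the decompressed bytes sniffed again to tell FASTA from 2bit.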

§CLI

chromsize [--sequence <SEQUENCE>] --output <OUTPUT> [-t <THREADS>]

-s, --sequence <SEQUENCE>  Sequence file (FASTA/2bit, use '-' or omit to read stdin)
-o, --output <OUTPUT>      Output path for chrom.sizes
-t, --threads <THREADS>    Number of threads (default: all cores)

Examples:

  • stream FASTA: cat genome.fa | chromsize -o chrom.sizes
  • stream gzip FASTA: zcat genome.fa.gz | chromsize -o chrom.sizes
  • file input: chromsize -s genome.fa -o chrom.sizes
  • 2bit from stdin: cat genome.2bit | chromsize -s - -o chrom.sizes
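The file named by `--output` holds the (name, size) pairs. This page does not specify the layout, so the sketch below assumes the conventional chrom.sizes format of one tab-separated `name<TAB>size` line per sequence. `write_chrom_sizes` is a hypothetical helper, not the crate's `writer`, and the `u64` size type is also an assumption.

```rust
use std::io::{self, Write};

/// Write (name, size) pairs as tab-separated lines.
/// (Hypothetical helper; the tab-separated layout is an assumed convention.)
fn write_chrom_sizes<W: Write>(mut out: W, sizes: &[(String, u64)]) -> io::Result<()> {
    for (name, size) in sizes {
        // One sequence per line: name, a tab, then the length in bases.
        writeln!(out, "{name}\t{size}")?;
    }
    out.flush()
}
```

Taking any `Write` implementor keeps the sketch usable with a file, stdout, or an in-memory buffer.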

§Library

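use std::path::PathBuf;

// Input sequence file (FASTA or 2bit) and destination for the sizes.
let input = PathBuf::from("/path/to/genome.fa");
let output = PathBuf::from("/path/to/chrom.sizes");

// Read the input and collect (name, size) pairs.
let sizes = chromsize::get_sizes(&input).expect("failed to read input");
// Write the pairs to the output path.
chromsize::writer(&sizes, &output).expect("failed to write sizes");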

The get_sizes function auto-detects FASTA vs 2bit by content and supports stdin when the input path is -.
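To make the (name, size) result concrete, here is a minimal single-threaded sketch of what extracting sizes from FASTA text involves, using only the standard library. It is not the crate's implementation: `fasta_sizes` is a hypothetical function, the `u64` size type is an assumption, and taking the first whitespace-delimited token of the header as the name is a choice made for this example.

```rust
use std::io::{self, BufRead};

/// Collect (name, size) pairs from FASTA text, in file order.
/// (Hypothetical helper, not part of the chromsize API.)
fn fasta_sizes<R: BufRead>(reader: R) -> io::Result<Vec<(String, u64)>> {
    let mut sizes: Vec<(String, u64)> = Vec::new();
    for line in reader.lines() {
        let line = line?;
        // Drop trailing whitespace, including '\r' from CRLF files.
        let line = line.trim_end();
        if let Some(header) = line.strip_prefix('>') {
            // Assumption: the sequence name is the first token of the header.
            let name = header.split_whitespace().next().unwrap_or("").to_string();
            sizes.push((name, 0));
        } else if let Some(last) = sizes.last_mut() {
            // Sequence line: add its length to the current record.
            last.1 += line.len() as u64;
        }
        // Lines before the first header are ignored.
    }
    Ok(sizes)
}
```

For the input `">chr1 desc\nACGT\nAC\n>chr2\nGGG\n"`, this yields `[("chr1", 6), ("chr2", 3)]`.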

Re-exports§

pub use size::*;

Modules§

size
chromsize Alejandro Gonzales-Irribarren, 2024