chromsize Alejandro Gonzales-Irribarren, 2024
chromsize is a utility designed to extract chromosome names and their corresponding lengths from FASTA files. It supports both plain and gzipped FASTA formats and offers an option to include only the accession ID from the FASTA headers.
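To make the computation concrete, here is a minimal single-threaded sketch of what "chromosome names and their lengths" means for plain FASTA text. It is an illustration under stated assumptions, not the crate's actual implementation: the function name chrom_sizes is hypothetical, gzip input and threading are left out, and the full header is used as the name.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not part of the chromsize API): returns
/// (name, length) pairs in the order the records appear.
fn chrom_sizes<R: BufRead>(reader: R) -> io::Result<Vec<(String, u64)>> {
    let mut sizes: Vec<(String, u64)> = Vec::new();
    for line in reader.lines() {
        let line = line?;
        // Drop a trailing '\r' (Windows line endings) and other trailing whitespace.
        let line = line.trim_end();
        if let Some(header) = line.strip_prefix('>') {
            // A '>' line starts a new record; the full header is kept as the name.
            sizes.push((header.to_string(), 0));
        } else if let Some(last) = sizes.last_mut() {
            // Sequence line: add its character count to the current record.
            last.1 += line.len() as u64;
        }
    }
    Ok(sizes)
}
```

Any BufRead source works here, for example a std::io::BufReader wrapped around a std::fs::File, or a byte slice in tests.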
§Usage
To use chromsize, provide the input FASTA file and specify the desired output file.
chromsize [OPTIONS] --fasta <FASTA> --output <OUTPUT>
§Options
Here’s a breakdown of the available command-line options:
- -f, --fasta <FASTA>
  - Purpose: Specifies the path to the input FASTA file. This is a required option.
  - Example: --fasta genome.fasta or --fasta sequences.fasta.gz
- -o, --output <OUTPUT>
  - Purpose: Specifies the path where the output chromosome sizes will be written. This is a required option. The output will typically be a tab-separated file with chromosome names and their lengths.
  - Example: --output chrom_lengths.txt
- -t, --threads <THREADS>
  - Purpose: Sets the number of threads to use for processing. This can speed up processing for large FASTA files.
  - Default: 8
  - Example: --threads 4 (to use 4 threads)
- -a, --accession-only
  - Purpose: A flag that, when present, instructs chromsize to keep only the accession ID part of the FASTA header, that is, the header text up to the first blank space. If omitted, the entire header line up to the first newline character is used as the chromosome name.
  - Example: --accession-only
- -h, --help
  - Purpose: Displays a help message with usage information and available options.
  - Example: chromsize --help
- -V, --version
  - Purpose: Prints the version information of the chromsize tool.
  - Example: chromsize --version
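The --accession-only rule described above (stop at the first blank space, otherwise keep the header up to the newline) can be sketched as a small function. This is only an illustration of the documented behaviour: the name sequence_name and its signature are assumptions, not part of the chromsize API, and the real tool may treat other whitespace such as tabs differently.

```rust
/// Hypothetical helper mirroring the documented `--accession-only` rule.
fn sequence_name(header_line: &str, accession_only: bool) -> &str {
    // Drop the leading '>' marker and any trailing line ending.
    let header = header_line
        .strip_prefix('>')
        .unwrap_or(header_line)
        .trim_end_matches(|c: char| c == '\n' || c == '\r');
    if accession_only {
        // Keep only the text before the first blank space.
        header.split(' ').next().unwrap_or(header)
    } else {
        // Keep the whole header line as the chromosome name.
        header
    }
}
```

With a header such as ">NC_000001.11 Homo sapiens chromosome 1", the flag changes the reported name from the full description to just the accession.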
§Example Usage Scenarios
- Get chromosome sizes from a plain FASTA file, using full headers, with default threads:
  chromsize --fasta input.fa --output chrom_sizes.txt
- Get chromosome sizes from a gzipped FASTA file, extracting only accession IDs, using 4 threads:
  chromsize --fasta input.fasta.gz --output accession_sizes.txt --accession-only --threads 4
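The output file produced by commands like these is described above as typically tab-separated, with one chromosome name and length per line. A minimal sketch of writing that layout follows. The function name write_sizes is hypothetical, and the exact format chromsize emits (line order, trailing newline) is an assumption here.

```rust
use std::io::{self, Write};

/// Hypothetical writer: emits one `name<TAB>length` line per sequence.
fn write_sizes<W: Write>(mut out: W, sizes: &[(String, u64)]) -> io::Result<()> {
    for (name, len) in sizes {
        // Tab-separated, newline-terminated record.
        writeln!(out, "{}\t{}", name, len)?;
    }
    Ok(())
}
```

Passing a std::fs::File (ideally wrapped in a std::io::BufWriter) as the writer would produce a file in this layout. Passing a mutable reference to a Vec<u8> is convenient for testing.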
Re-exports§
pub use size::*;
Modules§
- size
- chromsize Alejandro Gonzales-Irribarren, 2024