Seqsum
Robust checksums for nucleotide sequences. Accepts input from either standard input or fast[a|q][.gz|.zst|.xz|.bz2] files. Generates an individual checksum for each sequence, plus an aggregate checksum for the collection. Warnings are shown for duplicate sequences and for within-collection checksum collisions at the selected bit depth. Sequences are uppercased before hashing with rapidhash (v3) and may optionally be normalised (with -n) to use only the characters ACGTN-. Read IDs and FASTQ base quality scores do not affect the checksum. Output is tab-delimited text written to stdout.
By default, seqsum outputs individual checksums and, when there is more than one sequence, an aggregate checksum. This can be modified with --individual (-i) or --aggregate (-a).
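The behaviour described above can be illustrated with a short Python sketch. This is not seqsum's implementation. It makes several assumptions: BLAKE2b from the standard library stands in for rapidhash (v3), so the values will not match seqsum's output; normalisation is assumed to replace any character outside ACGTN- with N; and the aggregate is assumed to be a hash of the sorted individual checksums, since the combining rule is not specified here. The function names are hypothetical.

```python
import hashlib

ALLOWED = set("ACGTN-")


def normalise(seq: str, strict: bool = False) -> str:
    """Uppercase the sequence; if strict, map characters outside ACGTN- to N.

    The mapping to N is an assumption made for this sketch.
    """
    seq = seq.upper()
    if strict:
        seq = "".join(c if c in ALLOWED else "N" for c in seq)
    return seq


def checksum(seq: str, bits: int = 64, strict: bool = False) -> str:
    """Hex checksum of `bits` bits. BLAKE2b is a stand-in for rapidhash."""
    data = normalise(seq, strict).encode("ascii")
    return hashlib.blake2b(data, digest_size=bits // 8).hexdigest()


def summarise(records, bits: int = 64, strict: bool = False):
    """Return (individual, aggregate, duplicates) for (read_id, sequence) pairs.

    Read IDs are carried through for reporting only; they are never hashed.
    """
    individual = []
    first_seen = {}  # normalised sequence -> first read ID carrying it
    duplicates = []
    for read_id, seq in records:
        norm = normalise(seq, strict)
        if norm in first_seen:
            duplicates.append((read_id, first_seen[norm]))
        else:
            first_seen[norm] = read_id
        individual.append((read_id, checksum(seq, bits, strict)))
    # Assumed aggregate rule: hash the sorted individual checksums so that
    # record order does not change the result.
    joined = "".join(sorted(c for _, c in individual)).encode("ascii")
    aggregate = hashlib.blake2b(joined, digest_size=bits // 8).hexdigest()
    return individual, aggregate, duplicates


if __name__ == "__main__":
    records = [("read1", "acgtn"), ("read2", "ACGTN"), ("read3", "ACGRY")]
    individual, aggregate, duplicates = summarise(records)
    for read_id, value in individual:
        print(f"{read_id}\t{value}")
    print(f"aggregate\t{aggregate}")
    print(f"duplicates\t{duplicates}")
```

In this sketch, read1 and read2 receive the same checksum because case is ignored and read IDs are not hashed, and read2 is reported as a duplicate of read1.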
Uses paraseq for efficient FASTA/FASTQ parsing.
Install
Development
Command line usage
# FASTA with one record
# FASTA with two records
# FASTA with two records, only show aggregate checksum
# FASTA via stdin
Built-in help