bqtools
A command-line utility for working with BINSEQ files.
Overview
bqtools provides tools to encode, decode, concatenate, and analyze BINSEQ (.bq) and VBINSEQ (.vbq) files. BINSEQ is a binary format designed for efficient storage of fixed-length DNA sequences, using 2-bit encoding for nucleotides. VBINSEQ is a binary format designed for efficient storage of variable-length DNA sequences with optional quality score support.
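To make the idea of 2-bit nucleotide encoding concrete, here is a minimal Python sketch. The bit assignment (A=00, C=01, G=10, T=11), the bit order, and the helper names `pack_2bit` and `unpack_2bit` are assumptions chosen for illustration; they are not taken from the BINSEQ specification or from bqtools itself.

```python
# Assumed mapping for illustration; the actual BINSEQ bit assignment may differ.
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"


def pack_2bit(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base, first base in the lowest bits."""
    packed = 0
    for i, base in enumerate(seq):
        # Raises KeyError for anything other than A, C, G, or T.
        packed |= ENCODE[base] << (2 * i)
    return packed


def unpack_2bit(packed: int, length: int) -> str:
    """Recover a sequence of known length from its packed form."""
    return "".join(DECODE[(packed >> (2 * i)) & 0b11] for i in range(length))


seq = "GATTACA"
packed = pack_2bit(seq)
assert unpack_2bit(packed, len(seq)) == seq
print(f"{len(seq)} bases fit in {2 * len(seq)} bits")
```

Note that the sequence length has to be known when unpacking, because trailing `A` bases are indistinguishable from unused zero bits in this scheme. A 2-bit alphabet also has no room for `N`, which is why an encoder needs a policy for non-ATCG characters.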
Features
- Encode: Convert FASTA or FASTQ files to a BINSEQ format
- Decode: Convert a BINSEQ file back to FASTA, FASTQ, or TSV format
- Cat: Concatenate multiple BINSEQ files
- Count: Count records in a BINSEQ file
Installation
From Cargo
bqtools can be installed with cargo, the Rust package manager, by running `cargo install bqtools`.
To install cargo, follow the instructions on the official Rust website.
From Source
# Clone the repository
# Install
# Check installation
Usage
# Get help information
# Get help for specific commands
Encoding
Convert FASTA/FASTQ files to BINSEQ format:
# Encode a single file to binseq
# Encode a single file to vbinseq
# Encode paired-end reads
# Encode paired-end reads to vbinseq
# Specify a policy for handling non-ATCG nucleotides
# Use multiple threads for parallel processing
Available policies for handling non-ATCG nucleotides:
- i: Ignore sequences with non-ATCG characters
- p: Break on invalid sequences
- r: Randomly draw a nucleotide for each N (default)
- a: Set all Ns to A
- c: Set all Ns to C
- g: Set all Ns to G
- t: Set all Ns to T
Note: Input FASTQ files may be compressed.
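The policy letters above can be read as a small decision rule applied to each sequence before encoding. The following Python sketch shows one possible interpretation. The function `apply_policy` is hypothetical, it assumes uppercase input, and it treats every non-ATCG character the same way as `N`; the actual bqtools behavior may differ in these details.

```python
import random

VALID = set("ACGT")


def apply_policy(seq, policy="r", rng=None):
    """Return a sequence containing only A/C/G/T, or None if the record is skipped.

    Hypothetical helper: it mirrors the documented policy letters,
    not the bqtools internals.
    """
    if all(base in VALID for base in seq):
        return seq  # nothing to fix
    if policy == "i":
        return None  # ignore (skip) the whole record
    if policy == "p":
        raise ValueError(f"invalid nucleotide in sequence: {seq!r}")
    if policy == "r":
        rng = rng or random.Random()
        # Draw an independent random base for each invalid position.
        return "".join(b if b in VALID else rng.choice("ACGT") for b in seq)
    if policy in ("a", "c", "g", "t"):
        # Replace every invalid position with the chosen base.
        return "".join(b if b in VALID else policy.upper() for b in seq)
    raise ValueError(f"unknown policy: {policy!r}")


print(apply_policy("ACNNT", "g"))  # invalid positions become G
print(apply_policy("ACNNT", "i"))  # record is skipped
```

Passing a seeded `random.Random` instance as `rng` makes the `r` policy reproducible in this sketch, which is useful when comparing outputs across runs.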
Decoding
Convert BINSEQ files back to FASTA/FASTQ/TSV:
# Decode to FASTQ (default)
# Decode to FASTA
# Decode paired-end reads into separate files
# Creates output_R1.fastq and output_R2.fastq
# Specify which read of a pair to output
# Specify output format
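The three output formats differ only in how each decoded record is laid out as text. The Python sketch below illustrates the general shape of a FASTA, FASTQ, and TSV record. The function `format_record`, the `seq.<index>` record name, and the `?` placeholder used when no quality scores are available are all assumptions made for this example, not confirmed bqtools output.

```python
def format_record(index, seq, fmt="fastq", qual=None):
    """Render one decoded record as FASTQ, FASTA, or TSV text.

    The record name (seq.<index>) and the '?' placeholder quality are
    assumptions for this sketch.
    """
    name = f"seq.{index}"
    if fmt == "fasta":
        return f">{name}\n{seq}\n"
    if fmt == "fastq":
        # FASTQ requires one quality character per base.
        qual = qual if qual is not None else "?" * len(seq)
        return f"@{name}\n{seq}\n+\n{qual}\n"
    if fmt == "tsv":
        return f"{name}\t{seq}\n"
    raise ValueError(f"unsupported format: {fmt!r}")


for fmt in ("fastq", "fasta", "tsv"):
    print(format_record(0, "GATTACA", fmt), end="")
```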
Concatenating
Combine multiple BINSEQ files:
Counting
Count records in a BINSEQ file:
Searching
Search for specific subsequences or regular-expression matches within BINSEQ files:
# See full options list
# Search for a specific subsequence (in primary sequence)
# Search for a regular expression (in primary)
# Search for both a subsequence (in extended sequence) and a regular expression (in either)
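The search modes described above (fixed subsequence or regular expression, applied to the primary sequence, the extended sequence, or either) can be sketched in a few lines of Python. The functions `matches` and `record_matches` and the query dictionary layout are hypothetical, and combining multiple queries with AND logic is an assumption of this sketch rather than a documented bqtools behavior.

```python
import re


def matches(primary, extended, query, target="either", regex=False):
    """Check one query against the primary and/or extended sequence of a record."""
    haystacks = {
        "primary": [primary],
        "extended": [extended],
        "either": [primary, extended],
    }[target]
    # Fixed subsequences are escaped so they are matched literally.
    pattern = re.compile(query if regex else re.escape(query))
    return any(pattern.search(h) for h in haystacks if h is not None)


def record_matches(primary, extended, queries):
    """Keep a record only if every query matches (AND logic, assumed here)."""
    return all(matches(primary, extended, **q) for q in queries)


queries = [
    {"query": "TTAGG", "target": "extended"},
    {"query": "AC[GT]{2}A", "target": "either", "regex": True},
]
print(record_matches("GGACGTAAC", "CCTTAGGCC", queries))
```

For single-end records, `extended` can be passed as `None`; the sketch simply skips it when searching.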