Perform random operations on FASTQ files using Unix streaming. Secure your analysis with Fasten!
§Synopsis
§read metrics
$ cat testdata/R1.fastq testdata/R2.fastq | \
fasten_shuffle | fasten_metrics | column -t
totalLength numReads avgReadLength avgQual
800 8 100 19.53875
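To make the four columns concrete, here is a minimal sketch in Rust of how such metrics could be computed from four-line FASTQ records. It is not Fasten's implementation: the names `ReadMetrics` and `read_metrics` are hypothetical, qualities are assumed to be Phred+33, and `avg_qual` is assumed to be the mean over all bases (Fasten may define it differently).

```rust
/// Summary statistics corresponding to the four columns printed by the example.
/// Hypothetical type, for illustration only.
#[derive(Debug)]
struct ReadMetrics {
    total_length: usize,
    num_reads: usize,
    avg_read_length: f64,
    avg_qual: f64,
}

/// Compute metrics from FASTQ text made of strict four-line records.
/// Assumptions: Phred+33 encoding, and one quality character per base.
fn read_metrics(fastq: &str) -> ReadMetrics {
    let lines: Vec<&str> = fastq.lines().collect();
    let mut total_length = 0usize;
    let mut num_reads = 0usize;
    let mut qual_sum = 0u64;

    // Each record is: header, sequence, '+' separator, quality string.
    for record in lines.chunks_exact(4) {
        total_length += record[1].len();
        num_reads += 1;
        qual_sum += record[3]
            .bytes()
            .map(|b| u64::from(b.saturating_sub(33)))
            .sum::<u64>();
    }

    // Avoid dividing by zero on empty input.
    let avg = |num: f64, den: usize| if den == 0 { 0.0 } else { num / den as f64 };

    ReadMetrics {
        total_length,
        num_reads,
        avg_read_length: avg(total_length as f64, num_reads),
        avg_qual: avg(qual_sum as f64, total_length),
    }
}
```

Under these assumptions, two reads of lengths 4 and 2 give `total_length` 6, `num_reads` 2, and `avg_read_length` 3.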
§read cleaning
$ cat testdata/R1.fastq testdata/R2.fastq | \
fasten_shuffle | \
fasten_clean --paired-end --min-length 2 | \
gzip -c > cleaned.shuffled.fastq.gz
$ zcat cleaned.shuffled.fastq.gz | fasten_metrics | column -t
totalLength numReads avgReadLength avgQual
800 8 100 19.53875
NOTE: No reads were actually filtered by cleaning with --min-length 2.
§Kmer counting
$ cat testdata/R1.fastq | \
fasten_kmer -k 21 > 21mers.tsv
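For readers unfamiliar with k-mer counting, the following Rust sketch shows the general technique: slide a window of length k along each read and tally each substring. It is an illustration only; `count_kmers` is a hypothetical helper, and it assumes k-mers are counted exactly as they appear (whether `fasten_kmer` merges reverse complements, and what its TSV columns contain, is not stated here).

```rust
use std::collections::HashMap;

/// Count every k-length substring of each read.
/// Assumption: k-mers are counted as-is, with no reverse-complement merging.
fn count_kmers(reads: &[&str], k: usize) -> HashMap<String, u64> {
    let mut counts: HashMap<String, u64> = HashMap::new();
    if k == 0 {
        // slice::windows panics on a zero window size, so bail out early.
        return counts;
    }
    for read in reads {
        // windows(k) yields nothing when the read is shorter than k.
        for window in read.as_bytes().windows(k) {
            let kmer = String::from_utf8_lossy(window).into_owned();
            *counts.entry(kmer).or_insert(0) += 1;
        }
    }
    counts
}
```

For the single read `ACGTACG` with k = 3, this yields four distinct k-mers, with `ACG` seen twice.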
§Read sampling
$ cat testdata/R1.fastq testdata/R2.fastq | \
fasten_shuffle | \
fasten_sample --paired-end --frequency 0.1 > 10percent.fastq
§Advanced
§Set of downsampled reads
Create a set of downsampled reads for a titration experiment and clean them
for frequency in 0.1 0.2 0.3 0.4 0.5; do
  cat testdata/R1.fastq testdata/R2.fastq | \
    fasten_shuffle | \
    fasten_clean --paired-end --min-length 50 --min-trim-quality 25 | \
    fasten_sample --paired-end --frequency $frequency > cleaned.$frequency.fastq
done
§Validate a whole directory of fastq reads
\ls *_1.fastq.gz | xargs -n 1 -P 4 bash -c '
echo -n "." >&2 # progress bar
R1=$0
R2=${0/_1.fastq.gz/_2.fastq.gz}
zcat "$R1" "$R2" | fasten_shuffle | fasten_validate --paired-end
'
Modules§
- io
- input/output methods
Macros§
- Rewrite print!() so that it doesn’t panic on broken pipe.
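Some background on why such a macro is useful: the standard `println!` panics when writing to stdout fails, which happens routinely in pipelines such as `... | head` once the reader exits. The sketch below shows one way to treat a broken pipe as a normal stop condition. It is not the crate's macro; `write_unless_closed` and `ClosedPipe` are hypothetical names used only for illustration.

```rust
use std::io::{self, ErrorKind, Write};

/// Stand-in writer that always reports a closed pipe (demonstration only).
struct ClosedPipe;

impl Write for ClosedPipe {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(ErrorKind::BrokenPipe, "reader went away"))
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Write `text` to `out` without panicking.
/// Returns Ok(true) if the text was written, Ok(false) if the reader has
/// closed the pipe, and Err for any other I/O failure.
fn write_unless_closed<W: Write>(out: &mut W, text: &str) -> io::Result<bool> {
    match out.write_all(text.as_bytes()) {
        Ok(()) => Ok(true),
        Err(e) if e.kind() == ErrorKind::BrokenPipe => Ok(false),
        Err(e) => Err(e),
    }
}
```

A caller streaming reads to `std::io::stdout()` could stop its loop quietly when this returns `Ok(false)` instead of aborting with a panic message.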
Functions§
- eexit
- Propagate an error by printing invalid read(s)
- fasten_base_options
- A function that reads an options object and adds fasten default options.
- fasten_base_options_matches
- A function that processes the options on the command line. The brief is a str that describes the program without using the program name, e.g., “counts kmers” for fasten_kmer. This function also takes care of --version. If --help is invoked, then the program name, the brief, and the usage() are all printed to stdout and then the program exits with 0.
- logmsg
- Print a formatted message to stderr
- reverse_complement
- Reverse complement a DNA sequence. Take into account lowercase vs uppercase. Ambiguity codes are also handled.
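As an illustration of what case-preserving, ambiguity-aware reverse complementing involves, here is a minimal Rust sketch. It is not the crate's function, whose signature is not shown on this page; `complement_base` and `revcomp_sketch` are hypothetical names, and the mapping follows the standard IUPAC nucleotide codes.

```rust
/// Complement a single IUPAC nucleotide code, preserving its case.
/// Codes that are their own complement (S, W, N) and any unrecognized
/// character pass through unchanged.
fn complement_base(base: char) -> char {
    let upper = match base.to_ascii_uppercase() {
        'A' => 'T',
        'T' => 'A',
        'C' => 'G',
        'G' => 'C',
        'R' => 'Y', // A/G <-> C/T
        'Y' => 'R',
        'K' => 'M', // G/T <-> A/C
        'M' => 'K',
        'B' => 'V', // not A <-> not T
        'V' => 'B',
        'D' => 'H', // not C <-> not G
        'H' => 'D',
        other => other,
    };
    if base.is_ascii_lowercase() {
        upper.to_ascii_lowercase()
    } else {
        upper
    }
}

/// Reverse the sequence, then complement each base.
fn revcomp_sketch(seq: &str) -> String {
    seq.chars().rev().map(complement_base).collect()
}
```

With this mapping, `AACG` becomes `CGTT`, and a mixed-case input such as `acgT` becomes `Acgt`.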