tool,scenario_id,reference_args,task_description,category
admixture,admixture_01,data.bed 5 --cv=10 -j8,run ADMIXTURE for K=5 ancestral populations,population-genomics
admixture,admixture_02,data.bed 3 --seed=42 --cv=10 -j8,run ADMIXTURE with reproducible seed,population-genomics
admixture,admixture_03,data.bed 3 --supervised -j8,run supervised ADMIXTURE with known reference populations,population-genomics
admixture,admixture_04,data.bed K --cv=10 -j8 > admixture_K.log,run ADMIXTURE across multiple K values (shell loop),population-genomics
admixture,admixture_05,data.bed 5 -B100 -j8,run ADMIXTURE with 100 bootstrap replicates for standard errors,population-genomics
admixture,admixture_06,data.bed 5 -P -j8,run projection analysis onto a fixed P-matrix,population-genomics
admixture,admixture_07,data.bed 4 --seed=1 --cv=10 -j8 > run1.log,run multiple replicates for K=4 with different seeds to check convergence,population-genomics
admixture,admixture_08,data.bed 6 --cv=10 -j8 | tee admixture_K6.log,compare cross-validation errors across K values,population-genomics
admixture,admixture_09,data_maf05.bed 5 --cv=10 -j8,run ADMIXTURE on a dataset pre-filtered for minor allele frequency (ADMIXTURE has no MAF option; filter first with PLINK --maf 0.05),population-genomics
admixture,admixture_10,data.bed 5 -m em --cv=10 -j8,run ADMIXTURE with the EM optimization method instead of the default block relaxation,population-genomics
agat,agat_01,agat_convert_sp_gff2gtf.pl --gff annotation.gff3 -o annotation.gtf,convert GFF3 to GTF format,annotation
agat,agat_02,agat_sp_statistics.pl --gff annotation.gff3 -o statistics_report.txt,get annotation statistics from a GFF3 file,annotation
agat,agat_03,agat_sp_filter_gene_by_length.pl --gff annotation.gff3 --size 300 -o filtered_annotation.gff3,filter genes by minimum length,annotation
agat,agat_04,agat_convert_sp_gxf2gxf.pl -g malformed.gff3 -o fixed.gff3,fix and standardize a malformed GFF3 file,annotation
agat,agat_05,agat_convert_sp_gff2gtf.pl --gff annotation.gff3 -o annotation.gtf,convert GFF3 to GTF format with default parameters,annotation
agat,agat_06,agat_sp_statistics.pl --gff annotation.gff3 -o statistics_report.txt --verbose,get annotation statistics from a GFF3 file with verbose output,annotation
agat,agat_07,agat_sp_filter_gene_by_length.pl --gff annotation.gff3 --size 300 -o filtered_annotation.gff3,filter genes by minimum length,annotation
agat,agat_08,agat_convert_sp_gxf2gxf.pl -g malformed.gff3 -o fixed.gff3,fix and standardize a malformed GFF3 file and write output to a file,annotation
agat,agat_09,agat_convert_sp_gff2gtf.pl --gff annotation.gff3 -o annotation.gtf --quiet,convert GFF3 to GTF format in quiet mode,annotation
agat,agat_10,agat_sp_statistics.pl --gff annotation.gff3 -o statistics_report.txt,get annotation statistics from a GFF3 file with default parameters,annotation
angsd,angsd_01,-bam bam_list.txt -GL 1 -doMajorMinor 1 -doGlf 2 -doMaf 1 -SNP_pval 1e-6 -minMapQ 30 -minQ 20 -nThreads 16 -out output,compute genotype likelihoods and allele frequencies for a set of BAMs,population-genomics
angsd,angsd_02,-bam pop1_bams.txt -GL 1 -doSaf 1 -anc ancestral.fasta -minMapQ 30 -minQ 20 -nThreads 16 -out pop1,compute per-site allele frequency spectrum for a single population,population-genomics
angsd,angsd_03,realSFS pop1.saf.idx -P 16 > pop1.sfs,estimate 1D site frequency spectrum from doSaf output,population-genomics
angsd,angsd_04,-bam pop1_bams.txt -GL 1 -doSaf 1 -doThetas 1 -pest pop1.sfs -anc ancestral.fasta -minMapQ 30 -minQ 20 -out pop1_thetas,estimate Watterson's theta and Tajima's D in sliding windows,population-genomics
angsd,angsd_05,realSFS pop1.saf.idx pop2.saf.idx -P 16 > pop1_pop2.2dsfs && realSFS fst index pop1.saf.idx pop2.saf.idx -sfs pop1_pop2.2dsfs -fstout pop1_pop2,compute Fst between two populations using 2D SFS,population-genomics
angsd,angsd_06,-bam bam_list.txt -GL 1 -doMajorMinor 1 -doGlf 2 -doMaf 1 -SNP_pval 1e-6 -minMapQ 30 -minQ 20 -nInd 50 -minInd 40 -nThreads 16 -out snps_for_pca,call SNPs and compute principal component analysis input,population-genomics
angsd,angsd_07,-bam bam_list.txt -GL 1 -doMajorMinor 1 -doGlf 2 -doMaf 1 -SNP_pval 1e-6 -minMapQ 30 -minQ 20 -nThreads 16 -out output,compute genotype likelihoods and allele frequencies for a set of BAMs using 16 threads,population-genomics
angsd,angsd_08,-bam pop1_bams.txt -GL 1 -doSaf 1 -anc ancestral.fasta -minMapQ 30 -minQ 20 -nThreads 16 -out pop1,compute per-site allele frequency spectrum for a single population and write output files with the prefix pop1,population-genomics
angsd,angsd_09,realSFS pop1.saf.idx -P 16 > pop1.sfs,estimate 1D site frequency spectrum from doSaf output,population-genomics
angsd,angsd_10,-bam pop1_bams.txt -GL 1 -doSaf 1 -doThetas 1 -pest pop1.sfs -anc ancestral.fasta -minMapQ 30 -minQ 20 -out pop1_thetas,estimate Watterson's theta and Tajima's D in sliding windows with default parameters,population-genomics
arriba,arriba_01,--runMode alignReads --genomeDir /star_index/ --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/ --chimSegmentMin 10 --chimOutType WithinBAM --chimJunctionOverhangMin 10 --chimScoreDropMax 30 --peOverlapNbasesMin 12,run STAR with chimeric output for Arriba fusion detection,rna-seq
arriba,arriba_02,-x sample/Aligned.sortedByCoord.out.bam -o fusions.tsv -O discarded_fusions.tsv -g genome.fa -a genes.gtf -b blacklist_hg38_GRCh38_v2.4.0.tsv.gz,detect gene fusions with Arriba,rna-seq
arriba,arriba_03,draw_fusions.R --fusions=fusions.tsv --alignments=sample/Aligned.sortedByCoord.out.bam --genome=genome.fa --annotation=genes.gtf --output=fusion_plots.pdf,visualize detected fusions with Arriba draw_fusions,rna-seq
arriba,arriba_04,--runMode alignReads --genomeDir /star_index/ --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/ --chimSegmentMin 10 --chimOutType WithinBAM --chimJunctionOverhangMin 10 --chimScoreDropMax 30 --peOverlapNbasesMin 12,run STAR with chimeric output for Arriba fusion detection,rna-seq
arriba,arriba_05,-x sample/Aligned.sortedByCoord.out.bam -o fusions.tsv -O discarded_fusions.tsv -g genome.fa -a genes.gtf -b blacklist_hg38_GRCh38_v2.4.0.tsv.gz,detect gene fusions with Arriba with default parameters,rna-seq
arriba,arriba_06,draw_fusions.R --fusions=fusions.tsv --alignments=sample/Aligned.sortedByCoord.out.bam --genome=genome.fa --annotation=genes.gtf --output=fusion_plots.pdf,visualize detected fusions with Arriba draw_fusions,rna-seq
arriba,arriba_07,--runMode alignReads --genomeDir /star_index/ --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/ --chimSegmentMin 10 --chimOutType WithinBAM --chimJunctionOverhangMin 10 --chimScoreDropMax 30 --peOverlapNbasesMin 12,run STAR with chimeric output for Arriba fusion detection using 8 threads,rna-seq
arriba,arriba_08,-x sample/Aligned.sortedByCoord.out.bam -o fusions.tsv -O discarded_fusions.tsv -g genome.fa -a genes.gtf -b blacklist_hg38_GRCh38_v2.4.0.tsv.gz,detect gene fusions with Arriba and write output to a file,rna-seq
arriba,arriba_09,draw_fusions.R --fusions=fusions.tsv --alignments=sample/Aligned.sortedByCoord.out.bam --genome=genome.fa --annotation=genes.gtf --output=fusion_plots.pdf,visualize detected fusions with Arriba draw_fusions,rna-seq
arriba,arriba_10,--runMode alignReads --genomeDir /star_index/ --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/ --chimSegmentMin 10 --chimOutType WithinBAM --chimJunctionOverhangMin 10 --chimScoreDropMax 30 --peOverlapNbasesMin 12,run STAR with chimeric output for Arriba fusion detection with default parameters,rna-seq
augustus,augustus_01,--species=human genome.fasta --gff3=on > augustus_predictions.gff3,predict genes in a eukaryotic genome using human parameters,annotation
augustus,augustus_02,--species=arabidopsis --hintsfile=rnaseq_hints.gff --extrinsicCfgFile=extrinsic.cfg genome.fasta --gff3=on > improved_predictions.gff3,predict genes with RNA-seq hints for improved accuracy,annotation
augustus,augustus_03,--species=fly --gff3=on --protein=on --codingseq=on genome.fasta > fly_predictions.gff3,predict genes and output protein sequences,annotation
augustus,augustus_04,--species=zebrafish zebrafish_masked.fasta --gff3=on --softmasking=1 > zebrafish_genes.gff3,run Augustus on a repeat-masked genome,annotation
augustus,augustus_05,--species=human genome.fasta --gff3=on > augustus_predictions.gff3,predict genes in a eukaryotic genome using the default human parameters,annotation
augustus,augustus_06,--species=arabidopsis --hintsfile=rnaseq_hints.gff --extrinsicCfgFile=extrinsic.cfg genome.fasta --gff3=on > improved_predictions.gff3,predict genes with RNA-seq hints for improved accuracy,annotation
augustus,augustus_07,--species=fly --gff3=on --protein=on --codingseq=on genome.fasta > fly_predictions.gff3,predict genes and output protein sequences,annotation
augustus,augustus_08,--species=zebrafish zebrafish_masked.fasta --gff3=on --softmasking=1 > zebrafish_genes.gff3,run Augustus on a repeat-masked genome,annotation
augustus,augustus_09,--species=human genome.fasta --gff3=on > augustus_predictions.gff3,predict genes in a eukaryotic genome using human parameters,annotation
augustus,augustus_10,--species=arabidopsis --hintsfile=rnaseq_hints.gff --extrinsicCfgFile=extrinsic.cfg genome.fasta --gff3=on > improved_predictions.gff3,predict genes with RNA-seq hints for improved accuracy with default parameters,annotation
awk,awk_01,"-F ',' '{print $1"",""$3}' file.csv",print specific columns from a CSV file,text-processing
awk,awk_02,"'{sum+=$2} END{print ""Total:"", sum}' data.txt",sum values in a column and print the total,text-processing
awk,awk_03,'$3 > 100 {print $0}' data.tsv,filter and print lines where a column exceeds a threshold,text-processing
awk,awk_04,"'{count[$1]++} END{for(k in count) print k, count[k]}' data.txt",count occurrences of each unique value in a column,text-processing
awk,awk_05,"'/START/,/END/{print}' file.txt",print lines between two patterns (inclusive),text-processing
awk,awk_06,'prev!=$0{print; prev=$0}' file.txt,remove duplicate consecutive lines,text-processing
awk,awk_07,"'{print NR, $0}' file.txt",add line numbers to output,text-processing
awk,awk_08,"-F '\t' 'BEGIN{OFS="",""} {$1=$1; print}' input.tsv",convert tab-separated to comma-separated,text-processing
awk,awk_09,"'{sum+=$1; n++} END{if(n>0) print ""Average:"", sum/n}' values.txt",calculate average of a column,text-processing
awk,awk_10,'{print $NF}' file.txt,print the last field of each line regardless of column count,text-processing
bakta,bakta_01,--db /path/to/bakta_db/ --threads 8 --output annotation/ --prefix genome_annotation genome.fasta,annotate a bacterial genome with Bakta,annotation
bakta,bakta_02,--db /path/to/bakta_db/ --compliant --locus-tag MYORG --genus Escherichia --species coli --threads 8 --output ncbi_annotation/ --prefix ecoli_K12 genome.fasta,annotate genome for NCBI submission,annotation
bakta,bakta_03,--db /path/to/bakta_db/ --plasmid p1 --threads 4 --output plasmid_annotation/ --prefix plasmid plasmid.fasta,annotate plasmid sequence,annotation
bakta,bakta_04,--db /path/to/bakta_db/ --threads 8 --output annotation/ --prefix genome_annotation genome.fasta --quiet,annotate a bacterial genome with Bakta in quiet mode,annotation
bakta,bakta_05,--db /path/to/bakta_db/ --compliant --locus-tag MYORG --genus Escherichia --species coli --threads 8 --output ncbi_annotation/ --prefix ecoli_K12 genome.fasta,annotate genome for NCBI submission with default parameters,annotation
bakta,bakta_06,--db /path/to/bakta_db/ --plasmid p1 --threads 4 --output plasmid_annotation/ --prefix plasmid plasmid.fasta --verbose,annotate plasmid sequence with verbose output,annotation
bakta,bakta_07,--db /path/to/bakta_db/ --threads 8 --output annotation/ --prefix genome_annotation genome.fasta,annotate a bacterial genome with Bakta using multiple threads,annotation
bakta,bakta_08,--db /path/to/bakta_db/ --compliant --locus-tag MYORG --genus Escherichia --species coli --threads 8 --output ncbi_annotation/ --prefix ecoli_K12 genome.fasta,annotate genome for NCBI submission and write output to a file,annotation
bakta,bakta_09,--db /path/to/bakta_db/ --plasmid p1 --threads 4 --output plasmid_annotation/ --prefix plasmid plasmid.fasta,annotate plasmid sequence,annotation
bakta,bakta_10,--db /path/to/bakta_db/ --threads 8 --output annotation/ --prefix genome_annotation genome.fasta,annotate a bacterial genome with Bakta with default parameters,annotation
bamtools,bamtools_01,stats -in input.bam > alignment_stats.txt,get alignment statistics from a BAM file,utilities
bamtools,bamtools_02,count -in input.bam,count aligned reads in a BAM file,utilities
bamtools,bamtools_03,filter -in input.bam -out filtered.bam -isMapped true -isProperPair true,"filter BAM to keep only mapped, properly paired reads",utilities
bamtools,bamtools_04,merge -in sample1.bam -in sample2.bam -in sample3.bam -out merged.bam,merge multiple BAM files,utilities
bamtools,bamtools_05,convert -in input.bam -format fastq -out reads.fastq,convert BAM to FASTQ,utilities
bamtools,bamtools_06,stats -in input.bam > alignment_stats.txt,get alignment statistics from a BAM file,utilities
bamtools,bamtools_07,count -in input.bam,count aligned reads in a BAM file,utilities
bamtools,bamtools_08,filter -in input.bam -out filtered.bam -isMapped true -isProperPair true,"filter BAM to keep only mapped, properly paired reads and write them to a new BAM file",utilities
bamtools,bamtools_09,merge -in sample1.bam -in sample2.bam -in sample3.bam -out merged.bam,merge multiple BAM files,utilities
bamtools,bamtools_10,convert -in input.bam -format fastq -out reads.fastq,convert BAM to FASTQ with default parameters,utilities
bash,bash_01,script.sh arg1 arg2,run a bash script,programming
bash,bash_02,-euo pipefail -c 'command1 | command2',run a script with strict error handling,programming
bash,bash_03,-c 'source ~/.bashrc && printenv',source a configuration file in a new shell and print the resulting environment,programming
bash,bash_04,--version,check bash version,programming
bash,bash_05,-x script.sh,run a script and print each command as it executes (debugging),programming
bash,bash_06,-c 'declare -f; alias',list all loaded functions and aliases,programming
bash,bash_07,-c 'export MY_VAR=test && echo $MY_VAR',execute a command in a subshell without affecting the current environment,programming
bash,bash_08,-c 'diff <(sort file1.txt) <(sort file2.txt)',compare two files after sorting them using process substitution,programming
bash,bash_09,"-c 'long_running_command & PID=$!; wait $PID; echo ""exit: $?""'",run a background pipeline job and capture its PID,programming
bash,bash_10,"-c 'for f in *.bam; do samtools flagstat ""$f"" > ""${f%.bam}.stats""; done'",loop over a list of files and process each,programming
bbtools,bbtools_01,bbduk.sh in=R1.fastq.gz in2=R2.fastq.gz out=R1_trimmed.fastq.gz out2=R2_trimmed.fastq.gz ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=20 minlen=50,trim adapters and quality-filter with BBDuk,sequence-utilities
bbtools,bbtools_02,bbduk.sh in=R1.fastq.gz in2=R2.fastq.gz out=clean_R1.fastq.gz out2=clean_R2.fastq.gz ref=phix174_ill.ref.fa.gz k=31 hdist=1,remove PhiX contamination from a FASTQ,sequence-utilities
bbtools,bbtools_03,bbmap.sh in=reads.fastq.gz ref=genome.fa out=aligned.sam,align reads to a reference genome,sequence-utilities
bbtools,bbtools_04,bbmerge.sh in=R1.fastq.gz in2=R2.fastq.gz out=merged.fastq.gz outu=unmerged_R1.fastq.gz outu2=unmerged_R2.fastq.gz,merge overlapping paired-end reads,sequence-utilities
bbtools,bbtools_05,reformat.sh in=large.fastq.gz out=subset.fastq.gz samplereadstarget=1000000,subsample a FASTQ file to a specific number of reads,sequence-utilities
bbtools,bbtools_06,reformat.sh in=reads.fastq.gz out=reads.fa,convert FASTQ to FASTA,sequence-utilities
bbtools,bbtools_07,bbmap.sh in=sample.fastq.gz ref=human_genome.fa outm=host_reads.fastq.gz outu=non_host_reads.fastq.gz nodisk=t,remove host reads before metagenomics analysis,sequence-utilities
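bbtools,bbtools_08,reformat.sh in=reads.fastq.gz bhist=bhist.txt qhist=qhist.txt lhist=lhist.txt gchist=gchist.txt,"get detailed statistics (base, quality, length and GC histograms) for a FASTQ file",sequence-utilities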
bbtools,bbtools_09,dedupe.sh in=reads.fastq.gz out=deduped.fastq.gz,remove duplicate reads with dedupe.sh,sequence-utilities
bbtools,bbtools_10,"bbsplit.sh in=sample.fastq.gz ref=genome1.fa,genome2.fa out_genome1=reads_genome1.fastq.gz out_genome2=reads_genome2.fastq.gz",split reads by genome of origin for metagenomics,sequence-utilities
bcftools,bcftools_01,mpileup -f reference.fa -O u input.bam | bcftools call -m -v -O z -o variants.vcf.gz,call variants from a BAM file against a reference genome,variant-calling
bcftools,bcftools_02,"view -i 'QUAL>30 && INFO/DP>10 && TYPE=""snp""' -O z -o filtered.vcf.gz input.vcf.gz","filter VCF to keep only high-quality SNPs (QUAL > 30, depth > 10)",variant-calling
bcftools,bcftools_03,merge -O z -o merged.vcf.gz sample1.vcf.gz sample2.vcf.gz sample3.vcf.gz,merge multiple VCF files from different samples,variant-calling
bcftools,bcftools_04,view -s SAMPLE_NAME -O z -o sample.vcf.gz multisample.vcf.gz,extract a specific sample from a multi-sample VCF,variant-calling
bcftools,bcftools_05,norm -m -any -f reference.fa -O z -o normalized.vcf.gz input.vcf.gz,normalize indels and split multi-allelic variants,variant-calling
bcftools,bcftools_06,stats input.vcf.gz > stats.txt,compute variant statistics for a VCF file,variant-calling
bcftools,bcftools_07,view -v snps -O z -o snps.vcf.gz input.vcf.gz,select only SNPs from a VCF file,variant-calling
bcftools,bcftools_08,annotate -a dbsnp.vcf.gz -c ID -O z -o annotated.vcf.gz input.vcf.gz,annotate VCF with a reference VCF (add ID field from dbSNP),variant-calling
bcftools,bcftools_09,mpileup -f reference.fa -O u input.bam | bcftools call -m -v -O z -o variants.vcf.gz,call variants from a BAM file against a reference genome,variant-calling
bcftools,bcftools_10,"view -i 'QUAL>30 && INFO/DP>10 && TYPE=""snp""' -O z -o filtered.vcf.gz input.vcf.gz","filter VCF to keep only high-quality SNPs (QUAL > 30, depth > 10) with default parameters",variant-calling
bedops,bedops_01,sort-bed input.bed > input.sorted.bed,sort a BED file for use with BEDOPS tools,genomic-intervals
bedops,bedops_02,--intersect a.sorted.bed b.sorted.bed > intersection.bed,intersect two sorted BED files (intervals present in both),genomic-intervals
bedops,bedops_03,--difference a.sorted.bed b.sorted.bed > a_not_b.bed,find regions of file A not covered by any interval in file B,genomic-intervals
bedops,bedops_04,bedmap --echo --sum --delim '\t' genes.sorted.bed signal.sorted.bed > genes_with_coverage.bed,sum the scores (5th column) of signal intervals overlapping each gene (convert bedGraph/bigWig to 5-column BED first),genomic-intervals
bedops,bedops_05,starch input.sorted.bed > input.starch,compress a sorted BED file to starch format,genomic-intervals
bedops,bedops_06,--element-of 1 input.sorted.bed region.bed,extract all intervals overlapping a specific region (e.g. chr1:100000-200000 stored in region.bed),genomic-intervals
bedops,bedops_07,--merge a.sorted.bed b.sorted.bed c.sorted.bed > merged_union.bed,merge overlapping intervals and compute union across three BED files,genomic-intervals
bedops,bedops_08,sort-bed input.bed > input.sorted.bed,sort a BED file for use with BEDOPS tools,genomic-intervals
bedops,bedops_09,--intersect a.sorted.bed b.sorted.bed > intersection.bed,intersect two sorted BED files (intervals present in both),genomic-intervals
bedops,bedops_10,--difference a.sorted.bed b.sorted.bed > a_not_b.bed,find regions of file A not covered by any interval in file B with default parameters,genomic-intervals
bedtools,bedtools_01,intersect -a query.bed -b features.bed -wa,find intervals in file A that overlap with file B,genomic-intervals
bedtools,bedtools_02,subtract -a regions.bed -b blacklist.bed,subtract regions in B from regions in A,genomic-intervals
bedtools,bedtools_03,merge -i input.bed,merge overlapping intervals in a BED file,genomic-intervals
bedtools,bedtools_04,genomecov -ibam sorted.bam -bg > coverage.bedgraph,compute genome coverage as a BedGraph from a BAM file,genomic-intervals
bedtools,bedtools_05,closest -a query.bed -b annotations.bed -io -d,find closest non-overlapping feature in B for each interval in A and report its distance,genomic-intervals
bedtools,bedtools_06,intersect -a genes.bed -b reads.bam -c,count overlaps between A intervals and B features,genomic-intervals
bedtools,bedtools_07,getfasta -fi reference.fa -bed intervals.bed -fo output.fa,get sequences for intervals in a BED file,genomic-intervals
bedtools,bedtools_08,genomecov -ibam sorted.bam -bga > coverage_all.bedgraph,compute coverage including zero-coverage positions,genomic-intervals
bedtools,bedtools_09,intersect -a query.bed -b features.bed -wb,intersect two BED files and report original B intervals that overlap A,genomic-intervals
bedtools,bedtools_10,makewindows -g genome.txt -w 1000 > windows.bed,make windows of fixed size across a genome,genomic-intervals
bismark,bismark_01,bismark_genome_preparation /path/to/genome_directory/,prepare bisulfite genome index for alignment,epigenomics
bismark,bismark_02,--genome /path/to/genome_dir/ -1 R1.fastq.gz -2 R2.fastq.gz --output_dir bismark_output/ -p 4,align paired-end WGBS reads to bisulfite genome,epigenomics
bismark,bismark_03,deduplicate_bismark --paired --bam sample_bismark_bt2_pe.bam,deduplicate bismark-aligned paired-end BAM file,epigenomics
bismark,bismark_04,bismark_methylation_extractor --paired-end --comprehensive --CX_context --genome_folder /path/to/genome_dir/ --output_dir methylation/ sample_deduplicated.bam,extract methylation information from deduplicated BAM,epigenomics
bismark,bismark_05,--genome /path/to/genome_dir/ -1 R1.fastq.gz -2 R2.fastq.gz --output_dir rrbs_output/ -p 4,align RRBS reads pre-trimmed with Trim Galore --rrbs (Bismark itself has no --rrbs option),epigenomics
bismark,bismark_06,--genome /path/to/genome_dir/ --hisat2 reads.fastq.gz --output_dir bismark_output/ -p 4,align single-end WGBS reads with HISAT2 aligner,epigenomics
bismark,bismark_07,--genome /path/to/genome_dir/ --non_directional -1 R1.fastq.gz -2 R2.fastq.gz --output_dir pbat_output/ -p 4,align a non-directional library such as scBS-seq (use --pbat for PBAT libraries),epigenomics
bismark,bismark_08,bismark_methylation_extractor --paired-end --comprehensive --bedGraph --genome_folder /path/to/genome_dir/ --output_dir methylation/ sample_deduplicated.bam,extract CpG methylation and generate bedGraph coverage file,epigenomics
bismark,bismark_09,bismark_genome_preparation --hisat2 /path/to/genome_directory/,prepare bisulfite index for HISAT2-based alignment,epigenomics
bismark,bismark_10,bismark_methylation_extractor --paired-end --mbias_only --genome_folder /path/to/genome_dir/ --output_dir mbias/ sample.bam,generate M-bias plot to identify read-end bias in methylation calls,epigenomics
blast,blast_01,-in genome.fasta -dbtype nucl -out genome_db -title 'Genome Database' -parse_seqids,build a nucleotide BLAST database from a FASTA file,sequence-utilities
blast,blast_02,-query query.fasta -db genome_db -out blast_results.txt -outfmt 6 -evalue 1e-5 -num_threads 8,run blastn to find similar nucleotide sequences,sequence-utilities
blast,blast_03,-query proteins.faa -db /path/to/nr -out blastp_results.txt -outfmt '6 std stitle staxids' -evalue 1e-5 -num_threads 16 -max_target_seqs 5,search protein sequences against NR database,sequence-utilities
blast,blast_04,-query contigs.fasta -db /path/to/swissprot -out blastx_results.txt -outfmt 6 -evalue 1e-5 -num_threads 8 -max_target_seqs 1,run blastx to annotate nucleotide sequences against protein database,sequence-utilities
blast,blast_05,-query query.fasta -db nr -out remote_blast.txt -outfmt 6 -remote -max_target_seqs 10,perform remote BLAST search against NCBI nr database,sequence-utilities
blast,blast_06,-in genome.fasta -dbtype nucl -out genome_db -title 'Genome Database' -parse_seqids,build a nucleotide BLAST database from a FASTA file,sequence-utilities
blast,blast_07,-query query.fasta -db genome_db -out blast_results.txt -outfmt 6 -evalue 1e-5 -num_threads 8,run blastn to find similar nucleotide sequences using 8 threads,sequence-utilities
blast,blast_08,-query proteins.faa -db /path/to/nr -out blastp_results.txt -outfmt '6 std stitle staxids' -evalue 1e-5 -num_threads 16 -max_target_seqs 5,search protein sequences against NR database and write tabular output to a file,sequence-utilities
blast,blast_09,-query contigs.fasta -db /path/to/swissprot -out blastx_results.txt -outfmt 6 -evalue 1e-5 -num_threads 8 -max_target_seqs 1,run blastx to annotate nucleotide sequences against protein database,sequence-utilities
blast,blast_10,-query query.fasta -db nr -out remote_blast.txt -outfmt 6 -remote -max_target_seqs 10,perform remote BLAST search against NCBI nr database with default parameters,sequence-utilities
bowtie2,bowtie2_01,bowtie2-build reference.fa reference_index,build a bowtie2 index from a reference FASTA file,alignment
bowtie2,bowtie2_02,bowtie2-build --threads 8 reference.fa reference_index,build a bowtie2 index using multiple threads for a large genome,alignment
bowtie2,bowtie2_03,-x reference_index -1 R1.fastq.gz -2 R2.fastq.gz -p 8 | samtools view -b -o aligned.bam,align paired-end reads to a reference genome using 8 threads,alignment
bowtie2,bowtie2_04,-x reference_index -U reads.fastq.gz --very-sensitive -p 8 | samtools sort -o sorted.bam,align single-end reads with sensitive settings,alignment
bowtie2,bowtie2_05,-x reference_index -1 R1.fq.gz -2 R2.fq.gz -p 8 --no-unal -S aligned.sam 2> align_stats.txt,align paired-end reads and save the alignment statistics,alignment
bowtie2,bowtie2_06,-x reference_index -1 R1.fastq.gz -2 R2.fastq.gz -p 8 --rg-id sample1 --rg SM:sample1 --rg LB:lib1 --rg PL:ILLUMINA | samtools view -b -o sample1.bam,align paired-end reads with read group tags for GATK downstream analysis,alignment
bowtie2,bowtie2_07,-x reference_index -1 R1.fastq.gz -2 R2.fastq.gz --local --very-sensitive-local -p 8 | samtools view -b -o local_aligned.bam,align in local mode to allow soft-clipping of read ends,alignment
bowtie2,bowtie2_08,-x reference_index -1 R1.fastq.gz -2 R2.fastq.gz -p 16 --no-unal | samtools sort -@ 4 -o sorted.bam,align paired-end RNA-seq reads discarding unaligned reads,alignment
bowtie2,bowtie2_09,-x reference_index -U reads.fastq.gz --fast -p 4 -S quick_check.sam,align single-end reads in fast mode for a quick quality check,alignment
bowtie2,bowtie2_10,-x reference_index -1 R1.fastq.gz -2 R2.fastq.gz -p 8 --un-conc unmapped_%.fq | samtools view -b -o aligned.bam,align paired-end reads writing unmapped reads to separate files,alignment
bracken,bracken_01,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_output.bracken -l S -r 150 -t 10,run Bracken on a Kraken2 report for species-level abundance estimation,metagenomics
bracken,bracken_02,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_genus.bracken -l G -r 150 -t 5,run Bracken for genus-level abundance estimation,metagenomics
bracken,bracken_03,combine_bracken_outputs.py --files sample1.bracken sample2.bracken sample3.bracken --output combined_abundance.txt,combine Bracken results from multiple samples into one table,metagenomics
bracken,bracken_04,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_75bp.bracken -l S -r 75 -t 10,run Bracken on short reads (75 bp),metagenomics
bracken,bracken_05,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_output.bracken -l S -r 150 -t 10,run Bracken on a Kraken2 report for species-level abundance estimation with default parameters,metagenomics
bracken,bracken_06,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_genus.bracken -l G -r 150 -t 5,run Bracken for genus-level abundance estimation,metagenomics
bracken,bracken_07,combine_bracken_outputs.py --files sample1.bracken sample2.bracken sample3.bracken --output combined_abundance.txt,combine Bracken results from multiple samples into one table,metagenomics
bracken,bracken_08,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_75bp.bracken -l S -r 75 -t 10,run Bracken on short reads (75 bp) and write output to a file,metagenomics
bracken,bracken_09,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_output.bracken -l S -r 150 -t 10,run Bracken on a Kraken2 report for species-level abundance estimation,metagenomics
bracken,bracken_10,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_genus.bracken -l G -r 150 -t 5,run Bracken for genus-level abundance estimation with default parameters,metagenomics
busco,busco_01,-i genome_assembly.fasta -o busco_bacteria -l bacteria_odb10 -m genome -c 8,assess completeness of a bacterial genome assembly,assembly
busco,busco_02,-i eukaryote_assembly.fasta -o busco_euk -l eukaryota_odb10 -m genome -c 16,assess completeness of a eukaryotic genome assembly,assembly
busco,busco_03,-i proteins.faa -o busco_proteome -l fungi_odb10 -m proteins -c 8,assess proteome completeness from predicted proteins,assembly
busco,busco_04,-i transcriptome.fasta -o busco_transcriptome -l vertebrata_odb10 -m transcriptome -c 8,assess transcriptome completeness,assembly
busco,busco_05,-i genome.fasta -o busco_autolineage -m genome --auto-lineage -c 16,run BUSCO with automatic lineage detection,assembly
busco,busco_06,-i genome_assembly.fasta -o busco_bacteria -l bacteria_odb10 -m genome -c 8,assess completeness of a bacterial genome assembly,assembly
busco,busco_07,-i eukaryote_assembly.fasta -o busco_euk -l eukaryota_odb10 -m genome -c 16,assess completeness of a eukaryotic genome assembly using 16 CPUs,assembly
busco,busco_08,-i proteins.faa -o busco_proteome -l fungi_odb10 -m proteins -c 8,assess proteome completeness from predicted proteins and write output to a file,assembly
busco,busco_09,-i transcriptome.fasta -o busco_transcriptome -l vertebrata_odb10 -m transcriptome -c 8 --quiet,assess transcriptome completeness in quiet mode,assembly
busco,busco_10,-i genome.fasta -o busco_autolineage -m genome --auto-lineage -c 16,run BUSCO with automatic lineage detection with default parameters,assembly
bwa-mem2,bwa-mem2_01,index reference.fa,build BWA-MEM2 index from reference genome,alignment
bwa-mem2,bwa-mem2_02,mem -t 16 reference.fa R1.fastq.gz R2.fastq.gz | samtools sort -@ 4 -o sorted.bam,align paired-end reads to reference using 16 threads,alignment
bwa-mem2,bwa-mem2_03,mem -t 16 -R '@RG\tID:sample1\tSM:sample1\tLB:lib1\tPL:ILLUMINA' reference.fa R1.fastq.gz R2.fastq.gz | samtools view -b -o aligned.bam,align paired-end reads with GATK read group,alignment
bwa-mem2,bwa-mem2_04,index reference.fa,build BWA-MEM2 index from reference genome,alignment
bwa-mem2,bwa-mem2_05,mem -t 16 reference.fa R1.fastq.gz R2.fastq.gz | samtools sort -@ 4 -o sorted.bam,align paired-end reads to reference using 16 threads with default parameters,alignment
bwa-mem2,bwa-mem2_06,mem -t 16 -R '@RG\tID:sample1\tSM:sample1\tLB:lib1\tPL:ILLUMINA' reference.fa R1.fastq.gz R2.fastq.gz | samtools view -b -o aligned.bam,align paired-end reads with GATK read group,alignment
bwa-mem2,bwa-mem2_07,index reference.fa,build BWA-MEM2 index from reference genome,alignment
bwa-mem2,bwa-mem2_08,mem -t 16 reference.fa R1.fastq.gz R2.fastq.gz | samtools sort -@ 4 -o sorted.bam,align paired-end reads to reference using 16 threads,alignment
bwa-mem2,bwa-mem2_09,mem -t 16 -R '@RG\tID:sample1\tSM:sample1\tLB:lib1\tPL:ILLUMINA' reference.fa R1.fastq.gz R2.fastq.gz | samtools view -b -o aligned.bam,align paired-end reads with GATK read group,alignment
bwa-mem2,bwa-mem2_10,index reference.fa,build BWA-MEM2 index from reference genome with default parameters,alignment
bwa,bwa_01,index reference.fa,index a reference genome FASTA file,alignment
bwa,bwa_02,mem -t 8 reference.fa R1.fastq.gz R2.fastq.gz,align paired-end reads to a reference genome using 8 threads,alignment
bwa,bwa_03,mem -t 4 -R '@RG\tID:sample1\tSM:sample1\tLB:lib1\tPL:ILLUMINA' reference.fa reads.fastq.gz | samtools view -b -o aligned.bam,align single-end reads and save as BAM with read group for GATK,alignment
bwa,bwa_04,mem -x ont2d reference.fa reads.fastq,align Oxford Nanopore long reads to reference,alignment
bwa,bwa_05,mem -t 8 reference.fa R1.fastq.gz R2.fastq.gz | samtools sort -@ 4 -o sorted.bam,align paired-end reads and sort the output directly to a BAM file,alignment
bwa,bwa_06,mem -t 8 -R '@RG\tID:run1\tSM:patient1\tLB:lib1\tPL:ILLUMINA\tPU:unit1' reference.fa R1.fastq.gz R2.fastq.gz | samtools view -b -o sample1.bam,align paired-end reads with complete read group for GATK HaplotypeCaller,alignment
bwa,bwa_07,mem -t 8 reference.fa R1.fastq.gz R2.fastq.gz | samtools view -b -F 4 -o mapped.bam,align paired-end reads and report only mapped reads,alignment
bwa,bwa_08,mem -t 4 -B 4 -O 6 -E 1 reference.fa reads.fastq.gz > aligned.sam,align with custom mismatch and gap open/extension penalties,alignment
bwa,bwa_09,mem -t 8 -R '@RG\tID:sample2\tSM:sample2\tLB:lib2\tPL:ILLUMINA' reference.fa R1.fastq.gz R2.fastq.gz | samtools sort -@ 4 -o sample2_sorted.bam && samtools index sample2_sorted.bam && samtools flagstat sample2_sorted.bam > sample2_flagstat.txt,align paired-end reads in a pipeline saving both BAM and stats,alignment
bwa,bwa_10,mem -t 8 -Y reference.fa R1.fastq.gz R2.fastq.gz | samtools view -b -o sv_aligned.bam,align with soft-clipping for supplementary alignments for structural variant discovery,alignment
canu,canu_01,-p ecoli_assembly -d canu_ecoli/ genomeSize=5m -nanopore reads.fastq.gz maxMemory=16g maxThreads=8,assemble bacterial genome from ONT reads,assembly
canu,canu_02,-p hifi_assembly -d canu_hifi/ genomeSize=3g -pacbio-hifi hifi_reads.fastq.gz maxMemory=64g maxThreads=32,assemble genome from PacBio HiFi reads,assembly
canu,canu_03,-p metagenome -d canu_meta/ genomeSize=100m -nanopore meta_reads.fastq.gz maxMemory=128g maxThreads=32 useGrid=false,assemble metagenome from ONT reads,assembly
canu,canu_04,-p assembly_only -d canu_assembly_only/ -assemble genomeSize=5m -nanopore-corrected corrected_reads.fasta maxMemory=16g maxThreads=8,run only the assembly stage (skip correction and trimming),assembly
canu,canu_05,-p ecoli_assembly -d canu_ecoli/ genomeSize=5m -nanopore reads.fastq.gz maxMemory=16g maxThreads=8,assemble bacterial genome from ONT reads with default parameters,assembly
canu,canu_06,-p hifi_assembly -d canu_hifi/ genomeSize=3g -pacbio-hifi hifi_reads.fastq.gz maxMemory=64g maxThreads=32,assemble genome from PacBio HiFi reads with verbose output,assembly
canu,canu_07,-p metagenome -d canu_meta/ genomeSize=100m -nanopore meta_reads.fastq.gz maxMemory=128g maxThreads=32 useGrid=false,assemble metagenome from ONT reads using multiple threads,assembly
canu,canu_08,-p assembly_only -d canu_assembly_only/ -assemble genomeSize=5m -nanopore-corrected corrected_reads.fasta maxMemory=16g maxThreads=8,run only the assembly stage (skip correction and trimming) and write output to a file,assembly
canu,canu_09,-p ecoli_assembly -d canu_ecoli/ genomeSize=5m -nanopore reads.fastq.gz maxMemory=16g maxThreads=8 > /dev/null 2>&1,assemble bacterial genome from ONT reads in quiet mode,assembly
canu,canu_10,-p hifi_assembly -d canu_hifi/ genomeSize=3g -pacbio-hifi hifi_reads.fastq.gz maxMemory=64g maxThreads=32,assemble genome from PacBio HiFi reads with default parameters,assembly
cellsnp-lite,cellsnp-lite_01,-s possorted_genome_bam.bam -b barcodes.tsv -O cellsnp_out -R common_snps.vcf.gz -p 16 --minMAF 0.1 --minCOUNT 20,pileup known SNPs in a 10x Chromium scRNA-seq BAM with cell barcodes,single-cell
cellsnp-lite,cellsnp-lite_02,-s bulk.bam -O bulk_snp_out -R common_snps.vcf.gz -p 16 --minMAF 0.05 --minCOUNT 10 --cellTAG None --UMItag None,pileup SNPs in a bulk BAM without cell barcodes,single-cell
cellsnp-lite,cellsnp-lite_03,-s possorted_genome_bam.bam -b barcodes.tsv -O denovo_snp_out -p 16 --minMAF 0.1 --minCOUNT 100 --gzip,de novo SNP discovery in single-cell BAM (Mode 2),single-cell
cellsnp-lite,cellsnp-lite_04,"-s sample1.bam,sample2.bam,sample3.bam -O multi_sample_out -R common_snps.vcf.gz -p 16 --minMAF 0.1 --minCOUNT 20 --cellTAG None --UMItag None",pileup multiple BAMs from different samples at shared SNP positions,single-cell
cellsnp-lite,cellsnp-lite_05,-s possorted_genome_bam.bam -b barcodes.tsv -O chr1_out -R chr1_snps.vcf.gz --chrom 1 -p 8 --minMAF 0.1 --minCOUNT 20,restrict pileup to specific chromosomes to reduce runtime,single-cell
cellsnp-lite,cellsnp-lite_06,-s possorted_genome_bam.bam -b barcodes.tsv -O hq_out -R snps.vcf.gz -p 16 --minMAF 0.1 --minCOUNT 20 --minBQ 30 --minMAPQ 30,pileup with strict base quality filter for high-confidence allele counts,single-cell
cellsnp-lite,cellsnp-lite_07,-s possorted_genome_bam.bam -b barcodes.tsv -O cellsnp_out -R common_snps.vcf.gz -p 16 --minMAF 0.1 --minCOUNT 20,pileup known SNPs in a 10x Chromium scRNA-seq BAM with cell barcodes using multiple threads,single-cell
cellsnp-lite,cellsnp-lite_08,-s bulk.bam -O bulk_snp_out -R common_snps.vcf.gz -p 16 --minMAF 0.05 --minCOUNT 10 --cellTAG None --UMItag None,pileup SNPs in a bulk BAM without cell barcodes and write output to a file,single-cell
cellsnp-lite,cellsnp-lite_09,-s possorted_genome_bam.bam -b barcodes.tsv -O denovo_snp_out -p 16 --minMAF 0.1 --minCOUNT 100 --gzip --quiet,de novo SNP discovery in single-cell BAM (Mode 2) in quiet mode,single-cell
cellsnp-lite,cellsnp-lite_10,"-s sample1.bam,sample2.bam,sample3.bam -O multi_sample_out -R common_snps.vcf.gz -p 16 --minMAF 0.1 --minCOUNT 20 --cellTAG None --UMItag None",pileup multiple BAMs from different samples at shared SNP positions with default parameters,single-cell
centrifuge,centrifuge_01,-x /databases/bv_bacteria -1 R1.fastq.gz -2 R2.fastq.gz -S classifications.tsv --report-file report.tsv -p 16,classify paired-end reads against a pre-built bacterial/viral database,metagenomics
centrifuge,centrifuge_02,-x /databases/nt -U reads.fastq.gz -S classifications.tsv --report-file report.tsv -p 16,classify single-end reads against the NT database,metagenomics
centrifuge,centrifuge_03,centrifuge-build -p 16 --taxonomy-tree nodes.dmp --name-table names.dmp --conversion-table seqid2taxid.map genomes.fasta custom_db,build a custom centrifuge index from bacterial reference genomes,metagenomics
centrifuge,centrifuge_04,-x /databases/viral -U reads.fastq.gz -S viral_hits.tsv --report-file viral_report.tsv -p 8 --min-hitlen 16,classify reads with increased sensitivity for viral detection,metagenomics
centrifuge,centrifuge_05,centrifuge-kreport -x /databases/bv_bacteria classifications.tsv > kraken_report.txt,convert centrifuge output to Kraken-style report for Pavian/Krona,metagenomics
centrifuge,centrifuge_06,-x /databases/hg38 -1 R1.fastq.gz -2 R2.fastq.gz -S human_classifications.tsv -p 16 --un-conc-gz non_human_%.fastq.gz,remove human reads by classifying against human genome and excluding matches,metagenomics
centrifuge,centrifuge_07,centrifuge-build -p 8 --taxonomy-tree nodes.dmp --name-table names.dmp --conversion-table seqid2taxid.map viral_sequences.fasta viral_db,build centrifuge index from viral reference sequences,metagenomics
centrifuge,centrifuge_08,-x /databases/bv_bacteria -1 R1.fastq.gz -2 R2.fastq.gz -S classifications.tsv --report-file report.tsv -p 16 --un-conc-gz unclassified_%.fastq.gz,classify reads and save unclassified reads for downstream assembly,metagenomics
centrifuge,centrifuge_09,-x /databases/nt -U reads.fastq.gz -S classifications.tsv --report-file report.tsv -p 16 --min-hitlen 30,use high minimum hit length for precision metagenomic classification,metagenomics
centrifuge,centrifuge_10,-x /databases/custom_microbiome -1 R1.fastq.gz -2 R2.fastq.gz -S classifications.tsv --report-file report.tsv -p 16 -k 5,classify paired-end metagenome against custom host-depleted database,metagenomics
checkm2,checkm2_01,predict --input bins_directory/ --output-directory checkm2_results/ --threads 16,assess quality of all MAG bins in a directory,metagenomics
checkm2,checkm2_02,predict --input bins_directory/ --output-directory checkm2_output/ --threads 16 --database_path /path/to/checkm2_database/,assess genome quality with custom database path,metagenomics
checkm2,checkm2_03,predict --input bins_directory/ --output-directory checkm2_results/ --threads 16 --allmodels,assess quality and report predictions from both the general and specific models,metagenomics
checkm2,checkm2_04,database --download --path /path/to/databases/,download the CheckM2 database,metagenomics
checkm2,checkm2_05,predict --input bins_directory/ --output-directory checkm2_results/ --threads 16,assess quality of all MAG bins in a directory with default parameters,metagenomics
checkm2,checkm2_06,predict --input bins_directory/ --output-directory checkm2_output/ --threads 16 --database_path /path/to/checkm2_database/ --verbose,assess genome quality with custom database path with verbose output,metagenomics
checkm2,checkm2_07,predict --input bins_directory/ --output-directory checkm2_results/ --threads 16 --allmodels,assess quality and report predictions from both the general and specific models using multiple threads,metagenomics
checkm2,checkm2_08,database --download --path /path/to/databases/ > checkm2_download.log 2>&1,download the CheckM2 database and write the log output to a file,metagenomics
checkm2,checkm2_09,predict --input bins_directory/ --output-directory checkm2_results/ --threads 16 --quiet,assess quality of all MAG bins in a directory in quiet mode,metagenomics
checkm2,checkm2_10,predict --input bins_directory/ --output-directory checkm2_output/ --threads 16 --database_path /path/to/checkm2_database/,assess genome quality with custom database path with default parameters,metagenomics
chopper,chopper_01,-q 10 -l 1000 --threads 8,filter ONT reads by minimum quality Q10 and minimum length 1000 bp,qc
chopper,chopper_02,-q 15 -l 500 --threads 8,"filter high-quality ONT reads for variant calling (Q15, min 500 bp)",qc
chopper,chopper_03,-q 10 -l 1000 --headcrop 30 --tailcrop 30 --threads 8,filter reads and remove low-quality ends,qc
chopper,chopper_04,-q 8 -l 200 --maxlength 50000 --threads 4,filter reads with maximum length cutoff for specific applications,qc
chopper,chopper_05,-q 10 -l 1000 --threads 8,filter ONT reads by minimum quality Q10 and minimum length 1000 bp with default parameters,qc
chopper,chopper_06,-q 15 -l 500 --threads 8,"filter high-quality ONT reads for variant calling (Q15, min 500 bp) with verbose output",qc
chopper,chopper_07,-q 10 -l 1000 --headcrop 30 --tailcrop 30 --threads 8,filter reads and remove low-quality ends using multiple threads,qc
chopper,chopper_08,-q 8 -l 200 --maxlength 50000 --threads 4 > filtered.fastq,filter reads with maximum length cutoff for specific applications and write output to a file,qc
chopper,chopper_09,-q 10 -l 1000 --threads 8 2> /dev/null,filter ONT reads by minimum quality Q10 and minimum length 1000 bp in quiet mode,qc
chopper,chopper_10,-q 15 -l 500 --threads 8,"filter high-quality ONT reads for variant calling (Q15, min 500 bp) with default parameters",qc
chromap,chromap_01,-i -r genome.fa -o genome.index,build Chromap genome index,epigenomics
chromap,chromap_02,--preset atac -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -o fragments.bed -t 16,align paired-end ATAC-seq reads with Chromap,epigenomics
chromap,chromap_03,--preset atac -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -b barcode.fastq.gz --barcode-whitelist whitelist.txt -o scatac_fragments.bed -t 16,process single-cell ATAC-seq with barcodes,epigenomics
chromap,chromap_04,--preset chip -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -o chip_aligned.bed -t 16,align ChIP-seq reads with Chromap,epigenomics
chromap,chromap_05,-i -r genome.fa -o genome.index,build Chromap genome index with default parameters,epigenomics
chromap,chromap_06,--preset atac -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -o fragments.bed -t 16 --verbose,align paired-end ATAC-seq reads with Chromap with verbose output,epigenomics
chromap,chromap_07,--preset atac -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -b barcode.fastq.gz --barcode-whitelist whitelist.txt -o scatac_fragments.bed -t 16,process single-cell ATAC-seq with barcodes using multiple threads,epigenomics
chromap,chromap_08,--preset chip -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -o chip_aligned.bed -t 16,align ChIP-seq reads with Chromap and write output to a file,epigenomics
chromap,chromap_09,-i -r genome.fa -o genome.index --quiet,build Chromap genome index in quiet mode,epigenomics
chromap,chromap_10,--preset atac -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -o fragments.bed -t 16,align paired-end ATAC-seq reads with Chromap with default parameters,epigenomics
cnvkit,cnvkit_01,batch tumor.bam --normal normal.bam --targets targets.bed --annotate refFlat.txt --fasta reference.fa --access access.hg38.bed --output-reference normal_reference.cnn --output-dir cnvkit_output/ -p 8,run CNVkit batch workflow for tumor-normal WES,variant-calling
cnvkit,cnvkit_02,batch tumor.bam --reference normal_reference.cnn --targets targets.bed --output-dir cnvkit_tumor_only/ -p 4,run CNVkit on tumor-only WES with pre-built reference,variant-calling
cnvkit,cnvkit_03,scatter tumor.cnr -s tumor.cns -o cnv_scatter.pdf,visualize CNV scatter plot,variant-calling
cnvkit,cnvkit_04,call tumor.cns -o tumor.call.cns --center median --purity 0.8,call integer copy numbers from segments,variant-calling
cnvkit,cnvkit_05,batch tumor.bam --normal normal.bam --targets targets.bed --annotate refFlat.txt --fasta reference.fa --access access.hg38.bed --output-reference normal_reference.cnn --output-dir cnvkit_output/ -p 8,run CNVkit batch workflow for tumor-normal WES with default parameters,variant-calling
cnvkit,cnvkit_06,batch tumor.bam --reference normal_reference.cnn --targets targets.bed --output-dir cnvkit_tumor_only/ -p 4 --verbose,run CNVkit on tumor-only WES with pre-built reference with verbose output,variant-calling
cnvkit,cnvkit_07,scatter tumor.cnr -s tumor.cns -o cnv_scatter.pdf -t 4,visualize CNV scatter plot using multiple threads,variant-calling
cnvkit,cnvkit_08,call tumor.cns -o tumor.call.cns --center median --purity 0.8,call integer copy numbers from segments and write output to a file,variant-calling
cnvkit,cnvkit_09,batch tumor.bam --normal normal.bam --targets targets.bed --annotate refFlat.txt --fasta reference.fa --access access.hg38.bed --output-reference normal_reference.cnn --output-dir cnvkit_output/ -p 8 --quiet,run CNVkit batch workflow for tumor-normal WES in quiet mode,variant-calling
cnvkit,cnvkit_10,batch tumor.bam --reference normal_reference.cnn --targets targets.bed --output-dir cnvkit_tumor_only/ -p 4,run CNVkit on tumor-only WES with pre-built reference with default parameters,variant-calling
crossmap,crossmap_01,bed hg19ToHg38.over.chain.gz input_hg19.bed output_hg38.bed,convert BED file from hg19 to hg38 coordinates,utilities
crossmap,crossmap_02,vcf hg19ToHg38.over.chain.gz input_hg19.vcf hg38_reference.fa output_hg38.vcf,convert VCF file from hg19 to hg38 with target reference,utilities
crossmap,crossmap_03,gff hg19ToHg38.over.chain.gz annotation_hg19.gtf output_hg38.gtf,convert GFF/GTF annotation from one assembly to another,utilities
crossmap,crossmap_04,bam hg19ToHg38.over.chain.gz input_hg19.bam output_hg38.bam,convert BAM file from one genome build to another,utilities
crossmap,crossmap_05,bed hg19ToHg38.over.chain.gz input_hg19.bed output_hg38.bed,convert BED file from hg19 to hg38 coordinates with default parameters,utilities
crossmap,crossmap_06,vcf hg19ToHg38.over.chain.gz input_hg19.vcf hg38_reference.fa output_hg38.vcf --verbose,convert VCF file from hg19 to hg38 with target reference with verbose output,utilities
crossmap,crossmap_07,gff hg19ToHg38.over.chain.gz annotation_hg19.gtf output_hg38.gtf -t 4,convert GFF/GTF annotation from one assembly to another using multiple threads,utilities
crossmap,crossmap_08,bam hg19ToHg38.over.chain.gz input_hg19.bam output_hg38.bam,convert BAM file from one genome build to another and write output to a file,utilities
crossmap,crossmap_09,bed hg19ToHg38.over.chain.gz input_hg19.bed output_hg38.bed --quiet,convert BED file from hg19 to hg38 coordinates in quiet mode,utilities
crossmap,crossmap_10,vcf hg19ToHg38.over.chain.gz input_hg19.vcf hg38_reference.fa output_hg38.vcf,convert VCF file from hg19 to hg38 with target reference with default parameters,utilities
curl,curl_01,-L -O https://example.com/files/archive.tar.gz,download a file and save with its original filename,networking
curl,curl_02,-L -o /data/dataset.csv https://example.com/dataset.csv,download a file and save to a specific local filename,networking
curl,curl_03,"-X POST -H 'Content-Type: application/json' -d '{""name"":""test"",""value"":42}' https://api.example.com/endpoint",send a JSON POST request to an API,networking
curl,curl_04,-H 'Authorization: Bearer TOKEN' https://api.example.com/data,authenticate with a Bearer token and call an API,networking
curl,curl_05,-L -C - -O https://example.com/large-file.iso,resume an interrupted download,networking
curl,curl_06,-I https://example.com,fetch only HTTP response headers,networking
curl,curl_07,-X POST -F 'file=@/local/path/data.txt' -F 'name=upload' https://api.example.com/upload,send a multipart form upload,networking
curl,curl_08,-L --progress-bar -o output.zip https://example.com/file.zip,download with a progress bar and follow redirects,networking
curl,curl_09,--connect-timeout 10 --retry 3 --retry-delay 5 -L -O https://example.com/file.tar.gz,set a connection timeout and retry on failure,networking
curl,curl_10,-u alice:password123 https://protected.example.com/api,pass basic authentication credentials,networking
cutadapt,cutadapt_01,-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz,remove Illumina TruSeq adapters from paired-end reads,qc
cutadapt,cutadapt_02,-a AGATCGGAAGAGC -A AGATCGGAAGAGC -q 20 --minimum-length 36 -j 8 -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz,"trim adapters and quality-filter, discarding short reads",qc
cutadapt,cutadapt_03,-a A{20} -q 20 --minimum-length 30 -j 4 -o trimmed.fastq.gz reads.fastq.gz,remove polyA tail from single-end RNA-seq reads,qc
cutadapt,cutadapt_04,-a CTGTCTCTTATA -A CTGTCTCTTATA -q 20 --minimum-length 20 -j 8 -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz,trim Nextera transposase adapters from paired-end ATAC-seq data,qc
cutadapt,cutadapt_05,-g ACACTGACGACATGGTTCTACA --discard-untrimmed -o trimmed.fastq.gz reads.fastq.gz,remove 5' primer from single-end amplicon reads,qc
cutadapt,cutadapt_06,-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz --verbose,remove Illumina TruSeq adapters from paired-end reads with verbose output,qc
cutadapt,cutadapt_07,-a AGATCGGAAGAGC -A AGATCGGAAGAGC -q 20 --minimum-length 36 -j 8 -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz,"trim adapters and quality-filter, discarding short reads using multiple threads",qc
cutadapt,cutadapt_08,-a A{20} -q 20 --minimum-length 30 -j 4 -o trimmed.fastq.gz reads.fastq.gz,remove polyA tail from single-end RNA-seq reads and write output to a file,qc
cutadapt,cutadapt_09,-a CTGTCTCTTATA -A CTGTCTCTTATA -q 20 --minimum-length 20 -j 8 -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz --quiet,trim Nextera transposase adapters from paired-end ATAC-seq data in quiet mode,qc
cutadapt,cutadapt_10,-g ACACTGACGACATGGTTCTACA --discard-untrimmed -o trimmed.fastq.gz reads.fastq.gz,remove 5' primer from single-end amplicon reads with default parameters,qc
deeptools,deeptools_01,bamCoverage -b sorted.bam -o output.bw --normalizeUsing RPKM --binSize 10 -p 8,generate normalized bigWig coverage track from a BAM file,epigenomics
deeptools,deeptools_02,bamCompare -b1 chip.bam -b2 input.bam -o chip_vs_input_log2.bw --normalizeUsing RPKM --binSize 10 -p 8,create log2 ratio (ChIP/Input) bigWig track,epigenomics
deeptools,deeptools_03,computeMatrix reference-point -S chip.bw -R genes.bed --referencePoint TSS -b 3000 -a 3000 -o matrix.gz -p 8,compute signal matrix around TSS for heatmap visualization,epigenomics
deeptools,deeptools_04,plotHeatmap -m matrix.gz -out heatmap.png --colorMap RdBu_r --whatToShow 'heatmap and colorbar',plot heatmap of signal around genomic regions,epigenomics
deeptools,deeptools_05,multiBamSummary bins -b sample1.bam sample2.bam sample3.bam -o readCounts.npz -p 8,compute binned read counts across multiple BAM files for correlation analysis,epigenomics
deeptools,deeptools_06,alignmentSieve -b atac_sorted.bam -o atac_shifted.bam --ATACshift -p 8 && samtools sort -@ 8 -o atac_shifted_sorted.bam atac_shifted.bam && samtools index atac_shifted_sorted.bam && bamCoverage -b atac_shifted_sorted.bam -o atac_signal.bw --normalizeUsing RPGC --effectiveGenomeSize 2913022398 --binSize 10 -p 8,generate ATAC-seq normalized bigWig with Tn5 shift correction,epigenomics
deeptools,deeptools_07,bamCoverage -b sorted.bam -o output.bw --normalizeUsing RPKM --binSize 10 -p 8,generate normalized bigWig coverage track from a BAM file using multiple threads,epigenomics
deeptools,deeptools_08,bamCompare -b1 chip.bam -b2 input.bam -o chip_vs_input_log2.bw --normalizeUsing RPKM --binSize 10 -p 8,create log2 ratio (ChIP/Input) bigWig track and write output to a file,epigenomics
deeptools,deeptools_09,computeMatrix reference-point -S chip.bw -R genes.bed --referencePoint TSS -b 3000 -a 3000 -o matrix.gz -p 8 --quiet,compute signal matrix around TSS for heatmap visualization in quiet mode,epigenomics
deeptools,deeptools_10,plotHeatmap -m matrix.gz -out heatmap.png --colorMap RdBu_r --whatToShow 'heatmap and colorbar',plot heatmap of signal around genomic regions with default parameters,epigenomics
delly,delly_01,call -g reference.fa -o sample_svs.bcf sample.bam,call structural variants from a single sample,variant-calling
delly,delly_02,call -g reference.fa -x hg38.excl -o sample_svs.bcf sample.bam,call SVs with repetitive region exclusion list,variant-calling
delly,delly_03,call -g reference.fa -x hg38.excl -o somatic_svs.bcf tumor.bam normal.bam,call somatic SVs from tumor-normal pair,variant-calling
delly,delly_04,filter -f somatic -o somatic_filtered.bcf -s samples.tsv somatic_svs.bcf,filter somatic SVs from DELLY output,variant-calling
delly,delly_05,merge -o merged_sites.bcf sample1.bcf sample2.bcf sample3.bcf,merge per-sample SV calls for population analysis,variant-calling
delly,delly_06,call -g reference.fa -o sample_svs.bcf sample.bam --verbose,call structural variants from a single sample with verbose output,variant-calling
delly,delly_07,call -g reference.fa -x hg38.excl -o sample_svs.bcf sample.bam -t 4,call SVs with repetitive region exclusion list using multiple threads,variant-calling
delly,delly_08,call -g reference.fa -x hg38.excl -o somatic_svs.bcf tumor.bam normal.bam,call somatic SVs from tumor-normal pair and write output to a file,variant-calling
delly,delly_09,filter -f somatic -o somatic_filtered.bcf -s samples.tsv somatic_svs.bcf --quiet,filter somatic SVs from DELLY output in quiet mode,variant-calling
delly,delly_10,merge -o merged_sites.bcf sample1.bcf sample2.bcf sample3.bcf,merge per-sample SV calls for population analysis with default parameters,variant-calling
diamond,diamond_01,makedb --in nr.faa -d nr_diamond --threads 8,build a DIAMOND protein database from a FASTA file,metagenomics
diamond,diamond_02,blastp -q proteins.faa -d nr_diamond -o blastp_results.tsv --outfmt 6 --threads 8 --evalue 1e-5,search protein sequences against a DIAMOND database (blastp),metagenomics
diamond,diamond_03,blastx -q reads.fastq.gz -d nr_diamond -o blastx_results.tsv --outfmt 6 --threads 16 --evalue 1e-5 --max-target-seqs 1,search DNA reads against protein database using blastx (translated search),metagenomics
diamond,diamond_04,blastp -q proteins.faa -d uniprot_diamond -o detailed_results.tsv --outfmt 6 qseqid sseqid pident length evalue bitscore stitle --more-sensitive --threads 8,sensitive mode search with custom output fields,metagenomics
diamond,diamond_05,makedb --in nr.faa -d nr_diamond_tax --taxonmap prot.accession2taxid.gz --taxonnodes nodes.dmp --taxonnames names.dmp --threads 16 && diamond blastp -q metagenome.faa -d nr_diamond_tax -o results_tax.tsv --outfmt 6 qseqid sseqid pident evalue bitscore staxids sscinames --threads 16,search with taxonomy-aware output for functional annotation,metagenomics
diamond,diamond_06,makedb --in nr.faa -d nr_diamond --threads 8 --verbose,build a DIAMOND protein database from a FASTA file with verbose output,metagenomics
diamond,diamond_07,blastp -q proteins.faa -d nr_diamond -o blastp_results.tsv --outfmt 6 --threads 8 --evalue 1e-5,search protein sequences against a DIAMOND database (blastp) using multiple threads,metagenomics
diamond,diamond_08,blastx -q reads.fastq.gz -d nr_diamond -o blastx_results.tsv --outfmt 6 --threads 16 --evalue 1e-5 --max-target-seqs 1,search DNA reads against protein database using blastx (translated search) and write output to a file,metagenomics
diamond,diamond_09,blastp -q proteins.faa -d uniprot_diamond -o detailed_results.tsv --outfmt 6 qseqid sseqid pident length evalue bitscore stitle --more-sensitive --threads 8 --quiet,sensitive mode search with custom output fields in quiet mode,metagenomics
diamond,diamond_10,makedb --in nr.faa -d nr_diamond_tax --taxonmap prot.accession2taxid.gz --taxonnodes nodes.dmp --taxonnames names.dmp --threads 16 && diamond blastp -q metagenome.faa -d nr_diamond_tax -o results_tax.tsv --outfmt 6 qseqid sseqid pident evalue bitscore staxids sscinames --threads 16,search with taxonomy-aware output for functional annotation with default parameters,metagenomics
fastp,fastp_01,-i R1.fastq.gz -I R2.fastq.gz -o clean_R1.fastq.gz -O clean_R2.fastq.gz -h report.html -j report.json -w 8,quality trim and filter paired-end FASTQ reads with 8 threads,qc
fastp,fastp_02,-i reads.fastq.gz -o clean_reads.fastq.gz -l 50 -h report.html -j report.json,trim adapters from single-end reads and filter reads shorter than 50 bp,qc
fastp,fastp_03,-i R1.fq.gz -I R2.fq.gz -o out_R1.fq.gz -O out_R2.fq.gz -q 20 -l 36 -w 8 -h qc.html -j qc.json,quality-filter paired-end reads with a qualified base quality threshold of 20 and minimum length 36,qc
fastp,fastp_04,-i R1.fq.gz -I R2.fq.gz -o out_R1.fq.gz -O out_R2.fq.gz --trim_poly_x -w 8 -h rna_qc.html -j rna_qc.json,run fastp on paired-end RNA-seq data with polyA (poly-X) trimming,qc
fastp,fastp_05,-i R1.fq.gz -I R2.fq.gz -o /dev/null -O /dev/null --disable_adapter_trimming --disable_quality_filtering -h qc_report.html -j qc_report.json,"quality control only (no trimming, just generate the QC report)",qc
fastp,fastp_06,-i R1.fastq.gz -I R2.fastq.gz -o clean_R1.fastq.gz -O clean_R2.fastq.gz -h report.html -j report.json -w 8 --verbose,quality trim and filter paired-end FASTQ reads with 8 threads with verbose output,qc
fastp,fastp_07,-i reads.fastq.gz -o clean_reads.fastq.gz -l 50 -h report.html -j report.json -w 8,trim adapters from single-end reads and filter reads shorter than 50 bp using multiple threads,qc
fastp,fastp_08,-i R1.fq.gz -I R2.fq.gz -o out_R1.fq.gz -O out_R2.fq.gz -q 20 -l 36 -w 8 -h qc.html -j qc.json,quality-filter paired-end reads with a qualified base quality threshold of 20 and minimum length 36 and write output to a file,qc
fastp,fastp_09,-i R1.fq.gz -I R2.fq.gz -o out_R1.fq.gz -O out_R2.fq.gz --trim_poly_x -w 8 -h rna_qc.html -j rna_qc.json --quiet,run fastp on paired-end RNA-seq data with polyA (poly-X) trimming in quiet mode,qc
fastp,fastp_10,-i R1.fq.gz -I R2.fq.gz -o /dev/null -O /dev/null --disable_adapter_trimming --disable_quality_filtering -h qc_report.html -j qc_report.json,"quality control only (no trimming, just generate the QC report) with default parameters",qc
fastq-screen,fastq-screen_01,--conf fastq_screen.conf --outdir results/ --threads 8 sample_R1.fastq.gz,screen a FASTQ file against default databases,qc
fastq-screen,fastq-screen_02,--conf fastq_screen.conf --subset 0 --outdir results/ --threads 8 sample_R1.fastq.gz,screen all reads (no subsampling) for thorough contamination check,qc
fastq-screen,fastq-screen_03,--conf fastq_screen.conf --bisulfite --paired --outdir results/ --threads 8 R1.fastq.gz R2.fastq.gz,screen paired-end reads and report bisulfite alignment stats,qc
fastq-screen,fastq-screen_04,--conf fastq_screen.conf --no_html --outdir results/ --threads 8 sample.fastq.gz,screen reads and get only the table output without generating plots,qc
fastq-screen,fastq-screen_05,--conf custom_screen.conf --outdir results/ --threads 8 sample_R1.fastq.gz,add a custom database to the config and screen for mycoplasma contamination,qc
fastq-screen,fastq-screen_06,for f in *.fastq.gz; do fastq_screen --conf fastq_screen.conf --outdir screen_results/ --threads 4 $f; done && multiqc screen_results/ -o multiqc_report/,screen multiple samples in a loop and collect MultiQC report,qc
fastq-screen,fastq-screen_07,--conf fastq_screen.conf --outdir results/ --threads 8 sample_R1.fastq.gz,screen a FASTQ file against default databases using multiple threads,qc
fastq-screen,fastq-screen_08,--conf fastq_screen.conf --subset 0 --outdir results/ --threads 8 sample_R1.fastq.gz,screen all reads (no subsampling) for thorough contamination check and write output to a file,qc
fastq-screen,fastq-screen_09,--conf fastq_screen.conf --bisulfite --paired --outdir results/ --threads 8 R1.fastq.gz R2.fastq.gz --quiet,screen paired-end reads and report bisulfite alignment stats in quiet mode,qc
fastq-screen,fastq-screen_10,--conf fastq_screen.conf --no_html --outdir results/ --threads 8 sample.fastq.gz,screen reads and get only the table output without generating plots with default parameters,qc
fastqc,fastqc_01,reads.fastq.gz -o qc_results/,run quality control on a single FASTQ file,qc
fastqc,fastqc_02,-t 4 -o qc_results/ R1.fastq.gz R2.fastq.gz,run quality control on paired-end FASTQ files using 4 threads,qc
fastqc,fastqc_03,--noextract -t 8 -o qc_output/ sample1_R1.fastq.gz sample1_R2.fastq.gz sample2_R1.fastq.gz sample2_R2.fastq.gz,run quality control on multiple samples and keep zip files without extracting,qc
fastqc,fastqc_04,-t 4 -o qc_results/ aligned.bam,run fastqc on a BAM file,qc
fastqc,fastqc_05,-f fastq -a adapters.txt -t 4 -o qc_results/ reads.fastq.gz,run fastqc with custom adapter sequences and format specification,qc
fastqc,fastqc_06,reads.fastq.gz -o qc_results/ --verbose,run quality control on a single FASTQ file with verbose output,qc
fastqc,fastqc_07,-t 4 -o qc_results/ R1.fastq.gz R2.fastq.gz,run quality control on paired-end FASTQ files using multiple threads,qc
fastqc,fastqc_08,--noextract -t 8 -o qc_output/ sample1_R1.fastq.gz sample1_R2.fastq.gz sample2_R1.fastq.gz sample2_R2.fastq.gz,run quality control on multiple samples and keep zip files without extracting and write output to a file,qc
fastqc,fastqc_09,-t 4 -o qc_results/ aligned.bam --quiet,run fastqc on a BAM file in quiet mode,qc
fastqc,fastqc_10,-f fastq -a adapters.txt -t 4 -o qc_results/ reads.fastq.gz,run fastqc with custom adapter sequences and format specification with default parameters,qc
fasttree,fasttree_01,-nt -gtr aligned_sequences.fasta > nucleotide_tree.nwk,infer phylogenetic tree from nucleotide alignment,phylogenetics
fasttree,fasttree_02,aligned_proteins.fasta > protein_tree.nwk,infer phylogenetic tree from protein alignment,phylogenetics
fasttree,fasttree_03,-wag aligned_proteins.fasta > wag_tree.nwk,infer tree with WAG protein substitution model,phylogenetics
fasttree,fasttree_04,-nt -gtr -boot 1000 -seed 42 aligned_sequences.fasta > tree_with_support.nwk,infer tree with local support values,phylogenetics
fasttree,fasttree_05,-nt -gtr aligned_sequences.fasta > tree.nwk,infer tree using multithreaded FastTreeMP,phylogenetics
fasttree,fasttree_06,-lg aligned_proteins.fasta > lg_tree.nwk,infer protein tree with LG substitution model,phylogenetics
fasttree,fasttree_07,-nt -gtr -fastest aligned_sequences.fasta > fast_tree.nwk,run faster but less thorough tree search,phylogenetics
fasttree,fasttree_08,-nt -gtr -gamma aligned_sequences.fasta > gamma_tree.nwk,infer tree with gamma-distributed rate variation,phylogenetics
fasttree,fasttree_09,-nt -gtr -n 1 alignment.phy > phylip_tree.nwk,infer tree from PHYLIP format input,phylogenetics
fasttree,fasttree_10,-nt -gtr -slownni aligned_sequences.fasta > thorough_tree.nwk,infer tree with more thorough nearest-neighbor interchange search,phylogenetics
featurecounts,featurecounts_01,-T 8 -a genes.gtf -o counts.txt -p -s 2 sample1.bam sample2.bam sample3.bam,count reads per gene for paired-end RNA-seq with reverse-strand library,rna-seq
featurecounts,featurecounts_02,-T 8 -a genes.gtf -o counts.txt -s 0 sample.bam,count reads per gene for unstranded single-end RNA-seq,rna-seq
featurecounts,featurecounts_03,-T 8 -a genes.gtf -o counts.txt -p -s 2 --primary -M -O sample.bam,count reads including multi-mapping (primary alignments only) and multi-overlapping reads,rna-seq
featurecounts,featurecounts_04,-T 4 -a peaks.saf -F SAF -o chip_counts.txt sample_sorted.bam,count ChIP-seq reads per peak region using a SAF annotation file,rna-seq
featurecounts,featurecounts_05,-T 8 -f -a genes.gtf -o exon_counts.txt -p -s 2 sample.bam,count exon-level reads for exon usage analysis,rna-seq
featurecounts,featurecounts_06,-T 8 -a genes.gtf -o counts.txt -p -s 2 sample1.bam sample2.bam sample3.bam --verbose,count reads per gene for paired-end RNA-seq with reverse-strand library with verbose output,rna-seq
featurecounts,featurecounts_07,-T 8 -a genes.gtf -o counts.txt -s 0 sample.bam,count reads per gene for unstranded single-end RNA-seq using multiple threads,rna-seq
featurecounts,featurecounts_08,-T 8 -a genes.gtf -o counts.txt -p -s 2 --primary -M -O sample.bam,count reads including multi-mapping (primary alignments only) and multi-overlapping reads and write output to a file,rna-seq
featurecounts,featurecounts_09,-T 4 -a peaks.saf -F SAF -o chip_counts.txt sample_sorted.bam --quiet,count ChIP-seq reads per peak region using a SAF annotation file in quiet mode,rna-seq
featurecounts,featurecounts_10,-T 8 -f -a genes.gtf -o exon_counts.txt -p -s 2 sample.bam,count exon-level reads for exon usage analysis with default parameters,rna-seq
find,find_01,. -type f -size +100M,find all files larger than 100 MB in the current directory tree,filesystem
find,find_02,. -name '*.py' -mtime -7,find all Python files modified in the last 7 days,filesystem
find,find_03,/tmp -name '*.tmp' -type f -delete,find and delete all .tmp files in a directory tree,filesystem
find,find_04,. -iname 'readme*',find files by name case-insensitively,filesystem
find,find_05,. -maxdepth 1 -type d,find all directories in the current directory (depth 1 only),filesystem
find,find_06,. -empty,find empty files and directories,filesystem
find,find_07,. -name '*.log' -exec gzip {} \;,find files and execute a command on each match,filesystem
find,find_08,/home -user alice -type f,find files owned by a specific user,filesystem
find,find_09,. -type f -newer reference_file.txt,find files modified more recently than a reference file,filesystem
find,find_10,. -type f -perm /o+w,find files that are writable by others (world-writable),filesystem
flye,flye_01,--nano-raw reads.fastq.gz --genome-size 5m --out-dir flye_output/ --threads 16,assemble bacterial genome from Oxford Nanopore reads,assembly
flye,flye_02,--pacbio-hifi hifi_reads.fastq.gz --genome-size 3g --out-dir hifi_assembly/ --threads 32,assemble genome from PacBio HiFi reads,assembly
flye,flye_03,--meta --nano-raw meta_reads.fastq.gz --out-dir meta_flye/ --threads 32,assemble metagenomic community from ONT reads,assembly
flye,flye_04,--nano-hq hq_reads.fastq.gz --genome-size 4.5m --out-dir hq_assembly/ --threads 16 --iterations 2,"assemble with high-quality ONT reads (R10, Q20+)",assembly
flye,flye_05,--nano-raw reads.fastq.gz --genome-size 5m --out-dir flye_output/ --threads 16 --resume,resume an interrupted Flye assembly,assembly
flye,flye_06,--nano-raw reads.fastq.gz --genome-size 5m --out-dir flye_output/ --threads 16 --verbose,assemble bacterial genome from Oxford Nanopore reads with verbose output,assembly
flye,flye_07,--pacbio-hifi hifi_reads.fastq.gz --genome-size 3g --out-dir hifi_assembly/ --threads 32,assemble genome from PacBio HiFi reads using multiple threads,assembly
flye,flye_08,--meta --nano-raw meta_reads.fastq.gz --out-dir meta_flye/ --threads 32,assemble metagenomic community from ONT reads and write output to a specified directory,assembly
flye,flye_09,--nano-hq hq_reads.fastq.gz --genome-size 4.5m --out-dir hq_assembly/ --threads 16 --iterations 2 --quiet,"assemble with high-quality ONT reads (R10, Q20+) in quiet mode",assembly
flye,flye_10,--nano-raw reads.fastq.gz --genome-size 5m --out-dir flye_output/ --threads 16 --resume,resume an interrupted Flye assembly with default parameters,assembly
freebayes,freebayes_01,-f reference.fa -b sample.bam > variants.vcf,call germline variants from a single sample BAM file,variant-calling
freebayes,freebayes_02,-f reference.fa --min-alternate-count 3 --min-alternate-fraction 0.2 -b sample.bam > filtered_variants.vcf,call variants with minimum alternate allele count and fraction filters,variant-calling
freebayes,freebayes_03,-f reference.fa sample1.bam sample2.bam sample3.bam > cohort_variants.vcf,call variants jointly from multiple samples,variant-calling
freebayes,freebayes_04,-f reference.fa -r chr1 -b sample.bam > chr1_variants.vcf,call variants restricted to a specific genomic region,variant-calling
freebayes,freebayes_05,-f reference.fa --variant-input known_variants.vcf --only-use-input-alleles -b sample.bam > genotyped.vcf,genotype only the alleles provided in a VCF of known variants,variant-calling
freebayes,freebayes_06,-f reference.fa -b sample.bam > variants.vcf,call germline variants from a single sample BAM file,variant-calling
freebayes,freebayes_07,-f reference.fa --min-alternate-count 3 --min-alternate-fraction 0.2 -b sample.bam > filtered_variants.vcf,call variants with minimum alternate allele count and fraction filters,variant-calling
freebayes,freebayes_08,-f reference.fa sample1.bam sample2.bam sample3.bam > cohort_variants.vcf,call variants jointly from multiple samples,variant-calling
freebayes,freebayes_09,-f reference.fa -r chr1 -b sample.bam > chr1_variants.vcf,call variants restricted to a specific genomic region,variant-calling
freebayes,freebayes_10,-f reference.fa --variant-input known_variants.vcf --only-use-input-alleles -b sample.bam > genotyped.vcf,genotype only the alleles provided in a VCF of known variants with default parameters,variant-calling
gatk,gatk_01,HaplotypeCaller -R reference.fa -I sorted_markdup.bam -O output.g.vcf.gz -ERC GVCF,call germline variants from a BAM file using HaplotypeCaller,variant-calling
gatk,gatk_02,HaplotypeCaller -R reference.fa -I sorted_markdup.bam -O variants.vcf.gz,genotype a single sample directly (not GVCF mode),variant-calling
gatk,gatk_03,MarkDuplicates -I input.bam -O markdup.bam -M metrics.txt,mark PCR duplicates in a BAM file,variant-calling
gatk,gatk_04,Mutect2 -R reference.fa -I tumor.bam -I normal.bam -normal normal_sample_name -O somatic.vcf.gz,call somatic mutations with Mutect2 using matched normal,variant-calling
gatk,gatk_05,FilterMutectCalls -R reference.fa -V somatic.vcf.gz -O filtered_somatic.vcf.gz,filter Mutect2 variants with FilterMutectCalls,variant-calling
gatk,gatk_06,CreateSequenceDictionary -R reference.fa,create a sequence dictionary for a reference FASTA,variant-calling
gatk,gatk_07,AddOrReplaceReadGroups -I input.bam -O output_rg.bam -RGID sample1 -RGLB lib1 -RGPL ILLUMINA -RGPU unit1 -RGSM sample1,add read group to a BAM file (required before GATK variant calling),variant-calling
gatk,gatk_08,BaseRecalibrator -R hg38.fa -I markdup.bam --known-sites dbsnp.vcf -O recal.table,perform base quality score recalibration (BQSR step 1) on markdup BAM with hg38 reference and dbSNP known sites,variant-calling
gatk,gatk_09,ApplyBQSR -R hg38.fa -I markdup.bam --bqsr-recal-file recal.table -O recal.bam,apply base quality score recalibration (BQSR step 2) to produce recalibrated BAM,variant-calling
gatk,gatk_10,SelectVariants -V variants.vcf -O SNPs.vcf --select-type-to-include SNP,select only SNPs from a variants VCF,variant-calling
git,git_01,clone --depth 1 --branch main https://github.com/user/repo.git,clone a repository with shallow history (last commit only) on a specific branch,version-control
git,git_02,"commit -a -m ""fix: resolve null pointer in parser""",stage all changes and commit with a message,version-control
git,git_03,push -u origin main,push the current branch to origin and set upstream tracking,version-control
git,git_04,checkout -b feature/new-api,create and switch to a new branch,version-control
git,git_05,log --oneline --graph --decorate --all,view the commit log with one-line summaries and branch graph,version-control
git,git_06,diff HEAD,show unstaged and staged changes,version-control
git,git_07,"stash push -m ""WIP: experiment with new feature""",stash current working tree changes to switch branches cleanly,version-control
git,git_08,rebase origin/main,rebase current branch onto origin/main to incorporate upstream changes,version-control
git,git_09,rm --cached secrets.env,stop tracking a file without deleting it from disk,version-control
git,git_10,pull --rebase origin main,pull latest changes from remote and rebase local commits on top,version-control
grep,grep_01,"-in ""error"" application.log","search for a keyword in a file, ignoring case, with line numbers",text-processing
grep,grep_02,"-rn ""def connect"" --include='*.py' src/",recursively search all Python files for a function definition,text-processing
grep,grep_03,"-C 3 ""NullPointerException"" error.log",show context lines around each match,text-processing
grep,grep_04,"-c ""^ERROR"" server.log",count the number of matching lines in a file,text-processing
grep,grep_05,"-E ""(error|warning|fatal)"" app.log",search for multiple patterns using extended regex,text-processing
grep,grep_06,"-rl ""TODO"" src/",find files containing a pattern (list filenames only),text-processing
grep,grep_07,"-v ""^#"" config.ini",invert match: show lines that do NOT contain a pattern,text-processing
grep,grep_08,"-oE ""[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"" access.log",extract only the matching part of each line,text-processing
grep,grep_09,"-Hn ""import"" *.py",search in multiple files and show filename with each match,text-processing
grep,grep_10,"-F ""error[0]"" debug.log",search for a fixed string (no regex interpretation),text-processing
gtdbtk,gtdbtk_01,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fa,classify a directory of genome bins with GTDB-Tk,metagenomics
gtdbtk,gtdbtk_02,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fasta --skip_ani_screen,classify .fasta genome bins while skipping the ANI screen,metagenomics
gtdbtk,gtdbtk_03,identify --genome_dir bins/ --out_dir gtdbtk_identify/ --cpus 16 --extension fa,run only the identification step (marker gene identification),metagenomics
gtdbtk,gtdbtk_04,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fa --quiet,classify a directory of genome bins with GTDB-Tk in quiet mode,metagenomics
gtdbtk,gtdbtk_05,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fasta --skip_ani_screen,classify .fasta genome bins while skipping the ANI screen with default parameters,metagenomics
gtdbtk,gtdbtk_06,identify --genome_dir bins/ --out_dir gtdbtk_identify/ --cpus 16 --extension fa --verbose,run only the identification step (marker gene identification) with verbose output,metagenomics
gtdbtk,gtdbtk_07,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fa,classify a directory of genome bins with GTDB-Tk using multiple threads,metagenomics
gtdbtk,gtdbtk_08,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fasta --skip_ani_screen,classify .fasta genome bins while skipping the ANI screen and write output to a specified directory,metagenomics
gtdbtk,gtdbtk_09,identify --genome_dir bins/ --out_dir gtdbtk_identify/ --cpus 16 --extension fa --quiet,run only the identification step (marker gene identification) in quiet mode,metagenomics
gtdbtk,gtdbtk_10,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fa,classify a directory of genome bins with GTDB-Tk with default parameters,metagenomics
hap_py,hap_py_01,-r reference.fa GIAB_truth.vcf.gz query_calls.vcf.gz -o benchmark_results --engine vcfeval -f HG001_highconf.bed --threads 8,benchmark a variant caller VCF against GIAB truth set,variant-calling
hap_py,hap_py_02,-r reference.fa truth.vcf.gz query.vcf.gz -o results --engine vcfeval -f confident.bed --report-prefix detailed_report --threads 8,benchmark with stratification for SNPs and indels separately,variant-calling
hap_py,hap_py_03,-r reference.fa GIAB_truth.vcf.gz query_calls.vcf.gz -o benchmark_results --engine vcfeval -f HG001_highconf.bed --threads 8,benchmark a variant caller VCF against GIAB truth set and write output to a file,variant-calling
hap_py,hap_py_04,-r reference.fa truth.vcf.gz query.vcf.gz -o results --engine vcfeval -f confident.bed --report-prefix detailed_report --threads 8 --quiet,benchmark with stratification for SNPs and indels separately in quiet mode,variant-calling
hap_py,hap_py_05,-r reference.fa GIAB_truth.vcf.gz query_calls.vcf.gz -o benchmark_results --engine vcfeval -f HG001_highconf.bed --threads 8,benchmark a variant caller VCF against GIAB truth set with default parameters,variant-calling
hap_py,hap_py_06,-r reference.fa truth.vcf.gz query.vcf.gz -o results --engine vcfeval -f confident.bed --report-prefix detailed_report --threads 8 --verbose,benchmark with stratification for SNPs and indels separately with verbose output,variant-calling
hap_py,hap_py_07,-r reference.fa GIAB_truth.vcf.gz query_calls.vcf.gz -o benchmark_results --engine vcfeval -f HG001_highconf.bed --threads 8,benchmark a variant caller VCF against GIAB truth set using multiple threads,variant-calling
hap_py,hap_py_08,-r reference.fa truth.vcf.gz query.vcf.gz -o results --engine vcfeval -f confident.bed --report-prefix detailed_report --threads 8,benchmark with stratification for SNPs and indels separately and write output to a file,variant-calling
hap_py,hap_py_09,-r reference.fa GIAB_truth.vcf.gz query_calls.vcf.gz -o benchmark_results --engine vcfeval -f HG001_highconf.bed --threads 8 --quiet,benchmark a variant caller VCF against GIAB truth set in quiet mode,variant-calling
hap_py,hap_py_10,-r reference.fa truth.vcf.gz query.vcf.gz -o results --engine vcfeval -f confident.bed --report-prefix detailed_report --threads 8,benchmark with stratification for SNPs and indels separately with default parameters,variant-calling
hifiasm,hifiasm_01,-o assembly -t 32 hifi_reads.fastq.gz,assemble genome from PacBio HiFi reads,assembly
hifiasm,hifiasm_02,-o phased_assembly -t 32 --h1 hic_R1.fastq.gz --h2 hic_R2.fastq.gz hifi_reads.fastq.gz,haplotype-resolved assembly with Hi-C phasing data,assembly
hifiasm,hifiasm_03,-o assembly -t 32 --n-hap 4 hifi_reads.fastq.gz,assemble a polyploid genome with a custom number of haplotypes (ploidy 4),assembly
hifiasm,hifiasm_04,-o assembly -t 32 --ul ultralong_reads.fastq.gz hifi_reads.fastq.gz,assemble with ultra-long ONT reads for improved scaffolding,assembly
hifiasm,hifiasm_05,-o assembly -t 32 -l 3 hifi_reads.fastq.gz,assemble with aggressive duplicate purging,assembly
hifiasm,hifiasm_06,-o assembly -t 32 hifi_reads.fastq.gz --verbose,assemble genome from PacBio HiFi reads with verbose output,assembly
hifiasm,hifiasm_07,-o phased_assembly -t 32 --h1 hic_R1.fastq.gz --h2 hic_R2.fastq.gz hifi_reads.fastq.gz,haplotype-resolved assembly with Hi-C phasing data using multiple threads,assembly
hifiasm,hifiasm_08,-o assembly -t 32 --n-hap 4 hifi_reads.fastq.gz,assemble a polyploid genome with a custom number of haplotypes (ploidy 4) and write output to a file,assembly
hifiasm,hifiasm_09,-o assembly -t 32 --ul ultralong_reads.fastq.gz hifi_reads.fastq.gz --quiet,assemble with ultra-long ONT reads for improved scaffolding in quiet mode,assembly
hifiasm,hifiasm_10,-o assembly -t 32 -l 3 hifi_reads.fastq.gz,assemble with aggressive duplicate purging with default parameters,assembly
hisat2,hisat2_01,hisat2-build -p 8 genome.fa genome_index,build a HISAT2 genome index from a reference FASTA,alignment
hisat2,hisat2_02,-p 8 -x genome_index -1 R1.fastq.gz -2 R2.fastq.gz --dta -S aligned.sam,align paired-end RNA-seq reads to the genome with 8 threads,alignment
hisat2,hisat2_03,-p 8 -x genome_index -1 R1.fastq.gz -2 R2.fastq.gz --dta | samtools sort -@ 4 -o sorted.bam,align paired-end RNA-seq reads and output sorted BAM directly,alignment
hisat2,hisat2_04,-p 8 -x genome_index -1 R1.fastq.gz -2 R2.fastq.gz --rna-strandness RF --dta -S aligned.sam,align strand-specific paired-end RNA-seq (reverse-strand library),alignment
hisat2,hisat2_05,-p 4 -x genome_index -U reads.fastq.gz --dta -S aligned.sam,align single-end RNA-seq reads,alignment
hisat2,hisat2_06,hisat2-build -p 8 genome.fa genome_spliceaware_index --ss splice_sites.txt --exon exons.txt,build splice-site aware index using GTF annotation for improved RNA-seq,alignment
hisat2,hisat2_07,-p 8 -x genome_index -1 R1.fastq.gz -2 R2.fastq.gz --rna-strandness RF --dta -S aligned.sam 2> align_summary.txt,align paired-end reads with strand information and save alignment statistics,alignment
hisat2,hisat2_08,-p 8 -x genome_index -U reads.fastq.gz --no-spliced-alignment -S aligned.sam,align single-end reads in genomic (non-spliced) mode for DNA-seq,alignment
hisat2,hisat2_09,-p 8 -x genome_index -1 R1.fastq.gz -2 R2.fastq.gz --dta --no-unal -S aligned.sam,align paired-end reads and discard unmapped reads,alignment
hisat2,hisat2_10,-p 8 -x genome_index -1 R1.fastq.gz -2 R2.fastq.gz --dta | samtools view -b -q 1 -o unique_aligned.bam,align paired-end reads and output only uniquely mapped reads,alignment
hmmer,hmmer_01,hmmscan --cpu 8 --tblout pfam_hits.tbl --domtblout pfam_domains.tbl -E 1e-5 Pfam-A.hmm proteins.faa > pfam_output.txt,search a protein database against Pfam HMM profiles (domain annotation),sequence-utilities
hmmer,hmmer_02,hmmsearch --cpu 8 --tblout hits.tbl --domtblout domain_hits.tbl -E 1e-10 gene_family.hmm sequences.faa > hmmsearch_out.txt,search a protein HMM profile against a sequence database,sequence-utilities
hmmer,hmmer_03,hmmbuild --cpu 8 gene_family.hmm aligned_sequences.sto,build a profile HMM from a multiple sequence alignment,sequence-utilities
hmmer,hmmer_04,hmmpress Pfam-A.hmm,press Pfam database for hmmscan indexing,sequence-utilities
hmmer,hmmer_05,phmmer --cpu 8 --tblout phmmer_hits.tbl -E 1e-5 query_protein.faa target_database.faa > phmmer_out.txt,search proteins with phmmer (BLAST-like single sequence query),sequence-utilities
hmmer,hmmer_06,hmmscan --cpu 8 --tblout pfam_hits.tbl --domtblout pfam_domains.tbl -E 1e-5 Pfam-A.hmm proteins.faa > pfam_output.txt,search a protein database against Pfam HMM profiles (domain annotation),sequence-utilities
hmmer,hmmer_07,hmmsearch --cpu 8 --tblout hits.tbl --domtblout domain_hits.tbl -E 1e-10 gene_family.hmm sequences.faa > hmmsearch_out.txt,search a protein HMM profile against a sequence database,sequence-utilities
hmmer,hmmer_08,hmmbuild --cpu 8 -o output.txt gene_family.hmm aligned_sequences.sto,build a profile HMM from a multiple sequence alignment and write output to a file,sequence-utilities
hmmer,hmmer_09,hmmpress Pfam-A.hmm --quiet,press Pfam database for hmmscan indexing in quiet mode,sequence-utilities
hmmer,hmmer_10,phmmer --cpu 8 --tblout phmmer_hits.tbl -E 1e-5 query_protein.faa target_database.faa > phmmer_out.txt,search proteins with phmmer (BLAST-like single sequence query) with default parameters,sequence-utilities
homer,homer_01,makeTagDirectory chipseq_tags/ sample.bam -genome hg38 -checkGC,create a HOMER tag directory from a BAM file,epigenomics
homer,homer_02,findPeaks chipseq_tags/ -style factor -i input_tags/ -o peaks.txt,call narrow transcription factor peaks with an input control,epigenomics
homer,homer_03,findPeaks chipseq_tags/ -style histone -i input_tags/ -o broad_peaks.txt,"call broad histone modification peaks (e.g., H3K27me3)",epigenomics
homer,homer_04,annotatePeaks.pl peaks.txt hg38 -gtf genes.gtf > annotated_peaks.txt,annotate peaks with genomic features using hg38 and a custom GTF annotation,epigenomics
homer,homer_05,findMotifsGenome.pl peaks.txt hg38 motif_output/ -size 200 -mask -p 8,run de novo and known motif analysis on ChIP-seq peaks,epigenomics
homer,homer_06,mergePeaks rep1_peaks.txt rep2_peaks.txt -d 100 -prefix merged_peaks -venn venn.txt,merge peak files from two ChIP-seq replicates,epigenomics
homer,homer_07,pos2bed.pl peaks.txt > peaks.bed,convert HOMER peak file to BED format,epigenomics
homer,homer_08,makeTagDirectory chipseq_tags/ sample.bam -genome hg38 -checkGC -o output.txt,create a HOMER tag directory from a BAM file and write output to a file,epigenomics
homer,homer_09,findPeaks chipseq_tags/ -style factor -i input_tags/ -o peaks.txt --quiet,call narrow transcription factor peaks with an input control in quiet mode,epigenomics
homer,homer_10,findPeaks chipseq_tags/ -style histone -i input_tags/ -o broad_peaks.txt,"call broad histone modification peaks (e.g., H3K27me3) with default parameters",epigenomics
igvtools,igvtools_01,count -z 5 -w 25 sorted.bam coverage.tdf hg38,create coverage TDF track from BAM file,utilities
igvtools,igvtools_02,index variants.vcf,index a VCF file for IGV,utilities
igvtools,igvtools_03,sort input.bed sorted.bed,sort a BED file for IGV indexing,utilities
igvtools,igvtools_04,count -z 5 -w 25 sorted.bam coverage.tdf hg38 --quiet,create coverage TDF track from BAM file in quiet mode,utilities
igvtools,igvtools_05,index variants.vcf,index a VCF file for IGV with default parameters,utilities
igvtools,igvtools_06,sort input.bed sorted.bed --verbose,sort a BED file for IGV indexing with verbose output,utilities
igvtools,igvtools_07,count -z 5 -w 25 sorted.bam coverage.tdf hg38 -t 4,create coverage TDF track from BAM file using multiple threads,utilities
igvtools,igvtools_08,index variants.vcf -o output.txt,index a VCF file for IGV and write output to a file,utilities
igvtools,igvtools_09,sort input.bed sorted.bed --quiet,sort a BED file for IGV indexing in quiet mode,utilities
igvtools,igvtools_10,count -z 5 -w 25 sorted.bam coverage.tdf hg38,create coverage TDF track from BAM file with default parameters,utilities
iqtree2,iqtree2_01,-s alignment.fasta -m MFP --prefix my_tree -T AUTO,infer maximum-likelihood tree with automatic model selection,phylogenetics
iqtree2,iqtree2_02,-s alignment.fasta -m MFP -B 1000 --bnni --prefix bootstrap_tree -T 8,infer tree with ultrafast bootstrap and model selection,phylogenetics
iqtree2,iqtree2_03,-s protein_alignment.fasta -st AA -m TEST -B 1000 --bnni --prefix protein_tree -T 8,infer phylogenetic tree for protein sequences,phylogenetics
iqtree2,iqtree2_04,-s alignment.fasta -m MFP -B 1000 --prefix main_tree -T 8 --gcf gene_trees.txt --scfl 100,run concordance factor analysis to assess gene tree discordance,phylogenetics
iqtree2,iqtree2_05,-s alignment.fasta -m MFP -b 100 -o outgroup_taxon --prefix rooted_tree -T 8,infer tree with standard bootstrap and specified outgroup,phylogenetics
iqtree2,iqtree2_06,-s alignment.fasta -m MFP --prefix my_tree -T AUTO --verbose,infer maximum-likelihood tree with automatic model selection with verbose output,phylogenetics
iqtree2,iqtree2_07,-s alignment.fasta -m MFP -B 1000 --bnni --prefix bootstrap_tree -T 8,infer tree with ultrafast bootstrap and model selection using multiple threads,phylogenetics
iqtree2,iqtree2_08,-s protein_alignment.fasta -st AA -m TEST -B 1000 --bnni --prefix protein_tree -T 8,infer phylogenetic tree for protein sequences and write output files with a custom prefix,phylogenetics
iqtree2,iqtree2_09,-s alignment.fasta -m MFP -B 1000 --prefix main_tree -T 8 --gcf gene_trees.txt --scfl 100 --quiet,run concordance factor analysis to assess gene tree discordance in quiet mode,phylogenetics
iqtree2,iqtree2_10,-s alignment.fasta -m MFP -b 100 -o outgroup_taxon --prefix rooted_tree -T 8,infer tree with standard bootstrap and specified outgroup with default parameters,phylogenetics
java,java_01,-version,check installed Java version,programming
java,java_02,-Xmx16g -jar picard.jar SortSam I=input.bam O=sorted.bam SORT_ORDER=coordinate,run a JAR-based tool with increased heap memory,programming
java,java_03,-Xmx8g -XX:+UseG1GC -Djava.io.tmpdir=/scratch/tmp -jar gatk.jar HaplotypeCaller -R ref.fa -I input.bam -O out.vcf,run GATK with custom tmp directory and GC settings,programming
java,java_04,-Xmx2g -jar fastqc.jar --threads 4 sample.fastq.gz,run FastQC via its JAR directly,programming
java,java_05,-XshowSettings:all -version,show all system properties and JVM settings,programming
java,java_06,-XX:+PrintFlagsFinal -version,list available JVM garbage collectors and tuning flags,programming
java,java_07,-Xmx4g -jar trimmomatic.jar PE -threads 8 R1.fastq.gz R2.fastq.gz R1_trimmed.fastq.gz R1_unpaired.fastq.gz R2_trimmed.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:adapters.fa:2:30:10,run Trimmomatic via its JAR,programming
java,java_08,-XX:+PrintFlagsFinal -version 2>&1,check available JVM memory settings,programming
java,java_09,-cp /path/to/lib1.jar:/path/to/lib2.jar com.example.MainClass arg1 arg2,run a Java main class with a custom classpath,programming
java,java_10,-version,check installed Java version with default parameters,programming
julia,julia_01,script.jl,run a Julia script,programming
julia,julia_02,--project=. script.jl,run a script in a specific project environment,programming
julia,julia_03,--threads auto script.jl,run a script with multiple threads,programming
julia,julia_04,"-e 'using Pkg; Pkg.add(""BioSequences"")'",install a package non-interactively from the command line,programming
julia,julia_05,-e 'using Pkg; Pkg.status()',show installed packages in the current environment,programming
julia,julia_06,"-e 'using Pkg; Pkg.add([""BioSequences"",""FASTX"",""GenomicFeatures""])'",add BioJulia packages for bioinformatics,programming
julia,julia_07,--startup-file=no --project=. script.jl,run script without loading startup.jl (for CI/pipelines),programming
julia,julia_08,-e 'println(VERSION); println(DEPOT_PATH)',check Julia version and depot paths,programming
julia,julia_09,--compile=all -O2 script.jl,run a script with exhaustive compilation at optimization level 2,programming
julia,julia_10,-e 'import Pluto; Pluto.run(port=1234)',run a Pluto notebook server on a specific port,programming
kallisto,kallisto_01,index -i transcriptome.idx transcriptome.fa,build a kallisto index from a transcriptome FASTA,rna-seq
kallisto,kallisto_02,quant -i transcriptome.idx -o sample_output -b 100 --threads 8 R1.fastq.gz R2.fastq.gz,quantify paired-end RNA-seq reads,rna-seq
kallisto,kallisto_03,quant -i transcriptome.idx -o sample_output --single -l 200 -s 20 -b 100 --threads 8 reads.fastq.gz,quantify single-end RNA-seq reads with fragment length parameters,rna-seq
kallisto,kallisto_04,quant -i transcriptome.idx -o sample_output --rf-stranded -b 100 --threads 8 R1.fastq.gz R2.fastq.gz,quantify strand-specific reverse-strand paired-end RNA-seq,rna-seq
kallisto,kallisto_05,quant -i transcriptome.idx -o sample1_out -b 50 --threads 4 sample1_R1.fq.gz sample1_R2.fq.gz,quantify one sample of a multi-sample batch,rna-seq
kallisto,kallisto_06,index -i transcriptome.idx transcriptome.fa --verbose,build a kallisto index from a transcriptome FASTA with verbose output,rna-seq
kallisto,kallisto_07,quant -i transcriptome.idx -o sample_output -b 100 --threads 8 R1.fastq.gz R2.fastq.gz,quantify paired-end RNA-seq reads using multiple threads,rna-seq
kallisto,kallisto_08,quant -i transcriptome.idx -o sample_output --single -l 200 -s 20 -b 100 --threads 8 reads.fastq.gz,quantify single-end RNA-seq reads with fragment length parameters and write output to a file,rna-seq
kallisto,kallisto_09,quant -i transcriptome.idx -o sample_output --rf-stranded -b 100 --threads 8 R1.fastq.gz R2.fastq.gz --quiet,quantify strand-specific reverse-strand paired-end RNA-seq in quiet mode,rna-seq
kallisto,kallisto_10,quant -i transcriptome.idx -o sample1_out -b 50 --threads 4 sample1_R1.fq.gz sample1_R2.fq.gz,quantify one sample of a multi-sample batch with default parameters,rna-seq
kb,kb_01,ref -i index.idx -g t2g.txt -f1 cdna.fasta genome.fa genes.gtf,build kb reference from genome and GTF,single-cell
kb,kb_02,count -i index.idx -g t2g.txt -x 10xv3 -o output_dir/ -t 16 R1.fastq.gz R2.fastq.gz,process 10x Chromium v3 scRNA-seq FASTQ files,single-cell
kb,kb_03,count -i spliced_unspliced.idx -g t2g.txt -x 10xv3 --workflow lamanno -o velocity_output/ -t 16 R1.fastq.gz R2.fastq.gz,process scRNA-seq with RNA velocity output,single-cell
kb,kb_04,count -i index.idx -g t2g.txt -x 10xv3 --h5ad -o output_dir/ -t 16 R1.fastq.gz R2.fastq.gz,process 10x Chromium v3 and output AnnData for Scanpy,single-cell
kb,kb_05,ref -i index.idx -g t2g.txt -f1 cdna.fasta genome.fa genes.gtf,build kb reference from genome and GTF with default parameters,single-cell
kb,kb_06,count -i index.idx -g t2g.txt -x 10xv3 -o output_dir/ -t 16 R1.fastq.gz R2.fastq.gz --verbose,process 10x Chromium v3 scRNA-seq FASTQ files with verbose output,single-cell
kb,kb_07,count -i spliced_unspliced.idx -g t2g.txt -x 10xv3 --workflow lamanno -o velocity_output/ -t 16 R1.fastq.gz R2.fastq.gz,process scRNA-seq with RNA velocity output using multiple threads,single-cell
kb,kb_08,count -i index.idx -g t2g.txt -x 10xv3 --h5ad -o output_dir/ -t 16 R1.fastq.gz R2.fastq.gz,process 10x Chromium v3 and output AnnData for Scanpy and write output to a file,single-cell
kb,kb_09,ref -i index.idx -g t2g.txt -f1 cdna.fasta genome.fa genes.gtf --quiet,build kb reference from genome and GTF in quiet mode,single-cell
kb,kb_10,count -i index.idx -g t2g.txt -x 10xv3 -o output_dir/ -t 16 R1.fastq.gz R2.fastq.gz,process 10x Chromium v3 scRNA-seq FASTQ files with default parameters,single-cell
kraken2,kraken2_01,--db /path/to/kraken2_db --paired --threads 8 --output kraken_output.txt --report kraken_report.txt R1.fastq.gz R2.fastq.gz,classify paired-end metagenomic reads against the standard database,metagenomics
kraken2,kraken2_02,--db /path/to/kraken2_db --paired --confidence 0.1 --threads 8 --output kraken_out.txt --report kraken_report.txt --unclassified-out unclassified#.fastq R1.fastq.gz R2.fastq.gz,classify reads with confidence threshold and save unclassified reads,metagenomics
kraken2,kraken2_03,--db /path/to/kraken2_db --threads 8 --output kraken_out.txt --report kraken_report.txt reads.fastq.gz,classify single-end reads and generate report,metagenomics
kraken2,kraken2_04,--db /path/to/kraken2_db --paired --threads 8 --output kraken_out.txt --report kraken_report.txt --classified-out classified#.fastq R1.fastq.gz R2.fastq.gz,classify reads and extract classified reads for downstream analysis,metagenomics
kraken2,kraken2_05,--db /path/to/kraken2_db --paired --threads 8 --output kraken_output.txt --report kraken_report.txt R1.fastq.gz R2.fastq.gz,classify paired-end metagenomic reads against the standard database with default parameters,metagenomics
kraken2,kraken2_06,--db /path/to/kraken2_db --paired --confidence 0.1 --threads 8 --output kraken_out.txt --report kraken_report.txt --unclassified-out unclassified#.fastq R1.fastq.gz R2.fastq.gz --verbose,classify reads with confidence threshold and save unclassified reads with verbose output,metagenomics
kraken2,kraken2_07,--db /path/to/kraken2_db --threads 8 --output kraken_out.txt --report kraken_report.txt reads.fastq.gz,classify single-end reads and generate report using multiple threads,metagenomics
kraken2,kraken2_08,--db /path/to/kraken2_db --paired --threads 8 --output kraken_out.txt --report kraken_report.txt --classified-out classified#.fastq R1.fastq.gz R2.fastq.gz,classify reads and extract classified reads for downstream analysis and write output to a file,metagenomics
kraken2,kraken2_09,--db /path/to/kraken2_db --paired --threads 8 --output kraken_output.txt --report kraken_report.txt R1.fastq.gz R2.fastq.gz --quiet,classify paired-end metagenomic reads against the standard database in quiet mode,metagenomics
kraken2,kraken2_10,--db /path/to/kraken2_db --paired --confidence 0.1 --threads 8 --output kraken_out.txt --report kraken_report.txt --unclassified-out unclassified#.fastq R1.fastq.gz R2.fastq.gz,classify reads with confidence threshold and save unclassified reads with default parameters,metagenomics
liftoff,liftoff_01,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -u unmapped.txt,lift annotations from reference GFF3 to a new assembly,annotation
liftoff,liftoff_02,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -copies -u unmapped.txt,lift annotations and copy multi-copy gene families,annotation
liftoff,liftoff_03,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -s 0.85 -a 0.85 -u unmapped.txt,lift annotations between closely related species with stricter (higher) coverage and identity thresholds,annotation
liftoff,liftoff_04,target.fasta reference.fasta -db reference.db -o lifted.gff3 -u unmapped.txt,speed up repeated runs using a pre-built gffutils database,annotation
liftoff,liftoff_05,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -dir scratch_dir/ -p 16 -u unmapped.txt,lift annotations and write output to a specific directory with minimap2 intermediates,annotation
liftoff,liftoff_06,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -f gene -u unmapped.txt,"lift only specific feature types (e.g., just genes)",annotation
liftoff,liftoff_07,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -u unmapped.txt -p 4,lift annotations from reference GFF3 to a new assembly using multiple threads,annotation
liftoff,liftoff_08,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -copies -u unmapped.txt,lift annotations and copy multi-copy gene families and write output to a file,annotation
liftoff,liftoff_09,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -s 0.85 -a 0.85 -u unmapped.txt --quiet,lift annotations between closely related species with stricter (higher) coverage and identity thresholds in quiet mode,annotation
liftoff,liftoff_10,target.fasta reference.fasta -db reference.db -o lifted.gff3 -u unmapped.txt,speed up repeated runs using a pre-built gffutils database with default parameters,annotation
longshot,longshot_01,-b sorted.bam -f reference.fa -o snps.vcf,call SNPs from Oxford Nanopore aligned reads,variant-calling
longshot,longshot_02,-b sorted.bam -f reference.fa -o chr1_snps.vcf -r chr1:1000000-2000000,call SNPs restricted to a specific region,variant-calling
longshot,longshot_03,-b sorted.bam -f reference.fa -o snps_filtered.vcf -c 10 -q 20,call SNPs with minimum coverage filter,variant-calling
longshot,longshot_04,-b sorted.bam -f reference.fa -o snps.vcf --quiet,call SNPs from Oxford Nanopore aligned reads in quiet mode,variant-calling
longshot,longshot_05,-b sorted.bam -f reference.fa -o chr1_snps.vcf -r chr1:1000000-2000000,call SNPs restricted to a specific region with default parameters,variant-calling
longshot,longshot_06,-b sorted.bam -f reference.fa -o snps_filtered.vcf -c 10 -q 20 --verbose,call SNPs with minimum coverage filter with verbose output,variant-calling
longshot,longshot_07,-b sorted.bam -f reference.fa -o snps.vcf -t 4,call SNPs from Oxford Nanopore aligned reads using multiple threads,variant-calling
longshot,longshot_08,-b sorted.bam -f reference.fa -o chr1_snps.vcf -r chr1:1000000-2000000,call SNPs restricted to a specific region and write output to a file,variant-calling
longshot,longshot_09,-b sorted.bam -f reference.fa -o snps_filtered.vcf -c 10 -q 20 --quiet,call SNPs with minimum coverage filter in quiet mode,variant-calling
longshot,longshot_10,-b sorted.bam -f reference.fa -o snps.vcf,call SNPs from Oxford Nanopore aligned reads with default parameters,variant-calling
macs2,macs2_01,callpeak -t chip.bam -c input.bam -f BAM -g hs -n sample_chip -q 0.05 --outdir chip_peaks/,call narrow peaks from ChIP-seq data with input control,epigenomics
macs2,macs2_02,callpeak -t h3k27me3.bam -c input.bam -f BAM -g hs --broad --broad-cutoff 0.1 -n h3k27me3 --outdir broad_peaks/,call broad peaks for histone mark (H3K27me3) ChIP-seq,epigenomics
macs2,macs2_03,callpeak -t atac.bam -f BAM -g hs --nomodel --shift -100 --extsize 200 -n atac_sample -q 0.05 --outdir atac_peaks/,call ATAC-seq peaks using nucleosome-free region model,epigenomics
macs2,macs2_04,callpeak -t atac_pe.bam -f BAMPE -g hs -n atac_pe_sample -q 0.05 --outdir atac_pe_peaks/,call peaks from paired-end ATAC-seq BAM,epigenomics
macs2,macs2_05,callpeak -t atac.bam -f BAM -g hs --nomodel --shift -100 --extsize 200 --keep-dup all -n open_chromatin --outdir atac_out/,call peaks without control for ATAC-seq open chromatin,epigenomics
macs2,macs2_06,callpeak -t chip.bam -c input.bam -f BAM -g hs -n sample_chip -q 0.05 --outdir chip_peaks/ --verbose 3,call narrow peaks from ChIP-seq data with input control with verbose output,epigenomics
macs2,macs2_07,callpeak -t h3k27me3.bam -c input.bam -f BAM -g hs --broad --broad-cutoff 0.1 -n h3k27me3 --outdir broad_peaks/,call broad peaks for histone mark (H3K27me3) ChIP-seq using multiple threads,epigenomics
macs2,macs2_08,callpeak -t atac.bam -f BAM -g hs --nomodel --shift -100 --extsize 200 -n atac_sample -q 0.05 --outdir atac_peaks/,call ATAC-seq peaks using nucleosome-free region model and write output to a file,epigenomics
macs2,macs2_09,callpeak -t atac_pe.bam -f BAMPE -g hs -n atac_pe_sample -q 0.05 --outdir atac_pe_peaks/ --verbose 0,call peaks from paired-end ATAC-seq BAM in quiet mode,epigenomics
macs2,macs2_10,callpeak -t atac.bam -f BAM -g hs --nomodel --shift -100 --extsize 200 --keep-dup all -n open_chromatin --outdir atac_out/,call peaks without control for ATAC-seq open chromatin with default parameters,epigenomics
mafft,mafft_01,--auto --thread 8 proteins.fasta > aligned_proteins.fasta,align multiple protein sequences with automatic algorithm selection,phylogenetics
mafft,mafft_02,--localpair --maxiterate 1000 --thread 8 sequences.fasta > aligned_localpair.fasta,highly accurate multiple sequence alignment for fewer than 200 sequences,phylogenetics
mafft,mafft_03,--auto --adjustdirectionaccurately --thread 8 rna_sequences.fasta > aligned_rna.fasta,align RNA sequences adjusting for strand orientation,phylogenetics
mafft,mafft_04,--auto --thread 8 --phylipout sequences.fasta > aligned.phy,align sequences and output in PHYLIP format for phylogenetic analysis,phylogenetics
mafft,mafft_05,--add new_sequences.fasta --thread 8 existing_alignment.fasta > updated_alignment.fasta,add new sequences to existing alignment,phylogenetics
mafft,mafft_06,--auto --thread 8 proteins.fasta > aligned_proteins.fasta,align multiple protein sequences with automatic algorithm selection,phylogenetics
mafft,mafft_07,--localpair --maxiterate 1000 --thread 8 sequences.fasta > aligned_localpair.fasta,highly accurate multiple sequence alignment for fewer than 200 sequences,phylogenetics
mafft,mafft_08,--auto --adjustdirectionaccurately --thread 8 rna_sequences.fasta > aligned_rna.fasta,align RNA sequences adjusting for strand orientation,phylogenetics
mafft,mafft_09,--auto --thread 8 --phylipout sequences.fasta > aligned.phy,align sequences and output in PHYLIP format for phylogenetic analysis,phylogenetics
mafft,mafft_10,--add new_sequences.fasta --thread 8 existing_alignment.fasta > updated_alignment.fasta,add new sequences to existing alignment with default parameters,phylogenetics
mash,mash_01,sketch -o genomes_db *.fasta,sketch a collection of genome FASTA files into a single database,sequence-utilities
mash,mash_02,dist genome1.fasta genome2.fasta,compute the pairwise Mash distance between two genome FASTA files,sequence-utilities
mash,mash_03,dist -p 16 genomes_db.msh query.fasta | sort -k3 -n | head -20,query all genomes in a database against a query genome,sequence-utilities
mash,mash_04,sketch -m 2 -s 10000 -o reads_sketch reads.fastq.gz,sketch raw sequencing reads with error filtering,sequence-utilities
mash,mash_05,screen -w -p 8 refdb.msh metagenome.fastq.gz | sort -gr -k1 > screen_results.txt,screen a metagenome for known reference genomes,sequence-utilities
mash,mash_06,triangle -p 16 genomes_db.msh > distances.tsv,compute all-vs-all distance triangle for genome clustering,sequence-utilities
mash,mash_07,sketch -p 4 -o genomes_db *.fasta,sketch a collection of genome FASTA files into a single database using multiple threads,sequence-utilities
mash,mash_08,dist genome1.fasta genome2.fasta > output.txt,compute the pairwise Mash distance between two genome FASTA files and write output to a file,sequence-utilities
mash,mash_09,dist -p 16 genomes_db.msh query.fasta | sort -k3 -n | head -20,query all genomes in a database against a query genome,sequence-utilities
mash,mash_10,sketch -m 2 -s 10000 -o reads_sketch reads.fastq.gz,sketch raw sequencing reads with error filtering with default parameters,sequence-utilities
medaka,medaka_01,medaka_consensus -i reads.fastq.gz -d draft_assembly.fasta -o medaka_output/ -t 8 -m r941_min_hac_g507,polish an ONT assembly with Medaka (all-in-one pipeline),variant-calling
medaka,medaka_02,medaka_haploid_variant -i reads.fastq.gz -r reference.fasta -o medaka_variants/ -t 8 -m r941_min_hac_g507,call variants from ONT reads (haploid),variant-calling
medaka,medaka_03,tools list_models,list available Medaka models,variant-calling
medaka,medaka_04,medaka_consensus -i reads.fastq.gz -d draft.fasta -o medaka_gpu/ -t 2 -m r1041_e82_400bps_hac_v4.2.0 --gpu,run Medaka consensus with GPU acceleration,variant-calling
medaka,medaka_05,medaka_consensus -i reads.fastq.gz -d draft_assembly.fasta -o medaka_output/ -t 8 -m r941_min_hac_g507,polish an ONT assembly with Medaka (all-in-one pipeline) with default parameters,variant-calling
medaka,medaka_06,medaka_haploid_variant -i reads.fastq.gz -r reference.fasta -o medaka_variants/ -t 8 -m r941_min_hac_g507 --verbose,call variants from ONT reads (haploid) with verbose output,variant-calling
medaka,medaka_07,tools list_models -t 4,list available Medaka models using multiple threads,variant-calling
medaka,medaka_08,medaka_consensus -i reads.fastq.gz -d draft.fasta -o medaka_gpu/ -t 2 -m r1041_e82_400bps_hac_v4.2.0 --gpu,run Medaka consensus with GPU acceleration and write output to a file,variant-calling
medaka,medaka_09,medaka_consensus -i reads.fastq.gz -d draft_assembly.fasta -o medaka_output/ -t 8 -m r941_min_hac_g507 --quiet,polish an ONT assembly with Medaka (all-in-one pipeline) in quiet mode,variant-calling
medaka,medaka_10,medaka_haploid_variant -i reads.fastq.gz -r reference.fasta -o medaka_variants/ -t 8 -m r941_min_hac_g507,call variants from ONT reads (haploid) with default parameters,variant-calling
megahit,megahit_01,-1 R1.fastq.gz -2 R2.fastq.gz -o megahit_output/ --num-cpu-threads 16 --min-contig-len 500,assemble a metagenome from paired-end reads,assembly
megahit,megahit_02,-1 R1.fastq.gz -2 R2.fastq.gz -o large_meta/ --num-cpu-threads 32 --presets meta-large --min-contig-len 500,assemble a large complex metagenome with meta-large preset,assembly
megahit,megahit_03,"-1 s1_R1.fq.gz,s2_R1.fq.gz -2 s1_R2.fq.gz,s2_R2.fq.gz -o coassembly/ --num-cpu-threads 32 --min-contig-len 500",assemble metagenome from multiple samples combined,assembly
megahit,megahit_04,-1 R1.fastq.gz -2 R2.fastq.gz -o custom_k/ --num-cpu-threads 16 --k-min 27 --k-max 127 --k-step 10,assemble with custom k-mer range for specific data type,assembly
megahit,megahit_05,-1 R1.fastq.gz -2 R2.fastq.gz -o megahit_output/ --num-cpu-threads 16 --min-contig-len 500,assemble a metagenome from paired-end reads with default parameters,assembly
megahit,megahit_06,-1 R1.fastq.gz -2 R2.fastq.gz -o large_meta/ --num-cpu-threads 32 --presets meta-large --min-contig-len 500 --verbose,assemble a large complex metagenome with meta-large preset with verbose output,assembly
megahit,megahit_07,"-1 s1_R1.fq.gz,s2_R1.fq.gz -2 s1_R2.fq.gz,s2_R2.fq.gz -o coassembly/ --num-cpu-threads 32 --min-contig-len 500",assemble metagenome from multiple samples combined using multiple threads,assembly
megahit,megahit_08,-1 R1.fastq.gz -2 R2.fastq.gz -o custom_k/ --num-cpu-threads 16 --k-min 27 --k-max 127 --k-step 10,assemble with custom k-mer range for specific data type and write output to a file,assembly
megahit,megahit_09,-1 R1.fastq.gz -2 R2.fastq.gz -o megahit_output/ --num-cpu-threads 16 --min-contig-len 500 --quiet,assemble a metagenome from paired-end reads in quiet mode,assembly
megahit,megahit_10,-1 R1.fastq.gz -2 R2.fastq.gz -o large_meta/ --num-cpu-threads 32 --presets meta-large --min-contig-len 500,assemble a large complex metagenome with meta-large preset with default parameters,assembly
meme,meme_01,-dna -mod zoops -nmotifs 10 -minw 6 -maxw 20 -oc meme_output peaks.fasta,discover de novo motifs in ChIP-seq peak sequences,utilities
meme,meme_02,fimo --thresh 1e-4 --oc fimo_output $MEME/share/meme/db/motif_databases/JASPAR/JASPAR2022_CORE_vertebrates_non-redundant_v2.meme peaks.fasta,scan sequences for known TF binding motifs with FIMO,utilities
meme,meme_03,tomtom -oc tomtom_output meme_output/meme.xml $MEME/share/meme/db/motif_databases/JASPAR/JASPAR2022_CORE_vertebrates_non-redundant_v2.meme,compare discovered motifs against a known database with TOMTOM,utilities
meme,meme_04,ame --oc ame_output --control shuffled_bg.fasta peaks.fasta $MEME/share/meme/db/motif_databases/HOCOMOCO/HOCOMOCOv11_core_HUMAN_mono_meme_format.meme,test motif enrichment in a foreground vs background with AME,utilities
meme,meme_05,streme --oc streme_output --dna --p peaks.fasta --n shuffled.fasta,run STREME for fast short motif discovery,utilities
meme,meme_06,bedtools getfasta -fi genome.fa -bed peaks.bed -fo peaks.fasta,extract sequences for peak regions using bedtools first,utilities
meme,meme_07,-dna -revcomp -mod zoops -nmotifs 5 -oc meme_rc peaks.fasta,run MEME with reverse complement consideration,utilities
meme,meme_08,-dna -mod zoops -nmotifs 10 -minw 6 -maxw 20 -oc meme_output peaks.fasta,discover de novo motifs in ChIP-seq peak sequences and write output to a file,utilities
meme,meme_09,fimo --verbosity 1 --thresh 1e-4 --oc fimo_output $MEME/share/meme/db/motif_databases/JASPAR/JASPAR2022_CORE_vertebrates_non-redundant_v2.meme peaks.fasta,scan sequences for known TF binding motifs with FIMO in quiet mode,utilities
meme,meme_10,tomtom -oc tomtom_output meme_output/meme.xml $MEME/share/meme/db/motif_databases/JASPAR/JASPAR2022_CORE_vertebrates_non-redundant_v2.meme,compare discovered motifs against a known database with TOMTOM with default parameters,utilities
metabat2,metabat2_01,jgi_summarize_bam_contig_depths --outputDepth contig_depths.txt sample1.bam sample2.bam sample3.bam,compute contig depths from BAM files for MetaBAT2,metagenomics
metabat2,metabat2_02,-i assembly.fasta -a contig_depths.txt -o bins/bin -m 2500 -t 8,bin metagenomic assembly contigs into MAGs,metagenomics
metabat2,metabat2_03,-i assembly.fasta -o bins/bin -m 1500 -t 8,run MetaBAT2 binning without coverage information (tetranucleotide only),metagenomics
metabat2,metabat2_04,-i assembly.fasta -a contig_depths.txt -o bins/bin --sensitive -m 2000 -t 8,bin with custom sensitivity settings,metagenomics
metabat2,metabat2_05,jgi_summarize_bam_contig_depths --outputDepth contig_depths.txt sample1.bam sample2.bam sample3.bam,compute contig depths from BAM files for MetaBAT2 with default parameters,metagenomics
metabat2,metabat2_06,-i assembly.fasta -a contig_depths.txt -o bins/bin -m 2500 -t 8 --verbose,bin metagenomic assembly contigs into MAGs with verbose output,metagenomics
metabat2,metabat2_07,-i assembly.fasta -o bins/bin -m 1500 -t 8,run MetaBAT2 binning without coverage information (tetranucleotide only) using multiple threads,metagenomics
metabat2,metabat2_08,-i assembly.fasta -a contig_depths.txt -o bins/bin --sensitive -m 2000 -t 8,bin with custom sensitivity settings and write output to a file,metagenomics
metabat2,metabat2_09,jgi_summarize_bam_contig_depths --outputDepth contig_depths.txt sample1.bam sample2.bam sample3.bam --quiet,compute contig depths from BAM files for MetaBAT2 in quiet mode,metagenomics
metabat2,metabat2_10,-i assembly.fasta -a contig_depths.txt -o bins/bin -m 2500 -t 8,bin metagenomic assembly contigs into MAGs with default parameters,metagenomics
metaphlan,metaphlan_01,--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 reads.fastq.gz -o sample_profile.txt,profile microbial community from single-end FASTQ reads,metagenomics
metaphlan,metaphlan_02,"--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 -o sample_profile.txt R1.fastq.gz,R2.fastq.gz",profile paired-end metagenomic reads,metagenomics
metaphlan,metaphlan_03,--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 --bowtie2out sample.bowtie2.bz2 -o sample_profile.txt reads.fastq.gz,save bowtie2 alignments for faster re-runs and profile,metagenomics
metaphlan,metaphlan_04,sample1_profile.txt sample2_profile.txt sample3_profile.txt > merged_profiles.txt,merge multiple MetaPhlAn profiles into a single table,metagenomics
metaphlan,metaphlan_05,--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 reads.fastq.gz -o sample_profile.txt,profile microbial community from single-end FASTQ reads with default parameters,metagenomics
metaphlan,metaphlan_06,"--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 -o sample_profile.txt R1.fastq.gz,R2.fastq.gz --verbose",profile paired-end metagenomic reads with verbose output,metagenomics
metaphlan,metaphlan_07,--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 --bowtie2out sample.bowtie2.bz2 -o sample_profile.txt reads.fastq.gz,save bowtie2 alignments for faster re-runs and profile using multiple threads,metagenomics
metaphlan,metaphlan_08,sample1_profile.txt sample2_profile.txt sample3_profile.txt > merged_profiles.txt,merge multiple MetaPhlAn profiles into a single table,metagenomics
metaphlan,metaphlan_09,--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 reads.fastq.gz -o sample_profile.txt --quiet,profile microbial community from single-end FASTQ reads in quiet mode,metagenomics
metaphlan,metaphlan_10,"--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 -o sample_profile.txt R1.fastq.gz,R2.fastq.gz",profile paired-end metagenomic reads with default parameters,metagenomics
miniasm,miniasm_01,-x ava-ont -t 16 reads.fastq.gz reads.fastq.gz | gzip > overlaps.paf.gz,compute all-vs-all overlaps for ONT reads with minimap2,assembly
miniasm,miniasm_02,-f reads.fastq.gz overlaps.paf.gz > assembly.gfa,assemble ONT reads from precomputed overlaps,assembly
miniasm,miniasm_03,"/^S/ {print "">""$2""\n""$3}",convert miniasm GFA output to FASTA,assembly
miniasm,miniasm_04,-x ava-ont -t 16 reads.fastq.gz reads.fastq.gz | gzip > overlaps.paf.gz,compute all-vs-all overlaps for ONT reads with minimap2,assembly
miniasm,miniasm_05,-f reads.fastq.gz overlaps.paf.gz > assembly.gfa,assemble ONT reads from precomputed overlaps with default parameters,assembly
miniasm,miniasm_06,"/^S/ {print "">""$2""\n""$3}",convert miniasm GFA output to FASTA,assembly
miniasm,miniasm_07,-x ava-ont -t 16 reads.fastq.gz reads.fastq.gz | gzip > overlaps.paf.gz,compute all-vs-all overlaps for ONT reads with minimap2,assembly
miniasm,miniasm_08,-f reads.fastq.gz overlaps.paf.gz > assembly.gfa,assemble ONT reads from precomputed overlaps,assembly
miniasm,miniasm_09,"/^S/ {print "">""$2""\n""$3}",convert miniasm GFA output to FASTA,assembly
miniasm,miniasm_10,-x ava-ont -t 16 reads.fastq.gz reads.fastq.gz | gzip > overlaps.paf.gz,compute all-vs-all overlaps for ONT reads with minimap2 with default parameters,assembly
minimap2,minimap2_01,-ax map-ont -t 8 reference.fa nanopore_reads.fastq.gz | samtools sort -@ 4 -o aligned_sorted.bam,align Oxford Nanopore reads to a reference genome,alignment
minimap2,minimap2_02,-ax map-hifi -t 8 reference.fa hifi_reads.fastq.gz | samtools sort -@ 4 -o hifi_aligned.bam,align PacBio HiFi (CCS) reads to a reference genome,alignment
minimap2,minimap2_03,-ax splice -t 8 --junc-bed known_junctions.bed reference.fa rna_reads.fastq.gz | samtools sort -o rna_aligned.bam,align Nanopore cDNA reads for RNA-seq spliced alignment,alignment
minimap2,minimap2_04,-ax asm5 -t 8 reference.fa assembly.fa | samtools sort -o assembly_vs_ref.bam,compare two genome assemblies (assembly vs reference),alignment
minimap2,minimap2_05,-x map-ont -t 8 -c reference.fa reads.fastq.gz > aligned.paf,map long reads and output in PAF format for structural variant analysis,alignment
minimap2,minimap2_06,-x ava-ont -t 16 reads.fastq.gz reads.fastq.gz | gzip > overlaps.paf.gz,compute all-vs-all overlaps for de novo ONT assembly,alignment
minimap2,minimap2_07,-d reference_ont.mmi -x map-ont reference.fa,build a reusable minimap2 index for repeated ONT alignments,alignment
minimap2,minimap2_08,-ax map-ont -t 8 reference.fa nanopore_reads.fastq.gz | samtools sort -@ 4 -o aligned_sorted.bam,align Oxford Nanopore reads to a reference genome,alignment
minimap2,minimap2_09,-ax map-hifi -t 8 reference.fa hifi_reads.fastq.gz | samtools sort -@ 4 -o hifi_aligned.bam,align PacBio HiFi (CCS) reads to a reference genome,alignment
minimap2,minimap2_10,-ax splice -t 8 --junc-bed known_junctions.bed reference.fa rna_reads.fastq.gz | samtools sort -o rna_aligned.bam,align Nanopore cDNA reads for RNA-seq spliced alignment with default parameters,alignment
mmseqs2,mmseqs2_01,easy-search query.fasta uniref50.fasta results.m8 tmp --format-mode 0 --threads 16 -s 7.5,search protein FASTA against UniRef50 and output BLAST tabular results,sequence-utilities
mmseqs2,mmseqs2_02,easy-cluster proteins.fasta cluster_90 tmp --min-seq-id 0.9 -c 0.8 --cov-mode 0 --threads 16,cluster protein sequences at 90% identity,sequence-utilities
mmseqs2,mmseqs2_03,easy-linclust proteins.fasta cluster_50 tmp --min-seq-id 0.5 -c 0.8 --threads 32,fast linear-time clustering of large metagenomic protein set at 50% identity,sequence-utilities
mmseqs2,mmseqs2_04,createdb proteins.fasta proteinsDB,build an MMseqs2 database from a FASTA file,sequence-utilities
mmseqs2,mmseqs2_05,search queryDB targetDB resultDB tmp -s 6 --threads 16 && convertalis queryDB targetDB resultDB results.tsv --format-mode 4,search one MMseqs2 DB against another and convert results to TSV,sequence-utilities
mmseqs2,mmseqs2_06,result2repseq proteinsDB proteinsDB cluster_result repseqDB && convert2fasta repseqDB representatives.fasta,extract representative sequences from a cluster result,sequence-utilities
mmseqs2,mmseqs2_07,easy-search reads.fasta proteins.fasta hits.m8 tmp --search-type 2 --threads 16,perform translated nucleotide-to-protein search,sequence-utilities
mmseqs2,mmseqs2_08,easy-search query.fasta uniref50.fasta results.m8 tmp --format-mode 0 --threads 16 -s 7.5,search protein FASTA against UniRef50 and output BLAST tabular results and write output to a file,sequence-utilities
mmseqs2,mmseqs2_09,easy-cluster proteins.fasta cluster_90 tmp --min-seq-id 0.9 -c 0.8 --cov-mode 0 --threads 16 -v 0,cluster protein sequences at 90% identity in quiet mode,sequence-utilities
mmseqs2,mmseqs2_10,easy-linclust proteins.fasta cluster_50 tmp --min-seq-id 0.5 -c 0.8 --threads 32,fast linear-time clustering of large metagenomic protein set at 50% identity with default parameters,sequence-utilities
modkit,modkit_01,pileup --ref reference.fasta --mod-code m --cpg input.bam output.bedmethyl --threads 16,generate a bedMethyl pileup of 5mC methylation from a BAM file,epigenomics
modkit,modkit_02,pileup --ref reference.fasta --cpg --combine-strands -t 16 input.bam output_combined.bedmethyl,generate bedMethyl and combine CpG sites on both strands,epigenomics
modkit,modkit_03,extract --ref reference.fasta --mod-code m input.bam per_read_mods.tsv --threads 16,extract per-read modification data to TSV,epigenomics
modkit,modkit_04,summary input.bam --threads 8,get a summary of modification calls in a BAM,epigenomics
modkit,modkit_05,motif-bed reference.fasta CG 0 > cpg_positions.bed,generate a BED file of all CpG positions in a reference for use as motif targets,epigenomics
modkit,modkit_06,pileup --ref reference.fasta --region chr1:1-10000000 --mod-code m input.bam region_output.bedmethyl --threads 8,pileup restricted to a specific genomic region,epigenomics
modkit,modkit_07,sample-probs --mod-code m input.bam --threads 8,sample modification probabilities to assess threshold distribution,epigenomics
modkit,modkit_08,pileup --ref reference.fasta --mod-code m --cpg input.bam output.bedmethyl --threads 16,generate a bedMethyl pileup of 5mC methylation from a BAM file and write output to a file,epigenomics
modkit,modkit_09,pileup --ref reference.fasta --cpg --combine-strands -t 16 input.bam output_combined.bedmethyl --quiet,generate bedMethyl and combine CpG sites on both strands in quiet mode,epigenomics
modkit,modkit_10,extract --ref reference.fasta --mod-code m input.bam per_read_mods.tsv --threads 16,extract per-read modification data to TSV with default parameters,epigenomics
mosdepth,mosdepth_01,--by 500 -t 8 sample_coverage sample_sorted.bam,calculate genome-wide depth of coverage in 500bp windows,utilities
mosdepth,mosdepth_02,--by targets.bed -t 4 wes_coverage sample_sorted.bam,calculate coverage over target regions for WES,utilities
mosdepth,mosdepth_03,-t 4 -Q 20 -F 1796 filtered_coverage sample_sorted.bam,calculate per-base depth with MAPQ filter,utilities
mosdepth,mosdepth_04,-n -t 8 summary_only sample_sorted.bam,get summary statistics only without per-base output,utilities
mosdepth,mosdepth_05,--by 500 -t 8 sample_coverage sample_sorted.bam,calculate genome-wide depth of coverage in 500bp windows with default parameters,utilities
mosdepth,mosdepth_06,--by targets.bed -t 4 wes_coverage sample_sorted.bam --verbose,calculate coverage over target regions for WES with verbose output,utilities
mosdepth,mosdepth_07,-t 4 -Q 20 -F 1796 filtered_coverage sample_sorted.bam,calculate per-base depth with MAPQ filter using multiple threads,utilities
mosdepth,mosdepth_08,-n -t 8 summary_only sample_sorted.bam,get summary statistics only without per-base output and write output to a file,utilities
mosdepth,mosdepth_09,--by 500 -t 8 sample_coverage sample_sorted.bam --quiet,calculate genome-wide depth of coverage in 500bp windows in quiet mode,utilities
mosdepth,mosdepth_10,--by targets.bed -t 4 wes_coverage sample_sorted.bam,calculate coverage over target regions for WES with default parameters,utilities
multiqc,multiqc_01,. -o multiqc_report/ -f,aggregate all QC results from the current directory into a single report,qc
multiqc,multiqc_02,/path/to/results/ -o /path/to/qc_summary/ -n project_qc_report -f,aggregate QC results from a specific results directory,qc
multiqc,multiqc_03,/results/ --ignore /results/old_run/ -o multiqc_output/ -f,run multiqc ignoring a specific subdirectory,qc
multiqc,multiqc_04,. --flat -o flat_report/ -f,generate a multiqc report with flat (non-interactive) output suitable for PDF,qc
multiqc,multiqc_05,fastqc_results/ trimmomatic_logs/ -o summary_qc/ -f,run multiqc on only FastQC and Trimmomatic outputs,qc
multiqc,multiqc_06,. -o multiqc_report/ -f --verbose,aggregate all QC results from the current directory into a single report with verbose output,qc
multiqc,multiqc_07,/path/to/results/ -o /path/to/qc_summary/ -n project_qc_report -f -t 4,aggregate QC results from a specific results directory using multiple threads,qc
multiqc,multiqc_08,/results/ --ignore /results/old_run/ -o multiqc_output/ -f,run multiqc ignoring a specific subdirectory and write output to a file,qc
multiqc,multiqc_09,. --flat -o flat_report/ -f --quiet,generate a multiqc report with flat (non-interactive) output suitable for PDF in quiet mode,qc
multiqc,multiqc_10,fastqc_results/ trimmomatic_logs/ -o summary_qc/ -f,run multiqc on only FastQC and Trimmomatic outputs with default parameters,qc
mummer,mummer_01,nucmer --prefix=myrun reference.fna query.fna,align a query genome to a reference genome,alignment
mummer,mummer_02,dnadiff reference.fna query.fna,generate a comprehensive pairwise genome comparison report,alignment
mummer,mummer_03,delta-filter -1 myrun.delta > myrun.filtered.delta && show-snps -Clr myrun.filtered.delta > myrun.snps,filter alignments to 1-to-1 (unique) and extract SNPs,alignment
mummer,mummer_04,show-coords -r -c -l myrun.delta > myrun.coords,show alignment coordinates,alignment
mummer,mummer_05,mummerplot --png --prefix=dotplot myrun.delta,generate a synteny dot-plot image,alignment
mummer,mummer_06,nucmer --mum -p compare reference.fa query.fa && show-snps -Clrx compare.delta,compare two genomes with verbose SNP output,alignment
mummer,mummer_07,nucmer -c 100 -l 20 --prefix large_genome ref.fa query.fa,align with a custom minimum match length,alignment
mummer,mummer_08,nucmer --prefix=myrun reference.fna query.fna,align a query genome to a reference genome and write output to a file,alignment
mummer,mummer_09,dnadiff reference.fna query.fna --quiet,generate a comprehensive pairwise genome comparison report in quiet mode,alignment
mummer,mummer_10,delta-filter -1 myrun.delta > myrun.filtered.delta && show-snps -Clr myrun.filtered.delta > myrun.snps,filter alignments to 1-to-1 (unique) and extract SNPs with default parameters,alignment
muscle,muscle_01,-align proteins.fasta -output aligned_proteins.fasta -threads 8,align multiple protein sequences with MUSCLE v5,phylogenetics
muscle,muscle_02,-super5 large_dataset.fasta -output large_aligned.fasta -threads 16,align a large dataset with MUSCLE v5 super5 mode,phylogenetics
muscle,muscle_03,-in sequences.fasta -out aligned.fasta,align sequences with MUSCLE v3 syntax (legacy),phylogenetics
muscle,muscle_04,-align sequences.fasta -output aligned.fasta -replicates 5 -threads 8,generate multiple alignment replicates for uncertainty estimation,phylogenetics
muscle,muscle_05,-align proteins.fasta -output aligned_proteins.fasta -threads 8,align multiple protein sequences with MUSCLE v5 with default parameters,phylogenetics
muscle,muscle_06,-super5 large_dataset.fasta -output large_aligned.fasta -threads 16 --verbose,align a large dataset with MUSCLE v5 super5 mode with verbose output,phylogenetics
muscle,muscle_07,-in sequences.fasta -out aligned.fasta -t 4,align sequences with MUSCLE v3 syntax (legacy) using multiple threads,phylogenetics
muscle,muscle_08,-align sequences.fasta -output aligned.fasta -replicates 5 -threads 8,generate multiple alignment replicates for uncertainty estimation and write output to a file,phylogenetics
muscle,muscle_09,-align proteins.fasta -output aligned_proteins.fasta -threads 8 --quiet,align multiple protein sequences with MUSCLE v5 in quiet mode,phylogenetics
muscle,muscle_10,-super5 large_dataset.fasta -output large_aligned.fasta -threads 16,align a large dataset with MUSCLE v5 super5 mode with default parameters,phylogenetics
nextflow,nextflow_01,"run nf-core/rnaseq -profile singularity,slurm --input samplesheet.csv --genome GRCh38 -resume",run an nf-core pipeline with Singularity on a Slurm cluster,workflow-manager
nextflow,nextflow_02,run main.nf -work-dir /scratch/$USER/nxf-work,run a pipeline with a custom work directory,workflow-manager
nextflow,nextflow_03,pull nf-core/sarek -revision 3.4.0,pull a specific pipeline version from nf-core,workflow-manager
nextflow,nextflow_04,run main.nf -resume,resume a failed pipeline run,workflow-manager
nextflow,nextflow_05,run nf-core/chipseq -c custom.config --input samplesheet.csv,run pipeline with a custom config file,workflow-manager
nextflow,nextflow_06,list,show the list of cached pipeline assets,workflow-manager
nextflow,nextflow_07,clean -f -but last,clean up work directory keeping only the last run's intermediate files,workflow-manager
nextflow,nextflow_08,run nf-core/rnaseq -profile singularity --singularity.cacheDir /shared/singularity-cache --input samplesheet.csv,run a pipeline with Singularity image cache set,workflow-manager
nextflow,nextflow_09,-version,check Nextflow version and environment,workflow-manager
nextflow,nextflow_10,run main.nf -with-report report.html -with-timeline timeline.html,generate a run report and timeline,workflow-manager
orthofinder,orthofinder_01,-f proteomes/ -t 32 -a 8,run OrthoFinder on a directory of species proteomes,comparative-genomics
orthofinder,orthofinder_02,-f proteomes/ -M msa -S diamond -A mafft -T iqtree -t 32 -a 8,run OrthoFinder with MSA-based gene trees using MAFFT and IQ-TREE,comparative-genomics
orthofinder,orthofinder_03,-f proteomes/ -og -t 32,infer orthogroups only without gene trees for fast proteome comparison,comparative-genomics
orthofinder,orthofinder_04,-b proteomes/OrthoFinder/Results_Jan01/ -f new_species/ -t 32 -a 8,restart OrthoFinder from existing DIAMOND results (add a new species),comparative-genomics
orthofinder,orthofinder_05,-f proteomes/ -S mmseqs -t 32 -a 8,use MMseqs2 instead of DIAMOND for faster all-vs-all search,comparative-genomics
orthofinder,orthofinder_06,-f proteomes/ -o results/orthofinder_run -t 32 -a 8,run OrthoFinder with a fixed output directory name,comparative-genomics
orthofinder,orthofinder_07,-f proteomes/ -t 32 -a 8,run OrthoFinder on a directory of species proteomes using multiple threads,comparative-genomics
orthofinder,orthofinder_08,-f proteomes/ -M msa -S diamond -A mafft -T iqtree -t 32 -a 8 -o results/orthofinder_msa,run OrthoFinder with MSA-based gene trees using MAFFT and IQ-TREE and write results to a custom output directory,comparative-genomics
orthofinder,orthofinder_09,-f proteomes/ -og -t 32 > orthofinder.log 2>&1,infer orthogroups only without gene trees for fast proteome comparison with console output redirected to a log file,comparative-genomics
orthofinder,orthofinder_10,-b proteomes/OrthoFinder/Results_Jan01/ -f new_species/ -t 32 -a 8,restart OrthoFinder from existing DIAMOND results (add a new species) with default parameters,comparative-genomics
pairtools,pairtools_01,parse --min-mapq 30 --walks-policy mask --max-inter-align-gap 30 --chroms-path chromsizes.txt -o sample.pairs.gz aligned.bam,parse Hi-C BWA alignments to pairs format,epigenomics
pairtools,pairtools_02,sort --nproc 16 --tmpdir /tmp/ -o sample_sorted.pairs.gz sample.pairs.gz,sort pairs file for deduplication,epigenomics
pairtools,pairtools_03,dedup --nproc 16 --output-stats dedup_stats.txt -o sample_dedup.pairs.gz sample_sorted.pairs.gz,deduplicate sorted pairs file,epigenomics
pairtools,pairtools_04,cload pairs --chrom1 2 --pos1 3 --chrom2 4 --pos2 5 chromsizes.txt:5000 sample_dedup.pairs.gz sample_5kb.cool,bin pairs into contact matrix using cooler,epigenomics
pairtools,pairtools_05,parse --min-mapq 30 --walks-policy mask --max-inter-align-gap 30 --chroms-path chromsizes.txt -o sample.pairs.gz aligned.bam,parse Hi-C BWA alignments to pairs format with default parameters,epigenomics
pairtools,pairtools_06,sort --nproc 16 --tmpdir /tmp/ -o sample_sorted.pairs.gz sample.pairs.gz,sort pairs file for deduplication,epigenomics
pairtools,pairtools_07,dedup --nproc 16 --output-stats dedup_stats.txt -o sample_dedup.pairs.gz sample_sorted.pairs.gz,deduplicate sorted pairs file,epigenomics
pairtools,pairtools_08,cload pairs --chrom1 2 --pos1 3 --chrom2 4 --pos2 5 chromsizes.txt:5000 sample_dedup.pairs.gz sample_5kb.cool,bin pairs into a .cool contact matrix file using cooler,epigenomics
pairtools,pairtools_09,parse --min-mapq 30 --walks-policy mask --max-inter-align-gap 30 --chroms-path chromsizes.txt -o sample.pairs.gz aligned.bam,parse Hi-C BWA alignments to pairs format,epigenomics
pairtools,pairtools_10,sort --nproc 16 --tmpdir /tmp/ -o sample_sorted.pairs.gz sample.pairs.gz,sort pairs file for deduplication with default parameters,epigenomics
pbfusion,pbfusion_01,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --threads 8,detect gene fusions from PacBio IsoSeq aligned data,rna-seq
pbfusion,pbfusion_02,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --min-support 3 --threads 8,detect fusions with minimum supporting reads,rna-seq
pbfusion,pbfusion_03,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --threads 8,detect gene fusions from PacBio IsoSeq aligned data and write output to a file,rna-seq
pbfusion,pbfusion_04,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --min-support 3 --threads 8 --quiet,detect fusions with minimum supporting reads in quiet mode,rna-seq
pbfusion,pbfusion_05,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --threads 8,detect gene fusions from PacBio IsoSeq aligned data with default parameters,rna-seq
pbfusion,pbfusion_06,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --min-support 3 --threads 8 --verbose,detect fusions with minimum supporting reads with verbose output,rna-seq
pbfusion,pbfusion_07,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --threads 8,detect gene fusions from PacBio IsoSeq aligned data using multiple threads,rna-seq
pbfusion,pbfusion_08,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --min-support 3 --threads 8,detect fusions with minimum supporting reads and write output to a file,rna-seq
pbfusion,pbfusion_09,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --threads 8 --quiet,detect gene fusions from PacBio IsoSeq aligned data in quiet mode,rna-seq
pbfusion,pbfusion_10,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --min-support 3 --threads 8,detect fusions with minimum supporting reads with default parameters,rna-seq
pbmm2,pbmm2_01,align --preset HIFI --sort -j 16 --sort-threads 4 reference.fa hifi_reads.bam aligned_sorted.bam,align PacBio HiFi reads to reference genome,alignment
pbmm2,pbmm2_02,align --preset ISOSEQ --sort -j 8 reference.fa isoseq_reads.bam isoseq_aligned.bam,align PacBio IsoSeq transcriptome reads,alignment
pbmm2,pbmm2_03,index reference.fa reference.mmi,index reference genome for repeated pbmm2 use,alignment
pbmm2,pbmm2_04,align --preset SUBREAD --sort -j 16 reference.fa subreads.bam clr_aligned.bam,align CLR (subread) PacBio reads,alignment
pbmm2,pbmm2_05,align --preset HIFI --sort -j 16 --sort-threads 4 reference.fa hifi_reads.bam aligned_sorted.bam,align PacBio HiFi reads to reference genome with default parameters,alignment
pbmm2,pbmm2_06,align --preset ISOSEQ --sort -j 8 --log-level INFO reference.fa isoseq_reads.bam isoseq_aligned.bam,align PacBio IsoSeq transcriptome reads with verbose output,alignment
pbmm2,pbmm2_07,index -j 4 reference.fa reference.mmi,index reference genome for repeated pbmm2 use using multiple threads,alignment
pbmm2,pbmm2_08,align --preset SUBREAD --sort -j 16 --log-file pbmm2.log reference.fa subreads.bam clr_aligned.bam,align CLR (subread) PacBio reads and write the log to a file,alignment
pbmm2,pbmm2_09,align --preset HIFI --sort -j 16 --sort-threads 4 --log-level FATAL reference.fa hifi_reads.bam aligned_sorted.bam,align PacBio HiFi reads to reference genome in quiet mode,alignment
pbmm2,pbmm2_10,align --preset ISOSEQ --sort -j 8 reference.fa isoseq_reads.bam isoseq_aligned.bam,align PacBio IsoSeq transcriptome reads with default parameters,alignment
pbsv,pbsv_01,discover --hifi sorted.bam sample.svsig.gz,discover SV signatures from PacBio HiFi aligned BAM,variant-calling
pbsv,pbsv_02,call --hifi reference.fa sample.svsig.gz output_svs.vcf,call SVs from a single sample's signature file,variant-calling
pbsv,pbsv_03,call --hifi reference.fa sample1.svsig.gz sample2.svsig.gz sample3.svsig.gz cohort_svs.vcf,call SVs jointly from multiple samples,variant-calling
pbsv,pbsv_04,discover --hifi --tandem-repeats hg38.trf.bed sorted.bam sample.svsig.gz,discover with tandem repeat annotation for better accuracy,variant-calling
pbsv,pbsv_05,discover --hifi sorted.bam sample.svsig.gz,discover SV signatures from PacBio HiFi aligned BAM with default parameters,variant-calling
pbsv,pbsv_06,call --hifi --log-level INFO reference.fa sample.svsig.gz output_svs.vcf,call SVs from a single sample's signature file with verbose output,variant-calling
pbsv,pbsv_07,call --hifi -j 4 reference.fa sample1.svsig.gz sample2.svsig.gz sample3.svsig.gz cohort_svs.vcf,call SVs jointly from multiple samples using multiple threads,variant-calling
pbsv,pbsv_08,discover --hifi --tandem-repeats hg38.trf.bed --log-file pbsv_discover.log sorted.bam sample.svsig.gz,discover with tandem repeat annotation for better accuracy and write the log to a file,variant-calling
pbsv,pbsv_09,discover --hifi --log-level FATAL sorted.bam sample.svsig.gz,discover SV signatures from PacBio HiFi aligned BAM in quiet mode,variant-calling
pbsv,pbsv_10,call --hifi reference.fa sample.svsig.gz output_svs.vcf,call SVs from a single sample's signature file with default parameters,variant-calling
perl,perl_01,script.pl input.txt,run a Perl script,programming
perl,perl_02,-V,print the Perl version and module search paths,programming
perl,perl_03,-ne 'print if /^>/' sequences.fasta,one-liner: print lines matching a pattern,programming
perl,perl_04,"-F'\t' -lane 'print join(""\t"", @F[0,2,4])' data.tsv",one-liner: extract specific columns from a TSV,programming
perl,perl_05,-i.bak -pe 's/chr/Chr/g' genome.fa,in-place substitution (edit file directly),programming
perl,perl_06,"-ne '$c++ if /^>/; END { print ""$c sequences\n"" }' input.fa",count FASTA sequences in a file,programming
perl,perl_07,"-MCPAN -e 'CPAN::Shell->install(""Bio::SeqIO"")'",install a module via CPAN one-liner,programming
perl,perl_08,-Mlocal::lib,set up a local user-space Perl module directory,programming
perl,perl_09,-MBio::SeqIO -e 1,check if a required module is installed,programming
perl,perl_10,-I /path/to/bioperl-lib script.pl input.fasta,run a bioinformatics script with a custom library path,programming
picard,picard_01,MarkDuplicates -I sorted.bam -O marked_dup.bam -M markdup_metrics.txt --CREATE_INDEX true,mark PCR duplicates in a sorted BAM file,alignment
picard,picard_02,AddOrReplaceReadGroups -I input.bam -O rg_added.bam --RGLB lib1 --RGPL ILLUMINA --RGPU unit1 --RGSM sample1 --CREATE_INDEX true,add or replace read groups in a BAM file,alignment
picard,picard_03,SortSam -I input.bam -O sorted.bam --SORT_ORDER coordinate --CREATE_INDEX true,sort a BAM file by coordinate using Picard,alignment
picard,picard_04,CollectAlignmentSummaryMetrics -I aligned.bam -O alignment_metrics.txt -R reference.fa,collect alignment summary metrics from a BAM file,alignment
picard,picard_05,CollectInsertSizeMetrics -I sorted.bam -O insert_size_metrics.txt -H insert_size_histogram.pdf,collect insert size distribution metrics from paired-end BAM,alignment
picard,picard_06,SortSam -I input.sam -O sorted.bam --SORT_ORDER coordinate --CREATE_INDEX true,convert SAM to sorted BAM with index,alignment
picard,picard_07,ValidateSamFile -I input.bam -O validation_report.txt --MODE SUMMARY,validate a BAM file for GATK compatibility,alignment
picard,picard_08,MarkDuplicates -I sorted.bam -O marked_dup.bam -M markdup_metrics.txt --CREATE_INDEX true,mark PCR duplicates in a sorted BAM file and write duplication metrics to a file,alignment
picard,picard_09,AddOrReplaceReadGroups -I input.bam -O rg_added.bam --RGLB lib1 --RGPL ILLUMINA --RGPU unit1 --RGSM sample1 --CREATE_INDEX true --QUIET true,add or replace read groups in a BAM file in quiet mode,alignment
picard,picard_10,SortSam -I input.bam -O sorted.bam --SORT_ORDER coordinate --CREATE_INDEX true,sort a BAM file by coordinate using Picard with default parameters,alignment
pilon,pilon_01,-Xmx64g -jar pilon.jar --genome draft.fasta --frags aligned.sorted.bam --output polished --changes --threads 16,polish a draft assembly with paired-end Illumina reads,assembly
pilon,pilon_02,-Xmx128g -jar pilon.jar --genome draft.fasta --frags pe.sorted.bam --jumps mp.sorted.bam --output polished_v2 --threads 16,polish with mate-pair and paired-end libraries combined,assembly
pilon,pilon_03,-Xmx64g -jar pilon.jar --genome draft.fasta --frags aligned.sorted.bam --output polished --fix bases --threads 16,run Pilon fixing only SNPs and small indels (not structural),assembly
pilon,pilon_04,-Xmx64g -jar pilon.jar --genome draft.fasta --frags aligned.sorted.bam --output variants --variant --threads 16,generate a VCF of variants found in the assembly,assembly
pilon,pilon_05,-Xmx32g -jar pilon.jar --genome contigs.fasta --frags aligned.sorted.bam --output polished_contigs --targets contig_list.txt --threads 8,"polish a specific set of sequences (e.g., unplaced contigs only)",assembly
pilon,pilon_06,-Xmx64g -jar pilon.jar --genome polished.fasta --frags re_aligned.sorted.bam --output polished_r2 --changes --threads 16,second round of polishing after re-aligning reads to first round output,assembly
pilon,pilon_07,-Xmx64g -jar pilon.jar --genome draft.fasta --frags aligned.sorted.bam --output polished --changes --threads 16,polish a draft assembly with paired-end Illumina reads using multiple threads,assembly
pilon,pilon_08,-Xmx128g -jar pilon.jar --genome draft.fasta --frags pe.sorted.bam --jumps mp.sorted.bam --output polished_v2 --threads 16,polish with mate-pair and paired-end libraries combined and write output to a file,assembly
pilon,pilon_09,-Xmx64g -jar pilon.jar --genome draft.fasta --frags aligned.sorted.bam --output polished --fix bases --threads 16 > pilon.log 2>&1,run Pilon fixing only SNPs and small indels (not structural) with console output redirected to a log file,assembly
pilon,pilon_10,-Xmx64g -jar pilon.jar --genome draft.fasta --frags aligned.sorted.bam --output variants --variant --threads 16,generate a VCF of variants found in the assembly with default parameters,assembly
plink2,plink2_01,"--vcf variants.vcf --make-pgen --out plink_dataset --set-missing-var-ids @:#[b37]\$r,\$a --max-alleles 2",convert VCF to PLINK2 binary format,population-genomics
plink2,plink2_02,--pfile plink_dataset --maf 0.01 --geno 0.05 --mind 0.1 --hwe 1e-6 --make-pgen --out qc_filtered,perform quality control filtering on PLINK dataset,population-genomics
plink2,plink2_03,--pfile plink_dataset --indep-pairwise 50 10 0.1 --out ld_prune && plink2 --pfile plink_dataset --extract ld_prune.prune.in --pca 20 --out pca_results,perform LD pruning and compute PCA,population-genomics
plink2,plink2_04,--pfile plink_dataset --pheno phenotypes.txt --pheno-name case_control --covar covariates.txt --glm hide-covar --out gwas_results,run genome-wide association study (GWAS) for binary phenotype,population-genomics
plink2,plink2_05,--pfile plink_dataset --extract ld_prune.prune.in --make-king-table --out kinship_matrix,compute kinship/relatedness matrix,population-genomics
plink2,plink2_06,"--vcf variants.vcf --make-pgen --out plink_dataset --set-missing-var-ids @:#[b37]\$r,\$a --max-alleles 2 --verbose",convert VCF to PLINK2 binary format with verbose output,population-genomics
plink2,plink2_07,--pfile plink_dataset --maf 0.01 --geno 0.05 --mind 0.1 --hwe 1e-6 --make-pgen --out qc_filtered --threads 4,perform quality control filtering on PLINK dataset using multiple threads,population-genomics
plink2,plink2_08,--pfile plink_dataset --indep-pairwise 50 10 0.1 --out ld_prune && plink2 --pfile plink_dataset --extract ld_prune.prune.in --pca 20 --out pca_results,perform LD pruning and compute PCA and write output to files,population-genomics
plink2,plink2_09,--pfile plink_dataset --pheno phenotypes.txt --pheno-name case_control --covar covariates.txt --glm hide-covar --out gwas_results --silent,run genome-wide association study (GWAS) for binary phenotype in quiet mode,population-genomics
plink2,plink2_10,--pfile plink_dataset --extract ld_prune.prune.in --make-king-table --out kinship_matrix,compute kinship/relatedness matrix with default parameters,population-genomics
porechop,porechop_01,-i reads.fastq.gz -o trimmed_reads.fastq.gz --threads 8,trim adapters from Oxford Nanopore FASTQ reads,qc
porechop,porechop_02,-i reads.fastq.gz -o trimmed_no_chimeras.fastq.gz --discard_middle --threads 8,trim adapters and remove chimeric reads,qc
porechop,porechop_03,-i barcoded_reads.fastq.gz -b demultiplexed_reads/ --threads 8,demultiplex barcoded ONT reads into separate files,qc
porechop,porechop_04,-i reads.fastq.gz -o trimmed.fastq.gz --min_split_read_size 1000 --threads 8,trim adapters and discard split read pieces shorter than 1000 bp,qc
porechop,porechop_05,-i reads.fastq.gz -o trimmed_reads.fastq.gz --threads 8,trim adapters from Oxford Nanopore FASTQ reads with default parameters,qc
porechop,porechop_06,-i reads.fastq.gz -o trimmed_no_chimeras.fastq.gz --discard_middle --threads 8 --verbosity 2,trim adapters and remove chimeric reads with verbose output,qc
porechop,porechop_07,-i barcoded_reads.fastq.gz -b demultiplexed_reads/ --threads 8,demultiplex barcoded ONT reads into separate files using multiple threads,qc
porechop,porechop_08,-i reads.fastq.gz -o trimmed.fastq.gz --min_split_read_size 1000 --threads 8,"trim adapters, discard split read pieces shorter than 1000 bp, and write output to a file",qc
porechop,porechop_09,-i reads.fastq.gz -o trimmed_reads.fastq.gz --threads 8 --verbosity 0,trim adapters from Oxford Nanopore FASTQ reads in quiet mode,qc
porechop,porechop_10,-i reads.fastq.gz -o trimmed_no_chimeras.fastq.gz --discard_middle --threads 8,trim adapters and remove chimeric reads with default parameters,qc
prodigal,prodigal_01,-i genome.fasta -a proteins.faa -d cds.fna -f gff -o gene_predictions.gff,predict genes in a bacterial genome and output protein and GFF files,annotation
prodigal,prodigal_02,-i metagenomic_contigs.fasta -a meta_proteins.faa -d meta_cds.fna -f gff -o meta_genes.gff -p meta,predict genes in metagenomic contigs,annotation
prodigal,prodigal_03,-i mycoplasma_genome.fasta -a mycoplasma_proteins.faa -f gff -o mycoplasma_genes.gff -g 4,predict genes with non-standard genetic code (Mycoplasma),annotation
prodigal,prodigal_04,-i assembly.fasta -a proteins.faa -f gbk -o predictions.gbk,predict genes and output in GenBank format for import into annotation tools,annotation
prodigal,prodigal_05,-i genome.fasta -a proteins.faa -d cds.fna -f gff -o gene_predictions.gff,predict genes in a bacterial genome and output protein and GFF files with default parameters,annotation
prodigal,prodigal_06,-i metagenomic_contigs.fasta -a meta_proteins.faa -d meta_cds.fna -f gff -o meta_genes.gff -p meta,predict genes in metagenomic contigs with progress messages on stderr (Prodigal's default without -q),annotation
prodigal,prodigal_07,-i mycoplasma_genome.fasta -a mycoplasma_proteins.faa -f gff -o mycoplasma_genes.gff -g 4,predict genes with non-standard genetic code (Mycoplasma; Prodigal runs single-threaded),annotation
prodigal,prodigal_08,-i assembly.fasta -a proteins.faa -f gbk -o predictions.gbk,predict genes and output in GenBank format for import into annotation tools and write output to a file,annotation
prodigal,prodigal_09,-i genome.fasta -a proteins.faa -d cds.fna -f gff -o gene_predictions.gff -q,predict genes in a bacterial genome and output protein and GFF files in quiet mode,annotation
prodigal,prodigal_10,-i metagenomic_contigs.fasta -a meta_proteins.faa -d meta_cds.fna -f gff -o meta_genes.gff -p meta,predict genes in metagenomic contigs with default parameters,annotation
prokka,prokka_01,--kingdom Bacteria --genus Escherichia --species coli --strain K12 --cpus 8 --outdir prokka_output --prefix ecoli_K12 assembly.fasta,annotate a bacterial genome assembly,metagenomics
prokka,prokka_02,--metagenome --cpus 8 --outdir mag_annotation --prefix bin001 bin001_contigs.fasta,annotate a metagenome-assembled genome (MAG),metagenomics
prokka,prokka_03,--kingdom Archaea --cpus 8 --outdir archaea_output --prefix archaea_sample archaea_assembly.fasta,annotate archaea genome,metagenomics
prokka,prokka_04,--kingdom Bacteria --proteins custom_proteins.faa --cpus 8 --outdir custom_annotation --prefix sample genome.fasta,annotate genome with custom protein database for improved annotation,metagenomics
prokka,prokka_05,--kingdom Bacteria --locustag MYORG --cpus 8 --outdir annotated --prefix genome_v1 assembly.fasta,annotate genome and add specific locus tag prefix,metagenomics
prokka,prokka_06,--kingdom Bacteria --genus Escherichia --species coli --strain K12 --cpus 8 --outdir prokka_output --prefix ecoli_K12 assembly.fasta,annotate a bacterial genome assembly with full progress logging (Prokka's default without --quiet),metagenomics
prokka,prokka_07,--metagenome --cpus 8 --outdir mag_annotation --prefix bin001 bin001_contigs.fasta,annotate a metagenome-assembled genome (MAG) using multiple threads,metagenomics
prokka,prokka_08,--kingdom Archaea --cpus 8 --outdir archaea_output --prefix archaea_sample archaea_assembly.fasta,annotate archaea genome and write output files to the output directory,metagenomics
prokka,prokka_09,--kingdom Bacteria --proteins custom_proteins.faa --cpus 8 --outdir custom_annotation --prefix sample genome.fasta --quiet,annotate genome with custom protein database for improved annotation in quiet mode,metagenomics
prokka,prokka_10,--kingdom Bacteria --locustag MYORG --cpus 8 --outdir annotated --prefix genome_v1 assembly.fasta,annotate genome and add specific locus tag prefix with default parameters,metagenomics
python,python_01,script.py,run a Python script,programming
python,python_02,"-c ""print('Hello, World!')""",run a Python one-liner,programming
python,python_03,-m http.server 8080,"run a module as a script (e.g., start an HTTP server)",programming
python,python_04,-m venv .venv,create a virtual environment,programming
python,python_05,-m pytest tests/ -v,run a test suite with pytest in verbose mode,programming
python,python_06,"-c ""import json,sys; data=json.load(sys.stdin); [print(r['name']) for r in data]""",process JSON from stdin with a one-liner,programming
python,python_07,-u pipeline_script.py,run a script with unbuffered output (for pipelines),programming
python,python_08,-m cProfile -s cumtime slow_script.py,profile a script and show cumulative time,programming
python,python_09,--version,check Python version,programming
python,python_10,-W all script.py,"run a script showing all warnings, including deprecation warnings",programming
qualimap,qualimap_01,bamqc -bam sorted.bam --java-mem-size=8G -nt 8 -outdir qualimap_results/,"run BAM QC on a sorted, indexed BAM",qc
qualimap,qualimap_02,rnaseq -bam sorted.bam -gtf annotation.gtf -p strand-specific-reverse --java-mem-size=8G -outdir qualimap_rnaseq/,run RNA-seq QC with strand information,qc
qualimap,qualimap_03,multi-bamqc -d samples.txt --java-mem-size=4G -outdir multiqc_qualimap/,run multi-sample QC and aggregate report,qc
qualimap,qualimap_04,bamqc -bam sorted.bam -gd HUMAN --java-mem-size=8G -nt 8 -outdir qualimap_gc/,run BAM QC comparing GC content against the human genome GC distribution,qc
qualimap,qualimap_05,bamqc -bam sorted.bam -outformat PDF:HTML --java-mem-size=4G -outdir qualimap_output/,generate PDF and HTML reports,qc
qualimap,qualimap_06,bamqc -bam wgs.bam -gd HUMAN --java-mem-size=16G -nt 16 --paint-chromosome-limits -outdir wgs_qualimap/,run BAM QC on whole-genome sequencing data,qc
qualimap,qualimap_07,counts -d counts.txt -c 2 -s C -outdir counts_qc/,count QC for differential expression count matrices,qc
qualimap,qualimap_08,bamqc -bam sorted.bam --java-mem-size=8G -nt 8 -outdir qualimap_results/,"run BAM QC on a sorted, indexed BAM and write reports to an output directory",qc
qualimap,qualimap_09,rnaseq -bam sorted.bam -gtf annotation.gtf -p strand-specific-reverse --java-mem-size=8G -outdir qualimap_rnaseq/ > qualimap_rnaseq.log 2>&1,run RNA-seq QC with strand information and console output redirected to a log file,qc
qualimap,qualimap_10,multi-bamqc -d samples.txt --java-mem-size=4G -outdir multiqc_qualimap/,run multi-sample QC and aggregate report with default parameters,qc
quast,quast_01,-r reference.fasta -g genes.gff assembly.fasta -o quast_output/ --threads 8,assess assembly quality with reference genome,assembly
quast,quast_02,spades_assembly.fasta megahit_assembly.fasta flye_assembly.fasta -o assembly_comparison/ --threads 8,compare multiple assemblies without reference genome,assembly
quast,quast_03,"metaquast.py -r reference1.fasta,reference2.fasta assembly.fasta -o metaquast_output/ --threads 16",assess metagenome assembly quality with metaQUAST,assembly
quast,quast_04,-r reference.fasta assembly.fasta -o quast_out/ --min-contig 1000 --threads 8,assess assembly with minimum contig length filter,assembly
quast,quast_05,-r reference.fasta -g genes.gff assembly.fasta -o quast_output/ --threads 8,assess assembly quality with reference genome with default parameters,assembly
quast,quast_06,spades_assembly.fasta megahit_assembly.fasta flye_assembly.fasta -o assembly_comparison/ --threads 8 --verbose,compare multiple assemblies without reference genome with verbose output,assembly
quast,quast_07,"metaquast.py -r reference1.fasta,reference2.fasta assembly.fasta -o metaquast_output/ --threads 16",assess metagenome assembly quality with metaQUAST using multiple threads,assembly
quast,quast_08,-r reference.fasta assembly.fasta -o quast_out/ --min-contig 1000 --threads 8,assess assembly with minimum contig length filter and write output to a file,assembly
quast,quast_09,-r reference.fasta -g genes.gff assembly.fasta -o quast_output/ --threads 8 --silent,assess assembly quality with reference genome in quiet mode,assembly
quast,quast_10,spades_assembly.fasta megahit_assembly.fasta flye_assembly.fasta -o assembly_comparison/ --threads 8,compare multiple assemblies without reference genome with default parameters,assembly
r,r_01,Rscript analysis.R,run an R script non-interactively,programming
r,r_02,Rscript analysis.R --input data.csv --output results.csv,run an R script with command-line arguments,programming
r,r_03,"Rscript -e ""cat(paste(1:10, collapse=','), '\n')""",execute a one-liner R expression,programming
r,r_04,"Rscript -e ""install.packages('ggplot2', repos='https://cloud.r-project.org', lib=Sys.getenv('R_LIBS_USER'))""",install a CRAN package into the user library,programming
r,r_05,"Rscript -e ""BiocManager::install(c('DESeq2','edgeR'))""",install Bioconductor packages,programming
r,r_06,"Rscript -e ""packageVersion('DESeq2')""",check installed package version,programming
r,r_07,"Rscript -e ""ip <- installed.packages(lib.loc=.libPaths()[1]); cat(paste(ip[,'Package'],ip[,'Version'],sep='='), sep='\n')""",list user-installed packages and their versions,programming
r,r_08,Rscript --vanilla --quiet analysis.R,run an R script in a clean session (no profiles or saved workspace) with startup messages suppressed,programming
r,r_09,"Rscript -e "".libPaths()""",show R library paths,programming
r,r_10,"Rscript -e ""rmarkdown::render('report.Rmd', output_format='html_document')""",render an Rmarkdown document to HTML,programming
racon,racon_01,-t 16 reads.fastq.gz mapping.paf draft_assembly.fasta > polished_round1.fasta,run one round of Racon polishing on an ONT assembly,assembly
racon,racon_02,-t 16 reads.fastq.gz round2_mapping.paf polished_round1.fasta > polished_round2.fasta,run second round of Racon polishing,assembly
racon,racon_03,-t 16 reads.fastq.gz alignment.sam draft_assembly.fasta > polished_assembly.fasta,run Racon polishing using SAM alignment instead of PAF,assembly
racon,racon_04,-t 16 reads.fastq.gz mapping.paf draft_assembly.fasta > polished_round1.fasta,run one round of Racon polishing on an ONT assembly,assembly
racon,racon_05,-t 16 reads.fastq.gz round2_mapping.paf polished_round1.fasta > polished_round2.fasta,run second round of Racon polishing with default parameters,assembly
racon,racon_06,-t 16 reads.fastq.gz alignment.sam draft_assembly.fasta > polished_assembly.fasta,run Racon polishing using SAM alignment instead of PAF,assembly
racon,racon_07,-t 16 reads.fastq.gz mapping.paf draft_assembly.fasta > polished_round1.fasta,run one round of Racon polishing on an ONT assembly,assembly
racon,racon_08,-t 16 reads.fastq.gz round2_mapping.paf polished_round1.fasta > polished_round2.fasta,run second round of Racon polishing,assembly
racon,racon_09,-t 16 reads.fastq.gz alignment.sam draft_assembly.fasta > polished_assembly.fasta,run Racon polishing using SAM alignment instead of PAF,assembly
racon,racon_10,-t 16 reads.fastq.gz mapping.paf draft_assembly.fasta > polished_round1.fasta,run one round of Racon polishing on an ONT assembly with default parameters,assembly
repeatmasker,repeatmasker_01,-species human -xsmall -pa 16 -dir repeatmasker_output/ genome.fasta,softmask repeats in a mammalian genome assembly,annotation
repeatmasker,repeatmasker_02,-species arabidopsis -pa 8 -dir masked_output/ genome.fasta,hard-mask repeats in a plant genome,annotation
repeatmasker,repeatmasker_03,-lib custom_repeats.fasta -xsmall -pa 8 -dir custom_masked/ genome.fasta,mask repeats using a custom library,annotation
repeatmasker,repeatmasker_04,-noint -xsmall -pa 4 -dir simple_masked/ genome.fasta,mask only simple repeats and low-complexity regions,annotation
repeatmasker,repeatmasker_05,-species human -xsmall -pa 16 -dir repeatmasker_output/ genome.fasta,softmask repeats in a mammalian genome assembly with default parameters,annotation
repeatmasker,repeatmasker_06,-species arabidopsis -pa 8 -dir masked_output/ genome.fasta --verbose,hard-mask repeats in a plant genome with verbose output,annotation
repeatmasker,repeatmasker_07,-lib custom_repeats.fasta -xsmall -pa 8 -dir custom_masked/ genome.fasta,mask repeats using a custom library with parallel search jobs,annotation
repeatmasker,repeatmasker_08,-noint -xsmall -pa 4 -dir simple_masked/ genome.fasta,mask only simple repeats and low-complexity regions and write output files to a directory,annotation
repeatmasker,repeatmasker_09,-species human -xsmall -pa 16 -dir repeatmasker_output/ genome.fasta > repeatmasker.log 2>&1,softmask repeats in a mammalian genome assembly with console output redirected to a log file,annotation
repeatmasker,repeatmasker_10,-species arabidopsis -pa 8 -dir masked_output/ genome.fasta,hard-mask repeats in a plant genome with default parameters,annotation
rm,rm_01,file.txt,remove a single file,filesystem
rm,rm_02,-v *.tmp,remove multiple files matching a pattern,filesystem
rm,rm_03,-r old_results/,remove a directory and all its contents,filesystem
rm,rm_04,-i *.log,"interactively remove files, asking for confirmation before each deletion",filesystem
rm,rm_05,-rf temp_build/,force-remove a directory and its contents without prompts,filesystem
rm,rm_06,-rf /tmp/stale_dir/,force-remove a stale build directory,filesystem
rm,rm_07,-- -weirdfile.txt,remove a file with a name starting with a dash,filesystem
rm,rm_08,-d emptydir/,remove an empty directory,filesystem
rm,rm_09,-v *.bak,verbosely remove all files of a specific type in the current directory,filesystem
rm,rm_10,symlink_name,remove a symbolic link without following it to the target,filesystem
rsem,rsem_01,rsem-prepare-reference --gtf genes.gtf --bowtie2 --num-threads 8 genome.fa rsem_index/genome,prepare RSEM reference and Bowtie2 index from genome FASTA and GTF annotation,rna-seq
rsem,rsem_02,rsem-calculate-expression --paired-end --bowtie2 --num-threads 8 --strandedness reverse R1.fastq.gz R2.fastq.gz rsem_index/genome sample_output,quantify paired-end RNA-seq reads using RSEM with Bowtie2,rna-seq
rsem,rsem_03,rsem-calculate-expression --paired-end --star --num-threads 8 R1.fastq.gz R2.fastq.gz rsem_index/genome sample_output,quantify RNA-seq using RSEM with STAR aligner,rna-seq
rsem,rsem_04,rsem-prepare-reference --num-threads 4 transcriptome.fa rsem_transcript_index/transcripts,prepare RSEM reference directly from transcriptome FASTA,rna-seq
rsem,rsem_05,rsem-generate-data-matrix sample1.genes.results sample2.genes.results sample3.genes.results > gene_count_matrix.txt,generate count matrix from multiple RSEM results files for DESeq2,rna-seq
rsem,rsem_06,rsem-calculate-expression --num-threads 8 reads.fastq.gz rsem_index/genome sample_output,quantify single-end RNA-seq reads using RSEM,rna-seq
rsem,rsem_07,rsem-prepare-reference --gtf genes.gtf --bowtie2 --num-threads 8 --polyA genome.fa rsem_polyA_index/genome,prepare RSEM reference with a Bowtie2 index and poly(A) tails appended to transcripts,rna-seq
rsem,rsem_08,rsem-calculate-expression --paired-end --num-threads 8 --estimate-rspd --strandedness none R1.fq.gz R2.fq.gz rsem_index/genome sample,quantify with RSEM and estimate read start position distribution,rna-seq
rsem,rsem_09,rsem-generate-data-matrix sample1.genes.results sample2.genes.results sample3.genes.results > count_matrix.txt,combine expected counts from multiple RSEM gene results files into one matrix for cross-sample comparison,rna-seq
rsem,rsem_10,rsem-calculate-expression --paired-end --num-threads 8 --calc-ci R1.fastq.gz R2.fastq.gz rsem_index/genome sample_ci,calculate expression with confidence intervals for uncertainty estimation,rna-seq
rseqc,rseqc_01,infer_experiment.py -r hg38.bed -i sorted.bam -s 2000000,infer library strandedness from a BAM,qc
rseqc,rseqc_02,read_distribution.py -r hg38.bed -i sorted.bam,get read distribution across genomic features,qc
rseqc,rseqc_03,junction_annotation.py -r hg38.bed -i sorted.bam -o sample_junctions,annotate splice junctions,qc
rseqc,rseqc_04,junction_saturation.py -r hg38.bed -i sorted.bam -o sample_sat,check saturation of junction detection,qc
rseqc,rseqc_05,bam_stat.py -i sorted.bam,compute BAM statistics,qc
rseqc,rseqc_06,tin.py -i sorted.bam -r hg38.bed,measure transcript integrity (RNA quality),qc
rseqc,rseqc_07,inner_distance.py -r hg38.bed -i sorted.bam -o inner_dist,estimate inner distance for paired-end RNA-seq,qc
rseqc,rseqc_08,read_duplication.py -i sorted.bam -o duplication,check for read duplication rate,qc
rseqc,rseqc_09,infer_experiment.py -r hg38.bed -i sorted.bam -s 2000000 --quiet,infer library strandedness from a BAM in quiet mode,qc
rseqc,rseqc_10,read_distribution.py -r hg38.bed -i sorted.bam,get read distribution across genomic features with default parameters,qc
rsync,rsync_01,-avz /local/data/ user@remote:/remote/data/,sync a local directory to a remote server with verbose output and compression,networking
rsync,rsync_02,-avzn /source/ /dest/,dry-run to preview what would be transferred,networking
rsync,rsync_03,-avz --delete /source/ /dest/,"mirror source to destination, deleting removed files",networking
rsync,rsync_04,-avz user@remote:/remote/data/ /local/backup/,sync from remote server to local directory,networking
rsync,rsync_05,-avzP user@remote:/path/large-file.tar.gz /local/,resume a large interrupted transfer,networking
rsync,rsync_06,-avz --exclude='.git' --exclude='*.pyc' --exclude='__pycache__' /src/ user@remote:/dest/,sync excluding specific directories and patterns,networking
rsync,rsync_07,-avz -e 'ssh -p 2222' /local/data/ user@remote:/data/,sync using a non-standard SSH port,networking
rsync,rsync_08,-avz --info=progress2 /source/ /dest/,show total transfer progress instead of per-file progress,networking
rsync,rsync_09,-avzH /source/ /dest/,copy files preserving hard links,networking
rsync,rsync_10,-avz --update /source/ /dest/,"sync files, skipping any that are newer on the destination",networking
salmon,salmon_01,index -t transcriptome.fa -i salmon_index --threads 8,build a Salmon transcriptome index,rna-seq
salmon,salmon_02,quant -i salmon_index -l A -1 R1.fastq.gz -2 R2.fastq.gz -p 8 --gcBias --validateMappings -o sample_quant,quantify paired-end RNA-seq reads with automatic library type detection,rna-seq
salmon,salmon_03,quant -i salmon_index -l A -r reads.fastq.gz -p 8 --gcBias -o sample_quant,quantify single-end RNA-seq reads,rna-seq
salmon,salmon_04,index -t gentrome.fa -d decoys.txt -i salmon_index_decoy --threads 8,build decoy-aware salmon index for more accurate quantification,rna-seq
salmon,salmon_05,quant -i salmon_index -l ISR -1 R1.fastq.gz -2 R2.fastq.gz -p 8 --gcBias --seqBias --validateMappings -o sample_quant,quantify bulk RNA-seq with strand-specific reverse library,rna-seq
salmon,salmon_06,index -t transcriptome.fa -i salmon_index --threads 8 --verbose,build a Salmon transcriptome index with verbose output,rna-seq
salmon,salmon_07,quant -i salmon_index -l A -1 R1.fastq.gz -2 R2.fastq.gz -p 8 --gcBias --validateMappings -o sample_quant,quantify paired-end RNA-seq reads with automatic library type detection using multiple threads,rna-seq
salmon,salmon_08,quant -i salmon_index -l A -r reads.fastq.gz -p 8 --gcBias -o sample_quant,quantify single-end RNA-seq reads and write output to a file,rna-seq
salmon,salmon_09,index -t gentrome.fa -d decoys.txt -i salmon_index_decoy --threads 8 --quiet,build decoy-aware salmon index for more accurate quantification in quiet mode,rna-seq
salmon,salmon_10,quant -i salmon_index -l ISR -1 R1.fastq.gz -2 R2.fastq.gz -p 8 --gcBias --seqBias --validateMappings -o sample_quant,quantify bulk RNA-seq with strand-specific reverse library with default parameters,rna-seq
samtools,samtools_01,sort -@ 4 -o sorted.bam input.bam,sort a BAM file by genomic coordinates,alignment
samtools,samtools_02,index sorted.bam,create an index for a sorted BAM file,alignment
samtools,samtools_03,view -b -f 2 -F 256 -F 2048 -o proper_paired.bam input.bam,filter to keep only properly paired primary alignments,alignment
samtools,samtools_04,flagstat input.bam,"get alignment statistics (mapped, unmapped, duplicates)",alignment
samtools,samtools_05,fastq -@ 4 -1 R1.fastq.gz -2 R2.fastq.gz -0 /dev/null -s /dev/null -n input.bam,convert BAM to FASTQ for paired-end reads,alignment
samtools,samtools_06,view -b -o region.bam input.bam chr1:100000-200000,extract reads mapping to chromosome 1 between 100000 and 200000,alignment
samtools,samtools_07,markdup -@ 4 -f stats.txt input_fixmate_sorted.bam output_markdup.bam,mark PCR duplicates in a coordinate-sorted BAM processed with fixmate -m,alignment
samtools,samtools_08,merge -@ 4 -f merged.bam sample1.bam sample2.bam sample3.bam,merge multiple BAM files into one,alignment
samtools,samtools_09,depth -a -o coverage.txt input.bam,compute per-base depth of coverage,alignment
samtools,samtools_10,view -H input.bam,view the BAM header,alignment
sed,sed_01,-i 's/foo/bar/g' file.txt,replace all occurrences of a word in a file in-place,text-processing
sed,sed_02,-i.bak 's/old_host/new_host/g' config.conf,replace text in-place and keep a backup,text-processing
sed,sed_03,'/^$/d' file.txt,delete all blank lines from a file,text-processing
sed,sed_04,-i '/^#/d' script.sh,delete lines starting with a pattern (comment lines beginning with #) in-place,text-processing
sed,sed_05,-n '/error/p' app.log,print only lines matching a pattern (like grep),text-processing
sed,sed_06,-E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3\/\2\/\1/' dates.txt,reformat YYYY-MM-DD dates as DD/MM/YYYY using capture groups,text-processing
sed,sed_07,'s/^/PREFIX: /' input.txt,add a prefix to every line in a file,text-processing
sed,sed_08,-E 's/[[:space:]]+$//' file.txt,remove trailing whitespace from all lines,text-processing
sed,sed_09,'10s/old/new/' file.txt,replace only on a specific line number,text-processing
sed,sed_10,'/^\[section\]/a new_key=value' config.ini,insert a line after a matching pattern,text-processing
seqkit,seqkit_01,stats -a reads.fastq.gz,"get basic statistics of a FASTQ file (read count, total bases, average length)",sequence-utilities
seqkit,seqkit_02,seq -m 100 -j 4 -o filtered.fastq.gz input.fastq.gz,filter reads shorter than 100 bp and write to a new file,sequence-utilities
seqkit,seqkit_03,seq -r -p -j 4 input.fa -o revcomp.fa,get the reverse complement of all sequences in a FASTA file,sequence-utilities
seqkit,seqkit_04,grep -f id_list.txt input.fa -o subset.fa,extract sequences by name from a list file,sequence-utilities
seqkit,seqkit_05,sample -n 10000 -s 42 -j 4 -o sample.fastq.gz input.fastq.gz,randomly sample 10000 reads from a large FASTQ file,sequence-utilities
seqkit,seqkit_06,fq2fa -j 4 input.fastq.gz -o output.fa.gz,convert FASTQ to FASTA,sequence-utilities
seqkit,seqkit_07,split2 -s 1000 -j 4 -O split_output input.fa,split FASTA file into chunks of 1000 sequences each,sequence-utilities
seqkit,seqkit_08,stats -a reads.fastq.gz -o output.txt,"get basic statistics of a FASTQ file (read count, total bases, average length) and write output to a file",sequence-utilities
seqkit,seqkit_09,seq -m 100 -j 4 -o filtered.fastq.gz input.fastq.gz --quiet,filter reads shorter than 100 bp and write to a new file in quiet mode,sequence-utilities
seqkit,seqkit_10,seq -r -p -j 4 input.fa -o revcomp.fa,get the reverse complement of all sequences in a FASTA file with default parameters,sequence-utilities
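seqtk,seqtk_01,sample -s 42 R1.fastq.gz 1000000 | gzip > sub_R1.fastq.gz && seqtk sample -s 42 R2.fastq.gz 1000000 | gzip > sub_R2.fastq.gz,subsample 1 million read pairs from paired-end FASTQ files using the same seed for both mates,sequence-utilities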
seqtk,seqtk_02,seq -a reads.fastq.gz > reads.fasta,convert FASTQ to FASTA format,sequence-utilities
seqtk,seqtk_03,seq -r sequences.fasta > revcomp.fasta,reverse complement all sequences in a FASTA file,sequence-utilities
seqtk,seqtk_04,subseq reads.fastq.gz read_names.txt > extracted_reads.fastq,extract specific sequences by name from a FASTQ file,sequence-utilities
seqtk,seqtk_05,trimfq -q 0.01 reads.fastq.gz | gzip > trimmed.fastq.gz,quality trim reads below Phred 20 (error probability 0.01) from both ends,sequence-utilities
seqtk,seqtk_06,sample -s 100 reads.fastq.gz 0.1 > subsampled_10pct.fastq,subsample 10% of reads with reproducible seed,sequence-utilities
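seqtk,seqtk_07,sample -s 42 R1.fastq.gz 1000000 | gzip > sub_R1.fastq.gz && seqtk sample -s 42 R2.fastq.gz 1000000 | gzip > sub_R2.fastq.gz,subsample 1 million read pairs from paired-end FASTQ files using the same seed for both mates,sequence-utilities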
seqtk,seqtk_08,seq -a reads.fastq.gz > reads.fasta,convert FASTQ to FASTA format,sequence-utilities
seqtk,seqtk_09,seq -r sequences.fasta > revcomp.fasta,reverse complement all sequences in a FASTA file,sequence-utilities
seqtk,seqtk_10,subseq reads.fastq.gz read_names.txt > extracted_reads.fastq,extract specific sequences by name from a FASTQ file with default parameters,sequence-utilities
shapeit4,shapeit4_01,--input variants.vcf.gz --map genetic_map_chr1.txt --region chr1 --output phased_chr1.vcf.gz --thread 8,phase a chromosome using SHAPEIT4,population-genomics
shapeit4,shapeit4_02,--input variants.vcf.gz --scaffold reference_panel.vcf.gz --map genetic_map_chr22.txt --region chr22 --output phased_chr22.vcf.gz --thread 8,phase with a reference haplotype scaffold panel,population-genomics
shapeit4,shapeit4_03,"--input variants.vcf.gz --map genetic_map.txt --region chr1 --sequencing --output phased.vcf.gz --thread 16 --mcmc-iterations 8b,1p,1b,1p,1b,1p,5m",phase sequencing data with higher accuracy settings,population-genomics
shapeit4,shapeit4_04,--input variants.vcf.gz --map genetic_map_chr1.txt --region chr1 --output phased_chr1.vcf.gz --thread 8 --quiet,phase a chromosome using SHAPEIT4 in quiet mode,population-genomics
shapeit4,shapeit4_05,--input variants.vcf.gz --scaffold reference_panel.vcf.gz --map genetic_map_chr22.txt --region chr22 --output phased_chr22.vcf.gz --thread 8,phase with a reference haplotype scaffold panel with default parameters,population-genomics
shapeit4,shapeit4_06,"--input variants.vcf.gz --map genetic_map.txt --region chr1 --sequencing --output phased.vcf.gz --thread 16 --mcmc-iterations 8b,1p,1b,1p,1b,1p,5m --verbose",phase sequencing data with higher accuracy settings with verbose output,population-genomics
shapeit4,shapeit4_07,--input variants.vcf.gz --map genetic_map_chr1.txt --region chr1 --output phased_chr1.vcf.gz --thread 8,phase a chromosome with SHAPEIT4 using multiple threads,population-genomics
shapeit4,shapeit4_08,--input variants.vcf.gz --scaffold reference_panel.vcf.gz --map genetic_map_chr22.txt --region chr22 --output phased_chr22.vcf.gz --thread 8,phase with a reference haplotype scaffold panel and write output to a file,population-genomics
shapeit4,shapeit4_09,"--input variants.vcf.gz --map genetic_map.txt --region chr1 --sequencing --output phased.vcf.gz --thread 16 --mcmc-iterations 8b,1p,1b,1p,1b,1p,5m --quiet",phase sequencing data with higher accuracy settings in quiet mode,population-genomics
shapeit4,shapeit4_10,--input variants.vcf.gz --map genetic_map_chr1.txt --region chr1 --output phased_chr1.vcf.gz --thread 8,phase a chromosome using SHAPEIT4 with default parameters,population-genomics
snakemake,snakemake_01,--cores all --use-conda,run a workflow using all available cores,workflow-manager
snakemake,snakemake_02,--dry-run --printshellcmds,dry-run to see what would be executed,workflow-manager
snakemake,snakemake_03,--executor slurm --jobs 50 --default-resources mem_mb=4096 runtime=60 --use-conda,run a workflow on a Slurm cluster,workflow-manager
snakemake,snakemake_04,--configfile config/config.yaml --cores 8,run with a configuration file,workflow-manager
snakemake,snakemake_05,--profile slurm,use a named profile for cluster execution,workflow-manager
snakemake,snakemake_06,--forcerun trimming alignment --cores 16,force re-run of specific rules,workflow-manager
snakemake,snakemake_07,--unlock,unlock a workflow after a crash,workflow-manager
snakemake,snakemake_08,--dag | dot -Tpng > dag.png,generate a rule dependency graph (DAG),workflow-manager
snakemake,snakemake_09,--rerun-incomplete --cores all,clean up incomplete output files and restart,workflow-manager
snakemake,snakemake_10,--use-singularity --singularity-args '--bind /scratch' --cores 8,run with Singularity containers,workflow-manager
sniffles,sniffles_01,--input sorted.bam --vcf output_svs.vcf --threads 8,call SVs from a single Oxford Nanopore BAM file,variant-calling
sniffles,sniffles_02,--input sorted.bam --vcf output_svs.vcf --minsupport 5 --minsvlen 50 --threads 8,call SVs with minimum read support of 5 and minimum SV length of 50 bp,variant-calling
sniffles,sniffles_03,--input sample1.bam --snf sample1.snf --vcf sample1.vcf --threads 8,generate SNF file for multi-sample population SV calling,variant-calling
sniffles,sniffles_04,--input sample1.snf sample2.snf sample3.snf --vcf population_svs.vcf --threads 8,combine multiple SNF files for population-level SV calling,variant-calling
sniffles,sniffles_05,--input tumor.bam --vcf mosaic_svs.vcf --mosaic --threads 8,call mosaic or somatic SVs with low frequency support,variant-calling
sniffles,sniffles_06,--input sorted.bam --vcf output_svs.vcf --threads 8 --verbose,call SVs from a single Oxford Nanopore BAM file with verbose output,variant-calling
sniffles,sniffles_07,--input sorted.bam --vcf output_svs.vcf --minsupport 5 --minsvlen 50 --threads 8,call SVs with minimum read support of 5 and minimum SV length of 50 bp using multiple threads,variant-calling
sniffles,sniffles_08,--input sample1.bam --snf sample1.snf --vcf sample1.vcf --threads 8 -o output.txt,generate SNF file for multi-sample population SV calling and write output to a file,variant-calling
sniffles,sniffles_09,--input sample1.snf sample2.snf sample3.snf --vcf population_svs.vcf --threads 8 --quiet,combine multiple SNF files for population-level SV calling in quiet mode,variant-calling
sniffles,sniffles_10,--input tumor.bam --vcf mosaic_svs.vcf --mosaic --threads 8,call mosaic or somatic SVs with low frequency support with default parameters,variant-calling
snpeff,snpeff_01,ann -v GRCh38.105 variants.vcf > annotated.vcf,annotate variants in a VCF file using the GRCh38 human genome database,variant-annotation
snpeff,snpeff_02,ann -v -stats snpeff_summary.html GRCh38.105 variants.vcf > annotated.vcf,annotate variants and generate an HTML statistics report,variant-annotation
snpeff,snpeff_03,ann -v hg19 variants.vcf > annotated_hg19.vcf,annotate variants from hg19/GRCh37 genome,variant-annotation
snpeff,snpeff_04,ann -v -no-downstream -no-upstream -no-intron -no-intergenic GRCh38.105 variants.vcf > coding_annotated.vcf,"annotate variants, omitting upstream, downstream, intronic, and intergenic annotations to focus on coding effects",variant-annotation
snpeff,snpeff_05,build -gff3 -v MyGenome,build a custom SnpEff genome database from GFF3 annotation,variant-annotation
snpeff,snpeff_06,ann -v GRCh38.105 variants.vcf > annotated.vcf,annotate variants in a VCF file using the GRCh38 human genome database,variant-annotation
snpeff,snpeff_07,ann -v -stats snpeff_summary.html GRCh38.105 variants.vcf > annotated.vcf,annotate variants and generate an HTML statistics report,variant-annotation
snpeff,snpeff_08,ann -v hg19 variants.vcf > annotated_hg19.vcf,annotate variants from hg19/GRCh37 genome,variant-annotation
snpeff,snpeff_09,ann -v -no-downstream -no-upstream -no-intron -no-intergenic GRCh38.105 variants.vcf > coding_annotated.vcf,"annotate variants, omitting upstream, downstream, intronic, and intergenic annotations to focus on coding effects",variant-annotation
snpeff,snpeff_10,build -gff3 -v MyGenome,build a custom SnpEff genome database from GFF3 annotation with default parameters,variant-annotation
sourmash,sourmash_01,"sketch dna -p k=31,scaled=1000 genome.fasta -o genome.sig",sketch a genome FASTA file at default parameters,sequence-utilities
sourmash,sourmash_02,"sketch dna -p k=31,scaled=1000 *.fasta --output-dir sigs/",sketch multiple genome files and write one signature per file to a directory,sequence-utilities
sourmash,sourmash_03,compare sigs/*.sig --csv similarity_matrix.csv -k 31,compare all signatures in a directory and output similarity matrix,sequence-utilities
sourmash,sourmash_04,gather sample.sig gtdb_rs207.k31.zip -k 31 --threshold-bp 50000 -o gather_results.csv,decompose a metagenome sample against a reference database,sequence-utilities
sourmash,sourmash_05,taxonomy annotate -g gather_results.csv -t gtdb-rs207.taxonomy.csv -o annotated_results.csv,add taxonomy to gather results,sequence-utilities
sourmash,sourmash_06,search query.sig refdb.zip -k 31 --threshold 0.1 -n 20 -o search_results.csv,search a signature against a database for top hits,sequence-utilities
sourmash,sourmash_07,index refdb.zip sigs/*.sig -k 31,build an indexed database from many signature files for fast search,sequence-utilities
sourmash,sourmash_08,"sketch dna -p k=31,scaled=1000 genome.fasta -o genome.sig",sketch a genome FASTA file at default parameters and write output to a file,sequence-utilities
sourmash,sourmash_09,"sketch dna -p k=31,scaled=1000 *.fasta --output-dir sigs/ --quiet",sketch multiple genome files and write one signature per file to a directory in quiet mode,sequence-utilities
sourmash,sourmash_10,compare sigs/*.sig --csv similarity_matrix.csv -k 31,compare all signatures in a directory and output similarity matrix with default parameters,sequence-utilities
spades,spades_01,-1 R1.fastq.gz -2 R2.fastq.gz -o spades_output/ --threads 16 --memory 32 --careful,assemble a bacterial genome from paired-end reads,assembly
spades,spades_02,--meta -1 R1.fastq.gz -2 R2.fastq.gz -o metaspades_output/ --threads 32 --memory 128,assemble a metagenome from paired-end reads,assembly
spades,spades_03,--plasmid -1 R1.fastq.gz -2 R2.fastq.gz -o plasmidspades_output/ --threads 8 --memory 16,assemble plasmids from paired-end reads,assembly
spades,spades_04,--sc -1 R1.fastq.gz -2 R2.fastq.gz -o sc_spades_output/ --threads 8 --memory 32,assemble single-cell MDA amplified data,assembly
spades,spades_05,-o spades_output/ --continue,resume interrupted SPAdes assembly,assembly
spades,spades_06,-1 short_R1.fastq.gz -2 short_R2.fastq.gz --nanopore long_reads.fastq.gz -o hybrid_output/ --threads 16 --memory 64,assemble with both paired-end and long reads (hybrid assembly),assembly
spades,spades_07,-1 R1.fastq.gz -2 R2.fastq.gz -o spades_output/ --threads 16 --memory 32 --careful,assemble a bacterial genome from paired-end reads using multiple threads,assembly
spades,spades_08,--meta -1 R1.fastq.gz -2 R2.fastq.gz -o metaspades_output/ --threads 32 --memory 128,assemble a metagenome from paired-end reads and write output to a file,assembly
spades,spades_09,--plasmid -1 R1.fastq.gz -2 R2.fastq.gz -o plasmidspades_output/ --threads 8 --memory 16 --quiet,assemble plasmids from paired-end reads in quiet mode,assembly
spades,spades_10,--sc -1 R1.fastq.gz -2 R2.fastq.gz -o sc_spades_output/ --threads 8 --memory 32,assemble single-cell MDA amplified data with default parameters,assembly
sra-tools,sra-tools_01,fasterq-dump SRR123456 -O output_directory/ -e 8,download and convert an SRA accession to FASTQ,utilities
sra-tools,sra-tools_02,prefetch SRR123456 -O sra_downloads/,download SRA file first then convert (more reliable),utilities
sra-tools,sra-tools_03,prefetch --option-file accession_list.txt -O sra_downloads/,download multiple SRA accessions in batch,utilities
sra-tools,sra-tools_04,fasterq-dump SRR123456 -O output/ -e 8 && gzip output/SRR123456_1.fastq output/SRR123456_2.fastq,convert SRA to compressed FASTQ,utilities
sra-tools,sra-tools_05,vdb-validate SRR123456.sra,validate an SRA file integrity,utilities
sra-tools,sra-tools_06,sra-stat --quick --xml SRR123456,get statistics for an SRA run without downloading reads,utilities
sra-tools,sra-tools_07,prefetch ERR123456 -O sra_downloads/,download an ENA/EBI accession using prefetch,utilities
sra-tools,sra-tools_08,fasterq-dump SRR123456 --stdout -e 4 | head -40,preview the first 10 reads of an SRA run by streaming to stdout,utilities
sra-tools,sra-tools_09,fasterq-dump SRR123456 --size-check only,check available disk space before a large download,utilities
sra-tools,sra-tools_10,fasterq-dump SRR123456 -O output/ -e 8 --skip-technical,download a single-end SRA accession and skip technical reads,utilities
ssh,ssh_01,user@hostname,connect to a remote server as a specific user,networking
ssh,ssh_02,-i ~/.ssh/id_ed25519 user@hostname,connect using a specific private key file,networking
ssh,ssh_03,-p 2222 user@hostname,connect on a non-standard port,networking
ssh,ssh_04,-L 8080:localhost:80 user@hostname,forward a local port to a remote service (local port forwarding),networking
ssh,ssh_05,user@hostname 'df -h && free -h',run a command on a remote host without an interactive shell,networking
ssh,ssh_06,-X user@hostname,enable X11 forwarding to run graphical applications remotely,networking
ssh,ssh_07,-R 9090:localhost:3000 user@hostname,connect and set up reverse port forwarding (expose local service to remote),networking
ssh,ssh_08,-D 1080 -N user@hostname,create a SOCKS5 proxy tunnel through the remote host,networking
ssh,ssh_09,-o ServerAliveInterval=60 -o ServerAliveCountMax=3 user@hostname,keep an idle connection alive and drop it after three unanswered keepalive messages,networking
ssh,ssh_10,-J bastion_user@bastion_host target_user@target_host,use jump host (bastion) to reach a machine not directly accessible,networking
star,star_01,--runMode genomeGenerate --genomeDir /path/to/star_index --genomeFastaFiles genome.fa --sjdbGTFfile genes.gtf --runThreadN 8,build genome index for STAR alignment,alignment
star,star_02,--runMode alignReads --genomeDir /path/to/star_index --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample1/ --outSAMattributes NH HI AS NM,align paired-end RNA-seq gzipped FASTQ files to the genome,alignment
star,star_03,--runMode alignReads --genomeDir /star_index --readFilesIn reads.fastq.gz --readFilesCommand zcat --twopassMode Basic --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/,align single-end RNA-seq reads with two-pass mode for better junction detection,alignment
star,star_04,--runMode alignReads --genomeDir /star_index --readFilesIn R1.fq.gz R2.fq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/ --outReadsUnmapped Fastx,align reads and output unmapped reads to a FASTQ file,alignment
star,star_05,--runMode genomeGenerate --genomeDir /path/to/star_index --genomeFastaFiles genome.fa --sjdbGTFfile genes.gtf --runThreadN 8,build genome index for STAR alignment with default parameters,alignment
star,star_06,--runMode alignReads --genomeDir /path/to/star_index --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample1/ --outSAMattributes NH HI AS NM --verbose,align paired-end RNA-seq gzipped FASTQ files to the genome with verbose output,alignment
star,star_07,--runMode alignReads --genomeDir /star_index --readFilesIn reads.fastq.gz --readFilesCommand zcat --twopassMode Basic --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/,align single-end RNA-seq reads with two-pass mode for better junction detection using multiple threads,alignment
star,star_08,--runMode alignReads --genomeDir /star_index --readFilesIn R1.fq.gz R2.fq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/ --outReadsUnmapped Fastx,align reads and write unmapped reads to FASTQ files under the output prefix,alignment
star,star_09,--runMode genomeGenerate --genomeDir /path/to/star_index --genomeFastaFiles genome.fa --sjdbGTFfile genes.gtf --runThreadN 8 --quiet,build genome index for STAR alignment in quiet mode,alignment
star,star_10,--runMode alignReads --genomeDir /path/to/star_index --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample1/ --outSAMattributes NH HI AS NM,align paired-end RNA-seq gzipped FASTQ files to the genome with default parameters,alignment
starsolo,starsolo_01,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --genomeDir /path/to/star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM --runThreadN 16 --outFileNamePrefix sample_starsolo/,process 10x Chromium v3 scRNA-seq with STARsolo,single-cell
starsolo,starsolo_02,--soloType CB_UMI_Simple --soloCBwhitelist 737K-august-2016.txt --soloCBlen 16 --soloUMIlen 10 --genomeDir /star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --runThreadN 16 --outFileNamePrefix sample_v2/,process 10x Chromium v2 scRNA-seq with STARsolo,single-cell
starsolo,starsolo_03,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBlen 16 --soloUMIlen 12 --soloFeatures Gene Velocyto --genomeDir /star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --runThreadN 16 --outFileNamePrefix velocity_sample/,run STARsolo with RNA velocity output,single-cell
starsolo,starsolo_04,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --genomeDir /path/to/star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM --runThreadN 16 --outFileNamePrefix sample_starsolo/ --quiet,process 10x Chromium v3 scRNA-seq with STARsolo in quiet mode,single-cell
starsolo,starsolo_05,--soloType CB_UMI_Simple --soloCBwhitelist 737K-august-2016.txt --soloCBlen 16 --soloUMIlen 10 --genomeDir /star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --runThreadN 16 --outFileNamePrefix sample_v2/,process 10x Chromium v2 scRNA-seq with STARsolo with default parameters,single-cell
starsolo,starsolo_06,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBlen 16 --soloUMIlen 12 --soloFeatures Gene Velocyto --genomeDir /star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --runThreadN 16 --outFileNamePrefix velocity_sample/ --verbose,run STARsolo with RNA velocity output with verbose output,single-cell
starsolo,starsolo_07,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --genomeDir /path/to/star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM --runThreadN 16 --outFileNamePrefix sample_starsolo/,process 10x Chromium v3 scRNA-seq with STARsolo using multiple threads,single-cell
starsolo,starsolo_08,--soloType CB_UMI_Simple --soloCBwhitelist 737K-august-2016.txt --soloCBlen 16 --soloUMIlen 10 --genomeDir /star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --runThreadN 16 --outFileNamePrefix sample_v2/,process 10x Chromium v2 scRNA-seq with STARsolo and write outputs under a file prefix,single-cell
starsolo,starsolo_09,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBlen 16 --soloUMIlen 12 --soloFeatures Gene Velocyto --genomeDir /star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --runThreadN 16 --outFileNamePrefix velocity_sample/ --quiet,run STARsolo with RNA velocity output in quiet mode,single-cell
starsolo,starsolo_10,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --genomeDir /path/to/star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM --runThreadN 16 --outFileNamePrefix sample_starsolo/,process 10x Chromium v3 scRNA-seq with STARsolo with default parameters,single-cell
strelka2,strelka2_01,configureStrelkaGermlineWorkflow.py --bam sorted.bam --referenceFasta reference.fa --runDir strelka_germline && python strelka_germline/runWorkflow.py -m local -j 8,configure and run Strelka2 germline variant calling,variant-calling
strelka2,strelka2_02,configureStrelkaSomaticWorkflow.py --normalBam normal.bam --tumourBam tumor.bam --referenceFasta reference.fa --runDir strelka_somatic && python strelka_somatic/runWorkflow.py -m local -j 8,configure and run Strelka2 somatic variant calling for tumor-normal pair,variant-calling
strelka2,strelka2_03,configureStrelkaGermlineWorkflow.py --bam sorted.bam --referenceFasta reference.fa --exome --callRegions targets.bed.gz --runDir strelka_wes && python strelka_wes/runWorkflow.py -m local -j 8,run Strelka2 germline on WES data with target regions,variant-calling
strelka2,strelka2_04,configureStrelkaSomaticWorkflow.py --normalBam normal.bam --tumourBam tumor.bam --referenceFasta reference.fa --indelCandidates manta_results/results/variants/candidateSmallIndels.vcf.gz --runDir strelka_with_manta && python strelka_with_manta/runWorkflow.py -m local -j 8,run Strelka2 somatic with Manta indel candidates for improved accuracy,variant-calling
strelka2,strelka2_05,configureStrelkaGermlineWorkflow.py --bam sorted.bam --referenceFasta reference.fa --runDir strelka_germline && python strelka_germline/runWorkflow.py -m local -j 8,configure and run Strelka2 germline variant calling with default parameters,variant-calling
strelka2,strelka2_06,configureStrelkaSomaticWorkflow.py --normalBam normal.bam --tumourBam tumor.bam --referenceFasta reference.fa --runDir strelka_somatic && python strelka_somatic/runWorkflow.py -m local -j 8 --verbose,configure and run Strelka2 somatic variant calling for tumor-normal pair with verbose output,variant-calling
strelka2,strelka2_07,configureStrelkaGermlineWorkflow.py --bam sorted.bam --referenceFasta reference.fa --exome --callRegions targets.bed.gz --runDir strelka_wes && python strelka_wes/runWorkflow.py -m local -j 8,run Strelka2 germline on WES data with target regions using multiple threads,variant-calling
strelka2,strelka2_08,configureStrelkaSomaticWorkflow.py --normalBam normal.bam --tumourBam tumor.bam --referenceFasta reference.fa --indelCandidates manta_results/results/variants/candidateSmallIndels.vcf.gz --runDir strelka_with_manta && python strelka_with_manta/runWorkflow.py -m local -j 8 -o output.txt,run Strelka2 somatic with Manta indel candidates for improved accuracy and write output to a file,variant-calling
strelka2,strelka2_09,configureStrelkaGermlineWorkflow.py --bam sorted.bam --referenceFasta reference.fa --runDir strelka_germline && python strelka_germline/runWorkflow.py -m local -j 8 --quiet,configure and run Strelka2 germline variant calling in quiet mode,variant-calling
strelka2,strelka2_10,configureStrelkaSomaticWorkflow.py --normalBam normal.bam --tumourBam tumor.bam --referenceFasta reference.fa --runDir strelka_somatic && python strelka_somatic/runWorkflow.py -m local -j 8,configure and run Strelka2 somatic variant calling for tumor-normal pair with default parameters,variant-calling
stringtie,stringtie_01,-G genes.gtf -o sample1.gtf -p 8 --rf sample1_sorted.bam,assemble transcripts from HISAT2-aligned RNA-seq BAM with reference annotation,rna-seq
stringtie,stringtie_02,--merge -G genes.gtf -o merged.gtf sample1.gtf sample2.gtf sample3.gtf,merge per-sample StringTie GTFs into a unified transcript catalog,rna-seq
stringtie,stringtie_03,-e -B -G merged.gtf -o sample1_re/sample1.gtf -p 8 --rf sample1_sorted.bam,re-quantify known and assembled transcripts using merged annotation (for count extraction),rna-seq
stringtie,stringtie_04,-o novel_transcripts.gtf -p 8 --rf sample1_sorted.bam,assemble and quantify without reference annotation (novel transcript discovery),rna-seq
stringtie,stringtie_05,prepDE.py3 -i sample_list.txt -g gene_count_matrix.csv -t transcript_count_matrix.csv,extract gene and transcript count matrices from StringTie -e output for DESeq2 with prepDE.py3,rna-seq
stringtie,stringtie_06,-G genes.gtf -o sample1.gtf -p 8 --rf -v sample1_sorted.bam,assemble transcripts from HISAT2-aligned RNA-seq BAM with reference annotation with verbose output,rna-seq
stringtie,stringtie_07,--merge -G genes.gtf -o merged.gtf sample1.gtf sample2.gtf sample3.gtf,merge per-sample StringTie GTFs into a unified transcript catalog,rna-seq
stringtie,stringtie_08,-e -B -G merged.gtf -o sample1_re/sample1.gtf -p 8 --rf sample1_sorted.bam,re-quantify known and assembled transcripts using merged annotation (for count extraction) and write output to a file,rna-seq
stringtie,stringtie_09,-o novel_transcripts.gtf -p 8 --rf sample1_sorted.bam,assemble and quantify without reference annotation (novel transcript discovery) in quiet mode,rna-seq
stringtie,stringtie_10,prepDE.py3 -i sample_list.txt -g gene_count_matrix.csv -t transcript_count_matrix.csv,extract gene and transcript count matrices from StringTie -e output for DESeq2 with prepDE.py3 with default parameters,rna-seq
survivor,survivor_01,merge vcf_list.txt 500 2 1 1 0 50 merged_svs.vcf,merge SV VCFs from multiple callers requiring support from at least 2 callers,variant-calling
survivor,survivor_02,merge sample_vcfs.txt 1000 1 1 1 0 50 cohort_svs.vcf,merge SV calls from a single caller across multiple samples,variant-calling
survivor,survivor_03,stats calls.vcf -1 -1 -1 sv_stats.txt,get summary statistics for SVs in a VCF,variant-calling
survivor,survivor_04,filter calls.vcf NA 50 100000 0 -1 filtered.vcf,filter SVs by size to keep calls between 50 bp and 100 kb,variant-calling
survivor,survivor_05,simSV reference.fasta parameter_file.txt 0 0 simulated,simulate structural variants on a reference genome for benchmarking,variant-calling
survivor,survivor_06,ls sniffles.vcf pbsv.vcf cutesv.vcf > vcf_list.txt && SURVIVOR merge vcf_list.txt 500 2 1 1 0 50 consensus_svs.vcf,create a VCF list file and merge three caller outputs,variant-calling
survivor,survivor_07,merge vcf_list.txt 500 2 1 1 0 50 merged_svs.vcf,merge SV VCFs from multiple callers requiring support from at least 2 callers,variant-calling
survivor,survivor_08,merge sample_vcfs.txt 1000 1 1 1 0 50 cohort_svs.vcf,merge SV calls from a single caller across multiple samples and write output to a file,variant-calling
survivor,survivor_09,stats calls.vcf -1 -1 -1 sv_stats.txt,get summary statistics for SVs in a VCF,variant-calling
survivor,survivor_10,filter calls.vcf NA 50 100000 0 -1 filtered.vcf,filter SVs by size to keep calls between 50 bp and 100 kb with default parameters,variant-calling
tabix,tabix_01,variants.vcf.gz,create a tabix index for a bgzipped VCF file using format auto-detection from the .vcf.gz extension,utilities
tabix,tabix_02,-p vcf variants.vcf.gz,create tabix index for a bgzipped VCF file,utilities
tabix,tabix_03,-h variants.vcf.gz chr1:1000000-2000000 > chr1_region.vcf,query a specific genomic region from an indexed VCF,utilities
tabix,tabix_04,-p bed regions.bed.gz,create tabix index for a bgzipped BED file,utilities
tabix,tabix_05,-l variants.vcf.gz,list all chromosomes/contigs in an indexed VCF,utilities
tabix,tabix_06,-C variants.vcf.gz,create a CSI index for large genomes with contigs >512 Mb,utilities
tabix,tabix_07,-h variants.vcf.gz chr1:1000000-2000000 chr2:500000-1000000 > multi_region.vcf,query multiple regions at once from an indexed VCF,utilities
tabix,tabix_08,-p gff annotation.gff3.gz,index a bgzipped GFF3 annotation file,utilities
tabix,tabix_09,-h https://example.com/variants.vcf.gz chr1:1000-2000 > remote_region.vcf,fetch a remote indexed VCF region without downloading the whole file,utilities
tabix,tabix_10,-s 1 -b 2 -e 3 -0 custom_format.bed.gz,index a bgzipped tab-delimited file with explicitly specified sequence/start/end columns (0-based coordinates),utilities
tar,tar_01,-czf archive.tar.gz data/,create a gzip-compressed archive of a directory,filesystem
tar,tar_02,-xzf archive.tar.gz,extract a gzip archive into the current directory,filesystem
tar,tar_03,-xf archive.tar.gz -C /opt/myapp/,extract an archive into a specific directory,filesystem
tar,tar_04,-tf archive.tar.gz,list contents of an archive without extracting,filesystem
tar,tar_05,-cjvf backup.tar.bz2 /home/user/documents/,create a verbose bzip2-compressed archive,filesystem
tar,tar_06,-xzf project-1.0.tar.gz --strip-components=1 -C /opt/project/,extract and strip the top-level directory from the archive,filesystem
tar,tar_07,-czf backup.tar.gz --exclude='*.pyc' --exclude='.git' project/,create an archive excluding certain file patterns,filesystem
tar,tar_08,-rf existing.tar newfile.txt,add files to an existing uncompressed archive,filesystem
tar,tar_09,-cJf archive.tar.xz largedir/,create a highly compressed archive using xz,filesystem
tar,tar_10,-xzf archive.tar.gz path/inside/archive/file.txt,extract a single file from an archive,filesystem
trim_galore,trim_galore_01,--paired --quality 20 --length 36 --cores 4 --gzip -o trimmed_output/ R1.fastq.gz R2.fastq.gz,trim adapters and quality-filter paired-end Illumina reads,qc
trim_galore,trim_galore_02,--paired --rrbs --quality 20 --length 20 --cores 4 --gzip -o rrbs_trimmed/ R1.fastq.gz R2.fastq.gz,trim RRBS bisulfite sequencing data,qc
trim_galore,trim_galore_03,--quality 20 --length 36 --cores 4 --gzip -o se_trimmed/ reads.fastq.gz,trim single-end reads with automatic adapter detection,qc
trim_galore,trim_galore_04,--paired --adapter AGATCGGAAGAGCACACGTCT --adapter2 AGATCGGAAGAGCGTCGTGTA --quality 20 --cores 4 --gzip -o custom_trimmed/ R1.fastq.gz R2.fastq.gz,trim with specific adapter sequences for non-standard libraries,qc
trim_galore,trim_galore_05,--paired --quality 20 --length 36 --cores 4 --gzip -o trimmed_output/ R1.fastq.gz R2.fastq.gz,trim adapters and quality-filter paired-end Illumina reads with default parameters,qc
trim_galore,trim_galore_06,--paired --rrbs --quality 20 --length 20 --cores 4 --gzip -o rrbs_trimmed/ R1.fastq.gz R2.fastq.gz --verbose,trim RRBS bisulfite sequencing data with verbose output,qc
trim_galore,trim_galore_07,--quality 20 --length 36 --cores 4 --gzip -o se_trimmed/ reads.fastq.gz,trim single-end reads with automatic adapter detection using multiple threads,qc
trim_galore,trim_galore_08,--paired --adapter AGATCGGAAGAGCACACGTCT --adapter2 AGATCGGAAGAGCGTCGTGTA --quality 20 --cores 4 --gzip -o custom_trimmed/ R1.fastq.gz R2.fastq.gz,trim with specific adapter sequences for non-standard libraries and write output to a file,qc
trim_galore,trim_galore_09,--paired --quality 20 --length 36 --cores 4 --gzip --suppress_warn -o trimmed_output/ R1.fastq.gz R2.fastq.gz,trim adapters and quality-filter paired-end Illumina reads in quiet mode,qc
trim_galore,trim_galore_10,--paired --rrbs --quality 20 --length 20 --cores 4 --gzip -o rrbs_trimmed/ R1.fastq.gz R2.fastq.gz,trim RRBS bisulfite sequencing data with default parameters,qc
trimmomatic,trimmomatic_01,PE -threads 8 -phred33 R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim adapters and quality-filter paired-end Illumina reads,qc
trimmomatic,trimmomatic_02,SE -threads 4 -phred33 reads.fastq.gz trimmed_reads.fastq.gz ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim single-end reads with quality filtering,qc
trimmomatic,trimmomatic_03,PE -threads 8 -phred33 R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim Nextera adapters from paired-end reads,qc
trimmomatic,trimmomatic_04,PE -threads 8 -phred33 R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:5 TRAILING:5 SLIDINGWINDOW:4:20 MINLEN:50,aggressive quality trimming for low-quality paired-end data,qc
trimmomatic,trimmomatic_05,PE -threads 8 -phred33 R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim adapters and quality-filter paired-end Illumina reads with default parameters,qc
trimmomatic,trimmomatic_06,SE -threads 4 -phred33 -trimlog trimlog.txt reads.fastq.gz trimmed_reads.fastq.gz ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim single-end reads with quality filtering with verbose per-read trim log output,qc
trimmomatic,trimmomatic_07,PE -threads 8 -phred33 R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim Nextera adapters from paired-end reads using multiple threads,qc
trimmomatic,trimmomatic_08,PE -threads 8 -phred33 R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:5 TRAILING:5 SLIDINGWINDOW:4:20 MINLEN:50,aggressive quality trimming for low-quality paired-end data and write output to a file,qc
trimmomatic,trimmomatic_09,PE -threads 8 -phred33 -quiet R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim adapters and quality-filter paired-end Illumina reads in quiet mode,qc
trimmomatic,trimmomatic_10,SE -threads 4 -phred33 reads.fastq.gz trimmed_reads.fastq.gz ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim single-end reads with quality filtering with default parameters,qc
trinity,trinity_01,--seqType fq --left R1.fastq.gz --right R2.fastq.gz --max_memory 50G --CPU 16 --output trinity_output/,de novo transcriptome assembly from paired-end RNA-seq reads,rna-seq
trinity,trinity_02,--genome_guided_bam star_aligned.bam --genome_guided_max_intron 10000 --max_memory 50G --CPU 16 --output genome_guided_trinity/,genome-guided Trinity assembly using STAR alignments,rna-seq
trinity,trinity_03,--seqType fq --single reads.fastq.gz --max_memory 32G --CPU 8 --output trinity_se/,de novo assembly from single-end RNA-seq reads,rna-seq
trinity,trinity_04,--seqType fq --left R1.fastq.gz --right R2.fastq.gz --SS_lib_type RF --max_memory 50G --CPU 16 --output stranded_trinity/,Trinity assembly with strand-specific library,rna-seq
trinity,trinity_05,--seqType fq --left R1.fastq.gz --right R2.fastq.gz --max_memory 50G --CPU 16 --output trinity_output/,de novo transcriptome assembly from paired-end RNA-seq reads with default parameters,rna-seq
trinity,trinity_06,--genome_guided_bam star_aligned.bam --genome_guided_max_intron 10000 --max_memory 50G --CPU 16 --output genome_guided_trinity/ --verbose,genome-guided Trinity assembly using STAR alignments with verbose output,rna-seq
trinity,trinity_07,--seqType fq --single reads.fastq.gz --max_memory 32G --CPU 8 --output trinity_se/,de novo assembly from single-end RNA-seq reads using multiple threads,rna-seq
trinity,trinity_08,--seqType fq --left R1.fastq.gz --right R2.fastq.gz --SS_lib_type RF --max_memory 50G --CPU 16 --output stranded_trinity/,Trinity assembly with strand-specific library and write output to a file,rna-seq
trinity,trinity_09,--seqType fq --left R1.fastq.gz --right R2.fastq.gz --max_memory 50G --CPU 16 --output trinity_output/ --quiet,de novo transcriptome assembly from paired-end RNA-seq reads in quiet mode,rna-seq
trinity,trinity_10,--genome_guided_bam star_aligned.bam --genome_guided_max_intron 10000 --max_memory 50G --CPU 16 --output genome_guided_trinity/,genome-guided Trinity assembly using STAR alignments with default parameters,rna-seq
truvari,truvari_01,bench -b truth.vcf.gz -c calls.vcf.gz -f reference.fasta -o bench_output --passonly --sizemin 50,benchmark a structural variant caller VCF against a truth set,utilities
truvari,truvari_02,bench -b truth.vcf.gz -c calls.vcf.gz -f reference.fasta -o bench_output --refdist 1000 --pctsize 0.7 --passonly,benchmark with relaxed position tolerance for long-read SV calls,utilities
truvari,truvari_03,collapse -i calls.vcf.gz -o collapsed.vcf --passonly --sizemin 50 --refdist 500,collapse redundant SV calls within a single caller VCF,utilities
truvari,truvari_04,collapse -i multi_caller.vcf.gz -o merged.vcf --chain --keep common,merge SV calls from multiple callers into a consensus VCF,utilities
truvari,truvari_05,refine --reference reference.fasta --regions bench_output/candidate.refine.bed bench_output/,run truvari refine to improve benchmarking accuracy with sequence realignment,utilities
truvari,truvari_06,bench -b truth.vcf.gz -c calls.vcf.gz -f reference.fasta -o bench_output --sizemin 50 --sizemax 10000 --passonly,benchmark only SVs within a specific size range (50 bp to 10 kb),utilities
truvari,truvari_07,bench -b truth.vcf.gz -c calls.vcf.gz -f reference.fasta -o bench_output --passonly --sizemin 50 -t 4,benchmark a structural variant caller VCF against a truth set using multiple threads,utilities
truvari,truvari_08,bench -b truth.vcf.gz -c calls.vcf.gz -f reference.fasta -o bench_output --refdist 1000 --pctsize 0.7 --passonly,benchmark with relaxed position tolerance for long-read SV calls and write output to a file,utilities
truvari,truvari_09,collapse -i calls.vcf.gz -o collapsed.vcf --passonly --sizemin 50 --refdist 500 --quiet,collapse redundant SV calls within a single caller VCF in quiet mode,utilities
truvari,truvari_10,collapse -i multi_caller.vcf.gz -o merged.vcf --chain --keep common,merge SV calls from multiple callers into a consensus VCF with default parameters,utilities
varscan2,varscan2_01,mpileup2snp --min-coverage 8 --min-reads2 2 --min-avg-qual 15 --min-var-freq 0.01 --p-value 0.99 --output-vcf 1 > snps.vcf,call germline SNPs from a tumor or normal sample,variant-calling
varscan2,varscan2_02,somatic normal_pileup.pileup tumor_pileup.pileup --output-snp somatic.snp.vcf --output-indel somatic.indel.vcf --output-vcf 1 --min-coverage 8 --min-var-freq 0.1 --somatic-p-value 0.05,call somatic variants from tumor-normal pair,variant-calling
varscan2,varscan2_03,processSomatic somatic.snp.vcf --min-tumor-freq 0.1 --max-normal-freq 0.05 --p-value 0.05,filter somatic variants for high-confidence calls,variant-calling
varscan2,varscan2_04,mpileup2snp --min-coverage 8 --min-reads2 2 --min-avg-qual 15 --min-var-freq 0.01 --p-value 0.99 --output-vcf 1 > snps.vcf,call germline SNPs from a tumor or normal sample,variant-calling
varscan2,varscan2_05,somatic normal_pileup.pileup tumor_pileup.pileup --output-snp somatic.snp.vcf --output-indel somatic.indel.vcf --output-vcf 1 --min-coverage 8 --min-var-freq 0.1 --somatic-p-value 0.05,call somatic variants from tumor-normal pair with default parameters,variant-calling
varscan2,varscan2_06,processSomatic somatic.snp.vcf --min-tumor-freq 0.1 --max-normal-freq 0.05 --p-value 0.05 --verbose,filter somatic variants for high-confidence calls with verbose output,variant-calling
varscan2,varscan2_07,mpileup2snp --min-coverage 8 --min-reads2 2 --min-avg-qual 15 --min-var-freq 0.01 --p-value 0.99 --output-vcf 1 > snps.vcf,call germline SNPs from a tumor or normal sample,variant-calling
varscan2,varscan2_08,somatic normal_pileup.pileup tumor_pileup.pileup --output-snp somatic.snp.vcf --output-indel somatic.indel.vcf --output-vcf 1 --min-coverage 8 --min-var-freq 0.1 --somatic-p-value 0.05,call somatic variants from tumor-normal pair and write output to a file,variant-calling
varscan2,varscan2_09,processSomatic somatic.snp.vcf --min-tumor-freq 0.1 --max-normal-freq 0.05 --p-value 0.05 --quiet,filter somatic variants for high-confidence calls in quiet mode,variant-calling
varscan2,varscan2_10,mpileup2snp --min-coverage 8 --min-reads2 2 --min-avg-qual 15 --min-var-freq 0.01 --p-value 0.99 --output-vcf 1 > snps.vcf,call germline SNPs from a tumor or normal sample with default parameters,variant-calling
vcfanno,vcfanno_01,-p 8 config.toml input.vcf.gz > annotated.vcf,annotate a VCF with gnomAD allele frequencies,variant-annotation
vcfanno,vcfanno_02,-p 16 clinvar_bed_config.toml input.vcf.gz | bgzip > annotated.vcf.gz,annotate variants with ClinVar pathogenicity and a custom BED file,variant-annotation
vcfanno,vcfanno_03,-p 8 regions_config.toml input.vcf.gz > flagged.vcf,add a flag for variants overlapping a BED region of interest,variant-annotation
vcfanno,vcfanno_04,-p 8 bam_config.toml input.vcf.gz > coverage_annotated.vcf,compute mean coverage at each variant position from a BAM file,variant-annotation
vcfanno,vcfanno_05,-p 8 -lua filters.lua combined_config.toml input.vcf.gz > filtered_annotated.vcf,use a Lua postannotation to combine scores into a final filter,variant-annotation
vcfanno,vcfanno_06,-p 8 cosmic_config.toml input.vcf.gz | bcftools view -f PASS > cosmic_annotated.vcf,annotate variants with COSMIC and keep only PASS-filtered records,variant-annotation
vcfanno,vcfanno_07,-p 8 config.toml input.vcf.gz > annotated.vcf,annotate a VCF with gnomAD allele frequencies,variant-annotation
vcfanno,vcfanno_08,-p 16 clinvar_bed_config.toml input.vcf.gz | bgzip > annotated.vcf.gz,annotate variants with ClinVar pathogenicity and a custom BED file,variant-annotation
vcfanno,vcfanno_09,-p 8 regions_config.toml input.vcf.gz > flagged.vcf,add a flag for variants overlapping a BED region of interest,variant-annotation
vcfanno,vcfanno_10,-p 8 bam_config.toml input.vcf.gz > coverage_annotated.vcf,compute mean coverage at each variant position from a BAM file with default parameters,variant-annotation
vcftools,vcftools_01,--vcf variants.vcf --maf 0.05 --max-missing 0.9 --recode --recode-INFO-all --out filtered_variants,filter VCF by minor allele frequency and missingness,variant-calling
vcftools,vcftools_02,--vcf variants.vcf --site-pi --out popgen_stats && vcftools --vcf variants.vcf --TajimaD 10000 --out popgen_stats,calculate per-site nucleotide diversity and Tajima's D statistics,variant-calling
vcftools,vcftools_03,--vcf variants.vcf --remove-indels --min-alleles 2 --max-alleles 2 --minDP 10 --recode --recode-INFO-all --out snps_only,filter VCF to biallelic SNPs with minimum depth,variant-calling
vcftools,vcftools_04,--vcf variants.vcf --weir-fst-pop pop1_samples.txt --weir-fst-pop pop2_samples.txt --fst-window-size 50000 --out fst_results,compute pairwise FST between two populations,variant-calling
vcftools,vcftools_05,--vcf variants.vcf --plink --out plink_dataset,convert VCF to PLINK format for downstream analysis,variant-calling
vcftools,vcftools_06,--vcf variants.vcf --maf 0.05 --max-missing 0.9 --recode --recode-INFO-all --out filtered_variants --verbose,filter VCF by minor allele frequency and missingness with verbose output,variant-calling
vcftools,vcftools_07,--vcf variants.vcf --site-pi --out popgen_stats && vcftools --vcf variants.vcf --TajimaD 10000 --out popgen_stats,calculate per-site nucleotide diversity and Tajima's D statistics,variant-calling
vcftools,vcftools_08,--vcf variants.vcf --remove-indels --min-alleles 2 --max-alleles 2 --minDP 10 --recode --recode-INFO-all --out snps_only,filter VCF to biallelic SNPs with minimum depth and write output to a file,variant-calling
vcftools,vcftools_09,--vcf variants.vcf --weir-fst-pop pop1_samples.txt --weir-fst-pop pop2_samples.txt --fst-window-size 50000 --out fst_results --quiet,compute pairwise FST between two populations in quiet mode,variant-calling
vcftools,vcftools_10,--vcf variants.vcf --plink --out plink_dataset,convert VCF to PLINK format for downstream analysis with default parameters,variant-calling
vep,vep_01,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --fork 8 --offline,annotate VCF variants with VEP using offline cache,variant-annotation
vep,vep_02,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --everything --fork 8 --offline,annotate with all standard functional predictions,variant-annotation
vep,vep_03,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --pick --fork 8 --offline,annotate and pick single most severe consequence per variant,variant-annotation
vep,vep_04,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --af_gnomad --fork 8 --offline,annotate with gnomAD population frequencies,variant-annotation
vep,vep_05,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --fork 8 --offline,annotate VCF variants with VEP using offline cache with default parameters,variant-annotation
vep,vep_06,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --everything --fork 8 --offline --verbose,annotate with all standard functional predictions with verbose output,variant-annotation
vep,vep_07,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --pick --fork 8 --offline,annotate and pick single most severe consequence per variant using multiple threads,variant-annotation
vep,vep_08,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --af_gnomad --fork 8 --offline,annotate with gnomAD population frequencies and write output to a file,variant-annotation
vep,vep_09,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --fork 8 --offline --quiet,annotate VCF variants with VEP using offline cache in quiet mode,variant-annotation
vep,vep_10,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --everything --fork 8 --offline,annotate with all standard functional predictions with default parameters,variant-annotation
verkko,verkko_01,--hifi hifi_reads.fastq.gz -d assembly_out --threads 64,assemble a genome using only HiFi reads,assembly
verkko,verkko_02,--hifi hifi_reads.fastq.gz --ont ont_reads.fastq.gz -d hybrid_assembly --threads 64,assemble a genome with both HiFi and ONT reads for maximum continuity,assembly
verkko,verkko_03,--hifi hifi_reads.fastq.gz --ont ont_reads.fastq.gz --hap-kmers maternal.meryl paternal.meryl trio -d trio_assembly --threads 64,perform haplotype-resolved assembly with trio binning,assembly
verkko,verkko_04,"--hifi hifi_reads.fastq.gz --ont ont_reads.fastq.gz -d assembly_out --threads 4 --snakeopts ""--cluster 'sbatch -c {threads} --mem {resources.mem_gb}G' --jobs 50""",run Verkko on a cluster using Slurm via Snakemake,assembly
verkko,verkko_05,--hifi hifi_reads.fastq.gz --ont ont_reads.fastq.gz -d assembly_out --threads 64 --resume,resume an interrupted Verkko assembly,assembly
verkko,verkko_06,--ont ont_reads.fastq.gz -d ont_assembly --threads 64,assemble with ONT reads only (no HiFi),assembly
verkko,verkko_07,--hifi hifi_reads.fastq.gz -d assembly_out --threads 64,assemble a genome using only HiFi reads using multiple threads,assembly
verkko,verkko_08,--hifi hifi_reads.fastq.gz --ont ont_reads.fastq.gz -d hybrid_assembly --threads 64,assemble a genome with both HiFi and ONT reads for maximum continuity and write output to a file,assembly
verkko,verkko_09,--hifi hifi_reads.fastq.gz --ont ont_reads.fastq.gz --hap-kmers maternal.meryl paternal.meryl trio -d trio_assembly --threads 64 --quiet,perform haplotype-resolved assembly with trio binning in quiet mode,assembly
verkko,verkko_10,"--hifi hifi_reads.fastq.gz --ont ont_reads.fastq.gz -d assembly_out --threads 4 --snakeopts ""--cluster 'sbatch -c {threads} --mem {resources.mem_gb}G' --jobs 50""",run Verkko on a cluster using Slurm via Snakemake with default parameters,assembly
wget,wget_01,https://example.com/files/data.tar.gz,download a file and save with its remote filename,networking
wget,wget_02,-O /data/myfile.csv https://example.com/export/data.csv,download a file with a custom local filename,networking
wget,wget_03,-c https://example.com/large-file.iso,resume an interrupted download,networking
wget,wget_04,-b -o download.log https://example.com/large-dataset.tar.gz,download in the background with logging to a file,networking
wget,wget_05,--tries=5 --timeout=30 --wait=2 https://example.com/file.tar.gz,download with retry and timeout settings,networking
wget,wget_06,-r -l 2 -np -P ./mirror https://example.com/docs/,mirror a website section without going to parent directories,networking
wget,wget_07,-r -l 3 -np -A '.pdf' https://example.com/papers/,download only PDF files recursively from a site,networking
wget,wget_08,-i urls.txt -P downloads/,download a list of URLs from a file,networking
wget,wget_09,--post-data='query=search+term' -O result.html https://example.com/search,send a POST request and download the response,networking
wget,wget_10,--user-agent='Mozilla/5.0' -O page.html https://example.com/page,download with a custom User-Agent header,networking
whatshap,whatshap_01,phase --output phased.vcf.gz --reference reference.fa variants.vcf.gz long_reads.bam,phase variants using long reads (ONT/PacBio),variant-calling
whatshap,whatshap_02,phase --output phased.vcf.gz --reference reference.fa variants.vcf.gz illumina.bam,phase variants using Illumina short reads,variant-calling
whatshap,whatshap_03,haplotag --output haplotagged.bam --reference reference.fa phased.vcf.gz sorted.bam,tag reads with haplotype information after phasing,variant-calling
whatshap,whatshap_04,stats phased.vcf.gz,compute phasing statistics from a phased VCF,variant-calling
whatshap,whatshap_05,phase --output phased.vcf.gz --reference reference.fa variants.vcf.gz long_reads.bam,phase variants using long reads (ONT/PacBio) with default parameters,variant-calling
whatshap,whatshap_06,phase --output phased.vcf.gz --reference reference.fa variants.vcf.gz illumina.bam --verbose,phase variants using Illumina short reads with verbose output,variant-calling
whatshap,whatshap_07,haplotag --output haplotagged.bam --reference reference.fa --output-threads 4 phased.vcf.gz sorted.bam,tag reads with haplotype information after phasing using multiple threads,variant-calling
whatshap,whatshap_08,stats --tsv phasing_stats.tsv phased.vcf.gz,compute phasing statistics from a phased VCF and write output to a file,variant-calling
whatshap,whatshap_09,phase --output phased.vcf.gz --reference reference.fa variants.vcf.gz long_reads.bam --quiet,phase variants using long reads (ONT/PacBio) in quiet mode,variant-calling
whatshap,whatshap_10,phase --output phased.vcf.gz --reference reference.fa variants.vcf.gz illumina.bam,phase variants using Illumina short reads with default parameters,variant-calling
wtdbg2,wtdbg2_01,-x ont -g 5m -i reads.fastq.gz -fo assembly -t 16 && wtpoa-cns -t 16 -i assembly.ctg.lay.gz -fo assembly.ctg.fa,assemble genome from Oxford Nanopore reads,assembly
wtdbg2,wtdbg2_02,-x ccs -g 3g -i hifi_reads.fastq.gz -fo hifi_assembly -t 32 && wtpoa-cns -t 32 -i hifi_assembly.ctg.lay.gz -fo hifi_assembly.ctg.fa,assemble genome from PacBio HiFi reads,assembly
wtdbg2,wtdbg2_03,-x rs -g 4m -i clr_reads.fastq.gz -fo clr_assembly -t 8 && wtpoa-cns -t 8 -i clr_assembly.ctg.lay.gz -fo clr_assembly.ctg.fa,assemble bacterial genome from PacBio CLR reads,assembly
wtdbg2,wtdbg2_04,-x ont -g 5m -i reads.fastq.gz -fo assembly -t 16 && wtpoa-cns -t 16 -i assembly.ctg.lay.gz -fo assembly.ctg.fa --quiet,assemble genome from Oxford Nanopore reads in quiet mode,assembly
wtdbg2,wtdbg2_05,-x ccs -g 3g -i hifi_reads.fastq.gz -fo hifi_assembly -t 32 && wtpoa-cns -t 32 -i hifi_assembly.ctg.lay.gz -fo hifi_assembly.ctg.fa,assemble genome from PacBio HiFi reads with default parameters,assembly
wtdbg2,wtdbg2_06,-x rs -g 4m -i clr_reads.fastq.gz -fo clr_assembly -t 8 && wtpoa-cns -t 8 -i clr_assembly.ctg.lay.gz -fo clr_assembly.ctg.fa --verbose,assemble bacterial genome from PacBio CLR reads with verbose output,assembly
wtdbg2,wtdbg2_07,-x ont -g 5m -i reads.fastq.gz -fo assembly -t 16 && wtpoa-cns -t 16 -i assembly.ctg.lay.gz -fo assembly.ctg.fa,assemble genome from Oxford Nanopore reads using multiple threads,assembly
wtdbg2,wtdbg2_08,-x ccs -g 3g -i hifi_reads.fastq.gz -fo hifi_assembly -t 32 && wtpoa-cns -t 32 -i hifi_assembly.ctg.lay.gz -fo hifi_assembly.ctg.fa,assemble genome from PacBio HiFi reads and write output to a file,assembly
wtdbg2,wtdbg2_09,-x rs -g 4m -i clr_reads.fastq.gz -fo clr_assembly -t 8 && wtpoa-cns -t 8 -i clr_assembly.ctg.lay.gz -fo clr_assembly.ctg.fa --quiet,assemble bacterial genome from PacBio CLR reads in quiet mode,assembly
wtdbg2,wtdbg2_10,-x ont -g 5m -i reads.fastq.gz -fo assembly -t 16 && wtpoa-cns -t 16 -i assembly.ctg.lay.gz -fo assembly.ctg.fa,assemble genome from Oxford Nanopore reads with default parameters,assembly