oxo-call 0.11.0

Model-intelligent orchestration for CLI bioinformatics — call any tool with LLM intelligence
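The scenario table below is a CSV file with five columns: tool, scenario_id, reference_args, task_description and category. Some fields contain commas or double quotes. Those fields are wrapped in double quotes, and each literal quote inside them is written twice (RFC 4180 style), as in the awk rows. Naively splitting each line on commas therefore corrupts rows such as awk_01.

The block below is a minimal sketch of loading the table with Python's standard `csv` module. It assumes the file is UTF-8 CSV with the header shown. `SAMPLE` and `load_scenarios` are illustrative names, not part of oxo-call's API.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt: three rows copied from the table below.
# awk_01 shows the quoting rule: the field is wrapped in double quotes and
# every literal double quote inside it is doubled.
SAMPLE = """tool,scenario_id,reference_args,task_description,category
awk,awk_01,"-F ',' '{print $1"",""$3}' file.csv",print specific columns from a CSV file,text-processing
bamtools,bamtools_03,filter -in input.bam -out filtered.bam -isMapped true -isProperPair true,"filter BAM to keep only mapped, properly paired reads",utilities
bedtools,bedtools_03,merge -i input.bed,merge overlapping intervals in a BED file,genomic-intervals
"""


def load_scenarios(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per scenario, keyed by the header columns."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_scenarios(SAMPLE)
# Embedded commas and quotes survive intact after parsing.
print(rows[0]["reference_args"])  # -F ',' '{print $1","$3}' file.csv
# Tally scenarios per category.
print(Counter(row["category"] for row in rows))
```

Reading the real file works the same way: pass an open file handle, opened with `newline=""`, to `csv.DictReader` instead of the `io.StringIO` wrapper.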
tool,scenario_id,reference_args,task_description,category
admixture,admixture_01,data.bed 5 --cv=10 -j8,run ADMIXTURE for K=5 ancestral populations,population-genomics
admixture,admixture_02,data.bed 3 --seed=42 --cv=10 -j8,run ADMIXTURE with reproducible seed,population-genomics
admixture,admixture_03,data.bed 3 --supervised -j8,run supervised ADMIXTURE with known reference populations,population-genomics
admixture,admixture_04,data.bed $K --cv=10 -j8 > admixture_${K}.log,run ADMIXTURE across multiple K values (shell loop),population-genomics
admixture,admixture_05,data.bed 5 -B100 -j8,run ADMIXTURE with 100 bootstrap replicates for standard errors,population-genomics
admixture,admixture_06,data.bed 5 -P -j8,run projection analysis onto a fixed P-matrix,population-genomics
admixture,admixture_07,data.bed 4 --seed=1 --cv=10 -j8 > run1.log,run multiple replicates for K=4 with different seeds to check convergence,population-genomics
admixture,admixture_08,data.bed 6 --cv=10 -j8 | tee admixture_K6.log,compare cross-validation errors across K values,population-genomics
admixture,admixture_09,data.bed 5 --maf=0.05 --cv=10 -j8,filter for minor allele frequency before running ADMIXTURE,population-genomics
admixture,admixture_10,data.bed 5 --em --cv=10 -j8,run ADMIXTURE with accelerated EM for faster convergence,population-genomics
agat,agat_01,agat_convert_sp_gff2gtf.pl --gff annotation.gff3 -o annotation.gtf,convert GFF3 to GTF format,annotation
agat,agat_02,agat_sp_statistics.pl --gff annotation.gff3 -o statistics_report.txt,get annotation statistics from a GFF3 file,annotation
agat,agat_03,agat_sp_filter_gene_by_length.pl --gff annotation.gff3 --size 300 -o filtered_annotation.gff3,filter genes by minimum length,annotation
agat,agat_04,agat_convert_sp_gxf2gxf.pl -g malformed.gff3 -o fixed.gff3,fix and standardize a malformed GFF3 file,annotation
agat,agat_05,agat_convert_sp_gff2gtf.pl --gff annotation.gff3 -o annotation.gtf,convert GFF3 to GTF format with default parameters,annotation
agat,agat_06,agat_sp_statistics.pl --gff annotation.gff3 -o statistics_report.txt --verbose,get annotation statistics from a GFF3 file with verbose output,annotation
agat,agat_07,agat_sp_filter_gene_by_length.pl --gff annotation.gff3 --size 300 -o filtered_annotation.gff3 -t 4,filter genes by minimum length using multiple threads,annotation
agat,agat_08,agat_convert_sp_gxf2gxf.pl -g malformed.gff3 -o fixed.gff3,fix and standardize a malformed GFF3 file and write output to a file,annotation
agat,agat_09,agat_convert_sp_gff2gtf.pl --gff annotation.gff3 -o annotation.gtf --quiet,convert GFF3 to GTF format in quiet mode,annotation
agat,agat_10,agat_sp_statistics.pl --gff annotation.gff3 -o statistics_report.txt,get annotation statistics from a GFF3 file with default parameters,annotation
angsd,angsd_01,-bam bam_list.txt -GL 1 -doGlf 2 -doMaf 1 -SNP_pval 1e-6 -minMapQ 30 -minQ 20 -nThreads 16 -out output,compute genotype likelihoods and allele frequencies for a set of BAMs,population-genomics
angsd,angsd_02,-bam pop1_bams.txt -GL 1 -doSaf 1 -anc ancestral.fasta -minMapQ 30 -minQ 20 -nThreads 16 -out pop1,compute per-site allele frequency spectrum for a single population,population-genomics
angsd,angsd_03,realSFS pop1.saf.idx -P 16 > pop1.sfs,estimate 1D site frequency spectrum from doSaf output,population-genomics
angsd,angsd_04,-bam pop1_bams.txt -GL 1 -doSaf 1 -doThetas 1 -pest pop1.sfs -anc ancestral.fasta -minMapQ 30 -minQ 20 -out pop1_thetas,estimate Watterson's theta and Tajima's D in sliding windows,population-genomics
angsd,angsd_05,realSFS pop1.saf.idx pop2.saf.idx -P 16 > pop1_pop2.2dsfs && realSFS fst index pop1.saf.idx pop2.saf.idx -sfs pop1_pop2.2dsfs -fstout pop1_pop2,compute Fst between two populations using 2D SFS,population-genomics
angsd,angsd_06,-bam bam_list.txt -GL 1 -doGlf 2 -doMaf 1 -SNP_pval 1e-6 -minMapQ 30 -minQ 20 -nInd 50 -minInd 40 -nThreads 16 -out snps_for_pca,call SNPs and compute principal component analysis input,population-genomics
angsd,angsd_07,-bam bam_list.txt -GL 1 -doGlf 2 -doMaf 1 -SNP_pval 1e-6 -minMapQ 30 -minQ 20 -nThreads 16 -out output,compute genotype likelihoods and allele frequencies for a set of BAMs using multiple threads,population-genomics
angsd,angsd_08,-bam pop1_bams.txt -GL 1 -doSaf 1 -anc ancestral.fasta -minMapQ 30 -minQ 20 -nThreads 16 -out pop1,compute per-site allele frequency spectrum for a single population and write output to a file,population-genomics
angsd,angsd_09,realSFS pop1.saf.idx -P 16 > pop1.sfs,estimate 1D site frequency spectrum from doSaf output,population-genomics
angsd,angsd_10,-bam pop1_bams.txt -GL 1 -doSaf 1 -doThetas 1 -pest pop1.sfs -anc ancestral.fasta -minMapQ 30 -minQ 20 -out pop1_thetas,estimate Watterson's theta and Tajima's D in sliding windows with default parameters,population-genomics
arriba,arriba_01,--runMode alignReads --genomeDir /star_index/ --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/ --chimSegmentMin 10 --chimOutType WithinBAM --chimJunctionOverhangMin 10 --chimScoreDropMax 30 --peOverlapNbasesMin 12,run STAR with chimeric output for Arriba fusion detection,rna-seq
arriba,arriba_02,-x sample/Aligned.sortedByCoord.out.bam -o fusions.tsv -O discarded_fusions.tsv -g genome.fa -a genes.gtf -b blacklist_hg38_GRCh38_v2.4.0.tsv.gz,detect gene fusions with Arriba,rna-seq
arriba,arriba_03,draw_fusions.R --fusions=fusions.tsv --alignments=sample/Aligned.sortedByCoord.out.bam --genome=genome.fa --annotation=genes.gtf --output=fusion_plots.pdf,visualize detected fusions with Arriba draw_fusions,rna-seq
arriba,arriba_04,--runMode alignReads --genomeDir /star_index/ --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/ --chimSegmentMin 10 --chimOutType WithinBAM --chimJunctionOverhangMin 10 --chimScoreDropMax 30 --peOverlapNbasesMin 12,run STAR with chimeric output for Arriba fusion detection,rna-seq
arriba,arriba_05,-x sample/Aligned.sortedByCoord.out.bam -o fusions.tsv -O discarded_fusions.tsv -g genome.fa -a genes.gtf -b blacklist_hg38_GRCh38_v2.4.0.tsv.gz,detect gene fusions with Arriba with default parameters,rna-seq
arriba,arriba_06,draw_fusions.R --fusions=fusions.tsv --alignments=sample/Aligned.sortedByCoord.out.bam --genome=genome.fa --annotation=genes.gtf --output=fusion_plots.pdf,visualize detected fusions with Arriba draw_fusions,rna-seq
arriba,arriba_07,--runMode alignReads --genomeDir /star_index/ --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/ --chimSegmentMin 10 --chimOutType WithinBAM --chimJunctionOverhangMin 10 --chimScoreDropMax 30 --peOverlapNbasesMin 12,run STAR with chimeric output for Arriba fusion detection using multiple threads,rna-seq
arriba,arriba_08,-x sample/Aligned.sortedByCoord.out.bam -o fusions.tsv -O discarded_fusions.tsv -g genome.fa -a genes.gtf -b blacklist_hg38_GRCh38_v2.4.0.tsv.gz,detect gene fusions with Arriba and write output to a file,rna-seq
arriba,arriba_09,draw_fusions.R --fusions=fusions.tsv --alignments=sample/Aligned.sortedByCoord.out.bam --genome=genome.fa --annotation=genes.gtf --output=fusion_plots.pdf,visualize detected fusions with Arriba draw_fusions,rna-seq
arriba,arriba_10,--runMode alignReads --genomeDir /star_index/ --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/ --chimSegmentMin 10 --chimOutType WithinBAM --chimJunctionOverhangMin 10 --chimScoreDropMax 30 --peOverlapNbasesMin 12,run STAR with chimeric output for Arriba fusion detection with default parameters,rna-seq
augustus,augustus_01,--species=human genome.fasta --gff3=on > augustus_predictions.gff3,predict genes in a eukaryotic genome using human parameters,annotation
augustus,augustus_02,--species=arabidopsis --hintsfile=rnaseq_hints.gff --extrinsicCfgFile=extrinsic.cfg genome.fasta --gff3=on > improved_predictions.gff3,predict genes with RNA-seq hints for improved accuracy,annotation
augustus,augustus_03,--species=fly --gff3=on --protein=on --codingseq=on genome.fasta > fly_predictions.gff3,predict genes and output protein sequences,annotation
augustus,augustus_04,--species=zebrafish zebrafish_masked.fasta --gff3=on --softmasking=1 > zebrafish_genes.gff3,run Augustus on a repeat-masked genome,annotation
augustus,augustus_05,--species=human genome.fasta --gff3=on > augustus_predictions.gff3,predict genes in a eukaryotic genome using human parameters with default parameters,annotation
augustus,augustus_06,--species=arabidopsis --hintsfile=rnaseq_hints.gff --extrinsicCfgFile=extrinsic.cfg genome.fasta --gff3=on > improved_predictions.gff3,predict genes with RNA-seq hints for improved accuracy,annotation
augustus,augustus_07,--species=fly --gff3=on --protein=on --codingseq=on genome.fasta > fly_predictions.gff3,predict genes and output protein sequences,annotation
augustus,augustus_08,--species=zebrafish zebrafish_masked.fasta --gff3=on --softmasking=1 > zebrafish_genes.gff3,run Augustus on a repeat-masked genome,annotation
augustus,augustus_09,--species=human genome.fasta --gff3=on > augustus_predictions.gff3,predict genes in a eukaryotic genome using human parameters,annotation
augustus,augustus_10,--species=arabidopsis --hintsfile=rnaseq_hints.gff --extrinsicCfgFile=extrinsic.cfg genome.fasta --gff3=on > improved_predictions.gff3,predict genes with RNA-seq hints for improved accuracy with default parameters,annotation
awk,awk_01,"-F ',' '{print $1"",""$3}' file.csv",print specific columns from a CSV file,text-processing
awk,awk_02,"'{sum+=$2} END{print ""Total:"", sum}' data.txt",sum values in a column and print the total,text-processing
awk,awk_03,'$3 > 100 {print $0}' data.tsv,filter and print lines where a column exceeds a threshold,text-processing
awk,awk_04,"'{count[$1]++} END{for(k in count) print k, count[k]}' data.txt",count occurrences of each unique value in a column,text-processing
awk,awk_05,"'/START/,/END/{print}' file.txt",print lines between two patterns (inclusive),text-processing
awk,awk_06,'prev!=$0{print; prev=$0}' file.txt,remove duplicate consecutive lines,text-processing
awk,awk_07,"'{print NR, $0}' file.txt",add line numbers to output,text-processing
awk,awk_08,"-F '\t' 'BEGIN{OFS="",""} {$1=$1; print}' input.tsv",convert tab-separated to comma-separated,text-processing
awk,awk_09,"'{sum+=$1; n++} END{if(n>0) print ""Average:"", sum/n}' values.txt",calculate average of a column,text-processing
awk,awk_10,'{print $NF}' file.txt,print the last field of each line regardless of column count,text-processing
bakta,bakta_01,--db /path/to/bakta_db/ --threads 8 --output annotation/ --prefix genome_annotation genome.fasta,annotate a bacterial genome with Bakta,annotation
bakta,bakta_02,--db /path/to/bakta_db/ --compliant --locus-tag MYORG --genus Escherichia --species coli --threads 8 --output ncbi_annotation/ --prefix ecoli_K12 genome.fasta,annotate genome for NCBI submission,annotation
bakta,bakta_03,--db /path/to/bakta_db/ --plasmid --threads 4 --output plasmid_annotation/ --prefix plasmid plasmid.fasta,annotate plasmid sequence,annotation
bakta,bakta_04,--db /path/to/bakta_db/ --threads 8 --output annotation/ --prefix genome_annotation genome.fasta --quiet,annotate a bacterial genome with Bakta in quiet mode,annotation
bakta,bakta_05,--db /path/to/bakta_db/ --compliant --locus-tag MYORG --genus Escherichia --species coli --threads 8 --output ncbi_annotation/ --prefix ecoli_K12 genome.fasta,annotate genome for NCBI submission with default parameters,annotation
bakta,bakta_06,--db /path/to/bakta_db/ --plasmid --threads 4 --output plasmid_annotation/ --prefix plasmid plasmid.fasta --verbose,annotate plasmid sequence with verbose output,annotation
bakta,bakta_07,--db /path/to/bakta_db/ --threads 8 --output annotation/ --prefix genome_annotation genome.fasta,annotate a bacterial genome with Bakta using multiple threads,annotation
bakta,bakta_08,--db /path/to/bakta_db/ --compliant --locus-tag MYORG --genus Escherichia --species coli --threads 8 --output ncbi_annotation/ --prefix ecoli_K12 genome.fasta,annotate genome for NCBI submission and write output to a file,annotation
bakta,bakta_09,--db /path/to/bakta_db/ --plasmid --threads 4 --output plasmid_annotation/ --prefix plasmid plasmid.fasta --quiet,annotate plasmid sequence in quiet mode,annotation
bakta,bakta_10,--db /path/to/bakta_db/ --threads 8 --output annotation/ --prefix genome_annotation genome.fasta,annotate a bacterial genome with Bakta with default parameters,annotation
bamtools,bamtools_01,stats -in input.bam > alignment_stats.txt,get alignment statistics from a BAM file,utilities
bamtools,bamtools_02,count -in input.bam,count aligned reads in a BAM file,utilities
bamtools,bamtools_03,filter -in input.bam -out filtered.bam -isMapped true -isProperPair true,"filter BAM to keep only mapped, properly paired reads",utilities
bamtools,bamtools_04,merge -in sample1.bam -in sample2.bam -in sample3.bam -out merged.bam,merge multiple BAM files,utilities
bamtools,bamtools_05,convert -in input.bam -format fastq -out reads.fastq,convert BAM to FASTQ,utilities
bamtools,bamtools_06,stats -in input.bam > alignment_stats.txt,get alignment statistics from a BAM file,utilities
bamtools,bamtools_07,count -in input.bam,count aligned reads in a BAM file,utilities
bamtools,bamtools_08,filter -in input.bam -out filtered.bam -isMapped true -isProperPair true,"filter BAM to keep only mapped, properly paired reads and write output to a file",utilities
bamtools,bamtools_09,merge -in sample1.bam -in sample2.bam -in sample3.bam -out merged.bam,merge multiple BAM files,utilities
bamtools,bamtools_10,convert -in input.bam -format fastq -out reads.fastq,convert BAM to FASTQ with default parameters,utilities
bash,bash_01,script.sh arg1 arg2,run a bash script,programming
bash,bash_02,-euo pipefail -c 'command1 | command2',run a script with strict error handling,programming
bash,bash_03,-c 'source ~/.bashrc && printenv',source a configuration file in a new bash shell and print the resulting environment,programming
bash,bash_04,--version,check bash version,programming
bash,bash_05,-x script.sh,run a script and print each command as it executes (debugging),programming
bash,bash_06,-c 'declare -f; alias',list all loaded functions and aliases,programming
bash,bash_07,-c 'export MY_VAR=test && echo $MY_VAR',execute a command in a subshell without affecting the current environment,programming
bash,bash_08,-c 'diff <(sort file1.txt) <(sort file2.txt)',write a multi-command pipeline using process substitution,programming
bash,bash_09,"-c 'long_running_command & PID=$!; wait $PID; echo ""exit: $?""'",run a background pipeline job and capture its PID,programming
bash,bash_10,"-c 'for f in *.bam; do samtools flagstat ""$f"" > ""${f%.bam}.stats""; done'",loop over a list of files and process each,programming
bbtools,bbtools_01,bbduk.sh in=R1.fastq.gz in2=R2.fastq.gz out=R1_trimmed.fastq.gz out2=R2_trimmed.fastq.gz ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=20 minlen=50,trim adapters and quality-filter with BBDuk,sequence-utilities
bbtools,bbtools_02,bbduk.sh in=R1.fastq.gz in2=R2.fastq.gz out=clean_R1.fastq.gz out2=clean_R2.fastq.gz ref=phix174_ill.ref.fa.gz k=31 hdist=1,remove PhiX contamination from a FASTQ,sequence-utilities
bbtools,bbtools_03,bbmap.sh in=reads.fastq.gz ref=genome.fa out=aligned.sam,align reads to a reference genome,sequence-utilities
bbtools,bbtools_04,bbmerge.sh in=R1.fastq.gz in2=R2.fastq.gz out=merged.fastq.gz outu=unmerged_R1.fastq.gz outu2=unmerged_R2.fastq.gz,merge overlapping paired-end reads,sequence-utilities
bbtools,bbtools_05,reformat.sh in=large.fastq.gz out=subset.fastq.gz samplereadstarget=1000000,subsample a FASTQ file to a specific number of reads,sequence-utilities
bbtools,bbtools_06,reformat.sh in=reads.fastq.gz out=reads.fa,convert FASTQ to FASTA,sequence-utilities
bbtools,bbtools_07,bbmap.sh in=sample.fastq.gz ref=human_genome.fa outm=host_reads.fastq.gz outu=non_host_reads.fastq.gz nodisk=t,remove host reads before metagenomics analysis,sequence-utilities
bbtools,bbtools_08,reformat.sh in=reads.fastq.gz,get detailed statistics for a FASTQ file,sequence-utilities
bbtools,bbtools_09,dedupe.sh in=reads.fastq.gz out=deduped.fastq.gz,remove duplicate reads with dedupe.sh,sequence-utilities
bbtools,bbtools_10,"bbsplit.sh in=sample.fastq.gz ref=genome1.fa,genome2.fa out_genome1=reads_genome1.fastq.gz out_genome2=reads_genome2.fastq.gz",split reads by genome of origin for metagenomics,sequence-utilities
bcftools,bcftools_01,mpileup -f reference.fa -O u input.bam | bcftools call -m -v -O z -o variants.vcf.gz,call variants from a BAM file against a reference genome,variant-calling
bcftools,bcftools_02,"view -i 'QUAL>30 && INFO/DP>10 && TYPE=""snp""' -O z -o filtered.vcf.gz input.vcf.gz","filter VCF to keep only high-quality SNPs (QUAL > 30, depth > 10)",variant-calling
bcftools,bcftools_03,merge -O z -o merged.vcf.gz sample1.vcf.gz sample2.vcf.gz sample3.vcf.gz,merge multiple VCF files from different samples,variant-calling
bcftools,bcftools_04,view -s SAMPLE_NAME -O z -o sample.vcf.gz multisample.vcf.gz,extract a specific sample from a multi-sample VCF,variant-calling
bcftools,bcftools_05,norm -m -any -f reference.fa -O z -o normalized.vcf.gz input.vcf.gz,normalize indels and split multi-allelic variants,variant-calling
bcftools,bcftools_06,stats input.vcf.gz > stats.txt,compute variant statistics for a VCF file,variant-calling
bcftools,bcftools_07,view -v snps -O z -o snps.vcf.gz input.vcf.gz,select only SNPs from a VCF file,variant-calling
bcftools,bcftools_08,annotate -a dbsnp.vcf.gz -c ID -O z -o annotated.vcf.gz input.vcf.gz,annotate VCF with a reference VCF (add ID field from dbSNP),variant-calling
bcftools,bcftools_09,mpileup -f reference.fa -O u input.bam | bcftools call -m -v -O z -o variants.vcf.gz,call variants from a BAM file against a reference genome,variant-calling
bcftools,bcftools_10,"view -i 'QUAL>30 && INFO/DP>10 && TYPE=""snp""' -O z -o filtered.vcf.gz input.vcf.gz","filter VCF to keep only high-quality SNPs (QUAL > 30, depth > 10) with default parameters",variant-calling
bedops,bedops_01,sort-bed input.bed > input.sorted.bed,sort a BED file for use with BEDOPS tools,genomic-intervals
bedops,bedops_02,--intersect a.sorted.bed b.sorted.bed > intersection.bed,intersect two sorted BED files (intervals present in both),genomic-intervals
bedops,bedops_03,--difference a.sorted.bed b.sorted.bed > a_not_b.bed,find intervals in file A that do not overlap file B,genomic-intervals
bedops,bedops_04,bedmap --echo --sum --delim '\t' genes.sorted.bed signal.sorted.bedgraph > genes_with_coverage.bed,compute coverage (sum of signal) from bigwig/bedgraph mapped to gene windows,genomic-intervals
bedops,bedops_05,starch input.sorted.bed > input.starch,compress a sorted BED file to starch format,genomic-intervals
bedops,bedops_06,bedextract chr1:100000-200000 input.sorted.bed,extract all intervals overlapping a specific region,genomic-intervals
bedops,bedops_07,--merge a.sorted.bed b.sorted.bed c.sorted.bed > merged_union.bed,merge overlapping intervals and compute union across three BED files,genomic-intervals
bedops,bedops_08,sort-bed input.bed > input.sorted.bed,sort a BED file for use with BEDOPS tools,genomic-intervals
bedops,bedops_09,--intersect a.sorted.bed b.sorted.bed > intersection.bed,intersect two sorted BED files (intervals present in both),genomic-intervals
bedops,bedops_10,--difference a.sorted.bed b.sorted.bed > a_not_b.bed,find intervals in file A that do not overlap file B with default parameters,genomic-intervals
bedtools,bedtools_01,intersect -a query.bed -b features.bed -wa,find intervals in file A that overlap with file B,genomic-intervals
bedtools,bedtools_02,subtract -a regions.bed -b blacklist.bed,subtract regions in B from regions in A,genomic-intervals
bedtools,bedtools_03,merge -i input.bed,merge overlapping intervals in a BED file,genomic-intervals
bedtools,bedtools_04,genomecov -ibam sorted.bam -bg > coverage.bedgraph,compute genome-wide coverage from a BAM file in BedGraph format,genomic-intervals
bedtools,bedtools_05,closest -a query.bed -b annotations.bed -d -io,find closest non-overlapping feature in B for each interval in A,genomic-intervals
bedtools,bedtools_06,intersect -a genes.bed -b reads.bam -c,count overlaps between A intervals and B features,genomic-intervals
bedtools,bedtools_07,getfasta -fi reference.fa -bed intervals.bed -fo output.fa,get sequences for intervals in a BED file,genomic-intervals
bedtools,bedtools_08,genomecov -ibam sorted.bam -bga > coverage_all.bedgraph,compute coverage including zero-coverage positions,genomic-intervals
bedtools,bedtools_09,intersect -a query.bed -b features.bed -wb,intersect two BED files and report original B intervals that overlap A,genomic-intervals
bedtools,bedtools_10,makewindows -g genome.txt -w 1000 > windows.bed,make windows of fixed size across a genome,genomic-intervals
bismark,bismark_01,bismark_genome_preparation /path/to/genome_directory/,prepare bisulfite genome index for alignment,epigenomics
bismark,bismark_02,--genome /path/to/genome_dir/ -1 R1.fastq.gz -2 R2.fastq.gz --output_dir bismark_output/ -p 4,align paired-end WGBS reads to bisulfite genome,epigenomics
bismark,bismark_03,deduplicate_bismark --paired --bam sample_bismark_bt2_pe.bam,deduplicate bismark-aligned paired-end BAM file,epigenomics
bismark,bismark_04,bismark_methylation_extractor --paired-end --comprehensive --CX_context --genome_folder /path/to/genome_dir/ --output_dir methylation/ sample_deduplicated.bam,extract methylation information from deduplicated BAM,epigenomics
bismark,bismark_05,--genome /path/to/genome_dir/ --rrbs -1 R1.fastq.gz -2 R2.fastq.gz --output_dir rrbs_output/ -p 4,align RRBS data with MspI site handling,epigenomics
bismark,bismark_06,--genome /path/to/genome_dir/ --hisat2 reads.fastq.gz --output_dir bismark_output/ -p 4,align single-end WGBS reads with HISAT2 aligner,epigenomics
bismark,bismark_07,--genome /path/to/genome_dir/ --non_directional -1 R1.fastq.gz -2 R2.fastq.gz --output_dir pbat_output/ -p 4,align PBAT or scBS-seq (non-directional) library,epigenomics
bismark,bismark_08,bismark_methylation_extractor --paired-end --comprehensive --bedGraph --CX_context --genome_folder /path/to/genome_dir/ --output_dir methylation/ sample_deduplicated.bam,extract CpG methylation and generate bedGraph coverage file,epigenomics
bismark,bismark_09,bismark_genome_preparation --hisat2 /path/to/genome_directory/,prepare bisulfite index for HISAT2-based alignment,epigenomics
bismark,bismark_10,bismark_methylation_extractor --paired-end --mbias_only --genome_folder /path/to/genome_dir/ --output_dir mbias/ sample.bam,generate M-bias plot to identify read-end bias in methylation calls,epigenomics
blast,blast_01,-in genome.fasta -dbtype nucl -out genome_db -title 'Genome Database' -parse_seqids,build a nucleotide BLAST database from a FASTA file,sequence-utilities
blast,blast_02,-query query.fasta -db genome_db -out blast_results.txt -outfmt 6 -evalue 1e-5 -num_threads 8,run blastn to find similar nucleotide sequences,sequence-utilities
blast,blast_03,-query proteins.faa -db /path/to/nr -out blastp_results.txt -outfmt '6 std stitle staxids' -evalue 1e-5 -num_threads 16 -max_target_seqs 5,search protein sequences against NR database,sequence-utilities
blast,blast_04,-query contigs.fasta -db /path/to/swissprot -out blastx_results.txt -outfmt 6 -evalue 1e-5 -num_threads 8 -max_target_seqs 1,run blastx to annotate nucleotide sequences against protein database,sequence-utilities
blast,blast_05,-query query.fasta -db nr -out remote_blast.txt -outfmt 6 -remote -max_target_seqs 10,perform remote BLAST search against NCBI nr database,sequence-utilities
blast,blast_06,-in genome.fasta -dbtype nucl -out genome_db -title 'Genome Database' -parse_seqids,build a nucleotide BLAST database from a FASTA file,sequence-utilities
blast,blast_07,-query query.fasta -db genome_db -out blast_results.txt -outfmt 6 -evalue 1e-5 -num_threads 8,run blastn to find similar nucleotide sequences using multiple threads,sequence-utilities
blast,blast_08,-query proteins.faa -db /path/to/nr -out blastp_results.txt -outfmt '6 std stitle staxids' -evalue 1e-5 -num_threads 16 -max_target_seqs 5,search protein sequences against NR database and write output to a file,sequence-utilities
blast,blast_09,-query contigs.fasta -db /path/to/swissprot -out blastx_results.txt -outfmt 6 -evalue 1e-5 -num_threads 8 -max_target_seqs 1,run blastx to annotate nucleotide sequences against protein database,sequence-utilities
blast,blast_10,-query query.fasta -db nr -out remote_blast.txt -outfmt 6 -remote -max_target_seqs 10,perform remote BLAST search against NCBI nr database with default parameters,sequence-utilities
bowtie2,bowtie2_01,bowtie2-build reference.fa reference_index,build a bowtie2 index from a reference FASTA file,alignment
bowtie2,bowtie2_02,bowtie2-build --threads 8 reference.fa reference_index,build a bowtie2 index using multiple threads for a large genome,alignment
bowtie2,bowtie2_03,-x reference_index -1 R1.fastq.gz -2 R2.fastq.gz -p 8 | samtools view -b -o aligned.bam,align paired-end reads to a reference genome using 8 threads,alignment
bowtie2,bowtie2_04,-x reference_index -U reads.fastq.gz --very-sensitive -p 8 | samtools sort -o sorted.bam,align single-end reads with sensitive settings,alignment
bowtie2,bowtie2_05,-x reference_index -1 R1.fq.gz -2 R2.fq.gz -p 8 --no-unal -S aligned.sam 2> align_stats.txt,align paired-end reads and save the alignment statistics,alignment
bowtie2,bowtie2_06,-x reference_index -1 R1.fastq.gz -2 R2.fastq.gz -p 8 --rg-id sample1 --rg SM:sample1 --rg LB:lib1 --rg PL:ILLUMINA | samtools view -b -o sample1.bam,align paired-end reads with read group tags for GATK downstream analysis,alignment
bowtie2,bowtie2_07,-x reference_index -1 R1.fastq.gz -2 R2.fastq.gz --local --very-sensitive-local -p 8 | samtools view -b -o local_aligned.bam,align in local mode to allow soft-clipping of read ends,alignment
bowtie2,bowtie2_08,-x reference_index -1 R1.fastq.gz -2 R2.fastq.gz -p 16 --no-unal | samtools sort -@ 4 -o sorted.bam,align paired-end RNA-seq reads discarding unaligned reads,alignment
bowtie2,bowtie2_09,-x reference_index -U reads.fastq.gz --fast -p 4 -S quick_check.sam,align single-end reads in fast mode for a quick quality check,alignment
bowtie2,bowtie2_10,-x reference_index -1 R1.fastq.gz -2 R2.fastq.gz -p 8 --un-conc unmapped_%.fq | samtools view -b -o aligned.bam,align paired-end reads writing unmapped reads to separate files,alignment
bracken,bracken_01,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_output.bracken -l S -r 150 -t 10,run Bracken on a Kraken2 report for species-level abundance estimation,metagenomics
bracken,bracken_02,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_genus.bracken -l G -r 150 -t 5,run Bracken for genus-level abundance estimation,metagenomics
bracken,bracken_03,--files sample1.bracken sample2.bracken sample3.bracken --output combined_abundance.txt,combine Bracken results from multiple samples into one table,metagenomics
bracken,bracken_04,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_75bp.bracken -l S -r 75 -t 10,run Bracken on short reads (75 bp),metagenomics
bracken,bracken_05,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_output.bracken -l S -r 150 -t 10,run Bracken on a Kraken2 report for species-level abundance estimation with default parameters,metagenomics
bracken,bracken_06,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_genus.bracken -l G -r 150 -t 5,run Bracken for genus-level abundance estimation with a 5-read threshold,metagenomics
bracken,bracken_07,--files sample1.bracken sample2.bracken sample3.bracken --output combined_abundance.txt,combine Bracken results from multiple samples into one table,metagenomics
bracken,bracken_08,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_75bp.bracken -l S -r 75 -t 10,run Bracken on short reads (75 bp) and write output to a file,metagenomics
bracken,bracken_09,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_output.bracken -l S -r 150 -t 10,run Bracken on a Kraken2 report for species-level abundance estimation with 150 bp reads,metagenomics
bracken,bracken_10,-d /path/to/kraken2_db -i kraken_report.txt -o bracken_genus.bracken -l G -r 150 -t 5,run Bracken for genus-level abundance estimation with default parameters,metagenomics
busco,busco_01,-i genome_assembly.fasta -o busco_bacteria -l bacteria_odb10 -m genome -c 8,assess completeness of a bacterial genome assembly,assembly
busco,busco_02,-i eukaryote_assembly.fasta -o busco_euk -l eukaryota_odb10 -m genome -c 16,assess completeness of a eukaryotic genome assembly,assembly
busco,busco_03,-i proteins.faa -o busco_proteome -l fungi_odb10 -m proteins -c 8,assess proteome completeness from predicted proteins,assembly
busco,busco_04,-i transcriptome.fasta -o busco_transcriptome -l vertebrata_odb10 -m transcriptome -c 8,assess transcriptome completeness,assembly
busco,busco_05,-i genome.fasta -o busco_autolineage -m genome --auto-lineage -c 16,run BUSCO with automatic lineage detection,assembly
busco,busco_06,-i genome_assembly.fasta -o busco_bacteria -l bacteria_odb10 -m genome -c 8,assess completeness of a bacterial genome assembly with the bacteria_odb10 lineage,assembly
busco,busco_07,-i eukaryote_assembly.fasta -o busco_euk -l eukaryota_odb10 -m genome -c 16,assess completeness of a eukaryotic genome assembly using 16 CPU threads,assembly
busco,busco_08,-i proteins.faa -o busco_proteome -l fungi_odb10 -m proteins -c 8,assess proteome completeness from predicted proteins and write output to a file,assembly
busco,busco_09,-i transcriptome.fasta -o busco_transcriptome -l vertebrata_odb10 -m transcriptome -c 8 --quiet,assess transcriptome completeness in quiet mode,assembly
busco,busco_10,-i genome.fasta -o busco_autolineage -m genome --auto-lineage -c 16,run BUSCO with automatic lineage detection with default parameters,assembly
bwa-mem2,bwa-mem2_01,index reference.fa,build BWA-MEM2 index from reference genome,alignment
bwa-mem2,bwa-mem2_02,mem -t 16 reference.fa R1.fastq.gz R2.fastq.gz | samtools sort -@ 4 -o sorted.bam,align paired-end reads to reference using 16 threads,alignment
bwa-mem2,bwa-mem2_03,mem -t 16 -R '@RG\tID:sample1\tSM:sample1\tLB:lib1\tPL:ILLUMINA' reference.fa R1.fastq.gz R2.fastq.gz | samtools view -b -o aligned.bam,align paired-end reads with GATK read group,alignment
bwa-mem2,bwa-mem2_04,index -p reference reference.fa,build BWA-MEM2 index from reference genome with a custom index prefix,alignment
bwa-mem2,bwa-mem2_05,mem -t 16 reference.fa R1.fastq.gz R2.fastq.gz | samtools sort -@ 4 -o sorted.bam,align paired-end reads to reference using 16 threads with default parameters,alignment
bwa-mem2,bwa-mem2_06,mem -t 16 -R '@RG\tID:sample1\tSM:sample1\tLB:lib1\tPL:ILLUMINA' reference.fa R1.fastq.gz R2.fastq.gz | samtools view -b -o aligned.bam,align paired-end reads with GATK read group,alignment
bwa-mem2,bwa-mem2_07,index reference.fa,build BWA-MEM2 index from reference genome,alignment
bwa-mem2,bwa-mem2_08,mem -t 16 reference.fa R1.fastq.gz R2.fastq.gz | samtools sort -@ 4 -o sorted.bam,align paired-end reads to reference using 16 threads,alignment
bwa-mem2,bwa-mem2_09,mem -t 16 -R '@RG\tID:sample1\tSM:sample1\tLB:lib1\tPL:ILLUMINA' reference.fa R1.fastq.gz R2.fastq.gz | samtools view -b -o aligned.bam,align paired-end reads with GATK read group,alignment
bwa-mem2,bwa-mem2_10,index reference.fa,build BWA-MEM2 index from reference genome with default parameters,alignment
bwa,bwa_01,index reference.fa,index a reference genome FASTA file,alignment
bwa,bwa_02,mem -t 8 reference.fa R1.fastq.gz R2.fastq.gz,align paired-end reads to a reference genome using 8 threads,alignment
bwa,bwa_03,mem -t 4 -R '@RG\tID:sample1\tSM:sample1\tLB:lib1\tPL:ILLUMINA' reference.fa reads.fastq.gz | samtools view -b -o aligned.bam,align single-end reads and save as BAM with read group for GATK,alignment
bwa,bwa_04,mem -x ont2d reference.fa reads.fastq,align Oxford Nanopore long reads to reference,alignment
bwa,bwa_05,mem -t 8 reference.fa R1.fastq.gz R2.fastq.gz | samtools sort -@ 4 -o sorted.bam,align paired-end reads and sort the output directly to a BAM file,alignment
bwa,bwa_06,mem -t 8 -R '@RG\tID:run1\tSM:patient1\tLB:lib1\tPL:ILLUMINA\tPU:unit1' reference.fa R1.fastq.gz R2.fastq.gz | samtools view -b -o sample1.bam,align paired-end reads with complete read group for GATK HaplotypeCaller,alignment
bwa,bwa_07,mem -t 8 reference.fa R1.fastq.gz R2.fastq.gz | samtools view -b -F 4 -o mapped.bam,align paired-end reads and report only mapped reads,alignment
bwa,bwa_08,mem -t 4 -B 4 -O 6 -E 1 reference.fa reads.fastq.gz > aligned.sam,align with custom mismatch and gap open/extension penalties,alignment
bwa,bwa_09,mem -t 8 -R '@RG\tID:sample2\tSM:sample2\tLB:lib2\tPL:ILLUMINA' reference.fa R1.fastq.gz R2.fastq.gz | samtools sort -@ 4 -o sample2_sorted.bam && samtools index sample2_sorted.bam,align paired-end reads then sort and index the resulting BAM,alignment
bwa,bwa_10,mem -t 8 -Y reference.fa R1.fastq.gz R2.fastq.gz | samtools view -b -o sv_aligned.bam,align using soft-clipping for supplementary alignments for structural variant discovery,alignment
canu,canu_01,-p ecoli_assembly -d canu_ecoli/ genomeSize=5m -nanopore reads.fastq.gz maxMemory=16g maxThreads=8,assemble bacterial genome from ONT reads,assembly
canu,canu_02,-p hifi_assembly -d canu_hifi/ genomeSize=3g -pacbio-hifi hifi_reads.fastq.gz maxMemory=64g maxThreads=32,assemble genome from PacBio HiFi reads,assembly
canu,canu_03,-p metagenome -d canu_meta/ genomeSize=100m -nanopore meta_reads.fastq.gz maxMemory=128g maxThreads=32 useGrid=false,assemble metagenome from ONT reads,assembly
canu,canu_04,-p assembly_only -d canu_assembly_only/ -assemble genomeSize=5m -nanopore-corrected corrected_reads.fasta maxMemory=16g maxThreads=8,run only the assembly stage (skip correction and trimming),assembly
canu,canu_05,-p ecoli_assembly -d canu_ecoli/ genomeSize=5m -nanopore reads.fastq.gz maxMemory=16g maxThreads=8,assemble bacterial genome from ONT reads with default parameters,assembly
canu,canu_06,-p hifi_assembly -d canu_hifi/ genomeSize=3g -pacbio-hifi hifi_reads.fastq.gz maxMemory=64g maxThreads=32,assemble genome from PacBio HiFi reads with 64 GB memory and 32 threads,assembly
canu,canu_07,-p metagenome -d canu_meta/ genomeSize=100m -nanopore meta_reads.fastq.gz maxMemory=128g maxThreads=32 useGrid=false,assemble metagenome from ONT reads using multiple threads,assembly
canu,canu_08,-p assembly_only -d canu_assembly_only/ -assemble genomeSize=5m -nanopore-corrected corrected_reads.fasta maxMemory=16g maxThreads=8,run only the assembly stage (skip correction and trimming) writing results to the output directory,assembly
canu,canu_09,-p ecoli_assembly -d canu_ecoli/ genomeSize=5m -nanopore reads.fastq.gz maxMemory=16g maxThreads=8,assemble bacterial genome from ONT reads with 16 GB memory and 8 threads,assembly
canu,canu_10,-p hifi_assembly -d canu_hifi/ genomeSize=3g -pacbio-hifi hifi_reads.fastq.gz maxMemory=64g maxThreads=32,assemble genome from PacBio HiFi reads with default parameters,assembly
cellsnp-lite,cellsnp-lite_01,-s possorted_genome_bam.bam -b barcodes.tsv -O cellsnp_out -R common_snps.vcf.gz -p 16 --minMAF 0.1 --minCOUNT 20,pileup known SNPs in a 10x Chromium scRNA-seq BAM with cell barcodes,single-cell
cellsnp-lite,cellsnp-lite_02,-s bulk.bam -O bulk_snp_out -R common_snps.vcf.gz -p 16 --minMAF 0.05 --minCOUNT 10,pileup SNPs in a bulk BAM without cell barcodes,single-cell
cellsnp-lite,cellsnp-lite_03,-s possorted_genome_bam.bam -b barcodes.tsv -O denovo_snp_out -p 16 --minMAF 0.1 --minCOUNT 100 --gzip,de novo SNP discovery in single-cell BAM (Mode 2),single-cell
cellsnp-lite,cellsnp-lite_04,"-s sample1.bam,sample2.bam,sample3.bam -O multi_sample_out -R common_snps.vcf.gz -p 16 --minMAF 0.1 --minCOUNT 20",pileup multiple BAMs from different samples at shared SNP positions,single-cell
cellsnp-lite,cellsnp-lite_05,-s possorted_genome_bam.bam -b barcodes.tsv -O chr1_out -R chr1_snps.vcf.gz --chrom 1 -p 8 --minMAF 0.1 --minCOUNT 20,restrict pileup to specific chromosomes to reduce runtime,single-cell
cellsnp-lite,cellsnp-lite_06,-s possorted_genome_bam.bam -b barcodes.tsv -O hq_out -R snps.vcf.gz -p 16 --minMAF 0.1 --minCOUNT 20 --minBQ 30 --minMAPQ 30,pileup with strict base quality filter for high-confidence allele counts,single-cell
cellsnp-lite,cellsnp-lite_07,-s possorted_genome_bam.bam -b barcodes.tsv -O cellsnp_out -R common_snps.vcf.gz -p 16 --minMAF 0.1 --minCOUNT 20,pileup known SNPs in a 10x Chromium scRNA-seq BAM with cell barcodes using multiple threads,single-cell
cellsnp-lite,cellsnp-lite_08,-s bulk.bam -O bulk_snp_out -R common_snps.vcf.gz -p 16 --minMAF 0.05 --minCOUNT 10,pileup SNPs in a bulk BAM without cell barcodes writing results to an output directory,single-cell
cellsnp-lite,cellsnp-lite_09,-s possorted_genome_bam.bam -b barcodes.tsv -O denovo_snp_out -p 16 --minMAF 0.1 --minCOUNT 100 --gzip,de novo SNP discovery in single-cell BAM (Mode 2) with gzip-compressed output,single-cell
cellsnp-lite,cellsnp-lite_10,"-s sample1.bam,sample2.bam,sample3.bam -O multi_sample_out -R common_snps.vcf.gz -p 16 --minMAF 0.1 --minCOUNT 20",pileup multiple BAMs from different samples at shared SNP positions with default parameters,single-cell
centrifuge,centrifuge_01,-x /databases/bv_bacteria -1 R1.fastq.gz -2 R2.fastq.gz -S classifications.tsv --report-file report.tsv -p 16,classify paired-end reads against a pre-built bacterial/viral database,metagenomics
centrifuge,centrifuge_02,-x /databases/nt -U reads.fastq.gz -S classifications.tsv --report-file report.tsv -p 16,classify single-end reads against the NT database,metagenomics
centrifuge,centrifuge_03,centrifuge-build -p 16 --taxonomy-tree nodes.dmp --name-table names.dmp --conversion-table seqid2taxid.map genomes.fasta custom_db,build a custom centrifuge index from bacterial reference genomes,metagenomics
centrifuge,centrifuge_04,-x /databases/viral -U reads.fastq.gz -S viral_hits.tsv --report-file viral_report.tsv -p 8 --min-hitlen 16,classify reads with increased sensitivity for viral detection,metagenomics
centrifuge,centrifuge_05,centrifuge-kreport -x /databases/bv_bacteria classifications.tsv > kraken_report.txt,convert centrifuge output to Kraken-style report for Pavian/Krona,metagenomics
centrifuge,centrifuge_06,-x /databases/hg38 -1 R1.fastq.gz -2 R2.fastq.gz -S human_classifications.tsv -p 16 --un-conc non_human_%.fastq.gz,remove human reads by classifying against human genome and excluding matches,metagenomics
centrifuge,centrifuge_07,centrifuge-build -p 8 --taxonomy-tree nodes.dmp --name-table names.dmp --conversion-table seqid2taxid.map viral_sequences.fasta viral_db,build centrifuge index from viral reference sequences,metagenomics
centrifuge,centrifuge_08,-x /databases/bv_bacteria -1 R1.fastq.gz -2 R2.fastq.gz -S classifications.tsv --report-file report.tsv -p 16 --un-conc unclassified_%.fastq.gz,classify reads and save unclassified reads for downstream assembly,metagenomics
centrifuge,centrifuge_09,-x /databases/nt -U reads.fastq.gz -S classifications.tsv --report-file report.tsv -p 16 --min-hitlen 30,use high minimum hit length for precision metagenomic classification,metagenomics
centrifuge,centrifuge_10,-x /databases/custom_microbiome -1 R1.fastq.gz -2 R2.fastq.gz -S classifications.tsv --report-file report.tsv -p 16 -k 5,classify paired-end metagenome against custom host-depleted database,metagenomics
checkm2,checkm2_01,predict --input bins_directory/ --output-directory checkm2_results/ --threads 16,assess quality of all MAG bins in a directory,metagenomics
checkm2,checkm2_02,predict --input bins_directory/ --output-directory checkm2_output/ --threads 16 --database_path /path/to/checkm2_database/,assess genome quality with custom database path,metagenomics
checkm2,checkm2_03,predict --input bins_directory/ --output-directory checkm2_results/ --threads 16 --allmodels,assess quality and report predictions from both the general and specific models,metagenomics
checkm2,checkm2_04,database --download --path /path/to/databases/,download the CheckM2 database,metagenomics
checkm2,checkm2_05,predict --input bins_directory/ --output-directory checkm2_results/ --threads 16,assess quality of all MAG bins in a directory with default parameters,metagenomics
checkm2,checkm2_06,predict --input bins_directory/ --output-directory checkm2_output/ --threads 16 --database_path /path/to/checkm2_database/,assess genome quality with custom database path,metagenomics
checkm2,checkm2_07,predict --input bins_directory/ --output-directory checkm2_results/ --threads 16 --allmodels,assess quality with both the general and specific models using multiple threads,metagenomics
checkm2,checkm2_08,database --download --path /path/to/databases/,download the CheckM2 database to a specified directory,metagenomics
checkm2,checkm2_09,predict --input bins_directory/ --output-directory checkm2_results/ --threads 16 --quiet,assess quality of all MAG bins in a directory in quiet mode,metagenomics
checkm2,checkm2_10,predict --input bins_directory/ --output-directory checkm2_output/ --threads 16 --database_path /path/to/checkm2_database/,assess genome quality with custom database path with default parameters,metagenomics
chopper,chopper_01,-q 10 -l 1000 --threads 8,filter ONT reads by minimum quality Q10 and minimum length 1000 bp,qc
chopper,chopper_02,-q 15 -l 500 --threads 8,"filter high-quality ONT reads for variant calling (Q15, min 500 bp)",qc
chopper,chopper_03,-q 10 -l 1000 --headcrop 30 --tailcrop 30 --threads 8,filter reads and trim 30 bases from each end,qc
chopper,chopper_04,-q 8 -l 200 --maxlength 50000 --threads 4,filter reads with maximum length cutoff for specific applications,qc
chopper,chopper_05,-q 10 -l 1000 --threads 8,filter ONT reads by minimum quality Q10 and minimum length 1000 bp with default parameters,qc
chopper,chopper_06,-q 15 -l 500 --threads 8,"filter high-quality ONT reads for variant calling (Q15, min 500 bp)",qc
chopper,chopper_07,-q 10 -l 1000 --headcrop 30 --tailcrop 30 --threads 8,filter reads and trim 30 bases from each end using multiple threads,qc
chopper,chopper_08,-q 8 -l 200 --maxlength 50000 --threads 4 > filtered.fastq,filter reads with a maximum length cutoff and write output to a file,qc
chopper,chopper_09,-q 10 -l 1000 --threads 8,filter ONT reads by minimum quality Q10 and minimum length 1000 bp,qc
chopper,chopper_10,-q 15 -l 500 --threads 8,"filter high-quality ONT reads for variant calling (Q15, min 500 bp) with default parameters",qc
chromap,chromap_01,-i -r genome.fa -o genome.index,build Chromap genome index,epigenomics
chromap,chromap_02,--preset atac -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -o fragments.bed -t 16,align paired-end ATAC-seq reads with Chromap,epigenomics
chromap,chromap_03,--preset atac -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -b barcode.fastq.gz --barcode-whitelist whitelist.txt -o scatac_fragments.bed -t 16,process single-cell ATAC-seq with barcodes,epigenomics
chromap,chromap_04,--preset chip -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -o chip_aligned.bed -t 16,align ChIP-seq reads with Chromap,epigenomics
chromap,chromap_05,-i -r genome.fa -o genome.index,build Chromap genome index with default parameters,epigenomics
chromap,chromap_06,--preset atac -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -o fragments.bed -t 16,align paired-end ATAC-seq reads with Chromap using 16 threads,epigenomics
chromap,chromap_07,--preset atac -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -b barcode.fastq.gz --barcode-whitelist whitelist.txt -o scatac_fragments.bed -t 16,process single-cell ATAC-seq with barcodes using multiple threads,epigenomics
chromap,chromap_08,--preset chip -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -o chip_aligned.bed -t 16,align ChIP-seq reads with Chromap and write output to a file,epigenomics
chromap,chromap_09,-i -r genome.fa -o genome.index,build Chromap genome index,epigenomics
chromap,chromap_10,--preset atac -x genome.index -r genome.fa -1 R1.fastq.gz -2 R2.fastq.gz -o fragments.bed -t 16,align paired-end ATAC-seq reads with Chromap with default parameters,epigenomics
cnvkit,cnvkit_01,batch tumor.bam --normal normal.bam --targets targets.bed --annotate refFlat.txt --fasta reference.fa --access access.hg38.bed --output-reference normal_reference.cnn --output-dir cnvkit_output/ -p 8,run CNVkit batch workflow for tumor-normal WES,variant-calling
cnvkit,cnvkit_02,batch tumor.bam --reference normal_reference.cnn --targets targets.bed --output-dir cnvkit_tumor_only/ -p 4,run CNVkit on tumor-only WES with pre-built reference,variant-calling
cnvkit,cnvkit_03,scatter tumor.cnr -s tumor.cns -o cnv_scatter.pdf,visualize CNV scatter plot,variant-calling
cnvkit,cnvkit_04,call tumor.cns -o tumor.call.cns --center median --purity 0.8,call integer copy numbers from segments,variant-calling
cnvkit,cnvkit_05,batch tumor.bam --normal normal.bam --targets targets.bed --annotate refFlat.txt --fasta reference.fa --access access.hg38.bed --output-reference normal_reference.cnn --output-dir cnvkit_output/ -p 8,run CNVkit batch workflow for tumor-normal WES with default parameters,variant-calling
cnvkit,cnvkit_06,batch tumor.bam --reference normal_reference.cnn --targets targets.bed --output-dir cnvkit_tumor_only/ -p 4,run CNVkit on tumor-only WES with pre-built reference using 4 processes,variant-calling
cnvkit,cnvkit_07,scatter tumor.cnr -s tumor.cns -o cnv_scatter.pdf,visualize CNV scatter plot of bin ratios and segments,variant-calling
cnvkit,cnvkit_08,call tumor.cns -o tumor.call.cns --center median --purity 0.8,call integer copy numbers from segments and write output to a file,variant-calling
cnvkit,cnvkit_09,batch tumor.bam --normal normal.bam --targets targets.bed --annotate refFlat.txt --fasta reference.fa --access access.hg38.bed --output-reference normal_reference.cnn --output-dir cnvkit_output/ -p 8,run CNVkit batch workflow for tumor-normal WES and save the normal reference,variant-calling
cnvkit,cnvkit_10,batch tumor.bam --reference normal_reference.cnn --targets targets.bed --output-dir cnvkit_tumor_only/ -p 4,run CNVkit on tumor-only WES with pre-built reference with default parameters,variant-calling
crossmap,crossmap_01,bed hg19ToHg38.over.chain.gz input_hg19.bed output_hg38.bed,convert BED file from hg19 to hg38 coordinates,utilities
crossmap,crossmap_02,vcf hg19ToHg38.over.chain.gz input_hg19.vcf hg38_reference.fa output_hg38.vcf,convert VCF file from hg19 to hg38 with target reference,utilities
crossmap,crossmap_03,gff hg19ToHg38.over.chain.gz annotation_hg19.gtf output_hg38.gtf,convert GFF/GTF annotation from one assembly to another,utilities
crossmap,crossmap_04,bam hg19ToHg38.over.chain.gz input_hg19.bam output_hg38.bam,convert BAM file from one genome build to another,utilities
crossmap,crossmap_05,bed hg19ToHg38.over.chain.gz input_hg19.bed output_hg38.bed,convert BED file from hg19 to hg38 coordinates with default parameters,utilities
crossmap,crossmap_06,vcf hg19ToHg38.over.chain.gz input_hg19.vcf hg38_reference.fa output_hg38.vcf,convert VCF file from hg19 to hg38 with target reference,utilities
crossmap,crossmap_07,gff hg19ToHg38.over.chain.gz annotation_hg19.gtf output_hg38.gtf,convert GFF/GTF annotation from hg19 to hg38,utilities
crossmap,crossmap_08,bam hg19ToHg38.over.chain.gz input_hg19.bam output_hg38.bam,convert BAM file from one genome build to another and write output to a BAM file,utilities
crossmap,crossmap_09,bed hg19ToHg38.over.chain.gz input_hg19.bed output_hg38.bed,convert BED file from hg19 to hg38 coordinates,utilities
crossmap,crossmap_10,vcf hg19ToHg38.over.chain.gz input_hg19.vcf hg38_reference.fa output_hg38.vcf,convert VCF file from hg19 to hg38 with target reference with default parameters,utilities
curl,curl_01,-L -O https://example.com/files/archive.tar.gz,download a file and save with its original filename,networking
curl,curl_02,-L -o /data/dataset.csv https://example.com/dataset.csv,download a file and save to a specific local filename,networking
curl,curl_03,"-X POST -H 'Content-Type: application/json' -d '{""name"":""test"",""value"":42}' https://api.example.com/endpoint",send a JSON POST request to an API,networking
curl,curl_04,-H 'Authorization: Bearer TOKEN' https://api.example.com/data,authenticate with a Bearer token and call an API,networking
curl,curl_05,-L -C - -O https://example.com/large-file.iso,resume an interrupted download,networking
curl,curl_06,-I https://example.com,fetch only HTTP response headers,networking
curl,curl_07,-X POST -F 'file=@/local/path/data.txt' -F 'name=upload' https://api.example.com/upload,send a multipart form upload,networking
curl,curl_08,-L --progress-bar -o output.zip https://example.com/file.zip,download with a progress bar while following redirects,networking
curl,curl_09,--connect-timeout 10 --retry 3 --retry-delay 5 -L -O https://example.com/file.tar.gz,set a connection timeout and retry on failure,networking
curl,curl_10,-u alice:password123 https://protected.example.com/api,pass basic authentication credentials,networking
cutadapt,cutadapt_01,-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz,remove Illumina TruSeq adapters from paired-end reads,qc
cutadapt,cutadapt_02,-a AGATCGGAAGAGC -A AGATCGGAAGAGC -q 20 --minimum-length 36 -j 8 -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz,"trim adapters and quality-filter, discarding short reads",qc
cutadapt,cutadapt_03,-a A{20} -q 20 --minimum-length 30 -j 4 -o trimmed.fastq.gz reads.fastq.gz,remove polyA tail from single-end RNA-seq reads,qc
cutadapt,cutadapt_04,-a CTGTCTCTTATA -A CTGTCTCTTATA -q 20 --minimum-length 20 -j 8 -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz,trim Nextera transposase adapters from paired-end ATAC-seq data,qc
cutadapt,cutadapt_05,-g ACACTGACGACATGGTTCTACA --discard-untrimmed -o trimmed.fastq.gz reads.fastq.gz,remove 5' primer from single-end amplicon reads,qc
cutadapt,cutadapt_06,-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz,remove full-length Illumina TruSeq adapters from paired-end reads,qc
cutadapt,cutadapt_07,-a AGATCGGAAGAGC -A AGATCGGAAGAGC -q 20 --minimum-length 36 -j 8 -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz,"trim adapters and quality-filter, discarding short reads using multiple threads",qc
cutadapt,cutadapt_08,-a A{20} -q 20 --minimum-length 30 -j 4 -o trimmed.fastq.gz reads.fastq.gz,remove polyA tail from single-end RNA-seq reads and write output to a file,qc
cutadapt,cutadapt_09,-a CTGTCTCTTATA -A CTGTCTCTTATA -q 20 --minimum-length 20 -j 8 -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz --quiet,trim Nextera transposase adapters from paired-end ATAC-seq data in quiet mode,qc
cutadapt,cutadapt_10,-g ACACTGACGACATGGTTCTACA --discard-untrimmed -o trimmed.fastq.gz reads.fastq.gz,remove 5' primer from single-end amplicon reads with default parameters,qc
deeptools,deeptools_01,bamCoverage -b sorted.bam -o output.bw --normalizeUsing RPKM --binSize 10 -p 8,generate normalized bigWig coverage track from a BAM file,epigenomics
deeptools,deeptools_02,bamCompare -b1 chip.bam -b2 input.bam -o chip_vs_input_log2.bw --normalizeUsing RPKM --binSize 10 -p 8,create log2 ratio (ChIP/Input) bigWig track,epigenomics
deeptools,deeptools_03,computeMatrix reference-point -S chip.bw -R genes.bed --referencePoint TSS -b 3000 -a 3000 -o matrix.gz -p 8,compute signal matrix around TSS for heatmap visualization,epigenomics
deeptools,deeptools_04,plotHeatmap -m matrix.gz -out heatmap.png --colorMap RdBu_r --whatToShow 'heatmap and colorbar',plot heatmap of signal around genomic regions,epigenomics
deeptools,deeptools_05,multiBamSummary bins -b sample1.bam sample2.bam sample3.bam -o readCounts.npz -p 8,compute read count correlation between multiple BAM files,epigenomics
deeptools,deeptools_06,bamCoverage -b atac_sorted.bam -o atac_signal.bw --normalizeUsing RPGC --effectiveGenomeSize 2913022398 --binSize 10 -p 8,generate RPGC-normalized ATAC-seq bigWig coverage track using the hg38 effective genome size,epigenomics
deeptools,deeptools_07,bamCoverage -b sorted.bam -o output.bw --normalizeUsing RPKM --binSize 10 -p 8,generate normalized bigWig coverage track from a BAM file using multiple threads,epigenomics
deeptools,deeptools_08,bamCompare -b1 chip.bam -b2 input.bam -o chip_vs_input_log2.bw --normalizeUsing RPKM --binSize 10 -p 8,create log2 ratio (ChIP/Input) bigWig track and write output to a file,epigenomics
deeptools,deeptools_09,computeMatrix reference-point -S chip.bw -R genes.bed --referencePoint TSS -b 3000 -a 3000 -o matrix.gz -p 8 --quiet,compute signal matrix around TSS for heatmap visualization in quiet mode,epigenomics
deeptools,deeptools_10,plotHeatmap -m matrix.gz -out heatmap.png --colorMap RdBu_r --whatToShow 'heatmap and colorbar',plot heatmap of signal around genomic regions with default parameters,epigenomics
delly,delly_01,call -g reference.fa -o sample_svs.bcf sample.bam,call structural variants from a single sample,variant-calling
delly,delly_02,call -g reference.fa -x hg38.excl -o sample_svs.bcf sample.bam,call SVs with repetitive region exclusion list,variant-calling
delly,delly_03,call -g reference.fa -x hg38.excl -o somatic_svs.bcf tumor.bam normal.bam,call somatic SVs from tumor-normal pair,variant-calling
delly,delly_04,filter -f somatic -o somatic_filtered.bcf -s samples.tsv somatic_svs.bcf,filter somatic SVs from DELLY output,variant-calling
delly,delly_05,merge -o merged_sites.bcf sample1.bcf sample2.bcf sample3.bcf,merge per-sample SV calls for population analysis,variant-calling
delly,delly_06,call -g reference.fa -o sample_svs.bcf sample.bam,call structural variants from a single sample,variant-calling
delly,delly_07,call -g reference.fa -x hg38.excl -o sample_svs.bcf sample.bam,call SVs with repetitive region exclusion list,variant-calling
delly,delly_08,call -g reference.fa -x hg38.excl -o somatic_svs.bcf tumor.bam normal.bam,call somatic SVs from tumor-normal pair and write output to a file,variant-calling
delly,delly_09,filter -f somatic -o somatic_filtered.bcf -s samples.tsv somatic_svs.bcf,filter somatic SVs from DELLY output using a tumor/control sample table,variant-calling
delly,delly_10,merge -o merged_sites.bcf sample1.bcf sample2.bcf sample3.bcf,merge per-sample SV calls for population analysis with default parameters,variant-calling
diamond,diamond_01,makedb --in nr.faa -d nr_diamond --threads 8,build a DIAMOND protein database from a FASTA file,metagenomics
diamond,diamond_02,blastp -q proteins.faa -d nr_diamond -o blastp_results.tsv --outfmt 6 --threads 8 --evalue 1e-5,search protein sequences against a DIAMOND database (blastp),metagenomics
diamond,diamond_03,blastx -q reads.fastq.gz -d nr_diamond -o blastx_results.tsv --outfmt 6 --threads 16 --evalue 1e-5 --max-target-seqs 1,search DNA reads against protein database using blastx (translated search),metagenomics
diamond,diamond_04,blastp -q proteins.faa -d uniprot_diamond -o detailed_results.tsv --outfmt '6 qseqid sseqid pident length evalue bitscore stitle' --more-sensitive --threads 8,sensitive mode search with custom output fields,metagenomics
diamond,diamond_05,blastx -q metagenome_contigs.fna -d nr_diamond -o results_tax.tsv --outfmt '6 qseqid sseqid pident evalue bitscore staxids sscinames' --threads 16,search against a taxonomy-enabled DIAMOND database (built with makedb --taxonmap/--taxonnodes/--taxonnames) for taxonomy-aware functional annotation,metagenomics
diamond,diamond_06,makedb --in nr.faa -d nr_diamond --threads 8 --verbose,build a DIAMOND protein database from a FASTA file with verbose output,metagenomics
diamond,diamond_07,blastp -q proteins.faa -d nr_diamond -o blastp_results.tsv --outfmt 6 --threads 8 --evalue 1e-5,search protein sequences against a DIAMOND database (blastp) using multiple threads,metagenomics
diamond,diamond_08,blastx -q reads.fastq.gz -d nr_diamond -o blastx_results.tsv --outfmt 6 --threads 16 --evalue 1e-5 --max-target-seqs 1,search DNA reads against protein database using blastx (translated search) and write output to a file,metagenomics
diamond,diamond_09,blastp -q proteins.faa -d uniprot_diamond -o detailed_results.tsv --outfmt '6 qseqid sseqid pident length evalue bitscore stitle' --more-sensitive --threads 8 --quiet,sensitive mode search with custom output fields in quiet mode,metagenomics
diamond,diamond_10,blastx -q metagenome_contigs.fna -d nr_diamond -o results_tax.tsv --outfmt '6 qseqid sseqid pident evalue bitscore staxids sscinames' --threads 16,search against a taxonomy-enabled DIAMOND database (built with makedb --taxonmap/--taxonnodes/--taxonnames) with default parameters,metagenomics
fastp,fastp_01,-i R1.fastq.gz -I R2.fastq.gz -o clean_R1.fastq.gz -O clean_R2.fastq.gz -h report.html -j report.json -w 8,quality trim and filter paired-end FASTQ reads with 8 threads,qc
fastp,fastp_02,-i reads.fastq.gz -o clean_reads.fastq.gz -l 50 -h report.html -j report.json,trim adapters from single-end reads and filter reads shorter than 50 bp,qc
fastp,fastp_03,-i R1.fq.gz -I R2.fq.gz -o out_R1.fq.gz -O out_R2.fq.gz -q 20 -l 36 -w 8 -h qc.html -j qc.json,quality trim paired-end reads and set minimum quality to 20,qc
fastp,fastp_04,-i R1.fq.gz -I R2.fq.gz -o out_R1.fq.gz -O out_R2.fq.gz --trim_poly_x -w 8 -h rna_qc.html -j rna_qc.json,run fastp on paired-end RNA-seq data with polyX (e.g. polyA) tail trimming,qc
fastp,fastp_05,-i R1.fq.gz -I R2.fq.gz -o /dev/null -O /dev/null --disable_adapter_trimming --disable_quality_filtering -h qc_report.html -j qc_report.json,"quality control only (no trimming, just generate the QC report)",qc
fastp,fastp_06,-i R1.fastq.gz -I R2.fastq.gz -o clean_R1.fastq.gz -O clean_R2.fastq.gz -h report.html -j report.json -w 8 --verbose,quality trim and filter paired-end FASTQ reads with 8 threads with verbose output,qc
fastp,fastp_07,-i reads.fastq.gz -o clean_reads.fastq.gz -l 50 -w 8 -h report.html -j report.json,trim adapters from single-end reads and filter reads shorter than 50 bp using multiple threads,qc
fastp,fastp_08,-i R1.fq.gz -I R2.fq.gz -o out_R1.fq.gz -O out_R2.fq.gz -q 20 -l 36 -w 8 -h qc.html -j qc.json,quality trim paired-end reads and set minimum quality to 20 and write output to a file,qc
fastp,fastp_09,-i R1.fq.gz -I R2.fq.gz -o out_R1.fq.gz -O out_R2.fq.gz --trim_poly_x -w 8 -h rna_qc.html -j rna_qc.json 2> fastp.log,run fastp on paired-end RNA-seq data with polyX (e.g. polyA) tail trimming and redirect log messages to a file,qc
fastp,fastp_10,-i R1.fq.gz -I R2.fq.gz -o /dev/null -O /dev/null --disable_adapter_trimming --disable_quality_filtering -h qc_report.html -j qc_report.json,"quality control only (no trimming, just generate the QC report) with default parameters",qc
fastq-screen,fastq-screen_01,--conf fastq_screen.conf --outdir results/ --threads 8 sample_R1.fastq.gz,screen a FASTQ file against default databases,qc
fastq-screen,fastq-screen_02,--conf fastq_screen.conf --subset 0 --outdir results/ --threads 8 sample_R1.fastq.gz,screen all reads (no subsampling) for thorough contamination check,qc
fastq-screen,fastq-screen_03,--conf fastq_screen.conf --bisulfite --outdir results/ --threads 8 R1.fastq.gz R2.fastq.gz,screen bisulfite-converted paired-end read files and report bisulfite alignment stats,qc
fastq-screen,fastq-screen_04,--conf fastq_screen.conf --no_html --outdir results/ --threads 8 sample.fastq.gz,screen reads and get only the table output without generating plots,qc
fastq-screen,fastq-screen_05,--conf custom_screen.conf --outdir results/ --threads 8 sample_R1.fastq.gz,add a custom database to the config and screen for mycoplasma contamination,qc
fastq-screen,fastq-screen_06,for f in *.fastq.gz; do fastq_screen --conf fastq_screen.conf --outdir screen_results/ --threads 4 $f; done && multiqc screen_results/ -o multiqc_report/,screen multiple samples in a loop and collect MultiQC report,qc
fastq-screen,fastq-screen_07,--conf fastq_screen.conf --outdir results/ --threads 8 sample_R1.fastq.gz,screen a FASTQ file against default databases using multiple threads,qc
fastq-screen,fastq-screen_08,--conf fastq_screen.conf --subset 0 --outdir results/ --threads 8 sample_R1.fastq.gz,screen all reads (no subsampling) for thorough contamination check and write results to an output directory,qc
fastq-screen,fastq-screen_09,--conf fastq_screen.conf --bisulfite --outdir results/ --threads 8 R1.fastq.gz R2.fastq.gz --quiet,screen bisulfite-converted paired-end read files and report bisulfite alignment stats in quiet mode,qc
fastq-screen,fastq-screen_10,--conf fastq_screen.conf --no_html --outdir results/ --threads 8 sample.fastq.gz,screen reads and get only the table output without generating plots with default parameters,qc
fastqc,fastqc_01,reads.fastq.gz -o qc_results/,run quality control on a single FASTQ file,qc
fastqc,fastqc_02,-t 4 -o qc_results/ R1.fastq.gz R2.fastq.gz,run quality control on paired-end FASTQ files using 4 threads,qc
fastqc,fastqc_03,--noextract -t 8 -o qc_output/ sample1_R1.fastq.gz sample1_R2.fastq.gz sample2_R1.fastq.gz sample2_R2.fastq.gz,run quality control on multiple samples and keep zip files without extracting,qc
fastqc,fastqc_04,-t 4 -o qc_results/ aligned.bam,run fastqc on a BAM file,qc
fastqc,fastqc_05,-f fastq -a adapters.txt -t 4 -o qc_results/ reads.fastq.gz,run fastqc with custom adapter sequences and format specification,qc
fastqc,fastqc_06,reads.fastq.gz -o qc_results/,run quality control on a single FASTQ file with progress messages shown (default behavior),qc
fastqc,fastqc_07,-t 4 -o qc_results/ R1.fastq.gz R2.fastq.gz,run quality control on paired-end FASTQ files using multiple threads,qc
fastqc,fastqc_08,--noextract -t 8 -o qc_output/ sample1_R1.fastq.gz sample1_R2.fastq.gz sample2_R1.fastq.gz sample2_R2.fastq.gz,run quality control on multiple samples and keep zip files without extracting and write output to a file,qc
fastqc,fastqc_09,-t 4 -o qc_results/ aligned.bam --quiet,run fastqc on a BAM file in quiet mode,qc
fastqc,fastqc_10,-f fastq -a adapters.txt -t 4 -o qc_results/ reads.fastq.gz,run fastqc with custom adapter sequences and format specification with default parameters,qc
fasttree,fasttree_01,-nt -gtr aligned_sequences.fasta > nucleotide_tree.nwk,infer phylogenetic tree from nucleotide alignment,phylogenetics
fasttree,fasttree_02,aligned_proteins.fasta > protein_tree.nwk,infer phylogenetic tree from protein alignment,phylogenetics
fasttree,fasttree_03,-wag aligned_proteins.fasta > wag_tree.nwk,infer tree with WAG protein substitution model,phylogenetics
fasttree,fasttree_04,-nt -gtr -boot 1000 -seed 42 aligned_sequences.fasta > tree_with_support.nwk,infer tree with local support values,phylogenetics
fasttree,fasttree_05,-nt -gtr aligned_sequences.fasta > tree.nwk,infer nucleotide tree under the GTR model (the multithreaded FastTreeMP binary accepts the same options),phylogenetics
fasttree,fasttree_06,-lg aligned_proteins.fasta > lg_tree.nwk,infer protein tree with LG substitution model,phylogenetics
fasttree,fasttree_07,-nt -gtr -fastest aligned_sequences.fasta > fast_tree.nwk,run faster but less thorough tree search,phylogenetics
fasttree,fasttree_08,-nt -gtr -gamma aligned_sequences.fasta > gamma_tree.nwk,infer tree with gamma-distributed rate variation,phylogenetics
fasttree,fasttree_09,-nt -gtr -n 1 alignment.phy > phylip_tree.nwk,infer tree from PHYLIP format input,phylogenetics
fasttree,fasttree_10,-nt -gtr -slownni aligned_sequences.fasta > thorough_tree.nwk,infer tree with more thorough nearest-neighbor interchange search,phylogenetics
featurecounts,featurecounts_01,-T 8 -a genes.gtf -o counts.txt -p -s 2 sample1.bam sample2.bam sample3.bam,count reads per gene for paired-end RNA-seq with reverse-strand library,rna-seq
featurecounts,featurecounts_02,-T 8 -a genes.gtf -o counts.txt -s 0 sample.bam,count reads per gene for unstranded single-end RNA-seq,rna-seq
featurecounts,featurecounts_03,-T 8 -a genes.gtf -o counts.txt -p -s 2 --primary -M -O sample.bam,count reads allowing multi-mapping reads to be counted,rna-seq
featurecounts,featurecounts_04,-T 4 -a peaks.saf -F SAF -o chip_counts.txt sample_sorted.bam,count ChIP-seq reads per peak region using a SAF annotation file,rna-seq
featurecounts,featurecounts_05,-T 8 -f -a genes.gtf -o exon_counts.txt -p -s 2 sample.bam,count exon-level reads for exon usage analysis,rna-seq
featurecounts,featurecounts_06,-T 8 -a genes.gtf -o counts.txt -p -s 2 sample1.bam sample2.bam sample3.bam --verbose,count reads per gene for paired-end RNA-seq with reverse-strand library with verbose output,rna-seq
featurecounts,featurecounts_07,-T 8 -a genes.gtf -o counts.txt -s 0 sample.bam,count reads per gene for unstranded single-end RNA-seq using multiple threads,rna-seq
featurecounts,featurecounts_08,-T 8 -a genes.gtf -o counts.txt -p -s 2 --primary -M -O sample.bam,count reads allowing multi-mapping reads to be counted and write output to a file,rna-seq
featurecounts,featurecounts_09,-T 4 -a peaks.saf -F SAF -o chip_counts.txt sample_sorted.bam > featurecounts.log 2>&1,count ChIP-seq reads per peak region using a SAF annotation file and send the run log to a file,rna-seq
featurecounts,featurecounts_10,-T 8 -f -a genes.gtf -o exon_counts.txt -p -s 2 sample.bam,count exon-level reads for exon usage analysis with default parameters,rna-seq
find,find_01,. -type f -size +100M,find all files larger than 100 MB in the current directory tree,filesystem
find,find_02,. -name '*.py' -mtime -7,find all Python files modified in the last 7 days,filesystem
find,find_03,/tmp -name '*.tmp' -type f -delete,find and delete all .tmp files in a directory tree,filesystem
find,find_04,. -iname 'readme*',find files by name case-insensitively,filesystem
find,find_05,. -maxdepth 1 -type d,find all directories in the current directory (depth 1 only),filesystem
find,find_06,. -empty,find empty files and directories,filesystem
find,find_07,. -name '*.log' -exec gzip {} \;,find files and execute a command on each match,filesystem
find,find_08,/home -user alice -type f,find files owned by a specific user,filesystem
find,find_09,. -type f -newer reference_file.txt,find files modified more recently than a reference file,filesystem
find,find_10,. -type f -perm /o+w,find world-writable files,filesystem
flye,flye_01,--nano-raw reads.fastq.gz --genome-size 5m --out-dir flye_output/ --threads 16,assemble bacterial genome from Oxford Nanopore reads,assembly
flye,flye_02,--pacbio-hifi hifi_reads.fastq.gz --genome-size 3g --out-dir hifi_assembly/ --threads 32,assemble genome from PacBio HiFi reads,assembly
flye,flye_03,--meta --nano-raw meta_reads.fastq.gz --out-dir meta_flye/ --threads 32,assemble metagenomic community from ONT reads,assembly
flye,flye_04,--nano-hq hq_reads.fastq.gz --genome-size 4.5m --out-dir hq_assembly/ --threads 16 --iterations 2,"assemble with high-quality ONT reads (R10, Q20+)",assembly
flye,flye_05,--nano-raw reads.fastq.gz --genome-size 5m --out-dir flye_output/ --threads 16 --resume,resume an interrupted Flye assembly,assembly
flye,flye_06,--nano-raw reads.fastq.gz --genome-size 5m --out-dir flye_output/ --threads 16 --debug,assemble bacterial genome from Oxford Nanopore reads with debug output,assembly
flye,flye_07,--pacbio-hifi hifi_reads.fastq.gz --genome-size 3g --out-dir hifi_assembly/ --threads 32,assemble genome from PacBio HiFi reads using multiple threads,assembly
flye,flye_08,--meta --nano-raw meta_reads.fastq.gz --out-dir meta_flye/ --threads 32,assemble metagenomic community from ONT reads and write results to an output directory,assembly
flye,flye_09,--nano-hq hq_reads.fastq.gz --genome-size 4.5m --out-dir hq_assembly/ --threads 16 --iterations 2 > flye_console.log 2>&1,"assemble with high-quality ONT reads (R10, Q20+) and redirect console logging to a file",assembly
flye,flye_10,--nano-raw reads.fastq.gz --genome-size 5m --out-dir flye_output/ --threads 16 --resume,resume an interrupted Flye assembly with default parameters,assembly
freebayes,freebayes_01,-f reference.fa -b sample.bam > variants.vcf,call germline variants from a single sample BAM file,variant-calling
freebayes,freebayes_02,-f reference.fa --min-alternate-count 3 --min-alternate-fraction 0.2 -b sample.bam > filtered_variants.vcf,call variants requiring at least 3 alternate-allele reads and an alternate allele fraction of at least 0.2,variant-calling
freebayes,freebayes_03,-f reference.fa sample1.bam sample2.bam sample3.bam > cohort_variants.vcf,call variants jointly from multiple samples,variant-calling
freebayes,freebayes_04,-f reference.fa -r chr1 -b sample.bam > chr1_variants.vcf,call variants restricted to a specific genomic region,variant-calling
freebayes,freebayes_05,-f reference.fa --variant-input known_variants.vcf --only-use-input-alleles -b sample.bam > genotyped.vcf,genotype only the alleles present in a known-variants VCF,variant-calling
freebayes,freebayes_06,-f reference.fa -b sample.bam > variants.vcf,call germline variants from a single sample BAM file,variant-calling
freebayes,freebayes_07,-f reference.fa --min-alternate-count 3 --min-alternate-fraction 0.2 -b sample.bam > filtered_variants.vcf,call variants requiring at least 3 alternate-allele reads and an alternate allele fraction of at least 0.2,variant-calling
freebayes,freebayes_08,-f reference.fa sample1.bam sample2.bam sample3.bam > cohort_variants.vcf,call variants jointly from multiple samples,variant-calling
freebayes,freebayes_09,-f reference.fa -r chr1 -b sample.bam > chr1_variants.vcf,call variants restricted to a specific genomic region,variant-calling
freebayes,freebayes_10,-f reference.fa --variant-input known_variants.vcf --only-use-input-alleles -b sample.bam > genotyped.vcf,genotype only the alleles present in a known-variants VCF with default parameters,variant-calling
gatk,gatk_01,HaplotypeCaller -R reference.fa -I sorted_markdup.bam -O output.g.vcf.gz -ERC GVCF,call germline variants from a BAM file using HaplotypeCaller,variant-calling
gatk,gatk_02,HaplotypeCaller -R reference.fa -I sorted_markdup.bam -O variants.vcf.gz,genotype a single sample directly (not GVCF mode),variant-calling
gatk,gatk_03,MarkDuplicates -I input.bam -O markdup.bam -M metrics.txt,mark PCR duplicates in a BAM file,variant-calling
gatk,gatk_04,Mutect2 -R reference.fa -I tumor.bam -I normal.bam -normal normal_sample_name -O somatic.vcf.gz,call somatic mutations with Mutect2 using matched normal,variant-calling
gatk,gatk_05,FilterMutectCalls -R reference.fa -V somatic.vcf.gz -O filtered_somatic.vcf.gz,filter Mutect2 variants with FilterMutectCalls,variant-calling
gatk,gatk_06,CreateSequenceDictionary -R reference.fa,create a sequence dictionary for a reference FASTA,variant-calling
gatk,gatk_07,AddOrReplaceReadGroups -I input.bam -O output_rg.bam -RGID sample1 -RGLB lib1 -RGPL ILLUMINA -RGPU unit1 -RGSM sample1,add read group to a BAM file (required before GATK variant calling),variant-calling
gatk,gatk_08,BaseRecalibrator -R hg38.fa -I markdup.bam --known-sites dbsnp.vcf -O recal.table,perform base quality score recalibration (BQSR step 1) on markdup BAM with hg38 reference and dbSNP known sites,variant-calling
gatk,gatk_09,ApplyBQSR -R hg38.fa -I markdup.bam --bqsr-recal-file recal.table -O recal.bam,apply base quality score recalibration (BQSR step 2) to produce recalibrated BAM,variant-calling
gatk,gatk_10,SelectVariants -V variants.vcf -O SNPs.vcf --select-type-to-include SNP,select only SNPs from a variants VCF,variant-calling
git,git_01,clone --depth 1 --branch main https://github.com/user/repo.git,clone a repository with shallow history (last commit only) on a specific branch,version-control
git,git_02,"commit -a -m ""fix: resolve null pointer in parser""",stage all changes and commit with a message,version-control
git,git_03,push -u origin main,push the current branch to origin and set upstream tracking,version-control
git,git_04,checkout -b feature/new-api,create and switch to a new branch,version-control
git,git_05,log --oneline --graph --decorate --all,view the commit log with one-line summaries and branch graph,version-control
git,git_06,diff HEAD,show unstaged and staged changes,version-control
git,git_07,"stash push -m ""WIP: experiment with new feature""",stash current working tree changes to switch branches cleanly,version-control
git,git_08,rebase origin/main,rebase current branch onto origin/main to incorporate upstream changes,version-control
git,git_09,rm --cached secrets.env,stop tracking a file without deleting it from disk,version-control
git,git_10,pull --rebase origin main,pull latest changes from remote and rebase local commits on top,version-control
grep,grep_01,"-in ""error"" application.log","search for a keyword in a file, ignoring case, with line numbers",text-processing
grep,grep_02,"-rn ""def connect"" --include='*.py' src/",recursively search all Python files for a function definition,text-processing
grep,grep_03,"-C 3 ""NullPointerException"" error.log",show context lines around each match,text-processing
grep,grep_04,"-c ""^ERROR"" server.log",count the number of matching lines in a file,text-processing
grep,grep_05,"-E ""(error|warning|fatal)"" app.log",search for multiple patterns using extended regex,text-processing
grep,grep_06,"-rl ""TODO"" src/",find files containing a pattern (list filenames only),text-processing
grep,grep_07,"-v ""^#"" config.ini",invert match: show lines that do NOT contain a pattern,text-processing
grep,grep_08,"-oE ""[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"" access.log",extract only the matching part of each line,text-processing
grep,grep_09,"-Hn ""import"" *.py",search in multiple files and show filename with each match,text-processing
grep,grep_10,"-F ""error[0]"" debug.log",search for a fixed string (no regex interpretation),text-processing
gtdbtk,gtdbtk_01,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fa,classify a directory of genome bins with GTDB-Tk,metagenomics
gtdbtk,gtdbtk_02,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fasta --skip_ani_screen,classify genome bins with .fasta extension and skip the ANI screening step,metagenomics
gtdbtk,gtdbtk_03,identify --genome_dir bins/ --out_dir gtdbtk_identify/ --cpus 16 --extension fa,run only the identification step (marker gene identification),metagenomics
gtdbtk,gtdbtk_04,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fa --quiet,classify a directory of genome bins with GTDB-Tk in quiet mode,metagenomics
gtdbtk,gtdbtk_05,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fasta --skip_ani_screen,classify genome bins with .fasta extension and skip the ANI screening step with default parameters,metagenomics
gtdbtk,gtdbtk_06,identify --genome_dir bins/ --out_dir gtdbtk_identify/ --cpus 16 --extension fa --verbose,run only the identification step (marker gene identification) with verbose output,metagenomics
gtdbtk,gtdbtk_07,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fa,classify a directory of genome bins with GTDB-Tk using multiple threads,metagenomics
gtdbtk,gtdbtk_08,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fasta --skip_ani_screen,classify genome bins with .fasta extension and skip the ANI screening step and write results to an output directory,metagenomics
gtdbtk,gtdbtk_09,identify --genome_dir bins/ --out_dir gtdbtk_identify/ --cpus 16 --extension fa --quiet,run only the identification step (marker gene identification) in quiet mode,metagenomics
gtdbtk,gtdbtk_10,classify_wf --genome_dir bins/ --out_dir gtdbtk_output/ --cpus 32 --extension fa,classify a directory of genome bins with GTDB-Tk with default parameters,metagenomics
hap_py,hap_py_01,-r reference.fa GIAB_truth.vcf.gz query_calls.vcf.gz -o benchmark_results --engine vcfeval -f HG001_highconf.bed --threads 8,benchmark a variant caller VCF against GIAB truth set,variant-calling
hap_py,hap_py_02,-r reference.fa truth.vcf.gz query.vcf.gz -o results --engine vcfeval -f confident.bed --threads 8,benchmark and report SNP and indel metrics separately,variant-calling
hap_py,hap_py_03,-r reference.fa GIAB_truth.vcf.gz query_calls.vcf.gz -o benchmark_results --engine vcfeval -f HG001_highconf.bed --threads 8,benchmark a variant caller VCF against GIAB truth set and write output to a file,variant-calling
hap_py,hap_py_04,-r reference.fa truth.vcf.gz query.vcf.gz -o results --engine vcfeval -f confident.bed --threads 8 --quiet,benchmark and report SNP and indel metrics separately in quiet mode,variant-calling
hap_py,hap_py_05,-r reference.fa GIAB_truth.vcf.gz query_calls.vcf.gz -o benchmark_results --engine vcfeval -f HG001_highconf.bed --threads 8,benchmark a variant caller VCF against GIAB truth set with default parameters,variant-calling
hap_py,hap_py_06,-r reference.fa truth.vcf.gz query.vcf.gz -o results --engine vcfeval -f confident.bed --threads 8 --verbose,benchmark and report SNP and indel metrics separately with verbose output,variant-calling
hap_py,hap_py_07,-r reference.fa GIAB_truth.vcf.gz query_calls.vcf.gz -o benchmark_results --engine vcfeval -f HG001_highconf.bed --threads 8,benchmark a variant caller VCF against GIAB truth set using multiple threads,variant-calling
hap_py,hap_py_08,-r reference.fa truth.vcf.gz query.vcf.gz -o results --engine vcfeval -f confident.bed --threads 8,benchmark and report SNP and indel metrics separately and write output to a file,variant-calling
hap_py,hap_py_09,-r reference.fa GIAB_truth.vcf.gz query_calls.vcf.gz -o benchmark_results --engine vcfeval -f HG001_highconf.bed --threads 8 --quiet,benchmark a variant caller VCF against GIAB truth set in quiet mode,variant-calling
hap_py,hap_py_10,-r reference.fa truth.vcf.gz query.vcf.gz -o results --engine vcfeval -f confident.bed --threads 8,benchmark and report SNP and indel metrics separately with default parameters,variant-calling
hifiasm,hifiasm_01,-o assembly -t 32 hifi_reads.fastq.gz,assemble genome from PacBio HiFi reads,assembly
hifiasm,hifiasm_02,-o phased_assembly -t 32 --h1 hic_R1.fastq.gz --h2 hic_R2.fastq.gz hifi_reads.fastq.gz,haplotype-resolved assembly with Hi-C phasing data,assembly
hifiasm,hifiasm_03,-o assembly -t 32 --n-hap 4 hifi_reads.fastq.gz,assemble genome with the number of haplotypes (ploidy) set to 4,assembly
hifiasm,hifiasm_04,-o assembly -t 32 --ul ultralong_reads.fastq.gz hifi_reads.fastq.gz,assemble with ultra-long ONT reads for improved scaffolding,assembly
hifiasm,hifiasm_05,-o assembly -t 32 -l 3 hifi_reads.fastq.gz,assemble with aggressive duplicate purging,assembly
hifiasm,hifiasm_06,-o assembly -t 32 hifi_reads.fastq.gz --verbose,assemble genome from PacBio HiFi reads with verbose output,assembly
hifiasm,hifiasm_07,-o phased_assembly -t 32 --h1 hic_R1.fastq.gz --h2 hic_R2.fastq.gz hifi_reads.fastq.gz,haplotype-resolved assembly with Hi-C phasing data using multiple threads,assembly
hifiasm,hifiasm_08,-o assembly -t 32 --n-hap 4 hifi_reads.fastq.gz,assemble genome with the number of haplotypes (ploidy) set to 4 and write output to a file,assembly
hifiasm,hifiasm_09,-o assembly -t 32 --ul ultralong_reads.fastq.gz hifi_reads.fastq.gz --quiet,assemble with ultra-long ONT reads for improved scaffolding in quiet mode,assembly
hifiasm,hifiasm_10,-o assembly -t 32 -l 3 hifi_reads.fastq.gz,assemble with aggressive duplicate purging with default parameters,assembly
hisat2,hisat2_01,hisat2-build -p 8 genome.fa genome_index,build a HISAT2 genome index from a reference FASTA,alignment
hisat2,hisat2_02,-p 8 -x genome_index -1 R1.fastq.gz -2 R2.fastq.gz --dta -S aligned.sam,align paired-end RNA-seq reads to the genome with 8 threads,alignment
hisat2,hisat2_03,-p 8 -x genome_index -1 R1.fastq.gz -2 R2.fastq.gz --dta | samtools sort -@ 4 -o sorted.bam,align paired-end RNA-seq reads and output sorted BAM directly,alignment
hisat2,hisat2_04,-p 8 -x genome_index -1 R1.fastq.gz -2 R2.fastq.gz --rna-strandness RF --dta -S aligned.sam,align strand-specific paired-end RNA-seq (reverse-strand library),alignment
hisat2,hisat2_05,-p 4 -x genome_index -U reads.fastq.gz --dta -S aligned.sam,align single-end RNA-seq reads,alignment
hisat2,hisat2_06,hisat2-build -p 8 genome.fa genome_spliceaware_index --ss splice_sites.txt --exon exons.txt,build splice-site aware index using GTF annotation for improved RNA-seq,alignment
hisat2,hisat2_07,-p 8 -x genome_index -1 R1.fastq.gz -2 R2.fastq.gz --rna-strandness RF --dta -S aligned.sam 2> align_summary.txt,align paired-end reads with strand information and save alignment statistics,alignment
hisat2,hisat2_08,-p 8 -x genome_index -U reads.fastq.gz --no-spliced-alignment -S aligned.sam,align single-end reads in genomic (non-spliced) mode for DNA-seq,alignment
hisat2,hisat2_09,-p 8 -x genome_index -1 R1.fastq.gz -2 R2.fastq.gz --dta --no-unal -S aligned.sam,align paired-end reads and discard unmapped reads,alignment
hisat2,hisat2_10,-p 8 -x genome_index -1 R1.fastq.gz -2 R2.fastq.gz --dta | samtools view -b -q 60 -o unique_aligned.bam,align paired-end reads and output only uniquely mapped reads (HISAT2 MAPQ 60),alignment
hmmer,hmmer_01,hmmscan --cpu 8 --tblout pfam_hits.tbl --domtblout pfam_domains.tbl -E 1e-5 Pfam-A.hmm proteins.faa > pfam_output.txt,search protein sequences against Pfam HMM profiles (domain annotation),sequence-utilities
hmmer,hmmer_02,hmmsearch --cpu 8 --tblout hits.tbl --domtblout domain_hits.tbl -E 1e-10 gene_family.hmm sequences.faa > hmmsearch_out.txt,search a protein HMM profile against a sequence database,sequence-utilities
hmmer,hmmer_03,hmmbuild --cpu 8 gene_family.hmm aligned_sequences.sto,build a profile HMM from a multiple sequence alignment,sequence-utilities
hmmer,hmmer_04,hmmpress Pfam-A.hmm,press Pfam database for hmmscan indexing,sequence-utilities
hmmer,hmmer_05,phmmer --cpu 8 --tblout phmmer_hits.tbl -E 1e-5 query_protein.faa target_database.faa > phmmer_out.txt,search proteins with phmmer (BLAST-like single sequence query),sequence-utilities
hmmer,hmmer_06,hmmscan --cpu 8 --tblout pfam_hits.tbl --domtblout pfam_domains.tbl -E 1e-5 Pfam-A.hmm proteins.faa > pfam_output.txt,search protein sequences against Pfam HMM profiles (domain annotation),sequence-utilities
hmmer,hmmer_07,hmmsearch --cpu 8 --tblout hits.tbl --domtblout domain_hits.tbl -E 1e-10 gene_family.hmm sequences.faa > hmmsearch_out.txt,search a protein HMM profile against a sequence database,sequence-utilities
hmmer,hmmer_08,hmmbuild --cpu 8 -o output.txt gene_family.hmm aligned_sequences.sto,build a profile HMM from a multiple sequence alignment and write the summary output to a file,sequence-utilities
hmmer,hmmer_09,hmmpress -f Pfam-A.hmm,press Pfam database for hmmscan indexing and overwrite existing index files,sequence-utilities
hmmer,hmmer_10,phmmer --cpu 8 --tblout phmmer_hits.tbl -E 1e-5 query_protein.faa target_database.faa > phmmer_out.txt,search proteins with phmmer (BLAST-like single sequence query) with default parameters,sequence-utilities
homer,homer_01,makeTagDirectory chipseq_tags/ sample.bam -genome hg38 -checkGC,create a HOMER tag directory from a BAM file,epigenomics
homer,homer_02,findPeaks chipseq_tags/ -style factor -i input_tags/ -o peaks.txt,call narrow transcription factor peaks with an input control,epigenomics
homer,homer_03,findPeaks chipseq_tags/ -style histone -i input_tags/ -o broad_peaks.txt,"call broad histone modification peaks (e.g., H3K27me3)",epigenomics
homer,homer_04,annotatePeaks.pl peaks.txt hg38 -gtf genes.gtf > annotated_peaks.txt,annotate hg38 peaks with genomic features from a custom GTF annotation,epigenomics
homer,homer_05,findMotifsGenome.pl peaks.txt hg38 motif_output/ -size 200 -mask -p 8,run de novo and known motif analysis on ChIP-seq peaks,epigenomics
homer,homer_06,mergePeaks rep1_peaks.txt rep2_peaks.txt -d 100 -prefix merged_peaks -venn venn.txt,merge peak files from two ChIP-seq replicates,epigenomics
homer,homer_07,pos2bed.pl peaks.txt > peaks.bed,convert HOMER peak file to BED format,epigenomics
homer,homer_08,makeTagDirectory chipseq_tags/ sample.bam -genome hg38 -checkGC,create a HOMER tag directory from a BAM file and write tag files to the output directory,epigenomics
homer,homer_09,findPeaks chipseq_tags/ -style factor -i input_tags/ -o peaks.txt 2> findpeaks.log,call narrow transcription factor peaks with an input control and redirect log messages to a file,epigenomics
homer,homer_10,findPeaks chipseq_tags/ -style histone -i input_tags/ -o broad_peaks.txt,"call broad histone modification peaks (e.g., H3K27me3) with default parameters",epigenomics
igvtools,igvtools_01,count -z 5 -w 25 sorted.bam coverage.tdf hg38,create coverage TDF track from BAM file,utilities
igvtools,igvtools_02,index variants.vcf,index a VCF file for IGV,utilities
igvtools,igvtools_03,sort input.bed sorted.bed,sort a BED file for IGV indexing,utilities
igvtools,igvtools_04,count -z 5 -w 25 sorted.bam coverage.tdf hg38 > igvtools.log 2>&1,create coverage TDF track from BAM file and redirect console output to a log file,utilities
igvtools,igvtools_05,index variants.vcf,index a VCF file for IGV with default parameters,utilities
igvtools,igvtools_06,sort -t /tmp input.bed sorted.bed,sort a BED file for IGV indexing using a custom temporary directory,utilities
igvtools,igvtools_07,count -z 5 -w 25 sorted.bam coverage.tdf hg38,create coverage TDF track from BAM file with 25 bp windows,utilities
igvtools,igvtools_08,index variants.vcf,index a VCF file for IGV (index is written next to the input file),utilities
igvtools,igvtools_09,sort input.bed sorted.bed > igvtools_sort.log 2>&1,sort a BED file for IGV indexing and redirect console output to a log file,utilities
igvtools,igvtools_10,count -z 5 -w 25 sorted.bam coverage.tdf hg38,create coverage TDF track from BAM file with default parameters,utilities
iqtree2,iqtree2_01,-s alignment.fasta -m MFP --prefix my_tree -T AUTO,infer maximum-likelihood tree with automatic model selection,phylogenetics
iqtree2,iqtree2_02,-s alignment.fasta -m MFP -B 1000 --bnni --prefix bootstrap_tree -T 8,infer tree with ultrafast bootstrap and model selection,phylogenetics
iqtree2,iqtree2_03,-s protein_alignment.fasta -st AA -m TEST -B 1000 --bnni --prefix protein_tree -T 8,infer phylogenetic tree for protein sequences,phylogenetics
iqtree2,iqtree2_04,-s alignment.fasta -m MFP -B 1000 --prefix main_tree -T 8 --gcf gene_trees.txt --scfl 100,compute gene and site concordance factors to assess gene tree discordance,phylogenetics
iqtree2,iqtree2_05,-s alignment.fasta -m MFP -b 100 -o outgroup_taxon --prefix rooted_tree -T 8,infer tree with standard bootstrap and specified outgroup,phylogenetics
iqtree2,iqtree2_06,-s alignment.fasta -m MFP --prefix my_tree -T AUTO --verbose,infer maximum-likelihood tree with automatic model selection with verbose output,phylogenetics
iqtree2,iqtree2_07,-s alignment.fasta -m MFP -B 1000 --bnni --prefix bootstrap_tree -T 8,infer tree with ultrafast bootstrap and model selection using multiple threads,phylogenetics
iqtree2,iqtree2_08,-s protein_alignment.fasta -st AA -m TEST -B 1000 --bnni --prefix protein_tree -T 8,infer phylogenetic tree for protein sequences and write results under a custom output prefix,phylogenetics
iqtree2,iqtree2_09,-s alignment.fasta -m MFP -B 1000 --prefix main_tree -T 8 --gcf gene_trees.txt --scfl 100 --quiet,compute gene and site concordance factors to assess gene tree discordance in quiet mode,phylogenetics
iqtree2,iqtree2_10,-s alignment.fasta -m MFP -b 100 -o outgroup_taxon --prefix rooted_tree -T 8,infer tree with standard bootstrap and specified outgroup with default parameters,phylogenetics
java,java_01,-version,check installed Java version,programming
java,java_02,-Xmx16g -jar picard.jar SortSam I=input.bam O=sorted.bam SORT_ORDER=coordinate,run a JAR-based tool with increased heap memory,programming
java,java_03,-Xmx8g -XX:+UseG1GC -Djava.io.tmpdir=/scratch/tmp -jar gatk.jar HaplotypeCaller -R ref.fa -I input.bam -O out.vcf,run GATK with custom tmp directory and GC settings,programming
java,java_04,-Xmx2g -jar fastqc.jar --threads 4 sample.fastq.gz,run FastQC via its JAR directly,programming
java,java_05,-XshowSettings:all -version,show all system properties and JVM settings,programming
java,java_06,-XX:+PrintFlagsFinal -version,list available JVM garbage collectors and tuning flags,programming
java,java_07,-Xmx4g -jar trimmomatic.jar PE -threads 8 R1.fastq.gz R2.fastq.gz R1_trimmed.fastq.gz R1_unpaired.fastq.gz R2_trimmed.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:adapters.fa:2:30:10,run Trimmomatic via its JAR,programming
java,java_08,-XX:+PrintFlagsFinal -version 2>&1,check available JVM memory settings,programming
java,java_09,-cp /path/to/lib1.jar:/path/to/lib2.jar com.example.MainClass arg1 arg2,run a main class with a custom classpath,programming
java,java_10,-version,check installed Java version with default parameters,programming
julia,julia_01,script.jl,run a Julia script,programming
julia,julia_02,--project=. script.jl,run a script in a specific project environment,programming
julia,julia_03,--threads auto script.jl,run a script with multiple threads,programming
julia,julia_04,"-e 'using Pkg; Pkg.add(""BioSequences"")'",install a package non-interactively from the command line,programming
julia,julia_05,-e 'using Pkg; Pkg.status()',show installed packages in the current environment,programming
julia,julia_06,"-e 'using Pkg; Pkg.add([""BioSequences"",""FASTX"",""GenomicFeatures""])'",add BioJulia packages for bioinformatics,programming
julia,julia_07,--startup-file=no --project=. script.jl,run script without loading startup.jl (for CI/pipelines),programming
julia,julia_08,-e 'println(VERSION); println(DEPOT_PATH)',check Julia version and depot paths,programming
julia,julia_09,--compile=all -O2 script.jl,run a script with exhaustive JIT compilation at optimization level 2,programming
julia,julia_10,-e 'import Pluto; Pluto.run(port=1234)',run a Pluto notebook server on a specific port,programming
kallisto,kallisto_01,index -i transcriptome.idx transcriptome.fa,build a kallisto index from a transcriptome FASTA,rna-seq
kallisto,kallisto_02,quant -i transcriptome.idx -o sample_output -b 100 --threads 8 R1.fastq.gz R2.fastq.gz,quantify paired-end RNA-seq reads,rna-seq
kallisto,kallisto_03,quant -i transcriptome.idx -o sample_output --single -l 200 -s 20 -b 100 --threads 8 reads.fastq.gz,quantify single-end RNA-seq reads with fragment length parameters,rna-seq
kallisto,kallisto_04,quant -i transcriptome.idx -o sample_output --rf-stranded -b 100 --threads 8 R1.fastq.gz R2.fastq.gz,quantify strand-specific reverse-strand paired-end RNA-seq,rna-seq
kallisto,kallisto_05,quant -i transcriptome.idx -o sample1_out -b 50 --threads 4 sample1_R1.fq.gz sample1_R2.fq.gz,quantify one sample of a multi-sample batch (repeat for each sample),rna-seq
kallisto,kallisto_06,index -i transcriptome.idx transcriptome.fa --verbose,build a kallisto index from a transcriptome FASTA with verbose output,rna-seq
kallisto,kallisto_07,quant -i transcriptome.idx -o sample_output -b 100 --threads 8 R1.fastq.gz R2.fastq.gz,quantify paired-end RNA-seq reads using multiple threads,rna-seq
kallisto,kallisto_08,quant -i transcriptome.idx -o sample_output --single -l 200 -s 20 -b 100 --threads 8 reads.fastq.gz,quantify single-end RNA-seq reads with fragment length parameters and write results to an output directory,rna-seq
kallisto,kallisto_09,quant -i transcriptome.idx -o sample_output --rf-stranded -b 100 --threads 8 R1.fastq.gz R2.fastq.gz --quiet,quantify strand-specific reverse-strand paired-end RNA-seq in quiet mode,rna-seq
kallisto,kallisto_10,quant -i transcriptome.idx -o sample1_out -b 50 --threads 4 sample1_R1.fq.gz sample1_R2.fq.gz,quantify a single paired-end sample with 50 bootstraps and 4 threads,rna-seq
kb,kb_01,ref -i index.idx -g t2g.txt -f1 cdna.fasta genome.fa genes.gtf,build kb reference from genome and GTF,single-cell
kb,kb_02,count -i index.idx -g t2g.txt -x 10xv3 -o output_dir/ -t 16 R1.fastq.gz R2.fastq.gz,process 10x Chromium v3 scRNA-seq FASTQ files,single-cell
kb,kb_03,count -i spliced_unspliced.idx -g t2g.txt -x 10xv3 --workflow lamanno -c1 cdna_t2c.txt -c2 intron_t2c.txt -o velocity_output/ -t 16 R1.fastq.gz R2.fastq.gz,process scRNA-seq with RNA velocity output,single-cell
kb,kb_04,count -i index.idx -g t2g.txt -x 10xv3 --h5ad -o output_dir/ -t 16 R1.fastq.gz R2.fastq.gz,process 10x Chromium v3 and output AnnData for Scanpy,single-cell
kb,kb_05,ref -i index.idx -g t2g.txt -f1 cdna.fasta genome.fa genes.gtf,build kb reference from genome and GTF with default parameters,single-cell
kb,kb_06,count -i index.idx -g t2g.txt -x 10xv3 -o output_dir/ -t 16 R1.fastq.gz R2.fastq.gz --verbose,process 10x Chromium v3 scRNA-seq FASTQ files with verbose output,single-cell
kb,kb_07,count -i spliced_unspliced.idx -g t2g.txt -x 10xv3 --workflow lamanno -c1 cdna_t2c.txt -c2 intron_t2c.txt -o velocity_output/ -t 16 R1.fastq.gz R2.fastq.gz,process scRNA-seq with RNA velocity output using multiple threads,single-cell
kb,kb_08,count -i index.idx -g t2g.txt -x 10xv3 --h5ad -o output_dir/ -t 16 R1.fastq.gz R2.fastq.gz,process 10x Chromium v3 and output AnnData for Scanpy and write output to a file,single-cell
kb,kb_09,ref -i index.idx -g t2g.txt -f1 cdna.fasta genome.fa genes.gtf --quiet,build kb reference from genome and GTF in quiet mode,single-cell
kb,kb_10,count -i index.idx -g t2g.txt -x 10xv3 -o output_dir/ -t 16 R1.fastq.gz R2.fastq.gz,process 10x Chromium v3 scRNA-seq FASTQ files with default parameters,single-cell
kraken2,kraken2_01,--db /path/to/kraken2_db --paired --threads 8 --output kraken_output.txt --report kraken_report.txt R1.fastq.gz R2.fastq.gz,classify paired-end metagenomic reads against the standard database,metagenomics
kraken2,kraken2_02,--db /path/to/kraken2_db --paired --confidence 0.1 --threads 8 --output kraken_out.txt --report kraken_report.txt --unclassified-out unclassified#.fastq R1.fastq.gz R2.fastq.gz,classify reads with confidence threshold and save unclassified reads,metagenomics
kraken2,kraken2_03,--db /path/to/kraken2_db --threads 8 --output kraken_out.txt --report kraken_report.txt reads.fastq.gz,classify single-end reads and generate report,metagenomics
kraken2,kraken2_04,--db /path/to/kraken2_db --paired --threads 8 --output kraken_out.txt --report kraken_report.txt --classified-out classified#.fastq R1.fastq.gz R2.fastq.gz,classify reads and extract classified reads for downstream analysis,metagenomics
kraken2,kraken2_05,--db /path/to/kraken2_db --paired --threads 8 --output kraken_output.txt --report kraken_report.txt R1.fastq.gz R2.fastq.gz,classify paired-end metagenomic reads against the standard database with default parameters,metagenomics
kraken2,kraken2_06,--db /path/to/kraken2_db --paired --confidence 0.1 --threads 8 --output kraken_out.txt --report kraken_report.txt --unclassified-out unclassified#.fastq R1.fastq.gz R2.fastq.gz --verbose,classify reads with confidence threshold and save unclassified reads with verbose output,metagenomics
kraken2,kraken2_07,--db /path/to/kraken2_db --threads 8 --output kraken_out.txt --report kraken_report.txt reads.fastq.gz,classify single-end reads and generate report using multiple threads,metagenomics
kraken2,kraken2_08,--db /path/to/kraken2_db --paired --threads 8 --output kraken_out.txt --report kraken_report.txt --classified-out classified#.fastq R1.fastq.gz R2.fastq.gz,classify reads and extract classified reads for downstream analysis and write output to a file,metagenomics
kraken2,kraken2_09,--db /path/to/kraken2_db --paired --threads 8 --output kraken_output.txt --report kraken_report.txt R1.fastq.gz R2.fastq.gz --quiet,classify paired-end metagenomic reads against the standard database in quiet mode,metagenomics
kraken2,kraken2_10,--db /path/to/kraken2_db --paired --confidence 0.1 --threads 8 --output kraken_out.txt --report kraken_report.txt --unclassified-out unclassified#.fastq R1.fastq.gz R2.fastq.gz,classify reads with confidence threshold and save unclassified reads with default parameters,metagenomics
liftoff,liftoff_01,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -u unmapped.txt,lift annotations from reference GFF3 to a new assembly,annotation
liftoff,liftoff_02,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -copies -u unmapped.txt,lift annotations and copy multi-copy gene families,annotation
liftoff,liftoff_03,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -s 0.85 -a 0.85 -u unmapped.txt,lift annotations between closely related species requiring at least 85% alignment coverage and sequence identity,annotation
liftoff,liftoff_04,target.fasta reference.fasta -db reference.db -o lifted.gff3 -u unmapped.txt,speed up repeated runs using a pre-built gffutils database,annotation
liftoff,liftoff_05,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -dir scratch_dir/ -p 16 -u unmapped.txt,lift annotations with 16 processes and keep minimap2 intermediate files in a specific directory,annotation
liftoff,liftoff_06,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -f feature_types.txt -u unmapped.txt,"lift the feature types listed in a file (e.g., non-gene features)",annotation
liftoff,liftoff_07,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -u unmapped.txt -p 4,lift annotations from reference GFF3 to a new assembly using multiple processes,annotation
liftoff,liftoff_08,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -copies -u unmapped.txt,lift annotations and copy multi-copy gene families and write output to a file,annotation
liftoff,liftoff_09,target.fasta reference.fasta -g reference.gff3 -o lifted.gff3 -s 0.85 -a 0.85 -u unmapped.txt --quiet,lift annotations between closely related species requiring at least 85% alignment coverage and sequence identity in quiet mode,annotation
liftoff,liftoff_10,target.fasta reference.fasta -db reference.db -o lifted.gff3 -u unmapped.txt,speed up repeated runs using a pre-built gffutils database with default parameters,annotation
longshot,longshot_01,-b sorted.bam -f reference.fa -o snps.vcf,call SNPs from Oxford Nanopore aligned reads,variant-calling
longshot,longshot_02,-b sorted.bam -f reference.fa -o chr1_snps.vcf -r chr1:1000000-2000000,call SNPs restricted to a specific region,variant-calling
longshot,longshot_03,-b sorted.bam -f reference.fa -o snps_filtered.vcf -c 10 -q 20,call SNPs with minimum coverage and mapping-quality filters,variant-calling
longshot,longshot_04,-b sorted.bam -f reference.fa -o snps.vcf --quiet,call SNPs from Oxford Nanopore aligned reads in quiet mode,variant-calling
longshot,longshot_05,-b sorted.bam -f reference.fa -o chr1_snps.vcf -r chr1:1000000-2000000,call SNPs restricted to a specific region with default parameters,variant-calling
longshot,longshot_06,-b sorted.bam -f reference.fa -o snps_filtered.vcf -c 10 -q 20 --verbose,call SNPs with minimum coverage and mapping-quality filters with verbose output,variant-calling
longshot,longshot_07,-b sorted.bam -f reference.fa -o snps.vcf -t 4,call SNPs from Oxford Nanopore aligned reads using multiple threads,variant-calling
longshot,longshot_08,-b sorted.bam -f reference.fa -o chr1_snps.vcf -r chr1:1000000-2000000,call SNPs restricted to a specific region and write output to a file,variant-calling
longshot,longshot_09,-b sorted.bam -f reference.fa -o snps_filtered.vcf -c 10 -q 20 --quiet,call SNPs with minimum coverage and mapping-quality filters in quiet mode,variant-calling
longshot,longshot_10,-b sorted.bam -f reference.fa -o snps.vcf,call SNPs from Oxford Nanopore aligned reads with default parameters,variant-calling
macs2,macs2_01,callpeak -t chip.bam -c input.bam -f BAM -g hs -n sample_chip -q 0.05 --outdir chip_peaks/,call narrow peaks from ChIP-seq data with input control,epigenomics
macs2,macs2_02,callpeak -t h3k27me3.bam -c input.bam -f BAM -g hs --broad --broad-cutoff 0.1 -n h3k27me3 --outdir broad_peaks/,call broad peaks for histone mark (H3K27me3) ChIP-seq,epigenomics
macs2,macs2_03,callpeak -t atac.bam -f BAM -g hs --nomodel --shift -100 --extsize 200 -n atac_sample -q 0.05 --outdir atac_peaks/,call ATAC-seq peaks by shifting and extending reads around Tn5 cut sites,epigenomics
macs2,macs2_04,callpeak -t atac_pe.bam -f BAMPE -g hs -n atac_pe_sample -q 0.05 --outdir atac_pe_peaks/,call peaks from paired-end ATAC-seq BAM,epigenomics
macs2,macs2_05,callpeak -t atac.bam -f BAM -g hs --nomodel --shift -100 --extsize 200 --keep-dup all -n open_chromatin --outdir atac_out/,call peaks without control for ATAC-seq open chromatin,epigenomics
macs2,macs2_06,callpeak -t chip.bam -c input.bam -f BAM -g hs -n sample_chip -q 0.05 --outdir chip_peaks/ --verbose 3,call narrow peaks from ChIP-seq data with input control with debug-level verbose output,epigenomics
macs2,macs2_07,callpeak -t h3k27me3.bam -c input.bam -f BAM -g hs --broad --broad-cutoff 0.1 -n h3k27me3 --outdir broad_peaks/,call broad peaks for histone mark (H3K27me3) ChIP-seq with a broad-region cutoff of 0.1,epigenomics
macs2,macs2_08,callpeak -t atac.bam -f BAM -g hs --nomodel --shift -100 --extsize 200 -n atac_sample -q 0.05 --outdir atac_peaks/,call ATAC-seq peaks by shifting and extending reads around Tn5 cut sites and write results to an output directory,epigenomics
macs2,macs2_09,callpeak -t atac_pe.bam -f BAMPE -g hs -n atac_pe_sample -q 0.05 --outdir atac_pe_peaks/ --verbose 0,call peaks from paired-end ATAC-seq BAM in quiet mode,epigenomics
macs2,macs2_10,callpeak -t atac.bam -f BAM -g hs --nomodel --shift -100 --extsize 200 --keep-dup all -n open_chromatin --outdir atac_out/,call ATAC-seq open-chromatin peaks without a control while keeping all duplicate reads,epigenomics
mafft,mafft_01,--auto --thread 8 proteins.fasta > aligned_proteins.fasta,align multiple protein sequences with automatic algorithm selection,phylogenetics
mafft,mafft_02,--localpair --maxiterate 1000 --thread 8 sequences.fasta > aligned_localpair.fasta,highly accurate multiple sequence alignment for fewer than 200 sequences,phylogenetics
mafft,mafft_03,--auto --adjustdirectionaccurately --thread 8 rna_sequences.fasta > aligned_rna.fasta,align RNA sequences adjusting for strand orientation,phylogenetics
mafft,mafft_04,--auto --thread 8 --phylipout sequences.fasta > aligned.phy,align sequences and output in PHYLIP format for phylogenetic analysis,phylogenetics
mafft,mafft_05,--add new_sequences.fasta --thread 8 existing_alignment.fasta > updated_alignment.fasta,add new sequences to existing alignment,phylogenetics
mafft,mafft_06,--auto --thread 8 proteins.fasta > aligned_proteins.fasta,align multiple protein sequences with automatic algorithm selection,phylogenetics
mafft,mafft_07,--localpair --maxiterate 1000 --thread 8 sequences.fasta > aligned_localpair.fasta,highly accurate multiple sequence alignment for fewer than 200 sequences,phylogenetics
mafft,mafft_08,--auto --adjustdirectionaccurately --thread 8 rna_sequences.fasta > aligned_rna.fasta,align RNA sequences adjusting for strand orientation,phylogenetics
mafft,mafft_09,--auto --thread 8 --phylipout sequences.fasta > aligned.phy,align sequences and output in PHYLIP format for phylogenetic analysis,phylogenetics
mafft,mafft_10,--add new_sequences.fasta --thread 8 existing_alignment.fasta > updated_alignment.fasta,add new sequences to existing alignment with default parameters,phylogenetics
mash,mash_01,sketch -o genomes_db *.fasta,sketch a collection of genome FASTA files into a single database,sequence-utilities
mash,mash_02,dist genome1.fasta genome2.fasta,compute the Mash distance between two genome FASTA files,sequence-utilities
mash,mash_03,dist -p 16 genomes_db.msh query.fasta | sort -k3 -n | head -20,list the 20 genomes in a sketch database closest to a query genome,sequence-utilities
mash,mash_04,sketch -m 2 -s 10000 -o reads_sketch reads.fastq.gz,sketch raw sequencing reads with error filtering,sequence-utilities
mash,mash_05,screen -w -p 8 refdb.msh metagenome.fastq.gz | sort -gr -k1 > screen_results.txt,screen a metagenome for known reference genomes,sequence-utilities
mash,mash_06,triangle -p 16 genomes_db.msh > distances.tsv,compute all-vs-all distance triangle for genome clustering,sequence-utilities
mash,mash_07,sketch -p 4 -o genomes_db *.fasta,sketch a collection of genome FASTA files into a single database using multiple threads,sequence-utilities
mash,mash_08,dist genome1.fasta genome2.fasta > output.txt,compute the Mash distance between two genome FASTA files and write output to a file,sequence-utilities
mash,mash_09,dist -p 16 genomes_db.msh query.fasta | sort -k3 -n | head -20,list the 20 genomes in a sketch database closest to a query genome,sequence-utilities
mash,mash_10,sketch -m 2 -s 10000 -o reads_sketch reads.fastq.gz,sketch raw sequencing reads with 10000 hashes and a minimum k-mer copy count of 2,sequence-utilities
medaka,medaka_01,medaka_consensus -i reads.fastq.gz -d draft_assembly.fasta -o medaka_output/ -t 8 -m r941_min_hac_g507,polish an ONT assembly with Medaka (all-in-one pipeline),variant-calling
medaka,medaka_02,medaka_haploid_variant -i reads.fastq.gz -r reference.fasta -o medaka_variants/ -t 8 -m r941_min_hac_g507,call variants from ONT reads (haploid),variant-calling
medaka,medaka_03,tools list_models,list available Medaka models,variant-calling
medaka,medaka_04,medaka_consensus -i reads.fastq.gz -d draft.fasta -o medaka_gpu/ -t 2 -m r1041_e82_400bps_hac_v4.2.0 --gpu,run Medaka consensus with GPU acceleration,variant-calling
medaka,medaka_05,medaka_consensus -i reads.fastq.gz -d draft_assembly.fasta -o medaka_output/ -t 8 -m r941_min_hac_g507,polish an ONT assembly with Medaka (all-in-one pipeline) with default parameters,variant-calling
medaka,medaka_06,medaka_haploid_variant -i reads.fastq.gz -r reference.fasta -o medaka_variants/ -t 8 -m r941_min_hac_g507 --verbose,call variants from ONT reads (haploid) with verbose output,variant-calling
medaka,medaka_07,tools list_models,list available Medaka models,variant-calling
medaka,medaka_08,medaka_consensus -i reads.fastq.gz -d draft.fasta -o medaka_gpu/ -t 2 -m r1041_e82_400bps_hac_v4.2.0 --gpu,run Medaka consensus with GPU acceleration and write output to a file,variant-calling
medaka,medaka_09,medaka_consensus -i reads.fastq.gz -d draft_assembly.fasta -o medaka_output/ -t 8 -m r941_min_hac_g507 --quiet,polish an ONT assembly with Medaka (all-in-one pipeline) in quiet mode,variant-calling
medaka,medaka_10,medaka_haploid_variant -i reads.fastq.gz -r reference.fasta -o medaka_variants/ -t 8 -m r941_min_hac_g507,call variants from ONT reads (haploid) with default parameters,variant-calling
megahit,megahit_01,-1 R1.fastq.gz -2 R2.fastq.gz -o megahit_output/ --num-cpu-threads 16 --min-contig-len 500,assemble a metagenome from paired-end reads,assembly
megahit,megahit_02,-1 R1.fastq.gz -2 R2.fastq.gz -o large_meta/ --num-cpu-threads 32 --presets meta-large --min-contig-len 500,assemble a large complex metagenome with meta-large preset,assembly
megahit,megahit_03,"-1 s1_R1.fq.gz,s2_R1.fq.gz -2 s1_R2.fq.gz,s2_R2.fq.gz -o coassembly/ --num-cpu-threads 32 --min-contig-len 500",assemble metagenome from multiple samples combined,assembly
megahit,megahit_04,-1 R1.fastq.gz -2 R2.fastq.gz -o custom_k/ --num-cpu-threads 16 --k-min 27 --k-max 127 --k-step 10,assemble with custom k-mer range for specific data type,assembly
megahit,megahit_05,-1 R1.fastq.gz -2 R2.fastq.gz -o megahit_output/ --num-cpu-threads 16 --min-contig-len 500,assemble a metagenome from paired-end reads with default parameters,assembly
megahit,megahit_06,-1 R1.fastq.gz -2 R2.fastq.gz -o large_meta/ --num-cpu-threads 32 --presets meta-large --min-contig-len 500 --verbose,assemble a large complex metagenome with meta-large preset with verbose output,assembly
megahit,megahit_07,"-1 s1_R1.fq.gz,s2_R1.fq.gz -2 s1_R2.fq.gz,s2_R2.fq.gz -o coassembly/ --num-cpu-threads 32 --min-contig-len 500",co-assemble a metagenome from multiple samples using multiple threads,assembly
megahit,megahit_08,-1 R1.fastq.gz -2 R2.fastq.gz -o custom_k/ --num-cpu-threads 16 --k-min 27 --k-max 127 --k-step 10,assemble with custom k-mer range for specific data type and write output to a file,assembly
megahit,megahit_09,-1 R1.fastq.gz -2 R2.fastq.gz -o megahit_output/ --num-cpu-threads 16 --min-contig-len 500 --quiet,assemble a metagenome from paired-end reads in quiet mode,assembly
megahit,megahit_10,-1 R1.fastq.gz -2 R2.fastq.gz -o large_meta/ --num-cpu-threads 32 --presets meta-large --min-contig-len 500,assemble a large complex metagenome with meta-large preset with default parameters,assembly
meme,meme_01,-dna -mod zoops -nmotifs 10 -minw 6 -maxw 20 -oc meme_output peaks.fasta,discover de novo motifs in ChIP-seq peak sequences,utilities
meme,meme_02,fimo --thresh 1e-4 --oc fimo_output $MEME/share/meme/db/motif_databases/JASPAR/JASPAR2022_CORE_vertebrates_non-redundant_v2.meme peaks.fasta,scan sequences for known TF binding motifs with FIMO,utilities
meme,meme_03,tomtom -oc tomtom_output meme_output/meme.xml $MEME/share/meme/db/motif_databases/JASPAR/JASPAR2022_CORE_vertebrates_non-redundant_v2.meme,compare discovered motifs against a known database with TOMTOM,utilities
meme,meme_04,ame --oc ame_output --control shuffled_bg.fasta peaks.fasta $MEME/share/meme/db/motif_databases/HOCOMOCO/HOCOMOCOv11_core_HUMAN_mono_meme_format.meme,test motif enrichment in a foreground vs background with AME,utilities
meme,meme_05,streme --oc streme_output --dna --p peaks.fasta --n shuffled.fasta,run STREME for fast short motif discovery,utilities
meme,meme_06,bedtools getfasta -fi genome.fa -bed peaks.bed -fo peaks.fasta,extract sequences for peak regions using bedtools first,utilities
meme,meme_07,-dna -revcomp -mod zoops -nmotifs 5 -oc meme_rc peaks.fasta,run MEME with reverse complement consideration,utilities
meme,meme_08,-dna -mod zoops -nmotifs 10 -minw 6 -maxw 20 -oc meme_output peaks.fasta,discover de novo motifs in ChIP-seq peak sequences and write results to an output directory,utilities
meme,meme_09,fimo --thresh 1e-4 --oc fimo_output --verbosity 1 $MEME/share/meme/db/motif_databases/JASPAR/JASPAR2022_CORE_vertebrates_non-redundant_v2.meme peaks.fasta,scan sequences for known TF binding motifs with FIMO in quiet mode,utilities
meme,meme_10,tomtom -oc tomtom_output meme_output/meme.xml $MEME/share/meme/db/motif_databases/JASPAR/JASPAR2022_CORE_vertebrates_non-redundant_v2.meme,compare discovered motifs against a known database with TOMTOM with default parameters,utilities
metabat2,metabat2_01,jgi_summarize_bam_contig_depths --outputDepth contig_depths.txt sample1.bam sample2.bam sample3.bam,compute contig depths from BAM files for MetaBAT2,metagenomics
metabat2,metabat2_02,-i assembly.fasta -a contig_depths.txt -o bins/bin -m 2500 -t 8,bin metagenomic assembly contigs into MAGs,metagenomics
metabat2,metabat2_03,-i assembly.fasta -o bins/bin -m 1500 -t 8,run MetaBAT2 binning without coverage information (tetranucleotide only),metagenomics
metabat2,metabat2_04,-i assembly.fasta -a contig_depths.txt -o bins/bin --maxEdges 500 -m 2000 -t 8,bin with increased sensitivity by allowing more graph edges per contig,metagenomics
metabat2,metabat2_05,jgi_summarize_bam_contig_depths --outputDepth contig_depths.txt sample1.bam sample2.bam sample3.bam,compute contig depths from BAM files for MetaBAT2 with default parameters,metagenomics
metabat2,metabat2_06,-i assembly.fasta -a contig_depths.txt -o bins/bin -m 2500 -t 8 --verbose,bin metagenomic assembly contigs into MAGs with verbose output,metagenomics
metabat2,metabat2_07,-i assembly.fasta -o bins/bin -m 1500 -t 8,run MetaBAT2 binning without coverage information (tetranucleotide only) using multiple threads,metagenomics
metabat2,metabat2_08,-i assembly.fasta -a contig_depths.txt -o bins/bin --maxEdges 500 -m 2000 -t 8,bin with increased sensitivity by allowing more graph edges per contig and write bins to FASTA files,metagenomics
metabat2,metabat2_09,jgi_summarize_bam_contig_depths --outputDepth contig_depths.txt sample1.bam sample2.bam sample3.bam --quiet,compute contig depths from BAM files for MetaBAT2 in quiet mode,metagenomics
metabat2,metabat2_10,-i assembly.fasta -a contig_depths.txt -o bins/bin -m 2500 -t 8,bin metagenomic assembly contigs into MAGs with default parameters,metagenomics
metaphlan,metaphlan_01,--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 reads.fastq.gz -o sample_profile.txt,profile microbial community from single-end FASTQ reads,metagenomics
metaphlan,metaphlan_02,"--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 -o sample_profile.txt R1.fastq.gz,R2.fastq.gz",profile paired-end metagenomic reads,metagenomics
metaphlan,metaphlan_03,--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 --bowtie2out sample.bowtie2.bz2 -o sample_profile.txt reads.fastq.gz,save bowtie2 alignments for faster re-runs and profile,metagenomics
metaphlan,metaphlan_04,merge_metaphlan_tables.py sample1_profile.txt sample2_profile.txt sample3_profile.txt > merged_profiles.txt,merge multiple MetaPhlAn profiles into a single table,metagenomics
metaphlan,metaphlan_05,--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 reads.fastq.gz -o sample_profile.txt,profile microbial community from single-end FASTQ reads with default parameters,metagenomics
metaphlan,metaphlan_06,"--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 -o sample_profile.txt R1.fastq.gz,R2.fastq.gz --verbose",profile paired-end metagenomic reads with verbose output,metagenomics
metaphlan,metaphlan_07,--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 --bowtie2out sample.bowtie2.bz2 -o sample_profile.txt reads.fastq.gz,save bowtie2 alignments for faster re-runs and profile using multiple processes,metagenomics
metaphlan,metaphlan_08,merge_metaphlan_tables.py sample1_profile.txt sample2_profile.txt sample3_profile.txt > merged_profiles.txt,merge multiple MetaPhlAn profiles into a single table,metagenomics
metaphlan,metaphlan_09,--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 reads.fastq.gz -o sample_profile.txt --quiet,profile microbial community from single-end FASTQ reads in quiet mode,metagenomics
metaphlan,metaphlan_10,"--input_type fastq --bowtie2db /path/to/mpa_db --index mpa_vJan21_CHOCOPhlAnSGB_202103 --nproc 8 -o sample_profile.txt R1.fastq.gz,R2.fastq.gz",profile paired-end metagenomic reads with default parameters,metagenomics
miniasm,miniasm_01,minimap2 -x ava-ont -t 16 reads.fastq.gz reads.fastq.gz | gzip > overlaps.paf.gz,compute all-vs-all overlaps for ONT reads with minimap2,assembly
miniasm,miniasm_02,-f reads.fastq.gz overlaps.paf.gz > assembly.gfa,assemble ONT reads from precomputed overlaps,assembly
miniasm,miniasm_03,"awk '/^S/ {print "">""$2""\n""$3}' assembly.gfa > assembly.fasta",convert miniasm GFA output to FASTA,assembly
miniasm,miniasm_04,minimap2 -x ava-ont -t 16 reads.fastq.gz reads.fastq.gz | gzip > overlaps.paf.gz,compute all-vs-all overlaps for ONT reads with minimap2,assembly
miniasm,miniasm_05,-f reads.fastq.gz overlaps.paf.gz > assembly.gfa,assemble ONT reads from precomputed overlaps with default parameters,assembly
miniasm,miniasm_06,"awk '/^S/ {print "">""$2""\n""$3}' assembly.gfa > assembly.fasta",convert miniasm GFA output to FASTA,assembly
miniasm,miniasm_07,minimap2 -x ava-ont -t 16 reads.fastq.gz reads.fastq.gz | gzip > overlaps.paf.gz,compute all-vs-all overlaps for ONT reads with minimap2,assembly
miniasm,miniasm_08,-f reads.fastq.gz overlaps.paf.gz > assembly.gfa,assemble ONT reads from precomputed overlaps,assembly
miniasm,miniasm_09,"awk '/^S/ {print "">""$2""\n""$3}' assembly.gfa > assembly.fasta",convert miniasm GFA output to FASTA,assembly
miniasm,miniasm_10,minimap2 -x ava-ont -t 16 reads.fastq.gz reads.fastq.gz | gzip > overlaps.paf.gz,compute all-vs-all overlaps for ONT reads with minimap2,assembly
minimap2,minimap2_01,-ax map-ont -t 8 reference.fa nanopore_reads.fastq.gz | samtools sort -@ 4 -o aligned_sorted.bam,align Oxford Nanopore reads to a reference genome,alignment
minimap2,minimap2_02,-ax map-hifi -t 8 reference.fa hifi_reads.fastq.gz | samtools sort -@ 4 -o hifi_aligned.bam,align PacBio HiFi (CCS) reads to a reference genome,alignment
minimap2,minimap2_03,-ax splice -t 8 --junc-bed known_junctions.bed reference.fa rna_reads.fastq.gz | samtools sort -o rna_aligned.bam,align Nanopore cDNA reads for RNA-seq spliced alignment,alignment
minimap2,minimap2_04,-ax asm5 -t 8 reference.fa assembly.fa | samtools sort -o assembly_vs_ref.bam,compare two genome assemblies (assembly vs reference),alignment
minimap2,minimap2_05,-x map-ont -t 8 -c reference.fa reads.fastq.gz > aligned.paf,map long reads and output in PAF format for structural variant analysis,alignment
minimap2,minimap2_06,-x ava-ont -t 16 reads.fastq.gz reads.fastq.gz | gzip > overlaps.paf.gz,compute all-vs-all overlaps for de novo ONT assembly,alignment
minimap2,minimap2_07,-d reference_ont.mmi -x map-ont reference.fa,build a reusable minimap2 index for repeated ONT alignments,alignment
minimap2,minimap2_08,-ax map-ont -t 8 reference.fa nanopore_reads.fastq.gz | samtools sort -@ 4 -o aligned_sorted.bam,align Oxford Nanopore reads to a reference genome,alignment
minimap2,minimap2_09,-ax map-hifi -t 8 reference.fa hifi_reads.fastq.gz | samtools sort -@ 4 -o hifi_aligned.bam,align PacBio HiFi (CCS) reads to a reference genome,alignment
minimap2,minimap2_10,-ax splice -t 8 --junc-bed known_junctions.bed reference.fa rna_reads.fastq.gz | samtools sort -o rna_aligned.bam,align Nanopore cDNA reads for RNA-seq spliced alignment with default parameters,alignment
mmseqs2,mmseqs2_01,easy-search query.fasta uniref50.fasta results.m8 tmp --format-mode 0 --threads 16 -s 7.5,search protein FASTA against UniRef50 and output BLAST tabular results,sequence-utilities
mmseqs2,mmseqs2_02,easy-cluster proteins.fasta cluster_90 tmp --min-seq-id 0.9 -c 0.8 --cov-mode 0 --threads 16,cluster protein sequences at 90% identity,sequence-utilities
mmseqs2,mmseqs2_03,easy-linclust proteins.fasta cluster_50 tmp --min-seq-id 0.5 -c 0.8 --threads 32,fast linear-time clustering of large metagenomic protein set at 50% identity,sequence-utilities
mmseqs2,mmseqs2_04,createdb proteins.fasta proteinsDB,build an MMseqs2 database from a FASTA file,sequence-utilities
mmseqs2,mmseqs2_05,search queryDB targetDB resultDB tmp -s 6 --threads 16 && mmseqs convertalis queryDB targetDB resultDB results.tsv --format-mode 4,search one MMseqs2 DB against another and convert results to TSV,sequence-utilities
mmseqs2,mmseqs2_06,result2repseq proteinsDB cluster_result repseqDB && mmseqs convert2fasta repseqDB representatives.fasta,extract representative sequences from a cluster result,sequence-utilities
mmseqs2,mmseqs2_07,easy-search reads.fasta proteins.fasta hits.m8 tmp --search-type 2 --threads 16,perform translated nucleotide-to-protein search,sequence-utilities
mmseqs2,mmseqs2_08,easy-search query.fasta uniref50.fasta results.m8 tmp --format-mode 0 --threads 16 -s 7.5,search protein FASTA against UniRef50 and write BLAST-tabular results to results.m8,sequence-utilities
mmseqs2,mmseqs2_09,easy-cluster proteins.fasta cluster_90 tmp --min-seq-id 0.9 -c 0.8 --cov-mode 0 --threads 16 -v 1,cluster protein sequences at 90% identity reporting only errors,sequence-utilities
mmseqs2,mmseqs2_10,easy-linclust proteins.fasta cluster_50 tmp --min-seq-id 0.5 -c 0.8 --threads 32,fast linear-time clustering of large metagenomic protein set at 50% identity with default parameters,sequence-utilities
modkit,modkit_01,pileup --ref reference.fasta --mod-code m --cpg input.bam output.bedmethyl --threads 16,generate a bedMethyl pileup of 5mC methylation from a BAM file,epigenomics
modkit,modkit_02,pileup --ref reference.fasta --cpg --combine-strands -t 16 input.bam output_combined.bedmethyl,generate bedMethyl and combine CpG sites on both strands,epigenomics
modkit,modkit_03,extract --ref reference.fasta --mod-code m input.bam per_read_mods.tsv --threads 16,extract per-read modification data to TSV,epigenomics
modkit,modkit_04,summary input.bam --threads 8,get a summary of modification calls in a BAM,epigenomics
modkit,modkit_05,motif-bed reference.fasta CG 0 > cpg_positions.bed,generate a BED file of all CpG positions in a reference for use as motif targets,epigenomics
modkit,modkit_06,pileup --ref reference.fasta --region chr1:1-10000000 --mod-code m input.bam region_output.bedmethyl --threads 8,pileup restricted to a specific genomic region,epigenomics
modkit,modkit_07,sample-probs --mod-code m input.bam --threads 8,sample modification probabilities to assess threshold distribution,epigenomics
modkit,modkit_08,pileup --ref reference.fasta --mod-code m --cpg input.bam output.bedmethyl --threads 16,generate a bedMethyl pileup of 5mC methylation from a BAM file and write it to output.bedmethyl,epigenomics
modkit,modkit_09,pileup --ref reference.fasta --cpg --combine-strands -t 16 input.bam output_combined.bedmethyl --quiet,generate bedMethyl and combine CpG sites on both strands in quiet mode,epigenomics
modkit,modkit_10,extract --ref reference.fasta --mod-code m input.bam per_read_mods.tsv --threads 16,extract per-read modification data to TSV with default parameters,epigenomics
mosdepth,mosdepth_01,--by 500 -t 8 sample_coverage sample_sorted.bam,calculate genome-wide depth of coverage in 500bp windows,utilities
mosdepth,mosdepth_02,--by targets.bed -t 4 wes_coverage sample_sorted.bam,calculate coverage over target regions for WES,utilities
mosdepth,mosdepth_03,-t 4 -Q 20 -F 1796 filtered_coverage sample_sorted.bam,calculate per-base depth with MAPQ filter,utilities
mosdepth,mosdepth_04,-n -t 8 summary_only sample_sorted.bam,get summary statistics only without per-base output,utilities
mosdepth,mosdepth_05,--by 500 -t 8 sample_coverage sample_sorted.bam,calculate genome-wide depth of coverage in 500bp windows with default parameters,utilities
mosdepth,mosdepth_06,--by targets.bed -t 4 wes_coverage sample_sorted.bam --verbose,calculate coverage over target regions for WES with verbose output,utilities
mosdepth,mosdepth_07,-t 4 -Q 20 -F 1796 filtered_coverage sample_sorted.bam,calculate per-base depth with MAPQ filter using multiple threads,utilities
mosdepth,mosdepth_08,-n -t 8 summary_only sample_sorted.bam,get summary statistics only without per-base output (written to summary_only.mosdepth.summary.txt),utilities
mosdepth,mosdepth_09,--by 500 -t 8 sample_coverage sample_sorted.bam --quiet,calculate genome-wide depth of coverage in 500bp windows in quiet mode,utilities
mosdepth,mosdepth_10,--by targets.bed -t 4 wes_coverage sample_sorted.bam,calculate coverage over target regions for WES with default parameters,utilities
multiqc,multiqc_01,. -o multiqc_report/ -f,aggregate all QC results from the current directory into a single report,qc
multiqc,multiqc_02,/path/to/results/ -o /path/to/qc_summary/ -n project_qc_report -f,aggregate QC results from a specific results directory,qc
multiqc,multiqc_03,/results/ --ignore /results/old_run/ -o multiqc_output/ -f,run multiqc ignoring a specific subdirectory,qc
multiqc,multiqc_04,. --flat -o flat_report/ -f,generate a multiqc report with flat (non-interactive) output suitable for PDF,qc
multiqc,multiqc_05,fastqc_results/ trimmomatic_logs/ -o summary_qc/ -f,run multiqc on only FastQC and Trimmomatic outputs,qc
multiqc,multiqc_06,. -o multiqc_report/ -f --verbose,aggregate all QC results from the current directory into a single report with verbose output,qc
multiqc,multiqc_07,/path/to/results/ -o /path/to/qc_summary/ -n project_qc_report -f,aggregate QC results from a specific results directory into a named report,qc
multiqc,multiqc_08,/results/ --ignore /results/old_run/ -o multiqc_output/ -f,run multiqc ignoring a specific subdirectory and write the report to multiqc_output/,qc
multiqc,multiqc_09,. --flat -o flat_report/ -f --quiet,generate a multiqc report with flat (non-interactive) output suitable for PDF in quiet mode,qc
multiqc,multiqc_10,fastqc_results/ trimmomatic_logs/ -o summary_qc/ -f,run multiqc on only FastQC and Trimmomatic outputs with default parameters,qc
mummer,mummer_01,nucmer --prefix=myrun reference.fna query.fna,align a query genome to a reference genome,alignment
mummer,mummer_02,dnadiff reference.fna query.fna,generate a comprehensive pairwise genome comparison report,alignment
mummer,mummer_03,delta-filter -1 myrun.delta > myrun.filtered.delta && show-snps -Clr myrun.filtered.delta > myrun.snps,filter alignments to 1-to-1 (unique) and extract SNPs,alignment
mummer,mummer_04,show-coords -r -c -l myrun.delta > myrun.coords,show alignment coordinates,alignment
mummer,mummer_05,mummerplot --png --prefix=dotplot myrun.delta,generate a synteny dot-plot image,alignment
mummer,mummer_06,nucmer --mum -p compare reference.fa query.fa && show-snps -Clrx compare.delta,compare two genomes with verbose SNP output,alignment
mummer,mummer_07,nucmer -c 100 -l 20 --prefix large_genome ref.fa query.fa,align with a custom minimum match length,alignment
mummer,mummer_08,nucmer --prefix=myrun reference.fna query.fna,align a query genome to a reference genome and write alignments to myrun.delta,alignment
mummer,mummer_09,dnadiff reference.fna query.fna,generate a comprehensive pairwise genome comparison report,alignment
mummer,mummer_10,delta-filter -1 myrun.delta > myrun.filtered.delta && show-snps -Clr myrun.filtered.delta > myrun.snps,filter alignments to 1-to-1 (unique) and extract SNPs with default parameters,alignment
muscle,muscle_01,-align proteins.fasta -output aligned_proteins.fasta -threads 8,align multiple protein sequences with MUSCLE v5,phylogenetics
muscle,muscle_02,-super5 large_dataset.fasta -output large_aligned.fasta -threads 16,align a large dataset with MUSCLE v5 super5 mode,phylogenetics
muscle,muscle_03,-in sequences.fasta -out aligned.fasta,align sequences with MUSCLE v3 syntax (legacy),phylogenetics
muscle,muscle_04,-align sequences.fasta -output aligned.fasta -replicates 5 -threads 8,generate multiple alignment replicates for uncertainty estimation,phylogenetics
muscle,muscle_05,-align proteins.fasta -output aligned_proteins.fasta -threads 8,align multiple protein sequences with MUSCLE v5 with default parameters,phylogenetics
muscle,muscle_06,-super5 large_dataset.fasta -output large_aligned.fasta -threads 16,align a large dataset with MUSCLE v5 super5 mode,phylogenetics
muscle,muscle_07,-in sequences.fasta -out aligned.fasta,align sequences with MUSCLE v3 syntax (legacy),phylogenetics
muscle,muscle_08,-align sequences.fasta -output aligned.fasta -replicates 5 -threads 8,generate multiple alignment replicates for uncertainty estimation,phylogenetics
muscle,muscle_09,-align proteins.fasta -output aligned_proteins.fasta -threads 8,align multiple protein sequences with MUSCLE v5,phylogenetics
muscle,muscle_10,-super5 large_dataset.fasta -output large_aligned.fasta -threads 16,align a large dataset with MUSCLE v5 super5 mode with default parameters,phylogenetics
nextflow,nextflow_01,"run nf-core/rnaseq -profile singularity,slurm --input samplesheet.csv --genome GRCh38 -resume",run an nf-core pipeline with Singularity on a Slurm cluster,workflow-manager
nextflow,nextflow_02,run main.nf -work-dir /scratch/$USER/nxf-work,run a pipeline with a custom work directory,workflow-manager
nextflow,nextflow_03,pull nf-core/sarek -revision 3.4.0,pull a specific pipeline version from nf-core,workflow-manager
nextflow,nextflow_04,run main.nf -resume,resume a failed pipeline run,workflow-manager
nextflow,nextflow_05,run nf-core/chipseq -c custom.config --input samplesheet.csv,run pipeline with a custom config file,workflow-manager
nextflow,nextflow_06,list,show the list of cached pipeline assets,workflow-manager
nextflow,nextflow_07,clean -f -but last,clean up work directory keeping only the last run's intermediate files,workflow-manager
nextflow,nextflow_08,run nf-core/rnaseq -profile singularity --singularity.cacheDir /shared/singularity-cache --input samplesheet.csv,run a pipeline with Singularity image cache set,workflow-manager
nextflow,nextflow_09,-version,check Nextflow version and environment,workflow-manager
nextflow,nextflow_10,run main.nf -with-report report.html -with-timeline timeline.html,generate a run report and timeline,workflow-manager
orthofinder,orthofinder_01,-f proteomes/ -t 32 -a 8,run OrthoFinder on a directory of species proteomes,comparative-genomics
orthofinder,orthofinder_02,-f proteomes/ -M msa -S diamond -A mafft -T iqtree -t 32 -a 8,run OrthoFinder with MSA-based gene trees using MAFFT and IQ-TREE,comparative-genomics
orthofinder,orthofinder_03,-f proteomes/ -og -t 32,infer orthogroups only without gene trees for fast proteome comparison,comparative-genomics
orthofinder,orthofinder_04,-b proteomes/OrthoFinder/Results_Jan01/ -f new_species/ -t 32 -a 8,restart OrthoFinder from existing DIAMOND results (add a new species),comparative-genomics
orthofinder,orthofinder_05,-f proteomes/ -S mmseqs -t 32 -a 8,use MMseqs2 instead of DIAMOND for faster all-vs-all search,comparative-genomics
orthofinder,orthofinder_06,-f proteomes/ -o results/orthofinder_run -t 32 -a 8,run OrthoFinder with a fixed output directory name,comparative-genomics
orthofinder,orthofinder_07,-f proteomes/ -t 32 -a 8,run OrthoFinder on a directory of species proteomes using multiple threads,comparative-genomics
orthofinder,orthofinder_08,-f proteomes/ -M msa -S diamond -A mafft -T iqtree -t 32 -a 8,run OrthoFinder with MSA-based gene trees using MAFFT and IQ-TREE,comparative-genomics
orthofinder,orthofinder_09,-f proteomes/ -og -t 32,infer orthogroups only without gene trees for fast proteome comparison,comparative-genomics
orthofinder,orthofinder_10,-b proteomes/OrthoFinder/Results_Jan01/ -f new_species/ -t 32 -a 8,restart OrthoFinder from existing DIAMOND results (add a new species) with default parameters,comparative-genomics
pairtools,pairtools_01,parse --min-mapq 30 --walks-policy mask --max-inter-align-gap 30 -N sample --chroms-path chromsizes.txt sorted.bam > sample.pairs.gz,parse Hi-C BWA alignments to pairs format,epigenomics
pairtools,pairtools_02,sort sample.pairs.gz --nproc 16 --tmpdir /tmp/ > sample_sorted.pairs.gz,sort pairs file for deduplication,epigenomics
pairtools,pairtools_03,dedup --nproc 16 --output-stats dedup_stats.txt sample_sorted.pairs.gz > sample_dedup.pairs.gz,deduplicate sorted pairs file,epigenomics
pairtools,pairtools_04,cload pairs --chrom1 2 --pos1 3 --chrom2 4 --pos2 5 chromsizes.txt:5000 sample_dedup.pairs.gz sample_5kb.cool,bin pairs into contact matrix using cooler,epigenomics
pairtools,pairtools_05,parse --min-mapq 30 --walks-policy mask --max-inter-align-gap 30 -N sample --chroms-path chromsizes.txt sorted.bam > sample.pairs.gz,parse Hi-C BWA alignments to pairs format with default parameters,epigenomics
pairtools,pairtools_06,sort sample.pairs.gz --nproc 16 --tmpdir /tmp/ > sample_sorted.pairs.gz,sort pairs file for deduplication,epigenomics
pairtools,pairtools_07,dedup --nproc 16 --output-stats dedup_stats.txt sample_sorted.pairs.gz > sample_dedup.pairs.gz,deduplicate sorted pairs file,epigenomics
pairtools,pairtools_08,cload pairs --chrom1 2 --pos1 3 --chrom2 4 --pos2 5 chromsizes.txt:5000 sample_dedup.pairs.gz sample_5kb.cool,bin pairs into a 5 kb contact matrix using cooler,epigenomics
pairtools,pairtools_09,parse --min-mapq 30 --walks-policy mask --max-inter-align-gap 30 -N sample --chroms-path chromsizes.txt sorted.bam > sample.pairs.gz,parse Hi-C BWA alignments to pairs format,epigenomics
pairtools,pairtools_10,sort sample.pairs.gz --nproc 16 --tmpdir /tmp/ > sample_sorted.pairs.gz,sort pairs file for deduplication with default parameters,epigenomics
pbfusion,pbfusion_01,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --threads 8,detect gene fusions from PacBio IsoSeq aligned data,rna-seq
pbfusion,pbfusion_02,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --min-support 3 --threads 8,detect fusions with minimum supporting reads,rna-seq
pbfusion,pbfusion_03,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --threads 8,detect gene fusions from PacBio IsoSeq aligned data and write results to fusion_output/,rna-seq
pbfusion,pbfusion_04,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --min-support 3 --threads 8 --quiet,detect fusions with minimum supporting reads in quiet mode,rna-seq
pbfusion,pbfusion_05,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --threads 8,detect gene fusions from PacBio IsoSeq aligned data with default parameters,rna-seq
pbfusion,pbfusion_06,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --min-support 3 --threads 8 --verbose,detect fusions with minimum supporting reads with verbose output,rna-seq
pbfusion,pbfusion_07,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --threads 8,detect gene fusions from PacBio IsoSeq aligned data using multiple threads,rna-seq
pbfusion,pbfusion_08,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --min-support 3 --threads 8,detect fusions with minimum supporting reads and write results to fusion_output/,rna-seq
pbfusion,pbfusion_09,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --threads 8 --quiet,detect gene fusions from PacBio IsoSeq aligned data in quiet mode,rna-seq
pbfusion,pbfusion_10,--bam isoseq_aligned.bam --gtf genes.gtf --output-dir fusion_output/ --min-support 3 --threads 8,detect fusions with minimum supporting reads with default parameters,rna-seq
pbmm2,pbmm2_01,align --preset HIFI --sort -j 16 --sort-threads 4 reference.fa hifi_reads.bam aligned_sorted.bam,align PacBio HiFi reads to reference genome,alignment
pbmm2,pbmm2_02,align --preset ISOSEQ --sort -j 8 reference.fa isoseq_reads.bam isoseq_aligned.bam,align PacBio IsoSeq transcriptome reads,alignment
pbmm2,pbmm2_03,index reference.fa reference.mmi,index reference genome for repeated pbmm2 use,alignment
pbmm2,pbmm2_04,align --preset SUBREAD --sort -j 16 reference.fa subreads.bam clr_aligned.bam,align CLR (subread) PacBio reads,alignment
pbmm2,pbmm2_05,align --preset HIFI --sort -j 16 --sort-threads 4 reference.fa hifi_reads.bam aligned_sorted.bam,align PacBio HiFi reads to reference genome with default parameters,alignment
pbmm2,pbmm2_06,align --preset ISOSEQ --sort -j 8 reference.fa isoseq_reads.bam isoseq_aligned.bam,align PacBio IsoSeq transcriptome reads,alignment
pbmm2,pbmm2_07,index -j 4 reference.fa reference.mmi,index reference genome for repeated pbmm2 use using multiple threads,alignment
pbmm2,pbmm2_08,align --preset SUBREAD --sort -j 16 reference.fa subreads.bam clr_aligned.bam,align CLR (subread) PacBio reads and write sorted alignments to clr_aligned.bam,alignment
pbmm2,pbmm2_09,align --preset HIFI --sort -j 16 --sort-threads 4 reference.fa hifi_reads.bam aligned_sorted.bam,align PacBio HiFi reads to reference genome,alignment
pbmm2,pbmm2_10,align --preset ISOSEQ --sort -j 8 reference.fa isoseq_reads.bam isoseq_aligned.bam,align PacBio IsoSeq transcriptome reads with default parameters,alignment
pbsv,pbsv_01,discover --hifi sorted.bam sample.svsig.gz,discover SV signatures from PacBio HiFi aligned BAM,variant-calling
pbsv,pbsv_02,call --hifi reference.fa sample.svsig.gz output_svs.vcf,call SVs from a single sample's signature file,variant-calling
pbsv,pbsv_03,call --hifi reference.fa sample1.svsig.gz sample2.svsig.gz sample3.svsig.gz cohort_svs.vcf,call SVs jointly from multiple samples,variant-calling
pbsv,pbsv_04,discover --hifi --tandem-repeats hg38.trf.bed sorted.bam sample.svsig.gz,discover with tandem repeat annotation for better accuracy,variant-calling
pbsv,pbsv_05,discover --hifi sorted.bam sample.svsig.gz,discover SV signatures from PacBio HiFi aligned BAM with default parameters,variant-calling
pbsv,pbsv_06,call --hifi reference.fa sample.svsig.gz output_svs.vcf,call SVs from a single sample's signature file,variant-calling
pbsv,pbsv_07,call --hifi -j 4 reference.fa sample1.svsig.gz sample2.svsig.gz sample3.svsig.gz cohort_svs.vcf,call SVs jointly from multiple samples using multiple threads,variant-calling
pbsv,pbsv_08,discover --hifi --tandem-repeats hg38.trf.bed sorted.bam sample.svsig.gz,discover with tandem repeat annotation for better accuracy and write signatures to sample.svsig.gz,variant-calling
pbsv,pbsv_09,discover --hifi sorted.bam sample.svsig.gz,discover SV signatures from PacBio HiFi aligned BAM,variant-calling
pbsv,pbsv_10,call --hifi reference.fa sample.svsig.gz output_svs.vcf,call SVs from a single sample's signature file with default parameters,variant-calling
perl,perl_01,script.pl input.txt,run a Perl script,programming
perl,perl_02,-V,print the Perl version and module search paths,programming
perl,perl_03,-ne 'print if /^>/' sequences.fasta,one-liner: print lines matching a pattern,programming
perl,perl_04,"-lane 'print join(""\t"", @F[0,2,4])' data.tsv",one-liner: extract specific columns from a TSV,programming
perl,perl_05,-i.bak -pe 's/chr/Chr/g' genome.fa,in-place substitution (edit file directly),programming
perl,perl_06,"-ne '$c++ if /^>/; END { print ""$c sequences\n"" }' input.fa",count FASTA sequences in a file,programming
perl,perl_07,"-MCPAN -e 'CPAN::Shell->install(""Bio::SeqIO"")'",install a module via CPAN one-liner,programming
perl,perl_08,-Mlocal::lib,print the environment settings for a local::lib user-space Perl module directory,programming
perl,perl_09,-MBio::SeqIO -e 1,check if a required module is installed,programming
perl,perl_10,-I /path/to/bioperl-lib script.pl input.fasta,run a bioinformatics script with a custom library path,programming
picard,picard_01,MarkDuplicates -I sorted.bam -O marked_dup.bam -M markdup_metrics.txt --CREATE_INDEX true,mark PCR duplicates in a sorted BAM file,alignment
picard,picard_02,AddOrReplaceReadGroups -I input.bam -O rg_added.bam --RGLB lib1 --RGPL ILLUMINA --RGPU unit1 --RGSM sample1 --CREATE_INDEX true,add or replace read groups in a BAM file,alignment
picard,picard_03,SortSam -I input.bam -O sorted.bam --SORT_ORDER coordinate --CREATE_INDEX true,sort a BAM file by coordinate using Picard,alignment
picard,picard_04,CollectAlignmentSummaryMetrics -I aligned.bam -O alignment_metrics.txt -R reference.fa,collect alignment summary metrics from a BAM file,alignment
picard,picard_05,CollectInsertSizeMetrics -I sorted.bam -O insert_size_metrics.txt -H insert_size_histogram.pdf,collect insert size distribution metrics from paired-end BAM,alignment
picard,picard_06,SortSam -I input.sam -O sorted.bam --SORT_ORDER coordinate --CREATE_INDEX true,convert SAM to sorted BAM with index,alignment
picard,picard_07,ValidateSamFile -I input.bam -O validation_report.txt --MODE SUMMARY,validate a BAM file for GATK compatibility,alignment
picard,picard_08,MarkDuplicates -I sorted.bam -O marked_dup.bam -M markdup_metrics.txt --CREATE_INDEX true,mark PCR duplicates in a sorted BAM file and write duplicate metrics to markdup_metrics.txt,alignment
picard,picard_09,AddOrReplaceReadGroups -I input.bam -O rg_added.bam --RGLB lib1 --RGPL ILLUMINA --RGPU unit1 --RGSM sample1 --CREATE_INDEX true --QUIET true,add or replace read groups in a BAM file in quiet mode,alignment
picard,picard_10,SortSam -I input.bam -O sorted.bam --SORT_ORDER coordinate --CREATE_INDEX true,sort a BAM file by coordinate using Picard with default parameters,alignment
pilon,pilon_01,-Xmx64g -jar pilon.jar --genome draft.fasta --frags aligned.sorted.bam --output polished --changes --threads 16,polish a draft assembly with paired-end Illumina reads,assembly
pilon,pilon_02,-Xmx128g -jar pilon.jar --genome draft.fasta --frags pe.sorted.bam --jumps mp.sorted.bam --output polished_v2 --threads 16,polish with mate-pair and paired-end libraries combined,assembly
pilon,pilon_03,-Xmx64g -jar pilon.jar --genome draft.fasta --frags aligned.sorted.bam --output polished --fix bases --threads 16,run Pilon fixing only SNPs and small indels (not structural),assembly
pilon,pilon_04,-Xmx64g -jar pilon.jar --genome draft.fasta --frags aligned.sorted.bam --output variants --variant --threads 16,generate a VCF of variants found in the assembly,assembly
pilon,pilon_05,-Xmx32g -jar pilon.jar --genome contigs.fasta --frags aligned.sorted.bam --output polished_contigs --targets contig_list.txt --threads 8,"polish a specific set of sequences (e.g., unplaced contigs only)",assembly
pilon,pilon_06,-Xmx64g -jar pilon.jar --genome polished.fasta --frags re_aligned.sorted.bam --output polished_r2 --changes --threads 16,second round of polishing after re-aligning reads to first round output,assembly
pilon,pilon_07,-Xmx64g -jar pilon.jar --genome draft.fasta --frags aligned.sorted.bam --output polished --changes --threads 16,polish a draft assembly with paired-end Illumina reads using multiple threads,assembly
pilon,pilon_08,-Xmx128g -jar pilon.jar --genome draft.fasta --frags pe.sorted.bam --jumps mp.sorted.bam --output polished_v2 --threads 16,polish with mate-pair and paired-end libraries combined and write the result to polished_v2.fasta,assembly
pilon,pilon_09,-Xmx64g -jar pilon.jar --genome draft.fasta --frags aligned.sorted.bam --output polished --fix bases --threads 16,run Pilon fixing only SNPs and small indels (not structural),assembly
pilon,pilon_10,-Xmx64g -jar pilon.jar --genome draft.fasta --frags aligned.sorted.bam --output variants --variant --threads 16,generate a VCF of variants found in the assembly with default parameters,assembly
plink2,plink2_01,"--vcf variants.vcf --make-pgen --out plink_dataset --set-missing-var-ids @:#[b37]\$r,\$a --max-alleles 2",convert VCF to PLINK2 binary format,population-genomics
plink2,plink2_02,--pfile plink_dataset --maf 0.01 --geno 0.05 --mind 0.1 --hwe 1e-6 --make-pgen --out qc_filtered,perform quality control filtering on PLINK dataset,population-genomics
plink2,plink2_03,--pfile plink_dataset --indep-pairwise 50 10 0.1 --out ld_prune && plink2 --pfile plink_dataset --extract ld_prune.prune.in --pca 20 --out pca_results,perform LD pruning and compute PCA,population-genomics
plink2,plink2_04,--pfile plink_dataset --pheno phenotypes.txt --pheno-name case_control --covar covariates.txt --glm hide-covar --out gwas_results,run genome-wide association study (GWAS) for binary phenotype,population-genomics
plink2,plink2_05,--pfile plink_dataset --extract ld_prune.prune.in --make-king-table --out kinship_matrix,compute kinship/relatedness matrix,population-genomics
plink2,plink2_06,"--vcf variants.vcf --make-pgen --out plink_dataset --set-missing-var-ids @:#[b37]\$r,\$a --max-alleles 2",convert VCF to PLINK2 binary format,population-genomics
plink2,plink2_07,--pfile plink_dataset --maf 0.01 --geno 0.05 --mind 0.1 --hwe 1e-6 --make-pgen --out qc_filtered --threads 4,perform quality control filtering on PLINK dataset using multiple threads,population-genomics
plink2,plink2_08,--pfile plink_dataset --indep-pairwise 50 10 0.1 --out ld_prune && plink2 --pfile plink_dataset --extract ld_prune.prune.in --pca 20 --out pca_results,perform LD pruning and compute PCA and write eigenvectors to pca_results.eigenvec,population-genomics
plink2,plink2_09,--pfile plink_dataset --pheno phenotypes.txt --pheno-name case_control --covar covariates.txt --glm hide-covar --out gwas_results --silent,run genome-wide association study (GWAS) for binary phenotype in quiet mode,population-genomics
plink2,plink2_10,--pfile plink_dataset --extract ld_prune.prune.in --make-king-table --out kinship_matrix,compute kinship/relatedness matrix with default parameters,population-genomics
porechop,porechop_01,-i reads.fastq.gz -o trimmed_reads.fastq.gz --threads 8,trim adapters from Oxford Nanopore FASTQ reads,qc
porechop,porechop_02,-i reads.fastq.gz -o trimmed_no_chimeras.fastq.gz --discard_middle --threads 8,trim adapters and remove chimeric reads,qc
porechop,porechop_03,-i barcoded_reads.fastq.gz -b demultiplexed_reads/ --threads 8,demultiplex barcoded ONT reads into separate files,qc
porechop,porechop_04,-i reads.fastq.gz -o trimmed.fastq.gz --min_split_read_size 1000 --threads 8,trim adapters and set the minimum size of split read pieces to keep,qc
porechop,porechop_05,-i reads.fastq.gz -o trimmed_reads.fastq.gz --threads 8,trim adapters from Oxford Nanopore FASTQ reads with default parameters,qc
porechop,porechop_06,-i reads.fastq.gz -o trimmed_no_chimeras.fastq.gz --discard_middle --threads 8 --verbosity 2,trim adapters and remove chimeric reads with verbose output,qc
porechop,porechop_07,-i barcoded_reads.fastq.gz -b demultiplexed_reads/ --threads 8,demultiplex barcoded ONT reads into separate files using multiple threads,qc
porechop,porechop_08,-i reads.fastq.gz -o trimmed.fastq.gz --min_split_read_size 1000 --threads 8,trim adapters with a minimum split-read piece size and write trimmed reads to trimmed.fastq.gz,qc
porechop,porechop_09,-i reads.fastq.gz -o trimmed_reads.fastq.gz --threads 8 --verbosity 0,trim adapters from Oxford Nanopore FASTQ reads in quiet mode,qc
porechop,porechop_10,-i reads.fastq.gz -o trimmed_no_chimeras.fastq.gz --discard_middle --threads 8,trim adapters and remove chimeric reads with default parameters,qc
prodigal,prodigal_01,-i genome.fasta -a proteins.faa -d cds.fna -f gff -o gene_predictions.gff,predict genes in a bacterial genome and output protein and GFF files,annotation
prodigal,prodigal_02,-i metagenomic_contigs.fasta -a meta_proteins.faa -d meta_cds.fna -f gff -o meta_genes.gff -p meta,predict genes in metagenomic contigs,annotation
prodigal,prodigal_03,-i mycoplasma_genome.fasta -a mycoplasma_proteins.faa -f gff -o mycoplasma_genes.gff -g 4,predict genes with non-standard genetic code (Mycoplasma),annotation
prodigal,prodigal_04,-i assembly.fasta -a proteins.faa -f gbk -o predictions.gbk,predict genes and output in GenBank format for import into annotation tools,annotation
prodigal,prodigal_05,-i genome.fasta -a proteins.faa -d cds.fna -f gff -o gene_predictions.gff,predict genes in a bacterial genome and output protein and GFF files with default parameters,annotation
prodigal,prodigal_06,-i metagenomic_contigs.fasta -a meta_proteins.faa -d meta_cds.fna -f gff -o meta_genes.gff -p meta,predict genes in metagenomic contigs,annotation
prodigal,prodigal_07,-i mycoplasma_genome.fasta -a mycoplasma_proteins.faa -f gff -o mycoplasma_genes.gff -g 4,predict genes with non-standard genetic code (Mycoplasma),annotation
prodigal,prodigal_08,-i assembly.fasta -a proteins.faa -f gbk -o predictions.gbk,predict genes and write GenBank-format output to predictions.gbk for import into annotation tools,annotation
prodigal,prodigal_09,-i genome.fasta -a proteins.faa -d cds.fna -f gff -o gene_predictions.gff -q,predict genes in a bacterial genome and output protein and GFF files in quiet mode,annotation
prodigal,prodigal_10,-i metagenomic_contigs.fasta -a meta_proteins.faa -d meta_cds.fna -f gff -o meta_genes.gff -p meta,predict genes in metagenomic contigs with default parameters,annotation
prokka,prokka_01,--kingdom Bacteria --genus Escherichia --species coli --strain K12 --cpus 8 --outdir prokka_output --prefix ecoli_K12 assembly.fasta,annotate a bacterial genome assembly,metagenomics
prokka,prokka_02,--metagenome --cpus 8 --outdir mag_annotation --prefix bin001 bin001_contigs.fasta,annotate a metagenome-assembled genome (MAG),metagenomics
prokka,prokka_03,--kingdom Archaea --cpus 8 --outdir archaea_output --prefix archaea_sample archaea_assembly.fasta,annotate archaea genome,metagenomics
prokka,prokka_04,--kingdom Bacteria --proteins custom_proteins.faa --cpus 8 --outdir custom_annotation --prefix sample genome.fasta,annotate genome with custom protein database for improved annotation,metagenomics
prokka,prokka_05,--kingdom Bacteria --locustag MYORG --cpus 8 --outdir annotated --prefix genome_v1 assembly.fasta,annotate genome and add specific locus tag prefix,metagenomics
prokka,prokka_06,--kingdom Bacteria --genus Escherichia --species coli --strain K12 --cpus 8 --outdir prokka_output --prefix ecoli_K12 assembly.fasta,annotate a bacterial genome assembly,metagenomics
prokka,prokka_07,--metagenome --cpus 8 --outdir mag_annotation --prefix bin001 bin001_contigs.fasta,annotate a metagenome-assembled genome (MAG) using multiple threads,metagenomics
prokka,prokka_08,--kingdom Archaea --cpus 8 --outdir archaea_output --prefix archaea_sample archaea_assembly.fasta,annotate archaea genome and write all output files to archaea_output/,metagenomics
prokka,prokka_09,--kingdom Bacteria --proteins custom_proteins.faa --cpus 8 --outdir custom_annotation --prefix sample genome.fasta --quiet,annotate genome with custom protein database for improved annotation in quiet mode,metagenomics
prokka,prokka_10,--kingdom Bacteria --locustag MYORG --cpus 8 --outdir annotated --prefix genome_v1 assembly.fasta,annotate genome and add specific locus tag prefix with default parameters,metagenomics
python,python_01,script.py,run a Python script,programming
python,python_02,"-c ""print('Hello, World!')""",run a Python one-liner,programming
python,python_03,-m http.server 8080,"run a module as a script (e.g., start an HTTP server)",programming
python,python_04,-m venv .venv,create a virtual environment,programming
python,python_05,-m pytest tests/ -v,run the test suite under tests/ with pytest in verbose mode,programming
python,python_06,"-c ""import json,sys; data=json.load(sys.stdin); [print(r['name']) for r in data]""",process JSON from stdin with a one-liner,programming
python,python_07,-u pipeline_script.py,run a script with unbuffered output (for pipelines),programming
python,python_08,-m cProfile -s cumtime slow_script.py,profile a script and show cumulative time,programming
python,python_09,--version,check Python version,programming
python,python_10,-W all script.py,run a script with all warnings shown (including deprecation warnings),programming
qualimap,qualimap_01,bamqc -bam sorted.bam --java-mem-size=8G -nt 8 -outdir qualimap_results/,"run BAM QC on a sorted, indexed BAM",qc
qualimap,qualimap_02,rnaseq -bam sorted.bam -gtf annotation.gtf -p strand-specific-reverse --java-mem-size=8G -outdir qualimap_rnaseq/,run RNA-seq QC with strand information,qc
qualimap,qualimap_03,multi-bamqc -d samples.txt --java-mem-size=4G -outdir multiqc_qualimap/,run multi-sample QC and aggregate report,qc
qualimap,qualimap_04,bamqc -bam sorted.bam -gd HUMAN --java-mem-size=8G -nt 8 -outdir qualimap_gc/,run BAM QC and compare GC content distribution against the human genome,qc
qualimap,qualimap_05,bamqc -bam sorted.bam -outformat PDF:HTML --java-mem-size=4G -outdir qualimap_output/,generate PDF and HTML reports,qc
qualimap,qualimap_06,bamqc -bam wgs.bam -gd HUMAN --java-mem-size=16G -nt 16 --paint-chromosome-limits -outdir wgs_qualimap/,run BAM QC on whole-genome sequencing data,qc
qualimap,qualimap_07,counts -d counts.txt -c 2 -s C -outdir counts_qc/,count QC for differential expression count matrices,qc
qualimap,qualimap_08,bamqc -bam sorted.bam --java-mem-size=8G -nt 8 -outdir qualimap_results/,"run BAM QC on a sorted, indexed BAM and write results to qualimap_results/",qc
qualimap,qualimap_09,rnaseq -bam sorted.bam -gtf annotation.gtf -p strand-specific-reverse --java-mem-size=8G -outdir qualimap_rnaseq/,run RNA-seq QC with strand information,qc
qualimap,qualimap_10,multi-bamqc -d samples.txt --java-mem-size=4G -outdir multiqc_qualimap/,run multi-sample QC and aggregate report with default parameters,qc
quast,quast_01,-r reference.fasta -g genes.gff assembly.fasta -o quast_output/ --threads 8,assess assembly quality with reference genome,assembly
quast,quast_02,spades_assembly.fasta megahit_assembly.fasta flye_assembly.fasta -o assembly_comparison/ --threads 8,compare multiple assemblies without reference genome,assembly
quast,quast_03,"metaquast.py -r reference1.fasta,reference2.fasta assembly.fasta -o metaquast_output/ --threads 16",assess metagenome assembly quality with metaQUAST,assembly
quast,quast_04,-r reference.fasta assembly.fasta -o quast_out/ --min-contig 1000 --threads 8,assess assembly with minimum contig length filter,assembly
quast,quast_05,-r reference.fasta -g genes.gff assembly.fasta -o quast_output/ --threads 8,assess assembly quality with reference genome with default parameters,assembly
quast,quast_06,spades_assembly.fasta megahit_assembly.fasta flye_assembly.fasta -o assembly_comparison/ --threads 8 --verbose,compare multiple assemblies without reference genome with verbose output,assembly
quast,quast_07,"metaquast.py -r reference1.fasta,reference2.fasta assembly.fasta -o metaquast_output/ --threads 16",assess metagenome assembly quality with metaQUAST using multiple threads,assembly
quast,quast_08,-r reference.fasta assembly.fasta -o quast_out/ --min-contig 1000 --threads 8,assess assembly with minimum contig length filter and write the report to quast_out/,assembly
quast,quast_09,-r reference.fasta -g genes.gff assembly.fasta -o quast_output/ --threads 8 --quiet,assess assembly quality with reference genome in quiet mode,assembly
quast,quast_10,spades_assembly.fasta megahit_assembly.fasta flye_assembly.fasta -o assembly_comparison/ --threads 8,compare multiple assemblies without reference genome with default parameters,assembly
r,r_01,Rscript analysis.R,run an R script non-interactively,programming
r,r_02,Rscript analysis.R --input data.csv --output results.csv,run an R script with command-line arguments,programming
r,r_03,"Rscript -e ""cat(paste(1:10, collapse=','), '\n')""",execute a one-liner R expression,programming
r,r_04,"Rscript -e ""install.packages('ggplot2', repos='https://cloud.r-project.org', lib=Sys.getenv('R_LIBS_USER'))""",install a CRAN package into the user library,programming
r,r_05,"Rscript -e ""BiocManager::install(c('DESeq2','edgeR'))""",install Bioconductor packages,programming
r,r_06,"Rscript -e ""packageVersion('DESeq2')""",check installed package version,programming
r,r_07,"Rscript -e ""ip <- installed.packages(lib.loc=.libPaths()[1]); cat(paste(ip[,'Package'],ip[,'Version'],sep='='), sep='\n')""",list user-installed packages and their versions,programming
r,r_08,Rscript --vanilla --quiet analysis.R,run an R script in a vanilla session (no profiles or saved workspace) and suppress startup messages,programming
r,r_09,"Rscript -e "".libPaths()""",show R library paths,programming
r,r_10,"Rscript -e ""rmarkdown::render('report.Rmd', output_format='html_document')""",render an R Markdown document to HTML,programming
racon,racon_01,-t 16 reads.fastq.gz mapping.paf draft_assembly.fasta > polished_round1.fasta,run one round of Racon polishing on an ONT assembly,assembly
racon,racon_02,-t 16 reads.fastq.gz round2_mapping.paf polished_round1.fasta > polished_round2.fasta,run second round of Racon polishing,assembly
racon,racon_03,-t 16 reads.fastq.gz alignment.sam draft_assembly.fasta > polished_assembly.fasta,run Racon polishing using SAM alignment instead of PAF,assembly
racon,racon_04,-t 16 reads.fastq.gz mapping.paf draft_assembly.fasta > polished_round1.fasta,run one round of Racon polishing on an ONT assembly,assembly
racon,racon_05,-t 16 reads.fastq.gz round2_mapping.paf polished_round1.fasta > polished_round2.fasta,run second round of Racon polishing with default parameters,assembly
racon,racon_06,-t 16 reads.fastq.gz alignment.sam draft_assembly.fasta > polished_assembly.fasta,run Racon polishing using SAM alignment instead of PAF,assembly
racon,racon_07,-t 16 reads.fastq.gz mapping.paf draft_assembly.fasta > polished_round1.fasta,run one round of Racon polishing on an ONT assembly,assembly
racon,racon_08,-t 16 reads.fastq.gz round2_mapping.paf polished_round1.fasta > polished_round2.fasta,run second round of Racon polishing,assembly
racon,racon_09,-t 16 reads.fastq.gz alignment.sam draft_assembly.fasta > polished_assembly.fasta,run Racon polishing using SAM alignment instead of PAF,assembly
racon,racon_10,-t 16 reads.fastq.gz mapping.paf draft_assembly.fasta > polished_round1.fasta,run one round of Racon polishing on an ONT assembly with default parameters,assembly
repeatmasker,repeatmasker_01,-species human -xsmall -pa 16 -dir repeatmasker_output/ genome.fasta,softmask repeats in a mammalian genome assembly,annotation
repeatmasker,repeatmasker_02,-species arabidopsis -pa 8 -dir masked_output/ genome.fasta,hard-mask repeats in a plant genome,annotation
repeatmasker,repeatmasker_03,-lib custom_repeats.fasta -xsmall -pa 8 -dir custom_masked/ genome.fasta,mask repeats using a custom library,annotation
repeatmasker,repeatmasker_04,-noint -xsmall -pa 4 -dir simple_masked/ genome.fasta,mask only simple repeats and low-complexity regions,annotation
repeatmasker,repeatmasker_05,-species human -xsmall -pa 16 -dir repeatmasker_output/ genome.fasta,softmask repeats in a mammalian genome assembly with default parameters,annotation
repeatmasker,repeatmasker_06,-species arabidopsis -pa 8 -dir masked_output/ genome.fasta --verbose,hard-mask repeats in a plant genome with verbose output,annotation
repeatmasker,repeatmasker_07,-lib custom_repeats.fasta -xsmall -pa 8 -dir custom_masked/ genome.fasta,mask repeats using a custom library with 8 parallel jobs,annotation
repeatmasker,repeatmasker_08,-noint -xsmall -pa 4 -dir simple_masked/ genome.fasta,mask only simple repeats and low-complexity regions and write output files to a specified directory,annotation
repeatmasker,repeatmasker_09,-species human -xsmall -pa 16 -dir repeatmasker_output/ genome.fasta --quiet,softmask repeats in a mammalian genome assembly in quiet mode,annotation
repeatmasker,repeatmasker_10,-species arabidopsis -pa 8 -dir masked_output/ genome.fasta,hard-mask repeats in a plant genome with default parameters,annotation
rm,rm_01,file.txt,remove a single file,filesystem
rm,rm_02,-v *.tmp,remove multiple files matching a pattern,filesystem
rm,rm_03,-r old_results/,remove a directory and all its contents,filesystem
rm,rm_04,-i *.log,"interactively remove files, asking for confirmation before each deletion",filesystem
rm,rm_05,-rf temp_build/,force-remove a directory and its contents without prompts,filesystem
rm,rm_06,-rf /tmp/stale_dir/,force-remove a stale temporary directory,filesystem
rm,rm_07,-- -weirdfile.txt,remove a file with a name starting with a dash,filesystem
rm,rm_08,-d emptydir/,remove an empty directory,filesystem
rm,rm_09,-v *.bak,verbosely remove all files of a specific type in the current directory,filesystem
rm,rm_10,symlink_name,remove a symbolic link without following it to the target,filesystem
rsem,rsem_01,rsem-prepare-reference --gtf genes.gtf --num-threads 8 genome.fa rsem_index/genome,prepare RSEM reference from genome FASTA and GTF annotation,rna-seq
rsem,rsem_02,rsem-calculate-expression --paired-end --num-threads 8 --strandedness reverse R1.fastq.gz R2.fastq.gz rsem_index/genome sample_output,quantify reverse-stranded paired-end RNA-seq reads with RSEM using its default Bowtie aligner,rna-seq
rsem,rsem_03,rsem-calculate-expression --paired-end --star --star-gzipped-read-file --num-threads 8 R1.fastq.gz R2.fastq.gz rsem_index/genome sample_output,quantify gzipped paired-end RNA-seq reads using RSEM with the STAR aligner,rna-seq
rsem,rsem_04,rsem-prepare-reference --num-threads 4 transcriptome.fa rsem_transcript_index/transcripts,prepare RSEM reference directly from transcriptome FASTA,rna-seq
rsem,rsem_05,rsem-generate-data-matrix sample1.genes.results sample2.genes.results sample3.genes.results > gene_count_matrix.txt,generate an expected-count matrix from multiple RSEM results files for differential expression analysis,rna-seq
rsem,rsem_06,rsem-calculate-expression --num-threads 8 reads.fastq.gz rsem_index/genome sample_output,quantify single-end RNA-seq reads using RSEM,rna-seq
rsem,rsem_07,rsem-prepare-reference --gtf genes.gtf --num-threads 8 --polyA genome.fa rsem_polyA_index/genome,prepare an RSEM reference with poly(A) tails appended to every transcript,rna-seq
rsem,rsem_08,rsem-calculate-expression --paired-end --num-threads 8 --estimate-rspd --strandedness none R1.fq.gz R2.fq.gz rsem_index/genome sample,quantify with RSEM and estimate read start position distribution,rna-seq
rsem,rsem_09,rsem-generate-data-matrix sample1.genes.results sample2.genes.results sample3.genes.results > count_matrix.txt,merge expected counts from multiple RSEM gene results files into one matrix,rna-seq
rsem,rsem_10,rsem-calculate-expression --paired-end --num-threads 8 --calc-ci R1.fastq.gz R2.fastq.gz rsem_index/genome sample_ci,calculate expression with confidence intervals for uncertainty estimation,rna-seq
rseqc,rseqc_01,infer_experiment.py -r hg38.bed -i sorted.bam -s 2000000,infer library strandedness from a BAM,qc
rseqc,rseqc_02,read_distribution.py -r hg38.bed -i sorted.bam,get read distribution across genomic features,qc
rseqc,rseqc_03,junction_annotation.py -r hg38.bed -i sorted.bam -o sample_junctions,annotate splice junctions,qc
rseqc,rseqc_04,junction_saturation.py -r hg38.bed -i sorted.bam -o sample_sat,check saturation of junction detection,qc
rseqc,rseqc_05,bam_stat.py -i sorted.bam,compute BAM statistics,qc
rseqc,rseqc_06,tin.py -i sorted.bam -r hg38.bed,measure transcript integrity (RNA quality),qc
rseqc,rseqc_07,inner_distance.py -r hg38.bed -i sorted.bam -o inner_dist,estimate inner distance for paired-end RNA-seq,qc
rseqc,rseqc_08,read_duplication.py -i sorted.bam -o duplication,check for read duplication rate,qc
rseqc,rseqc_09,infer_experiment.py -r hg38.bed -i sorted.bam -s 2000000 -q 20,infer library strandedness from a BAM using only alignments with MAPQ of at least 20,qc
rseqc,rseqc_10,read_distribution.py -r hg38.bed -i sorted.bam,get read distribution across genomic features with default parameters,qc
rsync,rsync_01,-avz /local/data/ user@remote:/remote/data/,sync a local directory to a remote server with verbose output and compression,networking
rsync,rsync_02,-avzn /source/ /dest/,dry-run to preview what would be transferred,networking
rsync,rsync_03,-avz --delete /source/ /dest/,"mirror source to destination, deleting removed files",networking
rsync,rsync_04,-avz user@remote:/remote/data/ /local/backup/,sync from remote server to local directory,networking
rsync,rsync_05,-avzP user@remote:/path/large-file.tar.gz /local/,resume a large interrupted transfer,networking
rsync,rsync_06,-avz --exclude='.git' --exclude='*.pyc' --exclude='__pycache__' /src/ user@remote:/dest/,sync excluding specific directories and patterns,networking
rsync,rsync_07,-avz -e 'ssh -p 2222' /local/data/ user@remote:/data/,sync using a non-standard SSH port,networking
rsync,rsync_08,-avz --info=progress2 /source/ /dest/,show total transfer progress instead of per-file progress,networking
rsync,rsync_09,-avzH /source/ /dest/,copy files preserving hard links,networking
rsync,rsync_10,-avz --update /source/ /dest/,sync while skipping files that are newer on the destination than in the source,networking
salmon,salmon_01,index -t transcriptome.fa -i salmon_index --threads 8,build a Salmon transcriptome index,rna-seq
salmon,salmon_02,quant -i salmon_index -l A -1 R1.fastq.gz -2 R2.fastq.gz -p 8 --gcBias --validateMappings -o sample_quant,quantify paired-end RNA-seq reads with automatic library type detection,rna-seq
salmon,salmon_03,quant -i salmon_index -l A -r reads.fastq.gz -p 8 --gcBias -o sample_quant,quantify single-end RNA-seq reads,rna-seq
salmon,salmon_04,index -t gentrome.fa -d decoys.txt -i salmon_index_decoy --threads 8,build a decoy-aware Salmon index for more accurate quantification,rna-seq
salmon,salmon_05,quant -i salmon_index -l ISR -1 R1.fastq.gz -2 R2.fastq.gz -p 8 --gcBias --seqBias --validateMappings -o sample_quant,quantify bulk RNA-seq with strand-specific reverse library,rna-seq
salmon,salmon_06,index -t transcriptome.fa -i salmon_index --threads 8 --verbose,build a Salmon transcriptome index with verbose output,rna-seq
salmon,salmon_07,quant -i salmon_index -l A -1 R1.fastq.gz -2 R2.fastq.gz -p 8 --gcBias --validateMappings -o sample_quant,quantify paired-end RNA-seq reads with automatic library type detection using multiple threads,rna-seq
salmon,salmon_08,quant -i salmon_index -l A -r reads.fastq.gz -p 8 --gcBias -o sample_quant,quantify single-end RNA-seq reads and write results to an output directory,rna-seq
salmon,salmon_09,index -t gentrome.fa -d decoys.txt -i salmon_index_decoy --threads 8 --quiet,build decoy-aware salmon index for more accurate quantification in quiet mode,rna-seq
salmon,salmon_10,quant -i salmon_index -l ISR -1 R1.fastq.gz -2 R2.fastq.gz -p 8 --gcBias --seqBias --validateMappings -o sample_quant,quantify bulk RNA-seq with strand-specific reverse library with default parameters,rna-seq
samtools,samtools_01,sort -@ 4 -o sorted.bam input.bam,sort a BAM file by genomic coordinates,alignment
samtools,samtools_02,index sorted.bam,create an index for a sorted BAM file,alignment
samtools,samtools_03,view -b -f 2 -F 2304 -o proper_paired.bam input.bam,filter to keep only properly paired primary alignments (excluding secondary and supplementary),alignment
samtools,samtools_04,flagstat input.bam,"get alignment statistics (mapped, unmapped, duplicates)",alignment
samtools,samtools_05,fastq -@ 4 -1 R1.fastq.gz -2 R2.fastq.gz -0 /dev/null -s /dev/null -n input.bam,convert a name-collated BAM to paired-end FASTQ files,alignment
samtools,samtools_06,view -b -o region.bam input.bam chr1:100000-200000,extract reads mapping to chromosome 1 between 100000 and 200000,alignment
samtools,samtools_07,markdup -@ 4 -f stats.txt input_fixmate_sorted.bam output_markdup.bam,mark PCR duplicates in a coordinate-sorted BAM previously processed with samtools fixmate -m,alignment
samtools,samtools_08,merge -@ 4 -f merged.bam sample1.bam sample2.bam sample3.bam,merge multiple BAM files into one,alignment
samtools,samtools_09,depth -a -o coverage.txt input.bam,compute per-base depth of coverage,alignment
samtools,samtools_10,view -H input.bam,view the BAM header,alignment
sed,sed_01,-i 's/foo/bar/g' file.txt,replace all occurrences of a word in a file in-place,text-processing
sed,sed_02,-i.bak 's/old_host/new_host/g' config.conf,replace text in-place and keep a backup,text-processing
sed,sed_03,'/^$/d' file.txt,delete all blank lines from a file,text-processing
sed,sed_04,-i '/^#/d' script.sh,delete comment lines starting with # in place,text-processing
sed,sed_05,-n '/error/p' app.log,print only lines matching a pattern (like grep),text-processing
sed,sed_06,-E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3\/\2\/\1/' dates.txt,reformat YYYY-MM-DD dates as DD/MM/YYYY using capture groups,text-processing
sed,sed_07,'s/^/PREFIX: /' input.txt,add a prefix to every line in a file,text-processing
sed,sed_08,-E 's/[[:space:]]+$//' file.txt,remove trailing whitespace from all lines,text-processing
sed,sed_09,'10s/old/new/' file.txt,replace only on a specific line number,text-processing
sed,sed_10,'/^\[section\]/a new_key=value' config.ini,insert a line after a matching pattern,text-processing
seqkit,seqkit_01,stats -a reads.fastq.gz,"get basic statistics of a FASTQ file (read count, total bases, average length)",sequence-utilities
seqkit,seqkit_02,seq -m 100 -j 4 -o filtered.fastq.gz input.fastq.gz,filter out reads shorter than 100 bp and write the rest to a new file,sequence-utilities
seqkit,seqkit_03,seq -r -p -j 4 input.fa -o revcomp.fa,get the reverse complement of all sequences in a FASTA file,sequence-utilities
seqkit,seqkit_04,grep -f id_list.txt input.fa -o subset.fa,extract sequences by name from a list file,sequence-utilities
seqkit,seqkit_05,sample -n 10000 -s 42 -j 4 -o sample.fastq.gz input.fastq.gz,randomly sample 10000 reads from a large FASTQ file,sequence-utilities
seqkit,seqkit_06,fq2fa -j 4 input.fastq.gz -o output.fa.gz,convert FASTQ to FASTA,sequence-utilities
seqkit,seqkit_07,split2 -s 1000 -j 4 -O split_output input.fa,split FASTA file into chunks of 1000 sequences each,sequence-utilities
seqkit,seqkit_08,stats -a reads.fastq.gz -o output.txt,"get basic statistics of a FASTQ file (read count, total bases, average length) and write output to a file",sequence-utilities
seqkit,seqkit_09,seq -m 100 -j 4 -o filtered.fastq.gz input.fastq.gz --quiet,filter out reads shorter than 100 bp and write the rest to a new file in quiet mode,sequence-utilities
seqkit,seqkit_10,seq -r -p -j 4 input.fa -o revcomp.fa,get the reverse complement of all sequences in a FASTA file with default parameters,sequence-utilities
seqtk,seqtk_01,sample -s 42 R1.fastq.gz 1000000 | gzip > sub_R1.fastq.gz,subsample 1 million reads from R1 of a paired-end dataset (run again on R2 with the same seed to keep pairs),sequence-utilities
seqtk,seqtk_02,seq -a reads.fastq.gz > reads.fasta,convert FASTQ to FASTA format,sequence-utilities
seqtk,seqtk_03,seq -r sequences.fasta > revcomp.fasta,reverse complement all sequences in a FASTA file,sequence-utilities
seqtk,seqtk_04,subseq reads.fastq.gz read_names.txt > extracted_reads.fastq,extract specific sequences by name from a FASTQ file,sequence-utilities
seqtk,seqtk_05,trimfq -q 0.05 reads.fastq.gz | gzip > trimmed.fastq.gz,quality-trim read ends with the Mott algorithm at an error threshold of 0.05 (about Phred 13),sequence-utilities
seqtk,seqtk_06,sample -s 100 reads.fastq.gz 0.1 > subsampled_10pct.fastq,subsample 10% of reads with reproducible seed,sequence-utilities
seqtk,seqtk_07,sample -s 42 R1.fastq.gz 1000000 | gzip > sub_R1.fastq.gz,subsample 1 million reads from R1 of a paired-end dataset (run again on R2 with the same seed to keep pairs),sequence-utilities
seqtk,seqtk_08,seq -a reads.fastq.gz > reads.fasta,convert FASTQ to FASTA format,sequence-utilities
seqtk,seqtk_09,seq -r sequences.fasta > revcomp.fasta,reverse complement all sequences in a FASTA file,sequence-utilities
seqtk,seqtk_10,subseq reads.fastq.gz read_names.txt > extracted_reads.fastq,extract specific sequences by name from a FASTQ file with default parameters,sequence-utilities
shapeit4,shapeit4_01,--input variants.vcf.gz --map genetic_map_chr1.txt --region chr1 --output phased_chr1.vcf.gz --thread 8,phase a chromosome using SHAPEIT4,population-genomics
shapeit4,shapeit4_02,--input variants.vcf.gz --reference reference_panel.vcf.gz --map genetic_map_chr22.txt --region chr22 --output phased_chr22.vcf.gz --thread 8,phase using a reference haplotype panel,population-genomics
shapeit4,shapeit4_03,"--input variants.vcf.gz --map genetic_map.txt --region chr1 --sequencing --output phased.vcf.gz --thread 16 --mcmc-iterations 8b,1p,1b,1p,1b,1p,5m",phase sequencing data with higher accuracy settings,population-genomics
shapeit4,shapeit4_04,--input variants.vcf.gz --map genetic_map_chr1.txt --region chr1 --output phased_chr1.vcf.gz --thread 8 --quiet,phase a chromosome using SHAPEIT4 in quiet mode,population-genomics
shapeit4,shapeit4_05,--input variants.vcf.gz --reference reference_panel.vcf.gz --map genetic_map_chr22.txt --region chr22 --output phased_chr22.vcf.gz --thread 8,phase using a reference haplotype panel with default parameters,population-genomics
shapeit4,shapeit4_06,"--input variants.vcf.gz --map genetic_map.txt --region chr1 --sequencing --output phased.vcf.gz --thread 16 --mcmc-iterations 8b,1p,1b,1p,1b,1p,5m --verbose",phase sequencing data with higher accuracy settings with verbose output,population-genomics
shapeit4,shapeit4_07,--input variants.vcf.gz --map genetic_map_chr1.txt --region chr1 --output phased_chr1.vcf.gz --thread 8,phase a chromosome using SHAPEIT4 with 8 threads,population-genomics
shapeit4,shapeit4_08,--input variants.vcf.gz --reference reference_panel.vcf.gz --map genetic_map_chr22.txt --region chr22 --output phased_chr22.vcf.gz --thread 8,phase using a reference haplotype panel and write the phased haplotypes to a VCF file,population-genomics
shapeit4,shapeit4_09,"--input variants.vcf.gz --map genetic_map.txt --region chr1 --sequencing --output phased.vcf.gz --thread 16 --mcmc-iterations 8b,1p,1b,1p,1b,1p,5m --quiet",phase sequencing data with higher accuracy settings in quiet mode,population-genomics
shapeit4,shapeit4_10,--input variants.vcf.gz --map genetic_map_chr1.txt --region chr1 --output phased_chr1.vcf.gz --thread 8,phase a chromosome using SHAPEIT4 with default parameters,population-genomics
snakemake,snakemake_01,--cores all --use-conda,run a workflow using all available cores and per-rule conda environments,workflow-manager
snakemake,snakemake_02,--dry-run --printshellcmds,dry-run to see what would be executed,workflow-manager
snakemake,snakemake_03,--executor slurm --jobs 50 --default-resources mem_mb=4096 runtime=60 --use-conda,run a workflow on a Slurm cluster,workflow-manager
snakemake,snakemake_04,--configfile config/config.yaml --cores 8,run with a configuration file,workflow-manager
snakemake,snakemake_05,--profile slurm,use a named profile for cluster execution,workflow-manager
snakemake,snakemake_06,--forcerun trimming alignment --cores 16,force re-run of specific rules,workflow-manager
snakemake,snakemake_07,--unlock,unlock a workflow after a crash,workflow-manager
snakemake,snakemake_08,--dag | dot -Tpng > dag.png,render the job dependency graph (DAG) as a PNG image,workflow-manager
snakemake,snakemake_09,--rerun-incomplete --cores all,re-run jobs whose outputs were left incomplete by an interrupted run,workflow-manager
snakemake,snakemake_10,--use-singularity --singularity-args '--bind /scratch' --cores 8,run with Singularity containers,workflow-manager
sniffles,sniffles_01,--input sorted.bam --vcf output_svs.vcf --threads 8,call SVs from a single Oxford Nanopore BAM file,variant-calling
sniffles,sniffles_02,--input sorted.bam --vcf output_svs.vcf --minsupport 5 --minsvlen 50 --threads 8,call SVs with minimum read support of 5 and minimum SV length of 50 bp,variant-calling
sniffles,sniffles_03,--input sample1.bam --snf sample1.snf --vcf sample1.vcf --threads 8,generate SNF file for multi-sample population SV calling,variant-calling
sniffles,sniffles_04,--input sample1.snf sample2.snf sample3.snf --vcf population_svs.vcf --threads 8,combine multiple SNF files for population-level SV calling,variant-calling
sniffles,sniffles_05,--input tumor.bam --vcf mosaic_svs.vcf --mosaic --threads 8,call mosaic or somatic SVs with low frequency support,variant-calling
sniffles,sniffles_06,--input sorted.bam --vcf output_svs.vcf --threads 8 --verbose,call SVs from a single Oxford Nanopore BAM file with verbose output,variant-calling
sniffles,sniffles_07,--input sorted.bam --vcf output_svs.vcf --minsupport 5 --minsvlen 50 --threads 8,call SVs with minimum read support of 5 and minimum SV length of 50 bp using multiple threads,variant-calling
sniffles,sniffles_08,--input sample1.bam --snf sample1.snf --vcf sample1.vcf --threads 8,generate an SNF file and a VCF file for multi-sample population SV calling,variant-calling
sniffles,sniffles_09,--input sample1.snf sample2.snf sample3.snf --vcf population_svs.vcf --threads 8 --quiet,combine multiple SNF files for population-level SV calling in quiet mode,variant-calling
sniffles,sniffles_10,--input tumor.bam --vcf mosaic_svs.vcf --mosaic --threads 8,call mosaic or somatic SVs with low frequency support with default parameters,variant-calling
snpeff,snpeff_01,ann -v GRCh38.105 variants.vcf > annotated.vcf,annotate variants in a VCF file using the GRCh38 human genome database,variant-annotation
snpeff,snpeff_02,ann -v -stats snpeff_summary.html GRCh38.105 variants.vcf > annotated.vcf,annotate variants and generate an HTML statistics report,variant-annotation
snpeff,snpeff_03,ann -v hg19 variants.vcf > annotated_hg19.vcf,annotate variants from hg19/GRCh37 genome,variant-annotation
snpeff,snpeff_04,ann -v -no-downstream -no-upstream -no-intron -no-intergenic GRCh38.105 variants.vcf > coding_annotated.vcf,annotate variants while omitting upstream and downstream and intronic and intergenic effects,variant-annotation
snpeff,snpeff_05,build -gff3 -v MyGenome,build a custom SnpEff genome database from GFF3 annotation,variant-annotation
snpeff,snpeff_06,ann -v GRCh38.105 variants.vcf > annotated.vcf,annotate variants in a VCF file using the GRCh38 human genome database,variant-annotation
snpeff,snpeff_07,ann -v -stats snpeff_summary.html GRCh38.105 variants.vcf > annotated.vcf,annotate variants and generate an HTML statistics report,variant-annotation
snpeff,snpeff_08,ann -v hg19 variants.vcf > annotated_hg19.vcf,annotate variants from hg19/GRCh37 genome,variant-annotation
snpeff,snpeff_09,ann -v -no-downstream -no-upstream -no-intron -no-intergenic GRCh38.105 variants.vcf > coding_annotated.vcf,annotate variants while omitting upstream and downstream and intronic and intergenic effects,variant-annotation
snpeff,snpeff_10,build -gff3 -v MyGenome,build a custom SnpEff genome database from GFF3 annotation with default parameters,variant-annotation
sourmash,sourmash_01,"sketch dna -p k=31,scaled=1000 genome.fasta -o genome.sig",sketch a genome FASTA file at default parameters,sequence-utilities
sourmash,sourmash_02,"sketch dna -p k=31,scaled=1000 *.fasta --output-dir sigs/",sketch multiple genome files and write one signature file per input into a directory,sequence-utilities
sourmash,sourmash_03,compare sigs/*.sig --csv similarity_matrix.csv -k 31,compare all signatures in a directory and output similarity matrix,sequence-utilities
sourmash,sourmash_04,gather sample.sig gtdb_rs207.k31.zip -k 31 --threshold-bp 50000 -o gather_results.csv,decompose a metagenome sample against a reference database,sequence-utilities
sourmash,sourmash_05,taxonomy annotate -g gather_results.csv -t gtdb-rs207.taxonomy.csv -o annotated_results.csv,add taxonomy to gather results,sequence-utilities
sourmash,sourmash_06,search query.sig refdb.zip -k 31 --threshold 0.1 -n 20 -o search_results.csv,search a signature against a database for top hits,sequence-utilities
sourmash,sourmash_07,index refdb.zip sigs/*.sig -k 31,build an indexed database from many signature files for fast search,sequence-utilities
sourmash,sourmash_08,"sketch dna -p k=31,scaled=1000 genome.fasta -o genome.sig",sketch a genome FASTA file at default parameters and write output to a file,sequence-utilities
sourmash,sourmash_09,"sketch dna -p k=31,scaled=1000 *.fasta --output-dir sigs/ --quiet",sketch multiple genome files and write one signature file per input into a directory in quiet mode,sequence-utilities
sourmash,sourmash_10,compare sigs/*.sig --csv similarity_matrix.csv -k 31,compare all signatures in a directory and output similarity matrix with default parameters,sequence-utilities
spades,spades_01,-1 R1.fastq.gz -2 R2.fastq.gz -o spades_output/ --threads 16 --memory 32 --careful,assemble a bacterial genome from paired-end reads,assembly
spades,spades_02,--meta -1 R1.fastq.gz -2 R2.fastq.gz -o metaspades_output/ --threads 32 --memory 128,assemble a metagenome from paired-end reads,assembly
spades,spades_03,--plasmid -1 R1.fastq.gz -2 R2.fastq.gz -o plasmidspades_output/ --threads 8 --memory 16,assemble plasmids from paired-end reads,assembly
spades,spades_04,--sc -1 R1.fastq.gz -2 R2.fastq.gz -o sc_spades_output/ --threads 8 --memory 32,assemble single-cell MDA amplified data,assembly
spades,spades_05,-o spades_output/ --continue,resume interrupted SPAdes assembly,assembly
spades,spades_06,-1 short_R1.fastq.gz -2 short_R2.fastq.gz --nanopore long_reads.fastq.gz -o hybrid_output/ --threads 16 --memory 64,assemble with both paired-end and long reads (hybrid assembly),assembly
spades,spades_07,-1 R1.fastq.gz -2 R2.fastq.gz -o spades_output/ --threads 16 --memory 32 --careful,assemble a bacterial genome from paired-end reads using multiple threads,assembly
spades,spades_08,--meta -1 R1.fastq.gz -2 R2.fastq.gz -o metaspades_output/ --threads 32 --memory 128,assemble a metagenome from paired-end reads and write results to an output directory,assembly
spades,spades_09,--plasmid -1 R1.fastq.gz -2 R2.fastq.gz -o plasmidspades_output/ --threads 8 --memory 16 --quiet,assemble plasmids from paired-end reads in quiet mode,assembly
spades,spades_10,--sc -1 R1.fastq.gz -2 R2.fastq.gz -o sc_spades_output/ --threads 8 --memory 32,assemble single-cell MDA amplified data with default parameters,assembly
sra-tools,sra-tools_01,fasterq-dump SRR123456 -O output_directory/ -e 8,download and convert an SRA accession to FASTQ,utilities
sra-tools,sra-tools_02,prefetch SRR123456 -O sra_downloads/,download SRA file first then convert (more reliable),utilities
sra-tools,sra-tools_03,prefetch --option-file accession_list.txt -O sra_downloads/,download multiple SRA accessions in batch,utilities
sra-tools,sra-tools_04,fasterq-dump SRR123456 -O output/ -e 8 && gzip output/SRR123456_1.fastq output/SRR123456_2.fastq,convert SRA to compressed FASTQ,utilities
sra-tools,sra-tools_05,vdb-validate SRR123456.sra,validate an SRA file integrity,utilities
sra-tools,sra-tools_06,sra-stat --quick --xml SRR123456,get statistics for an SRA run without downloading reads,utilities
sra-tools,sra-tools_07,prefetch ERR123456 -O sra_downloads/,download an ENA/EBI accession using prefetch,utilities
sra-tools,sra-tools_08,fasterq-dump SRR123456 --stdout -e 4 | head -40,preview the first 10 reads (40 FASTQ lines) of an SRA run,utilities
sra-tools,sra-tools_09,fasterq-dump SRR123456 --size-check only,check whether there is enough disk space for the conversion without converting,utilities
sra-tools,sra-tools_10,fasterq-dump SRR123456 -O output/ -e 8 --skip-technical,convert an SRA accession to FASTQ and skip technical reads,utilities
ssh,ssh_01,user@hostname,connect to a remote server as a specific user,networking
ssh,ssh_02,-i ~/.ssh/id_ed25519 user@hostname,connect using a specific private key file,networking
ssh,ssh_03,-p 2222 user@hostname,connect on a non-standard port,networking
ssh,ssh_04,-L 8080:localhost:80 user@hostname,forward a local port to a remote service (local port forwarding),networking
ssh,ssh_05,user@hostname 'df -h && free -h',run a command on a remote host without an interactive shell,networking
ssh,ssh_06,-X user@hostname,enable X11 forwarding to run graphical applications remotely,networking
ssh,ssh_07,-R 9090:localhost:3000 user@hostname,connect and set up reverse port forwarding (expose local service to remote),networking
ssh,ssh_08,-D 1080 -N user@hostname,create a SOCKS5 proxy tunnel through the remote host,networking
ssh,ssh_09,-o ServerAliveInterval=60 -o ServerAliveCountMax=3 user@hostname,send a keepalive every 60 seconds and disconnect after 3 unanswered probes,networking
ssh,ssh_10,-J bastion_user@bastion_host target_user@target_host,use jump host (bastion) to reach a machine not directly accessible,networking
star,star_01,--runMode genomeGenerate --genomeDir /path/to/star_index --genomeFastaFiles genome.fa --sjdbGTFfile genes.gtf --runThreadN 8,build genome index for STAR alignment,alignment
star,star_02,--runMode alignReads --genomeDir /path/to/star_index --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample1/ --outSAMattributes NH HI AS NM,align paired-end RNA-seq gzipped FASTQ files to the genome,alignment
star,star_03,--runMode alignReads --genomeDir /star_index --readFilesIn reads.fastq.gz --readFilesCommand zcat --twopassMode Basic --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/,align single-end RNA-seq reads with two-pass mode for better junction detection,alignment
star,star_04,--runMode alignReads --genomeDir /star_index --readFilesIn R1.fq.gz R2.fq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/ --outReadsUnmapped Fastx,align reads and output unmapped reads to a FASTQ file,alignment
star,star_05,--runMode genomeGenerate --genomeDir /path/to/star_index --genomeFastaFiles genome.fa --sjdbGTFfile genes.gtf --runThreadN 8,build genome index for STAR alignment with default parameters,alignment
star,star_06,--runMode alignReads --genomeDir /path/to/star_index --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample1/ --outSAMattributes NH HI AS NM,align paired-end RNA-seq gzipped FASTQ files to the genome (progress is logged to Log.progress.out),alignment
star,star_07,--runMode alignReads --genomeDir /star_index --readFilesIn reads.fastq.gz --readFilesCommand zcat --twopassMode Basic --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/,align single-end RNA-seq reads with two-pass mode for better junction detection using 8 threads,alignment
star,star_08,--runMode alignReads --genomeDir /star_index --readFilesIn R1.fq.gz R2.fq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample/ --outReadsUnmapped Fastx,align paired-end reads and write unmapped reads to FASTQ files under the sample/ prefix,alignment
star,star_09,--runMode genomeGenerate --genomeDir /path/to/star_index --genomeFastaFiles genome.fa --sjdbGTFfile genes.gtf --runThreadN 8 > /dev/null,build genome index for STAR alignment while discarding console progress messages,alignment
star,star_10,--runMode alignReads --genomeDir /path/to/star_index --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample1/ --outSAMattributes NH HI AS NM,align paired-end RNA-seq gzipped FASTQ files to the genome with default parameters,alignment
starsolo,starsolo_01,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --genomeDir /path/to/star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM --runThreadN 16 --outFileNamePrefix sample_starsolo/,process 10x Chromium v3 scRNA-seq with STARsolo,single-cell
starsolo,starsolo_02,--soloType CB_UMI_Simple --soloCBwhitelist 737K-august-2016.txt --soloCBlen 16 --soloUMIlen 10 --genomeDir /star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --runThreadN 16 --outFileNamePrefix sample_v2/,process 10x Chromium v2 scRNA-seq with STARsolo,single-cell
starsolo,starsolo_03,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBlen 16 --soloUMIlen 12 --soloFeatures Gene Velocyto --genomeDir /star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --runThreadN 16 --outFileNamePrefix velocity_sample/,run STARsolo with RNA velocity output,single-cell
starsolo,starsolo_04,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --genomeDir /path/to/star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM --runThreadN 16 --outFileNamePrefix sample_starsolo/ > /dev/null,process 10x Chromium v3 scRNA-seq with STARsolo while discarding console progress messages,single-cell
starsolo,starsolo_05,--soloType CB_UMI_Simple --soloCBwhitelist 737K-august-2016.txt --soloCBlen 16 --soloUMIlen 10 --genomeDir /star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --runThreadN 16 --outFileNamePrefix sample_v2/,process 10x Chromium v2 scRNA-seq with STARsolo with default parameters,single-cell
starsolo,starsolo_06,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBlen 16 --soloUMIlen 12 --soloFeatures Gene Velocyto --genomeDir /star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --runThreadN 16 --outFileNamePrefix velocity_sample/,run STARsolo with RNA velocity output (Gene and Velocyto features),single-cell
starsolo,starsolo_07,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --genomeDir /path/to/star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM --runThreadN 16 --outFileNamePrefix sample_starsolo/,process 10x Chromium v3 scRNA-seq with STARsolo using 16 threads,single-cell
starsolo,starsolo_08,--soloType CB_UMI_Simple --soloCBwhitelist 737K-august-2016.txt --soloCBlen 16 --soloUMIlen 10 --genomeDir /star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --runThreadN 16 --outFileNamePrefix sample_v2/,process 10x Chromium v2 scRNA-seq with STARsolo and write outputs under the sample_v2/ prefix,single-cell
starsolo,starsolo_09,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBlen 16 --soloUMIlen 12 --soloFeatures Gene Velocyto --genomeDir /star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --runThreadN 16 --outFileNamePrefix velocity_sample/,run STARsolo with RNA velocity output,single-cell
starsolo,starsolo_10,--soloType CB_UMI_Simple --soloCBwhitelist 3M-february-2018.txt --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --genomeDir /path/to/star_genome/ --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM --runThreadN 16 --outFileNamePrefix sample_starsolo/,process 10x Chromium v3 scRNA-seq with STARsolo with default parameters,single-cell
strelka2,strelka2_01,configureStrelkaGermlineWorkflow.py --bam sorted.bam --referenceFasta reference.fa --runDir strelka_germline && python strelka_germline/runWorkflow.py -m local -j 8,configure and run Strelka2 germline variant calling,variant-calling
strelka2,strelka2_02,configureStrelkaSomaticWorkflow.py --normalBam normal.bam --tumorBam tumor.bam --referenceFasta reference.fa --runDir strelka_somatic && python strelka_somatic/runWorkflow.py -m local -j 8,configure and run Strelka2 somatic variant calling for tumor-normal pair,variant-calling
strelka2,strelka2_03,configureStrelkaGermlineWorkflow.py --bam sorted.bam --referenceFasta reference.fa --exome --callRegions targets.bed.gz --runDir strelka_wes && python strelka_wes/runWorkflow.py -m local -j 8,run Strelka2 germline on WES data with target regions,variant-calling
strelka2,strelka2_04,configureStrelkaSomaticWorkflow.py --normalBam normal.bam --tumorBam tumor.bam --referenceFasta reference.fa --indelCandidates manta_results/results/variants/candidateSmallIndels.vcf.gz --runDir strelka_with_manta && python strelka_with_manta/runWorkflow.py -m local -j 8,run Strelka2 somatic with Manta indel candidates for improved accuracy,variant-calling
strelka2,strelka2_05,configureStrelkaGermlineWorkflow.py --bam sorted.bam --referenceFasta reference.fa --runDir strelka_germline && python strelka_germline/runWorkflow.py -m local -j 8,configure and run Strelka2 germline variant calling with default parameters,variant-calling
strelka2,strelka2_06,configureStrelkaSomaticWorkflow.py --normalBam normal.bam --tumorBam tumor.bam --referenceFasta reference.fa --runDir strelka_somatic && python strelka_somatic/runWorkflow.py -m local -j 8,configure and run Strelka2 somatic variant calling for tumor-normal pair,variant-calling
strelka2,strelka2_07,configureStrelkaGermlineWorkflow.py --bam sorted.bam --referenceFasta reference.fa --exome --callRegions targets.bed.gz --runDir strelka_wes && python strelka_wes/runWorkflow.py -m local -j 8,run Strelka2 germline on WES data with target regions using multiple threads,variant-calling
strelka2,strelka2_08,configureStrelkaSomaticWorkflow.py --normalBam normal.bam --tumorBam tumor.bam --referenceFasta reference.fa --indelCandidates manta_results/results/variants/candidateSmallIndels.vcf.gz --runDir strelka_with_manta && python strelka_with_manta/runWorkflow.py -m local -j 8,run Strelka2 somatic with Manta indel candidates and write results to the strelka_with_manta run directory,variant-calling
strelka2,strelka2_09,configureStrelkaGermlineWorkflow.py --bam sorted.bam --referenceFasta reference.fa --runDir strelka_germline && python strelka_germline/runWorkflow.py -m local -j 8 --quiet,configure and run Strelka2 germline variant calling in quiet mode,variant-calling
strelka2,strelka2_10,configureStrelkaSomaticWorkflow.py --normalBam normal.bam --tumorBam tumor.bam --referenceFasta reference.fa --runDir strelka_somatic && python strelka_somatic/runWorkflow.py -m local -j 8,configure and run Strelka2 somatic variant calling for tumor-normal pair with default parameters,variant-calling
stringtie,stringtie_01,-G genes.gtf -o sample1.gtf -p 8 --rf sample1_sorted.bam,assemble transcripts from HISAT2-aligned RNA-seq BAM with reference annotation,rna-seq
stringtie,stringtie_02,--merge -G genes.gtf -o merged.gtf sample1.gtf sample2.gtf sample3.gtf,merge per-sample StringTie GTFs into unified transcript catalog,rna-seq
stringtie,stringtie_03,-e -B -G merged.gtf -o sample1_re/sample1.gtf -p 8 --rf sample1_sorted.bam,re-quantify known and assembled transcripts using merged annotation (for count extraction),rna-seq
stringtie,stringtie_04,-o novel_transcripts.gtf -p 8 --rf sample1_sorted.bam,assemble and quantify without reference annotation (novel transcript discovery),rna-seq
stringtie,stringtie_05,prepDE.py3 -i sample_list.txt -g gene_count_matrix.csv -t transcript_count_matrix.csv,build gene and transcript count matrices from StringTie -e output with prepDE.py3 for DESeq2,rna-seq
stringtie,stringtie_06,-G genes.gtf -o sample1.gtf -p 8 --rf sample1_sorted.bam -v,assemble transcripts from HISAT2-aligned RNA-seq BAM with reference annotation with verbose output,rna-seq
stringtie,stringtie_07,--merge -G genes.gtf -o merged.gtf sample1.gtf sample2.gtf sample3.gtf,merge per-sample StringTie GTFs into unified transcript catalog,rna-seq
stringtie,stringtie_08,-e -B -G merged.gtf -o sample1_re/sample1.gtf -p 8 --rf sample1_sorted.bam,re-quantify known and assembled transcripts using merged annotation (for count extraction) and write output to a file,rna-seq
stringtie,stringtie_09,-o novel_transcripts.gtf -p 8 --rf sample1_sorted.bam,assemble and quantify without reference annotation (novel transcript discovery),rna-seq
stringtie,stringtie_10,prepDE.py3 -i sample_list.txt -g gene_count_matrix.csv -t transcript_count_matrix.csv,build gene and transcript count matrices from StringTie -e output with prepDE.py3 using default output names,rna-seq
survivor,survivor_01,merge vcf_list.txt 500 2 1 1 0 50 merged_svs.vcf,merge SV VCFs from multiple callers requiring support from at least 2 callers,variant-calling
survivor,survivor_02,merge sample_vcfs.txt 1000 1 1 1 0 50 cohort_svs.vcf,merge SV calls from a single caller across multiple samples,variant-calling
survivor,survivor_03,stats calls.vcf -1 -1 -1 sv_stats.txt,get summary statistics for SVs in a VCF with no size or read-support limits,variant-calling
survivor,survivor_04,filter calls.vcf NA 50 100000 0 -1 filtered.vcf,filter SVs to those between 50 bp and 100 kb,variant-calling
survivor,survivor_05,simSV reference.fasta parameter_file.txt 0 0 simulated,simulate structural variants on a reference genome for benchmarking,variant-calling
survivor,survivor_06,ls sniffles.vcf pbsv.vcf cutesv.vcf > vcf_list.txt && SURVIVOR merge vcf_list.txt 500 2 1 1 0 50 consensus_svs.vcf,create a VCF list file and merge three caller outputs,variant-calling
survivor,survivor_07,merge vcf_list.txt 500 2 1 1 0 50 merged_svs.vcf,merge SV VCFs from multiple callers requiring support from at least 2 callers,variant-calling
survivor,survivor_08,merge sample_vcfs.txt 1000 1 1 1 0 50 cohort_svs.vcf,merge SV calls from a single caller across multiple samples into cohort_svs.vcf,variant-calling
survivor,survivor_09,stats calls.vcf -1 -1 -1 sv_stats.txt,get summary statistics for SVs in a VCF,variant-calling
survivor,survivor_10,filter calls.vcf NA 50 100000 0 -1 filtered.vcf,filter SVs to those between 50 bp and 100 kb with AF and read-support filters disabled,variant-calling
tabix,tabix_01,variants.vcf.gz,create a tabix index for a bgzipped VCF (format inferred from the .vcf.gz extension),utilities
tabix,tabix_02,-p vcf variants.vcf.gz,create tabix index for a bgzipped VCF file,utilities
tabix,tabix_03,-h variants.vcf.gz chr1:1000000-2000000 > chr1_region.vcf,query a specific genomic region from an indexed VCF,utilities
tabix,tabix_04,-p bed regions.bed.gz,create tabix index for a bgzipped BED file,utilities
tabix,tabix_05,-l variants.vcf.gz,list all chromosomes/contigs in an indexed VCF,utilities
tabix,tabix_06,-C variants.vcf.gz,create a CSI index for large genomes with contigs >512 Mb,utilities
tabix,tabix_07,-h variants.vcf.gz chr1:1000000-2000000 chr2:500000-1000000 > multi_region.vcf,query multiple regions at once from an indexed VCF,utilities
tabix,tabix_08,-p gff annotation.gff3.gz,index a bgzipped GFF3 annotation file,utilities
tabix,tabix_09,-h https://example.com/variants.vcf.gz chr1:1000-2000 > remote_region.vcf,fetch a remote indexed VCF region without downloading the whole file,utilities
tabix,tabix_10,-s 1 -b 2 -e 3 -0 custom_format.bed.gz,index a bgzipped tab-delimited file by specifying the sequence and 0-based start and end columns,utilities
tar,tar_01,-czf archive.tar.gz data/,create a gzip-compressed archive of a directory,filesystem
tar,tar_02,-xzf archive.tar.gz,extract a gzip archive into the current directory,filesystem
tar,tar_03,-xf archive.tar.gz -C /opt/myapp/,extract an archive into a specific directory,filesystem
tar,tar_04,-tf archive.tar.gz,list contents of an archive without extracting,filesystem
tar,tar_05,-cjvf backup.tar.bz2 /home/user/documents/,create a verbose bzip2-compressed archive,filesystem
tar,tar_06,-xzf project-1.0.tar.gz --strip-components=1 -C /opt/project/,extract and strip the top-level directory from the archive,filesystem
tar,tar_07,-czf backup.tar.gz --exclude='*.pyc' --exclude='.git' project/,create an archive excluding certain file patterns,filesystem
tar,tar_08,-rf existing.tar newfile.txt,add files to an existing uncompressed archive,filesystem
tar,tar_09,-cJf archive.tar.xz largedir/,create a highly compressed archive using xz,filesystem
tar,tar_10,-xzf archive.tar.gz path/inside/archive/file.txt,extract a single file from an archive,filesystem
trim_galore,trim_galore_01,--paired --quality 20 --length 36 --cores 4 --gzip -o trimmed_output/ R1.fastq.gz R2.fastq.gz,trim adapters and quality-filter paired-end Illumina reads,qc
trim_galore,trim_galore_02,--paired --rrbs --quality 20 --length 20 --cores 4 --gzip -o rrbs_trimmed/ R1.fastq.gz R2.fastq.gz,trim RRBS bisulfite sequencing data,qc
trim_galore,trim_galore_03,--quality 20 --length 36 --cores 4 --gzip -o se_trimmed/ reads.fastq.gz,trim single-end reads with automatic adapter detection,qc
trim_galore,trim_galore_04,--paired --adapter AGATCGGAAGAGCACACGTCT --adapter2 AGATCGGAAGAGCGTCGTGTA --quality 20 --cores 4 --gzip -o custom_trimmed/ R1.fastq.gz R2.fastq.gz,trim with specific adapter sequence for non-standard libraries,qc
trim_galore,trim_galore_05,--paired --quality 20 --length 36 --cores 4 --gzip -o trimmed_output/ R1.fastq.gz R2.fastq.gz,trim adapters and quality-filter paired-end Illumina reads with default parameters,qc
trim_galore,trim_galore_06,--paired --rrbs --quality 20 --length 20 --cores 4 --gzip -o rrbs_trimmed/ R1.fastq.gz R2.fastq.gz,trim RRBS bisulfite sequencing data,qc
trim_galore,trim_galore_07,--quality 20 --length 36 --cores 4 --gzip -o se_trimmed/ reads.fastq.gz,trim single-end reads with automatic adapter detection using 4 cores,qc
trim_galore,trim_galore_08,--paired --adapter AGATCGGAAGAGCACACGTCT --adapter2 AGATCGGAAGAGCGTCGTGTA --quality 20 --cores 4 --gzip -o custom_trimmed/ R1.fastq.gz R2.fastq.gz,trim with specific adapter sequence for non-standard libraries and write output to a file,qc
trim_galore,trim_galore_09,--paired --quality 20 --length 36 --cores 4 --gzip --suppress_warn -o trimmed_output/ R1.fastq.gz R2.fastq.gz,trim adapters and quality-filter paired-end Illumina reads with console output suppressed,qc
trim_galore,trim_galore_10,--paired --rrbs --quality 20 --length 20 --cores 4 --gzip -o rrbs_trimmed/ R1.fastq.gz R2.fastq.gz,trim RRBS bisulfite sequencing data with default parameters,qc
trimmomatic,trimmomatic_01,PE -threads 8 -phred33 R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim adapters and quality-filter paired-end Illumina reads,qc
trimmomatic,trimmomatic_02,SE -threads 4 -phred33 reads.fastq.gz trimmed_reads.fastq.gz ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim single-end reads with quality filtering,qc
trimmomatic,trimmomatic_03,PE -threads 8 -phred33 R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim Nextera adapters from paired-end reads,qc
trimmomatic,trimmomatic_04,PE -threads 8 -phred33 R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:5 TRAILING:5 SLIDINGWINDOW:4:20 MINLEN:50,aggressive quality trimming for low-quality paired-end data,qc
trimmomatic,trimmomatic_05,PE -threads 8 -phred33 R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim adapters and quality-filter paired-end Illumina reads with default parameters,qc
trimmomatic,trimmomatic_06,SE -threads 4 -phred33 -trimlog trim_log.txt reads.fastq.gz trimmed_reads.fastq.gz ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim single-end reads with quality filtering and write a per-read trim log,qc
trimmomatic,trimmomatic_07,PE -threads 8 -phred33 R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim Nextera adapters from paired-end reads using 8 threads,qc
trimmomatic,trimmomatic_08,PE -threads 8 -phred33 R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:5 TRAILING:5 SLIDINGWINDOW:4:20 MINLEN:50,aggressive quality trimming for low-quality paired-end data with paired and unpaired reads written to separate files,qc
trimmomatic,trimmomatic_09,PE -threads 8 -phred33 -quiet R1.fastq.gz R2.fastq.gz R1_paired.fastq.gz R1_unpaired.fastq.gz R2_paired.fastq.gz R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim adapters and quality-filter paired-end Illumina reads in quiet mode,qc
trimmomatic,trimmomatic_10,SE -threads 4 -phred33 reads.fastq.gz trimmed_reads.fastq.gz ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36,trim single-end reads with quality filtering with default parameters,qc
trinity,trinity_01,--seqType fq --left R1.fastq.gz --right R2.fastq.gz --max_memory 50G --CPU 16 --output trinity_output/,de novo transcriptome assembly from paired-end RNA-seq reads,rna-seq
trinity,trinity_02,--genome_guided_bam star_aligned.bam --genome_guided_max_intron 10000 --max_memory 50G --CPU 16 --output genome_guided_trinity/,genome-guided Trinity assembly using STAR alignments,rna-seq
trinity,trinity_03,--seqType fq --single reads.fastq.gz --max_memory 32G --CPU 8 --output trinity_se/,de novo assembly from single-end RNA-seq reads,rna-seq
trinity,trinity_04,--seqType fq --left R1.fastq.gz --right R2.fastq.gz --SS_lib_type RF --max_memory 50G --CPU 16 --output stranded_trinity/,Trinity assembly with strand-specific library,rna-seq
trinity,trinity_05,--seqType fq --left R1.fastq.gz --right R2.fastq.gz --max_memory 50G --CPU 16 --output trinity_output/,de novo transcriptome assembly from paired-end RNA-seq reads with default parameters,rna-seq
trinity,trinity_06,--genome_guided_bam star_aligned.bam --genome_guided_max_intron 10000 --max_memory 50G --CPU 16 --output genome_guided_trinity/ --verbose,genome-guided Trinity assembly using STAR alignments with verbose output,rna-seq
trinity,trinity_07,--seqType fq --single reads.fastq.gz --max_memory 32G --CPU 8 --output trinity_se/,de novo assembly from single-end RNA-seq reads using 8 CPUs,rna-seq
trinity,trinity_08,--seqType fq --left R1.fastq.gz --right R2.fastq.gz --SS_lib_type RF --max_memory 50G --CPU 16 --output stranded_trinity/,Trinity assembly with strand-specific library and write output to a file,rna-seq
trinity,trinity_09,--seqType fq --left R1.fastq.gz --right R2.fastq.gz --max_memory 50G --CPU 16 --output trinity_output/,de novo transcriptome assembly from paired-end RNA-seq reads,rna-seq
trinity,trinity_10,--genome_guided_bam star_aligned.bam --genome_guided_max_intron 10000 --max_memory 50G --CPU 16 --output genome_guided_trinity/,genome-guided Trinity assembly using STAR alignments with default parameters,rna-seq
truvari,truvari_01,bench -b truth.vcf.gz -c calls.vcf.gz -f reference.fasta -o bench_output --passonly --sizemin 50,benchmark a structural variant caller VCF against a truth set,utilities
truvari,truvari_02,bench -b truth.vcf.gz -c calls.vcf.gz -f reference.fasta -o bench_output --refdist 1000 --pctsize 0.7 --passonly,benchmark with relaxed position tolerance for long-read SV calls,utilities
truvari,truvari_03,collapse -i calls.vcf.gz -o collapsed.vcf --passonly --sizemin 50 --refdist 500,collapse redundant SV calls within a single caller VCF,utilities
truvari,truvari_04,collapse -i multi_caller.vcf.gz -o merged.vcf --chain --keep common,merge SV calls from multiple callers into a consensus VCF,utilities
truvari,truvari_05,refine --reference reference.fasta --regions bench_output/candidate.refine.bed bench_output/,run truvari refine to improve benchmarking accuracy with sequence realignment,utilities
truvari,truvari_06,bench -b truth.vcf.gz -c calls.vcf.gz -f reference.fasta -o bench_output --sizemin 50 --sizemax 10000 --passonly,benchmark only SVs between 50 bp and 10 kb against a truth set,utilities
truvari,truvari_07,bench -b truth.vcf.gz -c calls.vcf.gz -f reference.fasta -o bench_output --passonly --sizemin 50,benchmark a structural variant caller VCF against a truth set,utilities
truvari,truvari_08,bench -b truth.vcf.gz -c calls.vcf.gz -f reference.fasta -o bench_output --refdist 1000 --pctsize 0.7 --passonly,benchmark with relaxed position tolerance for long-read SV calls and write output to a file,utilities
truvari,truvari_09,collapse -i calls.vcf.gz -o collapsed.vcf --passonly --sizemin 50 --refdist 500,collapse redundant SV calls within a single caller VCF,utilities
truvari,truvari_10,collapse -i multi_caller.vcf.gz -o merged.vcf --chain --keep common,merge SV calls from multiple callers into a consensus VCF with default parameters,utilities
varscan2,varscan2_01,mpileup2snp sample.mpileup --min-coverage 8 --min-reads2 2 --min-avg-qual 15 --min-var-freq 0.01 --p-value 0.99 --output-vcf 1 > snps.vcf,call SNPs from a single-sample mpileup file (tumor or normal),variant-calling
varscan2,varscan2_02,somatic normal_pileup.pileup tumor_pileup.pileup --output-snp somatic.snp.vcf --output-indel somatic.indel.vcf --output-vcf 1 --min-coverage 8 --min-var-freq 0.1 --somatic-p-value 0.05,call somatic variants from tumor-normal pair,variant-calling
varscan2,varscan2_03,processSomatic somatic.snp.vcf --min-tumor-freq 0.1 --max-normal-freq 0.05 --p-value 0.05,filter somatic variants for high-confidence calls,variant-calling
varscan2,varscan2_04,mpileup2snp sample.mpileup --min-coverage 8 --min-reads2 2 --min-avg-qual 15 --min-var-freq 0.01 --p-value 0.99 --output-vcf 1 > snps.vcf,call SNPs from a single-sample mpileup file (tumor or normal),variant-calling
varscan2,varscan2_05,somatic normal_pileup.pileup tumor_pileup.pileup --output-snp somatic.snp.vcf --output-indel somatic.indel.vcf --output-vcf 1 --min-coverage 8 --min-var-freq 0.1 --somatic-p-value 0.05,call somatic variants from tumor-normal pair with default parameters,variant-calling
varscan2,varscan2_06,processSomatic somatic.snp.vcf --min-tumor-freq 0.1 --max-normal-freq 0.05 --p-value 0.05,filter somatic variants for high-confidence calls,variant-calling
varscan2,varscan2_07,mpileup2snp sample.mpileup --min-coverage 8 --min-reads2 2 --min-avg-qual 15 --min-var-freq 0.01 --p-value 0.99 --output-vcf 1 > snps.vcf,call SNPs from a single-sample mpileup file (tumor or normal),variant-calling
varscan2,varscan2_08,somatic normal_pileup.pileup tumor_pileup.pileup --output-snp somatic.snp.vcf --output-indel somatic.indel.vcf --output-vcf 1 --min-coverage 8 --min-var-freq 0.1 --somatic-p-value 0.05,call somatic variants from tumor-normal pair and write output to a file,variant-calling
varscan2,varscan2_09,processSomatic somatic.snp.vcf --min-tumor-freq 0.1 --max-normal-freq 0.05 --p-value 0.05,filter somatic variants for high-confidence calls,variant-calling
varscan2,varscan2_10,mpileup2snp sample.mpileup --min-coverage 8 --min-reads2 2 --min-avg-qual 15 --min-var-freq 0.01 --p-value 0.99 --output-vcf 1 > snps.vcf,call SNPs from a single-sample mpileup file with default parameters,variant-calling
vcfanno,vcfanno_01,-p 8 config.toml input.vcf.gz > annotated.vcf,annotate a VCF with gnomAD allele frequencies,variant-annotation
vcfanno,vcfanno_02,-p 16 clinvar_bed_config.toml input.vcf.gz | bgzip > annotated.vcf.gz,annotate variants with ClinVar pathogenicity and a custom BED file,variant-annotation
vcfanno,vcfanno_03,-p 8 regions_config.toml input.vcf.gz > flagged.vcf,add a flag for variants overlapping a BED region of interest,variant-annotation
vcfanno,vcfanno_04,-p 8 bam_config.toml input.vcf.gz > coverage_annotated.vcf,compute mean coverage at each variant position from a BAM file,variant-annotation
vcfanno,vcfanno_05,-p 8 -lua filters.lua combined_config.toml input.vcf.gz > filtered_annotated.vcf,use a Lua postannotation to combine scores into a final filter,variant-annotation
vcfanno,vcfanno_06,-p 8 cosmic_config.toml input.vcf.gz | bcftools view -f PASS > cosmic_annotated.vcf,annotate variants with COSMIC and keep only PASS-filtered records,variant-annotation
vcfanno,vcfanno_07,-p 8 config.toml input.vcf.gz > annotated.vcf,annotate a VCF with gnomAD allele frequencies,variant-annotation
vcfanno,vcfanno_08,-p 16 clinvar_bed_config.toml input.vcf.gz | bgzip > annotated.vcf.gz,annotate variants with ClinVar pathogenicity and a custom BED file,variant-annotation
vcfanno,vcfanno_09,-p 8 regions_config.toml input.vcf.gz > flagged.vcf,add a flag for variants overlapping a BED region of interest,variant-annotation
vcfanno,vcfanno_10,-p 8 bam_config.toml input.vcf.gz > coverage_annotated.vcf,compute mean coverage at each variant position from a BAM file with default parameters,variant-annotation
vcftools,vcftools_01,--vcf variants.vcf --maf 0.05 --max-missing 0.9 --recode --recode-INFO-all --out filtered_variants,filter VCF by minor allele frequency and missingness,variant-calling
vcftools,vcftools_02,--vcf variants.vcf --site-pi --TajimaD 10000 --out popgen_stats,calculate per-site nucleotide diversity and Tajima's D statistics,variant-calling
vcftools,vcftools_03,--vcf variants.vcf --remove-indels --min-alleles 2 --max-alleles 2 --minDP 10 --recode --recode-INFO-all --out snps_only,filter VCF to biallelic SNPs with minimum depth,variant-calling
vcftools,vcftools_04,--vcf variants.vcf --weir-fst-pop pop1_samples.txt --weir-fst-pop pop2_samples.txt --fst-window-size 50000 --out fst_results,compute pairwise FST between two populations,variant-calling
vcftools,vcftools_05,--vcf variants.vcf --plink --out plink_dataset,convert VCF to PLINK format for downstream analysis,variant-calling
vcftools,vcftools_06,--vcf variants.vcf --maf 0.05 --max-missing 0.9 --recode --recode-INFO-all --out filtered_variants,filter VCF by minor allele frequency and missingness,variant-calling
vcftools,vcftools_07,--vcf variants.vcf --site-pi --TajimaD 10000 --out popgen_stats,calculate per-site nucleotide diversity and Tajima's D statistics,variant-calling
vcftools,vcftools_08,--vcf variants.vcf --remove-indels --min-alleles 2 --max-alleles 2 --minDP 10 --recode --recode-INFO-all --out snps_only,filter VCF to biallelic SNPs with minimum depth and write snps_only.recode.vcf,variant-calling
vcftools,vcftools_09,--vcf variants.vcf --weir-fst-pop pop1_samples.txt --weir-fst-pop pop2_samples.txt --fst-window-size 50000 --out fst_results,compute pairwise FST between two populations,variant-calling
vcftools,vcftools_10,--vcf variants.vcf --plink --out plink_dataset,convert VCF to PLINK format for downstream analysis with default parameters,variant-calling
vep,vep_01,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --fork 8 --offline,annotate VCF variants with VEP using offline cache,variant-annotation
vep,vep_02,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --everything --fork 8 --offline,annotate with all standard functional predictions,variant-annotation
vep,vep_03,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --pick --fork 8 --offline,annotate and pick one consequence block per variant using VEP's ranking criteria,variant-annotation
vep,vep_04,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --af_gnomad --fork 8 --offline,annotate with gnomAD population frequencies,variant-annotation
vep,vep_05,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --fork 8 --offline,annotate VCF variants with VEP using offline cache with default parameters,variant-annotation
vep,vep_06,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --everything --fork 8 --offline --verbose,annotate with all standard functional predictions with verbose output,variant-annotation
vep,vep_07,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --pick --fork 8 --offline,annotate and pick one consequence block per variant using 8 forked processes,variant-annotation
vep,vep_08,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --af_gnomad --fork 8 --offline,annotate with gnomAD population frequencies and write output to a file,variant-annotation
vep,vep_09,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --fork 8 --offline --quiet,annotate VCF variants with VEP using offline cache in quiet mode,variant-annotation
vep,vep_10,--input_file variants.vcf --output_file annotated.vcf --vcf --cache --dir_cache /path/to/cache/ --assembly GRCh38 --everything --fork 8 --offline,annotate with all standard functional predictions with default parameters,variant-annotation
verkko,verkko_01,--hifi hifi_reads.fastq.gz -d assembly_out --threads 64,assemble a genome using only HiFi reads,assembly
verkko,verkko_02,--hifi hifi_reads.fastq.gz --nano ont_reads.fastq.gz -d hybrid_assembly --threads 64,assemble a genome with both HiFi and ONT reads for maximum continuity,assembly
verkko,verkko_03,--hifi hifi_reads.fastq.gz --nano ont_reads.fastq.gz --hap-kmers maternal.meryl paternal.meryl trio -d trio_assembly --threads 64,perform haplotype-resolved assembly with trio binning,assembly
verkko,verkko_04,"--hifi hifi_reads.fastq.gz --nano ont_reads.fastq.gz -d assembly_out --threads 4 --snakeopts ""--cluster 'sbatch -c {threads} --mem {resources.mem_gb}G' --jobs 50""",run Verkko on a cluster using Slurm via Snakemake,assembly
verkko,verkko_05,--hifi hifi_reads.fastq.gz --nano ont_reads.fastq.gz -d assembly_out --threads 64 --resume,resume an interrupted Verkko assembly,assembly
verkko,verkko_06,--ont ont_reads.fastq.gz -d ont_assembly --threads 64,assemble with ONT reads only (no HiFi),assembly
verkko,verkko_07,--hifi hifi_reads.fastq.gz -d assembly_out --threads 64,assemble a genome using only HiFi reads using multiple threads,assembly
verkko,verkko_08,--hifi hifi_reads.fastq.gz --nano ont_reads.fastq.gz -d hybrid_assembly --threads 64,assemble a genome with both HiFi and ONT reads and write results to the hybrid_assembly directory,assembly
verkko,verkko_09,--hifi hifi_reads.fastq.gz --nano ont_reads.fastq.gz --hap-kmers maternal.meryl paternal.meryl trio -d trio_assembly --threads 64,perform haplotype-resolved assembly with trio binning,assembly
verkko,verkko_10,"--hifi hifi_reads.fastq.gz --nano ont_reads.fastq.gz -d assembly_out --threads 4 --snakeopts ""--cluster 'sbatch -c {threads} --mem {resources.mem_gb}G' --jobs 50""",run Verkko on a cluster using Slurm via Snakemake with default parameters,assembly
wget,wget_01,https://example.com/files/data.tar.gz,download a file and save with its remote filename,networking
wget,wget_02,-O /data/myfile.csv https://example.com/export/data.csv,download a file with a custom local filename,networking
wget,wget_03,-c https://example.com/large-file.iso,resume an interrupted download,networking
wget,wget_04,-b -o download.log https://example.com/large-dataset.tar.gz,download in the background and write progress to download.log,networking
wget,wget_05,--tries=5 --timeout=30 --wait=2 https://example.com/file.tar.gz,download with retry and timeout settings,networking
wget,wget_06,-r -l 2 -np -P ./mirror https://example.com/docs/,mirror a website section without going to parent directories,networking
wget,wget_07,-r -l 3 -np -A '.pdf' https://example.com/papers/,download only PDF files recursively from a site,networking
wget,wget_08,-i urls.txt -P downloads/,download a list of URLs from a file,networking
wget,wget_09,--post-data='query=search+term' -O result.html https://example.com/search,send a POST request and download the response,networking
wget,wget_10,--user-agent='Mozilla/5.0' -O page.html https://example.com/page,download with a custom User-Agent header,networking
whatshap,whatshap_01,phase --output phased.vcf.gz --reference reference.fa variants.vcf.gz long_reads.bam,phase variants using long reads (ONT/PacBio),variant-calling
whatshap,whatshap_02,phase --output phased.vcf.gz --reference reference.fa variants.vcf.gz illumina.bam,phase variants using Illumina short reads,variant-calling
whatshap,whatshap_03,haplotag --output haplotagged.bam --reference reference.fa phased.vcf.gz sorted.bam,tag reads with haplotype information after phasing,variant-calling
whatshap,whatshap_04,stats phased.vcf.gz,compute phasing statistics from a phased VCF,variant-calling
whatshap,whatshap_05,phase --output phased.vcf.gz --reference reference.fa variants.vcf.gz long_reads.bam,phase variants using long reads (ONT/PacBio) with default parameters,variant-calling
whatshap,whatshap_06,phase --output phased.vcf.gz --reference reference.fa variants.vcf.gz illumina.bam --verbose,phase variants using Illumina short reads with verbose output,variant-calling
whatshap,whatshap_07,haplotag --output haplotagged.bam --reference reference.fa --output-threads 4 phased.vcf.gz sorted.bam,tag reads with haplotype information after phasing using multiple threads for BAM output,variant-calling
whatshap,whatshap_08,stats phased.vcf.gz --tsv stats.tsv,compute phasing statistics from a phased VCF and write them to a TSV file,variant-calling
whatshap,whatshap_09,phase --output phased.vcf.gz --reference reference.fa variants.vcf.gz long_reads.bam --quiet,phase variants using long reads (ONT/PacBio) in quiet mode,variant-calling
whatshap,whatshap_10,phase --output phased.vcf.gz --reference reference.fa variants.vcf.gz illumina.bam,phase variants using Illumina short reads with default parameters,variant-calling
wtdbg2,wtdbg2_01,-x ont -g 5m -i reads.fastq.gz -fo assembly -t 16 && wtpoa-cns -t 16 -i assembly.ctg.lay.gz -fo assembly.ctg.fa,assemble genome from Oxford Nanopore reads,assembly
wtdbg2,wtdbg2_02,-x ccs -g 3g -i hifi_reads.fastq.gz -fo hifi_assembly -t 32 && wtpoa-cns -t 32 -i hifi_assembly.ctg.lay.gz -fo hifi_assembly.ctg.fa,assemble genome from PacBio HiFi reads,assembly
wtdbg2,wtdbg2_03,-x rs -g 4m -i clr_reads.fastq.gz -fo clr_assembly -t 8 && wtpoa-cns -t 8 -i clr_assembly.ctg.lay.gz -fo clr_assembly.ctg.fa,assemble bacterial genome from PacBio CLR reads,assembly
wtdbg2,wtdbg2_04,-x ont -g 5m -i reads.fastq.gz -fo assembly -t 16 && wtpoa-cns -t 16 -i assembly.ctg.lay.gz -fo assembly.ctg.fa --quiet,assemble genome from Oxford Nanopore reads in quiet mode,assembly
wtdbg2,wtdbg2_05,-x ccs -g 3g -i hifi_reads.fastq.gz -fo hifi_assembly -t 32 && wtpoa-cns -t 32 -i hifi_assembly.ctg.lay.gz -fo hifi_assembly.ctg.fa,assemble genome from PacBio HiFi reads with default parameters,assembly
wtdbg2,wtdbg2_06,-x rs -g 4m -i clr_reads.fastq.gz -fo clr_assembly -t 8 && wtpoa-cns -t 8 -i clr_assembly.ctg.lay.gz -fo clr_assembly.ctg.fa --verbose,assemble bacterial genome from PacBio CLR reads with verbose output,assembly
wtdbg2,wtdbg2_07,-x ont -g 5m -i reads.fastq.gz -fo assembly -t 16 && wtpoa-cns -t 16 -i assembly.ctg.lay.gz -fo assembly.ctg.fa,assemble genome from Oxford Nanopore reads using multiple threads,assembly
wtdbg2,wtdbg2_08,-x ccs -g 3g -i hifi_reads.fastq.gz -fo hifi_assembly -t 32 && wtpoa-cns -t 32 -i hifi_assembly.ctg.lay.gz -fo hifi_assembly.ctg.fa,assemble genome from PacBio HiFi reads and write the consensus contigs to a FASTA file,assembly
wtdbg2,wtdbg2_09,-x rs -g 4m -i clr_reads.fastq.gz -fo clr_assembly -t 8 && wtpoa-cns -t 8 -i clr_assembly.ctg.lay.gz -fo clr_assembly.ctg.fa --quiet,assemble bacterial genome from PacBio CLR reads in quiet mode,assembly
wtdbg2,wtdbg2_10,-x ont -g 5m -i reads.fastq.gz -fo assembly -t 16 && wtpoa-cns -t 16 -i assembly.ctg.lay.gz -fo assembly.ctg.fa,assemble genome from Oxford Nanopore reads with default parameters,assembly