Fastars
Pure-Rust implementation of QC and trimming for short and long reads.
Inspired by fastp and fastplong, fastars combines short-read and long-read processing in a single binary. It is designed for high-throughput servers and large-scale parallel processing, with a significantly smaller memory footprint than fastp and comparable speed.
[!caution]
This project is AI-aided.
[!warning]
It is still under development and has been tested on only a limited number of samples.
Key Features
- Unified Tool: Process both short reads (Illumina) and long reads (PacBio/ONT) with one binary
- Pure Rust: No C/C++ dependencies in core logic, safe and portable
- Memory Efficient: Uses 40-98% less memory than fastp - ideal for shared servers
- High Performance: Matches or exceeds fastp speed at 4+ threads (up to 1.6x faster)
- fastp/fastplong Compatible: Familiar CLI interface for easy migration
- Auto Mode Detection: Automatically detects short/long reads based on read length
Performance
Benchmarked against fastp v1.0.1 with SRR21931795 (538K paired-end reads, ~186MB compressed).
The metrics are the average of five runs, taken after a warm-up phase.
| Threads | fastars Time | fastars Mem | fastp Time | fastp Mem | Speedup | Mem Saved |
|---|---|---|---|---|---|---|
| 1 | 22.34s | 23MB * | 16.81s | 1,151MB | 0.75x | 98% * |
| 4 | 7.50s | 597MB | 7.66s | 1,253MB | 1.02x | 52% |
| 8 | 4.94s | 250MB | 6.80s | 1,312MB | 1.38x | 81% |
| 14 | 4.28s | 215MB | 7.00s | 1,378MB | 1.64x | 84% |
| 16 | 4.62s | 178MB | 7.05s | 1,411MB | 1.53x | 87% |
* Single-thread mode works differently from the multi-threaded modes, which accounts for this figure.
Summary:
- 4+ threads: fastars matches or beats fastp
- 8-16 threads: 1.4x-1.6x faster with 80%+ less memory
- Best for: Multi-core servers and memory-constrained environments
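The Speedup and Mem Saved columns are consistent with two simple ratios: fastp time divided by fastars time, and one minus the ratio of fastars memory to fastp memory. The sketch below reproduces that arithmetic. The function names are hypothetical and are not part of fastars.

```rust
/// Speedup of fastars relative to fastp: fastp wall time / fastars wall time.
/// Values above 1.0 mean fastars finished sooner.
pub fn speedup(fastars_secs: f64, fastp_secs: f64) -> f64 {
    fastp_secs / fastars_secs
}

/// Memory saved relative to fastp, as a percentage of fastp's peak memory.
pub fn mem_saved_percent(fastars_mb: f64, fastp_mb: f64) -> f64 {
    (1.0 - fastars_mb / fastp_mb) * 100.0
}

/// Rounds to a fixed number of decimal places, as the table does for display.
pub fn round_to(value: f64, places: i32) -> f64 {
    let factor = 10f64.powi(places);
    (value * factor).round() / factor
}
```

For the 8-thread row, `speedup(4.94, 6.80)` rounds to 1.38 and `mem_saved_percent(250.0, 1312.0)` rounds to 81, matching the table.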
fastp Compatibility Verification
fastars v0.7.0 produces 100% identical output sequences to fastp v1.0.1 when using the same trimming parameters.
Verification Test
Dataset: SRR29111767 (1.4M paired-end reads)
Parameters: -3 --cut_mean_quality 20 --disable_adapter_trimming -G
| Metric | fastars | fastp | Match |
|---|---|---|---|
| Reads passed | 1,354,558 | 1,354,558 | ✓ |
| R1 sequences | 677,279 | 677,279 | 100% |
| R2 sequences | 677,279 | 677,279 | 100% |
Sequence-level verification: All 677,279 output sequences are byte-for-byte identical between fastars and fastp for both R1 and R2.
Algorithm Compatibility
fastars implements fastp's exact trimming algorithms:
- Sliding window quality trimming: Identical window calculation and trim position logic
- Trailing N removal: After quality trimming, trailing N bases are removed (fastp behavior)
- Leading N removal: After front quality trimming, leading N bases are removed (fastp behavior)
This ensures that fastars can be used as a drop-in replacement for fastp with identical results.
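To illustrate the general idea of sliding-window tail trimming followed by trailing N removal, here is a deliberately simplified sketch. It is not the fastars or fastp source and does not claim to reproduce their exact trim positions. The function name `cut_tail_len`, the Phred+33 assumption, and the handling of reads shorter than the window are all assumptions made for illustration.

```rust
/// Returns how many leading bases of the read to keep after a simplified
/// 3'-end sliding-window trim followed by trailing N removal.
/// Assumes Phred+33 quality encoding (an assumption for this sketch).
pub fn cut_tail_len(seq: &[u8], qual: &[u8], window: usize, min_mean_q: f64) -> usize {
    assert_eq!(seq.len(), qual.len(), "sequence and quality must align");
    let len = seq.len();
    // Assumption: reads shorter than the window are left untouched.
    if window == 0 || len < window {
        return len;
    }
    // `end` is the exclusive right edge of the current window.
    let mut end = len;
    while end >= window {
        let sum: u32 = qual[end - window..end]
            .iter()
            .map(|&b| u32::from(b.saturating_sub(33)))
            .sum();
        let mean = f64::from(sum) / window as f64;
        if mean >= min_mean_q {
            break; // first window (from the tail) that passes the threshold
        }
        end -= 1; // drop one base from the tail and slide toward the 5' end
    }
    if end < window {
        return 0; // no window passed, so nothing is kept
    }
    // Remove trailing N bases left after quality trimming.
    let mut keep = end;
    while keep > 0 && seq[keep - 1] == b'N' {
        keep -= 1;
    }
    keep
}
```

With a window of 4 and a threshold of 20, a read whose last three bases have quality 2 and whose remaining bases have quality 40 keeps all but its final base in this sketch, because the window ending one base earlier already averages 21.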
Installation
From crates.io
cargo install fastars
cargo install fastars --no-default-features --features rust_backend
From source
git clone https://github.com/necoli1822/fastars
cd fastars
cargo build --release
./target/release/fastars --help
Usage
Auto Mode (Recommended)
fastars -i reads.fq.gz -o filtered.fq.gz
Short-Read Mode (Illumina)
fastars -i reads.fq.gz -o filtered.fq.gz --mode short
fastars -i R1.fq.gz -I R2.fq.gz -o out_R1.fq.gz -O out_R2.fq.gz
fastars -i R1.fq.gz -I R2.fq.gz \
-o out_R1.fq.gz -O out_R2.fq.gz \
-j report.json -h report.html
Long-Read Mode (PacBio/ONT)
fastars -i long_reads.fq.gz -o filtered.fq.gz --mode long
fastars -i long_reads.fq.gz -o filtered.fq.gz \
-s "ATCTCTCTCAACAACAACAAC" \
-E "ATCTCTCTCAACAACAACAAC"
fastars -i long_reads.fq.gz -o filtered.fq.gz -N
fastars -i long_reads.fq.gz -o filtered.fq.gz -b
Quality Trimming
fastars -i reads.fq.gz -o out.fq.gz -5 -3
fastars -i reads.fq.gz -o out.fq.gz -5 -3 --cut_mean_quality 20
Adapter Trimming
fastars -i R1.fq.gz -I R2.fq.gz -o out1.fq.gz -O out2.fq.gz --detect_adapter_for_pe
fastars -i R1.fq.gz -o out.fq.gz -a AGATCGGAAGAGC -A AGATCGGAAGAGC
Poly-X Trimming
fastars -i reads.fq.gz -o out.fq.gz -g
fastars -i reads.fq.gz -o out.fq.gz -x
UMI Processing Example
fastars -i reads.fq.gz -o out.fq.gz \
-U --umi_loc read1 --umi_len 8 --umi_prefix UMI
Paired-End Merging & Correction
fastars -i R1.fq.gz -I R2.fq.gz \
-m --merged_out merged.fq.gz
fastars -i R1.fq.gz -I R2.fq.gz \
-o out1.fq.gz -O out2.fq.gz -c
Deduplication
fastars -i reads.fq.gz -o out.fq.gz -D
Output Splitting Options
fastars -i reads.fq.gz -o out.fq.gz --split 4
CLI Options (fastp/fastplong Compatible)
Input/Output
| Option | Description |
|---|---|
| -i, --in1 | Read 1 input file (required) |
| -I, --in2 | Read 2 input file (paired-end) |
| --interleaved_in | Input is interleaved paired-end data |
| -o, --out1 | Read 1 output file |
| -O, --out2 | Read 2 output file |
| --stdout | Stream output to stdout |
| --stdin_format | Input format for stdin (auto/gzip/plain) |
| -j, --json | JSON report output |
| -h, --html | HTML report output |
| -R, --report_title | Report title (default: "fastars report") |
| --failed_out | Failed reads output file |
| --unpaired1_out | Unpaired read 1 output file |
| --unpaired2_out | Unpaired read 2 output file |
| --fix_mgi_id | Fix MGI sequencer IDs to Illumina format |
| --dont_overwrite | Do not overwrite existing output files |
| -w, --thread | Worker threads (0 = auto) |
| -z, --compression | Gzip level 1-9 (default: 4) |
Mode Selection
| Option | Description |
|---|---|
| --mode | Processing mode: auto, short, long (default: auto) |
| --mode_detect_sample | Reads to sample for mode detection (default: 100) |
| --mode_detect_threshold | Length threshold for mode detection (default: 500bp) |
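The options above describe sampling a number of reads and comparing their length to a threshold. A minimal sketch of such a decision follows. The decision rule used here (mean length of the sampled reads compared with the threshold) is an assumption, as are the names `Mode` and `detect_mode` and the fallback to short mode for empty input. The README only states that detection is based on read length.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Mode {
    Short,
    Long,
}

/// Picks a mode from the lengths of the first `sample` reads.
/// Assumed rule: mean sampled length >= threshold selects long-read mode.
pub fn detect_mode(read_lengths: &[usize], sample: usize, threshold: usize) -> Mode {
    let sampled = &read_lengths[..read_lengths.len().min(sample)];
    if sampled.is_empty() {
        return Mode::Short; // assumption: default to short mode on empty input
    }
    let total: usize = sampled.iter().sum();
    let mean = total as f64 / sampled.len() as f64;
    if mean >= threshold as f64 {
        Mode::Long
    } else {
        Mode::Short
    }
}
```

With the documented defaults (sample of 100, threshold of 500), typical 150 bp Illumina reads would fall into short mode under this rule, and multi-kilobase reads into long mode.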
Quality Trimming
| Option | Description |
|---|---|
| -5, --cut_front | Trim from 5' end |
| --cut_front_window_size | Window size for cut_front |
| --cut_front_mean_quality | Mean quality for cut_front |
| -3, --cut_tail | Trim from 3' end |
| --cut_tail_window_size | Window size for cut_tail |
| --cut_tail_mean_quality | Mean quality for cut_tail |
| --cut_right | Scan from 5' to 3', trim when quality drops |
| --cut_right_window_size | Window size for cut_right |
| --cut_right_mean_quality | Mean quality for cut_right |
| --cut_window_size | Sliding window size (default: 4) |
| --cut_mean_quality | Quality threshold (default: 15) |
Adapter Trimming
| Option | Description |
|---|---|
| -a, --adapter_sequence | R1 adapter sequence |
| -A, --adapter_sequence_r2 | R2 adapter sequence |
| --adapter_fasta | FASTA file with adapter sequences |
| --detect_adapter_for_pe | Auto-detect adapters |
| --disable_adapter_trimming | Disable adapter trimming |
Long-Read Specific (fastplong compatible)
| Option | Description |
|---|---|
| -s, --start_adapter | 5' adapter for long reads |
| -E, --end_adapter | 3' adapter for long reads |
| -d, --distance_threshold | Adapter distance threshold (default: 0.25) |
| --trimming_extension | Extend trimming past adapter (default: 10) |
| -N, --mask | Quality masking mode |
| --mask_window_size | Window size for masking (default: 50) |
| --mask_mean_quality | Mean quality for masking (default: 10) |
| -b, --break_reads | Break reads at low-quality regions |
| --break_window_size | Window size for breaking (default: 100) |
| --break_mean_quality | Mean quality for breaking (default: 10) |
Quality Filtering
| Option | Description |
|---|---|
| -Q, --disable_quality_filtering | Disable quality filtering |
| -q, --qualified_quality_phred | Min quality for a base (default: 15) |
| -u, --unqualified_percent_limit | Max % unqualified bases (default: 40) |
| -e, --average_qual | Min average quality (default: 0) |
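As a sketch of how the three thresholds above could combine, the function below counts bases under the per-base quality cutoff, rejects the read if their share exceeds the percent limit, and then applies the average-quality check when it is non-zero. The function name, the Phred+33 assumption, and the rejection of empty reads are assumptions for illustration. The exact comparison operators in fastars may differ.

```rust
/// Returns true if a read passes a simplified quality filter.
/// Assumes Phred+33 encoding; `average_qual == 0.0` disables the mean check.
pub fn passes_quality_filter(
    qual: &[u8],
    qualified_phred: u8,
    unqualified_percent_limit: f64,
    average_qual: f64,
) -> bool {
    if qual.is_empty() {
        return false; // assumption: an empty read cannot pass
    }
    let phred = |b: u8| b.saturating_sub(33);
    // Share of bases whose quality is below the per-base cutoff.
    let unqualified = qual.iter().filter(|&&b| phred(b) < qualified_phred).count();
    let percent = unqualified as f64 * 100.0 / qual.len() as f64;
    if percent > unqualified_percent_limit {
        return false;
    }
    // Optional mean-quality requirement.
    if average_qual > 0.0 {
        let sum: u32 = qual.iter().map(|&b| u32::from(phred(b))).sum();
        let mean = f64::from(sum) / qual.len() as f64;
        if mean < average_qual {
            return false;
        }
    }
    true
}
```

With the defaults (15, 40, 0), a 10-base read with two bases at quality 2 has 20% unqualified bases and passes, while one with five such bases (50%) fails.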
Length Filtering
| Option | Description |
|---|---|
| -L, --disable_length_filtering | Disable length filtering |
| -l, --length_required | Minimum length (default: 15) |
| --length_limit | Maximum length (0 = no limit) |
| --max_len1 | Max length for R1 (truncate) |
| --max_len2 | Max length for R2 (truncate) |
N Filtering
| Option | Description |
|---|---|
| -n, --n_base_limit | Max N bases (default: 5) |
| --n_percent_limit | Max N content as % (long mode only) |
Index Barcode Filtering
| Option | Description |
|---|---|
| --filter_by_index1 | Filter by index 1 barcode |
| --filter_by_index2 | Filter by index 2 barcode |
| --filter_by_index_threshold | Max mismatches for index filter (default: 0) |
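The mismatch threshold above suggests a Hamming-distance comparison between a read's index and each listed barcode. The sketch below shows only that matching step. The function names are hypothetical, and what happens to a matching read is not specified in this README, so it is left out.

```rust
/// Hamming distance between two equal-length barcodes, or None if the
/// lengths differ (treated here as "not comparable", an assumption).
pub fn hamming(a: &[u8], b: &[u8]) -> Option<usize> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).filter(|(x, y)| x != y).count())
}

/// True if `index` is within `threshold` mismatches of any listed barcode.
pub fn index_matches_any(index: &[u8], barcodes: &[&[u8]], threshold: usize) -> bool {
    barcodes
        .iter()
        .any(|bc| matches!(hamming(index, bc), Some(d) if d <= threshold))
}
```

With the default threshold of 0, only exact barcode matches count. Raising it to 1 also accepts indexes that differ from a listed barcode at a single position.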
Complexity Filtering
| Option | Description |
|---|---|
| -y, --low_complexity_filter | Enable complexity filter |
| -Y, --complexity_threshold | Complexity threshold 0-100 (default: 30) |
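This README does not define complexity. In fastp it is the percentage of bases that differ from the next base, and the sketch below assumes fastars follows the same definition. The function names are hypothetical.

```rust
/// Complexity as the percentage of positions where base[i] != base[i + 1].
/// This follows fastp's documented definition; assumed to apply to fastars.
pub fn complexity_percent(seq: &[u8]) -> f64 {
    if seq.len() < 2 {
        return 0.0; // assumption: too short to measure, treat as zero
    }
    let changes = seq.windows(2).filter(|pair| pair[0] != pair[1]).count();
    changes as f64 * 100.0 / (seq.len() - 1) as f64
}

/// A read passes when its complexity reaches the threshold (0-100).
pub fn passes_complexity_filter(seq: &[u8], threshold: f64) -> bool {
    complexity_percent(seq) >= threshold
}
```

Under this definition a homopolymer such as `AAAAAAAAAA` scores 0 and a read like `AAAAATTTTT` scores about 11, so both would be rejected at the default threshold of 30.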
Poly-X Trimming
| Option | Description |
|---|---|
| -g, --trim_poly_g | Trim poly-G tails |
| --poly_g_min_len | Min poly-G length (default: 10) |
| -G, --disable_trim_poly_g | Disable poly-G trimming |
| -x, --trim_poly_x | Trim poly-X tails |
| --poly_x_min_len | Min poly-X length (default: 10) |
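A minimal sketch of tail trimming with a minimum run length is shown here. It uses exact matching only, whereas real implementations such as fastp tolerate occasional mismatches inside the run, so treat it as an illustration of the minimum-length idea rather than the fastars algorithm. The function name `trim_poly_tail` is hypothetical.

```rust
/// Removes a trailing run of `base` if the run is at least `min_len` long.
/// Exact-match simplification: a single non-matching base ends the run.
pub fn trim_poly_tail(seq: &[u8], base: u8, min_len: usize) -> &[u8] {
    let run = seq.iter().rev().take_while(|&&b| b == base).count();
    if run >= min_len {
        &seq[..seq.len() - run]
    } else {
        seq
    }
}
```

Calling it with `b'G'` and a minimum of 10 mimics poly-G trimming. A poly-X variant could call it once per base and keep the shortest result, though this README does not say whether fastars does that.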
Global Trimming
| Option | Description |
|---|---|
| -f, --trim_front1 | Trim N bases from front of R1 |
| -t, --trim_tail1 | Trim N bases from tail of R1 |
| -F, --trim_front2 | Trim N bases from front of R2 |
| -T, --trim_tail2 | Trim N bases from tail of R2 |
Deduplication
| Option | Description |
|---|---|
| -D, --dedup | Enable deduplication |
| --dup_calc_accuracy | Accuracy level 1-6 (default: 3) |
| --dont_eval_duplication | Disable duplication rate evaluation |
Overrepresentation Analysis
| Option | Description |
|---|---|
| -p, --overrepresentation_analysis | Enable analysis (default: on) |
| -P, --overrepresentation_sampling | Sampling rate (default: 20) |
UMI Processing
| Option | Description |
|---|---|
| -U, --umi | Enable UMI processing |
| --umi_loc | UMI location: read1, read2, index, per_index |
| --umi_len | UMI length (required if --umi enabled) |
| --umi_prefix | Prefix added before UMI (default: empty) |
| --umi_skip | Skip first N bases before UMI (default: 0) |
| --umi_separator | Separator between name and UMI (default: ":") |
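For the read1 location, UMI processing amounts to cutting the first bases off the read and appending them to the read name. The sketch below assumes the fastp convention of joining prefix and UMI with an underscore and inserting the tag before the first space of the name. Neither detail is confirmed by this README, and `--umi_skip` is ignored. The function name is hypothetical.

```rust
/// Moves the first `umi_len` bases of a read into its name.
/// Returns (new_name, trimmed_seq, trimmed_qual), or None if the read is
/// too short, non-ASCII, or the sequence and quality lengths differ.
pub fn move_umi_to_name(
    name: &str,
    seq: &str,
    qual: &str,
    umi_len: usize,
    prefix: &str,
    separator: &str,
) -> Option<(String, String, String)> {
    if !seq.is_ascii() || !qual.is_ascii() {
        return None; // FASTQ records are expected to be ASCII
    }
    if seq.len() < umi_len || seq.len() != qual.len() {
        return None;
    }
    let (umi, rest_seq) = seq.split_at(umi_len);
    let rest_qual = &qual[umi_len..];
    // Assumed convention: "<prefix>_<umi>" when a prefix is given.
    let tag = if prefix.is_empty() {
        umi.to_string()
    } else {
        format!("{prefix}_{umi}")
    };
    // Assumed convention: insert the tag before the first space in the name.
    let new_name = match name.split_once(' ') {
        Some((id, comment)) => format!("{id}{separator}{tag} {comment}"),
        None => format!("{name}{separator}{tag}"),
    };
    Some((new_name, rest_seq.to_string(), rest_qual.to_string()))
}
```

Under these assumptions, a read named `@read1 1:N:0` starting with `ACGTACGT`, processed with a length of 8 and the prefix `UMI`, would be renamed `@read1:UMI_ACGTACGT 1:N:0` and lose its first eight bases and quality values.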
Paired-end Merging
| Option | Description |
|---|---|
| -m, --merge | Enable PE read merging |
| --merged_out | Output file for merged reads |
| --out_unmerged1 | Output for unmerged R1 |
| --out_unmerged2 | Output for unmerged R2 |
| --merge_min_overlap | Min overlap for merging (default: 30) |
| --merge_max_mismatch_ratio | Max mismatch ratio (default: 0.1) |
| --merge_correct_mismatches | Correct mismatches in overlap (default: true) |
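The minimum-overlap and mismatch-ratio options can be illustrated with a naive overlap search: reverse-complement R2, slide it along R1, and accept the first placement whose overlap is long enough and whose mismatch ratio is within the limit. This is a simplified sketch, not the fastars algorithm. It ignores quality scores and mismatch correction, and the function names are hypothetical.

```rust
/// Reverse complement of a DNA sequence; unknown bases map to N.
pub fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            _ => b'N',
        })
        .collect()
}

/// Tries to merge a read pair. Returns the merged sequence, or None when no
/// placement satisfies the overlap and mismatch-ratio limits.
pub fn merge_pair(
    r1: &[u8],
    r2: &[u8],
    min_overlap: usize,
    max_mismatch_ratio: f64,
) -> Option<Vec<u8>> {
    let r2rc = reverse_complement(r2);
    // `offset` is where the reverse-complemented R2 starts within R1.
    for offset in 0..r1.len() {
        let overlap = (r1.len() - offset).min(r2rc.len());
        if overlap < min_overlap {
            break; // overlap only shrinks as the offset grows
        }
        let mismatches = r1[offset..offset + overlap]
            .iter()
            .zip(&r2rc[..overlap])
            .filter(|(a, b)| a != b)
            .count();
        if mismatches as f64 <= max_mismatch_ratio * overlap as f64 {
            // Keep R1 as-is and append the part of R2 that extends past it.
            let mut merged = r1.to_vec();
            merged.extend_from_slice(&r2rc[overlap..]);
            return Some(merged);
        }
    }
    None
}
```

Because the search starts at offset 0, it prefers the longest acceptable overlap. A production implementation would also use base qualities to decide which base to keep at mismatching positions.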
Base Correction
| Option | Description |
|---|---|
| -c, --correction | Enable overlap-based correction |
| --overlap_len_require | Min overlap for correction (default: 30) |
| --overlap_diff_limit | Max mismatches for correction (default: 5) |
| --overlap_diff_percent_limit | Max mismatch % (default: 5.0%) |
| --allow_gap_overlap_trimming | Allow gaps in overlap detection |
| --overlapped_out | Output only overlapped region |
Output Splitting
| Option | Description |
|---|---|
| --split | Split output into N files |
| --split_by_lines | Split by number of lines (4 lines = 1 read) |
| --split_prefix_digits | Digits in split suffix (default: 4) |
Other
| Option | Description |
|---|---|
| -6, --phred64 | Phred64 quality encoding |
| -V, --verbose | Verbose output |
| --reads_to_process | Number of reads to process (0 = all) |
License
MIT License. See LICENSE for details.
Author
Sunju Kim (n.e.coli.1822@gmail.com)
Acknowledgments
Inspired by fastp and fastplong by Shifu Chen.