rsomics-rereplicate
Expand ;size=N abundance annotations back into N individual FASTA records.
The inverse of dereplication (rsomics-derep / vsearch --derep_fulllength).
Usage
rsomics-rereplicate <INPUT> -o <OUTPUT> [OPTIONS]

Arguments:
  <INPUT>  Input FASTA file (use `-` for stdin)

Options:
  -o, --output <OUTPUT>  Output FASTA file (use `-` for stdout)
      --sizeout          Append `;size=1` to each emitted copy
      --fasta-width <N>  Sequence line wrap width; 0 = no wrap [default: 80]
  -t, --threads <N>      Thread count (currently single-threaded; flag accepted)
  -q, --quiet            Suppress progress messages
      --json             Machine-readable JSON output on stderr
Behaviour
Matches vsearch --rereplicate v2.31.0 byte-for-byte:
- Each record with `;size=N` emits N copies in input order.
- The `;size=N` annotation is stripped from output headers by default.
- With `--sizeout`, each copy receives `;size=1` appended.
- Records with no `;size=` annotation are treated as abundance 1 (warned on stderr).
- No `minseqlength` or `maxseqlength` filtering (vsearch does not apply these here).
- Sequence bytes preserved exactly: case and U characters unchanged.
- FASTA output wraps at 80 columns (configurable via `--fasta-width`).
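The rules above can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the crate's actual implementation: the names `split_size`, `push_wrapped`, and `rereplicate` are hypothetical, the sequence is assumed to be ASCII, the `;size=` annotation is assumed to be a `;`-separated `size=<digits>` field, and the stderr warning for unannotated records is omitted. It makes no claim of byte-exact parity with vsearch on edge cases such as trailing semicolons or `size=0`.

```rust
/// Split a header into (header without `;size=N`, abundance).
/// Headers with no `;size=` field get abundance 1.
/// Assumption: the annotation is a `;`-separated field `size=<digits>`.
fn split_size(header: &str) -> (String, u64) {
    let mut kept: Vec<&str> = Vec::new();
    let mut abundance: Option<u64> = None;
    for (i, field) in header.split(';').enumerate() {
        // Field 0 is the identifier; only later fields can be annotations.
        if i > 0 && abundance.is_none() {
            if let Some(n) = field
                .strip_prefix("size=")
                .and_then(|v| v.parse::<u64>().ok())
            {
                abundance = Some(n);
                continue; // drop the annotation from the output header
            }
        }
        kept.push(field);
    }
    (kept.join(";"), abundance.unwrap_or(1))
}

/// Append `seq` to `out`, wrapped at `width` columns (0 = no wrap).
/// Assumes ASCII input, so byte offsets are valid slice boundaries.
fn push_wrapped(out: &mut String, seq: &str, width: usize) {
    if width == 0 || seq.len() <= width {
        out.push_str(seq);
        out.push('\n');
        return;
    }
    let mut start = 0;
    while start < seq.len() {
        let end = (start + width).min(seq.len());
        out.push_str(&seq[start..end]);
        out.push('\n');
        start = end;
    }
}

/// Expand each (header, sequence) record into N copies, in input order.
fn rereplicate(records: &[(&str, &str)], sizeout: bool, width: usize) -> String {
    let mut out = String::new();
    for &(header, seq) in records {
        let (name, n) = split_size(header);
        for _ in 0..n {
            out.push('>');
            out.push_str(&name);
            if sizeout {
                out.push_str(";size=1");
            }
            out.push('\n');
            // Sequence bytes are copied as-is: no case folding, U stays U.
            push_wrapped(&mut out, seq, width);
        }
    }
    out
}
```

Calling `rereplicate(&[("a;size=2", "ACGTAC")], false, 4)` in this sketch yields two `>a` records, each with the sequence wrapped as `ACGT` and `AC`.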
Performance
On 50 000 amplicons / 8.7 MB input (aarch64 macOS, Apple M2, hyperfine 15 runs):
| Tool | Mean runtime | Speedup vs vsearch |
|---|---|---|
| vsearch 2.31.0 | 141.3 ms ± 5.9 ms | 1.00× |
| rsomics-rereplicate 0.1.0 | 38.8 ms ± 0.7 ms | 3.65× |
Output: 88 MB (50 500 reads, byte-exact vs vsearch).
Origin
This crate is an independent Rust reimplementation of vsearch --rereplicate
based on:
- The vsearch source code (BSD-2-Clause): `src/rereplicate.cc` and `src/fasta.cc` from https://github.com/torognes/vsearch
- Black-box behaviour testing against the upstream binary (vsearch 2.31.0)
The vsearch source is dual-licensed (GPL-3 or BSD-2-Clause). We read the BSD-2-Clause copy; our Rust implementation is MIT OR Apache-2.0.
License: MIT OR Apache-2.0
Upstream credit: vsearch https://github.com/torognes/vsearch (BSD-2-Clause / GPL-3)