Deacon
Search and deplete FASTA/FASTQ files and streams at gigabases per second using accelerated minimizer matching. Default parameters balance sensitivity and specificity for the application of microbial metagenomic host depletion, for which a validated prebuilt index is available. Classification sensitivity, specificity and memory requirements may be tuned by varying k-mer length (-k), window size (-w), and the two match thresholds (-a and -r). Minimizer k and w are chosen at index time, while the match thresholds can be chosen at filter time. To be considered a match, sequences must meet both an absolute threshold (-a, default 2 minimizer hits) and a relative threshold (-r, default 0.01 or 1% of minimizers). Paired sequences are fully supported: a match in either mate causes both mates in the pair to be retained or discarded; deacon filter retains only matches by default (search mode) and discards matches in --deplete mode. Deacon reports live filtering performance during execution and optionally writes a JSON --summary upon completion. Sequences can optionally be renamed using --rename for privacy and smaller file sizes. Gzip, zst and xz compression formats are natively supported and detected by file extension. Other source formats can be converted to FASTA or FASTQ and piped into Deacon using stdin.
Deacon can filter compressed long reads at ~500Mbp/s, paired short reads at ~250Mbp/s, and index a human genome in 20s on Apple M1 Pro. x86_64 performance is comparable, and 3Gbp/s was recorded with uncompressed long reads on a 32 core amd64 system. For best performance, compressing reads with Zstandard (zstd --long) rather than Gzip is recommended. Peak memory usage during filtering is ~5GB for the default panhuman index.
Benchmarks for panhuman host depletion of complex microbial metagenomes are described in a preprint. Among tested approaches, Deacon with the panhuman-1 (k=31, w=15) index exhibited the highest balanced accuracy for both long and short simulated reads. Deacon was however less specific than Hostile for short reads.
[!IMPORTANT] Deacon is actively developed. Record the software and index versions you use so that your results stay reproducible, and review the CHANGELOG carefully when updating. Versions 0.7.0 and 0.11.0 introduced backwards-incompatible index formats. Please report any problems you encounter by creating an issue or using the email address in my profile.
Install
cargo 
conda/mamba/pixi 
Usage
Indexing
Use deacon index build to quickly build custom indexes. For human host depletion, the prebuilt validated panhuman index is recommended, available for download below from Zenodo or fast object storage provided by the ModMedMicro research unit at the University of Oxford.
deacon index build chm13v2.fa > chm13v2.k31w15.idx
# Discard low complexity minimizers during indexing
deacon index build -e 0.5 chm13v2.fa > human.k31w15e5.idx
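To make the roles of k and w concrete, here is a minimal Python sketch of window minimizer selection. It is an illustration only, not Deacon's implementation: the hash function is an arbitrary choice, reverse complements are ignored, and treating w as the number of consecutive k-mers per window is an assumption.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Stable 64-bit hash of a k-mer (illustrative choice, not Deacon's hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq: str, k: int, w: int) -> set[str]:
    """Return the distinct window minimizers of seq.

    Each window spans w consecutive k-mers (an assumption about how w is
    defined); the k-mer with the smallest hash is that window's minimizer.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        selected.add(min(window, key=kmer_hash))
    return selected


if __name__ == "__main__":
    # Toy sequence with small k and w so the output is easy to inspect.
    print(sorted(minimizers("ACGTTGCATGGACCTA", k=5, w=4)))
```

Adjacent windows usually share a minimizer, so the set of distinct minimizers is much smaller than the set of all k-mers, which is what keeps an index compact.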
Prebuilt indexes
| Name/URL | Composition | Minimizers | Subtracted minimizers | Size | Date |
|---|---|---|---|---|---|
| panhuman-1 (k=31, w=15) Cloud, Zenodo | HPRC Year 1 ∪ CHM13v2.0 ∪ GRCh38.p14 - bacteria (FDA-ARGOS) - viruses (RefSeq) | 409,907,949 | 20,671 (0.0050%) | 3.3GB | 2025-10 |
| panmouse-1 (k=31, w=15, e=0.5) Cloud | GRCm39 ∪ PRJEB47108 - bacteria (FDA-ARGOS) - viruses (RefSeq) | 548,328,389 | 8,243 (0.0015%) | 4.6GB | 2025-08 |
Index compatibility
Deacon 0.11.0 and above uses index format version 3. Using version 3 indexes with older Deacon versions and vice versa triggers an error. Prebuilt indexes in legacy formats are therefore archived in object storage. Should you wish to download indexes in legacy formats, replace the /3/ in any prebuilt index download URL with either /2/ or /1/ accordingly.
- Deacon `0.11.0` and above uses index format version `3`
- Deacon `0.7.0` through to `0.10.0` used index format version `2`
- Deacon `0.1.0` through to `0.6.0` used index format version `1`
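The version mapping and URL substitution described above can be sketched in Python as follows. The helper names and the example URL are hypothetical and exist only for illustration; substitute the real download URL of the prebuilt index you need.

```python
def index_format_version(deacon_version: str) -> int:
    """Map a Deacon version string such as '0.10.0' to its index format version."""
    major, minor = (int(part) for part in deacon_version.split(".")[:2])
    if (major, minor) >= (0, 11):
        return 3
    if (major, minor) >= (0, 7):
        return 2
    return 1


def legacy_index_url(url: str, format_version: int) -> str:
    """Replace the /3/ path segment of a prebuilt index URL with a legacy version."""
    if "/3/" not in url:
        raise ValueError("expected a /3/ path segment in the index URL")
    return url.replace("/3/", f"/{format_version}/", 1)


if __name__ == "__main__":
    # Hypothetical URL used purely for illustration.
    url = "https://example.org/deacon/3/panhuman-1.k31w15.idx"
    print(legacy_index_url(url, index_format_version("0.10.0")))
```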
Filtering
The main command deacon filter accepts an index path followed by up to two query FASTA/FASTQ file paths, depending on whether query sequences originate from stdin, a single file, or paired input files. Paired queries are supported as either separate files or interleaved stdin, and written interleaved to either stdout or file, or else to separate paired output files. For paired reads, distinct minimizer hits originating from either mate are counted. By default, query sequences must meet both an absolute threshold of 2 minimizer hits (-a 2) and a relative threshold of 1% of minimizers (-r 0.01) to pass the filter. Filtering can be inverted for e.g. host depletion using the --deplete (-d) flag. Gzip, Zstandard, and xz compression formats are detected automatically by file extension.
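The match rule described above can be summarised with a short Python sketch. This is not Deacon's code: the function names are hypothetical, minimizers are modelled as plain Python sets, and using the number of distinct query minimizers as the denominator of the relative threshold is an assumption.

```python
def is_match(hits: int, total_minimizers: int,
             abs_threshold: int = 2, rel_threshold: float = 0.01) -> bool:
    """A query matches only if it meets both the absolute and relative thresholds."""
    if total_minimizers == 0:
        return False
    return hits >= abs_threshold and hits / total_minimizers >= rel_threshold


def pair_is_match(mate1: set, mate2: set, index: set,
                  abs_threshold: int = 2, rel_threshold: float = 0.01) -> bool:
    """Count distinct minimizer hits from either mate of a read pair."""
    pair_minimizers = mate1 | mate2
    hits = len(pair_minimizers & index)
    return is_match(hits, len(pair_minimizers), abs_threshold, rel_threshold)


def keep_record(matched: bool, deplete: bool = False) -> bool:
    """Search mode keeps matches; deplete mode discards them."""
    return not matched if deplete else matched


if __name__ == "__main__":
    # 2 hits out of 1000 minimizers passes -a 2 but fails -r 0.01.
    print(is_match(2, 1000))
    # Lowering both thresholds (-a 1 -r 0) accepts a single hit.
    print(is_match(1, 1000, abs_threshold=1, rel_threshold=0.0))
```

Because both thresholds must be met, the absolute threshold dominates for short reads with few minimizers, while the relative threshold dominates for long reads.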
Examples
# Keep only human sequences
# Host depletion using the panhuman-1 index and default thresholds
# Max sensitivity with absolute threshold of 1 and no relative threshold
# More specific 10% relative match threshold
# Stdin and stdout
# Faster Zstandard compression
# Fast gzip with pigz
# Paired reads
# Save summary JSON
# Replace read headers with incrementing integers
# Only look for minimizer hits inside the first 1000bp per record
# Debug mode: see sequences with minimizer hits in stderr
Command line reference
Filtering
<INDEX>: path to the index file
Indexing
<INPUT>: path to the input file
Building custom indexes
Building custom Deacon indexes is fast. Nevertheless, when indexing many large genomes, it may be worthwhile separately indexing and subsequently combining indexes into one succinct index. Combine distinct minimizers from multiple indexes using deacon index union. Similarly, use deacon index diff to subtract the minimizers contained in one index from another. This can be helpful for e.g. eliminating shared minimizers between the target and host genomes when building custom (non-human) indexes for host depletion.
- Use `deacon index union 1.idx 2.idx 3.idx… > 1+2+3.idx` to succinctly combine two (or more!) deacon indexes.
- Use `deacon index diff 1.idx 2.idx > 1-2.idx` to subtract the minimizers in 2.idx from 1.idx. Useful for masking out shared minimizer content between e.g. target and host genomes.
- From version `0.7.0`, `deacon index diff` also supports subtracting minimizers from an index using a fastx file or stream, e.g. `deacon index diff 1.idx 2.fa.gz > 1-2.idx` or `zcat *.fa.gz | deacon index diff 1.idx - > 1-2.idx`.
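Conceptually, these operations behave like set union and set difference over minimizers. The Python sketch below models each index as a plain set of minimizers to show the semantics; it says nothing about Deacon's on-disk format, and the function names and toy values are hypothetical.

```python
def index_union(*indexes: set) -> set:
    """Combine the distinct minimizers of any number of indexes."""
    return set().union(*indexes)


def index_diff(index: set, *others: set) -> set:
    """Remove from index every minimizer found in any of the other indexes."""
    return index.difference(*others)


if __name__ == "__main__":
    # Toy minimizer sets standing in for host and target genome indexes.
    host = {"AAACG", "CCGTA", "GGTAC"}
    target = {"GGTAC", "TTAGC"}
    # Mask minimizers shared with the target out of the host index.
    print(sorted(index_diff(host, target)))
```

Masking shared minimizers out of the host index in this way reduces the chance that target sequences are discarded during host depletion.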
Filtering summary statistics
Use -s summary.json to save detailed filtering statistics as JSON.
Server mode
From version 0.11.0, it is possible to eliminate index loading overhead at the start of each filter operation by preloading the index in the memory of a local server process. Subsequent filtering commands with --use-server are executed by the server process using a UNIX socket. Having started a server process, the index of the first filtering command it receives persists in memory for the life of that server process, enabling subsequent filter commands to be served rapidly without hash set construction overhead.
# Start the server
# The first filter command loads the index as usual
# Subsequent filter commands use the existing index stored in memory
# Stop the server
Citation
Bede Constantinides, John Lees, Derrick W Crook. "Deacon: fast sequence filtering and contaminant depletion" bioRxiv 2025.06.09.658732, https://doi.org/10.1101/2025.06.09.658732
Please also consider citing the SimdMinimizers paper:
Ragnar Groot Koerkamp, Igor Martayan. "SimdMinimizers: Computing random minimizers, fast" bioRxiv 2025.01.27.634998, https://doi.org/10.1101/2025.01.27.634998