Crate sourmash

Compute, compare and search signatures for nucleotide (DNA/RNA) and protein sequences.

sourmash is a command-line tool and Python library for computing MinHash sketches from DNA sequences, comparing them to each other, and plotting the results. This allows you to estimate sequence similarity between even very large data sets quickly and accurately.
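To make the idea of a MinHash sketch concrete, here is a minimal standard-library sketch of the bottom-k technique. It does not use this crate's API. The function names (`hash_kmer`, `sketch`, `jaccard_estimate`), the use of `DefaultHasher`, and the omission of reverse-complement handling are all assumptions made for illustration. sourmash's own hashing and sketch types differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Hash one k-mer with the standard library's default hasher.
/// Illustrative only: this is not the hash function sourmash uses.
fn hash_kmer(kmer: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    kmer.hash(&mut hasher);
    hasher.finish()
}

/// Bottom-k MinHash sketch: keep the `num` smallest distinct k-mer hashes.
/// Hypothetical helper; k-mers are not canonicalized (no reverse complement).
fn sketch(seq: &str, ksize: usize, num: usize) -> BTreeSet<u64> {
    let mut mins = BTreeSet::new();
    if ksize == 0 {
        // `windows(0)` would panic, so treat this as an empty sketch.
        return mins;
    }
    for kmer in seq.as_bytes().windows(ksize) {
        mins.insert(hash_kmer(kmer));
        if mins.len() > num {
            // Evict the largest hash so that at most `num` values remain.
            let largest = *mins.iter().next_back().unwrap();
            mins.remove(&largest);
        }
    }
    mins
}

/// Estimate Jaccard similarity from two bottom-k sketches:
/// take the `num` smallest hashes of the union and count how many
/// of them appear in both sketches.
fn jaccard_estimate(a: &BTreeSet<u64>, b: &BTreeSet<u64>, num: usize) -> f64 {
    // `BTreeSet::union` yields values in ascending order.
    let smallest: Vec<u64> = a.union(b).copied().take(num).collect();
    if smallest.is_empty() {
        return 0.0;
    }
    let shared = smallest
        .iter()
        .filter(|h| a.contains(*h) && b.contains(*h))
        .count();
    shared as f64 / smallest.len() as f64
}
```

Calling `sketch` on two sequences with the same `ksize` and `num`, then passing both results to `jaccard_estimate`, gives a value between 0.0 and 1.0. Because each sketch holds at most `num` hashes regardless of input length, comparisons stay cheap even for very large data sets.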

sourmash can be used to quickly search large databases of genomes for matches to query genomes and metagenomes.

sourmash also includes k-mer based taxonomic exploration and classification routines for genome and metagenome analysis. These routines can use the NCBI taxonomy but do not depend on it in any way. Documentation and further examples for each module can be found in the module descriptions below.


pub use errors::SourmashError as Error;


Foreign Function Interface for calling sourmash from a C API

Indexing structures for fast similarity search

Compressed representations of genomic data