Crate chordclust
Chordclust implements similarity clustering using rust-bio.
Algorithm
The algorithm is a greedy search, similar to the one described at https://www.drive5.com/usearch/manual/uclust_algo.html. For now, it uses similarity instead of identity.
- Sort the sequences by length (longest first).
- For each sequence, compare it with the database of centroids:
  - If the similarity with the best match is > T (the threshold): add it to the cluster of the best match.
  - Else: form a new cluster with the sequence as its centroid.
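The steps above can be sketched in a few lines of Rust. This is only an illustration, not the crate's actual implementation: `similarity` and `greedy_cluster` are hypothetical names, and the scoring function is a simple position-wise match ratio standing in for an alignment-based score such as one computed with rust-bio.

```rust
/// Hypothetical stand-in for an alignment-based similarity score in [0, 1]:
/// the fraction of matching positions, relative to the longer sequence.
fn similarity(a: &[u8], b: &[u8]) -> f64 {
    let longest = a.len().max(b.len());
    if longest == 0 {
        return 1.0;
    }
    let matches = a.iter().zip(b.iter()).filter(|(x, y)| x == y).count();
    matches as f64 / longest as f64
}

/// Greedy centroid clustering (illustrative only).
/// The first element of each returned cluster is its centroid.
fn greedy_cluster(mut seqs: Vec<Vec<u8>>, threshold: f64) -> Vec<Vec<Vec<u8>>> {
    // Longest sequences first, so that they become centroids.
    seqs.sort_by(|a, b| b.len().cmp(&a.len()));

    let mut clusters: Vec<Vec<Vec<u8>>> = Vec::new();
    for seq in seqs {
        // Find the centroid most similar to the current sequence.
        let mut best: Option<(usize, f64)> = None;
        for (i, cluster) in clusters.iter().enumerate() {
            let score = similarity(&cluster[0], &seq);
            if best.map_or(true, |(_, s)| score > s) {
                best = Some((i, score));
            }
        }
        match best {
            // Similar enough: join the cluster of the best match.
            Some((i, score)) if score > threshold => clusters[i].push(seq),
            // Otherwise the sequence becomes the centroid of a new cluster.
            _ => clusters.push(vec![seq]),
        }
    }
    clusters
}
```

Because sequences are visited longest first, every centroid is at least as long as the members later assigned to it. Each new sequence is compared only against centroids, never against other cluster members.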
Functions
cluster_similarity | Cluster a buffer by similarity. This is meant for use in examples, but it is not very useful. |
cluster_slice | Cluster a slice of |
read_fasta_sorted | Read the sequences inside a buffer in FASTA format and store them in a sorted |