CD-HIT-compatible .clstr writer/reader and a semantic diff helper.
§Format notes
- Clusters start with a header line: `>Cluster N`.
- Member lines follow. The first member is the representative and is marked with `*`.
- We optionally emit lengths with units (e.g., `150nt,` or `300aa,`).
- Parsers in the wild typically extract the member ID as the substring after `>` up to the first occurrence of `...`. We follow this convention.
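The ID-extraction convention above can be sketched as follows. This is a minimal illustration using only the standard library, not this module's implementation: `member_id` and `parse_members` are hypothetical names, and the `(String, bool)` pair (ID, is-representative) is an assumed shape chosen for brevity.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: the member ID is the substring after the first '>'
/// up to the first occurrence of "...".
fn member_id(line: &str) -> Option<&str> {
    let start = line.find('>')? + 1;
    let rest = &line[start..];
    let end = rest.find("...")?;
    Some(&rest[..end])
}

/// Hypothetical parser: returns one Vec per cluster, each entry holding
/// (member ID, whether the line ends with the representative marker '*').
fn parse_members<R: BufRead>(reader: R) -> io::Result<Vec<Vec<(String, bool)>>> {
    let mut clusters: Vec<Vec<(String, bool)>> = Vec::new();
    for line in reader.lines() {
        let line = line?;
        let line = line.trim_end();
        if line.starts_with(">Cluster") {
            // A header line opens a new cluster.
            clusters.push(Vec::new());
        } else if let Some(id) = member_id(line) {
            // Member lines seen before any header are ignored in this sketch.
            if let Some(current) = clusters.last_mut() {
                current.push((id.to_string(), line.ends_with('*')));
            }
        }
    }
    Ok(clusters)
}
```

Because `&[u8]` implements `BufRead`, a call such as `parse_members(text.as_bytes())` is enough to try the sketch on an in-memory string.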
The writer here is intentionally small and conservative: it emits only the minimal fields required by most downstream tooling.
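To make the layout concrete, here is a rough sketch of a writer that emits only the fields named in the format notes: header, member index, optional length with unit, ID, and the representative marker. It is an assumption-laden illustration rather than the `ClstrWriter` API: `Unit`, `write_cluster`, and the tab after the member index are all hypothetical choices, and identity annotations on non-representative members are left out.

```rust
use std::io::{self, Write};

/// Hypothetical unit type for length annotations (not the crate's enum).
#[derive(Clone, Copy)]
enum Unit {
    Nt,
    Aa,
}

/// Hypothetical writer: emits one cluster, treating the first member as
/// the representative. Each member is an (ID, sequence length) pair.
fn write_cluster<W: Write>(
    out: &mut W,
    index: usize,
    members: &[(&str, usize)],
    unit: Option<Unit>,
) -> io::Result<()> {
    writeln!(out, ">Cluster {index}")?;
    for (i, (id, len)) in members.iter().enumerate() {
        write!(out, "{i}\t")?;
        // The length annotation is optional and carries its unit.
        match unit {
            Some(Unit::Nt) => write!(out, "{len}nt, ")?,
            Some(Unit::Aa) => write!(out, "{len}aa, ")?,
            None => {}
        }
        // The ID sits between '>' and "..." so common parsers can find it.
        write!(out, ">{id}...")?;
        if i == 0 {
            write!(out, " *")?;
        }
        writeln!(out)?;
    }
    Ok(())
}
```

Writing into a `Vec<u8>` (which implements `Write`) is a convenient way to inspect the output before pointing the same function at a file.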
Structs§
- ClstrWriter - CD-HIT-compatible .clstr writer.
Enums§
- ClstrUnit - Unit for sequence length annotations in .clstr.
Functions§
- parse_clusters_from_reader - Parse clusters from any buffered reader.
- read_clusters - Read clusters from a .clstr file (path).