pub fn determine_splitters(
contigs: &[Contig],
k: usize,
segment_size: usize,
) -> (AHashSet<u64>, AHashSet<u64>, AHashSet<u64>)
Build a splitter set from reference contigs
This implements the C++ AGC three-pass algorithm:
- Find all singleton k-mers in reference (candidates)
- Scan reference to find which candidates are ACTUALLY used as splitters
- Return only the actually-used splitters
Because only these reference-derived splitters are returned, every genome is split at the same k-mers.
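The passes above can be sketched roughly as follows. This is a minimal illustration, not the crate's implementation. It makes several assumptions: it takes raw byte slices instead of `Contig`, uses `std::collections::HashSet` instead of `AHashSet`, encodes k-mers at 2 bits per base without canonicalization, and picks the first singleton seen once at least `segment_size` k-mer positions have passed since the previous split. The names `encode_kmer` and `sketch_splitters` are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Encode the k-mer starting at `start` as a u64, 2 bits per base (k <= 32).
/// Returns None if the window contains a byte other than A, C, G or T.
fn encode_kmer(seq: &[u8], start: usize, k: usize) -> Option<u64> {
    let mut value = 0u64;
    for &base in &seq[start..start + k] {
        let bits = match base {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        value = (value << 2) | bits;
    }
    Some(value)
}

/// Hypothetical stand-in for `determine_splitters` on raw byte slices.
/// Returns (splitters, singletons, duplicates).
fn sketch_splitters(
    contigs: &[&[u8]],
    k: usize,
    segment_size: usize,
) -> (HashSet<u64>, HashSet<u64>, HashSet<u64>) {
    // Pass 1: count every k-mer in the reference.
    let mut counts: HashMap<u64, u32> = HashMap::new();
    for seq in contigs {
        if seq.len() < k {
            continue;
        }
        for start in 0..=seq.len() - k {
            if let Some(kmer) = encode_kmer(seq, start, k) {
                *counts.entry(kmer).or_insert(0) += 1;
            }
        }
    }

    // Singletons are the splitter candidates; everything else is a duplicate.
    let mut singletons = HashSet::new();
    let mut duplicates = HashSet::new();
    for (&kmer, &count) in &counts {
        if count == 1 {
            singletons.insert(kmer);
        } else {
            duplicates.insert(kmer);
        }
    }

    // Pass 2: rescan and keep only the candidates actually used to split.
    // Assumption: a candidate is used once the current segment spans at
    // least `segment_size` k-mer positions.
    let mut splitters = HashSet::new();
    for seq in contigs {
        if seq.len() < k {
            continue;
        }
        let mut segment_len = 0usize;
        for start in 0..=seq.len() - k {
            segment_len += 1;
            if segment_len < segment_size {
                continue;
            }
            if let Some(kmer) = encode_kmer(seq, start, k) {
                if singletons.contains(&kmer) {
                    splitters.insert(kmer);
                    segment_len = 0; // start a new segment
                }
            }
        }
    }

    // Pass 3: return only the actually-used splitters, plus both candidate sets.
    (splitters, singletons, duplicates)
}
```

In this sketch `splitters` is always a subset of `singletons`, which is why the full singleton set is returned separately from the splitters that were actually used.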
§Arguments
- contigs: Slice of reference contigs
- k: K-mer length
- segment_size: Minimum segment size
§Returns
Tuple of (splitters, singletons, duplicates), each an AHashSet<u64>:
- splitters: Actually-used splitter k-mers (for segmentation)
- singletons: All singleton k-mers from reference (for adaptive mode exclusion)
- duplicates: All duplicate k-mers from reference (for adaptive mode exclusion)