Function determine_splitters_streaming_first_sample 

pub fn determine_splitters_streaming_first_sample(
    fasta_path: &Path,
    k: usize,
    segment_size: usize,
) -> Result<(AHashSet<u64>, AHashSet<u64>, AHashSet<u64>)>

Builds a splitter set from ONLY the first sample in a PanSN file.

This is used in single-file PanSN mode, where one file contains multiple samples. Splitters are computed from the reference (first) sample only, which matches the behavior of multi-file mode, where each file is a separate sample.
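To illustrate what "first sample only" can mean in practice, here is a minimal sketch that keeps only the records belonging to the first sample seen in PanSN-formatted text. It is not this crate's implementation. It assumes the common PanSN naming scheme sample#haplotype#contig with '#' as the delimiter, works on an in-memory string rather than a (possibly gzipped) file, and treats all haplotypes of the first sample as part of that sample. The helper names pansn_sample and first_sample_sequences are hypothetical.

```rust
/// Extract the sample name from a PanSN-style header such as `>HG002#1#chr1`.
/// Assumption: '#' is the delimiter; a header without '#' is its own sample name.
fn pansn_sample(header: &str) -> &str {
    let name = header.trim_start_matches('>');
    // Only the first whitespace-delimited token is the sequence name;
    // anything after it is a free-text description.
    let name = name.split_whitespace().next().unwrap_or("");
    name.split('#').next().unwrap_or(name)
}

/// Collect the sequences of the first sample only (hypothetical helper).
/// Every record whose sample name equals that of the first record is kept;
/// all other records are skipped.
fn first_sample_sequences(fasta: &str) -> Vec<String> {
    let mut first_sample: Option<String> = None;
    let mut keep = false;
    let mut seqs: Vec<String> = Vec::new();

    for line in fasta.lines() {
        let line = line.trim_end();
        if line.is_empty() {
            continue;
        }
        if line.starts_with('>') {
            let sample = pansn_sample(line);
            // The first header seen defines the reference sample.
            let reference = first_sample.get_or_insert_with(|| sample.to_string());
            keep = reference.as_str() == sample;
            if keep {
                seqs.push(String::new());
            }
        } else if keep {
            // Sequence lines may be wrapped; append them to the current record.
            if let Some(last) = seqs.last_mut() {
                last.push_str(line);
            }
        }
    }
    seqs
}
```

For input containing HG002#1#chr1, HG002#2#chr1, and HG003#1#chr1, this sketch keeps the two HG002 records and skips HG003. A streaming implementation could instead stop reading at the first header from a different sample if the file is known to be grouped by sample; that is a design choice this sketch does not make.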

§Arguments

  • fasta_path - Path to PanSN FASTA file (can be gzipped)
  • k - K-mer length
  • segment_size - Minimum segment size

§Returns

On success, a tuple of (splitters, singletons, duplicates), each an AHashSet<u64>.
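The page does not define singletons and duplicates. One plausible reading, stated here as an assumption, is that singletons are k-mers occurring exactly once in the first sample and duplicates are k-mers occurring more than once, with each k-mer stored as a u64. The sketch below illustrates that reading only. It uses std's HashSet instead of AHashSet to stay dependency-free, a plain 2-bit encoding, and no canonical (reverse-complement) handling; encode_kmer and classify_kmers are hypothetical names, not functions of this crate.

```rust
use std::collections::{HashMap, HashSet};

/// Pack a k-mer into a u64 using 2 bits per base (A=0, C=1, G=2, T=3).
/// Returns None if the k-mer contains a non-ACGT base or is longer than 32 bases.
fn encode_kmer(kmer: &[u8]) -> Option<u64> {
    if kmer.len() > 32 {
        return None;
    }
    let mut code = 0u64;
    for &base in kmer {
        let bits = match base.to_ascii_uppercase() {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        code = (code << 2) | bits;
    }
    Some(code)
}

/// Split the k-mers of `seqs` into (singletons, duplicates) by occurrence count.
/// Assumption: "singleton" means seen exactly once, "duplicate" means seen
/// more than once, counted across all given sequences.
fn classify_kmers(seqs: &[&str], k: usize) -> (HashSet<u64>, HashSet<u64>) {
    let mut singletons = HashSet::new();
    let mut duplicates = HashSet::new();
    if k == 0 {
        // `windows(0)` would panic, so treat k = 0 as "no k-mers".
        return (singletons, duplicates);
    }

    // Count every valid k-mer across all sequences.
    let mut counts: HashMap<u64, u32> = HashMap::new();
    for seq in seqs {
        for window in seq.as_bytes().windows(k) {
            if let Some(code) = encode_kmer(window) {
                *counts.entry(code).or_insert(0) += 1;
            }
        }
    }

    for (code, n) in counts {
        if n == 1 {
            singletons.insert(code);
        } else {
            duplicates.insert(code);
        }
    }
    (singletons, duplicates)
}
```

Under this reading, the splitters would be a subset of the singletons chosen so that the segments between consecutive splitters are at least segment_size bases long. That selection step is not shown here because the page does not describe it.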