pub fn determine_splitters_streaming(
fasta_path: &Path,
k: usize,
segment_size: usize,
) -> Result<(AHashSet<u64>, AHashSet<u64>, AHashSet<u64>)>
Build a splitter set by streaming through a FASTA file (memory-efficient!)
This matches C++ AGC's approach, but streams the file twice instead of loading all contigs into memory. For yeast (12 MB genome), peak memory compares as follows:
- Max memory: ~100MB (Vec of 12M k-mers)
- vs loading all contigs: ~2.8GB
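The memory estimate above rests on holding only a flat Vec of packed k-mers rather than the contigs themselves. The following is a minimal sketch of that idea, not this crate's implementation: `encode_kmer` and `classify_kmers` are hypothetical names, contigs are passed as in-memory byte slices instead of being streamed from a FASTA file, `std::collections::HashSet` stands in for `AHashSet`, and canonical (reverse-complement) k-mers and splitter selection are left out.

```rust
use std::collections::HashSet;

/// Pack a k-mer into a u64 using 2 bits per base (A=0, C=1, G=2, T=3).
/// Returns None if the window contains any byte other than A/C/G/T.
/// Hypothetical helper; assumes the window is at most 32 bases long.
fn encode_kmer(window: &[u8]) -> Option<u64> {
    let mut code = 0u64;
    for &b in window {
        let bits = match b.to_ascii_uppercase() {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        code = (code << 2) | bits;
    }
    Some(code)
}

/// Split all k-mers into singletons (seen exactly once) and duplicates
/// (seen more than once). Only the Vec of packed k-mers is kept alive,
/// which is the source of the memory saving described above.
fn classify_kmers<'a, I>(contigs: I, k: usize) -> (HashSet<u64>, HashSet<u64>)
where
    I: IntoIterator<Item = &'a [u8]>,
{
    assert!(k >= 1 && k <= 32, "k must fit in a 2-bit packed u64");

    // Collect every valid k-mer from every contig.
    let mut kmers: Vec<u64> = Vec::new();
    for contig in contigs {
        kmers.extend(contig.windows(k).filter_map(encode_kmer));
    }

    // After sorting, equal k-mers are adjacent, so one scan finds run lengths.
    kmers.sort_unstable();

    let mut singletons = HashSet::new();
    let mut duplicates = HashSet::new();
    let mut i = 0;
    while i < kmers.len() {
        let mut j = i + 1;
        while j < kmers.len() && kmers[j] == kmers[i] {
            j += 1;
        }
        if j - i == 1 {
            singletons.insert(kmers[i]);
        } else {
            duplicates.insert(kmers[i]);
        }
        i = j;
    }
    (singletons, duplicates)
}
```

Under these assumptions, calling `classify_kmers` on the single contig `ACGTAC` with `k = 2` places `AC` in the duplicates set and `CG`, `GT`, and `TA` in the singletons set. A streaming version would feed contigs from a FASTA reader one at a time instead of from a slice held in memory.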
§Arguments
- fasta_path - Path to reference FASTA file (can be gzipped)
- k - K-mer length
- segment_size - Minimum segment size
§Returns
Tuple of (splitters, singletons, duplicates), each an AHashSet<u64>