pub struct FastaStreamHasher { /* private fields */ }
A streaming FASTA hasher that processes data chunk-by-chunk.
Designed for WASM environments where files are fetched in chunks. Apart from the results, memory usage is constant (~100KB) regardless of file size:
- Internal state: ~200 bytes (hasher state, counters)
- Line buffer: ~8KB (handles long lines)
- Gzip decoder state: ~32KB if compressed
- Results: grows only with number of sequences (not sequence length)
§Example
use gtars_refget::digest::stream::FastaStreamHasher;
let mut hasher = FastaStreamHasher::new();
// Process first chunk
hasher.update(b">chr1\nACGT").expect("update");
// Process second chunk
hasher.update(b"TGCA\n>chr2\nGGGG\n").expect("update");
// Finalize and get results
let collection = hasher.finish().expect("finish");
assert_eq!(collection.sequences.len(), 2);
§Implementations
impl FastaStreamHasher
pub fn update(&mut self, chunk: &[u8]) -> Result<()>
Process a chunk of FASTA data.
This method can be called multiple times with successive chunks of data. Handles both plain text and gzip-compressed FASTA with true streaming decompression (constant memory usage).
§Arguments
chunk - A slice of bytes from the FASTA file
§Returns
Ok(()) on success, Err on parsing error
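As an illustration of calling update() repeatedly with successive chunks, the sketch below reads any std::io::Read source in bounded pieces and forwards each piece to a callback. The helper name feed_chunks and its signature are hypothetical and not part of gtars_refget; the block uses only the standard library.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of gtars_refget): reads `reader` in pieces
/// of at most `chunk_size` bytes and hands each non-empty piece to `sink`.
/// Returns the total number of bytes forwarded.
fn feed_chunks<R, F, E>(mut reader: R, chunk_size: usize, mut sink: F) -> Result<usize, E>
where
    R: Read,
    F: FnMut(&[u8]) -> Result<(), E>,
    E: From<io::Error>,
{
    // One reusable buffer, so memory stays bounded by `chunk_size`.
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input (or a zero-sized buffer)
            Ok(n) => n,
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e.into()),
        };
        sink(&buf[..n])?;
        total += n;
    }
    Ok(total)
}
```

With the hasher, the callback would be along the lines of `|chunk| hasher.update(chunk)`, followed by `hasher.finish()` once the loop returns. This assumes the crate's error type can be built from std::io::Error, which the documentation above does not confirm.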
pub fn finish(self) -> Result<SequenceCollection>
Finalize processing and return the SequenceCollection.
This must be called after all chunks have been processed via update(). Because finish() takes self by value, the hasher cannot be used afterwards.
pub fn sequence_count(&self) -> usize
Get the current number of completed sequences.
pub fn in_sequence(&self) -> bool
Check if currently processing a sequence.
pub fn current_sequence_name(&self) -> Option<&str>
Get the name of the sequence currently being processed (if any).
pub fn current_sequence_length(&self) -> usize
Get the current length of the sequence being processed.
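To make the progress accessors concrete, here is a self-contained toy tracker that mirrors their names and keeps its state across arbitrary chunk boundaries. It is a stand-in for illustration only, not the gtars_refget implementation: it computes no digests, and its semantics are assumptions. In particular, it assumes a record counts as completed once the next header starts, that the name is the first whitespace-delimited token of the header, and that a header split across chunks reports no name until its line ends.

```rust
/// Toy stand-in, NOT the gtars_refget implementation: tracks FASTA record
/// state byte by byte so that chunks may be split at any position.
#[derive(Default)]
struct ToyFastaTracker {
    header: Vec<u8>,              // header bytes seen so far (may span chunks)
    in_header: bool,              // currently inside a '>' line
    mid_line: bool,               // current line already has at least one byte
    completed: usize,             // records closed by a following header
    current_name: Option<String>, // name of the record in progress
    current_len: usize,           // residues counted for that record
}

impl ToyFastaTracker {
    fn update(&mut self, chunk: &[u8]) {
        for &b in chunk {
            match b {
                b'\n' => {
                    if self.in_header {
                        // Header line is complete: derive the name (assumed
                        // to be the first whitespace-delimited token).
                        let text = String::from_utf8_lossy(&self.header).into_owned();
                        let name = text.split_whitespace().next().unwrap_or("").to_string();
                        self.current_name = Some(name);
                        self.header.clear();
                        self.in_header = false;
                    }
                    self.mid_line = false;
                }
                // A '>' at the start of a line opens a new record and
                // closes the previous one, if any.
                b'>' if !self.mid_line => {
                    if self.current_name.take().is_some() {
                        self.completed += 1;
                    }
                    self.current_len = 0;
                    self.in_header = true;
                    self.mid_line = true;
                }
                b'\r' => {} // ignore carriage returns from CRLF files
                _ => {
                    self.mid_line = true;
                    if self.in_header {
                        self.header.push(b);
                    } else if self.current_name.is_some() {
                        self.current_len += 1;
                    }
                }
            }
        }
    }

    fn sequence_count(&self) -> usize {
        self.completed
    }

    fn in_sequence(&self) -> bool {
        self.in_header || self.current_name.is_some()
    }

    fn current_sequence_name(&self) -> Option<&str> {
        self.current_name.as_deref()
    }

    fn current_sequence_length(&self) -> usize {
        self.current_len
    }
}
```

Under these assumptions, feeding `b">chr1\nACGT"` leaves the toy tracker with zero completed sequences, a current name of `chr1`, and a current length of 4. The real hasher may define these accessors differently, for example in when a sequence counts as completed.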