SIMD-accelerated FASTQ parsing using Helicase-style bitmask operations.
This crate provides high-throughput FASTQ parsing by processing 64 bytes at a time
through SIMD registers (NEON on ARM, AVX2 on x86_64), classifying newline characters
via bitmask operations and finding record boundaries without per-byte branching.
§Architecture
- Lexer: Loads 64-byte blocks into SIMD registers and produces a u64 bitmask where bit i is set if byte i is a newline (\n).
- Parser: Walks the newline bitmask with trailing_zeros() to find record boundaries. Every 4th newline marks the end of a FASTQ record.
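The two stages above can be sketched in plain Rust. This is a minimal illustration, not the crate's implementation: the lexer here is a scalar loop standing in for the NEON/AVX2 comparison, and the names newline_mask_scalar and record_offsets_sketch are hypothetical. It also assumes that the returned offsets start with 0 and include the position just past each record's final newline, which matches the values shown in the crate example.

```rust
/// Scalar stand-in for the SIMD lexer (hypothetical helper): bit i of the
/// result is set when byte i of the block is b'\n'. The block may be
/// shorter than 64 bytes (the final block of a buffer).
fn newline_mask_scalar(block: &[u8]) -> u64 {
    debug_assert!(block.len() <= 64);
    let mut mask = 0u64;
    for (i, &b) in block.iter().enumerate() {
        mask |= u64::from(b == b'\n') << i;
    }
    mask
}

/// Walks the buffer in 64-byte blocks and records the offset just past
/// every 4th newline, preceded by 0 for the start of the first record.
/// The newline counter carries across block boundaries.
fn record_offsets_sketch(buf: &[u8]) -> Vec<usize> {
    let mut offsets = vec![0];
    let mut newlines_seen = 0usize;
    for (block_idx, block) in buf.chunks(64).enumerate() {
        let mut mask = newline_mask_scalar(block);
        while mask != 0 {
            // Index of the lowest set bit = position of the next newline.
            let bit = mask.trailing_zeros() as usize;
            newlines_seen += 1;
            if newlines_seen % 4 == 0 {
                offsets.push(block_idx * 64 + bit + 1);
            }
            // Clear the lowest set bit and continue with the next newline.
            mask &= mask - 1;
        }
    }
    offsets
}

fn main() {
    let fastq = b"@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nJJJJ\n";
    assert_eq!(record_offsets_sketch(fastq), vec![0, 16, 32]);
}
```

The inner loop runs once per newline rather than once per byte, which is where the bitmask approach avoids per-byte branching in the parser; a real lexer would replace the scalar loop with a vector compare and a movemask-style reduction.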
§Example
use fgumi_simd_fastq::{find_record_offsets, parse_records};
let fastq = b"@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nJJJJ\n";
let offsets = find_record_offsets(fastq);
assert_eq!(offsets, vec![0, 16, 32]);
let records: Vec<_> = parse_records(fastq).collect();
assert_eq!(records.len(), 2);
assert_eq!(records[0].name, b"r1");
assert_eq!(records[0].sequence, b"ACGT");
Structs§
- FastqBitmask - Bitmask output from lexing a 64-byte block.
- FastqRecord - A borrowed FASTQ record with zero-copy slices into the input buffer.
- SimdFastqReader - Buffered FASTQ reader that uses SIMD-accelerated record boundary detection.
Functions§
- find_record_offsets - Find FASTQ record boundary offsets in a byte buffer.
- lex_block_full - Lex a 64-byte block, producing newline bitmask, ACGT bitmask, and 2-bit encoding.
- parse_records - Iterator over zero-copy FASTQ records parsed from a byte buffer.
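
To make the three outputs listed for lex_block_full concrete, here is a scalar sketch of what lexing one 64-byte block could produce. Everything in it is an assumption for illustration: the BlockLexSketch type and lex_block_sketch function are hypothetical and are not the crate's FastqBitmask or lex_block_full, the ACGT mask is assumed to cover uppercase bases only, and the 2-bit code is assumed to be (byte >> 1) & 3, a common trick that maps A to 0, C to 1, T to 2, and G to 3. The crate's actual layout and encoding may differ.

```rust
/// Hypothetical output type for this sketch; not the crate's FastqBitmask.
#[derive(Debug, Default, PartialEq)]
struct BlockLexSketch {
    /// Bit i is set if byte i is b'\n'.
    newlines: u64,
    /// Bit i is set if byte i is one of A, C, G, T (uppercase only, assumed).
    acgt: u64,
    /// Bits 2i and 2i+1 hold the assumed 2-bit code for byte i (0 if not ACGT).
    two_bit: u128,
}

/// Scalar stand-in for a SIMD block lexer (hypothetical helper).
fn lex_block_sketch(block: &[u8; 64]) -> BlockLexSketch {
    let mut out = BlockLexSketch::default();
    for (i, &b) in block.iter().enumerate() {
        out.newlines |= u64::from(b == b'\n') << i;
        if matches!(b, b'A' | b'C' | b'G' | b'T') {
            out.acgt |= 1u64 << i;
            // Assumed encoding: (b >> 1) & 3 gives A=0, C=1, T=2, G=3.
            out.two_bit |= u128::from((b >> 1) & 3) << (2 * i);
        }
    }
    out
}

fn main() {
    // "ACGT" followed by 60 newline bytes as padding.
    let mut block = [b'\n'; 64];
    block[..4].copy_from_slice(b"ACGT");
    let lexed = lex_block_sketch(&block);
    assert_eq!(lexed.acgt, 0b1111);
    assert_eq!(lexed.newlines, !0b1111u64);
    // Reading 2-bit groups from high to low: T=10, G=11, C=01, A=00.
    assert_eq!(lexed.two_bit, 0b10_11_01_00);
}
```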