pub struct SpectrumPreservingStringSet { /* private fields */ }
The spectrum-preserving string set
Stores all strings in a bit-packed format with offsets to each string. This allows for both memory-efficient storage and efficient access patterns.
Offsets are stored using Elias-Fano encoding (via sux-rs) for compact
representation and O(1) locate() via successor queries with Cursor.
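The layout described above can be pictured with a small stand-in type. This is only a sketch: `ToySpss` and its fields are hypothetical names, the offsets are a plain sorted `Vec<u64>` rather than an Elias-Fano sequence, and the convention that the offsets carry one extra trailing entry equal to the total number of bases is an assumption, not something the crate documents here.

```rust
/// Hypothetical stand-in for the documented layout (not the crate's type).
/// Offsets are a plain sorted Vec<u64> instead of an Elias-Fano sequence.
pub struct ToySpss {
    /// Bases packed at 2 bits each, so 4 bases per byte.
    pub packed: Vec<u8>,
    /// ASSUMPTION: offsets[i] is the first base of string i, and one extra
    /// trailing entry holds the total number of bases.
    pub offsets: Vec<u64>,
}

impl ToySpss {
    /// Number of strings: one less than the number of offsets.
    pub fn num_strings(&self) -> u64 {
        self.offsets.len().saturating_sub(1) as u64
    }

    /// Total number of bases: the trailing offset.
    pub fn total_bases(&self) -> u64 {
        self.offsets.last().copied().unwrap_or(0)
    }

    /// (begin, end) of a string in bases; end is exclusive.
    pub fn string_offsets(&self, id: usize) -> (u64, u64) {
        (self.offsets[id], self.offsets[id + 1])
    }

    /// Length of a string in bases.
    pub fn string_length(&self, id: usize) -> u64 {
        let (begin, end) = self.string_offsets(id);
        end - begin
    }

    /// Bytes needed to hold `total_bases` bases at 2 bits per base.
    pub fn packed_bytes_needed(total_bases: u64) -> u64 {
        (total_bases + 3) / 4
    }
}
```

With offsets `[0, 5, 9, 12]`, this toy holds three strings of lengths 5, 4 and 3, and 12 bases fit in 3 bytes.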
Implementations§
impl SpectrumPreservingStringSet
pub fn from_parts(strings: Vec<u8>, offsets: OffsetsVector, k: usize, m: usize) -> Self
Create a new SPSS from existing strings and offsets
Converts the OffsetsVector to Elias-Fano encoding for compact storage.
§Arguments
- strings: Encoded string data (2-bit packed)
- offsets: Offset vector for string boundaries (will be converted to EF)
- k: K-mer size
- m: Minimizer size
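To make the "2-bit packed" input concrete, here is a minimal sketch of how such a buffer and its offsets could be produced. The function names are hypothetical, and both the A=0, C=1, G=2, T=3 mapping and the low-bits-first order within each byte are assumptions chosen for illustration; the crate's actual encoding may differ.

```rust
/// Map a base to a 2-bit code.
/// ASSUMPTION: A=0, C=1, G=2, T=3 is illustrative, not the crate's mapping.
fn encode_base(b: u8) -> Option<u8> {
    match b {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None,
    }
}

/// Pack sequences back to back, 4 bases per byte, lowest bits first.
/// Returns the packed bytes and the offsets in bases, with a trailing
/// entry equal to the total number of bases. Returns None on a non-ACGT byte.
pub fn pack_strings(seqs: &[&[u8]]) -> Option<(Vec<u8>, Vec<u64>)> {
    let mut packed: Vec<u8> = Vec::new();
    let mut offsets: Vec<u64> = vec![0];
    let mut n: u64 = 0; // bases written so far
    for seq in seqs {
        for &b in seq.iter() {
            let code = encode_base(b)?;
            if n % 4 == 0 {
                packed.push(0); // start a fresh byte every 4 bases
            }
            let byte = (n / 4) as usize;
            let shift = ((n % 4) * 2) as u32;
            packed[byte] |= code << shift;
            n += 1;
        }
        offsets.push(n);
    }
    Some((packed, offsets))
}
```

Under these assumptions, packing `ACGT` followed by `TG` yields the bytes `[0xE4, 0x0B]` and the offsets `[0, 4, 6]`.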
pub fn string_offsets(&self, string_id: u32) -> (u64, u64)
Get the string offsets (begin, end) for a string ID
pub fn num_strings(&self) -> u64
Get the number of strings stored
pub fn string_offset(&self, string_id: u64) -> u64
Get the starting offset (in bases) of a string
pub fn total_bases(&self) -> u64
Get the total number of bases stored
pub fn locate(&self, absolute_pos: u64) -> Option<(u64, u64)>
Locate which string contains a given absolute position.
Returns (string_id, string_begin) or None if out of bounds.
pub fn locate_with_end(&self, absolute_pos: u64) -> Option<(u64, u64, u64)>
Locate which string contains a given absolute position, returning
(string_id, string_begin, string_end) in a single EF traversal.
This is more efficient than calling locate() + string_offsets().
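The semantics of the two locate methods can be sketched on a plain sorted slice. Note the swapped-in technique: this version uses a binary search (`partition_point`) as a stand-in for the Elias-Fano successor query, so it shows what is returned, not how the crate computes it. The free functions and the trailing total-bases entry in `offsets` are assumptions for illustration.

```rust
/// Find the string containing `pos`.
/// ASSUMPTION: `offsets` is sorted, holds each string's first base, and ends
/// with one extra entry equal to the total number of bases.
/// Returns (string_id, string_begin, string_end), or None if out of bounds.
pub fn locate_with_end(offsets: &[u64], pos: u64) -> Option<(u64, u64, u64)> {
    let total = *offsets.last()?;
    if pos >= total {
        return None;
    }
    // Index of the first offset strictly greater than `pos` (the successor).
    let succ = offsets.partition_point(|&o| o <= pos);
    // The containing string starts at the offset just before the successor.
    let id = succ.checked_sub(1)?;
    Some((id as u64, offsets[id], offsets[succ]))
}

/// Same lookup, dropping the end offset.
pub fn locate(offsets: &[u64], pos: u64) -> Option<(u64, u64)> {
    locate_with_end(offsets, pos).map(|(id, begin, _)| (id, begin))
}
```

Because the successor index already identifies both neighbouring offsets, returning the end costs nothing extra, which is the point of offering a combined call.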
pub fn strings_bytes(&self) -> usize
Get the byte size of the packed strings data
pub fn offsets_bytes(&self) -> usize
Get the byte size of the offsets vector
pub fn string_length(&self, string_id: u64) -> usize
Get the length of a specific string in bases
pub fn decode_kmer<const K: usize>(&self, string_id: u64, kmer_pos: usize) -> Kmer<K>
Decode a k-mer from a specific position in a string
Uses word-level loads from the packed buffer for efficiency.
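A word-level load can be sketched as follows: instead of extracting bases one at a time, a whole machine word is read from the packed buffer, shifted to the k-mer's first base, and masked. This is an illustration only. The function names are hypothetical, it returns raw bits rather than a `Kmer<K>`, it uses a 128-bit word so that any k up to 32 fits after the shift, and it assumes base i sits in bits 2i..2i+2 with A=0, C=1, G=2, T=3.

```rust
/// Read `k` bases (1..=32) starting at absolute base position `pos`.
/// ASSUMPTION: base i occupies bits 2i..2i+2 of the little-endian buffer.
/// Returns the k-mer as raw 2-bit codes, first base in the lowest bits.
pub fn read_kmer_bits(packed: &[u8], pos: usize, k: usize) -> Option<u64> {
    if k == 0 || k > 32 {
        return None;
    }
    let end = pos.checked_add(k)?;
    if end > packed.len().checked_mul(4)? {
        return None; // k-mer would run past the buffer
    }
    let byte = pos / 4; // byte holding the first base
    let shift = (pos % 4) * 2; // bit offset of the first base in that byte
    // Copy up to 16 bytes into a zero-padded word so reads near the end
    // of the buffer stay in bounds.
    let mut buf = [0u8; 16];
    let avail = (packed.len() - byte).min(16);
    buf[..avail].copy_from_slice(&packed[byte..byte + avail]);
    let word = u128::from_le_bytes(buf) >> shift;
    let mask = if k == 32 { u64::MAX } else { (1u64 << (2 * k)) - 1 };
    Some((word as u64) & mask)
}

/// Render raw k-mer bits as text (A=0, C=1, G=2, T=3, an assumed mapping).
pub fn bits_to_string(bits: u64, k: usize) -> String {
    (0..k)
        .map(|i| b"ACGT"[((bits >> (2 * i)) & 3) as usize] as char)
        .collect()
}
```

The sketch only checks the buffer bounds. A caller working with (string_id, kmer_pos) would add the string's begin offset to `kmer_pos` and make sure the k-mer ends before the string does.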
pub fn decode_kmer_at<const K: usize>(&self, absolute_pos: usize) -> Kmer<K>
Decode a k-mer at an absolute base position in the concatenated strings.
Avoids the need for string_id (no binary search needed).
This matches the C++ util::read_kmer_at approach with decoded_offsets.
pub fn serialize_to<W: Write>(&self, writer: &mut W) -> Result<()>
Serialize the SPSS to a writer using a custom binary format.
Format:
- k: u64 (LE)
- m: u64 (LE)
- strings_len: u64 (LE)
- strings: [u8; strings_len]
- offsets: epserde Elias-Fano binary format
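The fixed-layout part of this format can be sketched with the standard library alone. The helper below is hypothetical and writes only the first four fields. The trailing offsets section is deliberately left out, because its byte layout is defined by epserde and is not reproduced here.

```rust
use std::io::{self, Write};

/// Write k, m, strings_len and the packed strings, all integers as
/// little-endian u64. The Elias-Fano offsets section that follows in the
/// real format is NOT written by this sketch.
pub fn write_prefix<W: Write>(
    w: &mut W,
    k: usize,
    m: usize,
    strings: &[u8],
) -> io::Result<()> {
    w.write_all(&(k as u64).to_le_bytes())?;
    w.write_all(&(m as u64).to_le_bytes())?;
    w.write_all(&(strings.len() as u64).to_le_bytes())?;
    w.write_all(strings)?;
    Ok(())
}
```

For two bytes of packed strings, the prefix is 3 × 8 + 2 = 26 bytes long.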
pub fn deserialize_from<R: Read>(reader: &mut R) -> Result<Self>
Deserialize an SPSS from a reader.
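Reading the fixed-layout fields back might look like the sketch below. The helper names are hypothetical, the field order follows the format listed for serialize_to, and the epserde-encoded offsets section is not parsed. The strings are read through a length-limited reader so that a corrupt strings_len cannot trigger a huge up-front allocation.

```rust
use std::io::{self, Read};

/// Read one little-endian u64.
fn read_u64_le<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Read k, m and the packed strings; returns (k, m, strings).
/// The Elias-Fano offsets section that follows is left unread.
pub fn read_prefix<R: Read>(r: &mut R) -> io::Result<(u64, u64, Vec<u8>)> {
    let k = read_u64_le(r)?;
    let m = read_u64_le(r)?;
    let len = read_u64_le(r)?;
    let mut strings = Vec::new();
    // Read at most `len` bytes, growing the buffer only as data arrives.
    let got = r.by_ref().take(len).read_to_end(&mut strings)?;
    if got as u64 != len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "truncated strings section",
        ));
    }
    Ok((k, m, strings))
}
```

After `read_prefix` returns, the reader is positioned at the start of the offsets section, which is where an epserde-based decoder would take over.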
Trait Implementations§
Auto Trait Implementations§
impl Freeze for SpectrumPreservingStringSet
impl RefUnwindSafe for SpectrumPreservingStringSet
impl Send for SpectrumPreservingStringSet
impl Sync for SpectrumPreservingStringSet
impl Unpin for SpectrumPreservingStringSet
impl UnsafeUnpin for SpectrumPreservingStringSet
impl UnwindSafe for SpectrumPreservingStringSet
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T, U> CastableInto<U> for T where U: CastableFrom<T>
impl<T> DowncastableFrom<T> for T
fn downcast_from(value: T) -> T
impl<T, U> DowncastableInto<U> for T where U: DowncastableFrom<T>
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.