Struct SegmentedText 

pub struct SegmentedText { /* private fields */ }

Provider for texts partitioned into segments at known cumulative end positions. lim_at(p) binary-searches the sorted ends list and returns the distance from p to the next boundary.

Storage cost is 8 × n_segments bytes (the cumulative-ends Vec<u64>). For a 50 K-junction SA index on a 6 GB genome that is 400 KB in total, versus the 750 MB a packed one-bit-per-symbol bitmap would need, or the 6 GB of extra space needed to widen the text to u16 (one extra byte per symbol).
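The figures above follow from simple arithmetic. The sketch below is illustrative only: the function names are hypothetical and not part of this crate, and it assumes one u64 per segment, one bit per symbol for the bitmap, and one extra byte per symbol when a u8 text is widened to u16.

```rust
use std::mem::size_of;

/// Bytes held by the cumulative-ends list: one u64 per segment.
fn ends_bytes(n_segments: u64) -> u64 {
    n_segments * size_of::<u64>() as u64
}

/// Bytes for a packed bitmap with one boundary bit per symbol, rounded up.
fn bitmap_bytes(text_len: u64) -> u64 {
    (text_len + 7) / 8
}

/// Extra bytes when every u8 symbol is widened to u16 (one more byte each).
fn widened_extra_bytes(text_len: u64) -> u64 {
    text_len
}
```

With n_segments = 50_000 and text_len = 6_000_000_000 these give 400_000, 750_000_000 and 6_000_000_000 bytes respectively.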

Lookup is O(log n_segments): roughly 16 probes of the ends list for 50 K segments. The merge can cache lim_p/lim_q across LCP calls, so the cost amortises to about one binary search per output record.
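A minimal sketch of the lookup, assuming the ends list is sorted and strictly increasing as described on this page. The fields of SegmentedText are private, so lim_at_sketch is a hypothetical free function rather than the crate's own code, and returning 0 for positions at or past the end of the text is an assumption of this sketch.

```rust
/// Hypothetical stand-in for `lim_at`, taking the ends list explicitly.
fn lim_at_sketch(ends: &[u64], p: usize) -> usize {
    let p = p as u64;
    // partition_point binary-searches for the first end strictly greater
    // than p, which is the end of the segment that contains p.
    let i = ends.partition_point(|&end| end <= p);
    match ends.get(i) {
        // Distance from p to the next segment boundary.
        Some(&end) => (end - p) as usize,
        // p is at or past the end of the text: nothing left to compare.
        None => 0,
    }
}
```

For ends = [4, 10, 15], position 3 is one symbol before the first boundary and position 4 opens a six-symbol segment, so the sketch returns 1 and 6.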

Two constructors:

  • from_lengths takes per-segment lengths and builds the cumulative-ends list internally. Most ergonomic when the caller has [chr_len_0, chr_len_1, …] already.
  • from_ends takes the sorted cumulative ends directly. Useful when the caller already has them — e.g. STAR’s chr_start[] table.

Both constructors require the segments to cover the whole text (sum(lengths) == text_len, or ends.last() == Some(text_len)).
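The relationship between the two inputs can be sketched as a prefix sum plus a validity check. Both helpers below are hypothetical and only mirror the requirements stated above. How the real constructors report a violated requirement (panic or otherwise) is not documented here, so the sketch returns a Result and a bool instead.

```rust
/// Hypothetical helper: turn per-segment lengths into cumulative ends,
/// checking that the segments cover the whole text.
fn ends_from_lengths(text_len: usize, lengths: &[usize]) -> Result<Vec<u64>, String> {
    let mut ends = Vec::with_capacity(lengths.len());
    let mut acc: u64 = 0;
    for &len in lengths {
        acc += len as u64;
        ends.push(acc);
    }
    if acc != text_len as u64 {
        return Err(format!("lengths sum to {acc}, expected {text_len}"));
    }
    Ok(ends)
}

/// Hypothetical helper: check the requirements stated for `from_ends`,
/// i.e. strictly increasing ends whose last entry equals text_len.
fn ends_are_valid(text_len: usize, ends: &[u64]) -> bool {
    let increasing = ends.windows(2).all(|w| w[0] < w[1]);
    increasing && ends.last().copied() == Some(text_len as u64)
}
```

Lengths [4, 6, 5] over a 15-symbol text give ends [4, 10, 15]. Note that a zero-length segment would produce a repeated end, which the strictly-increasing requirement of from_ends rules out.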

Implementations

impl SegmentedText

pub fn from_lengths(text_len: usize, lengths: &[usize]) -> Self

Build from per-segment lengths. The sum must equal text_len.

pub fn from_ends(text_len: usize, ends: Vec<u64>) -> Self

Build from sorted, strictly-increasing cumulative end positions. ends.last() must equal text_len.

pub fn text_len(&self) -> usize

Total text length in symbols.

pub fn n_segments(&self) -> usize

Number of segments.

pub fn ends(&self) -> &[u64]

Cumulative end positions, sorted, strictly increasing. ends()[i] is the position one past the last symbol of segment i.
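Because the ends are cumulative, the half-open range of segment i can be recovered from the slice returned by ends(). The helper below is a hypothetical sketch, not part of the crate, and assumes the first segment starts at position 0.

```rust
use std::ops::Range;

/// Hypothetical helper: half-open position range of segment `i`,
/// or None if `i` is out of bounds.
fn segment_range(ends: &[u64], i: usize) -> Option<Range<u64>> {
    let end = *ends.get(i)?;
    // Segment 0 starts at the beginning of the text; every later segment
    // starts where the previous one ended.
    let start = if i == 0 { 0 } else { ends[i - 1] };
    Some(start..end)
}
```

For ends = [4, 10, 15], segment 1 covers positions 4..10.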

Trait Implementations

impl Clone for SegmentedText

fn clone(&self) -> SegmentedText

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for SegmentedText

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl LimitProvider for SegmentedText

fn lim_at(&self, p: usize) -> usize

Logical length of the suffix starting at position p in symbols — i.e. the number of comparable symbols before the next segment boundary or end-of-text. Must be at most text.len() - p.

fn boundary_order(&self, p_a: usize, lim_a: usize, p_b: usize, lim_b: usize) -> Ordering

Ordering to use when one or both suffixes hit their boundary before any byte of their shared prefix differs. The default is lim_a.cmp(&lim_b), i.e. “shorter-suffix-is-smaller”: the standard generalised-SA / multi-string-SA convention, and the order that sorting a Vec<&str> with &str ordering produces.
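The default rule can be checked against plain &str sorting. In this sketch default_boundary_order is a hypothetical free function that reproduces only the lim_a.cmp(&lim_b) expression quoted above. It ignores p_a and p_b, which the real trait method also receives.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the default rule: the suffix with the
/// shorter logical length sorts first.
fn default_boundary_order(lim_a: usize, lim_b: usize) -> Ordering {
    lim_a.cmp(&lim_b)
}

/// Sorting string slices where each is a prefix of the next: the
/// standard &str ordering puts the shorter string first.
fn sorted_prefixes() -> Vec<&'static str> {
    let mut v = vec!["abc", "a", "ab"];
    v.sort();
    v
}
```

Here "ab" (length 2) and "abc" (length 3) agree on their shared prefix, so the shorter one is smaller, matching default_boundary_order(2, 3) == Ordering::Less.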

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.