Struct IntervalEncoding


pub struct IntervalEncoding<'a> { /* private fields */ }


This data structure allows fast (typically O(1)) counting of tokens for arbitrary substrings of the original input text. It achieves this by precomputing, for every position, the last token of the encoding of the text prefix that ends at this position. These last tokens form a token tree whose root is the empty input text; each path starting at the root represents the encoded tokens of the corresponding text prefix. The struct stores a topological ordering of this tree in tree_id, which, together with tree_end, enables testing in O(1) whether one node is an ancestor of another. With the tree_depth field, the path length (which equals the number of encoded tokens) can be determined in O(1) as well.
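The tree described above can be sketched in plain Rust, independent of the crate. This is a minimal illustration under stated assumptions, not the crate's implementation: it assumes the byte length of each last token is already known (the hypothetical `last_token_len` input), and the names `TokenTree`, `is_ancestor` and `depth` are illustrative only. It assigns a pre-order numbering (one possible topological ordering) plus the end of each node's id range, which is what makes the O(1) ancestor test work.

```rust
/// Hypothetical token tree mirroring the idea above; not the crate's code.
/// Node `i` represents the text prefix `text[..i]`; node 0 is the empty prefix (root).
pub struct TokenTree {
    tree_id: Vec<usize>,    // pre-order id of each node (a topological order)
    tree_end: Vec<usize>,   // one past the largest pre-order id in the node's subtree
    tree_depth: Vec<usize>, // number of tokens on the path from the root
}

impl TokenTree {
    /// Assumption: `last_token_len[i - 1]` is the byte length (between 1 and i) of the
    /// last token in the encoding of `text[..i]`, so node `i`'s parent is `i - len`.
    pub fn new(last_token_len: &[usize]) -> Self {
        let n = last_token_len.len() + 1;
        let parent = |i: usize| i - last_token_len[i - 1];

        // Parents always have smaller indices than their children, so a single
        // reverse pass accumulates subtree sizes bottom-up.
        let mut size = vec![1usize; n];
        for i in (1..n).rev() {
            let s = size[i];
            size[parent(i)] += s;
        }

        let mut tree_id = vec![0usize; n];
        let mut tree_end = vec![n; n]; // the root's subtree covers all n ids
        let mut tree_depth = vec![0usize; n];
        // next_free[v]: next unassigned pre-order id inside v's subtree.
        let mut next_free = vec![1usize; n];
        for i in 1..n {
            let p = parent(i);
            tree_id[i] = next_free[p];
            next_free[p] += size[i];
            tree_end[i] = tree_id[i] + size[i];
            next_free[i] = tree_id[i] + 1;
            tree_depth[i] = tree_depth[p] + 1;
        }
        Self { tree_id, tree_end, tree_depth }
    }

    /// O(1): is node `a` an ancestor of (or equal to) node `b`?
    pub fn is_ancestor(&self, a: usize, b: usize) -> bool {
        self.tree_id[a] <= self.tree_id[b] && self.tree_id[b] < self.tree_end[a]
    }

    /// O(1): number of tokens in the encoding of `text[..i]`.
    pub fn depth(&self, i: usize) -> usize {
        self.tree_depth[i]
    }
}
```

For example, `TokenTree::new(&[1, 2, 1, 2])` makes nodes 3 and 4 children of node 2, so `is_ancestor(2, 4)` holds while `is_ancestor(1, 4)` does not, and `depth(4)` is 2. For an ancestor `a` of `b`, the number of tokens on the path between them is `depth(b) - depth(a)`.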

Note: the fields tree_end and tree_depth could also be represented by succinct data structures, reducing their size drastically. Since the tree_id and last_token fields are still needed, this would reduce the total memory footprint by a bit less than 50%.
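The "a bit less than 50%" estimate can be reproduced with a back-of-the-envelope calculation. This sketch rests on two assumptions not taken from the crate: each of the four per-position fields is a 32-bit integer, and the succinct replacements for tree_end and tree_depth together cost a hypothetical `extra_bits` bits per position.

```rust
/// Assumption: every per-position field (last_token, tree_id, tree_end,
/// tree_depth) is stored as one 32-bit integer.
const FIELD_BYTES: f64 = 4.0;

/// Plain layout: all four arrays stored as 32-bit integers.
pub fn plain_bytes_per_pos() -> f64 {
    4.0 * FIELD_BYTES
}

/// Succinct layout: tree_id and last_token stay 32-bit, while tree_end and
/// tree_depth are replaced by structures costing `extra_bits` bits per
/// position in total (a hypothetical parameter).
pub fn succinct_bytes_per_pos(extra_bits: f64) -> f64 {
    2.0 * FIELD_BYTES + extra_bits / 8.0
}

/// Fraction of memory saved by switching to the succinct layout.
pub fn saving(extra_bits: f64) -> f64 {
    1.0 - succinct_bytes_per_pos(extra_bits) / plain_bytes_per_pos()
}
```

With `extra_bits = 4.0` the saving is 46.875%; it only reaches 50% if the succinct structures cost nothing, which is why the total stays a bit below half.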


Implementations

impl<'a> IntervalEncoding<'a>

pub fn new(bpe: &'a BytePairEncoding, text: &'a [u8]) -> IntervalEncoding<'a>

pub fn count(&self, range: Range<usize>) -> usize

Auto Trait Implementations

impl<'a> Freeze for IntervalEncoding<'a>

impl<'a> RefUnwindSafe for IntervalEncoding<'a>

impl<'a> Send for IntervalEncoding<'a>

impl<'a> Sync for IntervalEncoding<'a>

impl<'a> Unpin for IntervalEncoding<'a>

impl<'a> UnwindSafe for IntervalEncoding<'a>

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self> where F: FnOnce(&Self) -> bool,

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
