pub struct OffsetMapping { /* private fields */ }
Offset mapping from tokenizer.
Maps each token to its character span in the original text. Used to convert between token indices and character positions.
§Research Note (HuggingFace Tokenizers)
The offset_mapping from HuggingFace tokenizers is a list of
(char_start, char_end) for each token. Special tokens like
[CLS] and [SEP] have offset (0, 0).
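As a rough illustration of the layout described above, the sketch below works on a plain slice of `(char_start, char_end)` pairs rather than on `OffsetMapping` itself, whose fields are private. The helper name `token_text` is hypothetical and not part of this crate. It assumes the offsets count characters (not bytes) and that an empty span marks a special token.

```rust
/// Hypothetical helper (not part of this crate): returns the text covered by
/// token `idx`, or `None` for special tokens and out-of-range indices.
///
/// Assumption: offsets are half-open character ranges `(char_start, char_end)`.
fn token_text(text: &str, offsets: &[(usize, usize)], idx: usize) -> Option<String> {
    let &(start, end) = offsets.get(idx)?;
    if start >= end {
        // Empty span, e.g. (0, 0) for [CLS] or [SEP].
        return None;
    }
    // Walk by characters so non-ASCII text is handled consistently.
    Some(text.chars().skip(start).take(end - start).collect())
}
```

For the text "Hello world" tokenized as [CLS], "Hello", "world", [SEP], the offsets would be `[(0, 0), (0, 5), (6, 11), (0, 0)]`, and only the two middle tokens map back to text.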
Implementations§
impl OffsetMapping
pub fn char_span_to_tokens(
&self,
char_start: usize,
char_end: usize,
) -> Option<(usize, usize)>
Find tokens that overlap with a character span.
Returns (first_token, last_token_exclusive).
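The overlap search can be sketched as a free function over a slice of offsets. This is a minimal sketch, not the crate's actual implementation: the name `char_span_to_tokens_sketch` is hypothetical, and it assumes half-open character spans, that special tokens with an empty span should be skipped, and that `None` means no token overlaps the span.

```rust
/// Hypothetical stand-in for the overlap search (not the crate's own code).
/// Returns `(first_token, last_token_exclusive)` for the tokens that overlap
/// the half-open character span `char_start..char_end`.
fn char_span_to_tokens_sketch(
    offsets: &[(usize, usize)],
    char_start: usize,
    char_end: usize,
) -> Option<(usize, usize)> {
    let mut first: Option<usize> = None;
    let mut last_exclusive = 0;
    for (i, &(tok_start, tok_end)) in offsets.iter().enumerate() {
        // Assumption: an empty span marks a special token such as [CLS].
        if tok_start == tok_end {
            continue;
        }
        // Two half-open ranges overlap when each starts before the other ends.
        if tok_start < char_end && tok_end > char_start {
            if first.is_none() {
                first = Some(i);
            }
            last_exclusive = i + 1;
        }
    }
    first.map(|f| (f, last_exclusive))
}
```

With "He is playing" tokenized as [CLS], "He", "is", "play", "##ing", [SEP], the character span 6..13 covers tokens 3 and 4, so the sketch returns `Some((3, 5))`.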
§Note on Label Propagation
For entity “playing” tokenized as ["play", "##ing"]:
- Assign B-PER to “play” (first token)
- Assign I-PER to “##ing” (continuation)
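The propagation rule above can be sketched as follows, assuming a BIO scheme with "O" for tokens outside any entity. The function `bio_labels` is hypothetical and not part of this crate. It takes the `(first_token, last_token_exclusive)` pair in the shape that `char_span_to_tokens` returns.

```rust
/// Hypothetical helper (not part of this crate): builds BIO labels for one
/// entity occupying the token range `first..last_exclusive`.
fn bio_labels(
    num_tokens: usize,
    token_span: (usize, usize),
    entity_type: &str,
) -> Vec<String> {
    let (first, last_exclusive) = token_span;
    // Assumption: every token outside the entity gets the "O" label.
    let mut labels = vec!["O".to_string(); num_tokens];
    for (i, label) in labels
        .iter_mut()
        .enumerate()
        .take(last_exclusive)
        .skip(first)
    {
        *label = if i == first {
            format!("B-{}", entity_type) // first subword token
        } else {
            format!("I-{}", entity_type) // continuation subword tokens
        };
    }
    labels
}
```

For six tokens with the entity at token range `(3, 5)` and type "PER", the labels come out as O, O, O, B-PER, I-PER, O.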
Trait Implementations§
impl Clone for OffsetMapping
fn clone(&self) -> OffsetMapping
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for OffsetMapping
impl RefUnwindSafe for OffsetMapping
impl Send for OffsetMapping
impl Sync for OffsetMapping
impl Unpin for OffsetMapping
impl UnsafeUnpin for OffsetMapping
impl UnwindSafe for OffsetMapping
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.