textspan 0.2.2

Text span utility

Text span utilities for Rust and Python

Usage (Python)

Install: pip install pytextspan

get_original_spans

def get_original_spans(
    tokens: List[str], original_text: str,
) -> List[List[Tuple[int, int]]]: ...

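Returns, for each token, the spans of original_text that correspond to it, computed from the shortest edit script (SES) between the tokens and the text. This is useful, for example, when the tokens were obtained from normalized text and you need their positions in the original text. A token can map to more than one span when unmatched characters interrupt it, as "foo" does in the example below.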

>>> import textspan
>>> tokens = ["foo", "bar"]
>>> textspan.get_original_spans(tokens, "FO.o  BåR")
[[(0, 2), (3, 4)], [(6, 9)]]
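
To make the return value concrete, the following minimal sketch slices the original text with the spans from the example above. It assumes the spans are half-open (start, end) character offsets, which is consistent with that output. The spans are copied from the example rather than computed, and `pieces` is a hypothetical helper, not part of textspan.

```python
from typing import List, Tuple

Span = Tuple[int, int]

# Same string as in the example above; "\u00e5" is the character "å".
original_text = "FO.o  B\u00e5R"

# Copied from the example output above, not computed here.
spans: List[List[Span]] = [[(0, 2), (3, 4)], [(6, 9)]]


def pieces(text: str, token_spans: List[Span]) -> List[str]:
    """Hypothetical helper: slice `text` with half-open (start, end) spans."""
    return [text[start:end] for start, end in token_spans]


# One list of text fragments per token.
recovered = [pieces(original_text, token_spans) for token_spans in spans]
print(recovered)  # [['FO', 'o'], ['BåR']]
```

The token "foo" comes back as two fragments because the "." in the original text is not part of any token.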

align_spans

def align_spans(
    spans: List[Tuple[int, int]], text: str, original_text: str,
) -> List[List[Tuple[int, int]]]: ...

Converts the spans defined in text to those defined in original_text. This is useful, for example, when the spans were computed on normalized text but you need them in the original, unnormalized text.

>>> spans = [(0, 3), (3, 6)]
>>> text = "foobarbaz"
>>> original_text = "FOo.BåR baZ"
>>> textspan.align_spans(spans, text, original_text)
[[(0, 3)], [(4, 7)]]
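
One way to picture this conversion is as a character-level alignment between text and original_text. The sketch below is not textspan's implementation. It swaps in Python's difflib.SequenceMatcher for the edit-script step, and the `fold` helper (lowercasing and stripping combining marks) is a hypothetical stand-in for whatever normalization produced text. It only builds the index mapping from text to original_text.

```python
import difflib
import unicodedata
from typing import List


def fold(ch: str) -> str:
    """Hypothetical normalizer: lowercase and drop combining marks.

    Always returns exactly one character so indices stay aligned.
    """
    decomposed = unicodedata.normalize("NFKD", ch.lower())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped[:1] or ch


def char_mapping(text: str, original_text: str) -> List[List[int]]:
    """For each index of `text`, list the matching indices of `original_text`."""
    folded = "".join(fold(ch) for ch in original_text)
    # difflib is an assumption of this sketch, not what textspan uses.
    matcher = difflib.SequenceMatcher(None, text, folded, autojunk=False)
    mapping: List[List[int]] = [[] for _ in text]
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            mapping[a + k].append(b + k)
    return mapping


# "\u00e5" is the character "å".
mapping = char_mapping("foobarbaz", "FOo.B\u00e5R baZ")
print(mapping)  # [[0], [1], [2], [4], [5], [6], [8], [9], [10]]
```

Each inner list holds the original_text indices matched to one character of text. The "." at index 3 and the space at index 7 have no counterpart in text, so they never appear in the mapping.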

align_spans_by_mapping

def align_spans_by_mapping(
    spans: List[Tuple[int, int]], mapping: List[List[int]],
) -> List[List[Tuple[int, int]]]: ...

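Converts the spans by the given mapping, where mapping[i] lists the target indices that correspond to source index i. An index with an empty list has no counterpart in the target.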

>>> spans = [(0, 2), (3, 4)]
>>> mapping = [[0, 1], [], [2], [4, 5, 6]]
>>> textspan.align_spans_by_mapping(spans, mapping)
[[(0, 2)], [(4, 7)]]
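
The behavior in this example can be reproduced with a short pure-Python sketch. It is an illustration of the idea under stated assumptions, not the library's code: spans are taken to be half-open, and `align_spans_by_mapping_sketch` is a hypothetical name. For each span it collects every mapped target index and merges consecutive indices into spans.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def align_spans_by_mapping_sketch(
    spans: List[Span], mapping: List[List[int]]
) -> List[List[Span]]:
    """Hypothetical re-implementation for illustration only."""
    result: List[List[Span]] = []
    for start, end in spans:
        # Every target index reachable from the source range [start, end).
        targets = sorted({j for i in range(start, end) for j in mapping[i]})
        runs: List[Span] = []
        for j in targets:
            if runs and runs[-1][1] == j:
                # Extends the current run of consecutive indices.
                runs[-1] = (runs[-1][0], j + 1)
            else:
                # A gap in the target indices starts a new span.
                runs.append((j, j + 1))
        result.append(runs)
    return result


spans = [(0, 2), (3, 4)]
mapping = [[0, 1], [], [2], [4, 5, 6]]
aligned = align_spans_by_mapping_sketch(spans, mapping)
print(aligned)  # [[(0, 2)], [(4, 7)]]
```

A span whose mapped indices are not contiguous comes back as several spans, which is why the return type is a list of lists.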