
passage_token_offsets

Function passage_token_offsets 

pub fn passage_token_offsets(
    text: &str,
) -> Result<Vec<(usize, usize)>, AppError>

Returns a (start, end) byte-offset pair for each whitespace-delimited word in text. Previously, the tokenizers crate supplied true sub-word offsets; the LLM headless path doesn’t need that granularity, so this function returns word boundaries instead.
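
The behavior described above could be sketched roughly as follows. This is an illustration under assumptions, not the crate's actual implementation: AppError here is a hypothetical stand-in for the real error type, the end offset is assumed to be exclusive, and "whitespace" is assumed to mean Unicode whitespace as defined by char::is_whitespace.

```rust
// Hypothetical stand-in for the crate's real error type, which is not shown
// in this documentation.
#[derive(Debug)]
pub struct AppError;

/// Sketch: collect (start, end) byte offsets of whitespace-delimited words.
/// Assumption: `end` is exclusive, so `&text[start..end]` yields the word.
pub fn passage_token_offsets(text: &str) -> Result<Vec<(usize, usize)>, AppError> {
    let mut offsets = Vec::new();
    // Byte index where the current word began, if we are inside a word.
    let mut start: Option<usize> = None;

    // char_indices yields byte offsets, so multi-byte characters are handled.
    for (i, ch) in text.char_indices() {
        if ch.is_whitespace() {
            // Whitespace closes the current word, if any.
            if let Some(s) = start.take() {
                offsets.push((s, i));
            }
        } else if start.is_none() {
            // First non-whitespace character opens a new word.
            start = Some(i);
        }
    }

    // A word that runs to the end of the input has no trailing whitespace.
    if let Some(s) = start {
        offsets.push((s, text.len()));
    }

    Ok(offsets)
}
```

Under these assumptions, the input "ab  cd" yields [(0, 2), (4, 6)], and slicing text with any returned pair gives back the corresponding word. This sketch never actually returns an error; the Result wrapper is kept only to match the documented signature.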