Function passage_token_offsets


pub fn passage_token_offsets(
    text: &str,
) -> Result<Vec<(usize, usize)>, AppError>


Returns a byte-offset pair (start, end) for each whitespace-delimited word in text. This function previously relied on the tokenizers crate to return true sub-word offsets; the LLM headless path doesn’t need that granularity, so it now returns word boundaries instead.
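
The behaviour described above can be illustrated with a minimal sketch. This is not the crate's actual implementation: the function name passage_token_offsets_sketch and the unit-struct AppError are hypothetical stand-ins, and the sketch assumes that end is an exclusive byte offset and that "whitespace" means Unicode whitespace as reported by char::is_whitespace.

```rust
/// Hypothetical stand-in for the crate's real `AppError` type.
#[derive(Debug)]
pub struct AppError;

/// Sketch only: collects `(start, end)` byte offsets for each run of
/// non-whitespace characters. `end` is assumed to be exclusive.
pub fn passage_token_offsets_sketch(
    text: &str,
) -> Result<Vec<(usize, usize)>, AppError> {
    let mut offsets = Vec::new();
    // Byte index where the current word began, if we are inside a word.
    let mut start: Option<usize> = None;

    // `char_indices` yields byte offsets, so multi-byte characters are
    // handled without splitting a code point.
    for (i, ch) in text.char_indices() {
        if ch.is_whitespace() {
            // Whitespace closes the word in progress, if any.
            if let Some(s) = start.take() {
                offsets.push((s, i));
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }

    // Close a word that runs to the end of the input.
    if let Some(s) = start {
        offsets.push((s, text.len()));
    }

    // This sketch never fails; the `Result` only mirrors the signature above.
    Ok(offsets)
}
```

Under these assumptions, each returned pair can be used to slice the input directly, for example &text[start..end], because both offsets always fall on character boundaries.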
