pub struct Token<'a> {
    pub text: &'a str,
    pub span: Range<usize>,
    pub char_span: Range<usize>,
    pub kind: TokenKind,
}
A single token produced by crate::Tokenizer::segment.
The text field is a zero-copy slice of the original input string.
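Two span types are provided: span holds byte offsets (suitable for slicing
&str), and char_span holds Unicode scalar-value offsets (suitable for
code-point-indexed strings such as Python's, and for display). JavaScript
indexes strings by UTF-16 code units, so char_span matches its indices only
for text without characters outside the Basic Multilingual Plane.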
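To make the relationship between the two spans concrete, here is a minimal sketch that derives a char span from a byte span using only the standard library. The helper byte_span_to_char_span is hypothetical (it is not part of kham_core), and the spans in main are chosen by hand for illustration rather than taken from actual segmenter output.

```rust
use std::ops::Range;

/// Hypothetical helper (not a kham_core API): converts a byte range over
/// `input` into the matching char (Unicode scalar-value) range.
/// Panics if either end is not on a UTF-8 char boundary.
fn byte_span_to_char_span(input: &str, span: Range<usize>) -> Range<usize> {
    // Chars that precede the span.
    let start = input[..span.start].chars().count();
    // Chars inside the span.
    let len = input[span].chars().count();
    start..start + len
}

fn main() {
    let input = "ธนาคาร100แห่ง";
    // Each Thai character is 3 bytes in UTF-8; each ASCII digit is 1 byte.
    assert_eq!(input.len(), 33);
    assert_eq!(input.chars().count(), 13);
    assert_eq!(&input[0..18], "ธนาคาร");
    assert_eq!(byte_span_to_char_span(input, 0..18), 0..6);
    assert_eq!(byte_span_to_char_span(input, 18..21), 6..9);
    assert_eq!(byte_span_to_char_span(input, 21..33), 9..13);
}
```

Counting the prefix makes each call linear in the span's start offset; converting many spans from the same input is cheaper in a single forward pass.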
Example
use kham_core::Tokenizer;

let tok = Tokenizer::new();
let input = "ธนาคาร100แห่ง";
let tokens = tok.segment(input);
for t in &tokens {
    // byte span slices the original string exactly
    assert_eq!(&input[t.span.clone()], t.text);
    // char span equals the Unicode scalar-value count
    assert_eq!(t.char_span.end - t.char_span.start, t.text.chars().count());
}
Fields
text: &'a str
Zero-copy reference into the original input.
span: Range<usize>
Byte offsets start..end in the original input string.
Both ends lie on UTF-8 character boundaries, so &input[span] is always a valid slice.
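A sketch of how a caller might check this invariant before slicing, for example when spans are stored and later applied to a string that may have changed. span_is_valid is a hypothetical helper; str::is_char_boundary and str::get are standard library methods.

```rust
use std::ops::Range;

/// Hypothetical helper: checks the invariant documented for `Token::span`.
/// The range must be ordered and both ends must sit on UTF-8 char boundaries;
/// `is_char_boundary` also returns false for any index past the end of the string.
fn span_is_valid(input: &str, span: &Range<usize>) -> bool {
    span.start <= span.end
        && input.is_char_boundary(span.start)
        && input.is_char_boundary(span.end)
}

fn main() {
    let input = "ธนาคาร100แห่ง";
    // "ธ" occupies bytes 0..3, so byte 1 falls inside it.
    assert!(span_is_valid(input, &(0..3)));
    assert!(!span_is_valid(input, &(1..3)));
    assert!(!span_is_valid(input, &(0..34))); // past the end of the 33-byte input
    // `str::get` is the non-panicking way to slice; `&input[1..3]` would panic.
    assert_eq!(input.get(0..3), Some("ธ"));
    assert_eq!(input.get(1..3), None);
}
```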
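char_span: Range<usize>
Unicode scalar-value (char) offsets start..end in the original input.
Use these for string indexing in languages that index by code point, such as Python.
JavaScript indexes by UTF-16 code units, so its offsets differ once the input contains
characters outside the Basic Multilingual Plane (for example, most emoji).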
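Because JavaScript counts UTF-16 code units, a binding that hands char_span to JavaScript would need to convert it whenever the text may contain supplementary-plane characters. The sketch below shows one way to do that with char::len_utf16; the helper char_span_to_utf16_span is hypothetical and is not something kham_core is known to provide.

```rust
use std::ops::Range;

/// Hypothetical helper: converts a char (Unicode scalar-value) range into the
/// UTF-16 code-unit range that JavaScript string indices use.
fn char_span_to_utf16_span(input: &str, span: Range<usize>) -> Range<usize> {
    let mut chars = input.chars();
    // UTF-16 units in the first `span.start` chars (1 per BMP char, 2 otherwise).
    let start: usize = chars.by_ref().take(span.start).map(char::len_utf16).sum();
    // UTF-16 units in the next `span.end - span.start` chars.
    let len: usize = chars.take(span.end - span.start).map(char::len_utf16).sum();
    start..start + len
}

fn main() {
    // Thai script is in the BMP, so char and UTF-16 offsets coincide.
    assert_eq!(char_span_to_utf16_span("ธนาคาร100แห่ง", 6..9), 6..9);
    // U+1F600 needs a surrogate pair, shifting every later offset by one.
    let text = "a\u{1F600}b";
    assert_eq!(char_span_to_utf16_span(text, 1..2), 1..3);
    assert_eq!(char_span_to_utf16_span(text, 2..3), 3..4);
}
```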
kind: TokenKind
Script or category of this token.
Implementations
Trait Implementations
impl<'a> Eq for Token<'a>
impl<'a> StructuralPartialEq for Token<'a>
Auto Trait Implementations
impl<'a> Freeze for Token<'a>
impl<'a> RefUnwindSafe for Token<'a>
impl<'a> Send for Token<'a>
impl<'a> Sync for Token<'a>
impl<'a> Unpin for Token<'a>
impl<'a> UnsafeUnpin for Token<'a>
impl<'a> UnwindSafe for Token<'a>
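Because Token<'a> is Send and Sync, borrowed tokens can be shared with worker threads without copying the input. The sketch below does not depend on kham_core: it uses a hypothetical stand-in struct MiniToken and std::thread::scope, which lets spawned threads borrow data from the enclosing stack frame.

```rust
use std::ops::Range;
use std::thread;

// Hypothetical stand-in for kham_core::Token (the `kind` field is omitted).
// Like Token<'a>, it only borrows the input, so the compiler makes it Send + Sync.
struct MiniToken<'a> {
    text: &'a str,
    span: Range<usize>,
}

/// Counts chars across all tokens, splitting the work over two scoped threads.
/// The threads borrow the tokens directly, which requires MiniToken: Sync.
fn total_chars_parallel(tokens: &[MiniToken<'_>]) -> usize {
    let (left, right) = tokens.split_at(tokens.len() / 2);
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().map(|t| t.text.chars().count()).sum::<usize>());
        let b = s.spawn(|| right.iter().map(|t| t.text.chars().count()).sum::<usize>());
        a.join().unwrap() + b.join().unwrap()
    })
}

fn main() {
    let input = String::from("ธนาคาร100แห่ง");
    // Spans chosen by hand for illustration, not produced by a segmenter.
    let tokens = vec![
        MiniToken { text: &input[0..18], span: 0..18 },
        MiniToken { text: &input[18..21], span: 18..21 },
        MiniToken { text: &input[21..33], span: 21..33 },
    ];
    for t in &tokens {
        // Same invariant as the doc example: the byte span slices out `text`.
        assert_eq!(&input[t.span.clone()], t.text);
    }
    assert_eq!(total_chars_parallel(&tokens), input.chars().count());
}
```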
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.