Crate sentience_tokenize

sentience-tokenize — tiny zero-dep tokenizer for a simple DSL.

§Stable API surface (guaranteed across compatible releases)

  • TokenKind, Token, Span
  • tokenize(&str) -> Result<Vec<Token>, LexError>
  • tokenize_iter(&str) returning an iterator of Result<Token, LexError>
  • LineMap for byte→(line,col) mapping
  • LexError and LexErrorKind
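
A minimal sketch of this surface in use. The `kind` and `span` field names on Token are assumptions for illustration, not guaranteed by the list above; the sketch also assumes the usual Debug impls:

```rust
use sentience_tokenize::{tokenize, LexError};

fn main() -> Result<(), LexError> {
    // Tokenize the whole input eagerly; any lex failure propagates as LexError.
    let tokens = tokenize("let x = 1 + 2")?;
    for tok in &tokens {
        // `kind` and `span` are assumed field names; check the Token docs.
        println!("{:?} at {:?}", tok.kind, tok.span);
    }
    Ok(())
}
```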

§Versioning

  • Patch releases fix bugs only; no public API changes.
  • Minor releases (0.x.y → 0.(x+1).0) may add new TokenKind variants or utilities without removing existing ones. Downstream code should avoid exhaustively matching on TokenKind; prefer a _ catch-all to stay forward-compatible (see the sketch after this list).
  • Any removal or change of existing public types/fields will be treated as a breaking change and called out explicitly.
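
A forward-compatible match might look like the sketch below. The variant names (Ident, Number, Str) are hypothetical; the _ arm is the point:

```rust
use sentience_tokenize::TokenKind;

fn describe(kind: &TokenKind) -> &'static str {
    match kind {
        // Hypothetical variant names; see TokenKind for the real set.
        TokenKind::Ident => "identifier",
        TokenKind::Number => "number",
        TokenKind::Str => "string",
        // The catch-all keeps this compiling when a minor release adds variants.
        _ => "other",
    }
}
```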

§Spec (summary)

  • Identifiers: [A-Za-z_][A-Za-z0-9_]*, ASCII only.
  • Numbers: decimal integers and decimals with an optional exponent ((e|E)[+-]?\d+). A dot is allowed at most once; .. is not consumed by numbers.
  • Strings: double-quoted with escapes \n \t \r \" \\. Raw newlines are accepted. Unknown escapes are errors.
  • Comments: // to end-of-line.
  • Delimiters: () { } [ ] , : ;.
  • Operators: = + - * / ->.
  • Keywords: true false if then else let rule and or.
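
The sketch below lexes input that exercises a few of these rules and, on failure, maps the error's byte offset back to (line, column). LineMap::new, line_col, and the span field on LexError are assumed names for illustration; consult the item docs for the real API:

```rust
use sentience_tokenize::{tokenize, LineMap};

fn main() {
    let src = "let x = 1.5e3 // comment\nrule r -> \"ok\"";
    match tokenize(src) {
        Ok(tokens) => println!("{} tokens", tokens.len()),
        Err(err) => {
            // Assumed API: LineMap::new, a byte-offset lookup, and a
            // `span` field on LexError. The real names may differ.
            let map = LineMap::new(src);
            let (line, col) = map.line_col(err.span.start);
            eprintln!("lex error at {line}:{col}: {err:?}");
        }
    }
}
```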

Structs§

LexError
The error type returned by the lexer; stable across minor versions.
Lexer
Streaming lexer. Prefer tokenize / tokenize_iter unless you need manual control.
LineMap
Utility for mapping byte offsets to (line, column); stable part of the public API.
Span
Byte span [start, end) into the original source.
Token
A token with its TokenKind and Span.
Tokens
Iterator over tokens returned by tokenize_iter; yields Result<Token, LexError>.

Enums§

LexErrorKind
Categories of errors reported by the lexer; stable across minor versions.
TokenKind
Token kind for the DSL. Existing variants are stable across minor releases, but new variants may be added in minor versions; match with a _ catch-all.

Functions§

tokenize
Tokenize the entire input and return a vector of tokens. Errors include unterminated strings or escapes, invalid escapes, invalid numbers, and unexpected characters.
tokenize_iter
Tokenize lazily via an iterator (Tokens) that yields Result<Token, LexError>. A short sketch follows.
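
A lazy-consumption sketch, assuming Token and LexError implement Debug:

```rust
use sentience_tokenize::tokenize_iter;

fn main() {
    // Each item is Result<Token, LexError>; stop at the first failure.
    for item in tokenize_iter("if x then y else z") {
        match item {
            Ok(tok) => println!("{tok:?}"),
            Err(err) => {
                eprintln!("lex error: {err:?}");
                break;
            }
        }
    }
}
```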