Crate sentience_tokenize

sentience-tokenize — tiny zero-dep tokenizer for a simple DSL.

§Stable API surface (guaranteed across compatible releases)

  • TokenKind, Token, Span
  • tokenize(&str) -> Result<Vec<Token>, LexError>
  • tokenize_iter(&str) returning an iterator of Result<Token, LexError>
  • LineMap for byte→(line,col) mapping
  • LexError and LexErrorKind
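
A minimal usage sketch of the stable surface above; the signatures follow the list, but the assumption that Token implements Debug is illustrative:

```rust
use sentience_tokenize::{tokenize, LexError};

fn main() -> Result<(), LexError> {
    // tokenize(&str) -> Result<Vec<Token>, LexError>, per the stable API.
    let tokens = tokenize("let x = 42;")?;
    for tok in &tokens {
        // Assumes Token implements Debug; for illustration only.
        println!("{:?}", tok);
    }
    Ok(())
}
```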

§Versioning

  • Patch releases fix bugs only; no public API changes.
  • Minor releases (0.x.y → 0.(x+1).0) may add new TokenKind variants or utilities without removing existing ones. Downstream code should avoid exhaustively matching on TokenKind; prefer a _ catch-all to remain forward-compatible (see the sketch after this list).
  • Any removal or change of existing public types/fields will be treated as a breaking change and called out explicitly.
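
A sketch of the recommended non-exhaustive matching style; the variant names matched here are hypothetical:

```rust
use sentience_tokenize::TokenKind;

fn class_of(kind: &TokenKind) -> &'static str {
    match kind {
        // Hypothetical variant names, for illustration only.
        TokenKind::Ident(_) => "identifier",
        TokenKind::Number(_) => "number",
        // The catch-all keeps this compiling if a future minor
        // release adds new TokenKind variants.
        _ => "other",
    }
}
```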

§Spec (summary)

  • Identifiers: [A-Za-z_][A-Za-z0-9_]*, ASCII only.
  • Numbers: decimal integers/decimals with an optional exponent (e|E)[+-]?[0-9]+. A single dot is allowed once; .. is not consumed by numbers.
  • Strings: double-quoted with escapes \n \t \r \" \\. Raw newlines are accepted. Unknown escapes are errors.
  • Comments: // to end-of-line.
  • Delimiters: ( ) { } [ ] , : ;.
  • Operators: = + - * / ->.
  • Keywords: true false if then else let rule and or.
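
A sketch feeding the rules above through tokenize; the input exercises keywords, a number with an exponent, a string escape, an operator, and a comment:

```rust
use sentience_tokenize::tokenize;

fn main() {
    // Identifiers, decimal with exponent, ->, string escape, comment.
    let src = "let ratio = 6.02e+23 // trailing comment\nrule greet: \"hi\\n\" -> true;";
    match tokenize(src) {
        Ok(tokens) => println!("{} tokens", tokens.len()),
        // Assumes LexError implements Debug.
        Err(err) => eprintln!("lex error: {:?}", err),
    }
}
```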

Structs§

BorrowedToken
A zero-copy token with its BorrowedTokenKind and Span.
LexError
Error type returned by the lexer; stable across minor versions.
Lexer
Streaming lexer. Prefer tokenize / tokenize_iter unless you need manual control.
LineMap
Utility for mapping byte offsets to (line, column); stable part of the public API (see the sketch after this list).
Span
Byte span [start, end) into the original source.
Token
A token with its TokenKind and Span.
Tokens
Iterator-based API over tokens. Yields Result<Token, LexError>.
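
A sketch of LineMap for diagnostics. The constructor and lookup names used here (new, line_col) are assumptions, not confirmed methods:

```rust
use sentience_tokenize::LineMap;

fn main() {
    let src = "let x = 1;\nlet y = 2;";
    // Hypothetical API for this sketch: LineMap::new(&str) plus
    // line_col(byte_offset) -> (line, column).
    let map = LineMap::new(src);
    let (line, col) = map.line_col(15); // byte offset of `y` on line 2
    println!("byte 15 is at line {}, column {}", line, col);
}
```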

Enums§

BorrowedTokenKind
Zero-copy token kind borrowing slices from the source. Note: String(&str) contains the literal contents between quotes without unquoting; escapes (e.g. \n) are left as two characters (see the sketch after this list).
LexErrorKind
Error categories reported by the lexer; stable across minor versions.
TokenKind
Token kind for the DSL. Existing variants are stable across minor releases, but new variants may be added in minor versions; see §Versioning.
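
A sketch of the raw, non-unescaped string behavior noted under BorrowedTokenKind; the kind field name on BorrowedToken is an assumption:

```rust
use sentience_tokenize::{tokenize_borrowed, BorrowedTokenKind};

fn main() {
    // DSL source containing the escape \n inside a string literal.
    let src = r#""a\nb""#;
    let tokens = tokenize_borrowed(src).expect("valid input");
    // Assumes BorrowedToken exposes a `kind` field; illustrative only.
    if let BorrowedTokenKind::String(raw) = &tokens[0].kind {
        // The escape stays as two characters: a backslash and an 'n'.
        assert_eq!(*raw, r"a\nb");
    }
}
```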

Functions§

tokenize
Tokenize the entire input and return a vector of tokens. Errors include unterminated strings/escapes, invalid escapes, invalid numbers, and unexpected characters.
tokenize_borrowed
Tokenize the entire input returning zero-copy tokens that borrow from src. Strings are validated (including escapes) but their contents are not unescaped; the returned &str is the raw slice between quotes.
tokenize_iter
Streaming tokenizer over &str. Yields Result<Token, LexError> items and terminates iteration after the first error.
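
A sketch of the streaming form; @ is not in the spec's alphabet, so iteration should end after yielding that first error:

```rust
use sentience_tokenize::tokenize_iter;

fn main() {
    for item in tokenize_iter("let x = 1 @ y") {
        match item {
            // Assumes Token and LexError implement Debug.
            Ok(tok) => println!("token: {:?}", tok),
            Err(err) => eprintln!("lex error: {:?}", err),
        }
    }
    // The iterator terminates after the first Err, so `y` above
    // is never tokenized.
}
```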