sentience-tokenize — tiny zero-dep tokenizer for a simple DSL.
§Stable API surface (guaranteed across compatible releases)
- `TokenKind`, `Token`, `Span`
- `tokenize(&str) -> Result<Vec<Token>, LexError>`
- `tokenize_iter(&str)`, returning an iterator of `Result<Token, LexError>`
- `LineMap` for byte → (line, col) mapping
- `LexError` and `LexErrorKind`
§Versioning
- Patch releases fix bugs only; no public API changes.
- Minor releases (`0.x.y` → `0.(x+1).0`) may add new `TokenKind` variants or utilities without removing existing ones. Downstream code should avoid exhaustive `match` over `TokenKind`; prefer a `_` catch-all to remain forward-compatible, as in the sketch after this list.
- Any removal or change of existing public types/fields is treated as a breaking change and called out explicitly.
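A minimal sketch of that catch-all pattern. The variant names `Ident` and `Number` are assumptions used for illustration, not confirmed API; only the `_` arm matters here:

```rust
use sentience_tokenize::TokenKind;

// `Ident` and `Number` are hypothetical variant names for illustration;
// the `_` arm is what keeps this compiling when a minor release adds
// new `TokenKind` variants.
fn describe(kind: &TokenKind) -> &'static str {
    match kind {
        TokenKind::Ident(_) => "identifier",
        TokenKind::Number(_) => "number",
        _ => "other", // catch-all: forward-compatible with new variants
    }
}
```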
§Spec (summary)
- Identifiers: `[A-Za-z_][A-Za-z0-9_]*`, ASCII only.
- Numbers: decimal integers/decimals with an optional exponent (`e|E[+|-]d+`). A single dot is allowed once; `..` is not consumed by numbers.
- Strings: double-quoted with escapes `\n \t \r \" \\`. Raw newlines are accepted. Unknown escapes are errors.
- Comments: `//` to end-of-line.
- Delimiters: `( ) { } [ ] , : ;`.
- Operators: `= + - * / ->`.
- Keywords: `true false if then else let rule and or`.
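For orientation, a short end-to-end sketch driving the stable `tokenize` entry point over a fragment exercising the rules above; the fragment and the `Debug` printing are illustrative assumptions:

```rust
use sentience_tokenize::tokenize;

fn main() {
    // Exercises a comment, keywords, an identifier, a string escape,
    // a number with an exponent, and the `->` operator.
    let src = "// greeting\nlet msg = \"hi\\n\";\nrule f -> 1.5e3";
    match tokenize(src) {
        // Printing assumes `Token` implements `Debug`.
        Ok(tokens) => {
            for tok in &tokens {
                println!("{:?}", tok);
            }
        }
        Err(e) => eprintln!("lex error: {e:?}"),
    }
}
```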
§Structs
- `BorrowedToken`: a zero-copy token with its `BorrowedTokenKind` and `Span`.
- `LexError`: error type and categories returned by the lexer; stable across minor versions.
- `Lexer`: streaming lexer. Prefer `tokenize`/`tokenize_iter` unless you need manual control.
- `LineMap`: utility for mapping byte offsets to `(line, column)`; a stable part of the public API.
- `Span`: byte span `[start, end)` into the original source. Recovering a token's text is a direct slice, as sketched below.
- `Token`: a token with its `TokenKind` and `Span`.
- `Tokens`: iterator-based API over tokens. Yields `Result<Token, LexError>`.
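Since the span is a half-open byte range, slicing the source with it is direct. A sketch assuming `Span` exposes public `start` and `end` offsets; the `[start, end)` notation suggests this, but the field names are an assumption:

```rust
use sentience_tokenize::Span;

// Recover the source text a span covers. Assumes public `start`/`end`
// byte-offset fields on `Span`; the half-open range maps directly onto
// Rust's slice syntax.
fn text_of<'a>(src: &'a str, span: &Span) -> &'a str {
    &src[span.start..span.end]
}
```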
§Enums
- `BorrowedTokenKind`: zero-copy token kind borrowing slices from the source. Note: `String(&str)` contains the literal contents between the quotes without unquoting; escapes (e.g. `\n`) are left as two characters. See the sketch after this list.
- `LexErrorKind`: the error categories carried by `LexError`; stable across minor versions.
- `TokenKind`: token kind for the DSL. The variant set is stable across minor releases; new variants may be added in minor versions.
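A sketch of extracting those raw (still-escaped) string slices via `tokenize_borrowed`. The `Result<Vec<BorrowedToken>, LexError>` return type and the `kind` field name are assumptions here, not confirmed signatures:

```rust
use sentience_tokenize::{tokenize_borrowed, BorrowedTokenKind};

// Collect the raw string-literal contents from `src`.
// Assumes `tokenize_borrowed` returns `Result<Vec<BorrowedToken>, LexError>`
// and that each token exposes its kind as a `kind` field.
fn raw_strings(src: &str) -> Vec<&str> {
    let tokens = tokenize_borrowed(src).expect("input should lex");
    tokens
        .iter()
        .filter_map(|tok| match &tok.kind {
            // The slice is the text between the quotes; a source `\n`
            // stays as the two characters `\` and `n`.
            BorrowedTokenKind::String(s) => Some(*s),
            _ => None,
        })
        .collect()
}
```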
§Functions
- `tokenize`: tokenize the entire input and return a vector of tokens. Errors include unterminated strings/escapes, invalid escapes, invalid numbers, and unexpected characters.
- `tokenize_borrowed`: tokenize the entire input, returning zero-copy tokens that borrow from `src`. Strings are validated (including escapes) but their contents are not unescaped; the returned `&str` is the raw slice between the quotes.
- `tokenize_iter`: streaming tokenizer over `&str`. Yields `Result<Token, LexError>` items and terminates iteration after the first error, as the sketch below relies on.
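Because iteration ends after the first error, draining the iterator and propagating any `Err` reproduces the all-or-nothing behavior of `tokenize`, one token at a time. A minimal sketch under that contract:

```rust
use sentience_tokenize::{tokenize_iter, LexError, Token};

// Drain the streaming tokenizer into a Vec, surfacing the first error.
fn collect_streaming(src: &str) -> Result<Vec<Token>, LexError> {
    let mut tokens = Vec::new();
    for item in tokenize_iter(src) {
        tokens.push(item?); // the iterator stops after the first `Err`
    }
    Ok(tokens)
}
```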