sentience-tokenize — tiny zero-dep tokenizer for a simple DSL.
§Stable API surface (guaranteed across compatible releases)
- `TokenKind`, `Token`, `Span`
- `tokenize(&str) -> Result<Vec<Token>, LexError>`
- `tokenize_iter(&str)` returning an iterator of `Result<Token, LexError>`
- `LineMap` for byte→(line, col) mapping
- `LexError` and `LexErrorKind`
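A minimal sketch of the batch entry point, assuming the crate name maps to the module path `sentience_tokenize` and that `Token` exposes public `kind` and `span` fields; the list above guarantees the `tokenize` signature, but not these field names:

```rust
use sentience_tokenize::{tokenize, LexError};

fn main() -> Result<(), LexError> {
    // Batch API: returns all tokens, or the first lexing error.
    let tokens = tokenize("let x = 1 + 2 // trailing comment")?;
    for tok in &tokens {
        // `kind`, `span`, `start`, and `end` are assumed field names.
        println!("{:?} at {}..{}", tok.kind, tok.span.start, tok.span.end);
    }
    Ok(())
}
```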
§Versioning
- Minor releases (`0.x.y` → `0.(x+1).0`) may add new token kinds, but will not break or remove existing enum variants or fields.
- Patch releases only fix bugs and do not change public types or behavior, except to correct deviations from the spec.
- Any breaking change to the above surface will be accompanied by a semver-visible minor bump and noted in the changelog.
§Spec (summary)
- Identifiers: `[A-Za-z_][A-Za-z0-9_]*`, ASCII only.
- Numbers: decimal integers/decimals with an optional exponent (`(e|E)[+-]?[0-9]+`). A single dot is allowed once; `..` is not consumed by numbers.
- Strings: double-quoted with escapes `\n \t \r \" \\`. Unknown escapes are errors.
- Comments: `//` to end-of-line.
- Delimiters: `( ) { } [ ] , : ;`.
- Operators: `= + - * / ->`.
- Keywords: `true false if then else let rule and or`.
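To make a few of these rules concrete, here is a hedged sketch: a number with an exponent, a string with a recognized escape, and an unknown escape that should surface as an error. It only Debug-prints tokens, since the exact `TokenKind` variants are not spelled out in this summary:

```rust
use sentience_tokenize::tokenize;

fn main() {
    // `1.5e-3` is one number per the exponent rule; `\t` is a recognized escape.
    for tok in tokenize("1.5e-3 + x_1 \"a\\tb\"").expect("valid per the spec") {
        println!("{:?}", tok);
    }
    // Unknown escapes are errors, so `\q` should fail to lex.
    match tokenize("\"bad \\q escape\"") {
        Ok(_) => println!("unexpectedly lexed"),
        Err(e) => println!("rejected as expected: {:?}", e),
    }
}
```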
Structs§
- LexError
- Error type and categories returned by the lexer; stable across minor versions.
- Lexer
- Streaming lexer. Prefer `tokenize`/`tokenize_iter` unless you need manual control.
- LineMap
- Utility for mapping byte offsets to `(line, column)`; stable part of the public API (see the sketch after this list).
- Span
- Byte span `[start, end)` into the original source.
- Token
- A token with its `TokenKind` and `Span`.
- Tokens
- Iterator-based API over tokens. Yields `Result<Token, LexError>`.
Enums§
- LexErrorKind
- Error type and categories returned by the lexer; stable across minor versions.
- TokenKind
- Token kind for the DSL. Existing variants are stable across minor releases; new variants may be added in minor versions (see the match sketch after this list).
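Because minor releases may add `TokenKind` variants, downstream matches should keep a wildcard arm. The variant names and payloads below are illustrative guesses, not part of the documented surface; the wildcard arm is the point:

```rust
use sentience_tokenize::TokenKind;

// `Ident` and `Number` are guessed variant names and payload shapes.
fn describe(kind: &TokenKind) -> &'static str {
    match kind {
        TokenKind::Ident(_) => "identifier",
        TokenKind::Number(_) => "number",
        // The wildcard arm keeps this compiling if a minor release
        // adds a new variant to TokenKind.
        _ => "other",
    }
}
```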
Functions§
- tokenize
- Tokenize the entire input and return a vector of tokens. Errors include unterminated strings/escapes, invalid escapes, invalid numbers, and unexpected characters.
- tokenize_iter
- Iterator-based API over tokens. Yields `Result<Token, LexError>`.