Sentience tokenizer
Tiny zero-dependency tokenizer for simple DSLs and config/query languages in Rust.
Generic: drop it into parsers, rule engines, interpreters, or build tooling.
Supports identifiers, numbers, strings, operators, and a small set of keywords.
Designed for speed, clarity, and easy embedding.
Features
- Zero dependencies (only std).
- Token kinds: identifiers, numbers, strings, parens/brackets/braces, and the operators `=` `+` `-` `*` `/` `->`.
- Keywords: `true` `false` `if` `then` `else` `let` `rule` `and` `or`.
- Spans included for each token.
- Whitespace and `//` comments skipped.
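To show how the keyword list relates to identifiers, here is a minimal sketch that assumes keywords are recognized by scanning an identifier first and then looking up its text. The `Keyword` enum and `keyword` function are hypothetical names for this example, not the crate's API, and case-sensitive matching is an assumption.

```rust
/// Keywords from the list above (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Keyword {
    True,
    False,
    If,
    Then,
    Else,
    Let,
    Rule,
    And,
    Or,
}

/// Look up an already-scanned identifier. Returns `None` for plain
/// identifiers. Matching is exact and case-sensitive (an assumption),
/// so `Let` stays an identifier.
fn keyword(ident: &str) -> Option<Keyword> {
    let kw = match ident {
        "true" => Keyword::True,
        "false" => Keyword::False,
        "if" => Keyword::If,
        "then" => Keyword::Then,
        "else" => Keyword::Else,
        "let" => Keyword::Let,
        "rule" => Keyword::Rule,
        "and" => Keyword::And,
        "or" => Keyword::Or,
        _ => return None,
    };
    Some(kw)
}

fn main() {
    for word in ["let", "rule", "greet", "Let"] {
        match keyword(word) {
            Some(kw) => println!("{word:?} -> keyword {kw:?}"),
            None => println!("{word:?} -> identifier"),
        }
    }
}
```

Doing the lookup after identifier scanning keeps one code path for both, so `letter` is never split into `let` + `ter`.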
Spec
| Aspect | Rules |
|---|---|
| Identifiers | ASCII: `[A-Za-z_][A-Za-z0-9_]*` |
| Numbers | Decimal integers and decimals, with an optional exponent: `e` or `E`, an optional `+`/`-`, then one or more digits. At most one `.` per number; `..` is never consumed by a number. |
| Strings | Double-quoted. Escapes: `\n`, `\t`, `\r`, `\"`, `\\`. Any other escape is an error. |
| Comments | // to end-of-line. |
| Delimiters | ( ) { } [ ] , : ; |
| Operators | =, +, -, *, /, -> |
| Keywords | true, false, if, then, else, let, rule, and, or |
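The Numbers row is the subtlest rule in the table. Below is a minimal sketch of one way to scan it. `scan_number` is a hypothetical helper, not part of the crate, and it assumes that a `.` is taken only when a digit follows (so `1..2` and `1.` stop after `1`) and that an `e` with no digits after it is left for the next token.

```rust
/// Return the byte length of the number literal at the start of `s`,
/// or 0 if `s` does not start with a digit.
fn scan_number(s: &str) -> usize {
    let b = s.as_bytes();
    // Advance from `i` over ASCII digits and return the new index.
    let digits = |mut i: usize| {
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        i
    };
    let mut i = digits(0);
    if i == 0 {
        return 0;
    }
    // Fractional part: at most one '.', and only when a digit follows it,
    // so a range operator `..` is never swallowed.
    if i + 1 < b.len() && b[i] == b'.' && b[i + 1].is_ascii_digit() {
        i = digits(i + 1);
    }
    // Exponent: e|E, optional sign, then at least one digit; otherwise
    // the `e` is not part of the number (assumption).
    if i < b.len() && (b[i] == b'e' || b[i] == b'E') {
        let mut j = i + 1;
        if j < b.len() && (b[j] == b'+' || b[j] == b'-') {
            j += 1;
        }
        let end = digits(j);
        if end > j {
            i = end;
        }
    }
    i
}

fn main() {
    for src in ["42", "3.14", "1.5e+3", "1..2", "3.14.15", "2e"] {
        let n = scan_number(src);
        println!("{src:?}: number {:?}, rest {:?}", &src[..n], &src[n..]);
    }
}
```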
The `TokenKind` enum, the `Token` and `Span` types, the `tokenize` and `tokenize_iter` functions, `LineMap`, and the error types `LexError` and `LexErrorKind` are part of the stable API.
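Unsupported string escapes are one source of lex errors. The sketch below decodes a literal under the Strings rule from the spec table and reports the byte offset of a bad escape. `scan_string` and `StrError` are hypothetical names; the crate's `LexErrorKind` variants may be named and structured differently.

```rust
/// Hypothetical error kinds for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum StrError {
    /// Byte offset of the backslash that starts an unsupported escape.
    UnknownEscape(usize),
    /// Input ended before the closing quote.
    Unterminated,
}

/// Decode the double-quoted literal at the start of `src`. On success,
/// returns the decoded text and the number of bytes consumed, quotes included.
fn scan_string(src: &str) -> Result<(String, usize), StrError> {
    debug_assert!(src.starts_with('"'), "caller must dispatch on an opening quote");
    // Skip the opening quote; indices stay as byte offsets into `src`.
    let mut chars = src.char_indices().skip(1);
    let mut out = String::new();
    while let Some((i, c)) = chars.next() {
        match c {
            '"' => return Ok((out, i + 1)),
            '\\' => match chars.next() {
                Some((_, 'n')) => out.push('\n'),
                Some((_, 't')) => out.push('\t'),
                Some((_, 'r')) => out.push('\r'),
                Some((_, '"')) => out.push('"'),
                Some((_, '\\')) => out.push('\\'),
                Some(_) => return Err(StrError::UnknownEscape(i)),
                None => return Err(StrError::Unterminated),
            },
            _ => out.push(c),
        }
    }
    Err(StrError::Unterminated)
}

fn main() {
    // `"abc\x"`: the backslash at byte 4 starts an unsupported escape.
    println!("{:?}", scan_string("\"abc\\x\""));
    println!("{:?}", scan_string("\"hi, \" + name"));
}
```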
Error Reporting
On failure, lexing returns a `LexError` carrying a kind and a span. Example using `LineMap` to convert the span into a line and column:
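```rust
use sentience_tokenizer::{tokenize, LineMap};

let src = "\"abc\\x\""; // "abc\x": \x is not a supported escape
let map = LineMap::new(src);
let err = tokenize(src).unwrap_err();
let (line, col) = map.to_line_col(err.span.start);
println!("{:?} at {}:{}", err.kind, line, col);
```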
Output
Stable API surface
- Types: `TokenKind`, `Token`, `Span`
- Functions: `tokenize(&str) -> Result<Vec<Token>, LexError>`, `tokenize_iter(&str)`
- Utilities: `LineMap` for byte → (line, col)
- Errors: `LexError`, `LexErrorKind`
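`LineMap` converts byte offsets into positions. A common way to do that, sketched here under assumptions, is to record where each line starts and binary-search that table. `LineIndex` is a hypothetical stand-in, and the 1-based lines and byte-counted columns are assumptions rather than documented behavior of `LineMap`.

```rust
/// Hypothetical stand-in for `LineMap`.
struct LineIndex {
    /// Byte offset at which each line starts; line 1 starts at 0.
    starts: Vec<usize>,
}

impl LineIndex {
    fn new(src: &str) -> Self {
        let mut starts = vec![0];
        for (i, b) in src.bytes().enumerate() {
            if b == b'\n' {
                starts.push(i + 1); // next line begins after the newline
            }
        }
        LineIndex { starts }
    }

    /// Map a byte offset to a 1-based (line, column); columns count bytes.
    fn to_line_col(&self, offset: usize) -> (usize, usize) {
        // Count of line starts <= offset, minus one, is the 0-based line.
        let line = self.starts.partition_point(|&s| s <= offset) - 1;
        (line + 1, offset - self.starts[line] + 1)
    }
}

fn main() {
    let src = "let a = 1\nrule b\n  c";
    let map = LineIndex::new(src);
    for offset in [0, 15, 19] {
        let (line, col) = map.to_line_col(offset);
        println!("byte {offset} -> {line}:{col}");
    }
}
```

Building the table costs one pass over the source; each lookup is then O(log lines). For the error example's input, the backslash of `\x` sits at byte 4, which this sketch reports as 1:5.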
Iterator API example
```rust
use sentience_tokenizer::tokenize_iter;
```
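A pull-based lexer yields tokens one at a time instead of building a `Vec` up front. The following is a minimal sketch of that style over a reduced token set; `Lexer`, `Tok`, and `Kind` are hypothetical and say nothing about what `tokenize_iter` actually returns. It also shows whitespace and `//` comment skipping.

```rust
/// Token kinds for this reduced sketch only.
#[derive(Debug, PartialEq, Eq)]
enum Kind {
    Ident,
    Number,
    Punct,
}

/// A token: its kind plus a half-open byte span `start..end`.
#[derive(Debug, PartialEq, Eq)]
struct Tok {
    kind: Kind,
    start: usize,
    end: usize,
}

/// Pull-based lexer over ASCII input (hypothetical, not the crate's type).
struct Lexer<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Lexer<'a> {
    fn new(src: &'a str) -> Self {
        Lexer { src: src.as_bytes(), pos: 0 }
    }

    /// Advance while `pred` holds for the current byte.
    fn take_while(&mut self, pred: fn(u8) -> bool) {
        while self.pos < self.src.len() && pred(self.src[self.pos]) {
            self.pos += 1;
        }
    }

    /// Skip any mix of whitespace and `//` comments running to end of line.
    fn skip_trivia(&mut self) {
        loop {
            self.take_while(|b| b.is_ascii_whitespace());
            if self.src[self.pos..].starts_with(b"//") {
                self.take_while(|b| b != b'\n');
            } else {
                return;
            }
        }
    }
}

impl Iterator for Lexer<'_> {
    type Item = Tok;

    fn next(&mut self) -> Option<Tok> {
        self.skip_trivia();
        let start = self.pos;
        let first = *self.src.get(start)?; // None at end of input
        let kind = if first.is_ascii_alphabetic() || first == b'_' {
            self.take_while(|b| b.is_ascii_alphanumeric() || b == b'_');
            Kind::Ident
        } else if first.is_ascii_digit() {
            self.take_while(|b| b.is_ascii_digit());
            Kind::Number
        } else {
            self.pos += 1; // any other byte becomes a one-byte Punct
            Kind::Punct
        };
        Some(Tok { kind, start, end: self.pos })
    }
}

fn main() {
    let src = "let x = 42 // answer\nrule y";
    for t in Lexer::new(src) {
        println!("{:?} {:?} @{}..{}", t.kind, &src[t.start..t.end], t.start, t.end);
    }
}
```

Because the iterator is lazy, a caller can stop at the first token it cares about without lexing the rest of the input.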
Install
Add to Cargo.toml:
```toml
[dependencies]
sentience-tokenizer = "0.1"
```
Example
```rust
use sentience_tokenizer::tokenize;
```
Output (truncated)
```
Let @18..21
Rule @22..26
Ident("greet") @27..32
LParen @32..33
Ident("name") @33..37
RParen @37..38
Eq @39..40
String("hi, ") @41..47
Plus @48..49
Ident("name") @50..54
...
```
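In the output above, each span is a half-open byte range, so `Eq @39..40` covers one byte. The two-byte `->` operator has to be tried before `-`. A minimal sketch of that longest-match rule, with hypothetical `Op` and `scan_op` names and the assumption that the caller has already ruled out `//` comments before treating `/` as an operator:

```rust
use std::ops::Range;

/// Operators from the spec table (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Eq,
    Plus,
    Minus,
    Star,
    Slash,
    Arrow,
}

/// Match an operator at byte `pos` of `src`, returning it with its
/// half-open byte span. `->` is checked first (longest match), so
/// `a->b` yields one Arrow rather than Minus followed by `>`.
fn scan_op(src: &str, pos: usize) -> Option<(Op, Range<usize>)> {
    let rest = src.as_bytes().get(pos..)?;
    if rest.starts_with(b"->") {
        return Some((Op::Arrow, pos..pos + 2));
    }
    let op = match *rest.first()? {
        b'=' => Op::Eq,
        b'+' => Op::Plus,
        b'-' => Op::Minus,
        b'*' => Op::Star,
        b'/' => Op::Slash,
        _ => return None,
    };
    Some((op, pos..pos + 1))
}

fn main() {
    let src = "a->b - c";
    let mut pos = 0;
    while pos < src.len() {
        match scan_op(src, pos) {
            Some((op, span)) => {
                println!("{op:?} @{}..{} {:?}", span.start, span.end, &src[span.clone()]);
                pos = span.end;
            }
            None => pos += 1, // not an operator; a real lexer would dispatch elsewhere
        }
    }
}
```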
Run tests
```sh
cargo test
```
Example binary
Dev
Benchmark
Fuzzing
Includes a cargo-fuzz setup.
Why?
- Small, standalone lexer: no macros, no regexes.
- Useful as a foundation for parsers, DSLs, or interpreters.
- Explicit spans for better error reporting.
License
MIT © 2025 Nenad Bursać