WIP: This project is still under active development and not ready for use.
Tokit
Blazing fast parser combinators with parse-while-lexing architecture (zero-copy), deterministic LALR-style parsing, and no hidden backtracking.
§Overview
Tokit is a blazing fast parser combinator library for Rust that uniquely combines:
- Parse-While-Lexing Architecture: Zero-copy streaming - parsers consume tokens directly from the lexer without buffering, eliminating allocation overhead
- Deterministic LALR-Style Parsing: Explicit lookahead with compile-time buffer capacity, no hidden backtracking
- Flexible Error Handling: The same parser code adapts to fail-fast runtime parsing or greedy compiler diagnostics via the `Emitter` trait
Unlike traditional parser combinators that buffer tokens and rely on implicit backtracking, Tokit streams tokens on-demand with predictable, deterministic decisions. This makes it ideal for building high-performance language tooling, DSL parsers, compilers, and REPLs that need both speed and comprehensive error reporting.
§Key Features
- Parse-While-Lexing: Zero-copy streaming architecture - no token buffering, no extra allocations
- No Hidden Backtracking: Explicit, predictable parsing with lookahead-based decisions instead of implicit backtracking
- Deterministic + Composable: Combines the flexibility of parser combinators with LALR-style deterministic table parsing
- Flexible Error Handling Architecture: Designed to support both fail-fast parsing (runtime) and greedy parsing (compiler diagnostics) by swapping the `Emitter` type; same parser, different behavior
- Token-Based Parsing: Works directly on token streams from any lexer implementing the `Lexer<'inp>` trait
- Composable Combinators: Build complex parsers from simple, reusable building blocks
- Flexible Error Handling: Configurable error emission strategies (`Fatal`, `Silent`, `Ignored`)
- Rich Error Recovery: Built-in support for error recovery and validation
- Zero-Cost Abstractions: All configuration resolved at compile time
- No-std Support: Core functionality works without allocator
- Multiple Source Types: Support for `str`, `[u8]`, `Bytes`, `BStr`, `HipStr`
- Logos Integration: Optional `LogosLexer` adapter for seamless Logos integration
- CST Support: Optional Concrete Syntax Tree support via rowan
§Installation
Add this to your Cargo.toml:
```toml
[dependencies]
tokit = "0.0.0"
```

§Feature Flags
- `std` (default) - Enable standard library support
- `alloc` - Enable allocator support for no-std environments
- `logos` - Enable the `LogosLexer` adapter for Logos integration
- `rowan` - Enable CST (Concrete Syntax Tree) support with rowan integration
- `bytes` - Support for `bytes::Bytes` as a token source
- `bstr` - Support for `bstr::BStr` as a token source
- `hipstr` - Support for `hipstr::HipStr` as a token source
- `among` - Enable `Among<L, M, R>` parseable support
- `smallvec` - Enable small vector optimization utilities
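For example, a compiler front-end that wants Logos-based lexing plus a lossless syntax tree might enable the corresponding flags (the feature names come from the list above; the exact combination is illustrative):

```toml
[dependencies]
tokit = { version = "0.0.0", features = ["logos", "rowan"] }
```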
§Core Components
§Lexer Layer
- `Lexer<'inp>` Trait - Core trait for lexers that produce token streams. Implement this to use any lexer with Tokit.
- `Token<'a>` Trait - Defines token types with:
  - `Kind`: Token kind discriminator
  - `Error`: Associated error type
- `LogosLexer<'inp, T, L>` (feature: `logos`) - Ready-to-use adapter for integrating Logos lexers.
§Error Handling
Tokit’s flexible Emitter system allows the same parser to adapt to different use cases by simply changing the error handling strategy:
- Emitter Strategies
  - `Fatal` - Fail-fast parsing: stop on the first error (default); perfect for runtime parsing and REPLs
  - Greedy emitter (planned) - Collect all errors and continue parsing; perfect for compiler diagnostics and IDEs
  - `Silent` - Silently ignore errors
  - `Ignored` - Ignore errors completely
Key Design: Change the Emitter type to switch between fail-fast runtime parsing and greedy compiler diagnostics - same parser code, different behavior. This makes Tokit suitable for both:
- Runtime/REPL: Fast feedback with the `Fatal` emitter
- Compiler/IDE: Comprehensive diagnostics with the greedy emitter (coming soon)
- Rich Error Types (in the `error` module)
  - Token-level: `UnexpectedToken`, `MissingToken`, `UnexpectedEot`
  - Syntax-level: `Unclosed`, `Unterminated`, `Malformed`, `Invalid`
  - Escape sequences: `HexEscape`, `UnicodeEscape`
  - All errors include span tracking
§Utilities
- Span Tracking
  - `Span` - Lightweight span representation
  - `Spanned<T>` - Wraps a value with a span
  - `Located<T>` - Wraps a value with a span and source slice
  - `Sliced<T>` - Wraps a value with a source slice
- Parser Configuration
  - `Parser<F, L, O, Error, Context>` - Configurable parser
  - `ParseContext` - Context for emitter and cache
  - `Window` - Type-level peek buffer capacity for deterministic lookahead
  - Note: Lookahead windows support a capacity of 1-32 tokens via `typenum::{U1..U32}`
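To make the span-tracking pattern concrete, here is a minimal plain-Rust sketch of the idea behind `Span` and `Spanned<T>`. These are simplified stand-ins, not tokit's actual definitions:

```rust
// Illustrative sketch only: simplified stand-ins for tokit's `Span` and
// `Spanned<T>` utilities, not the crate's real types.

/// A lightweight byte range into the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

impl Span {
    /// Slice the original source with this span.
    fn slice<'a>(&self, src: &'a str) -> &'a str {
        &src[self.start..self.end]
    }
}

/// A value paired with the span it was parsed from.
#[derive(Debug, Clone, PartialEq)]
struct Spanned<T> {
    value: T,
    span: Span,
}

fn main() {
    let src = "let x = 42;";
    // Pretend a parser produced this number literal together with its span.
    let num = Spanned { value: 42u32, span: Span { start: 8, end: 10 } };
    assert_eq!(num.span.slice(src), "42");
    println!("{} @ {:?}", num.value, num.span);
}
```

Carrying the span alongside the value is what lets error types point back at the exact source range later.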
§Quick Start
Here’s a simple example parsing JSON tokens:
```rust
use logos::Logos;
use tokit::{Any, Parse, Token as TokenT};

#[derive(Debug, Logos, Clone)]
#[logos(skip r"[ \t\r\n\f]+")]
enum Token {
    #[token("true", |_| true)]
    #[token("false", |_| false)]
    Bool(bool),
    #[token("null")]
    Null,
    #[regex(r"-?(?:0|[1-9]\d*)(?:\.\d+)?", |lex| lex.slice().parse::<f64>().unwrap())]
    Number(f64),
}

#[derive(Debug, Clone, Copy)]
enum TokenKind {
    Bool,
    Null,
    Number,
}

// `Display` is not a std derive; a manual impl keeps the example
// dependency-free.
impl core::fmt::Display for TokenKind {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        core::fmt::Debug::fmt(self, f)
    }
}

impl TokenT<'_> for Token {
    type Kind = TokenKind;
    type Error = ();

    fn kind(&self) -> Self::Kind {
        match self {
            Token::Bool(_) => TokenKind::Bool,
            Token::Null => TokenKind::Null,
            Token::Number(_) => TokenKind::Number,
        }
    }
}

type MyLexer<'a> = tokit::LogosLexer<'a, Token, Token>;

fn main() {
    // Parse any token and extract its value
    let parser = Any::parser::<'_, MyLexer<'_>, ()>()
        .map(|tok: Token| match tok {
            Token::Number(n) => Some(n),
            _ => None,
        });

    let result = parser.parse("42.5");
    println!("{:?}", result); // Ok(Some(42.5))
}
```

§More Examples
Check out the examples directory:
```sh
# JSON token parsing with map combinators
cargo run --example json
```

Note: The calculator examples are being updated for the v0.3.0 API.

§Architecture
Tokit’s architecture follows a layered design:
- Lexer Layer - Token production and source abstraction
- Parser Layer - Composable parser combinators
- Error Layer - Rich error types and emission strategies
- Utility Layer - Spans, containers, and helpers
This separation enables:
- Use any lexer by implementing `Lexer<'inp>`
- Mix and match parser combinators
- Customize error handling per-parser or globally
- Zero-cost abstractions through compile-time configuration
§Design Philosophy
§Parse-While-Lexing: Zero-Copy Streaming
Tokit uses a parse-while-lexing architecture where parsers consume tokens directly from the lexer as needed, without intermediate buffering:
Traditional Approach (Two-Phase):

```text
Source → Lexer → [Token Buffer] → Parser
                        ↓
             Allocate Vec<Token> ← Extra allocation!
```

Tokit Approach (Streaming):

```text
Source → Lexer ←→ Parser
           ↑________↓
   Zero-copy streaming, no buffer
```

Benefits:
- ✅ Zero Extra Allocations: No token buffer, tokens consumed on-demand
- ✅ Lower Memory Footprint: Only lookahead window buffered on stack, not entire token stream
- ✅ Better Cache Locality: Tokens processed immediately after lexing
- ✅ Predictable Performance: No large allocations, deterministic memory usage
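The difference between the two pipelines can be sketched in plain Rust (no tokit APIs; the lexer and "parser" here are toy stand-ins). In the buffered version every token is collected into a `Vec` before parsing starts; in the streaming version the consumer pulls tokens straight off the lexer iterator:

```rust
// Conceptual sketch, plain Rust: buffered (two-phase) vs. streaming
// (parse-while-lexing) token consumption. Names are illustrative.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Num(i64),
    Plus,
}

/// A toy lexer: an iterator that yields tokens lazily from the source.
fn lex(src: &str) -> impl Iterator<Item = Tok> + '_ {
    src.split_whitespace().map(|w| match w {
        "+" => Tok::Plus,
        n => Tok::Num(n.parse().expect("digit")),
    })
}

/// Two-phase: collect every token into a Vec first (extra allocation),
/// then walk the buffer.
fn sum_buffered(src: &str) -> i64 {
    let tokens: Vec<Tok> = lex(src).collect(); // <- the buffer Tokit avoids
    tokens
        .iter()
        .filter_map(|t| match t { Tok::Num(n) => Some(*n), _ => None })
        .sum()
}

/// Streaming: the consumer pulls tokens off the lexer on demand,
/// so no token buffer is ever allocated.
fn sum_streaming(src: &str) -> i64 {
    lex(src)
        .filter_map(|t| match t { Tok::Num(n) => Some(n), _ => None })
        .sum()
}

fn main() {
    let src = "1 + 2 + 39";
    // Same result, but the streaming version never allocates a token buffer.
    assert_eq!(sum_buffered(src), sum_streaming(src));
    println!("{}", sum_streaming(src)); // 42
}
```

Tokit additionally keeps a small fixed-capacity lookahead window on the stack, which this sketch omits.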
§No Hidden Backtracking
Unlike traditional parser combinators that rely on implicit backtracking (trying alternatives until one succeeds), Tokit uses explicit lookahead-based decisions. This design choice provides:
- Predictable Performance: No hidden exponential backtracking scenarios
- Explicit Control: Developers decide when and where to peek ahead via `peek_then()` and `peek_then_choice()`
- Deterministic Parsing: LALR-style table-driven decisions using fixed-capacity lookahead windows (`Window` trait)
- Better Error Messages: Failed alternatives don't hide earlier, more relevant errors
```rust
// Traditional parser combinator (hidden backtracking):
// try_parser1.or(try_parser2).or(try_parser3) // May backtrack!

// Tokit approach (explicit lookahead, no backtracking):
let parser = any()
    .peek_then::<_, typenum::U2>(|peeked, _| {
        match peeked.get(0) {
            Some(Token::If) => Ok(Action::Continue), // Deterministic decision
            _ => Ok(Action::Stop),
        }
    });
```

§Parser Combinators + Deterministic Table Parsing
Tokit uniquely combines:
- Parser Combinator Flexibility: Compose small parsers into complex grammars
- LALR-Style Determinism: Fixed lookahead windows with deterministic decisions
- Type-Level Capacity: Lookahead buffer size known at compile time (`Window::CAPACITY`)
This hybrid approach gives you composable abstractions without sacrificing performance or predictability.
§Fail-Fast Runtime ↔ Greedy Compiler Diagnostics
Tokit’s architecture decouples parsing logic from error handling strategy through the Emitter trait. This means:
Same Parser, Different Contexts:
- Runtime/REPL Mode: Use the `Fatal` emitter → stop on the first error for immediate feedback
- Compiler/IDE Mode: Use the greedy emitter (planned) → collect all errors for comprehensive diagnostics
- Testing/Fuzzing: Use the `Ignored` emitter → parse through all errors for robustness testing
Benefits:
- ✅ Write parsers once, deploy everywhere
- ✅ No separate “error recovery mode” - it’s just a different emitter
- ✅ Custom emitters can implement domain-specific error handling
- ✅ Zero-cost abstraction - emitter behavior resolved at compile time
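The strategy-swapping idea can be shown in plain Rust. `Emit`, `FailFast`, and `Collect` below are illustrative stand-ins, not tokit's actual `Emitter` trait or `Fatal` type; the point is that one generic parser changes behavior when the strategy type changes:

```rust
// Sketch of swappable error-emission strategies. All names here are
// hypothetical stand-ins for tokit's Emitter design, not its real API.

#[derive(Debug, Clone, PartialEq)]
struct ParseError(String);

/// The strategy trait: how a parser reacts when it hits an error.
trait Emit {
    /// Returns `Err` to abort parsing, `Ok` to record the error and continue.
    fn emit(&mut self, err: ParseError) -> Result<(), ParseError>;
}

/// Fail-fast (runtime/REPL style): the first error aborts.
struct FailFast;
impl Emit for FailFast {
    fn emit(&mut self, err: ParseError) -> Result<(), ParseError> {
        Err(err)
    }
}

/// Greedy (compiler-diagnostics style): record everything, keep going.
#[derive(Default)]
struct Collect {
    errors: Vec<ParseError>,
}
impl Emit for Collect {
    fn emit(&mut self, err: ParseError) -> Result<(), ParseError> {
        self.errors.push(err);
        Ok(())
    }
}

/// One "parser", generic over the strategy: same logic, different behavior.
fn parse_digits<E: Emit>(input: &str, emitter: &mut E) -> Result<Vec<u32>, ParseError> {
    let mut out = Vec::new();
    for c in input.chars() {
        match c.to_digit(10) {
            Some(d) => out.push(d),
            None => emitter.emit(ParseError(format!("unexpected {c:?}")))?,
        }
    }
    Ok(out)
}

fn main() {
    // Fail-fast stops at the first bad character.
    assert!(parse_digits("12x3", &mut FailFast).is_err());

    // Greedy parses through, collecting diagnostics along the way.
    let mut greedy = Collect::default();
    assert_eq!(parse_digits("12x3", &mut greedy).unwrap(), vec![1, 2, 3]);
    assert_eq!(greedy.errors.len(), 1);
}
```

Because the strategy is a type parameter, the compiler can resolve it statically, which is the "zero-cost" claim above in miniature.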
§Inspirations
Tokit takes inspiration from:
- winnow - For ergonomic parser API design
- chumsky - For composable parser combinator patterns
- logos - For high-performance lexing
- rowan - For lossless syntax tree representation
§Core Priorities
- Performance - Parse-while-lexing (zero-copy streaming), zero-cost abstractions, no hidden allocations
- Predictability - No hidden backtracking, explicit control flow, deterministic decisions
- Composability - Small parsers combine into complex ones
- Versatility - The same parser works for runtime (fail-fast) and compiler diagnostics (greedy) via `Emitter`
- Flexibility - Work with any lexer, customize error handling, support both AST and CST
- Correctness - Rich error types, span tracking, validation
§Who Uses Tokit?
smear: Blazing fast, fully spec-compliant, reusable parser combinators for standard GraphQL and GraphQL-like DSLs
§License
tokit is dual-licensed under:
- MIT License (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)
You may choose either license for your purposes.
Copyright (c) 2025 Al Liu.
Re-exports§
- `pub use emitter::Emitter;`
- `pub use lexer::Cache;`
- `pub use lexer::Lexed;`
- `pub use lexer::Lexer;`
- `pub use lexer::Source;`
- `pub use lexer::State;`
- `pub use lexer::Token;`
- `pub use parser::Parse;`
- `pub use parser::ParseChoice;`
- `pub use parser::ParseContext;`
- `pub use parser::ParseInput;`
- `pub use parser::Parser;`
- `pub use parser::Window;`
- `pub use logos;` (`logos` feature)
Modules§
- `container` - Trait for container types.
- `cst` (`rowan` feature) - Concrete Syntax Tree (CST) utilities built on top of rowan.
- `emitter` - Emitter-related structures and traits.
- `error` - Common error types for lexers and parsers.
- `lexer` - Lexers and token definitions.
- `parser` - Parsers and combinators with deterministic parsing and zero-copy streaming.
- `punct` - Common punctuation tokens.
- `syntax` - Syntax definitions, traits, and incomplete-syntax error types.
- `types` - Common types for building language-specific ASTs.
- `utils` - Common utilities for working with tokens and lexers.
Macros§
- `keyword` - Defines a keyword.
- `punctuator` - Defines punctuators.