WIP: This project is still under active development and not ready for use.
Blazing fast parser combinators with parse-while-lexing architecture (zero-copy), deterministic LALR-style parsing, and no hidden backtracking.
English | 简体中文
## Overview
Tokit is a blazing fast parser combinator library for Rust that uniquely combines:
- Parse-While-Lexing Architecture: Zero-copy streaming - parsers consume tokens directly from the lexer without buffering, eliminating allocation overhead
- Deterministic LALR-Style Parsing: Explicit lookahead with compile-time buffer capacity, no hidden backtracking
- Flexible Error Handling: Same parser code adapts for fail-fast runtime or greedy compiler diagnostics via the `Emitter` trait
Unlike traditional parser combinators that buffer tokens and rely on implicit backtracking, Tokit streams tokens on-demand with predictable, deterministic decisions. This makes it ideal for building high-performance language tooling, DSL parsers, compilers, and REPLs that need both speed and comprehensive error reporting.
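The streaming idea can be sketched in plain Rust. Everything below (`toy_lex`, `parse_sum`, the `Token` enum) is invented for illustration and is not Tokit's API; it only shows the shape of parse-while-lexing, where the parser pulls tokens on demand instead of reading from a pre-filled buffer:

```rust
// Illustrative sketch: the parser consumes tokens straight from the lexer,
// so no Vec<Token> is ever allocated between the two phases.

#[derive(Debug, PartialEq)]
enum Token {
    Num(i64),
}

// A "lexer" is just something that yields tokens lazily.
fn toy_lex(src: &str) -> impl Iterator<Item = Token> + '_ {
    src.split(',').map(|s| Token::Num(s.trim().parse().unwrap()))
}

// The parser's working set is the current token, not a buffered stream.
fn parse_sum(mut tokens: impl Iterator<Item = Token>) -> i64 {
    let mut sum = 0;
    while let Some(Token::Num(n)) = tokens.next() {
        sum += n;
    }
    sum
}

fn main() {
    assert_eq!(parse_sum(toy_lex("1, 2, 3")), 6);
}
```

Because `toy_lex` returns a lazy iterator, no intermediate token buffer ever exists; tokens are lexed exactly when the parser asks for them.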
## Key Features
- Parse-While-Lexing: Zero-copy streaming architecture - no token buffering, no extra allocations
- No Hidden Backtracking: Explicit, predictable parsing with lookahead-based decisions instead of implicit backtracking
- Deterministic + Composable: Combines the flexibility of parser combinators with LALR-style deterministic table parsing
- Flexible Error Handling Architecture: Designed to support both fail-fast parsing (runtime) and greedy parsing (compiler diagnostics) by swapping the `Emitter` type (same parser, different behavior)
- Token-Based Parsing: Works directly on token streams from any lexer implementing the `Lexer<'inp>` trait
- Composable Combinators: Build complex parsers from simple, reusable building blocks
- Flexible Error Handling: Configurable error emission strategies (`Fatal`, `Silent`, `Ignored`)
- Rich Error Recovery: Built-in support for error recovery and validation
- Zero-Cost Abstractions: All configuration resolved at compile time
- No-std Support: Core functionality works without allocator
- Multiple Source Types: Support for `str`, `[u8]`, `Bytes`, `BStr`, `HipStr`
- Logos Integration: Optional `LogosLexer` adapter for seamless Logos integration
- CST Support: Optional Concrete Syntax Tree support via rowan
## Installation
Add this to your `Cargo.toml`:

```toml
[dependencies]
tokit = "0.0.0"
```
## Feature Flags
- `std` (default) - Enable standard library support
- `alloc` - Enable allocator support for no-std environments
- `logos` - Enable `LogosLexer` adapter for Logos integration
- `rowan` - Enable CST (Concrete Syntax Tree) support with rowan integration
- `bytes` - Support for `bytes::Bytes` as token source
- `bstr` - Support for `bstr::BStr` as token source
- `hipstr` - Support for `hipstr::HipStr` as token source
- `among` - Enable `Among<L, M, R>` parseable support
- `smallvec` - Enable small vector optimization utilities
## Core Components

### Lexer Layer

- `Lexer<'inp>` trait - Core trait for lexers that produce token streams. Implement this to use any lexer with Tokit.
- `Token<'a>` trait - Defines token types with:
  - `Kind`: Token kind discriminator
  - `Error`: Associated error type
- `LogosLexer<'inp, T, L>` (feature: `logos`) - Ready-to-use adapter for integrating Logos lexers.
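To make the division of labor concrete, here is a hypothetical sketch of what such a lexer/token pairing can look like. These trait shapes are invented for illustration and are not the crate's actual `Lexer<'inp>`/`Token<'a>` signatures:

```rust
// Hypothetical trait shapes, for illustration only.

trait Token<'a> {
    type Kind: PartialEq; // token kind discriminator
    type Error;           // associated lexing error type
    fn kind(&self) -> Self::Kind;
}

trait Lexer<'inp> {
    type Token: Token<'inp>;
    /// Produce the next token, or `None` at end of input.
    fn next_token(&mut self) -> Option<Self::Token>;
}

// A trivial lexer over whitespace-separated words, showing how such a
// trait might be implemented. Tokens borrow from the source (zero-copy).
struct WordLexer<'inp> {
    rest: &'inp str,
}

#[derive(Debug, PartialEq)]
struct Word<'a>(&'a str);

impl<'a> Token<'a> for Word<'a> {
    type Kind = ();
    type Error = ();
    fn kind(&self) -> Self::Kind {}
}

impl<'inp> Lexer<'inp> for WordLexer<'inp> {
    type Token = Word<'inp>;
    fn next_token(&mut self) -> Option<Word<'inp>> {
        let trimmed = self.rest.trim_start();
        if trimmed.is_empty() {
            return None;
        }
        let end = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        let (word, rest) = trimmed.split_at(end);
        self.rest = rest;
        Some(Word(word))
    }
}

fn main() {
    let mut lx = WordLexer { rest: "let x = 1" };
    assert_eq!(lx.next_token(), Some(Word("let")));
    assert_eq!(lx.next_token(), Some(Word("x")));
}
```

Note that `Word` holds a `&'inp str` slice of the source rather than an owned `String`, which is the zero-copy property the architecture relies on.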
### Error Handling

Tokit's flexible `Emitter` system allows the same parser to adapt to different use cases by simply changing the error handling strategy:

- Emitter Strategies
  - `Fatal` - Fail-fast parsing: stop on first error (default); perfect for runtime parsing and REPLs
  - Greedy emitter (planned) - Collect all errors and continue parsing; perfect for compiler diagnostics and IDEs
  - `Silent` - Silently ignore errors
  - `Ignored` - Ignore errors completely

Key Design: Change the `Emitter` type to switch between fail-fast runtime parsing and greedy compiler diagnostics (same parser code, different behavior). This makes Tokit suitable for both:

- Runtime/REPL: Fast feedback with the `Fatal` emitter
- Compiler/IDE: Comprehensive diagnostics with the greedy emitter (coming soon)

Rich error types (in the `error/` module):

- Token-level: `UnexpectedToken`, `MissingToken`, `UnexpectedEot`
- Syntax-level: `Unclosed`, `Unterminated`, `Malformed`, `Invalid`
- Escape sequences: `HexEscape`, `UnicodeEscape`
- All errors include span tracking
### Utilities

- Span Tracking
  - `Span` - Lightweight span representation
  - `Spanned<T>` - Wrap a value with a span
  - `Located<T>` - Wrap a value with a span and source slice
  - `Sliced<T>` - Wrap a value with a source slice
- Parser Configuration
  - `Parser<F, L, O, Error, Context>` - Configurable parser
  - `ParseContext` - Context for emitter and cache
  - `Window` - Type-level peek buffer capacity for deterministic lookahead
  - Note: Lookahead windows support 1-32 token capacity via `typenum::{U1..U32}`
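As an illustration of the span wrappers and a compile-time lookahead capacity, here is a standalone sketch. It uses const generics where Tokit uses `typenum`, and none of the types below are the crate's real definitions:

```rust
// Illustrative stand-ins for Span/Spanned and a fixed-capacity peek window.

#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

#[derive(Debug, PartialEq)]
struct Spanned<T> {
    value: T,
    span: Span,
}

// A peek buffer whose capacity is known at compile time, as in
// LALR-style parsing with bounded lookahead.
struct Window<T, const N: usize> {
    buf: [Option<T>; N],
    len: usize,
}

impl<T, const N: usize> Window<T, N> {
    const CAPACITY: usize = N;

    fn new() -> Self {
        Self { buf: std::array::from_fn(|_| None), len: 0 }
    }

    fn push(&mut self, item: T) -> bool {
        if self.len < N {
            self.buf[self.len] = Some(item);
            self.len += 1;
            true
        } else {
            false // window full: a deterministic parser must decide now
        }
    }

    fn peek(&self, n: usize) -> Option<&T> {
        self.buf.get(n).and_then(|slot| slot.as_ref())
    }
}

fn main() {
    let tok = Spanned { value: "ident", span: Span { start: 0, end: 5 } };
    assert_eq!(tok.value, "ident");
    assert_eq!(tok.span.end, 5);

    let mut win: Window<u8, 2> = Window::new();
    assert!(win.push(1));
    assert!(win.push(2));
    assert!(!win.push(3)); // capacity 2: the third push is rejected
    assert_eq!(win.peek(1), Some(&2));
    assert_eq!(Window::<u8, 2>::CAPACITY, 2);
}
```

Because the capacity is a type parameter, the buffer lives on the stack and its size never depends on runtime input, which is what makes the lookahead deterministic.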
## Quick Start

Here's the shape of a simple example parsing JSON tokens (the tokit imports and `LogosLexer` type parameters are elided):

```rust
use logos::Logos;
// (tokit `use` items elided)

type MyLexer<'a> = LogosLexer</* type parameters elided */>;
```
## More Examples
Check out the examples directory:
- JSON token parsing with map combinators

Note: The calculator examples are being updated for the v0.3.0 API.
## Architecture
Tokit's architecture follows a layered design:
- Lexer Layer - Token production and source abstraction
- Parser Layer - Composable parser combinators
- Error Layer - Rich error types and emission strategies
- Utility Layer - Spans, containers, and helpers
This separation enables:
- Use any lexer by implementing `Lexer<'inp>`
- Mix and match parser combinators
- Customize error handling per-parser or globally
- Zero-cost abstractions through compile-time configuration
## Design Philosophy

### Parse-While-Lexing: Zero-Copy Streaming
Tokit uses a parse-while-lexing architecture where parsers consume tokens directly from the lexer as needed, without intermediate buffering:
Traditional Approach (Two-Phase):

```text
Source → Lexer → [Token Buffer] → Parser
                       ↓
              Allocate Vec<Token> ← Extra allocation!
```

Tokit Approach (Streaming):

```text
Source → Lexer ←→ Parser
          ↑________↓
  Zero-copy streaming, no buffer
```
Benefits:
- ✅ Zero Extra Allocations: No token buffer, tokens consumed on-demand
- ✅ Lower Memory Footprint: Only lookahead window buffered on stack, not entire token stream
- ✅ Better Cache Locality: Tokens processed immediately after lexing
- ✅ Predictable Performance: No large allocations, deterministic memory usage
### No Hidden Backtracking
Unlike traditional parser combinators that rely on implicit backtracking (trying alternatives until one succeeds), Tokit uses explicit lookahead-based decisions. This design choice provides:
- Predictable Performance: No hidden exponential backtracking scenarios
- Explicit Control: Developers decide when and where to peek ahead via `peek_then()` and `peek_then_choice()`
- Deterministic Parsing: LALR-style table-driven decisions using fixed-capacity lookahead windows (the `Window` trait)
- Better Error Messages: Failed alternatives don't hide earlier, more relevant errors
```rust
// Traditional parser combinator (hidden backtracking):
// try_parser1.or(try_parser2).or(try_parser3) // May backtrack!

// Tokit approach (explicit lookahead, no backtracking):
let parser = any
    .peek_then(/* arguments elided */);
```
### Parser Combinators + Deterministic Table Parsing
Tokit uniquely combines:
- Parser Combinator Flexibility: Compose small parsers into complex grammars
- LALR-Style Determinism: Fixed lookahead windows with deterministic decisions
- Type-Level Capacity: Lookahead buffer size known at compile time (`Window::CAPACITY`)
This hybrid approach gives you composable abstractions without sacrificing performance or predictability.
### Fail-Fast Runtime ↔ Greedy Compiler Diagnostics

Tokit's architecture decouples parsing logic from error handling strategy through the `Emitter` trait. This means:
Same Parser, Different Contexts:
- Runtime/REPL Mode: Use the `Fatal` emitter → stop on first error for immediate feedback
- Compiler/IDE Mode: Use the greedy emitter (planned) → collect all errors for comprehensive diagnostics
- Testing/Fuzzing: Use the `Ignored` emitter → parse through all errors for robustness testing
Benefits:
- ✅ Write parsers once, deploy everywhere
- ✅ No separate "error recovery mode" - it's just a different emitter
- ✅ Custom emitters can implement domain-specific error handling
- ✅ Zero-cost abstraction - emitter behavior resolved at compile time
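The "same parser, different behavior" idea can be sketched generically. The `Emitter` trait and the `Fatal`/`Collect` types below are hypothetical stand-ins invented for this sketch, not Tokit's actual API:

```rust
// Hypothetical sketch: one parsing function, generic over an emitter,
// where the chosen emitter type decides fail-fast vs. collect-all.

trait Emitter {
    /// Report an error; return `true` if parsing should stop.
    fn emit(&mut self, err: String) -> bool;
}

/// Fail-fast: remember the first error and stop (runtime / REPL mode).
struct Fatal {
    first: Option<String>,
}

impl Emitter for Fatal {
    fn emit(&mut self, err: String) -> bool {
        self.first.get_or_insert(err);
        true
    }
}

/// Greedy: record every error and keep going (compiler / IDE mode).
struct Collect {
    errors: Vec<String>,
}

impl Emitter for Collect {
    fn emit(&mut self, err: String) -> bool {
        self.errors.push(err);
        false
    }
}

// One parser, written once: counts digits, reporting any non-digit.
fn parse_digits<E: Emitter>(input: &str, emitter: &mut E) -> usize {
    let mut count = 0;
    for ch in input.chars() {
        if ch.is_ascii_digit() {
            count += 1;
        } else if emitter.emit(format!("unexpected {ch:?}")) {
            break; // the emitter asked us to stop
        }
    }
    count
}

fn main() {
    let mut fatal = Fatal { first: None };
    assert_eq!(parse_digits("12x34y", &mut fatal), 2); // stops at 'x'

    let mut all = Collect { errors: Vec::new() };
    assert_eq!(parse_digits("12x34y", &mut all), 4); // parses through
    assert_eq!(all.errors.len(), 2);
}
```

Because `parse_digits` is generic over `E`, each emitter choice is monomorphized separately, so the strategy is resolved at compile time rather than via a runtime flag.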
## Inspirations
Tokit takes inspiration from:
- winnow - For ergonomic parser API design
- chumsky - For composable parser combinator patterns
- logos - For high-performance lexing
- rowan - For lossless syntax tree representation
## Core Priorities
- Performance - Parse-while-lexing (zero-copy streaming), zero-cost abstractions, no hidden allocations
- Predictability - No hidden backtracking, explicit control flow, deterministic decisions
- Composability - Small parsers combine into complex ones
- Versatility - Same parser works for runtime (fail-fast) and compiler diagnostics (greedy) via the `Emitter` trait
- Flexibility - Work with any lexer, customize error handling, support both AST and CST
- Correctness - Rich error types, span tracking, validation
## Who Uses Tokit?
smear: Blazing fast, fully spec-compliant, reusable parser combinators for standard GraphQL and GraphQL-like DSLs
## License
tokit is dual-licensed under:
- MIT License (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)
You may choose either license for your purposes.
Copyright (c) 2025 Al Liu.