> WIP: This project is still under active development and not ready for use.
<div align="center">
<h1>Tokit</h1>
</div>
<div align="center">
Blazing fast parser combinators with parse-while-lexing architecture (zero-copy), deterministic LALR-style parsing, and no hidden backtracking.
[<img alt="github" src="https://img.shields.io/badge/github-al8n/tokit-8da0cb?style=for-the-badge&logo=Github" height="22">][Github-url]
<img alt="LoC" src="https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2Fal8n%2F327b2a8aef9003246e45c6e47fe63937%2Fraw%2Ftokit" height="22">
[<img alt="Build" src="https://img.shields.io/github/actions/workflow/status/al8n/tokit/ci.yml?logo=Github-Actions&style=for-the-badge" height="22">][CI-url]
[<img alt="codecov" src="https://img.shields.io/codecov/c/gh/al8n/tokit?style=for-the-badge&token=6R3QFWRWHL&logo=codecov" height="22">][codecov-url]
[<img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-tokit-66c2a5?style=for-the-badge&labelColor=555555&logo=data:image/svg+xml;base64,PHN2ZyByb2xlPSJpbWciIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgdmlld0JveD0iMCAwIDUxMiA1MTIiPjxwYXRoIGZpbGw9IiNmNWY1ZjUiIGQ9Ik00ODguNiAyNTAuMkwzOTIgMjE0VjEwNS41YzAtMTUtOS4zLTI4LjQtMjMuNC0zMy43bC0xMDAtMzcuNWMtOC4xLTMuMS0xNy4xLTMuMS0yNS4zIDBsLTEwMCAzNy41Yy0xNC4xIDUuMy0yMy40IDE4LjctMjMuNCAzMy43VjIxNGwtOTYuNiAzNi4yQzkuMyAyNTUuNSAwIDI2OC45IDAgMjgzLjlWMzk0YzAgMTMuNiA3LjcgMjYuMSAxOS45IDMyLjJsMTAwIDUwYzEwLjEgNS4xIDIyLjEgNS4xIDMyLjIgMGwxMDMuOS01MiAxMDMuOSA1MmMxMC4xIDUuMSAyMi4xIDUuMSAzMi4yIDBsMTAwLTUwYzEyLjItNi4xIDE5LjktMTguNiAxOS45LTMyLjJWMjgzLjljMC0xNS05LjMtMjguNC0yMy40LTMzLjd6TTM1OCAyMTQuOGwtODUgMzEuOXYtNjguMmw4NS0zN3Y3My4zek0xNTQgMTA0LjFsMTAyLTM4LjIgMTAyIDM4LjJ2LjZsLTEwMiA0MS40LTEwMi00MS40di0uNnptODQgMjkxLjFsLTg1IDQyLjV2LTc5LjFsODUtMzguOHY3NS40em0wLTExMmwtMTAyIDQxLjQtMTAyLTQxLjR2LS42bDEwMi0zOC4yIDEwMiAzOC4ydi42em0yNDAgMTEybC04NSA0Mi41di03OS4xbDg1LTM4Ljh2NzUuNHptMC0xMTJsLTEwMiA0MS40LTEwMi00MS40di0uNmwxMDItMzguMiAxMDIgMzguMnYuNnoiPjwvcGF0aD48L3N2Zz4K" height="20">][doc-url]
[<img alt="crates.io" src="https://img.shields.io/crates/v/tokit?style=for-the-badge&logo=data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iaXNvLTg4NTktMSI/Pg0KPCEtLSBHZW5lcmF0b3I6IEFkb2JlIElsbHVzdHJhdG9yIDE5LjAuMCwgU1ZHIEV4cG9ydCBQbHVnLUluIC4gU1ZHIFZlcnNpb246IDYuMDAgQnVpbGQgMCkgIC0tPg0KPHN2ZyB2ZXJzaW9uPSIxLjEiIGlkPSJMYXllcl8xIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB4PSIwcHgiIHk9IjBweCINCgkgdmlld0JveD0iMCAwIDUxMiA1MTIiIHhtbDpzcGFjZT0icHJlc2VydmUiPg0KPGc+DQoJPGc+DQoJCTxwYXRoIGQ9Ik0yNTYsMEwzMS41MjgsMTEyLjIzNnYyODcuNTI4TDI1Niw1MTJsMjI0LjQ3Mi0xMTIuMjM2VjExMi4yMzZMMjU2LDB6IE0yMzQuMjc3LDQ1Mi41NjRMNzQuOTc0LDM3Mi45MTNWMTYwLjgxDQoJCQlsMTU5LjMwMyw3OS42NTFWNDUyLjU2NHogTTEwMS44MjYsMTI1LjY2MkwyNTYsNDguNTc2bDE1NC4xNzQsNzcuMDg3TDI1NiwyMDIuNzQ5TDEwMS44MjYsMTI1LjY2MnogTTQzNy4wMjYsMzcyLjkxMw0KCQkJbC0xNTkuMzAzLDc5LjY1MVYyNDAuNDYxbDE1OS4zMDMtNzkuNjUxVjM3Mi45MTN6IiBmaWxsPSIjRkZGIi8+DQoJPC9nPg0KPC9nPg0KPGc+DQo8L2c+DQo8Zz4NCjwvZz4NCjxnPg0KPC9nPg0KPGc+DQo8L2c+DQo8Zz4NCjwvZz4NCjxnPg0KPC9nPg0KPGc+DQo8L2c+DQo8Zz4NCjwvZz4NCjxnPg0KPC9nPg0KPGc+DQo8L2c+DQo8Zz4NCjwvZz4NCjxnPg0KPC9nPg0KPGc+DQo8L2c+DQo8Zz4NCjwvZz4NCjxnPg0KPC9nPg0KPC9zdmc+DQo=" height="22">][crates-url]
[<img alt="crates.io" src="https://img.shields.io/crates/d/tokit?color=critical&logo=data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBzdGFuZGFsb25lPSJubyI/PjwhRE9DVFlQRSBzdmcgUFVCTElDICItLy9XM0MvL0RURCBTVkcgMS4xLy9FTiIgImh0dHA6Ly93d3cudzMub3JnL0dyYXBoaWNzL1NWRy8xLjEvRFREL3N2ZzExLmR0ZCI+PHN2ZyB0PSIxNjQ1MTE3MzMyOTU5IiBjbGFzcz0iaWNvbiIgdmlld0JveD0iMCAwIDEwMjQgMTAyNCIgdmVyc2lvbj0iMS4xIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHAtaWQ9IjM0MjEiIGRhdGEtc3BtLWFuY2hvci1pZD0iYTMxM3guNzc4MTA2OS4wLmkzIiB3aWR0aD0iNDgiIGhlaWdodD0iNDgiIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIj48ZGVmcz48c3R5bGUgdHlwZT0idGV4dC9jc3MiPjwvc3R5bGU+PC9kZWZzPjxwYXRoIGQ9Ik00NjkuMzEyIDU3MC4yNHYtMjU2aDg1LjM3NnYyNTZoMTI4TDUxMiA3NTYuMjg4IDM0MS4zMTIgNTcwLjI0aDEyOHpNMTAyNCA2NDAuMTI4QzEwMjQgNzgyLjkxMiA5MTkuODcyIDg5NiA3ODcuNjQ4IDg5NmgtNTEyQzEyMy45MDQgODk2IDAgNzYxLjYgMCA1OTcuNTA0IDAgNDUxLjk2OCA5NC42NTYgMzMxLjUyIDIyNi40MzIgMzAyLjk3NiAyODQuMTYgMTk1LjQ1NiAzOTEuODA4IDEyOCA1MTIgMTI4YzE1Mi4zMiAwIDI4Mi4xMTIgMTA4LjQxNiAzMjMuMzkyIDI2MS4xMkM5NDEuODg4IDQxMy40NCAxMDI0IDUxOS4wNCAxMDI0IDY0MC4xOTJ6IG0tMjU5LjItMjA1LjMxMmMtMjQuNDQ4LTEyOS4wMjQtMTI4Ljg5Ni0yMjIuNzItMjUyLjgtMjIyLjcyLTk3LjI4IDAtMTgzLjA0IDU3LjM0NC0yMjQuNjQgMTQ3LjQ1NmwtOS4yOCAyMC4yMjQtMjAuOTI4IDIuOTQ0Yy0xMDMuMzYgMTQuNC0xNzguMzY4IDEwNC4zMi0xNzguMzY4IDIxNC43MiAwIDExNy45NTIgODguODMyIDIxNC40IDE5Ni45MjggMjE0LjRoNTEyYzg4LjMyIDAgMTU3LjUwNC03NS4xMzYgMTU3LjUwNC0xNzEuNzEyIDAtODguMDY0LTY1LjkyLTE2NC45MjgtMTQ0Ljk2LTE3MS43NzZsLTI5LjUwNC0yLjU2LTUuODg4LTMwLjk3NnoiIGZpbGw9IiNmZmZmZmYiIHAtaWQ9IjM0MjIiIGRhdGEtc3BtLWFuY2hvci1pZD0iYTMxM3guNzc4MTA2OS4wLmkwIiBjbGFzcz0iIj48L3BhdGg+PC9zdmc+&style=for-the-badge" height="22">][crates-url]
<img alt="license" src="https://img.shields.io/badge/License-Apache%202.0/MIT-blue.svg?style=for-the-badge&fontColor=white&logoColor=f5c076&logo=data:image/svg+xml;base64,PCFET0NUWVBFIHN2ZyBQVUJMSUMgIi0vL1czQy8vRFREIFNWRyAxLjEvL0VOIiAiaHR0cDovL3d3dy53My5vcmcvR3JhcGhpY3MvU1ZHLzEuMS9EVEQvc3ZnMTEuZHRkIj4KDTwhLS0gVXBsb2FkZWQgdG86IFNWRyBSZXBvLCB3d3cuc3ZncmVwby5jb20sIFRyYW5zZm9ybWVkIGJ5OiBTVkcgUmVwbyBNaXhlciBUb29scyAtLT4KPHN2ZyBmaWxsPSIjZmZmZmZmIiBoZWlnaHQ9IjgwMHB4IiB3aWR0aD0iODAwcHgiIHZlcnNpb249IjEuMSIgaWQ9IkNhcGFfMSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayIgdmlld0JveD0iMCAwIDI3Ni43MTUgMjc2LjcxNSIgeG1sOnNwYWNlPSJwcmVzZXJ2ZSIgc3Ryb2tlPSIjZmZmZmZmIj4KDTxnIGlkPSJTVkdSZXBvX2JnQ2FycmllciIgc3Ryb2tlLXdpZHRoPSIwIi8+Cg08ZyBpZD0iU1ZHUmVwb190cmFjZXJDYXJyaWVyIiBzdHJva2UtbGluZWNhcD0icm91bmQiIHN0cm9rZS1saW5lam9pbj0icm91bmQiLz4KDTxnIGlkPSJTVkdSZXBvX2ljb25DYXJyaWVyIj4gPGc+IDxwYXRoIGQ9Ik0xMzguMzU3LDBDNjIuMDY2LDAsMCw2Mi4wNjYsMCwxMzguMzU3czYyLjA2NiwxMzguMzU3LDEzOC4zNTcsMTM4LjM1N3MxMzguMzU3LTYyLjA2NiwxMzguMzU3LTEzOC4zNTcgUzIxNC42NDgsMCwxMzguMzU3LDB6IE0xMzguMzU3LDI1OC43MTVDNzEuOTkyLDI1OC43MTUsMTgsMjA0LjcyMywxOCwxMzguMzU3UzcxLjk5MiwxOCwxMzguMzU3LDE4IHMxMjAuMzU3LDUzLjk5MiwxMjAuMzU3LDEyMC4zNTdTMjA0LjcyMywyNTguNzE1LDEzOC4zNTcsMjU4LjcxNXoiLz4gPHBhdGggZD0iTTE5NC43OTgsMTYwLjkwM2MtNC4xODgtMi42NzctOS43NTMtMS40NTQtMTIuNDMyLDIuNzMyYy04LjY5NCwxMy41OTMtMjMuNTAzLDIxLjcwOC0zOS42MTQsMjEuNzA4IGMtMjUuOTA4LDAtNDYuOTg1LTIxLjA3OC00Ni45ODUtNDYuOTg2czIxLjA3Ny00Ni45ODYsNDYuOTg1LTQ2Ljk4NmMxNS42MzMsMCwzMC4yLDcuNzQ3LDM4Ljk2OCwyMC43MjMgYzIuNzgyLDQuMTE3LDguMzc1LDUuMjAxLDEyLjQ5NiwyLjQxOGM0LjExOC0yLjc4Miw1LjIwMS04LjM3NywyLjQxOC0xMi40OTZjLTEyLjExOC0xNy45MzctMzIuMjYyLTI4LjY0NS01My44ODItMjguNjQ1IGMtMzUuODMzLDAtNjQuOTg1LDI5LjE1Mi02NC45ODUsNjQuOTg2czI5LjE1Miw2NC45ODYsNjQuOTg1LDY0Ljk4NmMyMi4yODEsMCw0Mi43NTktMTEuMjE4LDU0Ljc3OC0zMC4wMDkgQzIwMC4yMDgsMTY5LjE0NywxOTguOTg1LDE2My41ODIsMTk0Ljc5OCwxNjAuOTAzeiIvPiA8L2c+IDwvZz4KDTwvc3ZnPg==" height="22">
English | [简体中文][zh-cn-url]
</div>
## Overview
**Tokit** is a blazing fast parser combinator library for Rust that uniquely combines:
- **Parse-While-Lexing Architecture**: Zero-copy streaming - parsers consume tokens directly from the lexer without buffering, eliminating allocation overhead
- **Deterministic LALR-Style Parsing**: Explicit lookahead with compile-time buffer capacity, no hidden backtracking
- **Flexible Error Handling**: The same parser code adapts to fail-fast runtime parsing or greedy compiler diagnostics via the `Emitter` trait
Unlike traditional parser combinators that buffer tokens and rely on implicit backtracking, Tokit streams tokens on-demand with predictable, deterministic decisions. This makes it ideal for building high-performance language tooling, DSL parsers, compilers, and REPLs that need both speed and comprehensive error reporting.
### Key Features
- **Parse-While-Lexing**: Zero-copy streaming architecture - no token buffering, no extra allocations
- **No Hidden Backtracking**: Explicit, predictable parsing with lookahead-based decisions instead of implicit backtracking
- **Deterministic + Composable**: Combines the flexibility of parser combinators with LALR-style deterministic table parsing
- **Flexible Error Handling**: Swap the `Emitter` type (`Fatal`, `Silent`, `Ignored`) to switch the same parser between fail-fast parsing (runtime) and greedy parsing (compiler diagnostics) - same parser, different behavior
- **Token-Based Parsing**: Works directly on token streams from any lexer implementing the `Lexer<'inp>` trait
- **Composable Combinators**: Build complex parsers from simple, reusable building blocks
- **Rich Error Recovery**: Built-in support for error recovery and validation
- **Zero-Cost Abstractions**: All configuration resolved at compile time
- **No-std Support**: Core functionality works without allocator
- **Multiple Source Types**: Support for `str`, `[u8]`, `Bytes`, `BStr`, `HipStr`
- **Logos Integration**: Optional `LogosLexer` adapter for seamless [Logos](https://github.com/maciejhirsz/logos) integration
- **CST Support**: Optional Concrete Syntax Tree support via [rowan](https://github.com/rust-analyzer/rowan)
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
tokit = "0.0.0"
```
### Feature Flags
- `std` (default) - Enable standard library support
- `alloc` - Enable allocator support for no-std environments
- `logos` - Enable `LogosLexer` adapter for Logos integration
- `rowan` - Enable CST (Concrete Syntax Tree) support with rowan integration
- `bytes` - Support for `bytes::Bytes` as token source
- `bstr` - Support for `bstr::BStr` as token source
- `hipstr` - Support for `hipstr::HipStr` as token source
- `among` - Enable `Among<L, M, R>` parseable support
- `smallvec` - Enable small vector optimization utilities
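For example, a `Cargo.toml` entry enabling the Logos adapter and CST support might look like the following (feature names as listed above; pick the latest published version):

```toml
[dependencies]
tokit = { version = "0.0.0", features = ["logos", "rowan"] }
```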
## Core Components
### Lexer Layer
- **`Lexer<'inp>` Trait**
Core trait for lexers that produce token streams. Implement this to use any lexer with Tokit.
- **`Token<'a>` Trait**
Defines token types with:
- `Kind`: Token kind discriminator
- `Error`: Associated error type
- **`LogosLexer<'inp, T, L>`** (feature: `logos`)
Ready-to-use adapter for integrating [Logos](https://github.com/maciejhirsz/logos) lexers.
### Error Handling
Tokit's flexible `Emitter` system allows the same parser to adapt to different use cases by simply changing the error handling strategy:
- **Emitter Strategies**
- `Fatal` - **Fail-fast parsing**: Stop on first error (default) - perfect for runtime parsing and REPLs
- **Greedy emitter** (planned) - Collect all errors and continue parsing - perfect for compiler diagnostics and IDEs
- `Silent` - Silently ignore errors
- `Ignored` - Ignore errors completely
**Key Design**: Change the `Emitter` type to switch between fail-fast runtime parsing and greedy compiler diagnostics - **same parser code, different behavior**. This makes Tokit suitable for both:
- **Runtime/REPL**: Fast feedback with `Fatal` emitter
- **Compiler/IDE**: Comprehensive diagnostics with greedy emitter (coming soon)
- **Rich Error Types** (in `error/` module)
- Token-level: `UnexpectedToken`, `MissingToken`, `UnexpectedEot`
- Syntax-level: `Unclosed`, `Unterminated`, `Malformed`, `Invalid`
- Escape sequences: `HexEscape`, `UnicodeEscape`
- All errors include span tracking
### Utilities
- **Span Tracking**
- `Span` - Lightweight span representation
- `Spanned<T>` - Wrap value with span
- `Located<T>` - Wrap value with span and source slice
- `Sliced<T>` - Wrap value with source slice
- **Parser Configuration**
- `Parser<F, L, O, Error, Context>` - Configurable parser
- `ParseContext` - Context for emitter and cache
- `Window` - Type-level peek buffer capacity for deterministic lookahead
- **Note**: Lookahead windows support 1-32 token capacity via `typenum::{U1..U32}`
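To illustrate the idea behind `Window`, here is a hedged sketch of a fixed-capacity peek buffer using const generics instead of `typenum` - the `PeekBuffer` type and its methods are hypothetical stand-ins for illustration, not the real tokit API:

```rust
/// A compile-time-capacity peek buffer, sketched with const generics.
/// (Tokit expresses the capacity with `typenum::{U1..U32}` instead;
/// this is an illustration of the same idea, not the real `Window` type.)
struct PeekBuffer<T, const N: usize> {
    buf: [Option<T>; N],
    len: usize,
}

impl<T: Copy, const N: usize> PeekBuffer<T, N> {
    const CAPACITY: usize = N;

    fn new() -> Self {
        Self { buf: [None; N], len: 0 }
    }

    /// Fill the window from a token stream, never pulling more than N tokens.
    fn fill(&mut self, tokens: &mut impl Iterator<Item = T>) {
        while self.len < N {
            match tokens.next() {
                Some(t) => {
                    self.buf[self.len] = Some(t);
                    self.len += 1;
                }
                None => break,
            }
        }
    }

    fn get(&self, i: usize) -> Option<&T> {
        self.buf.get(i).and_then(|slot| slot.as_ref())
    }
}

fn main() {
    let mut stream = [1, 2, 3, 4].into_iter();
    let mut window: PeekBuffer<i32, 2> = PeekBuffer::new();
    window.fill(&mut stream);

    assert_eq!(PeekBuffer::<i32, 2>::CAPACITY, 2);
    assert_eq!(window.get(0), Some(&1)); // lookahead within capacity
    assert_eq!(window.get(1), Some(&2));
    assert_eq!(window.get(2), None);     // beyond capacity: nothing buffered
    assert_eq!(stream.next(), Some(3));  // rest of the stream untouched
}
```

Because the capacity is a type-level constant, the buffer lives on the stack and its size is known at compile time - the same property `Window::CAPACITY` provides.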
## Quick Start
Here's a simple example parsing JSON tokens:
```rust
use logos::Logos;
use tokit::{Any, Parse, Token as TokenT};

#[derive(Debug, Logos, Clone)]
#[logos(skip r"[ \t\r\n\f]+")]
enum Token {
    #[token("true", |_| true)]
    #[token("false", |_| false)]
    Bool(bool),
    #[token("null")]
    Null,
    #[regex(r"-?(?:0|[1-9]\d*)(?:\.\d+)?", |lex| lex.slice().parse::<f64>().unwrap())]
    Number(f64),
}

#[derive(Debug, Clone, Copy)]
enum TokenKind {
    Bool,
    Null,
    Number,
}

impl TokenT<'_> for Token {
    type Kind = TokenKind;
    type Error = ();

    fn kind(&self) -> Self::Kind {
        match self {
            Token::Bool(_) => TokenKind::Bool,
            Token::Null => TokenKind::Null,
            Token::Number(_) => TokenKind::Number,
        }
    }
}

type MyLexer<'a> = tokit::LogosLexer<'a, Token, Token>;

fn main() {
    // Parse any token and extract its value
    let parser = Any::parser::<'_, MyLexer<'_>, ()>()
        .map(|tok: Token| match tok {
            Token::Number(n) => Some(n),
            _ => None,
        });

    let result = parser.parse("42.5");
    println!("{:?}", result); // Ok(Some(42.5))
}
```
### More Examples
Check out the examples directory:
```bash
# JSON token parsing with map combinators
cargo run --example json
# Note: The calculator examples are being updated for v0.3.0 API
```
## Architecture
Tokit's architecture follows a layered design:
1. **Lexer Layer** - Token production and source abstraction
2. **Parser Layer** - Composable parser combinators
3. **Error Layer** - Rich error types and emission strategies
4. **Utility Layer** - Spans, containers, and helpers
This separation enables:
- Use any lexer by implementing `Lexer<'inp>`
- Mix and match parser combinators
- Customize error handling per-parser or globally
- Zero-cost abstractions through compile-time configuration
## Design Philosophy
### Parse-While-Lexing: Zero-Copy Streaming
Tokit uses a **parse-while-lexing** architecture where parsers consume tokens directly from the lexer as needed, without intermediate buffering:
**Traditional Approach (Two-Phase):**
```text
Source → Lexer → [Token Buffer] → Parser
                       ↓
          Allocate Vec<Token>  ← Extra allocation!
```
**Tokit Approach (Streaming):**
```text
Source → Lexer ←→ Parser
           ↑______↓
  Zero-copy streaming, no buffer
```
**Benefits:**
- ✅ **Zero Extra Allocations**: No token buffer, tokens consumed on-demand
- ✅ **Lower Memory Footprint**: Only lookahead window buffered on stack, not entire token stream
- ✅ **Better Cache Locality**: Tokens processed immediately after lexing
- ✅ **Predictable Performance**: No large allocations, deterministic memory usage
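The difference can be sketched in plain std Rust (no tokit API involved): the two-phase approach materializes a `Vec` of tokens before parsing begins, while the streaming approach pulls each token from the lexer the moment the "parser" needs it:

```rust
// Stand-in "lexer": yields whitespace-separated tokens lazily.
fn lex(source: &str) -> impl Iterator<Item = &str> {
    source.split_whitespace()
}

fn main() {
    let source = "1 2 3 4";

    // Two-phase: collect every token into a Vec before parsing starts.
    let buffered: Vec<&str> = lex(source).collect(); // extra allocation
    let sum_buffered: i64 = buffered.iter().map(|t| t.parse::<i64>().unwrap()).sum();

    // Parse-while-lexing: the consumer pulls tokens on demand; no buffer,
    // no allocation proportional to the token count.
    let sum_streamed: i64 = lex(source).map(|t| t.parse::<i64>().unwrap()).sum();

    assert_eq!(sum_buffered, sum_streamed);
    println!("sum = {sum_streamed}"); // sum = 10
}
```

Both paths compute the same result, but only the first allocates storage for the whole token stream.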
### No Hidden Backtracking
Unlike traditional parser combinators that rely on implicit backtracking (trying alternatives until one succeeds), Tokit uses **explicit lookahead-based decisions**. This design choice provides:
- **Predictable Performance**: No hidden exponential backtracking scenarios
- **Explicit Control**: Developers decide when and where to peek ahead via `peek_then()` and `peek_then_choice()`
- **Deterministic Parsing**: LALR-style table-driven decisions using fixed-capacity lookahead windows (`Window` trait)
- **Better Error Messages**: Failed alternatives don't hide earlier, more relevant errors
```rust
// Traditional parser combinator (hidden backtracking):
// try_parser1.or(try_parser2).or(try_parser3) // May backtrack!

// Tokit approach (explicit lookahead, no backtracking):
let parser = any()
    .peek_then::<_, typenum::U2>(|peeked, _| {
        match peeked.get(0) {
            Some(Token::If) => Ok(Action::Continue), // Deterministic decision
            _ => Ok(Action::Stop),
        }
    });
```
### Parser Combinators + Deterministic Table Parsing
Tokit uniquely combines:
- **Parser Combinator Flexibility**: Compose small parsers into complex grammars
- **LALR-Style Determinism**: Fixed lookahead windows with deterministic decisions
- **Type-Level Capacity**: Lookahead buffer size known at compile time (`Window::CAPACITY`)
This hybrid approach gives you composable abstractions without sacrificing performance or predictability.
### Fail-Fast Runtime ↔ Greedy Compiler Diagnostics
Tokit's architecture decouples parsing logic from error handling strategy through the `Emitter` trait. This means:
**Same Parser, Different Contexts:**
- **Runtime/REPL Mode**: Use `Fatal` emitter → stop on first error for immediate feedback
- **Compiler/IDE Mode**: Use greedy emitter (planned) → collect all errors for comprehensive diagnostics
- **Testing/Fuzzing**: Use `Ignored` emitter → parse through all errors for robustness testing
**Benefits:**
- ✅ Write parsers once, deploy everywhere
- ✅ No separate "error recovery mode" - it's just a different emitter
- ✅ Custom emitters can implement domain-specific error handling
- ✅ Zero-cost abstraction - emitter behavior resolved at compile time
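The design idea can be sketched with a hypothetical `Emitter` trait (names and signatures here are illustrative stand-ins, not the real tokit API): the strategy is a type parameter, so the same parsing function compiles to fail-fast or greedy behavior with no runtime dispatch:

```rust
// Hypothetical emitter trait: returns true if parsing should stop.
trait Emitter {
    fn emit(&mut self, error: String) -> bool;
}

/// Fail-fast: report the first error and stop (runtime/REPL mode).
struct Fatal(Option<String>);
impl Emitter for Fatal {
    fn emit(&mut self, error: String) -> bool {
        self.0 = Some(error);
        true
    }
}

/// Greedy: collect every error and keep going (compiler/IDE mode).
struct Greedy(Vec<String>);
impl Emitter for Greedy {
    fn emit(&mut self, error: String) -> bool {
        self.0.push(error);
        false
    }
}

/// The "parser": identical code for both strategies; the emitter type
/// decides whether the first error aborts or is merely recorded.
fn parse_digits<E: Emitter>(input: &str, emitter: &mut E) -> Vec<u32> {
    let mut out = Vec::new();
    for c in input.chars() {
        match c.to_digit(10) {
            Some(d) => out.push(d),
            None => {
                if emitter.emit(format!("unexpected char {c:?}")) {
                    break; // Fatal stops here; Greedy never does.
                }
            }
        }
    }
    out
}

fn main() {
    let mut fatal = Fatal(None);
    assert_eq!(parse_digits("12x34", &mut fatal), vec![1, 2]);

    let mut greedy = Greedy(Vec::new());
    assert_eq!(parse_digits("12x3y4", &mut greedy), vec![1, 2, 3, 4]);
    assert_eq!(greedy.0.len(), 2); // both errors collected
}
```

Because `E` is a generic parameter, the compiler monomorphizes `parse_digits` per strategy - the "different behavior" costs nothing at runtime.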
### Inspirations
Tokit takes inspiration from:
- [**winnow**](https://github.com/winnow-rs/winnow) - For ergonomic parser API design
- [**chumsky**](https://github.com/zesterer/chumsky) - For composable parser combinator patterns
- [**logos**](https://github.com/maciejhirsz/logos) - For high-performance lexing
- [**rowan**](https://github.com/rust-analyzer/rowan) - For lossless syntax tree representation
### Core Priorities
1. **Performance** - Parse-while-lexing (zero-copy streaming), zero-cost abstractions, no hidden allocations
2. **Predictability** - No hidden backtracking, explicit control flow, deterministic decisions
3. **Composability** - Small parsers combine into complex ones
4. **Versatility** - Same parser works for runtime (fail-fast) and compiler diagnostics (greedy) via `Emitter`
5. **Flexibility** - Work with any lexer, customize error handling, support both AST and CST
6. **Correctness** - Rich error types, span tracking, validation
## Who Uses Tokit?
- [`smear`](https://github.com/al8n/smear): Blazing fast, fully spec-compliant, reusable parser combinators for standard GraphQL and GraphQL-like DSLs
## License
`tokit` is dual-licensed under:
- MIT License ([LICENSE-MIT](LICENSE-MIT))
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
You may use either license, at your option.
Copyright (c) 2025 Al Liu.
[Github-url]: https://github.com/al8n/tokit/
[CI-url]: https://github.com/al8n/tokit/actions/workflows/ci.yml
[doc-url]: https://docs.rs/tokit
[crates-url]: https://crates.io/crates/tokit
[codecov-url]: https://app.codecov.io/gh/al8n/tokit/
[zh-cn-url]: https://github.com/al8n/tokit/tree/main/README-zh_CN.md