# chasa
Parser combinators with explicit rollback control (`cut`) and streaming-friendly inputs.
This crate is a small parser-combinator core used in the YuLang workspace. The design goal is to keep
backtracking behavior predictable:
- `cut`: a branch-pruning signal (and root-level commit trigger for streaming inputs)
- rollback: combinator-driven (not "failure implies rollback")
- errors: accumulated, and rolled back together with input when backtracking
## TL;DR
```rust
use chasa::prelude::*;
// Parse "let x" and extract the identifier
let mut input = "let x";
let name = parse_ok_once(&mut input, tag("let").right(ws1).right(any)).unwrap();
assert_eq!(name, 'x');
```
**Key combinators**:
- `tag("...")` – match an exact string
- `ws1` / `ws` – match whitespace (one-or-more / zero-or-more)
- `any` – match any single character
- `right(q)` – parse both, return right result
- `many()` – repeat zero or more times
- `sep(s)` – parse separated list
Most combinators are available as methods (via `ParserOnce` / `ParserMut` / `Parser`),
and the `prelude` imports those traits so you can write `p.right(q)` / `p.many()` / `p.sep(...)`.
## What makes chasa different?
### 1) Explicit `cut` for branch pruning and error recovery
`cut` is a control signal that prevents backtracking across a commit point. This is useful for:
**Error recovery**: Once you've seen a keyword, the rest of the syntax becomes mandatory.
```rust
use chasa::prelude::*;
let mut input = "let x";
// After seeing "let", we MUST see an identifier (no backtracking to other branches)
let ident = one_of(ASCII_ALPHA).many1::<String>();
let var_decl = tag("let").cut().right(ws1).right(ident);
let result = parse_ok_once(&mut input, var_decl).unwrap();
assert_eq!(result, "x");
```
**Better error messages**: Instead of "expected A or B or C", you get "expected identifier after 'let'".
### 2) Streaming inputs for large files
`StreamInput` allows parsing large files or network streams without loading everything into memory.
```rust,no_run
use chasa::prelude::*;
use std::fs::File;
use std::io::{BufReader, Read};
// Parse a large file lazily (only buffering what's needed)
let file = File::open("large.txt").unwrap();
// After `cut()` commits, already-parsed data is dropped from the buffer
let my_parser = any.many::<String>();
let result = parse_ok_once(&mut input, my_parser).unwrap();
```
### 3) Combinator-driven rollback (not automatic)
In many libraries, "failure" automatically implies rollback. In `chasa`, rollback is a *semantic choice* made by each combinator:
- `maybe(p)` / `p.or_not()` – roll back on soft failure
- `lookahead(p)` – always roll back (peek without consuming)
- `many(p)` – roll back only the *final terminating attempt*
This keeps the control surface small and predictable: you can usually tell what rolls back by looking at the combinator you used.
## Showcase
This README includes two example styles:
- **Combinator style**: chain methods to build parsers declaratively
- **Imperative style**: use `In` methods (`run`, `choice`) for explicit control flow
### Example 1: S-expressions (combinator style)
This example uses **normal Rust functions** as parsers. Functions automatically implement `ParserOnce` / `ParserMut` / `Parser` in this crate, so recursion stays ergonomic.
```rust
use chasa::prelude::*;
#[derive(Debug, PartialEq, Eq)]
enum SExp {
Atom(String),
List(Vec<SExp>),
}
fn sexp(mut i: In<&str>) -> Option<SExp> {
let atom_char = none_of(SPACE.and("()"));
let atom = atom_char.many1().map(SExp::Atom);
let list = sexp.sep(ws1).map(SExp::List).between(ws, ws).between(item('('), item(')'));
i.choice((atom, list))
}
assert_eq!(
sexp.test_ok("(a (b c) d)"),
Ok(SExp::List(vec![
SExp::Atom("a".into()),
SExp::List(vec![SExp::Atom("b".into()), SExp::Atom("c".into())]),
SExp::Atom("d".into()),
]))
);
```
**Note**: `In` is the input wrapper that bundles the underlying input, error accumulator, and `cut` flag. Parsers receive `In` and return their output.
### Example 2: `key = value` (imperative style)
This example uses `In` methods (`run`, `choice`) for more explicit control flow. This style is useful when combinator chains become too long or when you need conditional branching.
```rust
use chasa::prelude::*;
#[derive(Debug, PartialEq, Eq)]
enum Value {
Bool(bool),
Number(i64),
Str(String),
}
fn kv(mut i: In<&str>) -> Option<(String, Value)> {
let ident = one_of(ASCII_ALPHA.and("_")).bind(|h| {
one_of(ASCII_ALPHANUM.and("_"))
.many_map(move |it| std::iter::once(h).chain(it).collect::<String>())
});
let eq = item('=').between(ws, ws);
let digit = one_of(ASCII_DIGIT);
let number = item('-').or_not().then(digit.many1::<String>()).map(|(sign, s)| {
let n = s.parse::<i64>().unwrap();
Value::Number(if sign.is_some() { -n } else { n })
});
let str_body = none_of("\"\\").many::<String>();
let string = str_body.between(item('\"'), item('\"')).map(Value::Str);
let key = i.run(ident)?;
i.run(eq)?;
let value = i.choice((
tag("true").to(Value::Bool(true)),
tag("false").to(Value::Bool(false)),
number,
string,
))?;
Some((key, value))
}
assert_eq!(kv.test_ok("port = 8080"), Ok(("port".into(), Value::Number(8080))));
assert_eq!(kv.test_ok("name = \"alice\""), Ok(("name".into(), Value::Str("alice".into()))));
```
## Quick API tour
**Entry points**:
- `parse_ok_once(&mut input, parser)` – run a `ParserOnce` and return `Result<T, Error>`
- `parse_ok_mut(&mut input, &mut parser)` – run a `ParserMut` by mutable reference
- `parse_ok(&mut input, &parser)` – run a `Parser` by shared reference
- `parser.test_ok(input)` – ergonomic helper for quick experiments (input doesn't need to be `&mut`)
**Building blocks**:
- Items: `any`, `item(c)`, `one_of("abc")`, `none_of("xyz")`
- Tags: `tag("keyword")`
- Whitespace: `ws`, `ws1`
- Sequencing: `then`, `right`, `left`, `between`
- Choice: `or`, `choice`
- Repetition: `many`, `many1`, `many_map`
- Separated lists: `sep`, `sep1`, `sep_map`, `sep_reduce`
- Lookahead: `lookahead`, `not`
- Control: `cut`, `maybe`, `label`
## Quickstart
### 1) Match a fixed string
```rust
use chasa::prelude::*;
let mut input = "let x";
// tag("let"): match "let"
// right(ws1): match whitespace and discard it
// right(any): match any character and return it
let name = parse_ok_once(&mut input, tag("let").right(ws1).right(any)).unwrap();
assert_eq!(name, 'x');
assert_eq!(input, "");
```
### 2) Repeat and collect
`many()` collects `Option<T>` outputs into any `O: FromIterator<T>`.
```rust
use chasa::prelude::*;
let mut input = "aaab";
let out: String = parse_ok_once(&mut input, item('a').many()).unwrap();
assert_eq!(out, "aaa");
assert_eq!(input, "b"); // 'b' remains (terminating attempt is rolled back)
```
**Important**: The terminating attempt (the final `item('a')` that fails on `'b'`) is rolled back, so `'b'` remains in the input.
### 3) Separated lists
```rust
use chasa::prelude::*;
let mut input = "a,a,";
let comma = item(',').to(());
let out: String = parse_ok_once(&mut input, item('a').sep(comma)).unwrap();
assert_eq!(out, "aa");
assert_eq!(input, ""); // trailing comma is consumed
```
**Note**: Trailing separators are allowed by default (matching common formats like JSON arrays, Rust struct literals). Use `.no_trail()` to forbid them:
```rust
use chasa::prelude::*;
let mut input = "a,a"; // no trailing comma
let comma = item(',').to(());
let out: String = parse_ok_once(&mut input, item('a').sep(comma).no_trail()).unwrap();
assert_eq!(out, "aa");
```
### 4) Try-with-rollback (`maybe`)
`maybe(p)` runs `p` and rolls back on soft failure (failure without `cut`).
```rust
use chasa::prelude::*;
let mut input = "b";
let out = parse_ok_once(&mut input, maybe(item('a'))).unwrap();
assert_eq!(out, None);
assert_eq!(input, "b"); // rolled back because 'a' didn't match
```
### 5) Commit with `cut`
```rust
use chasa::prelude::*;
let mut input = "let 123"; // invalid: expected identifier after "let"
let var_decl = tag("let").cut().right(ws1).right(one_of(ASCII_ALPHA).many1::<String>());
let err = parse_ok_once(&mut input, var_decl).unwrap_err();
// Error message will report failure at position after "let ",
// not at the beginning (because `cut` prevented backtracking)
```
## What is a parser combinator?
A **parser combinator** is a function (or value) that consumes some input and returns either:
- a value (success), or
- a failure (and sometimes an error).
The "combinator" part is that you build bigger parsers by composing smaller ones:
sequencing (`then` / `left` / `right`), repetition (`many`), separation (`sep`), choice (`or`),
and lookahead (`lookahead` / `not`).
In `chasa`, parsers are plain values implementing traits such as `ParserOnce`.
They run on an `In` wrapper, which bundles:
- the underlying input
- an error accumulator (`Merger`)
- the current `cut` flag
## Design details
### `cut` is a branch-pruning signal (and root-level commit trigger)
`cut` is not "input was consumed". It is "do not backtrack across this point".
Additionally, when called in a **root** cut scope, it triggers `Input::commit`,
allowing streaming inputs to drop already-accepted prefixes.
```rust
use chasa::prelude::*;
// Stream the input lazily (no full buffer needed)
let mut input = StreamInput::new("abc".chars());
// Root cut commits the accepted prefix (here: after matching 'a')
let p = item('a').cut().right(item('b'));
let out = parse_ok_once(&mut input, p).unwrap();
assert_eq!(out, 'b');
assert_eq!(input.committed_index(), 1); // 'a' has been dropped from the buffer
```
### Errors are accumulated, and can be rolled back too
`chasa` uses `Merger` to keep the "best" error span (lexicographic `(start, end)`),
storing all errors that occurred at that span.
Backtracking combinators roll back the error accumulator together with the input, so speculative
branches don't pollute successful parses.
**Example**: If `choice((p, q, r))` tries `p` and it fails softly, both the input position **and** any errors from `p` are rolled back before trying `q`.
## Where to look next
- `prelude`: start here for imports
- `parser`: combinators and traits
- `input`: input abstractions and streaming inputs
- `parse`: helpers like `parse_ok_once` / `test_ok`