chasa 0.2.0

A parser combinator focused on rollback/commit, streaming inputs, and composable method chains.
Documentation
# chasa


Parser combinators with explicit rollback control (`cut`) and streaming-friendly inputs.

This crate is a small parser-combinator core used in the YuLang workspace. The design goal is to keep
backtracking behavior predictable:

- `cut`: a branch-pruning signal (and root-level commit trigger for streaming inputs)
- rollback: combinator-driven (not "failure implies rollback")
- errors: accumulated, and rolled back together with input when backtracking

## TL;DR


```rust
use chasa::prelude::*;

// Parse "let x" and extract the identifier
let mut input = "let x";
let name = parse_ok_once(&mut input, tag("let").right(ws1).right(any)).unwrap();
assert_eq!(name, 'x');
```

**Key combinators**:
- `tag("...")` – match an exact string
- `ws1` / `ws` – match whitespace (one-or-more / zero-or-more)
- `any` – match any single character
- `right(q)` – parse both, return right result
- `many()` – repeat zero or more times
- `sep(s)` – parse separated list

Most combinators are available as methods (via `ParserOnce` / `ParserMut` / `Parser`),
and the `prelude` imports those traits so you can write `p.right(q)` / `p.many()` / `p.sep(...)`.

## What makes chasa different?


### 1) Explicit `cut` for branch pruning and error recovery


`cut` is a control signal that prevents backtracking across a commit point. This is useful for:

**Error recovery**: Once you've seen a keyword, the rest of the syntax becomes mandatory.

```rust
use chasa::prelude::*;

let mut input = "let x";
// After seeing "let", we MUST see an identifier (no backtracking to other branches)
let ident = one_of(ASCII_ALPHA).many1::<String>();
let var_decl = tag("let").cut().right(ws1).right(ident);
let result = parse_ok_once(&mut input, var_decl).unwrap();
assert_eq!(result, "x");
```

**Better error messages**: Instead of "expected A or B or C", you get "expected identifier after 'let'".

### 2) Streaming inputs for large files


`StreamInput` allows parsing large files or network streams without loading everything into memory.

```rust,no_run
use chasa::prelude::*;
use std::fs::File;
use std::io::{BufReader, Read};

// Parse a large file lazily (only buffering what's needed)
let file = File::open("large.txt").unwrap();
let mut input = StreamInput::new(BufReader::new(file).bytes().map(|b| b.unwrap() as char));

// After `cut()` commits, already-parsed data is dropped from the buffer
let my_parser = any.many::<String>();
let result = parse_ok_once(&mut input, my_parser).unwrap();
```

### 3) Combinator-driven rollback (not automatic)


In many libraries, "failure" automatically implies rollback. In `chasa`, rollback is a *semantic choice* made by each combinator:

- `maybe(p)` / `p.or_not()` – roll back on soft failure
- `lookahead(p)` – always roll back (peek without consuming)
- `many(p)` – roll back only the *final terminating attempt*

This keeps the control surface small and predictable: you can usually tell what rolls back by looking at the combinator you used.

## Showcase


This README includes two example styles:
- **Combinator style**: chain methods to build parsers declaratively
- **Imperative style**: use `In` methods (`run`, `choice`) for explicit control flow

### Example 1: S-expressions (combinator style)


This example uses **normal Rust functions** as parsers. Functions automatically implement `ParserOnce` / `ParserMut` / `Parser` in this crate, so recursion stays ergonomic.

```rust
use chasa::prelude::*;

#[derive(Debug, PartialEq, Eq)]

enum SExp {
    Atom(String),
    List(Vec<SExp>),
}

fn sexp(mut i: In<&str>) -> Option<SExp> {
    let atom_char = none_of(SPACE.and("()"));
    let atom = atom_char.many1().map(SExp::Atom);
    let list = sexp.sep(ws1).map(SExp::List).between(ws, ws).between(item('('), item(')'));
    i.choice((atom, list))
}

assert_eq!(
    sexp.test_ok("(a (b c) d)"),
    Ok(SExp::List(vec![
        SExp::Atom("a".into()),
        SExp::List(vec![SExp::Atom("b".into()), SExp::Atom("c".into())]),
        SExp::Atom("d".into()),
    ]))
);
```

**Note**: `In` is the input wrapper that bundles the underlying input, error accumulator, and `cut` flag. Parsers receive `In` and return their output.

### Example 2: `key = value` (imperative style)


This example uses `In` methods (`run`, `choice`) for more explicit control flow. This style is useful when combinator chains become too long or when you need conditional branching.

```rust
use chasa::prelude::*;

#[derive(Debug, PartialEq, Eq)]

enum Value {
    Bool(bool),
    Number(i64),
    Str(String),
}

fn kv(mut i: In<&str>) -> Option<(String, Value)> {
    let ident = one_of(ASCII_ALPHA.and("_")).bind(|h| {
        one_of(ASCII_ALPHANUM.and("_"))
            .many_map(move |it| std::iter::once(h).chain(it).collect::<String>())
    });

    let eq = item('=').between(ws, ws);

    let digit = one_of(ASCII_DIGIT);
    let number = item('-').or_not().then(digit.many1::<String>()).map(|(sign, s)| {
        let n = s.parse::<i64>().unwrap();
        Value::Number(if sign.is_some() { -n } else { n })
    });

    let str_body = none_of("\"\\").many::<String>();
    let string = str_body.between(item('\"'), item('\"')).map(Value::Str);

    let key = i.run(ident)?;
    i.run(eq)?;
    let value = i.choice((
        tag("true").to(Value::Bool(true)),
        tag("false").to(Value::Bool(false)),
        number,
        string,
    ))?;
    Some((key, value))
}

assert_eq!(kv.test_ok("port = 8080"), Ok(("port".into(), Value::Number(8080))));
assert_eq!(kv.test_ok("name = \"alice\""), Ok(("name".into(), Value::Str("alice".into()))));
```

## Quick API tour


**Entry points**:
- `parse_ok_once(&mut input, parser)` – run a `ParserOnce` and return `Result<T, Error>`
- `parse_ok_mut(&mut input, &mut parser)` – run a `ParserMut` by mutable reference
- `parse_ok(&mut input, &parser)` – run a `Parser` by shared reference
- `parser.test_ok(input)` – ergonomic helper for quick experiments (input doesn't need to be `&mut`)

**Building blocks**:
- Items: `any`, `item(c)`, `one_of("abc")`, `none_of("xyz")`
- Tags: `tag("keyword")`
- Whitespace: `ws`, `ws1`
- Sequencing: `then`, `right`, `left`, `between`
- Choice: `or`, `choice`
- Repetition: `many`, `many1`, `many_map`
- Separated lists: `sep`, `sep1`, `sep_map`, `sep_reduce`
- Lookahead: `lookahead`, `not`
- Control: `cut`, `maybe`, `label`

## Quickstart


### 1) Match a fixed string


```rust
use chasa::prelude::*;

let mut input = "let x";
// tag("let"): match "let"
// right(ws1): match whitespace and discard it
// right(any): match any character and return it
let name = parse_ok_once(&mut input, tag("let").right(ws1).right(any)).unwrap();
assert_eq!(name, 'x');
assert_eq!(input, "");
```

### 2) Repeat and collect


`many()` collects `Option<T>` outputs into any `O: FromIterator<T>`.

```rust
use chasa::prelude::*;

let mut input = "aaab";
let out: String = parse_ok_once(&mut input, item('a').many()).unwrap();
assert_eq!(out, "aaa");
assert_eq!(input, "b"); // 'b' remains (terminating attempt is rolled back)
```

**Important**: The terminating attempt (the final `item('a')` that fails on `'b'`) is rolled back, so `'b'` remains in the input.

### 3) Separated lists


```rust
use chasa::prelude::*;

let mut input = "a,a,";
let comma = item(',').to(());
let out: String = parse_ok_once(&mut input, item('a').sep(comma)).unwrap();
assert_eq!(out, "aa");
assert_eq!(input, ""); // trailing comma is consumed
```

**Note**: Trailing separators are allowed by default (matching common formats like JSON arrays, Rust struct literals). Use `.no_trail()` to forbid them:

```rust
use chasa::prelude::*;

let mut input = "a,a"; // no trailing comma
let comma = item(',').to(());
let out: String = parse_ok_once(&mut input, item('a').sep(comma).no_trail()).unwrap();
assert_eq!(out, "aa");
```

### 4) Try-with-rollback (`maybe`)


`maybe(p)` runs `p` and rolls back on soft failure (failure without `cut`).

```rust
use chasa::prelude::*;

let mut input = "b";
let out = parse_ok_once(&mut input, maybe(item('a'))).unwrap();
assert_eq!(out, None);
assert_eq!(input, "b"); // rolled back because 'a' didn't match
```

### 5) Commit with `cut`


```rust
use chasa::prelude::*;

let mut input = "let 123"; // invalid: expected identifier after "let"
let var_decl = tag("let").cut().right(ws1).right(one_of(ASCII_ALPHA).many1::<String>());
let err = parse_ok_once(&mut input, var_decl).unwrap_err();

// Error message will report failure at position after "let ",
// not at the beginning (because `cut` prevented backtracking)
```

## What is a parser combinator?


A **parser combinator** is a function (or value) that consumes some input and returns either:
- a value (success), or
- a failure (and sometimes an error).

The "combinator" part is that you build bigger parsers by composing smaller ones:
sequencing (`then` / `left` / `right`), repetition (`many`), separation (`sep`), choice (`or`),
and lookahead (`lookahead` / `not`).

In `chasa`, parsers are plain values implementing traits such as `ParserOnce`.
They run on an `In` wrapper, which bundles:
- the underlying input
- an error accumulator (`Merger`)
- the current `cut` flag

## Design details


### `cut` is a branch-pruning signal (and root-level commit trigger)


`cut` is not "input was consumed". It is "do not backtrack across this point".

Additionally, when called in a **root** cut scope, it triggers `Input::commit`,
allowing streaming inputs to drop already-accepted prefixes.

```rust
use chasa::prelude::*;

// Stream the input lazily (no full buffer needed)
let mut input = StreamInput::new("abc".chars());

// Root cut commits the accepted prefix (here: after matching 'a')
let p = item('a').cut().right(item('b'));
let out = parse_ok_once(&mut input, p).unwrap();
assert_eq!(out, 'b');
assert_eq!(input.committed_index(), 1); // 'a' has been dropped from the buffer
```

### Errors are accumulated, and can be rolled back too


`chasa` uses `Merger` to keep the "best" error span (lexicographic `(start, end)`),
storing all errors that occurred at that span.

Backtracking combinators roll back the error accumulator together with the input, so speculative
branches don't pollute successful parses.

**Example**: If `choice((p, q, r))` tries `p` and it fails softly, both the input position **and** any errors from `p` are rolled back before trying `q`.

## Where to look next


- `prelude`: start here for imports
- `parser`: combinators and traits
- `input`: input abstractions and streaming inputs
- `parse`: helpers like `parse_ok_once` / `test_ok`