chasa 0.3.0

A parser combinator library focused on rollback/commit, streaming inputs, and composable method chains.

chasa

Parser combinators with explicit rollback control (cut) and streaming-friendly inputs.

This crate is a small parser-combinator core used in the YuLang workspace. The design goal is to keep backtracking behavior predictable:

  • cut: a branch-pruning signal (and root-level commit trigger for streaming inputs)
  • rollback: combinator-driven (not "failure implies rollback")
  • errors: accumulated, and rolled back together with input when backtracking

TL;DR

use chasa::prelude::*;

// Parse "let x" and extract the identifier
let mut input = "let x";
let name = parse_ok_once(&mut input, tag("let").right(ws1).right(any)).unwrap();
assert_eq!(name, 'x');

Key combinators:

  • tag("...") – match an exact string
  • ws1 / ws – match whitespace (one-or-more / zero-or-more)
  • any – match any single character
  • right(q) – parse both, return right result
  • many() – repeat zero or more times
  • sep(s) – parse separated list

Most combinators are available as methods (via ParserOnce / ParserMut / Parser), and the prelude imports those traits so you can write p.right(q) / p.many() / p.sep(...).
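To make the semantics of tag, ws1, any and right concrete, here is a toy model in plain Rust. It does not use chasa. The `&mut &str -> Option<T>` parser shape and every name below are assumptions of this sketch, not the crate's real types. A parser advances the input slice on success and returns None on failure.

```rust
// Toy model (not chasa's API): a parser is any `Fn(&mut &str) -> Option<T>`.
// On success it returns Some(value) and advances the input slice;
// on failure it returns None.

/// Match an exact string (hypothetical stand-in for `tag`).
fn tag(expected: &'static str) -> impl Fn(&mut &str) -> Option<&'static str> {
    move |input: &mut &str| {
        let s = *input;
        *input = s.strip_prefix(expected)?; // input is untouched if the prefix is missing
        Some(expected)
    }
}

/// Match one or more whitespace characters (stand-in for `ws1`).
fn ws1(input: &mut &str) -> Option<()> {
    let s = *input;
    let rest = s.trim_start();
    if rest.len() == s.len() {
        return None; // no whitespace at all
    }
    *input = rest;
    Some(())
}

/// Match any single character (stand-in for `any`).
fn any(input: &mut &str) -> Option<char> {
    let s = *input;
    let c = s.chars().next()?;
    *input = &s[c.len_utf8()..];
    Some(c)
}

/// Run `p`, then `q`, and keep only `q`'s output (stand-in for `p.right(q)`).
fn right<A, B>(
    p: impl Fn(&mut &str) -> Option<A>,
    q: impl Fn(&mut &str) -> Option<B>,
) -> impl Fn(&mut &str) -> Option<B> {
    move |input: &mut &str| {
        p(input)?;
        q(input)
    }
}
```

With these, `right(right(tag("let"), ws1), any)` applied to `"let x"` yields `Some('x')` and leaves the input empty. Note that this `right` does not restore the input when `q` fails; in this model, as in chasa's design, undoing consumption is left to dedicated combinators.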

What makes chasa different?

1) Explicit cut for branch pruning and error recovery

cut is a control signal that prevents backtracking across a commit point. This is useful for:

Error recovery: Once you've seen a keyword, the rest of the syntax becomes mandatory.

use chasa::prelude::*;

let mut input = "let x";
// After seeing "let", we MUST see an identifier (no backtracking to other branches)
let ident = one_of(ASCII_ALPHA).many1::<String>();
let var_decl = tag("let").cut().right(ws1).right(ident);
let result = parse_ok_once(&mut input, var_decl).unwrap();
assert_eq!(result, "x");

Better error messages: Instead of "expected A or B or C", you get "expected identifier after 'let'".
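The branch-pruning effect can be modeled without chasa at all. The sketch below assumes a hypothetical three-way `Outcome` (success, soft failure, failure after a cut); `choice`, `var_decl` and `number` are illustrative names, not the crate's API. A choice keeps trying branches only on soft failure and merges their expectations, while a failure past the cut point ends the choice with the specific message.

```rust
/// Result of a toy parser in this sketch (hypothetical, not chasa's types).
#[derive(Debug, PartialEq)]
enum Outcome<T> {
    Success(T),
    /// Failed before any cut: an enclosing choice may try the next branch.
    Soft(String),
    /// Failed after a cut: an enclosing choice must not try other branches.
    Hard(String),
}

/// Try each branch from the same starting input (so backtracking is implicit).
/// Stop at the first success or hard failure; if every branch fails softly,
/// merge their expectations into one message.
fn choice<T>(input: &str, branches: &[fn(&str) -> Outcome<T>]) -> Outcome<T> {
    let mut expected = Vec::new();
    for branch in branches {
        match branch(input) {
            Outcome::Soft(what) => expected.push(what), // backtrack, try the next branch
            other => return other,                     // success or cut: prune the rest
        }
    }
    Outcome::Soft(format!("expected {}", expected.join(" or ")))
}

/// `let <ident>`: after "let " the identifier is mandatory (the cut point).
fn var_decl(input: &str) -> Outcome<String> {
    let Some(rest) = input.strip_prefix("let ") else {
        return Outcome::Soft("'let'".to_string());
    };
    // Past the cut: any failure from here on is hard.
    let ident: String = rest.chars().take_while(|c| c.is_ascii_alphabetic()).collect();
    if ident.is_empty() {
        Outcome::Hard("expected identifier after 'let'".to_string())
    } else {
        Outcome::Success(format!("decl {ident}"))
    }
}

/// A bare number expression: one or more ASCII digits.
fn number(input: &str) -> Outcome<String> {
    let digits: String = input.chars().take_while(|c| c.is_ascii_digit()).collect();
    if digits.is_empty() {
        Outcome::Soft("number".to_string())
    } else {
        Outcome::Success(format!("num {digits}"))
    }
}
```

If `var_decl` returned `Soft` instead of `Hard` for a missing identifier, the input `"let 123"` would fall through to `number`, and the caller would only see a merged "expected ... or ..." message.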

2) Streaming inputs for large files

StreamInput allows parsing large files or network streams without loading everything into memory.

use chasa::prelude::*;
use std::fs::File;
use std::io::{BufReader, Read};

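// Parse a large file lazily (only buffering what's needed).
// `b as char` maps each byte to one char, which is only correct for ASCII/Latin-1 text.
let file = File::open("large.txt").unwrap();
let mut input = StreamInput::new(BufReader::new(file).bytes().map(|b| b.unwrap() as char));

// Buffered data is dropped only after a root-level `cut()` commits it.
// This parser has no `cut()` and collects the whole input into a String,
// so in practice you would place `cut()` at natural commit points.
let my_parser = any.many::<String>();
let result = parse_ok_once(&mut input, my_parser).unwrap();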

3) Combinator-driven rollback (not automatic)

In many libraries, "failure" automatically implies rollback. In chasa, rollback is a semantic choice made by each combinator:

  • maybe(p) / p.or_not() – roll back on soft failure
  • lookahead(p) – always roll back (peek without consuming)
  • many(p) – roll back only the final terminating attempt

This keeps the control surface small and predictable: you can usually tell what rolls back by looking at the combinator you used.
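The three rollback rules above can be modeled with plain checkpoints. In this sketch, which is independent of chasa, the input is a `&str` slice, a checkpoint is a copy of that slice, and rolling back writes it back. There is no cut in this model, so every failure counts as soft; `ab`, `maybe`, `lookahead` and `many` here are hypothetical stand-ins.

```rust
// Toy input: a `&str` slice. A checkpoint is a copy of the slice, and rolling
// back means writing it back. (Sketch only; chasa's In/Input types differ.)
// There is no cut in this model, so every failure counts as a soft failure.

/// Consumes 'a', then requires 'b'. On "ac" it fails *after* consuming 'a'.
fn ab(input: &mut &str) -> Option<&'static str> {
    let s = *input;
    let after_a = s.strip_prefix('a')?;
    *input = after_a; // 'a' stays consumed even if 'b' is missing
    *input = after_a.strip_prefix('b')?;
    Some("ab")
}

/// Like `maybe(p)`: roll back on failure and succeed with None.
fn maybe<T>(p: impl Fn(&mut &str) -> Option<T>) -> impl Fn(&mut &str) -> Option<Option<T>> {
    move |input: &mut &str| {
        let checkpoint = *input;
        match p(input) {
            Some(v) => Some(Some(v)),
            None => {
                *input = checkpoint;
                Some(None)
            }
        }
    }
}

/// Like `lookahead(p)`: always roll back, so nothing is consumed.
fn lookahead<T>(p: impl Fn(&mut &str) -> Option<T>) -> impl Fn(&mut &str) -> Option<T> {
    move |input: &mut &str| {
        let checkpoint = *input;
        let out = p(input);
        *input = checkpoint;
        out
    }
}

/// Like `many(p)`: keep successful attempts, roll back only the final one.
/// (No guard here against a `p` that succeeds without consuming input.)
fn many<T>(p: impl Fn(&mut &str) -> Option<T>) -> impl Fn(&mut &str) -> Option<Vec<T>> {
    move |input: &mut &str| {
        let mut out = Vec::new();
        loop {
            let checkpoint = *input;
            match p(input) {
                Some(v) => out.push(v),
                None => {
                    *input = checkpoint;
                    return Some(out);
                }
            }
        }
    }
}
```

On `"ababac"`, `many(ab)` returns two matches and leaves `"ac"`: the third attempt consumed `'a'` before failing, and only that attempt is undone. Calling `ab` directly on `"ac"` leaves `"c"`, because a bare failure rolls nothing back.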

Showcase

This README includes two example styles:

  • Combinator style: chain methods to build parsers declaratively
  • Imperative style: use In methods (run, choice) for explicit control flow

Example 1: S-expressions (combinator style)

This example uses normal Rust functions as parsers. Functions automatically implement ParserOnce / ParserMut / Parser in this crate, so recursion stays ergonomic.

use chasa::prelude::*;

#[derive(Debug, PartialEq, Eq)]
enum SExp {
    Atom(String),
    List(Vec<SExp>),
}

fn sexp(mut i: In<&str>) -> Option<SExp> {
    let atom_char = none_of(SPACE.and("()"));
    let atom = atom_char.many1().map(SExp::Atom);
    let list = sexp.sep(ws1).map(SExp::List).between(ws, ws).between(item('('), item(')'));
    i.choice((atom, list))
}

assert_eq!(
    sexp.test_ok("(a (b c) d)"),
    Ok(SExp::List(vec![
        SExp::Atom("a".into()),
        SExp::List(vec![SExp::Atom("b".into()), SExp::Atom("c".into())]),
        SExp::Atom("d".into()),
    ]))
);

Note: In is the input wrapper that bundles the underlying input, error accumulator, and cut flag. Parsers receive In and return their output wrapped in Option, where None signals failure.

Example 2: key = value (imperative style)

This example uses In methods (run, choice) for more explicit control flow. This style is useful when combinator chains become too long or when you need conditional branching.

use chasa::prelude::*;

#[derive(Debug, PartialEq, Eq)]
enum Value {
    Bool(bool),
    Number(i64),
    Str(String),
}

fn kv(mut i: In<&str>) -> Option<(String, Value)> {
    let ident = one_of(ASCII_ALPHA.and("_")).bind(|h| {
        one_of(ASCII_ALPHANUM.and("_"))
            .many_map(move |it| std::iter::once(h).chain(it).collect::<String>())
    });

    let eq = item('=').between(ws, ws);

    let digit = one_of(ASCII_DIGIT);
    let number = item('-').or_not().then(digit.many1::<String>()).map(|(sign, s)| {
        let n = s.parse::<i64>().unwrap(); // panics if the digits overflow i64
        Value::Number(if sign.is_some() { -n } else { n })
    });

    let str_body = none_of("\"\\").many::<String>();
    let string = str_body.between(item('\"'), item('\"')).map(Value::Str);

    let key = i.run(ident)?;
    i.run(eq)?;
    let value = i.choice((
        tag("true").to(Value::Bool(true)),
        tag("false").to(Value::Bool(false)),
        number,
        string,
    ))?;
    Some((key, value))
}

assert_eq!(kv.test_ok("port = 8080"), Ok(("port".into(), Value::Number(8080))));
assert_eq!(kv.test_ok("name = \"alice\""), Ok(("name".into(), Value::Str("alice".into()))));

Quick API tour

Entry points:

  • parse_ok_once(&mut input, parser) – run a ParserOnce and return Result<T, Error>
  • parse_ok_mut(&mut input, &mut parser) – run a ParserMut by mutable reference
  • parse_ok(&mut input, &parser) – run a Parser by shared reference
  • parser.test_ok(input) – ergonomic helper for quick experiments (input doesn't need to be &mut)

Building blocks:

  • Items: any, item(c), one_of("abc"), none_of("xyz")
  • Tags: tag("keyword")
  • Whitespace: ws, ws1
  • Sequencing: then, right, left, between
  • Choice: or, choice
  • Repetition: many, many1, many_map
  • Separated lists: sep, sep1, sep_map, sep_reduce
  • Lookahead: lookahead, not
  • Control: cut, maybe, label

Quickstart

1) Match a fixed string

use chasa::prelude::*;

let mut input = "let x";
// tag("let"): match "let"
// right(ws1): match whitespace and discard it
// right(any): match any character and return it
let name = parse_ok_once(&mut input, tag("let").right(ws1).right(any)).unwrap();
assert_eq!(name, 'x');
assert_eq!(input, "");

2) Repeat and collect

many() runs the parser repeatedly and collects each successful output T into any O: FromIterator<T> (here, a String).

use chasa::prelude::*;

let mut input = "aaab";
let out: String = parse_ok_once(&mut input, item('a').many()).unwrap();
assert_eq!(out, "aaa");
assert_eq!(input, "b"); // 'b' remains (terminating attempt is rolled back)

Important: The terminating attempt (the final item('a') that fails on 'b') is rolled back, so 'b' remains in the input.
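The FromIterator-based collection can be sketched with `std::iter::from_fn`, which yields values until the closure returns None. This is a toy model, not chasa's implementation; `item` and `many_into` are hypothetical names, and the checkpoint/restore stands in for the rollback of the terminating attempt.

```rust
use std::iter::{self, FromIterator};

/// Match one specific character (toy stand-in for `item(c)`).
fn item(c: char) -> impl Fn(&mut &str) -> Option<char> {
    move |input: &mut &str| {
        let s = *input;
        *input = s.strip_prefix(c)?;
        Some(c)
    }
}

/// Toy `many`: run `p` until it fails and collect the outputs into any
/// `O: FromIterator<T>`. The failing attempt is rolled back to its checkpoint.
fn many_into<T, O: FromIterator<T>>(input: &mut &str, p: impl Fn(&mut &str) -> Option<T>) -> O {
    iter::from_fn(|| {
        let checkpoint = *input;
        let out = p(input);
        if out.is_none() {
            *input = checkpoint; // undo the terminating attempt
        }
        out
    })
    .collect()
}
```

The output type is chosen by the caller: `let s: String = many_into(&mut input, item('a'));` and `let v: Vec<char> = many_into(&mut input, item('a'));` run the same loop.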

3) Separated lists

use chasa::prelude::*;

let mut input = "a,a,";
let comma = item(',').to(());
let out: String = parse_ok_once(&mut input, item('a').sep(comma)).unwrap();
assert_eq!(out, "aa");
assert_eq!(input, ""); // trailing comma is consumed

Note: Trailing separators are allowed by default, matching formats such as Rust struct literals and array expressions (strict JSON, by contrast, rejects them). Use .no_trail() to forbid them:

use chasa::prelude::*;

let mut input = "a,a"; // no trailing comma
let comma = item(',').to(());
let out: String = parse_ok_once(&mut input, item('a').sep(comma).no_trail()).unwrap();
assert_eq!(out, "aa");
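The difference between the default and `.no_trail()` can be sketched as a single flag on a toy `sep`. This is not chasa's implementation, and `letter_a`, `comma` and `sep_into` are hypothetical names. One behavior is an assumption of this sketch: without trailing separators, a final separator is simply left in the input, whereas chasa's `.no_trail()` might report an error instead.

```rust
use std::iter::FromIterator;

/// Match the letter 'a' (toy element parser).
fn letter_a(input: &mut &str) -> Option<char> {
    let s = *input;
    *input = s.strip_prefix('a')?;
    Some('a')
}

/// Match a comma (toy separator parser).
fn comma(input: &mut &str) -> Option<()> {
    let s = *input;
    *input = s.strip_prefix(',')?;
    Some(())
}

/// Toy `sep`: zero or more `p` separated by `sep`, collected into `O`.
/// With `allow_trail`, a separator not followed by an element is still consumed;
/// without it, that separator is given back (an assumption of this sketch).
fn sep_into<T, O: FromIterator<T>>(
    input: &mut &str,
    p: impl Fn(&mut &str) -> Option<T>,
    sep: impl Fn(&mut &str) -> Option<()>,
    allow_trail: bool,
) -> O {
    let mut items = Vec::new();
    let start = *input;
    match p(input) {
        Some(v) => items.push(v),
        None => {
            *input = start; // empty list: consume nothing
            return items.into_iter().collect();
        }
    }
    loop {
        let before_sep = *input;
        if sep(input).is_none() {
            *input = before_sep;
            break;
        }
        let before_item = *input;
        match p(input) {
            Some(v) => items.push(v),
            None if allow_trail => {
                *input = before_item; // keep the trailing separator consumed
                break;
            }
            None => {
                *input = before_sep; // give the separator back as well
                break;
            }
        }
    }
    items.into_iter().collect()
}
```

On `"a,a,"` both modes collect `"aa"`; with `allow_trail` the input ends empty, and without it the final `","` remains for the caller.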

4) Try-with-rollback (maybe)

maybe(p) runs p and rolls back on soft failure (failure without cut).

use chasa::prelude::*;

let mut input = "b";
let out = parse_ok_once(&mut input, maybe(item('a'))).unwrap();
assert_eq!(out, None);
assert_eq!(input, "b"); // rolled back because 'a' didn't match

5) Commit with cut

use chasa::prelude::*;

let mut input = "let 123"; // invalid: expected identifier after "let"
let var_decl = tag("let").cut().right(ws1).right(one_of(ASCII_ALPHA).many1::<String>());
let err = parse_ok_once(&mut input, var_decl).unwrap_err();

// Error message will report failure at position after "let ",
// not at the beginning (because `cut` prevented backtracking)

What is a parser combinator?

A parser combinator is a function (or value) that consumes some input and returns either:

  • a value (success), or
  • a failure (and sometimes an error).

The "combinator" part is that you build bigger parsers by composing smaller ones: sequencing (then / left / right), repetition (many), separation (sep), choice (or), and lookahead (lookahead / not).

In chasa, parsers are plain values implementing traits such as ParserOnce. They run on an In wrapper, which bundles:

  • the underlying input
  • an error accumulator (Merger)
  • the current cut flag

Design details

cut is a branch-pruning signal (and root-level commit trigger)

cut is not "input was consumed". It is "do not backtrack across this point".

Additionally, when called in a root cut scope, it triggers Input::commit, allowing streaming inputs to drop already-accepted prefixes.

use chasa::prelude::*;

// Stream the input lazily (no full buffer needed)
let mut input = StreamInput::new("abc".chars());

// Root cut commits the accepted prefix (here: after matching 'a')
let p = item('a').cut().right(item('b'));
let out = parse_ok_once(&mut input, p).unwrap();
assert_eq!(out, 'b');
assert_eq!(input.committed_index(), 1); // 'a' has been dropped from the buffer
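The idea behind committing can be sketched with a buffer that only holds characters past the last commit. This is not how StreamInput is implemented; the README only documents the commit behavior and `committed_index()`. `Stream`, `next_char`, `checkpoint`, `rollback` and `commit` are hypothetical names used for illustration.

```rust
use std::collections::VecDeque;

/// Toy streaming input (not chasa's StreamInput): characters are pulled
/// lazily from an iterator and buffered only until they are committed.
/// Positions are absolute character indices.
struct Stream<I: Iterator<Item = char>> {
    source: I,
    buffer: VecDeque<char>, // holds positions committed .. committed + buffer.len()
    committed: usize,       // absolute index of buffer[0]
    pos: usize,             // absolute index of the next char to read
}

impl<I: Iterator<Item = char>> Stream<I> {
    fn new(source: I) -> Self {
        Stream { source, buffer: VecDeque::new(), committed: 0, pos: 0 }
    }

    /// Read the next char, pulling from the source only when the buffer is exhausted.
    fn next_char(&mut self) -> Option<char> {
        let offset = self.pos - self.committed;
        if offset == self.buffer.len() {
            let c = self.source.next()?;
            self.buffer.push_back(c);
        }
        let c = self.buffer[offset];
        self.pos += 1;
        Some(c)
    }

    fn checkpoint(&self) -> usize {
        self.pos
    }

    /// Backtrack to a checkpoint; only positions that are still buffered are allowed.
    fn rollback(&mut self, checkpoint: usize) {
        assert!(checkpoint >= self.committed, "cannot roll back past a commit");
        self.pos = checkpoint;
    }

    /// What a root-level cut would trigger: drop everything before `pos`.
    fn commit(&mut self) {
        let consumed = self.pos - self.committed;
        self.buffer.drain(..consumed);
        self.committed = self.pos;
    }
}
```

After reading `'a'` and committing, the buffer is empty and backtracking below index 1 is rejected, while characters read after the commit stay buffered so a later rollback can revisit them.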

Errors are accumulated, and can be rolled back too

chasa uses Merger to track the "best" error span, comparing spans lexicographically by (start, end), and stores all errors that occurred at that span.

Backtracking combinators roll back the error accumulator together with the input, so speculative branches don't pollute successful parses.

Example: If choice((p, q, r)) tries p and it fails softly, both the input position and any errors from p are rolled back before trying q.
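A minimal sketch of such an accumulator follows. It is not chasa's Merger API, and it rests on one loudly flagged assumption: "best" is taken to mean the lexicographically greatest (start, end) pair, i.e. the failure that reached furthest into the input.

```rust
/// Toy error accumulator in the spirit of chasa's Merger (not its real API).
/// It keeps only the errors reported at the best span. Here "best" is ASSUMED
/// to mean the lexicographically greatest (start, end) pair, i.e. the failure
/// that reached furthest into the input.
#[derive(Clone, Debug, Default, PartialEq)]
struct Merger {
    span: Option<(usize, usize)>,
    errors: Vec<String>,
}

impl Merger {
    fn report(&mut self, span: (usize, usize), msg: &str) {
        match self.span {
            // A worse span is ignored.
            Some(best) if span < best => {}
            // The same span: keep this error alongside the others.
            Some(best) if span == best => self.errors.push(msg.to_string()),
            // First error, or a strictly better span: start over.
            _ => {
                self.span = Some(span);
                self.errors = vec![msg.to_string()];
            }
        }
    }
}
```

Rolling the accumulator back then amounts to cloning the Merger before a speculative branch and restoring the clone if that branch fails softly, which mirrors the choice((p, q, r)) behavior described above.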

Where to look next

  • prelude: start here for imports
  • parser: combinators and traits
  • input: input abstractions and streaming inputs
  • parse: helpers like parse_ok_once / test_ok