# camxes-rs

[![Crates.io](https://img.shields.io/crates/v/camxes-rs)](https://crates.io/crates/camxes-rs)
[![Documentation](https://docs.rs/camxes-rs/badge.svg)](https://docs.rs/camxes-rs)
[![License](https://img.shields.io/crates/l/camxes-rs)](LICENSE)

A Parsing Expression Grammar (PEG) parser generator with enhanced error reporting and semantic actions support.

## ⚠️ Version 0.2.0 Breaking Changes

**If you're upgrading from 0.1.x**, please read the [CHANGELOG.md](CHANGELOG.md) for migration instructions. The main changes are:

- `ParseResult` now has 4 fields instead of 3 (an error-position field was added)
- The parse outcome is now at index **3** (`result.3`) instead of index **2**

## Features

- **Zero-Copy Parsing**: Efficient parsing without unnecessary string allocations
- **Enhanced Error Reporting**: Track furthest error position for better diagnostics
- **Semantic Actions**: Build typed ASTs with bottom-up reducers
- **Embedded Lojban Grammar**: Full camxes-style Lojban PEG included
- **Thread-Safe**: Designed for concurrent use
- **Rich Debugging**: Detailed logging via the `log` crate

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
camxes-rs = "0.2.0"
```

## Quick Start

### Basic Usage

```rust
use camxes_rs::peg::grammar::Peg;

fn main() {
    // Define your grammar
    let grammar = r#"
    expression <- term (('+' / '-') term)*
    term <- factor (('*' / '/') factor)*
    factor <- number / '(' expression ')'
    number <- [0-9]+
    "#;

    // Create parser
    let parser = Peg::new("expression", grammar).unwrap();
    
    // Parse input
    let result = parser.parse("2+3*4");
    
    // Access the result (note: index 3 in version 0.2.0)
    match result.3.as_ref() {
        Ok(nodes) => println!("Parse succeeded with {} nodes", nodes.len()),
        Err(err) => println!("Parse failed at position {}", err.position),
    }
}
```

### Using the Embedded Lojban Grammar

```rust
use camxes_rs::peg::grammar::Peg;
use camxes_rs::LOJBAN_GRAMMAR;

fn main() {
    let (start_rule, grammar_text) = LOJBAN_GRAMMAR;
    let parser = Peg::new(start_rule, grammar_text).unwrap();
    
    let result = parser.parse("mi klama le zarci");
    match result.3.as_ref() {
        Ok(_) => println!("Valid Lojban!"),
        Err(err) => println!("Parse error at position {}", err.position),
    }
}
```

### Semantic Actions (Building ASTs)

```rust
use camxes_rs::peg::grammar::Peg;
use camxes_rs::peg::{parse_with_semantics, ReducerTable, SemanticNode};

fn main() {
    let grammar = r#"number <- [0-9]+"#;
    let parser = Peg::new("number", grammar).unwrap();
    
    // Define reducers to build typed values
    let mut reducers = ReducerTable::new();
    reducers.insert("number", |input, span, _children| {
        let text = &input[span.0..span.1];
        let value: i32 = text.parse().unwrap();
        SemanticNode::Int(value)
    });
    
    let result = parse_with_semantics(&parser, "42", &reducers).unwrap();
    println!("Parsed value: {:?}", result);
}
```

## Grammar Syntax

The parser supports standard PEG operators:

| Operator | Description | Example |
|----------|-------------|---------|
| `<-`     | Definition | `rule <- expression` |
| `/`      | Ordered choice | `a / b` |
| `*`      | Zero or more | `[0-9]*` |
| `+`      | One or more | `[a-z]+` |
| `?`      | Optional | `[A-Z]?` |
| `&`      | And-predicate | `&[a-z]` |
| `!`      | Not-predicate | `![0-9]` |
| `()`     | Grouping | `(a / b)` |
| `[]`     | Character class | `[a-zA-Z0-9]` |
| `.`      | Any character | `.` |

## API Reference

### ParseResult Structure (v0.2.0)

```rust
pub struct ParseResult(
    pub u32,                                      // cost
    pub usize,                                    // consumed position
    pub usize,                                    // error position (furthest failure)
    pub Arc<Result<Vec<ParseNode>, ParseError>>, // parse result
);
```

### ParseNode

```rust
pub enum ParseNode {
    Terminal { span: Span },
    NonTerminal {
        name: String,
        span: Span,
        children: Vec<ParseNode>,
    },
}

pub struct Span(pub usize, pub usize);  // (start, end)
```

### Key Functions

- `Peg::new(start_rule, grammar)` - Create a parser from grammar text
- `parser.parse(input)` - Parse input string
- `parse_with_semantics(parser, input, reducers)` - Parse and build AST

## Debugging

Enable debug logging to see detailed parsing information:

```bash
RUST_LOG=camxes_rs=debug cargo run
```

Or in code:

```rust
env_logger::builder()
    .filter_level(log::LevelFilter::Debug)
    .init();
```

## Multi-threaded Usage

For web servers or multi-threaded applications, create one `Peg` instance per thread:

```rust
use std::collections::HashMap;
use std::sync::Arc;
use camxes_rs::peg::grammar::Peg;
use camxes_rs::LOJBAN_GRAMMAR;

// In your server initialization
let grammar_texts: Arc<HashMap<i32, String>> = Arc::new({
    let mut map = HashMap::new();
    map.insert(1, LOJBAN_GRAMMAR.1.to_string());
    map
});

// In each worker thread
let mut parsers = HashMap::new();
for (lang_id, grammar_text) in grammar_texts.iter() {
    match Peg::new("text", grammar_text) {
        Ok(parser) => {
            parsers.insert(*lang_id, parser);
        }
        Err(e) => {
            log::error!("Failed to initialize parser: {}", e);
        }
    }
}
```
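
The same pattern, reduced to the standard library, looks roughly like this: the grammar text is shared through an `Arc`, and each worker builds its own parser from it. `StubParser` is a hypothetical stand-in used only so the sketch runs without the crate; in real code its place would be taken by `Peg::new`.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for a parser that is constructed once per thread.
struct StubParser {
    rule_count: usize,
}

impl StubParser {
    fn new(grammar: &str) -> Self {
        // Only counts rule definitions; a real parser would compile the grammar here.
        StubParser { rule_count: grammar.matches("<-").count() }
    }
}

/// Spawn `workers` threads; each builds its own parser from the shared grammar text.
fn rule_counts_per_worker(grammar: Arc<String>, workers: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning the Arc copies a pointer, not the grammar text.
            let grammar = Arc::clone(&grammar);
            thread::spawn(move || StubParser::new(&grammar).rule_count)
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

Sharing only the text keeps the shared state immutable and leaves each thread with a parser it owns outright.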

## Migration from 0.1.x

See [CHANGELOG.md](CHANGELOG.md) for detailed migration instructions.

**Quick summary:**
- Change `result.2` → `result.3` to access the parse result
- Update tuple destructuring: `ParseResult(cost, pos, result)` → `ParseResult(cost, pos, error_pos, result)`
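
As a sketch of the updated destructuring, the snippet below uses local stand-ins that mirror the shapes documented in this README, so it compiles without the crate. The placeholder `ParseNode`, the single-field `ParseError`, and the `summarize` helper are assumptions made for illustration, not the crate's definitions.

```rust
use std::sync::Arc;

// Stand-ins mirroring the documented shapes; the real types live in camxes-rs.
#[derive(Debug)]
pub struct ParseNode; // placeholder; the real type is an enum

#[derive(Debug)]
pub struct ParseError {
    pub position: usize, // the only field this README relies on
}

pub struct ParseResult(
    pub u32,                                     // cost
    pub usize,                                   // consumed position
    pub usize,                                   // error position (new in 0.2.0)
    pub Arc<Result<Vec<ParseNode>, ParseError>>, // parse result
);

/// 0.2.0-style destructuring: four fields, with the error position in third place.
fn summarize(result: &ParseResult) -> String {
    let ParseResult(cost, pos, error_pos, outcome) = result;
    match outcome.as_ref() {
        Ok(nodes) => format!("ok: {} nodes, consumed {}, cost {}", nodes.len(), pos, cost),
        Err(err) => format!("error at {} (furthest failure {})", err.position, error_pos),
    }
}
```

Code written for 0.1.x that used a three-element pattern will fail to compile against 0.2.0, which makes the places that need updating easy to find.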

## License

MIT

## Contributing

Contributions are welcome! This crate is part of the [tersmu](https://github.com/lojban/tersmu) project.

## Links

- [Documentation](https://docs.rs/camxes-rs)
- [Crates.io](https://crates.io/crates/camxes-rs)
- [Repository](https://github.com/lojban/tersmu)
- [Issue Tracker](https://github.com/lojban/tersmu/issues)