camxes-rs 1.1.1

Lojban PEG parser with semantic analysis - integrated camxes parser and tersmu semantic engine

camxes-rs

A comprehensive Lojban parser combining fast PEG parsing with semantic analysis capabilities.

camxes-rs provides both low-level parsing (via the integrated camxes PEG parser) and high-level semantic analysis (via the tersmu semantic engine). Use it as a standalone parser library or as a complete semantic analyzer.

Features

  • Fast PEG Parser: Zero-copy parsing with span-based tokens
  • Semantic Analysis: Converts Lojban to logical forms and canonical representations
  • Egglog Equality Saturation: Optional egglog feature flag adds e-graph analysis with rewrite rules for logical normalisation and Lojban-specific canonicalisation
  • Prolog Export: Generates SWI-Prolog source code (facts, rules, and queries) from Lojban
  • Rich Error Diagnostics: Position tracking and detailed error messages
  • WebAssembly Support: Runs in browsers via WASM
  • Thread-Safe: Create parser instances per thread for concurrent usage
  • Comprehensive Testing: Validated against extensive golden examples

Installation

Add to your Cargo.toml:

[dependencies]
camxes-rs = "1.0"

Quick Start

As a PEG Parser

Use the integrated camxes module for fast, low-level parsing:

use camxes_rs::camxes::peg::grammar::Peg;
use camxes_rs::camxes::LOJBAN_GRAMMAR;

fn main() {
    // Create parser from embedded Lojban grammar
    let (start_rule, grammar_text) = LOJBAN_GRAMMAR;
    let parser = Peg::new(start_rule, grammar_text).expect("Failed to build parser");
    
    // Parse Lojban text
    let input = "mi klama le zarci";
    let result = parser.parse(input);
    
    // result is ParseResult(cost, consumed_pos, error_pos, Result<Vec<ParseNode>, ParseError>)
    match result.3.as_ref() {
        Ok(nodes) => {
            println!("Parse succeeded!");
            for node in nodes {
                println!("{:?}", node);
            }
        }
        Err(err) => {
            println!("Parse failed at position {}: {:?}", err.position, err);
        }
    }
}

With Semantic Analysis

Use the high-level API for logical forms and canonical output:

use camxes_rs::parse_lojban::parse_text;

fn main() {
    env_logger::init();  // Optional: enable logging
    
    let result = parse_text("mi klama le zarci");
    match result {
        Ok((logical, canonical, graph)) => {
            println!("Logical form: {}", logical);
            println!("Canonical: {}", canonical);
            // graph contains semantic graph structure
        }
        Err(e) => eprintln!("Parse error: {}", e),
    }
}

Command-Line Tool

The crate includes a camxes binary for command-line usage:

# Install
cargo install camxes-rs

# Parse a file (one sentence per line)
camxes -L input.jbo

# Parse from stdin
echo "mi klama le zarci" | camxes -L

# Output JSON
echo "mi klama le zarci" | camxes --json -

# Logical form only
echo "mi klama le zarci" | camxes -l -L

# Canonical Lojban only
echo "mi klama le zarci" | camxes -j -L

# Prolog (SWI-Prolog) source output
echo "mi klama le zarci" | camxes -P -L

Logging

Enable debug output with the RUST_LOG environment variable:

# All debug logs
RUST_LOG=debug camxes -L input.jbo

# Only camxes-rs logs
RUST_LOG=camxes_rs=debug camxes -L input.jbo

# Specific module logs
RUST_LOG=camxes_rs::morphology=debug,camxes_rs::parse_lojban=trace camxes -L input.jbo

Prolog Export

camxes-rs can convert Lojban sentences to SWI-Prolog source code: facts, rules, and queries. This gives feature parity with the "Logical English" project, so Lojban can be used as a logic programming front-end.

How It Works

Lojban semantics are represented as propositions (Prop<JboRel, JboTerm, ...>), which are then translated into Prolog clauses:

  • Facts: .i sentences become Prolog facts ending with .
  • Rules: Implications (.ijanai etc.) become rules with :-
  • Queries: Question words (ma) produce ?- query clauses
  • Negation: Logical negation becomes \+
  • Conjunction/Disjunction: Logical AND/OR become , / ;
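
The mapping above can be sketched with a toy proposition type. This is a minimal illustration only: Prop, Clause, rel, goal and clause below are hypothetical stand-ins written for this example, not the crate's Prop<JboRel, JboTerm, ...> or its jbo_prolog functions, and the predicate names are made up.

```rust
// Toy proposition type: a hypothetical stand-in for the crate's semantic Prop.
#[derive(Debug, Clone)]
enum Prop {
    Rel(String, Vec<String>), // predicate name applied to terms
    Not(Box<Prop>),
    And(Box<Prop>, Box<Prop>),
    Or(Box<Prop>, Box<Prop>),
}

// The three clause kinds from the list above.
#[derive(Debug, Clone)]
enum Clause {
    Fact(Prop),
    Rule { head: Prop, body: Prop },
    Query(Prop),
}

// Convenience constructor for a relation applied to string terms.
fn rel(name: &str, args: &[&str]) -> Prop {
    Prop::Rel(name.to_string(), args.iter().map(|a| a.to_string()).collect())
}

// Render a proposition as a Prolog goal.
fn goal(p: &Prop) -> String {
    match p {
        Prop::Rel(name, args) if args.is_empty() => name.clone(),
        Prop::Rel(name, args) => format!("{}({})", name, args.join(", ")),
        Prop::Not(inner) => format!("\\+ {}", goal(inner)),
        Prop::And(a, b) => format!("({}, {})", goal(a), goal(b)),
        Prop::Or(a, b) => format!("({} ; {})", goal(a), goal(b)),
    }
}

// Render a clause: facts end with ".", rules use ":-", queries start with "?-".
fn clause(c: &Clause) -> String {
    match c {
        Clause::Fact(p) => format!("{}.", goal(p)),
        Clause::Rule { head, body } => format!("{} :- {}.", goal(head), goal(body)),
        Clause::Query(p) => format!("?- {}.", goal(p)),
    }
}

fn main() {
    let fact = Clause::Fact(rel("ninmu", &["alis"]));
    let rule = Clause::Rule {
        head: rel("remna", &["X"]),
        body: Prop::Or(Box::new(rel("ninmu", &["X"])), Box::new(rel("nanmu", &["X"]))),
    };
    let query = Clause::Query(Prop::And(
        Box::new(rel("remna", &["X"])),
        Box::new(Prop::Not(Box::new(rel("nanmu", &["X"])))),
    ));
    for c in [fact, rule, query] {
        println!("{}", clause(&c));
    }
}
```

The real exporter works on the full semantic tree; the sketch only shows how each logical connective maps onto Prolog syntax.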

Programmatic Usage

use camxes_rs::eval_show::eval_text_to_prolog;
use camxes_rs::parse_lojban::parse_text;
use camxes_rs::morphology::morph;

let text = morph(".i la .alis. cu ninmu").expect("morphology");
let parsed = parse_text(&format!("{text} %%%END%%%")).expect("parse");
let prolog = eval_text_to_prolog(&parsed);
println!("{prolog}");

Prop-level control is also available via jbo_prolog::prop_to_prolog(), props_to_prolog(), and semantic_results_to_prolog() for constructing clauses from individual propositions.

API Examples

Token Extraction

Extract tokens with text spans:

use camxes_rs::camxes::peg::grammar::Peg;
use camxes_rs::camxes::peg::parsing::{ParseNode, ParseResult};
use camxes_rs::camxes::LOJBAN_GRAMMAR;

#[derive(Debug)]
struct Token {
    name: String,
    text: String,
    start: usize,
    end: usize,
}

impl Token {
    fn from_parse_node(node: &ParseNode, input: &str) -> Self {
        match node {
            ParseNode::Terminal { name, start, end } => Token {
                name: name.clone(),
                text: input[*start..*end].to_string(),
                start: *start,
                end: *end,
            },
            ParseNode::NonTerminal { name, start, end, .. } => Token {
                name: name.clone(),
                text: input[*start..*end].to_string(),
                start: *start,
                end: *end,
            },
        }
    }
}

fn parse_to_tokens(input: &str) -> Result<Vec<Token>, String> {
    let (start_rule, grammar) = LOJBAN_GRAMMAR;
    let parser = Peg::new(start_rule, grammar)
        .map_err(|e| format!("Failed to build parser: {:?}", e))?;
    
    let ParseResult(_, _, _, result) = parser.parse(input);
    
    match result.as_ref() {
        Ok(nodes) => Ok(nodes.iter().map(|n| Token::from_parse_node(n, input)).collect()),
        Err(err) => Err(format!("Parse failed at position {}: {:?}", err.position, err)),
    }
}

Custom Grammar Rules

Parse specific syntactic constructs:

use camxes_rs::camxes::peg::grammar::Peg;
use camxes_rs::camxes::LOJBAN_GRAMMAR;

// Parse only a word (morphology level)
let (_, grammar) = LOJBAN_GRAMMAR;
let word_parser = Peg::new("lojban_word", grammar)?;
let result = word_parser.parse("klama");

// Parse a specific construct
let sumti_parser = Peg::new("sumti", grammar)?;
let result = sumti_parser.parse("le zarci");

Multi-threaded Usage

For web servers or concurrent applications:

use std::collections::HashMap;
use std::sync::Arc;
use camxes_rs::camxes::peg::grammar::Peg;
use camxes_rs::camxes::LOJBAN_GRAMMAR;

// In server initialization
let grammar_texts: Arc<HashMap<i32, String>> = Arc::new({
    let mut map = HashMap::new();
    map.insert(1, LOJBAN_GRAMMAR.1.to_string()); // Language ID 1 = Lojban
    map
});

// In each worker thread
let mut parsers = HashMap::new();
for (lang_id, grammar_text) in grammar_texts.iter() {
    match Peg::new("text", grammar_text) {
        Ok(parser) => {
            parsers.insert(*lang_id, parser);
        }
        Err(e) => {
            log::error!("Failed to initialize parser for language {}: {:?}", lang_id, e);
        }
    }
}

// Use the parser
if let Some(parser) = parsers.get(&1) {
    let result = parser.parse("mi klama");
    // Process result...
}
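
The per-thread pattern can be tried without the crate using only the standard library. In the sketch below, ToyParser and parse_in_threads are hypothetical stand-ins (the toy "parser" only counts grammar rules and input words); the point is the ownership layout: grammar text is shared through an Arc, while each worker builds and owns its own parser instance.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for a parser built from grammar text.
struct ToyParser {
    rule_count: usize,
}

impl ToyParser {
    // "Build" the parser by counting lines that look like PEG rules.
    fn new(grammar: &str) -> Self {
        ToyParser {
            rule_count: grammar.lines().filter(|l| l.contains("<-")).count(),
        }
    }

    // "Parse" by counting whitespace-separated words; fails if the grammar had no rules.
    fn parse(&self, input: &str) -> Result<usize, String> {
        if self.rule_count == 0 {
            return Err("grammar has no rules".to_string());
        }
        Ok(input.split_whitespace().count())
    }
}

// Spawn one worker per input; each worker constructs its own parser
// from the shared, immutable grammar text.
fn parse_in_threads(grammar: &str, inputs: &[&str]) -> Vec<Result<usize, String>> {
    let grammar = Arc::new(grammar.to_string());
    let handles: Vec<_> = inputs
        .iter()
        .map(|input| {
            let grammar = Arc::clone(&grammar);
            let input = input.to_string();
            thread::spawn(move || {
                let parser = ToyParser::new(&grammar);
                parser.parse(&input)
            })
        })
        .collect();
    // Join in spawn order so results line up with the inputs.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}

fn main() {
    let grammar = "text <- word+\nword <- [a-z]+";
    let results = parse_in_threads(grammar, &["mi klama", "mi klama le zarci"]);
    println!("{:?}", results);
}
```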

Egglog Equality-Saturation Analysis

camxes-rs ships an optional analysis mode, enabled via the egglog Cargo feature flag, that routes Lojban text through an egglog e-graph engine after the normal PEG parse + semantic evaluation. The goal is a canonical representation of Lojban meaning via equality saturation — the same sentence expressed in different but logically equivalent ways converges to a single normal form inside the e-graph.

What the mode does

Input text
    │
    ▼  (existing path — always runs)
PEG morphology + parse  →  jbo_syntax::Text
    │
    ▼  (existing path — always runs)
Semantic evaluation     →  Vec<SemanticResult>  (JboProp / JboRel / JboTerm tree)
    │
    ▼  (NEW — only when --egglog / feature = "egglog")
egglog lowering         →  s-expression program text  (egglog_lower::lower_text)
    │
    ▼
egglog EGraph           →  load schema (egglog_schema.egg)
                            assert facts (lowered text)
                            run rules to saturation (egglog_rules.egg, up to 1000 iter)
    │
    ├──▶  extract canonical prop  (smallest-cost representative)
    └──▶  serialize e-graph       →  JSON relation dump  ("egglog_graph" key)

The PEG parser is kept unchanged — it handles Lojban morphology and cmavo disambiguation that would take months to re-encode in Datalog form. The egglog pass runs on the already-evaluated semantic tree, enriching rather than replacing the existing output.
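
The lowering step in the diagram can be pictured as a recursive printer from a semantic tree to s-expression text. The sketch below is a toy version under stated assumptions: the Term and Prop enums are hypothetical, the Named and TNil constructor names are invented for illustration, and only PRel, Brivla, TCons, PNot, PAnd and BoundVar are names that appear elsewhere in this README. It is not the code in egglog_lower.rs.

```rust
// Hypothetical, heavily reduced semantic tree.
enum Term {
    Named(String), // a named referent such as "mi"
    Var(u32),      // a bound variable index
}

enum Prop {
    Rel(String, Vec<Term>), // brivla applied to a term list
    Not(Box<Prop>),
    And(Box<Prop>, Box<Prop>),
}

fn lower_term(t: &Term) -> String {
    match t {
        Term::Named(n) => format!("(Named \"{}\")", n),
        Term::Var(i) => format!("(BoundVar {})", i),
    }
}

// Encode a term list as a cons list: (TCons head (TCons ... (TNil))).
fn lower_terms(ts: &[Term]) -> String {
    match ts.split_first() {
        Some((head, rest)) => format!("(TCons {} {})", lower_term(head), lower_terms(rest)),
        None => "(TNil)".to_string(),
    }
}

fn lower_prop(p: &Prop) -> String {
    match p {
        Prop::Rel(brivla, ts) => format!("(PRel (Brivla \"{}\") {})", brivla, lower_terms(ts)),
        Prop::Not(q) => format!("(PNot {})", lower_prop(q)),
        Prop::And(a, b) => format!("(PAnd {} {})", lower_prop(a), lower_prop(b)),
    }
}

fn main() {
    let klama = Prop::Rel(
        "klama".to_string(),
        vec![Term::Named("mi".to_string()), Term::Var(0)],
    );
    let zarci = Prop::Rel("zarci".to_string(), vec![Term::Var(0)]);
    let both = Prop::And(Box::new(zarci), Box::new(Prop::Not(Box::new(klama))));
    println!("{}", lower_prop(&both));
}
```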

Enabling the feature

# Cargo.toml
[dependencies]
camxes-rs = { version = "1.0", features = ["egglog"] }

Or for the CLI binary:

cargo build --features egglog

CLI usage

# Single file, JSON output with egglog e-graph
cargo run --bin camxes --features egglog -- --json --egglog input.jbo

# Short flag -E is equivalent
echo "mi klama le zarci" > /tmp/t.jbo
cargo run --bin camxes --features egglog -- --json -E /tmp/t.jbo

The --egglog / -E flag is accepted even when the crate is compiled without the egglog feature. In that case a warning is logged and the flag otherwise has no effect, so scripts that always pass --egglog remain forward-compatible.

JSON output format

When --egglog is active, every NDJSON line gains an additional "egglog_graph" key (pretty-printed here for readability):

{
  "input": "mi klama le zarci",
  "logical": "non-veridical: zarci(c0)\nklama(mi,c0)",
  "canonical": "...",
  "graph": { ... },
  "prolog": "...",
  "error": null,
  "egglog_graph": {
    "nodes": {
      "function-0-PRel": {
        "op": "PRel",
        "children": ["function-0-Brivla", "function-0-TCons"],
        "eclass": "JboProp-5",
        "cost": 1.0,
        "subsumed": false
      },
      ...
    },
    "root_eclasses": [],
    "class_data": {
      "JboProp-5": { "type": "JboProp" },
      ...
    }
  }
}

Each node in egglog_graph.nodes is one e-node; nodes in the same eclass are known to be semantically equivalent after saturation. Downstream tools can query the graph for all equivalent representations of any sub-expression.
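
A downstream query for "all equivalent representations" amounts to grouping e-nodes by their eclass field. The sketch below shows that grouping on hand-written data. The ENode struct and group_by_eclass function are hypothetical, and the second JboProp-5 node is made-up sample data; a real tool would first deserialize the JSON shown above.

```rust
use std::collections::BTreeMap;

// Hypothetical mirror of one entry in "egglog_graph.nodes".
struct ENode {
    id: &'static str,
    op: &'static str,
    eclass: &'static str,
}

// Map each e-class name to the ids of the e-nodes it contains.
fn group_by_eclass(nodes: &[ENode]) -> BTreeMap<&str, Vec<&str>> {
    let mut classes: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for n in nodes {
        classes.entry(n.eclass).or_default().push(n.id);
    }
    classes
}

fn main() {
    // Illustrative data only; "function-1-PAnd" is invented for this example.
    let nodes = [
        ENode { id: "function-0-PRel", op: "PRel", eclass: "JboProp-5" },
        ENode { id: "function-1-PAnd", op: "PAnd", eclass: "JboProp-5" },
        ENode { id: "function-0-Brivla", op: "Brivla", eclass: "JboRel-2" },
    ];
    for n in &nodes {
        println!("{} has op {}", n.id, n.op);
    }
    for (eclass, members) in group_by_eclass(&nodes) {
        println!("{}: {:?}", eclass, members);
    }
}
```

Two node ids listed under the same e-class are alternative spellings of the same meaning after saturation.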

Programmatic usage

#[cfg(feature = "egglog")]
{
    use camxes_rs::egglog_extract::run_egglog_analysis;
    use camxes_rs::jbo_parse::eval_text;
    use camxes_rs::jbo_prop::Texticule;
    use camxes_rs::parse_lojban::parse_text;
    use camxes_rs::morphology::morph;

    let raw   = morph("mi klama le zarci").expect("morph");
    let src   = format!("{raw} %%%END%%%");
    let tree  = parse_text(&src).expect("parse");
    let texs: Vec<Texticule> = eval_text(&tree)
        .into_iter()
        .flat_map(|r| {
            let mut v = r.side_texticules;
            v.push(Texticule::TexticuleProp(r.prop));
            v
        })
        .collect();

    let result = run_egglog_analysis(0, &texs).expect("egglog");
    println!("canonical prop : {}", result.canonical_prop);
    println!("e-graph JSON   : {}", result.graph_json);
}

Architecture: source files

File | Role
----- | -----
src/egglog_schema.egg | Sort declarations + constructors for every Lojban semantic type (JboProp, JboRel, JboTerm, JboTag, JboMex, …)
src/egglog_rules.egg | Equality-saturation rewrite rules (see below)
src/egglog_lower.rs | Walks JboProp/JboRel/JboTerm trees and emits egglog s-expression text
src/egglog_extract.rs | Orchestrates schema load → fact assertion → rule run → extraction + JSON serialisation

Rewrite rules

Rules live in src/egglog_rules.egg and are organised in four categories:

3a — Logical simplification

Rule | Effect
----- | -----
(PNot (PNot p)) → p | Double-negation elimination
(PAnd p (Eet)) → p | Eet (empty/true) is identity for conjunction
(PAnd p p) → p | Idempotency
(PAnd p q) ↔ (PAnd q p) | Commutativity (birewrite)
(PAnd (PAnd p q) r) ↔ (PAnd p (PAnd q r)) | Associativity (birewrite)
De Morgan, implication elimination, equivalence expansion | Standard propositional rewrites
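
To make the effect of the first three rules concrete, here is a small bottom-up simplifier over a toy proposition type. It is plain one-directional term rewriting, not equality saturation: an e-graph keeps every equivalent form and extracts the cheapest one, whereas this sketch commits to each rewrite immediately. The Prop enum and simplify function are hypothetical and are not the crate's types.

```rust
// Toy proposition type; Eet plays the role of the empty/true proposition.
#[derive(Debug, Clone, PartialEq)]
enum Prop {
    Eet,
    Atom(String),
    Not(Box<Prop>),
    And(Box<Prop>, Box<Prop>),
}

// Simplify children first, then apply the rules at the current node.
fn simplify(p: Prop) -> Prop {
    match p {
        // (PNot (PNot q)) → q
        Prop::Not(inner) => match simplify(*inner) {
            Prop::Not(q) => *q,
            other => Prop::Not(Box::new(other)),
        },
        Prop::And(a, b) => {
            let (a, b) = (simplify(*a), simplify(*b));
            if b == Prop::Eet {
                a // (PAnd p (Eet)) → p
            } else if a == Prop::Eet {
                b // same rule, reached through commutativity
            } else if a == b {
                a // (PAnd p p) → p
            } else {
                Prop::And(Box::new(a), Box::new(b))
            }
        }
        other => other,
    }
}

fn main() {
    let klama = Prop::Atom("klama".to_string());
    let noisy = Prop::And(
        Box::new(Prop::Not(Box::new(Prop::Not(Box::new(klama.clone()))))),
        Box::new(Prop::And(Box::new(klama), Box::new(Prop::Eet))),
    );
    println!("{:?}", simplify(noisy));
}
```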

3b — Lojban-specific

Rule | Effect
----- | -----
PermutedRel(1, PermutedRel(1, r)) → r | se se cancels (double place-swap)
ScalarNegRel("nai", Brivla b) → PNot(PRel(Brivla b, …)) | nai scalar negation
ScalarNegRel("to'e", ScalarNegRel("to'e", r)) → r | to'e to'e cancels
Tanru(Brivla a, Brivla a) → Brivla a | Tanru of identical brivla collapses
PModal(tag, PModal(tag, p)) → PModal(tag, p) | Nested identical modal tags absorb

3c — Quantifier normalisation

Rule | Effect
----- | -----
PQuant q _ (PAnd (Eet) body) _ → PQuant q _ body _ | Strip vacuous restrictions
PNot(PQuant(Exists, …body…)) → PQuant(Forall, …PNot(body)…) | ¬∃ ↔ ∀¬
PNot(PQuant(Forall, …body…)) → PQuant(Exists, …PNot(body)…) | ¬∀ ↔ ∃¬
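
The two quantifier-duality rules can be illustrated on a reduced tree in which a quantifier carries only its body. This is an assumption made for brevity: the real PQuant has more fields, and the Quant and Prop enums and push_negation below are hypothetical. The function applies the duality rules top-down and does not combine them with double-negation elimination.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Quant {
    Exists,
    Forall,
}

// Reduced proposition type: a quantifier holds just its body.
#[derive(Debug, Clone, PartialEq)]
enum Prop {
    Atom(String),
    Not(Box<Prop>),
    Quant(Quant, Box<Prop>),
}

// Move a negation that sits directly on a quantifier inside it,
// flipping Exists and Forall on the way.
fn push_negation(p: Prop) -> Prop {
    match p {
        Prop::Not(inner) => match *inner {
            Prop::Quant(q, body) => {
                let dual = match q {
                    Quant::Exists => Quant::Forall,
                    Quant::Forall => Quant::Exists,
                };
                Prop::Quant(dual, Box::new(push_negation(Prop::Not(body))))
            }
            other => Prop::Not(Box::new(push_negation(other))),
        },
        Prop::Quant(q, body) => Prop::Quant(q, Box::new(push_negation(*body))),
        atom => atom,
    }
}

fn main() {
    let body = Prop::Atom("klama".to_string());
    let not_exists = Prop::Not(Box::new(Prop::Quant(Quant::Exists, Box::new(body))));
    println!("{:?}", push_negation(not_exists));
}
```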

3d — Anaphora hints

Lojban anaphora (ri, ra) are resolved on the Rust side before lowering; the e-graph inherits the resolved bindings because equal terms are merged into the same e-class automatically.

Schema design

All Lojban semantic sorts are mutually recursive (JboProp references JboRel which references JboTerm which references TexList which references Texticule which references JboProp, etc.). The schema uses the egglog 2.0 two-step pattern:

; Step 1 — declare sort names first
(sort JboTerm)
(sort JboRel)
(sort JboProp)
...

; Step 2 — add constructors referencing any already-declared sort
(constructor Brivla (String) JboRel)
(constructor PRel   (JboRel TermList) JboProp)
...

This two-step pattern handles mutual recursion in egglog 2.0; a single (datatype …) block only works for self-recursive types.

Limitations and known approximations

  • Higher-order closures (JboVPred, JboNPred, JboPred) cannot be lowered — they are represented by opaque placeholder strings such as "<vpred>" or "<abspred:ka>".
  • Quantified propositions (Prop::Quantified) lose their restriction/body closures; the quantifier structure is preserved but the body is approximated as (Eet). A fuller encoding would pre-apply the closure at a fresh BoundVar index before lowering.
  • AppliedRel (internal pre-filled relation) is collapsed to its base relation.
  • The canonical prop extraction currently returns (Eet) when the e-graph contains no TexticuleProp at position 0 — this occurs for texts that produce only fragment terms.
  • Pure egglog CYK parsing (replacing the PEG layer) is a planned stretch goal; see the plan document at .windsurf/plans/lojban-egglog-mode-5b3525.md for the encoding strategy.

API Compatibility Note

Important: The embedded camxes module differs from the standalone camxes-rs 0.1.x in the layout of ParseResult:

  • 0.1.x: ParseResult(cost, position, result) - result at index 2
  • 1.0.0+: ParseResult(cost, position, error_position, result) - result at index 3

The 1.0.0+ version adds an explicit error position field for better error diagnostics. When migrating from 0.1.x, change result.2 to result.3 to access the parse result.
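
Destructuring by pattern instead of indexing makes the field layout explicit at every use site. The sketch below uses two stand-in tuple structs that mirror the layouts described above. They are not the crate's real types: the field types (u32, usize, string payloads) and the describe_v0 and describe_v1 helpers are assumptions made for illustration.

```rust
// Stand-ins mirroring the two layouts; node and error payloads are simplified to strings.
struct ParseResultV0(u32, usize, Result<Vec<String>, String>);
struct ParseResultV1(u32, usize, usize, Result<Vec<String>, String>);

// 0.1.x layout: the result is the third field (index 2).
fn describe_v0(r: &ParseResultV0) -> String {
    let ParseResultV0(cost, position, result) = r;
    format!("cost={} pos={} ok={}", cost, position, result.is_ok())
}

// 1.0.0+ layout: an error position is inserted before the result (index 3).
fn describe_v1(r: &ParseResultV1) -> String {
    let ParseResultV1(cost, position, error_position, result) = r;
    format!(
        "cost={} pos={} err_pos={} ok={}",
        cost,
        position,
        error_position,
        result.is_ok()
    )
}

fn main() {
    let old = ParseResultV0(3, 17, Ok(vec!["text".to_string()]));
    let new = ParseResultV1(3, 9, 9, Err("unexpected input".to_string()));
    println!("{}", describe_v0(&old));
    println!("{}", describe_v1(&new));
}
```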

WebAssembly

Build for WASM:

cargo build --release --target wasm32-unknown-unknown --lib
wasm-bindgen \
  --target web \
  --out-dir ./pkg \
  --out-name camxes \
  target/wasm32-unknown-unknown/release/camxes_rs.wasm

See the web-app directory for a complete browser example.

Development

Build

cargo build --release

Test

# Run all tests (default features)
cargo test

# Run focused camxes tests
cargo test --test camxes_lojban_grammar
cargo test --test camxes_semantic_actions

# Run egglog integration tests (requires egglog feature)
cargo test --features egglog --test camxes_egglog

# Run all tests with egglog feature
cargo test --features egglog

# Build examples
cargo build --examples

# Run benchmarks
cargo bench --bench camxes_parser

Project Structure

rust/
  Cargo.toml
  src/
    lib.rs                  # Crate root with documentation
    main.rs                 # CLI entry point
    camxes/                 # Integrated PEG parser
      grammar/lojban.peg    # Embedded Lojban grammar
      peg/                  # PEG parser engine
    parse_lojban.rs         # High-level semantic API
    morphology.rs           # Morphology validation
    jbo_*.rs                # Semantic analysis modules
    egglog_schema.egg       # Egglog sort + constructor declarations  [feature = egglog]
    egglog_rules.egg        # Equality-saturation rewrite rules       [feature = egglog]
    egglog_lower.rs         # JboProp/JboRel/JboTerm → egglog text   [feature = egglog]
    egglog_extract.rs       # Run engine, extract + serialise         [feature = egglog]
  examples/                 # Usage examples
  tests/
    camxes_egglog.rs        # Egglog integration tests                [feature = egglog]
    ...                     # Other integration tests
  benches/                  # Performance benchmarks
  web-app/                  # WASM browser application

Modules

  • camxes: PEG parser with embedded Lojban grammar
  • parse_lojban: High-level semantic parsing API
  • morphology: Lojban morphology validation
  • jbo_tree, jbo_syntax, jbo_prop: Semantic tree structures
  • jbo_show: Output formatting (logical forms, canonical Lojban)
  • jbo_prolog: Prolog source code generation (SWI-Prolog compatible)
  • jbo_parse: Parse tree to semantic tree conversion
  • run: CLI orchestration and JSON output
  • egglog_lower (feature = egglog): Lower semantic tree to egglog facts
  • egglog_extract (feature = egglog): Run equality saturation and extract results

License

GPL-3.0 - See LICENSE for details.

Acknowledgments

This crate combines:

  • camxes: PEG parser originally developed as a standalone crate
  • tersmu: Semantic analysis engine (Rust port of the Haskell implementation)

Both are now integrated into a single, comprehensive Lojban parsing library.