camxes-rs
A comprehensive Lojban parser combining fast PEG parsing with semantic analysis capabilities.
camxes-rs provides both low-level parsing (via the integrated camxes PEG parser) and high-level semantic analysis (via the tersmu semantic engine). Use it as a standalone parser library or as a complete semantic analyzer.
Features
- Fast PEG Parser: Zero-copy parsing with span-based tokens
- Semantic Analysis: Converts Lojban to logical forms and canonical representations
- Egglog Equality Saturation: Optional egglog feature flag adds e-graph analysis with rewrite rules for logical normalisation and Lojban-specific canonicalisation
- Prolog Export: Generates SWI-Prolog source code (facts, rules, and queries) from Lojban
- Rich Error Diagnostics: Position tracking and detailed error messages
- WebAssembly Support: Runs in browsers via WASM
- Thread-Safe: Create parser instances per thread for concurrent usage
- Comprehensive Testing: Validated against extensive golden examples
Installation
Add to your Cargo.toml:
[dependencies]
camxes-rs = "1.0"
Quick Start
As a PEG Parser
Use the integrated camxes module for fast, low-level parsing:
use Peg;
use LOJBAN_GRAMMAR;
With Semantic Analysis
Use the high-level API for logical forms and canonical output:
use parse_text;
Command-Line Tool
The crate includes a camxes binary for command-line usage:
# Install
# Parse a file (one sentence per line)
# Parse from stdin
|
# Output JSON
|
# Logical form only
|
# Canonical Lojban only
|
# Prolog (SWI-Prolog) source output
|
Logging
Enable debug output with the RUST_LOG environment variable:
# All debug logs
RUST_LOG=debug
# Only camxes-rs logs
RUST_LOG=camxes_rs=debug
# Specific module logs
RUST_LOG=camxes_rs::morphology=debug,camxes_rs::parse_lojban=trace
Prolog Export
camxes-rs can convert Lojban sentences to SWI-Prolog source code: facts, rules, and queries. This gives Lojban feature parity with the Logical English project, so Lojban can serve as a logic programming front-end.
How It Works
Lojban semantics are represented as propositions (Prop<JboRel, JboTerm, ...>), which are then
translated into Prolog clauses:
- Facts: .i sentences become Prolog facts ending with .
- Rules: Implications (.ijanai etc.) become rules with :-
- Queries: Question words (ma) produce ?- query clauses
- Negation: Logical negation becomes \+
- Conjunction/Disjunction: Logical AND/OR become , / ;
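The mapping above can be sketched with a toy proposition type. This is a minimal illustration, not the crate's implementation: Prop, Clause, goal, and clause are hypothetical names, and the real Prop<JboRel, JboTerm, ...> type and the exact text camxes-rs emits may differ.

```rust
// Hypothetical miniature of the proposition-to-Prolog mapping.
// None of these names come from camxes-rs; they only show the clause shapes.

/// Toy proposition (stand-in for the crate's real proposition type).
#[derive(Debug, Clone)]
enum Prop {
    /// A relation applied to arguments, e.g. klama(mi).
    Rel(String, Vec<String>),
    Not(Box<Prop>),
    And(Box<Prop>, Box<Prop>),
    Or(Box<Prop>, Box<Prop>),
}

/// The three clause kinds: facts, rules, and queries.
enum Clause {
    Fact(Prop),
    Rule { head: Prop, body: Prop },
    Query(Prop),
}

/// Render a proposition as a Prolog goal.
fn goal(p: &Prop) -> String {
    match p {
        Prop::Rel(name, args) if args.is_empty() => name.clone(),
        Prop::Rel(name, args) => format!("{}({})", name, args.join(", ")),
        // Logical negation becomes negation as failure.
        Prop::Not(inner) => format!("\\+ ({})", goal(inner)),
        // AND becomes ',' and OR becomes ';'.
        Prop::And(a, b) => format!("({}, {})", goal(a), goal(b)),
        Prop::Or(a, b) => format!("({} ; {})", goal(a), goal(b)),
    }
}

/// Render a full clause, terminated with a period.
fn clause(c: &Clause) -> String {
    match c {
        Clause::Fact(p) => format!("{}.", goal(p)),
        Clause::Rule { head, body } => format!("{} :- {}.", goal(head), goal(body)),
        Clause::Query(p) => format!("?- {}.", goal(p)),
    }
}
```

With these definitions, a Clause::Fact wrapping Rel("klama", ["mi"]) renders as klama(mi). and the same relation with a variable argument X wrapped in Clause::Query renders as ?- klama(X).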
Programmatic Usage
use eval_text_to_prolog;
use parse_text;
use morph;
let text = morph.expect;
let parsed = parse_text.expect;
let prolog = eval_text_to_prolog;
println!;
Prop-level control is also available via jbo_prolog::prop_to_prolog(), props_to_prolog(), and
semantic_results_to_prolog() for constructing clauses from individual propositions.
References
- lojysamban (Haskell, 2012) — original Lojban-to-Prolog translator that defined the conversion model
- Prolog for Lojbanists — wiki article on Lojban/Prolog correspondence
- Natural-Language-Processing-in-Prolog — top-down and bottom-up Prolog NLP techniques informing the conversion
- Logical English — logical English grammar with Prolog export, motivating feature parity for Lojban
API Examples
Token Extraction
Extract tokens with text spans:
use Peg;
use ;
use LOJBAN_GRAMMAR;
Custom Grammar Rules
Parse specific syntactic constructs:
use Peg;
use LOJBAN_GRAMMAR;
// Parse only a word (morphology level)
let = LOJBAN_GRAMMAR;
let word_parser = new?;
let result = word_parser.parse;
// Parse a specific construct
let sumti_parser = new?;
let result = sumti_parser.parse;
Multi-threaded Usage
For web servers or concurrent applications:
use HashMap;
use Arc;
use Peg;
use LOJBAN_GRAMMAR;
// In server initialization
let grammar_texts: = new;
// In each worker thread
let mut parsers = new;
for in grammar_texts.iter
// Use the parser
if let Some = parsers.get
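The per-thread pattern described above (share the immutable grammar text, build parser instances inside each worker) can be sketched with the standard library alone. ToyParser and parse_concurrently are hypothetical stand-ins for illustration; the crate's Peg type and its constructor signature are not reproduced here.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a compiled parser; not the crate's Peg type.
struct ToyParser {
    start_rule: String,
}

impl ToyParser {
    /// Mirrors a fallible constructor taking a start rule and grammar text.
    fn new(start_rule: &str, grammar_text: &str) -> Result<Self, String> {
        if grammar_text.is_empty() {
            return Err(format!("empty grammar for rule {start_rule}"));
        }
        Ok(ToyParser { start_rule: start_rule.to_string() })
    }

    /// Toy "parse": reports the rule used and the number of words seen.
    fn parse(&self, input: &str) -> String {
        format!("{}:{}", self.start_rule, input.split_whitespace().count())
    }
}

/// Shares grammar text across threads via Arc; each worker builds its own parsers.
fn parse_concurrently(
    grammar_texts: Arc<HashMap<String, String>>,
    inputs: Vec<String>,
) -> Vec<String> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            let grammar_texts = Arc::clone(&grammar_texts);
            thread::spawn(move || {
                // Parser instances live only inside this worker thread.
                let mut parsers = HashMap::new();
                for (rule, text) in grammar_texts.iter() {
                    if let Ok(parser) = ToyParser::new(rule, text) {
                        parsers.insert(rule.clone(), parser);
                    }
                }
                parsers
                    .get("text")
                    .map(|p| p.parse(&input))
                    .unwrap_or_default()
            })
        })
        .collect();
    // Join in spawn order so results line up with the inputs.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

Only the grammar strings cross thread boundaries, so the parser type itself never needs to be Send or Sync in this arrangement.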
Egglog Equality-Saturation Analysis
camxes-rs ships an optional analysis mode, enabled via the egglog Cargo feature flag,
that routes Lojban text through an egglog e-graph engine
after the normal PEG parse + semantic evaluation. The goal is a canonical representation of
Lojban meaning via equality saturation — the same sentence expressed in different but logically
equivalent ways converges to a single normal form inside the e-graph.
What the mode does
Input text
│
▼ (existing path — always runs)
PEG morphology + parse → jbo_syntax::Text
│
▼ (existing path — always runs)
Semantic evaluation → Vec<SemanticResult> (JboProp / JboRel / JboTerm tree)
│
▼ (NEW — only when --egglog / feature = "egglog")
egglog lowering → s-expression program text (egglog_lower::lower_text)
│
▼
egglog EGraph → load schema (egglog_schema.egg)
assert facts (lowered text)
run rules to saturation (egglog_rules.egg, up to 1000 iter)
│
├──▶ extract canonical prop (smallest-cost representative)
└──▶ serialize e-graph → JSON relation dump ("egglog_graph" key)
The PEG parser is kept unchanged — it handles Lojban morphology and cmavo disambiguation that would take months to re-encode in Datalog form. The egglog pass runs on the already-evaluated semantic tree, enriching rather than replacing the existing output.
Enabling the feature
# Cargo.toml
[dependencies]
camxes-rs = { version = "1.0", features = ["egglog"] }
Or for the CLI binary:
CLI usage
# Single file, JSON output with egglog e-graph
# Short flag -E is equivalent
The --egglog / -E flag is accepted even when the crate is compiled without the egglog
feature. In that case a warning is logged and the flag is otherwise ignored, so scripts that
always pass --egglog keep working on builds without the feature.
JSON output format
When --egglog is active every NDJSON line gains an additional "egglog_graph" key:
Each node in egglog_graph.nodes is one e-node; nodes in the same eclass are known to be
semantically equivalent after saturation. Downstream tools can query the graph for all
equivalent representations of any sub-expression.
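As a rough sketch of such a downstream query, the snippet below groups e-nodes by e-class once they have been loaded into memory. The ENode struct and its field names (eclass, op) are assumptions made for illustration; check them against the actual JSON emitted under the "egglog_graph" key before relying on them.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory view of one entry in egglog_graph.nodes.
/// The field names are assumptions, not the crate's documented schema.
#[derive(Debug, Clone, PartialEq)]
struct ENode {
    eclass: String,
    op: String,
}

/// Group node operators by e-class: every operator listed under the same key
/// is a representation the e-graph considers equivalent.
fn group_by_eclass(nodes: &[ENode]) -> BTreeMap<String, Vec<String>> {
    let mut classes: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for node in nodes {
        classes
            .entry(node.eclass.clone())
            .or_default()
            .push(node.op.clone());
    }
    classes
}
```

A BTreeMap keeps the e-classes in a stable order, which makes the grouped output easier to diff between runs.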
Programmatic usage
Architecture: source files
| File | Role |
|---|---|
| src/egglog_schema.egg | Sort declarations + constructors for every Lojban semantic type (JboProp, JboRel, JboTerm, JboTag, JboMex, …) |
| src/egglog_rules.egg | Equality-saturation rewrite rules (see below) |
| src/egglog_lower.rs | Walks JboProp/JboRel/JboTerm trees and emits egglog s-expression text |
| src/egglog_extract.rs | Orchestrates schema load → fact assertion → rule run → extraction + JSON serialisation |
Rewrite rules
Rules live in src/egglog_rules.egg and are organised in four categories:
3a — Logical simplification
| Rule | Effect |
|---|---|
| (PNot (PNot p)) → p | Double-negation elimination |
| (PAnd p (Eet)) → p | Eet (empty/true) is identity for conjunction |
| (PAnd p p) → p | Idempotency |
| (PAnd p q) ↔ (PAnd q p) | Commutativity (birewrite) |
| (PAnd (PAnd p q) r) ↔ (PAnd p (PAnd q r)) | Associativity (birewrite) |
| De Morgan, implication elimination, equivalence expansion | Standard propositional rewrites |
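To make the effect of the logical simplification rules concrete, here is a small directed rewriter over a toy proposition type. It is deliberately not equality saturation: it applies double-negation elimination, the Eet identity, and idempotency in one bottom-up pass, whereas the e-graph keeps all equivalent forms and extracts the cheapest. The P type and simplify function are hypothetical and do not mirror the crate's schema.

```rust
/// Toy proposition type; Eet plays the role of the empty/true proposition.
#[derive(Debug, Clone, PartialEq)]
enum P {
    Eet,
    Atom(String),
    Not(Box<P>),
    And(Box<P>, Box<P>),
}

/// One bottom-up pass applying three rules from the table:
/// double negation, Eet as identity for conjunction, and idempotency.
fn simplify(p: P) -> P {
    match p {
        P::Not(inner) => match simplify(*inner) {
            // (PNot (PNot p)) → p
            P::Not(x) => *x,
            other => P::Not(Box::new(other)),
        },
        P::And(a, b) => {
            let (a, b) = (simplify(*a), simplify(*b));
            if a == P::Eet {
                // Eet on the left: the identity rule combined with commutativity.
                b
            } else if b == P::Eet || a == b {
                // (PAnd p (Eet)) → p and (PAnd p p) → p
                a
            } else {
                P::And(Box::new(a), Box::new(b))
            }
        }
        leaf => leaf,
    }
}
```

A directed pass like this has to commit to a rule order; the bidirectional commutativity and associativity rules are exactly the ones that do not fit this style and motivate using an e-graph instead.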
3b — Lojban-specific
| Rule | Effect |
|---|---|
| PermutedRel(1, PermutedRel(1, r)) → r | se se cancels (double place-swap) |
| ScalarNegRel("nai", Brivla b) → PNot(PRel(Brivla b, …)) | nai scalar negation |
| ScalarNegRel("to'e", ScalarNegRel("to'e", r)) → r | to'e to'e cancels |
| Tanru(Brivla a, Brivla a) → Brivla a | Tanru of identical brivla collapses |
| PModal(tag, PModal(tag, p)) → PModal(tag, p) | Nested identical modal tags absorb |
3c — Quantifier normalisation
| Rule | Effect |
|---|---|
| PQuant q _ (PAnd (Eet) body) _ → PQuant q _ body _ | Strip vacuous restrictions |
| PNot(PQuant(Exists, …body…)) → PQuant(Forall, …PNot(body)…) | ¬∃ ↔ ∀¬ |
| PNot(PQuant(Forall, …body…)) → PQuant(Exists, …PNot(body)…) | ¬∀ ↔ ∃¬ |
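The two quantifier-negation rules can be illustrated with a directed pass that pushes negation inward through quantifiers. This is a sketch under simplifying assumptions: the toy Q type carries only a variable name and a body (no restriction), and Quant, Q, and push_negation are hypothetical names rather than the crate's PQuant encoding.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Quant {
    Exists,
    Forall,
}

/// Toy quantified proposition: quantifier, bound variable name, body.
#[derive(Debug, Clone, PartialEq)]
enum Q {
    Atom(String),
    Not(Box<Q>),
    Quantified(Quant, String, Box<Q>),
}

/// Push negation inward: ¬∃x.body becomes ∀x.¬body, and ¬∀x.body becomes ∃x.¬body.
fn push_negation(p: Q) -> Q {
    match p {
        Q::Not(inner) => match *inner {
            Q::Quantified(q, var, body) => {
                // Swap the quantifier for its dual and keep pushing into the body.
                let dual = match q {
                    Quant::Exists => Quant::Forall,
                    Quant::Forall => Quant::Exists,
                };
                Q::Quantified(dual, var, Box::new(push_negation(Q::Not(body))))
            }
            other => Q::Not(Box::new(push_negation(other))),
        },
        Q::Quantified(q, var, body) => {
            Q::Quantified(q, var, Box::new(push_negation(*body)))
        }
        atom => atom,
    }
}
```

Applied to ¬∃x.∀y.P, the pass yields ∀x.∃y.¬P, flipping each quantifier the negation passes through.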
3d — Anaphora hints
Lojban anaphora (ri, ra) are resolved on the Rust side before lowering; the e-graph inherits the resolved bindings because equal terms are merged into the same e-class automatically.
Schema design
All Lojban semantic sorts are mutually recursive
(JboProp references JboRel which references JboTerm which references TexList which
references Texticule which references JboProp, etc.).
The schema uses the egglog 2.0 two-step pattern:
; Step 1 — declare sort names first
(sort JboTerm)
(sort JboRel)
(sort JboProp)
...
; Step 2 — add constructors referencing any already-declared sort
(constructor Brivla (String) JboRel)
(constructor PRel (JboRel TermList) JboProp)
...
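Declaring every sort before any constructor lets each constructor reference sorts that are
defined elsewhere in the file, which is what mutual recursion requires; a plain (datatype …)
block only works for self-recursive types.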
Limitations and known approximations
- Higher-order closures (JboVPred, JboNPred, JboPred) cannot be lowered; they are represented by opaque placeholder strings such as "<vpred>" or "<abspred:ka>".
- Quantified propositions (Prop::Quantified) lose their restriction/body closures; the quantifier structure is preserved but the body is approximated as (Eet). A fuller encoding would pre-apply the closure at a fresh BoundVar index before lowering.
- AppliedRel (internal pre-filled relation) is collapsed to its base relation.
- The canonical prop extraction currently returns (Eet) when the e-graph contains no TexticuleProp at position 0; this occurs for texts that produce only fragment terms.
- Pure egglog CYK parsing (replacing the PEG layer) is a planned stretch goal; see the plan document at .windsurf/plans/lojban-egglog-mode-5b3525.md for the encoding strategy.
API Compatibility Note
Important: The embedded camxes module differs from the standalone camxes-rs 0.1.x:
- 0.1.x: ParseResult(cost, position, result) - result at index 2
- 1.0.0+: ParseResult(cost, position, error_position, result) - result at index 3
The 1.0.0+ version adds an explicit error position field for better error diagnostics. When migrating from 0.1.x, change result.2 to result.3 to access the parse result.
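The index shift can be pictured with two stand-in tuple structs. The field types below are assumptions chosen for illustration (the real ParseResult fields may use different types); only the field order is taken from the note above.

```rust
/// Hypothetical stand-in for the 0.1.x layout: (cost, position, result).
#[allow(dead_code)]
struct ParseResultV0(u32, usize, Option<String>);

/// Hypothetical stand-in for the 1.0.0+ layout:
/// (cost, position, error_position, result).
#[allow(dead_code)]
struct ParseResultV1(u32, usize, usize, Option<String>);

/// 0.1.x code reads the parse result from index 2.
fn tree_v0(r: &ParseResultV0) -> Option<&str> {
    r.2.as_deref()
}

/// 1.0.0+ code reads the parse result from index 3.
fn tree_v1(r: &ParseResultV1) -> Option<&str> {
    r.3.as_deref()
}

/// Index 2 now holds the error position instead of the result.
fn error_position(r: &ParseResultV1) -> usize {
    r.2
}
```

Wrapping the tuple access in small accessor functions like these confines a future layout change to one place in calling code.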
WebAssembly
Build for WASM:
See the web-app directory for a complete browser example.
Development
Build
Test
# Run all tests (default features)
# Run focused camxes tests
# Run egglog integration tests (requires egglog feature)
# Run all tests with egglog feature
# Build examples
# Run benchmarks
Project Structure
rust/
Cargo.toml
src/
lib.rs # Crate root with documentation
main.rs # CLI entry point
camxes/ # Integrated PEG parser
grammar/lojban.peg # Embedded Lojban grammar
peg/ # PEG parser engine
parse_lojban.rs # High-level semantic API
morphology.rs # Morphology validation
jbo_*.rs # Semantic analysis modules
egglog_schema.egg # Egglog sort + constructor declarations [feature = egglog]
egglog_rules.egg # Equality-saturation rewrite rules [feature = egglog]
egglog_lower.rs # JboProp/JboRel/JboTerm → egglog text [feature = egglog]
egglog_extract.rs # Run engine, extract + serialise [feature = egglog]
examples/ # Usage examples
tests/
camxes_egglog.rs # Egglog integration tests [feature = egglog]
... # Other integration tests
benches/ # Performance benchmarks
web-app/ # WASM browser application
Modules
- camxes: PEG parser with embedded Lojban grammar
- parse_lojban: High-level semantic parsing API
- morphology: Lojban morphology validation
- jbo_tree, jbo_syntax, jbo_prop: Semantic tree structures
- jbo_show: Output formatting (logical forms, canonical Lojban)
- jbo_prolog: Prolog source code generation (SWI-Prolog compatible)
- jbo_parse: Parse tree to semantic tree conversion
- run: CLI orchestration and JSON output
- egglog_lower (feature = egglog): Lower semantic tree to egglog facts
- egglog_extract (feature = egglog): Run equality saturation and extract results
Documentation
- API Documentation - Full API reference
- Repository - Source code and examples
- Lojban.org - Official Lojban website
License
GPL-3.0 - See LICENSE for details.
Acknowledgments
This crate combines:
- camxes: PEG parser originally developed as a standalone crate
- tersmu: Semantic analysis engine (Rust port of the Haskell implementation)
Both are now integrated into a single, comprehensive Lojban parsing library.