arena-terms-parser 0.6.2

# Arena Terms Parser

[![Crates.io](https://img.shields.io/crates/v/arena-terms-parser.svg)](https://crates.io/crates/arena-terms-parser)
[![Documentation](https://docs.rs/arena-terms-parser/badge.svg)](https://docs.rs/arena-terms-parser)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Rust](https://img.shields.io/badge/rust-stable-brightgreen.svg)](https://www.rust-lang.org)

Parser for **arena-backed, Prolog-like terms**.

This crate provides a lexer, parser, and operator handling for Prolog-style
terms. It depends on the [`arena_terms`](https://crates.io/crates/arena_terms)
crate to store terms efficiently in an arena and is built on top of the
[`parlex`](https://crates.io/crates/parlex) core library.


## Features

- **Lexer**
  Tokenizes atoms, variables, numbers, strings, dates, and symbols.

- **Parser**
  An SLR(1) parser (generated by [`parlex-gen`](https://crates.io/crates/parlex-gen)) that
  produces `arena_terms::Term` values.

- **Operators**
  Dynamically handles operator fixity, associativity, and precedence rules.

- **Multi-encoding**
  Supports all WHATWG encodings via `encoding_rs`: UTF-8, ASCII, ISO-8859-1 through 16,
  Windows-1250 through 1258, KOI8-R/U, Shift_JIS, EUC-JP, GBK, GB18030, Big5, EUC-KR,
  UTF-16, and more. All internal term representation is UTF-8; input bytes are transcoded
  automatically. Binary content (`bin{...}`) always collects raw source bytes.

- **Arena-backed**
  Terms are stored compactly in arenas for efficient allocation and traversal.
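
To make the arena-backed idea concrete, here is a minimal sketch of index-based term storage. It is an illustration only and does not reflect the actual layout of `arena_terms`; the names `ToyArena`, `TermId`, and `Node` are hypothetical.

```rust
/// Handle to a term stored in the arena: a plain index, cheap to copy.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TermId(u32);

/// One arena slot. Compound arguments live in a shared side vector,
/// so a compound node only records where its argument run starts.
#[derive(Debug)]
enum Node {
    Atom(String),
    Int(i64),
    Compound { functor: String, args_start: u32, args_len: u32 },
}

/// Hypothetical toy arena: all terms live in two contiguous vectors.
#[derive(Default)]
pub struct ToyArena {
    nodes: Vec<Node>,
    args: Vec<TermId>,
}

impl ToyArena {
    fn push(&mut self, node: Node) -> TermId {
        let id = TermId(self.nodes.len() as u32);
        self.nodes.push(node);
        id
    }

    pub fn atom(&mut self, name: &str) -> TermId {
        self.push(Node::Atom(name.to_string()))
    }

    pub fn int(&mut self, value: i64) -> TermId {
        self.push(Node::Int(value))
    }

    pub fn compound(&mut self, functor: &str, args: &[TermId]) -> TermId {
        let args_start = self.args.len() as u32;
        self.args.extend_from_slice(args);
        self.push(Node::Compound {
            functor: functor.to_string(),
            args_start,
            args_len: args.len() as u32,
        })
    }

    /// Render a term by following indices instead of pointers.
    pub fn render(&self, id: TermId) -> String {
        match &self.nodes[id.0 as usize] {
            Node::Atom(name) => name.clone(),
            Node::Int(value) => value.to_string(),
            Node::Compound { functor, args_start, args_len } => {
                let start = *args_start as usize;
                let end = start + *args_len as usize;
                let parts: Vec<String> =
                    self.args[start..end].iter().map(|&a| self.render(a)).collect();
                format!("{}({})", functor, parts.join(", "))
            }
        }
    }
}
```

Building `likes(mary, pizza)` in this toy arena takes three `push` calls and no per-term heap node; traversal is a matter of indexing into the two vectors.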


## Usage

Parsing a string into arena terms:

```rust
use arena_terms::Arena;
use arena_terms_parser::{Encoding, TermParser, define_opers};
use try_next::{IterInput, TryNextWithContext};

const DEFS: &str = "[
    op('+'(x,y), infix, 380, left),
    op('*'(x,y), infix, 400, left),
]";

const TERMS: &str = "
    likes(mary, pizza).
    2 + 2 * 3 = 8 .
";

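fn main() {
    // Create an arena with the default operator table.
    let mut arena = Arena::try_with_default_opers().unwrap();
    // Register the additional operator definitions from DEFS.
    define_opers(&mut arena, IterInput::from(DEFS.bytes()), Encoding::Utf8).unwrap();
    // Parse TERMS as UTF-8 input.
    let mut parser = TermParser::try_new(IterInput::from(TERMS.bytes()), Encoding::Utf8).unwrap();

    // Pull terms one at a time and print each one.
    while let Some(term) = parser.try_next_with_context(&mut arena).unwrap() {
        println!("{}", term.display(&arena));
    }
}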
```


## CLI

Build the binary with:

```bash
cargo build --release --bin arena-terms-parser
```

Then run:

```bash
# Parse terms
./target/release/arena-terms-parser parse --terms input.ax
./target/release/arena-terms-parser parse --encoding iso-8859-1 --terms input.ax
./target/release/arena-terms-parser parse --defs ops.ax --terms input.ax

# Decode: bytes in source encoding → UTF-8
./target/release/arena-terms-parser decode --from windows-1251 --input file.bin
echo -n $'\xCF\xF0\xE8\xE2\xE5\xF2' | ./target/release/arena-terms-parser decode --from windows-1251

# Encode: UTF-8 → bytes in target encoding
echo -n 'café' | ./target/release/arena-terms-parser encode --to iso-8859-1
./target/release/arena-terms-parser encode --to shift_jis --input japanese.txt
```

Encoding options accept any WHATWG/IANA charset label (case-insensitive), including
common aliases such as `latin1`, `sjis`, `cp1251`, and `chinese`. The default is `utf-8`.
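
As a rough illustration of what the `decode` example above does with Windows-1251 input, the following standard-library-only sketch maps a subset of that encoding to UTF-8. It is not how the crate transcodes (the crate relies on `encoding_rs`); the helper name `decode_cp1251_subset` is hypothetical, and only ASCII plus the basic Cyrillic range `0xC0..=0xFF` is handled.

```rust
/// Decode a subset of Windows-1251 into a UTF-8 `String`.
///
/// Handles ASCII (0x00..=0x7F) and the basic Cyrillic letters
/// (0xC0..=0xFF, which map linearly to U+0410..=U+044F).
/// Returns `None` for any other byte; a real decoder needs a full table.
pub fn decode_cp1251_subset(bytes: &[u8]) -> Option<String> {
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        let ch = match b {
            0x00..=0x7F => b as char,
            0xC0..=0xFF => char::from_u32(0x0410 + u32::from(b - 0xC0))?,
            _ => return None,
        };
        out.push(ch);
    }
    Some(out)
}
```

Feeding it the six bytes from the shell example (`CF F0 E8 E2 E5 F2`) yields the six-character string `Привет`, which occupies 12 bytes in UTF-8.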

**Note:** In `bin{N:...}` and `text{N:...}`, *N* is the number of **raw bytes**, not characters.
For example, with UTF-8 input, `text{10:Игорь}` is correct (5 Cyrillic characters = 10 UTF-8 bytes),
while `text{5:Игорь}` will fail to parse.
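
The byte-versus-character distinction is easy to check in Rust, where `str::len` returns the UTF-8 byte length. The sketch below builds a length-prefixed literal in the form shown in the note; `text_literal` is a hypothetical helper, not part of this crate, and it assumes the input will be parsed as UTF-8.

```rust
/// Build a `text{N:...}` literal where N is the payload's UTF-8 byte length.
/// Hypothetical helper for illustration; assumes the parser reads UTF-8 input.
pub fn text_literal(payload: &str) -> String {
    // `str::len` counts bytes, not characters, which is what N requires.
    format!("text{{{}:{}}}", payload.len(), payload)
}

/// Number of Unicode scalar values in the payload (NOT the value N expects).
pub fn char_count(payload: &str) -> usize {
    payload.chars().count()
}
```

For `Игорь`, `text_literal` produces `text{10:Игорь}`, while `char_count` returns 5.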


## Known Divergences from Legacy Parser

* **`123e5` is accepted as a float literal.** The legacy parser requires a decimal point
  (e.g., `1.23e5`), treating `123e5` as integer `123` followed by atom `e5`. Arena-terms
  accepts the `DEC+EXP` form as valid, following C/Python/JSON/Rust conventions.

* **Date representation.** Legacy uses Excel serial dates (double); arena-terms uses Unix
  epoch milliseconds (i64) with extended ISO-8601 format support.
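
Both divergences can be sketched with the standard library. The first function shows that Rust's own `f64` parser accepts the `DEC+EXP` form without a decimal point. The second converts an Excel serial date to Unix epoch milliseconds; it assumes the 1900 date system (serial `25569` is 1970-01-01) and is not taken from either parser's implementation. Both function names are hypothetical.

```rust
/// Parse a float literal with Rust's standard parser, which accepts
/// both `123e5` and `1.23e5`.
pub fn parse_float(s: &str) -> Option<f64> {
    s.parse::<f64>().ok()
}

/// Serial number of 1970-01-01 in Excel's 1900 date system (assumption).
const EXCEL_UNIX_EPOCH_SERIAL: f64 = 25_569.0;
const MS_PER_DAY: f64 = 86_400_000.0;

/// Convert an Excel serial date (days, fractional part = time of day)
/// to Unix epoch milliseconds, rounding to the nearest millisecond.
pub fn excel_serial_to_unix_ms(serial: f64) -> i64 {
    ((serial - EXCEL_UNIX_EPOCH_SERIAL) * MS_PER_DAY).round() as i64
}
```

For example, serial `25570.5` (noon on 1970-01-02) maps to `129_600_000` milliseconds.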


## Documentation

For detailed API documentation, visit [docs.rs/arena-terms-parser](https://docs.rs/arena-terms-parser).


## License

Copyright (c) 2005–2026 IKH Software, Inc.

Released under the [MIT License](https://opensource.org/licenses/MIT).

## See Also

- [parlex](https://crates.io/crates/parlex) - Parlex core library
- [parlex-gen](https://crates.io/crates/parlex-gen) - Lexer and parser generation tools (`alex` and `aslr`)
- [arena-terms](https://crates.io/crates/arena-terms) - Arena-backed Prolog-like terms