# Arena Terms Parser
Parser for **arena-backed, Prolog-like terms**.
This crate provides a lexer, parser, and operator handling for Prolog-style
terms. It depends on the [`arena_terms`](https://crates.io/crates/arena_terms)
crate to store terms efficiently in an arena and is built on top of the
[`parlex`](https://crates.io/crates/parlex) core library.
## Features
- **Lexer**
Tokenizes atoms, variables, numbers, strings, dates, and symbols.
- **Parser**
An SLR(1) parser (generated by [`parlex-gen`](https://crates.io/crates/parlex-gen)) that
produces `arena_terms::Term` values.
- **Operators**
Dynamically handles operator fixity, associativity, and precedence rules.
- **Multi-encoding**
Supports all WHATWG encodings via `encoding_rs`: UTF-8, ASCII, ISO-8859-1 through 16,
Windows-1250 through 1258, KOI8-R/U, Shift_JIS, EUC-JP, GBK, GB18030, Big5, EUC-KR,
UTF-16, and more. All internal term representation is UTF-8; input bytes are transcoded
automatically. Binary content (`bin{...}`) always collects raw source bytes.
- **Arena-backed**
Terms are stored compactly in arenas for efficient allocation and traversal.
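
To illustrate what a runtime operator table with precedence and associativity means, here is a minimal, self-contained sketch. It is not the crate's implementation: the crate uses a generated SLR(1) parser, whereas this sketch swaps in plain precedence climbing. The names `OperTable`, `Tok`, `Assoc`, and `parse_expr` are hypothetical, and the sketch assumes that a higher number binds tighter, which is only inferred from the `380`/`400` values in the usage example.

```rust
use std::collections::HashMap;

/// Associativity of an infix operator (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Assoc {
    Left,
    Right,
}

/// Hypothetical runtime operator table: symbol -> (precedence, associativity).
/// Assumption: a higher precedence number binds tighter.
type OperTable = HashMap<char, (u32, Assoc)>;

/// Minimal token type: integers and single-character infix operators.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok {
    Num(i64),
    Op(char),
}

/// Precedence climbing: parses `toks` starting at `*pos` and returns the
/// result in prefix notation, e.g. `'+'(2, '*'(2, 3))`.
fn parse_expr(toks: &[Tok], pos: &mut usize, min_prec: u32, table: &OperTable) -> String {
    // A primary expression is just a number in this sketch.
    let mut lhs = match toks.get(*pos) {
        Some(Tok::Num(n)) => n.to_string(),
        other => panic!("expected a number, found {:?}", other),
    };
    *pos += 1;

    // Keep consuming operators that bind at least as tightly as `min_prec`.
    while let Some(Tok::Op(op)) = toks.get(*pos) {
        let Some(&(prec, assoc)) = table.get(op) else { break };
        if prec < min_prec {
            break;
        }
        *pos += 1;
        // Left-associative operators require a strictly tighter right-hand side.
        let next_min = if assoc == Assoc::Left { prec + 1 } else { prec };
        let rhs = parse_expr(toks, pos, next_min, table);
        lhs = format!("'{}'({}, {})", op, lhs, rhs);
    }
    lhs
}

/// Operator table mirroring the two definitions from the usage example.
fn example_table() -> OperTable {
    HashMap::from([('+', (380, Assoc::Left)), ('*', (400, Assoc::Left))])
}
```

Because the table is an ordinary map consulted during parsing, adding or changing an entry changes how later input is grouped, which is the general idea behind dynamic operator definitions.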
## Usage
Parsing a string into arena terms:
```rust
use arena_terms::Arena;
use arena_terms_parser::{Encoding, TermParser, define_opers};
use try_next::{IterInput, TryNextWithContext};
const DEFS: &str = "[
op('+'(x,y), infix, 380, left),
op('*'(x,y), infix, 400, left),
]";
const TERMS: &str = "
likes(mary, pizza).
2 + 2 * 3 = 8 .
";
fn main() {
    let mut arena = Arena::try_with_default_opers().unwrap();
    define_opers(&mut arena, IterInput::from(DEFS.bytes()), Encoding::Utf8).unwrap();
    let mut parser =
        TermParser::try_new(IterInput::from(TERMS.bytes()), Encoding::Utf8).unwrap();
    while let Some(term) = parser.try_next_with_context(&mut arena).unwrap() {
        println!("{}", term.display(&arena));
    }
}
```
## CLI
Build the binary with:
```bash
cargo build --release --bin arena-terms-parser
```
Then run:
```bash
# Parse terms
./target/release/arena-terms-parser parse --terms input.ax
./target/release/arena-terms-parser parse --encoding iso-8859-1 --terms input.ax
./target/release/arena-terms-parser parse --defs ops.ax --terms input.ax
# Decode: bytes in source encoding → UTF-8
./target/release/arena-terms-parser decode --from windows-1251 --input file.bin
echo -n $'\xCF\xF0\xE8\xE2\xE5\xF2' | ./target/release/arena-terms-parser decode --from windows-1251
# Encode: UTF-8 → bytes in target encoding
```
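
As a rough illustration of what decoding from a legacy encoding to UTF-8 involves, the sketch below handles only two ranges of windows-1251: ASCII bytes and the contiguous Cyrillic block `0xC0..=0xFF`, which maps to `U+0410..=U+044F`. The function name `decode_cp1251_basic` is hypothetical; the crate itself relies on `encoding_rs`, which covers the full code page.

```rust
/// Decodes a subset of windows-1251 into a UTF-8 `String`.
/// Handles ASCII (0x00..=0x7F) and the letters А..я (0xC0..=0xFF).
/// Returns `None` for any other byte, since this sketch omits the
/// 0x80..=0xBF part of the code page.
fn decode_cp1251_basic(bytes: &[u8]) -> Option<String> {
    bytes
        .iter()
        .map(|&b| match b {
            // ASCII bytes map to the same code points.
            0x00..=0x7F => Some(b as char),
            // 0xC0 is U+0410 (А); the block is contiguous up to U+044F (я).
            0xC0..=0xFF => char::from_u32(0x0410 + u32::from(b - 0xC0)),
            // Not covered by this sketch.
            _ => None,
        })
        .collect()
}
```

Applied to the six bytes piped into `decode` above, `decode_cp1251_basic(b"\xCF\xF0\xE8\xE2\xE5\xF2")` yields `Some("Привет")`.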
Every encoding option accepts any WHATWG/IANA charset label (case-insensitive), including
common aliases such as `latin1`, `sjis`, `cp1251`, and `chinese`. The default is `utf-8`.
**Note:** In `bin{N:...}` and `text{N:...}`, *N* is the number of **raw bytes**, not characters.
For example, with UTF-8 input, `text{10:Игорь}` is correct (5 Cyrillic characters = 10 UTF-8 bytes),
while `text{5:Игорь}` will fail to parse.
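
The byte-versus-character distinction can be checked with the standard library alone. The sketch below assumes UTF-8 input and a payload written verbatim; `byte_and_char_len` and `text_literal` are hypothetical helpers, not part of the crate.

```rust
/// Returns `(byte_length, character_count)` for a UTF-8 string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Builds a `text{N:...}` literal where N is the UTF-8 byte length
/// of the payload (assumes the source file is UTF-8 encoded).
fn text_literal(payload: &str) -> String {
    format!("text{{{}:{}}}", payload.len(), payload)
}
```

Here `byte_and_char_len("Игорь")` returns `(10, 5)`, and `text_literal("Игорь")` produces `text{10:Игорь}`. With a different source encoding the byte count changes, so N has to be computed on the bytes as they appear in the input file.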
## Known Divergences from Legacy Parser
* **`123e5` is accepted as a float literal.** The legacy parser requires a decimal point
(e.g., `1.23e5`), treating `123e5` as integer `123` followed by atom `e5`. Arena-terms
accepts the `DEC+EXP` form as valid, following C/Python/JSON/Rust conventions.
* **Date representation.** The legacy parser uses Excel serial dates (double); arena-terms uses
Unix epoch milliseconds (i64) with extended ISO-8601 format support.
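
When migrating data, the two divergences above can be sketched with the standard library. This is an illustration under stated assumptions, not code from either parser: it assumes the legacy values follow Excel's 1900 date system, in which serial `25569` is 1970-01-01, and it uses Rust's own `f64` parsing to show the `DEC+EXP` convention. The helper names are hypothetical.

```rust
/// Excel serial number (1900 date system) of 1970-01-01.
/// Assumption: the legacy values use this date system.
const EXCEL_UNIX_EPOCH_SERIAL: f64 = 25_569.0;
const MILLIS_PER_DAY: f64 = 86_400_000.0;

/// Converts an Excel serial date to Unix epoch milliseconds.
/// Only meaningful for serials >= 61, because the 1900 system
/// contains a fictitious 1900-02-29.
fn excel_serial_to_unix_millis(serial: f64) -> i64 {
    ((serial - EXCEL_UNIX_EPOCH_SERIAL) * MILLIS_PER_DAY).round() as i64
}

/// Parses a float the way Rust does, which accepts an exponent
/// without a decimal point (for example `123e5`).
fn parse_float(s: &str) -> Option<f64> {
    s.parse::<f64>().ok()
}
```

For example, `excel_serial_to_unix_millis(25570.5)` gives `129_600_000` (one and a half days after the epoch), and `parse_float("123e5")` gives `Some(12300000.0)`.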
## Documentation
For detailed API documentation, visit [docs.rs/arena-terms-parser](https://docs.rs/arena-terms-parser).
## License
Copyright (c) 2005–2026 IKH Software, Inc.
Released under the [MIT License](https://opensource.org/licenses/MIT).
## See Also
- [parlex](https://crates.io/crates/parlex) - Parlex core library
- [parlex-gen](https://crates.io/crates/parlex-gen) - Lexer and parser generation tools (`alex` and `aslr`)
- [arena-terms](https://crates.io/crates/arena-terms) - Arena-backed Prolog-like terms