Arena Terms Parser
Parser for arena-backed, Prolog-like terms.
This crate provides a lexer, parser, and operator handling for Prolog-style
terms. It depends on the arena_terms
crate to store terms efficiently in an arena and is built on top of the
parlex core library.
Features
- Lexer: Tokenizes atoms, variables, numbers, strings, dates, and symbols.
- Parser: An SLR(1) parser (generated by parlex-gen) that produces arena_terms::Term values.
- Operators: Dynamically handles operator fixity, associativity, and precedence rules.
- Multi-encoding: Supports all WHATWG encodings via encoding_rs: UTF-8, ASCII, ISO-8859-1 through 16, Windows-1250 through 1258, KOI8-R/U, Shift_JIS, EUC-JP, GBK, GB18030, Big5, EUC-KR, UTF-16, and more. All internal term representation is UTF-8; input bytes are transcoded automatically. Binary content (bin{...}) always collects raw source bytes.
- Arena-backed: Terms are stored compactly in arenas for efficient allocation and traversal.
Usage
Parsing a string into arena terms:
use Arena;
use ;
use ;
const DEFS: &str = "[
op('+'(x,y), infix, 380, left),
op('*'(x,y), infix, 400, left),
]";
const TERMS: &str = "
likes(mary, pizza).
2 + 2 * 3 = 8 .
";
CLI
Build the binary with:
Then run:
# Parse terms
# Decode: bytes in source encoding → UTF-8
|
# Encode: UTF-8 → bytes in target encoding
|
All encoding names accept any WHATWG/IANA charset label (case-insensitive), including
common aliases like latin1, sjis, cp1251, chinese, etc. Default: utf-8.
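To illustrate what transcoding to UTF-8 means for a single-byte encoding, here is a minimal standard-library sketch. It is not this crate's code path (the crate delegates to encoding_rs), and the function name `latin1_to_utf8` is hypothetical. It assumes strict ISO-8859-1, where each byte maps to the Unicode code point of the same value; WHATWG resolves the latin1 label to windows-1252, which differs for bytes 0x80 through 0x9F.

```rust
/// Hypothetical helper: decodes strict ISO-8859-1 bytes into a UTF-8 `String`.
/// Each byte value is taken as the Unicode code point of the same number.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    // "café" in ISO-8859-1 is 4 bytes; 0xE9 is 'é'.
    let source = [0x63, 0x61, 0x66, 0xE9];
    let decoded = latin1_to_utf8(&source);
    assert_eq!(decoded, "café");
    // After transcoding, 'é' occupies 2 bytes in UTF-8.
    assert_eq!(decoded.len(), 5);
    println!("{decoded}");
}
```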
Note: In bin{N:...} and text{N:...}, N is the number of raw bytes, not characters.
For example, with UTF-8 input, text{10:Игорь} is correct (5 Cyrillic characters = 10 UTF-8 bytes),
while text{5:Игорь} will fail to parse.
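The byte-count rule can be checked with the Rust standard library alone. The sketch below assumes UTF-8 input, as in the example above; `text_literal` is a hypothetical helper for illustration and not part of this crate's API.

```rust
/// Hypothetical helper: builds a `text{N:...}` literal where N is the
/// number of UTF-8 bytes in `s` (not the number of characters).
fn text_literal(s: &str) -> String {
    // `str::len` returns the length in bytes.
    format!("text{{{}:{}}}", s.len(), s)
}

fn main() {
    let name = "Игорь";
    assert_eq!(name.chars().count(), 5); // 5 Cyrillic characters
    assert_eq!(name.len(), 10); // 10 UTF-8 bytes
    assert_eq!(text_literal(name), "text{10:Игорь}");
    println!("{}", text_literal(name));
}
```

For input in another encoding, N would count bytes in that source encoding instead, so the same text can require a different N.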
Known Divergences from Legacy Parser
- 123e5 is accepted as a float literal. The legacy parser requires a decimal point (e.g., 1.23e5), treating 123e5 as integer 123 followed by atom e5. Arena-terms accepts the DEC+EXP form as valid, following C/Python/JSON/Rust conventions.
- Date representation. Legacy uses Excel serial dates (double); arena-terms uses Unix epoch milliseconds (i64) with extended ISO-8601 format support.
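Both divergences can be illustrated with a short standard-library sketch. It does not use this crate's API. The function name `excel_serial_to_epoch_ms` is hypothetical, and the conversion assumes the legacy values follow Excel's 1900 date system, in which 1970-01-01 is serial 25569; the source does not state which date system the legacy parser uses.

```rust
// Assumption: Excel 1900 date system, where 1970-01-01 has serial 25569.
const UNIX_EPOCH_SERIAL: f64 = 25_569.0;
const MS_PER_DAY: f64 = 86_400_000.0;

/// Hypothetical helper: converts an Excel serial date (fractional days)
/// to Unix epoch milliseconds, rounding to the nearest millisecond.
fn excel_serial_to_epoch_ms(serial: f64) -> i64 {
    ((serial - UNIX_EPOCH_SERIAL) * MS_PER_DAY).round() as i64
}

fn main() {
    // Rust's own float parsing accepts the DEC+EXP form without a decimal point.
    let x: f64 = "123e5".parse().unwrap();
    assert_eq!(x, 12_300_000.0);

    // Serial 25569 is the Unix epoch; 25570.5 is one and a half days later.
    assert_eq!(excel_serial_to_epoch_ms(25_569.0), 0);
    assert_eq!(excel_serial_to_epoch_ms(25_570.5), 129_600_000);
}
```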
Documentation
For detailed API documentation, visit docs.rs/arena-terms-parser.
License
Copyright (c) 2005–2026 IKH Software, Inc.
Released under the MIT License.
See Also
- parlex - Parlex core library
- parlex-gen - Lexer and parser generation tools (alex and aslr)
- arena-terms - Arena-backed Prolog-like terms