pgf2json.rs
Overview
A Rust library for parsing Portable Grammar Format (PGF) files and converting them to JSON. This crate provides an API to load and interpret grammars compiled in Portable Grammar Format (PGF), which is the final output format from the Grammatical Framework (GF) compiler.
The library enables embedding GF grammars in Rust programs. The current implementation is sensitive to PGF versioning; stable for 1.0 and experimental support for the yanked (upcoming?) version 2.1 .
Documentation
Tread carefully, here be dragons!
There seems to be a minor format difference between PGF 1.0 and 2.1 that needs addressing (see more below), but the core parser architecture is solid.
Health Status
- ✅ 3/4 tests passing (75% success rate)
- ✅ Synthetic PGF creation and JSON conversion working
- ✅ Error handling tests passing
- ✅ Parse sentence functionality working
- ⚠️ 1 remaining issue: UTF-8 decoding in Hello.pgf (version 2.1 format differences)
A temporary fix to the UTF-8 decoding issue has been implemented. Once a more robust solution has been implemented, the crate version will be bumped.
Features
pgf2json contains a complete PGF binary parser, covering all of the PGF v1.0 format specifications. The libraries strength include:
- PGF Parsing: Read binary PGF files into Rust data structures
- JSON Conversion: Convert PGF grammars to structured JSON format
- Grammar Analysis: Extract functions, categories, and language-specific concrete syntaxes
- Sentence Parsing: Parse sentences using loaded grammars
- Error Handling: Comprehensive error reporting for invalid files and parsing failures
- Testing - Around 75% of the crates functionality has been tested.
Usage
Pretty simple, just load a .pgf files and convert it to .json.
use ;
// Load a PGF file
let pgf = read_pgf?;
// Convert to JSON
let json = pgf_to_json?;
// Parse a sentence
let lang = read_language.unwrap;
let typ = start_cat;
let trees = parse?;
Testing
The test suite includes parsing real PGF files and validating JSON output structure.
Temporary fix to UTF-8 issue
There seems to be an "issue" related to fix- and variable- length strings, where
PGF 1.0 uses variable-length, and PGF 2.1 uses fixed-length.
The current version of pgf2json.rs contains a fix to this issue, where the
read_string function reads the string length as a fixed 32-bit big-endian
integer for PGF 2.1.For full compatibility with both PGF 1.0 and the yanked (or
unreleased PGF 2.1), we made read_string function version-aware and propagate
the is_pgf_2_1 flag through the parsing functions.
The test test_real_pgf_parsing, targets the fix by reading a .pgf file and
confirm correct parsing of strings like "Greeting" at offset 180.