meta-language 0.46.0

A self-describing links-network core for lossless language representation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
# meta-language

A Rust foundation for a universal, self-describing meta language backed by a
links network. The initial crate focuses on the common structural substrate:
links, references, source spans, parse status metadata, configurable trivia
attachment, self-description roots, and verification that a parsed region is
clean.

**Website:** <https://link-foundation.github.io/meta-language> — project
description, an interactive WebAssembly demo, and the full
[Rust API documentation](https://link-foundation.github.io/meta-language/api/).

## What Is Implemented

- A mutable `LinkNetwork` where every item is a link.
- Self-referential point links, so a point is represented without introducing a
  separate primitive.
- Relation links as ordinary links with ordered references to other links.
- Field labels as explicit links instead of side-table metadata.
- Source metadata: link type, named/anonymous flag, byte range, row/column
  points, and `is_error`, `has_error`, `is_missing`, `is_extra` flags.
- `verify_full_match()` for reporting error and missing links in a selected
  source region.
- `parse()` as the default lossless parse entry point; the explicit
  `parse_lossless_text()` boundary remains available.
- `reconstruct_text()` for byte-for-byte reconstruction from non-missing token
  links ordered by source span.
- `insert_source_token()`, `insert_syntax_node()`, and `render_source()` for
  emitting target-language source from programmatically constructed syntax
  networks whose token leaves do not come from a prior parse.
- `projected_links()` for viewing the same lossless network as concrete syntax,
  abstract syntax, or semantic-only data by stripping lower-level preservation
  links from the view.
- `NetworkSnapshot` and `MutableNetworkSnapshot` for immutable versioned
  snapshots, editable forks, provenance, and forward commits.
- `AccessMode` for a read-only or mutable engine per user configuration:
  `freeze()` / `as_read_only()` yield a `ReadOnlyNetwork` view whose mutators
  are unreachable at compile time, and `parse_engine()` returns an
  `EngineNetwork` that rejects mutation with a clear diagnostic under
  `AccessMode::ReadOnly`; the frozen form reuses snapshot `Arc` sharing.
- `LinkStore` and `EngineLinkStore` for storage-backed create/read/update/delete
  and search operations: reads take `&self`, writes take `&mut self`, the
  default store is the in-memory `LinkNetwork`, and read-only access mode
  rejects writes through the same storage boundary.
- Optional `doublets` feature support for a file-mapped `DoubletsLinkStore`
  using `doublets-rs` 0.4, with lossless network round trips and a documented
  `doublets-web` backend label for browser/WASM exchange of the same binary
  graph layout.
- `ParseConfiguration` with containment-link, token-link, or combined trivia
  attachment policies.
- Mixed-region links for Markdown fenced code and HTML regions, plus HTML
  script, style, and style-attribute regions, with `txt` fallback for prose
  regions that content sniffing cannot classify.
- `LinkQuery` for structural matching by link type, term, language, named flag,
  tree-sitter-query-like S-expressions, captures, and host predicates.
- `find()` / `replace()` for codemod-style query transforms over captured links
  while preserving unchanged source bytes.
- `SubstitutionRule` / `apply_substitution()` for the link-cli-style
  match-and-substitute operation.
- `apply_edit()` for incremental source reparsing with stable outside-edit link
  ids, snapshot fork sharing for unchanged links, and structural diff sets for
  changed, added, and removed links.
- Concept-to-language syntax mappings for cross-language reconstruction.
- `reconstruct_text_as()` for semantic cross-language reconstruction and
  configurable formalization levels.
- Exact-match concept interning with language-bound expression links,
  queryable external-id aliases, LiNo concept-set import, and
  `seed_common_concept_ontology()` for the default 351-concept semantic
  lexicon plus structural programming-language concepts.
- Object-identity links, many-valued `TruthValue` semantics, and fixed-point
  `ProbabilisticTruthValue` confidence semantics.
- A testable parity registry and upstream-provenanced `PARITY_FIXTURES` for
  executable competitor and ecosystem feature gates.
- Structural LiNo parsing for links-notation doublets, triplets, named links,
  simple indented definitions, and self-references.
- `LANGUAGE_FIXTURES` with lossless parse/reconstruction samples for every
  required markup, programming-language, and natural-language target.
- `NATURAL_LANGUAGE_GRAMMAR_FIXTURES` with pass/fail grammaticality fixtures
  for the ten natural-language targets, including provenance for the
  repo-authored sentences and UD-derived tag vocabulary.
- Coverage targets for full `txt`, Markdown, HTML, PDF, and DOCX support, mixed
  grammar embedding, ten programming-language parser targets, and ten
  natural-language parser targets.
- A documented text PDF profile (issue #84): `render_pdf_document()` /
  `parse_pdf_document()` map a language-free formatting document onto a valid,
  uncompressed single-page PDF (marked content for heading/paragraph/list, font
  resources for bold/italic), `parse("…", "pdf", …)` builds a byte-exact lossless
  network with additive concept-tagged structure links, and
  `reconstruct_text_as("PDF", …)` renders a structurally equivalent PDF through
  the shared concept layer. See `docs/pdf-fidelity.md` for the round-trip
  fidelity matrix.
- A documented DOCX (OOXML) profile (issue #85): `render_docx_document()` /
  `parse_docx_document()` map a language-free formatting document onto
  WordprocessingML `word/document.xml` (`<w:pStyle>` headings, `<w:numPr>` lists,
  `<w:b/>`/`<w:i/>` bold/italic runs), a binary OPC layer
  (`render_docx_package()` / `parse_docx_package()`) assembles a valid `.docx`
  ZIP with no new dependencies, `parse("…", "docx", …)` builds a byte-exact
  lossless network with additive concept-tagged structure links, and
  `reconstruct_text_as("DOCX", …)` renders a structurally equivalent DOCX through
  the shared concept layer. See `docs/docx-fidelity.md` for the round-trip
  fidelity matrix.
- Cross-format document reconstruction (issue #86): `reconstruct_text_as("txt" |
  "Markdown" | "HTML" | "PDF" | "DOCX", …)` reconstructs a document parsed from
  any supported format into any other over the shared concept layer (same-format
  targets stay byte-exact), with `txt` added as a first-class format and the lossy
  fallback floor. Per-format capability profiles (`document_format_profile`,
  `DOCUMENT_FORMATS`, `CROSS_FORMAT_CONCEPTS`) report, for every concept, either
  native support or a documented `LanguageProfile` fallback. See
  `docs/cross-format-fidelity.md` for the cross-format entry point and the
  per-format fidelity matrix.
- Self-description roots for `link`, `reference`, `relation link`, `language`,
  `grammar`, `type`, `Type`, `concept`, `point`, `field`, `trivia`, `region`,
  and `object`.
- A lossless text parser boundary that preserves tokens, trivia, recovery
  markers, and mixed-region metadata behind the same representation.

## Usage

```rust
use meta_language::{LinkNetwork, ParseConfiguration};

let network = LinkNetwork::parse("alpha beta", "plain-text", ParseConfiguration::default());
let report = network.verify_full_match(None);

assert!(report.is_clean());
assert_eq!(network.reconstruct_text(), "alpha beta");
```

The default parse path is lossless. Callers that need a narrower view can use a
projection without mutating the original network:

```rust
use meta_language::{LinkNetwork, NetworkProjection, ParseConfiguration};

let network = LinkNetwork::parse("alpha beta", "plain-text", ParseConfiguration::default());
let abstract_links = network
    .projected_links(NetworkProjection::AbstractSyntax)
    .count();

assert!(abstract_links < network.len());
```

Construct source directly as a syntax network when code should be generated
before validation:

```rust
use meta_language::{LinkNetwork, ParseConfiguration};

let mut network = LinkNetwork::new();
let tokens = [
    network.insert_source_token("JavaScript", "const answer = "),
    network.insert_source_token("JavaScript", "42"),
    network.insert_source_token("JavaScript", ";\n"),
];
let declaration = network.insert_syntax_node("JavaScript", "lexical_declaration", tokens);
network.insert_syntax_node("JavaScript", "program", [declaration]);

let source = network.render_source("JavaScript");
assert_eq!(source, "const answer = 42;\n");
assert!(LinkNetwork::parse(&source, "JavaScript", ParseConfiguration::default())
    .verify_full_match(None)
    .is_clean());
```

Configure the engine read-only when a parsed network must never be mutated. The
frozen view exposes every read operation but no mutators (calling one is a
compile error), and the `EngineNetwork` boundary rejects mutation at runtime:

```rust
use meta_language::{AccessMode, LinkNetwork, ParseConfiguration};

let configuration = ParseConfiguration::default().with_access_mode(AccessMode::ReadOnly);
let mut engine = LinkNetwork::parse_engine("alpha beta", "plain-text", configuration);

assert!(engine.is_read_only());
assert_eq!(engine.reconstruct_text(), "alpha beta");
assert!(engine.as_mutable().is_err()); // read-only engine rejects mutation
```

Use the storage trait directly when links need to move between in-memory and
binary stores. The optional doublets backend is enabled with
`--features doublets`:

```rust
use meta_language::{
    DoubletsLinkStore, LinkNetwork, LinkStore, LinkStoreQuery, ParseConfiguration,
};

let network = LinkNetwork::parse("const answer = 42;\n", "JavaScript", ParseConfiguration::default());
let mut store = DoubletsLinkStore::create_file("network.doublets").expect("create doublets store");
store.replace_with_network(&network).expect("write network");

let restored = DoubletsLinkStore::open_file("network.doublets")
    .expect("open doublets store")
    .to_network()
    .expect("read network");
assert_eq!(restored.to_lino(), network.to_lino());

let links = LinkStore::search(&restored, &LinkStoreQuery::new()).expect("search links");
assert_eq!(links.len(), restored.len());
```

`LinkStoreBackend::DoubletsWeb` names the WASM/browser exchange target for this
binary graph representation; native code uses `DoubletsLinkStore`, while a
browser host can map the same logical records through `doublets-web`.

Codemod-style transforms can select links with an S-expression query and replace
only captured source ranges:

```rust
use meta_language::{LinkNetwork, LinkQuery, ParseConfiguration, ReplacementRule};

let mut network = LinkNetwork::parse(
    "const oldName = call(oldName);\n",
    "JavaScript",
    ParseConfiguration::default(),
);
let query = LinkQuery::from_sexpression(
    r#"
    (identifier) @target
    (#eq? @target "oldName")
    "#,
)
.expect("query parses");
let captures = network.find(&query);

network.replace(
    &captures,
    &ReplacementRule::captured_text("target", "newName"),
);

assert_eq!(network.reconstruct_text(), "const newName = call(newName);\n");
```

Cross-language reconstruction can naturalize a parsed semantic proposition into
another target language, or expose progressively more formal representations:

```rust
use meta_language::{FormalizationLevel, LinkNetwork, ParseConfiguration};

let network = LinkNetwork::parse(
    "Hawaii is a state.\n",
    "English",
    ParseConfiguration::default(),
);

assert_eq!(
    network.reconstruct_text_as("Russian", ParseConfiguration::default()),
    "Гавайи это штат.\n"
);
assert_eq!(
    network.reconstruct_text_as(
        "Russian",
        ParseConfiguration::default().with_formalization_level(FormalizationLevel::Concept),
    ),
    "statehood(Q782, Q35657)\n"
);
```

The same `reconstruct_text_as` entry point translates a document parsed from one
markup/document format into another over the shared formatting concept layer.
Same-format targets stay byte-exact; cross-format targets carry the document's
heading/paragraph/list and bold/italic/link structure, degrading through the
documented per-format fallbacks (see `docs/cross-format-fidelity.md`):

```rust
use meta_language::{LinkNetwork, ParseConfiguration};

let network = LinkNetwork::parse(
    "# Status Report\n\nThe system is **ready** for *launch*.",
    "Markdown",
    ParseConfiguration::default(),
);

let html = network.reconstruct_text_as("HTML", ParseConfiguration::default());
assert!(html.contains("<h1>Status Report</h1>"));
assert!(html.contains("<strong>ready</strong>"));

// txt is the lossy fallback floor: prose survives, markup is dropped.
let txt = network.reconstruct_text_as("txt", ParseConfiguration::default());
assert!(txt.contains("The system is ready for launch"));
assert!(!txt.contains('#'));
```

## CLI

```bash
cargo run -- describe
cargo run -- verify --language plain-text --text "alpha beta"
cargo run -- infer examples/json-samples/ --format bnf --metrics
cargo run -- import-grammar --format bnf tests/fixtures/grammar/bnf/arithmetic.bnf
cargo run -- emit-grammar --format gbnf grammar.lino --out grammar.gbnf
cargo run -- translate-grammar grammar.lino --from-language en --to-language ru
```

`describe` prints the built-in self-description network as LiNo-style definition
lines that round-trip through `parse()` and `reconstruct_text()`. `verify` parses
the text with the lossless text boundary and exits successfully when the
resulting region has no error or missing links.

The grammar subcommands expose the grammar IR pipeline from the shell. `infer`
reads one or more example files or directories, infers a grammar, and writes
LiNo by default; `--format bnf|ebnf|abnf|peg|gbnf|tree-sitter` renders another
supported notation, `--metrics` writes precision/recall/F1/runtime to stderr,
and `--out <path>` writes the grammar to a file. `import-grammar` reads
`bnf|ebnf|abnf|peg|antlr|lark|gbnf|tree-sitter` input and renders LiNo by
default or another `--to` notation. `emit-grammar` reads LiNo or native grammar
surface input and emits `bnf|ebnf|abnf|peg|gbnf|tree-sitter`. Unsupported target
formats report a clear non-zero CLI error. `translate-grammar` rewrites
concept-aligned rule names and documentation to the requested target language
while preserving grammar structure.

## Grammar subsystem

The grammar layer stores authored, imported, inferred, translated, and generated
grammars as first-class links instead of side data. Its IR is built from
`Grammar`, `GrammarRule`, `GrammarExpr`, `RuleKind`, and `GrammarFormat`, lowers
through `ToLinks` / `FromLinks` as `LinkType::Grammar`, and is documented in
[docs/grammar](docs/grammar/README.md).

Start with the [architecture overview](docs/grammar/architecture.md) for the IR,
links encoding, and pipeline. The stage guides cover
[authoring](docs/grammar/authoring.md),
[import and export](docs/grammar/import-export.md),
[code generation](docs/grammar/codegen.md),
[inference](docs/grammar/inference.md),
[translation](docs/grammar/translation.md), and
[CLI/runtime integration](docs/grammar/cli-and-runtime.md).

## Parity Implementation

The crate exposes `PARITY_TARGETS`, `MARKUP_LANGUAGE_TARGETS`,
`PROGRAMMING_LANGUAGE_TARGETS`, `SECOND_TIER_PROGRAMMING_LANGUAGE_TARGETS`,
`NATURAL_LANGUAGE_TARGETS`, `DATA_FORMAT_TARGETS`, and
`GRAMMAR_EMBEDDING_TARGETS` so comparison scope is part of the tested Rust API.
It also exposes `PARITY_FIXTURES`, with executable, provenance-tracked fixtures
covering every advertised target capability, and `LANGUAGE_FIXTURES`, with a
lossless fixture for every requested language target.
Grammar notation round-trip fidelity is tracked in
[docs/grammar/fidelity.md](docs/grammar/fidelity.md) through the
`grammar_format_profile` API.
The current registry tracks tree-sitter, LibCST, Recast, jscodeshift, Rowan,
cstree, Roslyn, links-notation, link-cli, lino-objects-codec,
relative-meta-logic, formal-ai, and meta-expression.
Internal ecosystem fixtures now structurally parse links-notation doublet,
triplet, tuple, indented, and self-reference cases; link-cli create/update/delete/swap
substitutions; lino object round-trip, shared-reference, and circular-reference
cases; relative-meta-logic dependent, many-valued, probabilistic, and
liar-paradox cases;
formal-ai seed and benchmark `.lino` corpora; and meta-expression formalize and
naturalize examples backed by the verified 351-concept lexicon.

See [docs/parity-roadmap.md](docs/parity-roadmap.md) for the feature matrix,
executable fixture gates, and language coverage targets.

## Website

The project website at <https://link-foundation.github.io/meta-language> is
built and deployed by the `Deploy Website` job in
[`.github/workflows/release.yml`](.github/workflows/release.yml). It is assembled
by [`scripts/build-site.rs`](scripts/build-site.rs) into `_site/`:

| Path | Contents |
| --- | --- |
| `/` | Landing page (description, interactive demo, docs links) — [`docs/site/`]docs/site |
| `/demo/pkg/` | WebAssembly demo built from the [`web/`]web crate (wraps `links-notation`) |
| `/api/` | `rustdoc` output, with a root redirect so `/api/` does not 404 |

Build and preview the site locally:

```bash
# Requires the wasm32-unknown-unknown target and wasm-pack:
#   rustup target add wasm32-unknown-unknown
#   cargo install wasm-pack
rust-script scripts/build-site.rs
python3 -m http.server -d _site 8000   # open http://localhost:8000
```

Deploying `cargo doc` output directly used to make the Pages root URL return
HTTP 404 because `cargo doc` writes `target/doc/<crate>/index.html` but no root
`index.html` (see [issue #90](https://github.com/link-foundation/meta-language/issues/90)
and [docs/case-studies/issue-90](docs/case-studies/issue-90)). The assembled
site puts the landing page at the root and keeps the API docs under `/api/`.

**One-time setup:** in the repository's *Settings → Pages*, set *Source* to
*GitHub Actions*. Without this, the first deploy fails on `actions/deploy-pages`
with "Get Pages site failed". This cannot be configured from a workflow.

## Development

```bash
cargo fmt --check
cargo clippy --all-targets --all-features
cargo test --all-features
rust-script scripts/check-no-src-tests.rs
```

The WebAssembly demo crate in [`web/`](web) is standalone (not part of the
published crate). Check it with:

```bash
cd web && cargo test && cargo clippy --all-targets
```

This repository uses changelog fragments in `changelog.d/`; code changes should
include a fragment with the intended semantic-version bump.