Warning: This project is v0. The protocol, schemas, and APIs are subject to breaking changes without notice until a formal release.
GAP is an open standard protocol that lets LLMs declare, diff, and reprovision text artifacts with minimal token expenditure. The project includes a Rust reference implementation of the apply engine and a Python evaluation framework for measuring token efficiency against real LLM runs.
Features
- Envelope system: three operation types (`synthesize`, `edit`, `handle`) for full generation, targeted updates, and lightweight references
- Stateless apply engine: pure function, no I/O, ~2μs per edit; portable to browsers (WASM), IDEs, CLIs, or service backends
- ID-based targeting: `<gap:target id="ID">` markers and JSON Pointer paths eliminate hallucinated search strings
- Format-agnostic: works with HTML, Python, JavaScript, JSON, YAML, Rust, Go, SVG, and more
- 90-99% output token reduction per edit, translating to 43-86% total cost savings (cost model)
- SSE transport binding: wire format for streaming with reconnection support (GAP-SSE)
- Evaluation framework: 89 experiment datasets measuring token efficiency and reliability against real LLM runs
Install
Rust crate:
From source (full workspace):
Requires Rust (stable), uv (for evals), and optionally just (for recipes).
Quick Start
# Build the Rust library
# Run tests
# Run criterion benchmarks (apply engine speed)
# Sync workspace — build FFI via maturin + Python packages
# Run LLM evaluations
# Generate report from experiment metrics
Usage
How it works
    LLM ──produces──▶ envelope ──apply──▶ (artifact, handle)
                                   ▲
                         gap (stateless, ~2μs)
- An LLM produces an artifact envelope (JSON): either a `synthesize` envelope (full content with target markers) or an `edit` envelope (targeted changes by ID or JSON Pointer).
- The apply engine resolves the envelope against the current artifact state to produce the updated artifact and a lightweight handle.
- The orchestrator holds handles; the resolved artifact is stored and consumed by downstream tools such as browsers and IDEs.
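The flow above can be illustrated with a toy apply function. This is a minimal sketch, not the reference implementation: only the `<gap:target id="ID">` opening marker comes from this README. The closing `</gap:target>` marker, the envelope field names (`type`, `content`, `changes`, `id`), and the handle fields (`sha256`, `bytes`) are assumptions made for illustration, and nested targets are not handled.

```python
import hashlib
import re
from typing import Optional, Tuple


def _replace_target(text: str, target_id: str, new_content: str) -> str:
    """Replace the content between one pair of target markers (assumed syntax)."""
    pattern = re.compile(
        r'(<gap:target id="' + re.escape(target_id) + r'">)(.*?)(</gap:target>)',
        re.DOTALL,
    )
    # A function replacement avoids backslash interpretation in new_content.
    result, count = pattern.subn(
        lambda m: m.group(1) + new_content + m.group(3), text, count=1
    )
    if count == 0:
        raise KeyError(f"unknown target id: {target_id}")
    return result


def apply(artifact: Optional[str], envelope: dict) -> Tuple[str, dict]:
    """Pure function: (current artifact, envelope) -> (updated artifact, handle)."""
    if envelope["type"] == "synthesize":
        updated = envelope["content"]  # full content replaces any prior state
    elif envelope["type"] == "edit":
        if artifact is None:
            raise ValueError("an edit envelope needs an existing artifact")
        updated = artifact
        for change in envelope["changes"]:
            updated = _replace_target(updated, change["id"], change["content"])
    else:
        raise ValueError(f"unsupported envelope type: {envelope['type']}")
    data = updated.encode("utf-8")
    # Hypothetical handle: a small reference the orchestrator can hold.
    handle = {
        "type": "handle",
        "sha256": hashlib.sha256(data).hexdigest(),
        "bytes": len(data),
    }
    return updated, handle


base, h0 = apply(None, {
    "type": "synthesize",
    "content": '<h1><gap:target id="title">Draft</gap:target></h1>',
})
edited, h1 = apply(base, {
    "type": "edit",
    "changes": [{"id": "title", "content": "Final"}],
})
```

Because `apply` takes the current artifact as an argument and returns the new one, it keeps no state between calls, which is what makes this kind of engine easy to embed in different hosts.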
Apply engine
The core of the library is a single stateless function:
| Envelope | Direction | Description |
|---|---|---|
| `synthesize` | input | Complete artifact content (baseline or reset) with `<gap:target>` markers |
| `edit` | input | Targeted changes via ID (`<gap:target>` markers) or JSON Pointer |
| `handle` | output | Lightweight reference returned after every `synthesize` or `edit` |
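For structured artifacts such as JSON, the table mentions JSON Pointer targeting as an alternative to ID markers. The sketch below shows how a single pointer-based replace could be resolved, following the RFC 6901 escaping rules. The function names are hypothetical, the GAP edit schema itself is not shown here, and replacing the document root (the empty pointer) is deliberately left out.

```python
import json
from typing import Any, Tuple


def resolve_parent(doc: Any, pointer: str) -> Tuple[Any, Any]:
    """Return (container, key) addressed by an RFC 6901 JSON Pointer."""
    if not pointer.startswith("/"):
        raise ValueError("pointer must start with '/' (root replace not supported)")
    # RFC 6901: unescape '~1' to '/' first, then '~0' to '~'.
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]
    node = doc
    for token in tokens[:-1]:
        node = node[int(token)] if isinstance(node, list) else node[token]
    last = tokens[-1]
    return node, (int(last) if isinstance(node, list) else last)


def apply_pointer_edit(artifact_json: str, pointer: str, value: Any) -> str:
    """Replace the value at `pointer` and return the updated JSON text."""
    doc = json.loads(artifact_json)
    container, key = resolve_parent(doc, pointer)
    # Refuse to create new members: the pointer must address an existing value.
    if isinstance(container, list):
        if not 0 <= key < len(container):
            raise IndexError(f"index out of range: {pointer}")
    elif key not in container:
        raise KeyError(f"no such member: {pointer}")
    container[key] = value
    return json.dumps(doc, indent=2)


artifact = '{"stats": {"users": 10, "tags": ["a", "b"]}, "a/b": 1}'
updated = apply_pointer_edit(artifact, "/stats/users", 11)
updated = apply_pointer_edit(updated, "/stats/tags/1", "z")
updated = apply_pointer_edit(updated, "/a~1b", 2)
```

Failing on a pointer that does not resolve, rather than guessing, mirrors the motivation stated in the feature list: the target is either an exact address or an error, never a fuzzy text match.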
Recipes
| Recipe | Description |
|---|---|
| `just build` | Compile the Rust library |
| `just test` | Run Rust unit tests |
| `just bench` | Criterion micro-benchmarks (apply engine speed) |
| `just bind` | Sync workspace: build FFI via maturin + Python packages |
| `just run [count] [model] [id] [provider]` | Run conversation benchmark experiments (base vs GAP flows) |
| `just report` | Generate markdown report from experiment metrics |
Cost model
GAP saves tokens by replacing full artifact regeneration with small diff envelopes. The savings vary with the model's tokenizer, output/input price ratio, and whether a cheaper model handles diffs. See the full derivation in the spec.
The maintain context reads the full artifact ($S$ input tokens) and produces an edit envelope ($d$ output tokens, where $d$ is typically 1–5% of $S$). The apply engine resolves the edit at zero token cost (CPU, ~2μs).
- Output token reduction: $d$ instead of $S$ per edit (95–99% fewer output tokens)
- Context flattening: each edit reads only the current artifact ($S$), not all prior versions ($k \cdot S$ at edit $k$)
- Model asymmetry: the maintain context can use a cheaper model, multiplying savings further
Example (2,000-token artifact, 30-token edit, $r = p_{\text{out}}/p_{\text{in}} = 4\text{x}$):
| After $N$ edits | Naive conversation | GAP | Total savings |
|---|---|---|---|
| 1 | $0.071 | $0.039 | 45% |
| 5 | $0.304 | $0.070 | 77% |
| 10 | $0.763 | $0.107 | 86% |
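The shape of these savings can be explored with a simplified model. The sketch below is an assumption-laden approximation, not the spec's derivation: it charges the naive flow $k \cdot S$ input tokens plus $S$ output tokens at edit $k$, charges GAP $S$ input tokens plus $d$ output tokens per edit, counts one initial full generation in both flows, and uses arbitrary price units because the savings ratio depends only on $r$. The table follows the spec's full derivation, so its figures need not match this sketch exactly.

```python
def naive_cost(n_edits: int, S: int, p_in: float, p_out: float) -> float:
    """Full regeneration: edit k re-reads k prior versions and re-emits S tokens."""
    total = p_out * S  # initial generation of the artifact
    for k in range(1, n_edits + 1):
        total += p_in * k * S + p_out * S
    return total


def gap_cost(n_edits: int, S: int, d: int, p_in: float, p_out: float) -> float:
    """GAP: each edit reads only the current artifact and emits a d-token envelope."""
    total = p_out * S  # initial synthesize envelope
    for _ in range(n_edits):
        total += p_in * S + p_out * d
    return total


def savings(n_edits: int, S: int = 2000, d: int = 30,
            p_in: float = 1.0, r: float = 4.0) -> float:
    """Fraction of total cost saved by GAP relative to the naive flow."""
    p_out = r * p_in
    return 1 - gap_cost(n_edits, S, d, p_in, p_out) / naive_cost(n_edits, S, p_in, p_out)


for n in (1, 5, 10):
    print(f"N={n}: savings {savings(n):.1%}")
```

Two effects show up separately in this model: the per-edit output drops from $S$ to $d$, and the naive input term grows with $k$ while the GAP input term stays flat, which is why the savings increase with the number of edits.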
Payload benchmarks
Payload size and apply time for each envelope type, measured against an 8 KB HTML dashboard fixture.
Note: "Payload savings" measures byte reduction — a proxy for output token reduction but not identical (tokenizers vary). See cost model for the full derivation.
| Envelope | Scenario | Payload | % of Full | Payload savings | Apply Time |
|---|---|---|---|---|---|
| synthesize | Full generation (baseline) | 8,164 B | 100.0% | — | 1 ns |
| edit | 1 value replace (ID targeting) | 12 B | 0.1% | 99.9% | 1.5 µs |
| edit | 4 value replaces (ID targeting) | 50 B | 0.6% | 99.4% | 3.5 µs |
| edit | 1 section replace (ID targeting) | 441 B | 5.4% | 94.6% | 1.4 µs |
| edit | 2 section replaces (ID targeting) | 516 B | 6.3% | 93.7% | 3.8 µs |
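The "% of Full" and "Payload savings" columns are plain byte ratios against the 8,164 B baseline. As a quick check, the following sketch recomputes them from the payload sizes in the table; the function name is illustrative.

```python
FULL_BYTES = 8164  # synthesize baseline from the table above

# Edit payload sizes in bytes, copied from the table.
PAYLOADS = {
    "1 value replace": 12,
    "4 value replaces": 50,
    "1 section replace": 441,
    "2 section replaces": 516,
}


def payload_stats(payload_bytes: int, full_bytes: int = FULL_BYTES):
    """Return (% of full payload, % savings), each rounded to one decimal."""
    pct_of_full = 100 * payload_bytes / full_bytes
    return round(pct_of_full, 1), round(100 - pct_of_full, 1)


for scenario, size in PAYLOADS.items():
    pct, saved = payload_stats(size)
    print(f"{scenario}: {pct}% of full, {saved}% savings")
```

These are byte ratios only; as the note above says, token savings depend on the tokenizer and will differ somewhat.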
License
This project is dual-licensed:
- Code (`src/`, `evals/`, `benches/`, build files): Apache License 2.0
- Specification & docs (`spec/`, `assets/`, documentation): CC-BY 4.0
See NOTICE for details. Attribution is required under both licenses.