LiteDoc

Deterministic document format for AI agents and LLM output. Explicit block fencing, zero-copy parsing, and error recovery.

Why LiteDoc?

Markdown is ambiguous: indentation rules vary, edge cases abound, and parsers disagree.[1] LiteDoc uses explicit ::block fencing, so parsing is deterministic and recovers gracefully from malformed input. That makes it a good fit for machine-generated content in LLM pipelines, where output parsing can fail when formatting is off.[2]

Status

v0.1.0 - Initial release with Rust parser and CLI. APIs are intended to be stable within v0.1 but may evolve.

Stability & Compatibility

LiteDoc follows semantic versioning. For the v0.1 line, we will not make breaking changes to the core format or public APIs without a version bump and a migration note in the changelog.

Document	Description
LITEDOC_SPEC.md	Language specification
LITEDOC_AST.md	AST reference

Performance

Metric	LiteDoc	Markdown	Improvement
Parse speed	5.451 µs	5.660 µs	3.7% faster
Inline parsing	490 ns	1.313 µs	63% faster
Error recovery	0.89	0.67	33% better
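The Improvement column can be sanity-checked with a few lines of arithmetic. The sketch below assumes the column is a relative difference against the Markdown baseline (time saved for the two timing rows, score gained for the error-recovery row); the README does not state the formula, so the function names and the formula itself are illustrative assumptions.

```python
def percent_faster(litedoc_ns: float, markdown_ns: float) -> float:
    """Relative time saved versus the Markdown baseline, in percent."""
    return (markdown_ns - litedoc_ns) / markdown_ns * 100


def percent_better(litedoc_score: float, markdown_score: float) -> float:
    """Relative score gain versus the Markdown baseline, in percent."""
    return (litedoc_score - markdown_score) / markdown_score * 100


# Timings converted to nanoseconds so both operands share a unit.
print(round(percent_faster(5451, 5660), 1))  # parse speed    -> 3.7
print(round(percent_faster(490, 1313)))      # inline parsing -> 63
print(round(percent_better(0.89, 0.67)))     # error recovery -> 33
```

Under that assumption, all three rows reproduce the published percentages.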

Install

cargo add litedoc-core        # Rust library
pip install litedoc-py        # Python library
cargo install litedoc-cli     # CLI tool

Usage

Rust

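use litedoc_core::{Block, Parser, Profile};

// `input` holds the LiteDoc source text; `process_list` and `process_table`
// stand in for your own handlers.
let mut parser = Parser::new(Profile::Litedoc);
let result = parser.parse_with_recovery(input);

// Walk the top-level blocks and dispatch on the kinds you care about.
for block in &result.document.blocks {
    match block {
        Block::List(list) => process_list(list),
        Block::Table(table) => process_table(table),
        _ => {} // ignore every other block kind
    }
}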

Python

import pyld

doc = pyld.parse("# Hello\n\nWorld")
for block in doc.blocks:
    match block:
        case pyld.Heading(level=level):
            print(f"H{level}")
        case pyld.Paragraph(content=content):
            print(content)

CLI

ldcli agent_output.ld            # Parse and display structure
ldcli -j agent_output.ld         # Output as JSON
ldcli validate agent_output.ld   # Check for errors
ldcli stats agent_output.ld      # Show statistics

Format

::list
- First item
- Second item
::

::quote
Quoted text
::

::table
| A | B |
|---|---|
| 1 | 2 |
::
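To illustrate why explicit fencing makes recovery straightforward, here is a minimal Python sketch of a scanner for the ::name ... :: blocks shown above. It is not the LiteDoc parser and does not use its API: the `FencedBlock` type, the `scan_blocks` function, and the recovery policy (a new opener or end of input implicitly ends an unclosed block) are assumptions made for illustration, and the grammar is inferred only from the examples in this section.

```python
from dataclasses import dataclass, field


@dataclass
class FencedBlock:
    kind: str                                  # e.g. "list", "quote", "table"
    lines: list[str] = field(default_factory=list)
    closed: bool = True                        # False if the closing "::" was missing


def scan_blocks(text: str) -> list[FencedBlock]:
    """Collect `::kind` ... `::` blocks; text outside any fence is skipped."""
    blocks: list[FencedBlock] = []
    current = None  # the block being filled, or None when outside a fence
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "::" and current is not None:
            blocks.append(current)             # normal close
            current = None
        elif stripped.startswith("::") and len(stripped) > 2:
            if current is not None:
                # Assumed recovery policy: a new opener implicitly ends the
                # unclosed block instead of aborting the whole parse.
                current.closed = False
                blocks.append(current)
            current = FencedBlock(kind=stripped[2:].strip())
        elif current is not None:
            current.lines.append(line)
    if current is not None:                    # input ended inside a block
        current.closed = False
        blocks.append(current)
    return blocks


# The quote block below is never closed, yet both blocks are still returned.
sample = "::list\n- First item\n- Second item\n::\n\n::quote\nQuoted text\n"
for block in scan_blocks(sample):
    print(block.kind, block.lines, block.closed)
```

Because every block announces its kind and its end on dedicated lines, the scanner never has to guess from indentation, and a missing closer damages at most the block it belongs to.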

Metadata:

--- meta ---
agent: summarizer-v2
task_id: abc123
timestamp: 1704067200
confidence: 0.92
tags: [summary, final]
---
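A metadata block like the one above can be read with a simple line-oriented pass. The following Python sketch is hypothetical: `parse_meta` and `_coerce` are illustrative names rather than part of any LiteDoc library, and the value rules (bracketed comma-separated lists, integers, floats, otherwise plain strings) are assumptions inferred from this one example, not taken from the specification.

```python
def _coerce(raw: str) -> object:
    """Best-effort typing of a value: list, int, float, else plain string."""
    if raw.startswith("[") and raw.endswith("]"):
        return [item.strip() for item in raw[1:-1].split(",") if item.strip()]
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_meta(text: str) -> dict[str, object]:
    """Parse the first `--- meta ---` ... `---` block into a dict."""
    meta: dict[str, object] = {}
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if not inside:
            inside = stripped == "--- meta ---"   # wait for the opening line
            continue
        if stripped == "---":                     # closing line ends the block
            break
        key, sep, raw = stripped.partition(":")
        if not sep:
            continue  # skip lines without "key: value" rather than failing
        meta[key.strip()] = _coerce(raw.strip())
    return meta


sample = """--- meta ---
agent: summarizer-v2
task_id: abc123
timestamp: 1704067200
confidence: 0.92
tags: [summary, final]
---
"""
print(parse_meta(sample))
```

Lines that do not match the key: value shape are skipped rather than raising, in keeping with the format's emphasis on recovering from malformed input.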

Benchmarks

cargo bench -p litedoc-core

cargo test -p litedoc-core robustness_report -- --nocapture

CSV output:

ROBUSTNESS_CSV=1 cargo test -p litedoc-core robustness_report -- --nocapture
ROBUSTNESS_BENCH_CSV=1 cargo bench -p litedoc-core robustness_benchmark -- --nocapture

References

[1] CommonMark Spec, “Why is a spec needed?” (notes original Markdown syntax is not unambiguous and implementations diverged). https://spec.commonmark.org/0.31.2/

[2] LangChain docs: OUTPUT_PARSING_FAILURE (example of JSON-in-Markdown parsing failures). https://docs.langchain.com/oss/python/langchain/errors/OUTPUT_PARSING_FAILURE

License

Apache-2.0
