luadata 0.1.6

Parse Lua data files and convert to JSON
Documentation
# Architecture

luadata is a Lua data parser written in Rust, with bindings for Go, Python,
WebAssembly, and a standalone CLI. This document describes the architecture and
the reasoning behind it.

## Overview

```
Rust core library (src/)
├── cdylib (clib/)          → C shared library (.so/.dylib/.dll)
│   └── exports: LuaDataToJSON, LuaDataFree
├── PyO3 module (python/)   → native Python extension
├── wasm-bindgen (wasm/)    → WebAssembly module (~124KB)
└── CLI binary (cli/)       → luadata tojson / validate

Go wrapper (go/)
├── Uses purego to load the Rust cdylib at runtime (no CGO)
├── Embeds platform-specific shared libs via go:embed + build tags
└── Exposes TextToJSON, FileToJSON, ToJSON, ReaderToJSON
```

All language bindings share the same Rust parser — there is exactly one
implementation of the lexer, parser, and JSON converter.

## Why Rust as source of truth

Before the rewrite, luadata was a pure Go library. Go served the core use case
well, but adding Python and WebAssembly support exposed limitations:

- **WASM size**: Go's WASM binary was ~3.2MB (runtime overhead). The Rust WASM
  module is ~124KB — roughly 25x smaller.
- **Python integration**: The Go approach required building a CGO shared library
  and loading it via ctypes, with manual JSON-envelope marshalling. Rust/PyO3
  compiles directly to a native Python extension with native types.
- **Single parser**: Maintaining one parser instead of porting logic across
  languages eliminates behavioral drift between platforms.
- **Go compatibility**: The main objection to Rust was that Go consumers would
  need CGO. With [purego]https://github.com/ebitengine/purego, Go loads the
  Rust shared library at runtime with `CGO_ENABLED=0` — no C toolchain required.

## Repository structure

```
luadata/
├── src/                    Rust core library
│   ├── lib.rs              Module exports
│   ├── lexer.rs            Hand-written character-by-character lexer
│   ├── parser.rs           Recursive descent parser
│   ├── converter.rs        Lua → JSON conversion
│   ├── options.rs          ParseConfig, ArrayMode, EmptyTableMode, StringTransform
│   └── types.rs            Key, Value, KeyValuePairs, RawValue, RawKey
├── cli/                    CLI binary (clap)
│   └── src/main.rs         tojson + validate subcommands
├── clib/                   C shared library (cdylib)
│   └── src/lib.rs          LuaDataToJSON / LuaDataFree FFI exports
├── python/                 PyO3 Python module
│   ├── src/lib.rs          lua_to_json / lua_to_dict
│   ├── pyproject.toml      Package: mmobeus-luadata
│   └── tests/              pytest suite
├── wasm/                   wasm-bindgen module
│   └── src/lib.rs          convertLuaDataToJson (JS name)
├── npm/                    npm package (mmobeus-luadata)
│   ├── package.json        Package metadata
│   ├── index.js            ES module wrapper: init() + convert()
│   ├── index.d.ts          TypeScript type definitions
│   └── wasm/               (built at release time, gitignored)
├── go/                     Pure Go wrapper (no CGO)
│   ├── luadata.go          Public API: TextToJSON, FileToJSON, etc.
│   ├── options.go          Functional options: WithArrayMode, etc.
│   ├── luadata_test.go     Go test suite
│   └── internal/ffi/       purego FFI bridge
│       ├── ffi.go          Call(), init, library loading
│       ├── ffi_unix.go     purego.Dlopen (darwin/linux)
│       ├── ffi_windows.go  syscall.LoadLibrary
│       └── embed_dev.go    Empty embed (dev); replaced on release branch
├── web/                    Browser interface
│   ├── index.html          Live converter UI
│   ├── luadata.js          ES module wrapper: init() + convert()
│   ├── app.js              UI logic
│   └── docs/gen/           "Luadata by Example" generator
├── testdata/               Shared test fixtures
│   ├── valid/              .lua files that must parse
│   └── invalid/            .lua files that must fail
├── scripts/
│   ├── release.sh          Tag RC on main
│   ├── prepare-release.sh  Stage libs + generate embeds on release branch
│   └── validate-folder.sh  Validate testdata against CLI
├── Cargo.toml              Workspace root
└── Makefile                Build, test, lint, release targets
```

## Release workflow

Development happens on `main`. The `go/internal/ffi/lib/` directory is
gitignored — local development uses `make build-clib` to populate it, or sets
`LUADATA_LIB_PATH` to point at a locally-built shared library.

### Cutting a release

1. **Tag RC on main**: `make release` (or `make release BUMP=minor`, etc.) tags
   the current commit as `v<version>-rc.1` and pushes the tag.

2. **CI cross-compiles**: The `release.yml` workflow triggers on `v*-rc.*` tags.
   It builds the Rust cdylib for five platforms:
   - `darwin_amd64`, `darwin_arm64`
   - `linux_amd64`, `linux_arm64`
   - `windows_amd64`

3. **CI tests**: Rust tests and Go tests (using the freshly-built Linux library)
   run in the same workflow.

4. **Prepare release branch**: `scripts/prepare-release.sh` copies the
   cross-compiled shared libraries into `go/internal/ffi/lib/<platform>/`,
   generates platform-specific `go:embed` files (replacing `embed_dev.go`), and
   commits everything to the `release` branch with the final version tag.

5. **GitHub Release**: CI creates a GitHub Release with shared library artifacts.

6. **Publish to registries**: After the release succeeds, three publish jobs run
   in parallel — all using OIDC trusted publishing (no stored secrets):
   - **PyPI**: Builds platform-specific wheels (same five platforms as clib) plus
     an sdist, then publishes `mmobeus-luadata` via `pypa/gh-action-pypi-publish`.
   - **npm**: Builds the WASM module with wasm-pack, packages it with the JS/TS
     wrapper from `npm/`, and publishes `mmobeus-luadata`.
   - **crates.io**: Publishes the `luadata` core crate via
     `rust-lang/crates-io-auth-action` for OIDC token exchange.

### Go consumer model

Go consumers install with:

```
go get github.com/mmobeus/luadata/go@v0.5.0
```

This resolves to the `release` branch commit, which contains the embedded shared
libraries. At runtime, the Go wrapper extracts the platform-appropriate library
to a temp file and loads it via purego. No CGO is involved at any point.

On `main`, `embed_dev.go` provides an empty `EmbeddedLib` byte slice. If
`LUADATA_LIB_PATH` is set, the FFI layer loads the library from that path
instead — this is the local development path.

## Key design decisions

### purego FFI

The Go wrapper uses [purego](https://github.com/ebitengine/purego) to call the
Rust shared library. purego uses assembly trampolines to make C function calls
from pure Go — no CGO, no C compiler, no `CGO_ENABLED=1`. This means:

- `go get` works without a C toolchain
- Cross-compilation works (libraries are pre-built per platform)
- The Go module is a normal Go package from the consumer's perspective

### internal/ffi for platform isolation

Platform-specific code (library loading, embed directives) lives in
`go/internal/ffi/` behind build tags. The public API in `go/luadata.go` is
platform-agnostic and delegates to `ffi.Call()`.

### runtime.KeepAlive for GC safety

When passing Go byte slices to the Rust FFI as C pointers, `runtime.KeepAlive`
ensures the Go garbage collector doesn't collect the backing memory while Rust
is reading it.

### Null-terminated C strings for UTF-8 correctness

The FFI boundary uses null-terminated C strings (`*const c_char` on the Rust
side). The Go wrapper converts Go strings to null-terminated byte slices before
passing them. This avoids length-prefix mismatches and ensures Rust's `CStr`
can safely interpret the data as UTF-8.

### JSON envelope for FFI results

The clib returns results as a JSON envelope: `{"result":"..."}` on success,
`{"error":"..."}` on failure. This keeps the FFI surface minimal (two functions:
`LuaDataToJSON` and `LuaDataFree`) while supporting structured error reporting.