geno 0.5.0

A cross-language schema compiler that generates type definitions and serialization code from a simple, declarative schema language.
Documentation
# Geno

[![coverage](https://shields.io/endpoint?url=https://raw.githubusercontent.com/jlyonsmith/geno/main/coverage.json)](https://github.com/jlyonsmith/geno/blob/main/coverage.json)
[![Crates.io](https://img.shields.io/crates/v/geno.svg)](https://crates.io/crates/geno)
[![Docs.rs](https://docs.rs/geno/badge.svg)](https://docs.rs/geno)

A cross-language schema compiler that generates type definitions and serialization code from a simple, declarative schema language.

Define your data types once in a `.geno` file, then generate idiomatic code for multiple target languages.

The name **geno** comes from the word **genome**, the set of genetic instructions containing all information needed for an organism to develop, function, and reproduce.

> This project is still in development. In particular, the schema language is not yet stable. Please feel free to contribute!

## Why?

There are several existing packing protocols with schema languages. In particular:

- [FlatBuffers]https://flatbuffers.dev/
- [Cap'n Proto]https://capnproto.org/
- [Protocol Buffers]https://protobuf.dev/ (also referred to as ProtoBuf)

 While they all generally have excellent programming language support, the schema languages for each are understandably tied to the underlying packing algorithms, and can be a little quirky. They also have some gaps. For example, it seems that most of these protocols existed before nullable types were standard across programming languages.

For my projects, I have found that the [MessagePack](https://msgpack.org/) protocol is actually the easiest packing protocol to work with across languages, even if it is a little slower than the others. It's easy to integrate, perhaps because it is the closest to JSON, and JSON is still the most universal serialization format on the Internet.

I originally built Geno as a schema language for MessagePack. But I realized tha Geno could easily supports other formats, such as JSON, YAML, TOML and [TOON](https://github.com/toon-format/toon) and so on. You could even use it to generate schemas for any of the above protocols.  So really, Geno is a schema definition language that easily supports combination of modern language and packing protocol.

Finally, I designed the AST for Geno to be a simple as possible, which makes it easy for Claude Code and other AI's to comprehend in a small number of tokens.  This ought to make it easy to create generators for your programming language and packing protocol of choice.

## Architecture

Geno uses a multi-process pipeline. The main `geno` binary parses the schema and serializes the AST to MessagePack. It then pipes those bytes to a code generator binary (`geno-<format>`) via stdin, which writes generated source code to stdout.

```
.geno file ──► geno (parser + validator) ──► MessagePack AST ──► geno-<format> ──► source code
```

## Schema Language

The recommended extension for Geno files is `.geno`.  Geno schemas consist of a single `meta` section followed by any number of `enum` and `struct` declarations.  Schemas can be nested using the `include` statement.  For example, you could have a file called `common.geno`:

```geno
#[format = 1]

enum fruit: i16 {
    apple = 1,
    orange = 2,
    kiwiFruit = 3,
    pear, // auto-incremented to 4
}
```

And another file in the same directory called `order.geno`:

```geno
#[format = 1]

include "./common.geno"

struct order {
    id: u64,
    name: string,
    quantity: i32,
    price: f64,
    fruit: fruit,
    tags: [string],
    metadata: {string: string},
    notes: string?,       // nullable
    items: [order; 10],   // fixed-length array
}
```

Whether nesting is preserved in the generated code is dependent on the generator implementation; the AST structures track the nesting.

### Metadata

One metadata value is required to define the schema format being used:

| Key      | Values | Description |
|----------|--------|-------------|
| `format` | `1`    | This is the only supported format value at present |

Metadata can contain any other values you want.  Use the `ast::Schema` struct to access the values in your code generator.

### Types

| Category | Types |
|----------|-------|
| Integers | `i8`, `u8`, `i16`, `u16`, `i32`, `u32`, `i64`, `u64` |
| Floats | `f32`, `f64` |
| Other | `string`, `bool` |
| Arrays | `[T]` variable-length, `[T; N]` fixed-length |
| Maps | `{K: V}` where `K` is a builtin type |
| Nullable | Append `?` to any type |
| User-defined | Reference any declared enum or struct by name |

### Enums

Enums have an optional integer base type (which defaults to `i32`). Variant values must be given explicitly and there cannot be variants with the same  value.

```
enum color: u8 {
    red = 1,
    green = 2,
    blue = 3,
}
```

Integer literals support decimal, hex (`0xFF`), and binary (`0b1010`) notation.

### Comments

Single-line comments with `//` are supported.

## Code Generators

Geno comes with some built-in generators for several language/encoding formats and serve as examples of how to write your own generator:

| Format | Binary | Description |
|--------|--------|-------------|
| `rust-serde` | `geno-rust-serde` | Rust structs/enums with `Serialize`/`Deserialize` derives |
| `dart-mp` | `geno-dart-mp` | Dart classes/enums with MessagePack `toBytes`/`fromBytes` serialization |
| `dart-json` | `geno-dart-json` | Dart classes/enums with `json_annotation` and `json_serializable` support |

### Rust and Serde

The binary `geno-rust-serde` generates code that:

- Derives `Debug`, `Clone`, `PartialEq`, `Serialize`, `Deserialize`
- Converts type names to `PascalCase` and field names to `snake_case`
- Adds `#[serde(rename = "...")]` when names are converted
- Maps arrays to `Vec<T>` or `[T; N]`, maps to `HashMap<K, V>`, nullable to `Option<T>`

### Dart and MessagePack

The binary `geno-dart-mp` generates code that:

- Generates classes with `final` fields and constructors with `required` named arguments
- Converts type names to `PascalCase` and field/variant names to `lowerCamelCase`
- Generates `toBytes()` and `static fromBytes()` methods using the [`messagepack`]https://pub.dev/packages/messagepack package
- Handles nested structures, nullable types, lists, and maps
- All Dart integer types map to `int`, floats to `double`

### Dart and JSON

The binary `geno-dart-json` generates code that:

- Generates classes with `json_annotation`
- Converts type names to `PascalCase` and field/variant names to `lowerCamelCase`
- Generates classes with `fromJson` and `toJson` methods to support `json_serializable` codegen
- Handles nested structures, nullable types, lists, and maps
- All Dart integer types map to `int`, floats to `double`

### Command Line

```
Geno is a schema compiler for generating source code from a schema definition.

Usage: geno [OPTIONS] <INPUT_FILE> [EXTRA_ARGS]...

Arguments:
  <INPUT_FILE>
          Input .geno file
  [EXTRA_ARGS]...

Options:
  -o, --output-path <OUTPUT_FILE>
          Output file path for the generated source code, or STDOUT if not provided
  -t, --ast-path <AST_FILE>
          Intermediate AST file path for debugging. Program will write the AST to this file in MessagePack format then exit
  -f, --format <FORMAT>
          Output source code format (e.g. -f dart-json or -f rust-rmp)
  -h, --help
          Print help (see a summary with '-h')
  -V, --version
          Print version```

Note that can pass arguments to the generators by adding `--` at the end of the command line.

Set `GENO_DEBUG=1` to invoke code generators via `cargo run` instead of looking for installed binaries on `PATH`.

```bash
GENO_DEBUG=1 geno schema.geno -f rust-serde
```

See the `GenoError` enumeration in the documentation for the list of errors that the parser/validator looks for.

Code generators are standalone binaries that read a MessagePack-encoded `Schema` from stdin. This makes it straightforward to add new target languages without modifying the core parser.

### Example Usage

```bash
# Generate Rust code to stdout
geno schema.geno -f rust-serde

# Generate Dart code to a file
geno schema.geno -f dart-mp -o lib/generated.dart

# Dump the intermediate AST for debugging
geno schema.geno -t schema.ast
```

## Building

Building the `geno` core requires the Rust toolchain.  Generators can be written in any language, and just need to conform to the `geno-` prefix naming convention and be in the path to be used.

```bash
# Build all binaries
cargo build --release

# Install to ~/.cargo/bin
cargo install --path .
```