Geno
A cross-language schema compiler that generates type definitions and serialization code from a simple, declarative schema language.
Define your data types once in a .geno file, then generate idiomatic code for multiple target languages.
The name geno comes from the word genome, the set of genetic instructions containing all information needed for an organism to develop, function, and reproduce.
This project is still in development. In particular, the schema language is not yet stable. Please feel free to contribute!
Why?
There are several existing packing protocols with schema languages. In particular:
- FlatBuffers
- Cap'n Proto
- Protocol Buffers (also referred to as ProtoBuf)
While they all generally have excellent programming language support, the schema languages for each are understandably tied to the underlying packing algorithms, and can be a little quirky. They also have some gaps. For example, it seems that most of these protocols existed before nullable types were standard across programming languages.
For my projects, I have found that the MessagePack protocol is actually the easiest packing protocol to work with across languages, even if it is a little slower than the others. It's easy to integrate, perhaps because it is the closest to JSON, and JSON is still the most universal serialization format on the Internet.
I originally built Geno as a schema language for MessagePack. But I realized that Geno could easily support other formats, such as JSON, YAML, TOML, and TOON. You could even use it to generate schemas for any of the above protocols. So really, Geno is a schema definition language that easily supports any combination of modern language and packing protocol.
Finally, I designed the AST for Geno to be as simple as possible, which makes it easy for Claude Code and other AIs to comprehend in a small number of tokens. This ought to make it easy to create generators for your programming language and packing protocol of choice.
Architecture
Geno uses a multi-process pipeline. The main geno binary parses the schema and serializes the AST to MessagePack. It then pipes those bytes to a code generator binary (geno-<format>) via stdin, which writes generated source code to stdout.
.geno file ──► geno (parser + validator) ──► MessagePack AST ──► geno-<format> ──► source code
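The generator side of this pipeline can be sketched in a few lines of Rust. This is a minimal illustration, not the code of any built-in generator: the function name `run_generator` is hypothetical, and the MessagePack decoding step is left out (a real generator would hand the bytes to a decoder and walk the resulting schema).

```rust
use std::io::{self, Read, Write};

/// Hypothetical generator skeleton: read the serialized AST from `input`
/// and write generated source code to `output`.
/// Decoding is intentionally omitted; this sketch only reports how many
/// bytes it received so the stdin-to-stdout contract is visible.
fn run_generator<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut ast_bytes = Vec::new();
    input.read_to_end(&mut ast_bytes)?;
    writeln!(
        output,
        "// generated from {} bytes of MessagePack AST",
        ast_bytes.len()
    )?;
    output.flush()
}

fn main() -> io::Result<()> {
    // geno pipes the AST to stdin and collects the source from stdout.
    let stdin = io::stdin();
    let stdout = io::stdout();
    run_generator(stdin.lock(), stdout.lock())
}
```

Keeping the reader and writer generic makes the generator easy to exercise in tests with in-memory buffers instead of real pipes.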
Schema Language
The recommended extension for Geno files is .geno. Geno schemas consist of a single meta section followed by any number of enum and struct declarations. Schemas can be nested using the include statement. For example, you could have a file called common.geno:
#[format = 1]
enum fruit: i16 {
apple = 1,
orange = 2,
kiwiFruit = 3,
pear, // auto-incremented to 4
}
And another file in the same directory called order.geno:
#[format = 1]
include "./common.geno"
struct order {
id: u64,
name: string,
quantity: i32,
price: f64,
fruit: fruit,
tags: [string],
metadata: {string: string},
notes: string?, // nullable
items: [order; 10], // fixed-length array
}
Whether nesting is preserved in the generated code is dependent on the generator implementation; the AST structures track the nesting.
Metadata
One metadata value is required to define the schema format being used:
| Key | Values | Description |
|---|---|---|
| format | 1 | This is the only supported format value at present |
Metadata can contain any other values you want. Use the ast::Schema struct to access the values in your code generator.
Types
| Category | Types |
|---|---|
| Integers | i8, u8, i16, u16, i32, u32, i64, u64 |
| Floats | f32, f64 |
| Other | string, bool |
| Arrays | [T] variable-length, [T; N] fixed-length |
| Maps | {K: V} where K is a builtin type |
| Nullable | Append ? to any type |
| User-defined | Reference any declared enum or struct by name |
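To make the type grammar concrete, here is one way the categories in the table could be modeled as a recursive Rust enum. The `TypeRef` type and `to_schema_syntax` function are hypothetical and are not Geno's actual AST; the sketch only shows that arrays, maps, and nullable types nest freely and can be rendered back into the schema syntax shown above.

```rust
/// Hypothetical model of a Geno type reference (not the project's real AST).
#[derive(Debug, Clone, PartialEq)]
enum TypeRef {
    Builtin(&'static str),           // i8 ... u64, f32, f64, string, bool
    Array(Box<TypeRef>),             // [T]
    FixedArray(Box<TypeRef>, usize), // [T; N]
    Map(Box<TypeRef>, Box<TypeRef>), // {K: V}
    Nullable(Box<TypeRef>),          // T?
    User(String),                    // a declared enum or struct
}

/// Renders a type reference back into schema syntax.
fn to_schema_syntax(ty: &TypeRef) -> String {
    match ty {
        TypeRef::Builtin(name) => (*name).to_string(),
        TypeRef::Array(item) => format!("[{}]", to_schema_syntax(item)),
        TypeRef::FixedArray(item, len) => format!("[{}; {}]", to_schema_syntax(item), len),
        TypeRef::Map(key, value) => {
            format!("{{{}: {}}}", to_schema_syntax(key), to_schema_syntax(value))
        }
        TypeRef::Nullable(inner) => format!("{}?", to_schema_syntax(inner)),
        TypeRef::User(name) => name.clone(),
    }
}

fn main() {
    // Field types taken from the order.geno example.
    let metadata = TypeRef::Map(
        Box::new(TypeRef::Builtin("string")),
        Box::new(TypeRef::Builtin("string")),
    );
    let notes = TypeRef::Nullable(Box::new(TypeRef::Builtin("string")));
    let items = TypeRef::FixedArray(Box::new(TypeRef::User("order".to_string())), 10);
    let tags = TypeRef::Array(Box::new(TypeRef::Builtin("string")));
    for ty in [&metadata, &notes, &items, &tags] {
        println!("{}", to_schema_syntax(ty));
    }
}
```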
Enums
Enums have an optional integer base type (which defaults to i32). Variant values must be given explicitly and there cannot be variants with the same value.
enum color: u8 {
red = 1,
green = 2,
blue = 3,
}
Integer literals support decimal, hex (0xFF), and binary (0b1010) notation.
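As an illustration of the three notations, the following sketch parses a literal using only the Rust standard library. It is not Geno's parser: the function name `parse_int_literal` is hypothetical, it assumes lowercase `0x` and `0b` prefixes, and it uses `i64` for simplicity, so values above `i64::MAX` (which a `u64` field could hold) are out of scope here.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parses an integer literal written in decimal,
/// hex (0x...), or binary (0b...) notation.
fn parse_int_literal(text: &str) -> Result<i64, ParseIntError> {
    if let Some(digits) = text.strip_prefix("0x") {
        i64::from_str_radix(digits, 16)
    } else if let Some(digits) = text.strip_prefix("0b") {
        i64::from_str_radix(digits, 2)
    } else {
        text.parse::<i64>()
    }
}

fn main() {
    for literal in ["42", "0xFF", "0b1010", "0b12"] {
        match parse_int_literal(literal) {
            Ok(value) => println!("{literal} = {value}"),
            Err(err) => println!("{literal} is not a valid literal: {err}"),
        }
    }
}
```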
Comments
Single-line comments with // are supported.
Code Generators
Geno comes with built-in generators for several language/encoding formats. They also serve as examples of how to write your own generator:
| Format | Binary | Description |
|---|---|---|
| rust-serde | geno-rust-serde | Rust structs/enums with Serialize/Deserialize derives |
| dart-mp | geno-dart-mp | Dart classes/enums with MessagePack toBytes/fromBytes serialization |
| dart-json | geno-dart-json | Dart classes/enums with json_annotation and json_serializable support |
Rust and Serde
The binary geno-rust-serde generates code that:
- Derives Debug, Clone, PartialEq, Serialize, Deserialize
- Converts type names to PascalCase and field names to snake_case
- Adds #[serde(rename = "...")] when names are converted
- Maps arrays to Vec<T> or [T; N], maps to HashMap<K, V>, nullable to Option<T>
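The name conversion and rename rules can be sketched as follows. These helpers are hypothetical and deliberately naive (ASCII only, no special handling of acronyms); they are not taken from geno-rust-serde, and only illustrate why a rename attribute is needed once a schema name has been converted.

```rust
/// Naive conversion of a schema field name to snake_case (ASCII only).
fn to_snake_case(name: &str) -> String {
    let mut out = String::new();
    for (i, ch) in name.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}

/// Naive conversion of a schema type name to PascalCase (ASCII only).
fn to_pascal_case(name: &str) -> String {
    let mut out = String::new();
    for part in name.split('_').filter(|p| !p.is_empty()) {
        let mut chars = part.chars();
        if let Some(first) = chars.next() {
            out.push(first.to_ascii_uppercase());
            out.push_str(chars.as_str());
        }
    }
    out
}

/// Returns a serde rename attribute only when conversion changed the name,
/// so the serialized form keeps the name used in the schema.
fn serde_rename_attr(original: &str, converted: &str) -> Option<String> {
    if original == converted {
        None
    } else {
        Some(format!("#[serde(rename = \"{}\")]", original))
    }
}

fn main() {
    let field = "kiwiFruit";
    let converted = to_snake_case(field);
    if let Some(attr) = serde_rename_attr(field, &converted) {
        println!("{attr}");
    }
    println!("pub {converted}: i32,");
    println!("pub struct {} {{ /* ... */ }}", to_pascal_case("order"));
}
```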
Dart and MessagePack
The binary geno-dart-mp generates code that:
- Generates classes with final fields and constructors with required named arguments
- Converts type names to PascalCase and field/variant names to lowerCamelCase
- Generates toBytes() and static fromBytes() methods using the messagepack package
- Handles nested structures, nullable types, lists, and maps
- All Dart integer types map to int, floats to double
Dart and JSON
The binary geno-dart-json generates code that:
- Generates classes with json_annotation
- Converts type names to PascalCase and field/variant names to lowerCamelCase
- Generates classes with fromJson and toJson methods to support json_serializable codegen
- Handles nested structures, nullable types, lists, and maps
- All Dart integer types map to int, floats to double
Command Line
```
Geno is a schema compiler for generating source code from a schema definition.

Usage: geno [OPTIONS] <INPUT_FILE> [EXTRA_ARGS]...

Arguments:
  <INPUT_FILE>
          Input .geno file

  [EXTRA_ARGS]...

Options:
  -o, --output-path <OUTPUT_FILE>
          Output file path for the generated source code, or STDOUT if not provided

  -t, --ast-path <AST_FILE>
          Intermediate AST file path for debugging. Program will write the AST to this file in MessagePack format then exit

  -f, --format <FORMAT>
          Output source code format (e.g. -f dart-json or -f rust-rmp)

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
```
Note that you can pass arguments to the generators by adding `--` at the end of the command line, followed by the generator arguments.
Set `GENO_DEBUG=1` to invoke code generators via `cargo run` instead of looking for installed binaries on `PATH`.
```bash
GENO_DEBUG=1 geno schema.geno -f rust-serde
```
See the GenoError enumeration in the documentation for the list of errors that the parser/validator looks for.
Code generators are standalone binaries that read a MessagePack-encoded Schema from stdin. This makes it straightforward to add new target languages without modifying the core parser.
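Because generators are found by name, locating one amounts to mapping the format to `geno-<format>` and searching the directories on `PATH`. The sketch below shows that lookup with the standard library only. It is an assumption about how such a lookup could work, not Geno's actual implementation; the function names are hypothetical, and platform details such as the `.exe` suffix on Windows are ignored.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Maps a format name such as "dart-mp" to the expected binary name.
fn generator_binary_name(format: &str) -> String {
    format!("geno-{}", format)
}

/// Searches each directory in a PATH-style value for the generator binary
/// and returns the first match, if any.
fn find_generator(format: &str, path_var: &OsStr) -> Option<PathBuf> {
    let binary = generator_binary_name(format);
    env::split_paths(path_var)
        .map(|dir| dir.join(&binary))
        .find(|candidate| candidate.is_file())
}

fn main() {
    let path_var = env::var_os("PATH").unwrap_or_default();
    match find_generator("rust-serde", &path_var) {
        Some(path) => println!("found {}", path.display()),
        None => println!("geno-rust-serde is not on PATH"),
    }
}
```

A generator written in another language works the same way, as long as its executable follows the naming convention and sits in one of the searched directories.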
Example Usage
# Generate Rust code to stdout
# Generate Dart code to a file
# Dump the intermediate AST for debugging
Building
Building the geno core requires the Rust toolchain. Generators can be written in any language; to be used, they only need to follow the geno- prefix naming convention and be on the PATH.
# Build all binaries
# Install to ~/.cargo/bin