phenotyper-cli 0.2.0

CLI for the Phenotyper compiler
<p align="center">
  <img src="static/phenotyper-logo.svg" alt="Phenotyper Logo" width="300">
</p>

# Phenotyper

**Phenotyper** is a domain-specific language and compiler for defining the shape of structured textual artifacts and generating typed tooling that can construct, render, validate, and eventually parse them.

It is designed for outputs that are too structured to be treated as free-form text, but too artifact-shaped to fit naturally into ordinary data schemas alone.

Think:
- structured prompts
- CSV-like tabular text
- reports and semi-formal documents
- code and configuration fragments
- markup and transformation instructions
- other human-readable artifacts with a stable, meaningful shape

Phenotyper aims to make those outputs:
- more predictable
- more reusable
- easier to validate
- easier to generate safely from code
- easier to standardize across LLMs, tools, and time

---

## Why Phenotyper exists

Modern systems often need to produce artifacts that are not just data and not just prose.

A template engine can substitute values into a mostly textual skeleton. A schema language can define the shape of data. A parser generator can recognize input. But many real-world outputs live in the space between those tools.

Phenotyper is built for that middle space.

It lets you define an artifact family directly in a readable DSL, then generate typed Rust APIs that build valid instances of that family. Over time, the same definitions can also drive validation, parsing, and round-tripping.

In that sense, Phenotyper is closer to **"artifact schema + rendering algebra + generated tooling"** than to a plain template system.

---

## Core idea

A Phenotyper source file defines:
- a **structural namespace** (e.g., `aivolution/format/csv:`)
- reusable **types** (unions, enums, type aliases)
- named **phenotypes** with fields and render expressions
- singular/plural phenotype relationships
- **nested phenotypes** with parent-scoped field references
- a constrained structure for building valid artifacts

From that, the compiler can generate:
- typed builders
- renderers
- validators
- dedicated wrapper types for plural companion phenotypes
- Rust enums for named enum types and union-like field choices
- later, parsers and reverse mappings

---

## A first taste

### Markdown-based source form

Phenotyper supports documentation-rich source documents in Markdown. Phenotype code lives in fenced `pht` blocks.

````markdown
# CSV artifact family

This family models a CSV-like format.

```pht
aivolution/format/csv:

type ScalarValue: {int64, real64, string, date, time, datetime};
type Visibility: [public, protected, private];

CSVFieldValue plural CSVFieldValues:
    value: required ScalarValue,
    @(value)
;

CSVLine plural CSVLines:
    values: required CSVFieldValues,
    separator: required string,
    @join(values, separator)
;

.
```
````

### Pure `.pht` source form

```pht
aivolution/format/csv:

// Reusable union-like type
type ScalarValue: {int64, real64, string, date, time, datetime};

/* Closed symbolic enum type */
type Visibility: [public, protected, private];

CSVFieldValue plural CSVFieldValues:
    value: required ScalarValue,
    @(value)
;

.
```

---

## Language highlights

### Structural Namespaces

Phenotyper uses structural namespace declarations with `/` as the
path separator. A declaration opens with the namespace path followed by `:` and closes with a lone `.`:

```pht
aivolution/format/csv:

// ... declarations ...

.
```

This maps naturally to generated Rust modules:

- DSL namespace: `aivolution/format/csv`
- Rust module path: `aivolution::format::csv`
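
The mapping above can be sketched as a small function. This is an illustration only: the function name and the identifier rule it enforces (ASCII letters, digits, and `_`, not starting with a digit) are assumptions, not the compiler's actual implementation.

```rust
/// Convert a DSL namespace such as `aivolution/format/csv` into a Rust
/// module path such as `aivolution::format::csv`.
///
/// Hypothetical helper: returns `None` if any segment is empty or is not a
/// plain ASCII identifier (an assumed rule for this sketch).
pub fn namespace_to_module_path(namespace: &str) -> Option<String> {
    let segments: Vec<&str> = namespace.split('/').collect();
    let valid = segments.iter().all(|segment| {
        let mut chars = segment.chars();
        match chars.next() {
            // The first character must be a letter or underscore.
            Some(first) if first.is_ascii_alphabetic() || first == '_' => {
                chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
            }
            // Empty segments (for example from `a//b`) are rejected.
            _ => false,
        }
    });
    if valid {
        Some(segments.join("::"))
    } else {
        None
    }
}
```

Calling it with `"aivolution/format/csv"` yields `Some("aivolution::format::csv")`, while a path with an empty segment yields `None`.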

### Reusable named types

```pht
type ScalarValue: {int64, real64, string, date, time, datetime};
type CsvText: string;
```

Phenotyper supports reusable type declarations using:

```pht
type Name: TypeExpression;
```

### Enums

Enums are namespace-level named types:

```pht
type Visibility: [public, protected, private];
```

These map naturally to dedicated Rust enums.
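
As a rough sketch of what such an enum might look like on the Rust side, assuming each symbol becomes a variant and renders as the symbol written in the DSL (the `as_str` helper and the `Display` impl are assumptions, not confirmed generator output):

```rust
use std::fmt;

/// Possible Rust counterpart of `type Visibility: [public, protected, private];`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Public,
    Protected,
    Private,
}

impl Visibility {
    /// The symbol exactly as it is spelled in the DSL declaration.
    pub fn as_str(self) -> &'static str {
        match self {
            Visibility::Public => "public",
            Visibility::Protected => "protected",
            Visibility::Private => "private",
        }
    }
}

impl fmt::Display for Visibility {
    /// Rendering an enum value emits its DSL symbol.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}
```

Because the enum is closed, a `match` over it is checked for exhaustiveness by the Rust compiler, so adding a symbol in the DSL would surface every place that needs updating.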

### Singular/plural phenotype declarations

A phenotype can declare its plural companion explicitly:

```pht
CSVLine plural CSVLines:
    values: required CSVFieldValues,
    separator: required string,
    @join(values, separator)
;
```

This makes the DSL more natural to read and allows code generation to preserve semantic collection types rather than collapsing everything into anonymous vectors.

### Nested phenotype declarations

Phenotype bodies can contain other phenotype declarations for modeling
hierarchical structures:

```pht
JavaClass plural JavaClasses:
    name: required string,

    Constructor plural Constructors:
        argList: optional string,
        @(JavaClass/name), "(", @(argList)?, ")"
    ;,

    ctors: required Constructors,
    "class ", @(name), " { ... }"
;
```

Nested types reference parent fields with `@(Parent/field)` and are
flattened into independent Rust structs at compile time.
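
A minimal sketch of what flattening could produce for the `JavaClass` example. The method name `render_with_parent` appears in the status table further down, but its signature here, the public fields, and the use of a plain `Vec<Constructor>` in place of the `Constructors` wrapper are assumptions made for illustration.

```rust
use std::fmt;

/// Flattened form of the nested `Constructor` phenotype: an independent
/// struct with no field of its own pointing back at the parent.
pub struct Constructor {
    pub arg_list: Option<String>,
}

/// Parent phenotype. A plain `Vec` stands in for the plural wrapper type.
pub struct JavaClass {
    pub name: String,
    pub ctors: Vec<Constructor>,
}

impl Constructor {
    /// Assumed lowering of: @(JavaClass/name), "(", @(argList)?, ")"
    /// The parent is passed in explicitly so `JavaClass/name` can be resolved.
    pub fn render_with_parent(&self, parent: &JavaClass, out: &mut String) -> fmt::Result {
        out.push_str(&parent.name);
        out.push('(');
        // `@(argList)?` is assumed to emit nothing when the field is unset.
        if let Some(args) = &self.arg_list {
            out.push_str(args);
        }
        out.push(')');
        Ok(())
    }
}
```

Rendering a constructor with no arguments for a class named `Widget` would append `Widget()` to the output buffer under these assumptions.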

### Render expressions

Phenotyper's output side is expressed through a small, explicit render-expression system.

Supported forms include:

```pht
"literal text"            // verbatim output
@(field)                  // field emission
@(Parent/field)           // parent-scoped field reference
@(optional_field)?        // optional field shorthand
@join(values, ", ")       // join collection with separator
@eol                      // end of line
@ifset(field) { ... }     // conditional on optional field
@ifnotempty(field) { ... } // conditional on non-empty collection
```
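
To make the forms above more concrete, here is one way they might lower into plain Rust string building. The `Heading` struct, its fields, and the block-body syntax quoted in the comments are hypothetical; the sketch also assumes `@eol` emits `\n` and that the conditional forms emit nothing when their condition fails.

```rust
/// Hypothetical phenotype data: one required field, one optional field,
/// and one collection field.
pub struct Heading {
    pub title: String,
    pub subtitle: Option<String>,
    pub tags: Vec<String>,
}

impl Heading {
    pub fn render(&self) -> String {
        let mut out = String::new();
        // "# ", @(title)  -> literal text followed by field emission
        out.push_str("# ");
        out.push_str(&self.title);
        // @ifset(subtitle) { ": ", @(subtitle) }  -> only when the field is set
        if let Some(subtitle) = &self.subtitle {
            out.push_str(": ");
            out.push_str(subtitle);
        }
        // @eol  -> end of line
        out.push('\n');
        // @ifnotempty(tags) { "tags: ", @join(tags, ", ") }
        if !self.tags.is_empty() {
            out.push_str("tags: ");
            out.push_str(&self.tags.join(", "));
        }
        out
    }
}
```

Each render form becomes a fixed, ordered statement, which is what keeps the output shape predictable.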

### Comments in pure `.pht`

Pure source files support:

```pht
// line comments
/* block comments */
```

---

## Two source containers, one core language

Phenotyper supports two normative source containers:

### `.md`
Markdown source documents.

- processed whether or not they contain phenotype content
- `pht` fenced blocks are extracted in document order
- markdown outside `pht` blocks is documentation only

### `.pht`
Pure Phenotyper source.

- parsed directly as the core language
- useful for tests, generated sources, and code-centric workflows

Both forms compile to the same core language model.

A key design requirement is that diagnostics must point to the **exact original line and column in the author’s source file**, whether that file is Markdown or pure `.pht`.
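
One simple way to meet that requirement for Markdown is to record where each extracted block starts and copy its lines unchanged, so columns are preserved and lines differ only by an offset. The sketch below is an assumption about how this could work, not the compiler's actual source map; it ignores indented or longer fences and drops unterminated blocks.

```rust
/// One extracted `pht` block plus the 1-based Markdown line on which its
/// first content line sits. (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq)]
pub struct PhtBlock {
    pub start_line: usize,
    pub source: String,
}

/// Extract `pht` fenced blocks in document order.
pub fn extract_pht_blocks(markdown: &str) -> Vec<PhtBlock> {
    let mut blocks = Vec::new();
    let mut current: Option<PhtBlock> = None;
    for (index, line) in markdown.lines().enumerate() {
        let line_no = index + 1;
        let trimmed = line.trim();
        match current.take() {
            // Outside a block: look for an opening fence.
            None => {
                if trimmed == "```pht" {
                    current = Some(PhtBlock { start_line: line_no + 1, source: String::new() });
                }
            }
            // Inside a block: a bare fence closes it, anything else is content.
            Some(mut block) => {
                if trimmed == "```" {
                    blocks.push(block);
                } else {
                    block.source.push_str(line);
                    block.source.push('\n');
                    current = Some(block);
                }
            }
        }
    }
    blocks
}

/// Map a 1-based line inside an extracted block back to the Markdown file.
pub fn original_line(block: &PhtBlock, line_in_block: usize) -> usize {
    block.start_line + line_in_block - 1
}
```

A diagnostic raised on line 2 of a block whose content starts at Markdown line 4 would then be reported at line 5 of the original file, with the column unchanged.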

---

## Why not just use a template engine?

Template engines are useful, and Phenotyper is not trying to deny that.

But template engines and Phenotyper optimize for different things.

A general-purpose template engine like Handlebars or Jinja is excellent when you want:
- editable templates
- highly dynamic rendering logic
- familiar loops, helpers, includes, and macros
- looser coupling between structure and data model

Phenotyper is for cases where the artifact family itself deserves a proper type system and generated tooling.

It gives you:
- explicit structural definitions
- type-checked construction APIs
- dedicated collection wrapper types
- reusable union and enum types
- constrained rendering semantics
- a path toward parsing and round-tripping

### Potential performance angle

There is also a likely performance advantage.

A generated Rust builder/renderer for Phenotyper can often render more directly than a general-purpose template runtime because it already knows:
- the exact field set
- the exact output order
- legal cardinalities
- enum and union structure
- how joins and literals compose

That means the generated code can behave much closer to ordinary specialized Rust string-building code, with less runtime lookup and less generic template machinery.
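
The structural difference can be illustrated with a toy comparison. Both renderers below are hypothetical and deliberately tiny; the sketch makes no measured performance claim, it only shows where name lookups and runtime shape errors come from.

```rust
use std::collections::HashMap;

/// A toy generic template: literals and named placeholders.
pub enum Part<'a> {
    Literal(&'a str),
    Field(&'a str),
}

/// Generic approach: each placeholder is looked up by name at render time,
/// and a missing value is only discovered while rendering.
pub fn render_generic(
    template: &[Part<'_>],
    values: &HashMap<String, String>,
) -> Result<String, String> {
    let mut out = String::new();
    for part in template {
        match part {
            Part::Literal(text) => out.push_str(text),
            Part::Field(name) => match values.get(*name) {
                Some(value) => out.push_str(value),
                None => return Err(format!("missing field: {name}")),
            },
        }
    }
    Ok(out)
}

/// Specialized approach: the field set and output order are fixed in the
/// type, so there is no lookup and no missing-field path at render time.
pub struct Greeting {
    pub name: String,
    pub count: u32,
}

impl Greeting {
    pub fn render(&self) -> String {
        format!("{} has {} items", self.name, self.count)
    }
}
```

With the generic renderer, forgetting to supply `count` is a runtime error; with the typed struct, the same mistake does not compile.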

A template engine is not necessarily doing naive string search-and-replace on every render, especially if templates are compiled and cached. In real use, the comparison is more fairly:

- **generated domain-specific Rust rendering code**
- versus **compiled general-purpose template runtime**

Phenotyper should usually have the advantage for fixed, repeatedly-rendered artifact families, especially where output shape matters. The biggest gains are likely to come from:
- reduced dynamic lookup
- simpler looping and join logic
- compile-time knowledge of structure
- fewer runtime shape errors

So the claim is not "templates are slow." The claim is:

> for stable, typed artifact families, generated Rust renderers can be both more reliable and potentially more efficient than general-purpose template execution.

---

## Generated Rust model

Phenotyper is designed to generate Rust that feels explicit and domain-shaped rather than generic and anonymous.

### Example direction

If a phenotype declares:

```pht
CSVLine plural CSVLines:
    values: required CSVFieldValues,
    separator: required string,
    @join(values, separator)
;
```

then code generation can produce:
- `CsvLine`
- `CsvLineBuilder`
- `CsvLines`, a dedicated wrapper type around `Vec<CsvLine>`

Plural companion types are intended to generate dedicated wrapper types, while still making the underlying vector representation easy to access through ergonomic conversions and helpers.

That gives you both:
- semantic clarity in the generated API
- practical collection ergonomics in Rust
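
The "ergonomic conversions and helpers" could take the form of standard trait implementations. The following is a sketch under that assumption; which traits the generator actually emits is not specified here, and `CsvLine` is reduced to a stand-in newtype to keep the example self-contained.

```rust
use std::ops::Deref;

/// Stand-in for the generated singular type.
#[derive(Debug, Clone, PartialEq)]
pub struct CsvLine(pub String);

/// Dedicated wrapper around `Vec<CsvLine>`.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct CsvLines {
    items: Vec<CsvLine>,
}

impl From<Vec<CsvLine>> for CsvLines {
    fn from(items: Vec<CsvLine>) -> Self {
        Self { items }
    }
}

impl From<CsvLines> for Vec<CsvLine> {
    fn from(lines: CsvLines) -> Self {
        lines.items
    }
}

/// Read-only slice access: `len`, `iter`, `first`, and so on.
impl Deref for CsvLines {
    type Target = [CsvLine];
    fn deref(&self) -> &[CsvLine] {
        &self.items
    }
}

/// Allows `iterator.collect::<CsvLines>()`.
impl FromIterator<CsvLine> for CsvLines {
    fn from_iter<I: IntoIterator<Item = CsvLine>>(iter: I) -> Self {
        Self { items: iter.into_iter().collect() }
    }
}

/// Allows `for line in lines { ... }` by value.
impl IntoIterator for CsvLines {
    type Item = CsvLine;
    type IntoIter = std::vec::IntoIter<CsvLine>;
    fn into_iter(self) -> Self::IntoIter {
        self.items.into_iter()
    }
}
```

Dereferencing to a slice rather than to `Vec` keeps mutation behind the wrapper's own API, which is one way to preserve cardinality rules while still offering familiar read access.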

### Builder-pattern API sketch

For a CSV-like phenotype such as:

```pht
aivolution/format/csv:

type ScalarValue: {int64, real64, string, date, time, datetime};

CSVFieldValue plural CSVFieldValues:
    value: required ScalarValue,
    @(value)
;

CSVRecord plural CSVRecords:
    values: required CSVFieldValues,
    @join(values, ", ")
;

CSVLine plural CSVLines:
    record: required CSVRecord,
    @(record)
;

CSVFile plural CSVFiles:
    lines: required CSVLines,
    @join(lines, @eol)
;

.
```

Phenotyper could generate Rust along these lines:

```rust
use std::fmt::{self, Write};

#[derive(Debug, Clone)]
pub enum ScalarValue {
    Int64(i64),
    Real64(f64),
    String(String),
    Date(String),
    Time(String),
    Datetime(String),
}

impl ScalarValue {
    pub fn render(&self, out: &mut String) -> fmt::Result {
        match self {
            ScalarValue::Int64(v) => write!(out, "{v}"),
            ScalarValue::Real64(v) => write!(out, "{v}"),
            ScalarValue::String(v) => write!(out, "{v}"),
            ScalarValue::Date(v) => write!(out, "{v}"),
            ScalarValue::Time(v) => write!(out, "{v}"),
            ScalarValue::Datetime(v) => write!(out, "{v}"),
        }
    }
}

#[derive(Debug, Clone)]
pub struct CsvFieldValue {
    value: ScalarValue,
}

impl CsvFieldValue {
    pub fn builder() -> CsvFieldValueBuilder {
        CsvFieldValueBuilder { value: None }
    }

    pub fn render(&self, out: &mut String) -> fmt::Result {
        self.value.render(out)
    }
}

pub struct CsvFieldValueBuilder {
    value: Option<ScalarValue>,
}

impl CsvFieldValueBuilder {
    pub fn value_int64(mut self, value: i64) -> Self {
        self.value = Some(ScalarValue::Int64(value));
        self
    }

    pub fn value_real64(mut self, value: f64) -> Self {
        self.value = Some(ScalarValue::Real64(value));
        self
    }

    pub fn value_string<S: Into<String>>(mut self, value: S) -> Self {
        self.value = Some(ScalarValue::String(value.into()));
        self
    }

    pub fn value_date<S: Into<String>>(mut self, value: S) -> Self {
        self.value = Some(ScalarValue::Date(value.into()));
        self
    }

    pub fn value_time<S: Into<String>>(mut self, value: S) -> Self {
        self.value = Some(ScalarValue::Time(value.into()));
        self
    }

    pub fn value_datetime<S: Into<String>>(mut self, value: S) -> Self {
        self.value = Some(ScalarValue::Datetime(value.into()));
        self
    }

    pub fn build(self) -> Result<CsvFieldValue, BuildError> {
        Ok(CsvFieldValue {
            value: self.value.ok_or(BuildError::MissingField("value"))?,
        })
    }
}

#[derive(Debug, Clone)]
pub struct CsvFieldValues {
    items: Vec<CsvFieldValue>,
}

impl CsvFieldValues {
    pub fn builder() -> CsvFieldValuesBuilder {
        CsvFieldValuesBuilder { items: Vec::new() }
    }

    pub fn from_vec(items: Vec<CsvFieldValue>) -> Self {
        Self { items }
    }

    pub fn into_vec(self) -> Vec<CsvFieldValue> {
        self.items
    }

    pub fn render_joined(&self, out: &mut String, joiner: &str) -> fmt::Result {
        let mut first = true;
        for item in &self.items {
            if !first {
                out.push_str(joiner);
            }
            first = false;
            item.render(out)?;
        }
        Ok(())
    }
}

pub struct CsvFieldValuesBuilder {
    items: Vec<CsvFieldValue>,
}

impl CsvFieldValuesBuilder {
    pub fn push(mut self, value: CsvFieldValue) -> Self {
        self.items.push(value);
        self
    }

    pub fn push_string<S: Into<String>>(mut self, value: S) -> Self {
        let field = CsvFieldValue::builder()
            .value_string(value)
            .build()
            .expect("builder generated invalid field");
        self.items.push(field);
        self
    }

    pub fn push_int64(mut self, value: i64) -> Self {
        let field = CsvFieldValue::builder()
            .value_int64(value)
            .build()
            .expect("builder generated invalid field");
        self.items.push(field);
        self
    }

    pub fn build(self) -> Result<CsvFieldValues, BuildError> {
        Ok(CsvFieldValues { items: self.items })
    }
}

#[derive(Debug, Clone)]
pub struct CsvRecord {
    values: CsvFieldValues,
}

impl CsvRecord {
    pub fn builder() -> CsvRecordBuilder {
        CsvRecordBuilder { values: None }
    }

    pub fn render(&self, out: &mut String) -> fmt::Result {
        self.values.render_joined(out, ", ")
    }
}

pub struct CsvRecordBuilder {
    values: Option<CsvFieldValues>,
}

impl CsvRecordBuilder {
    pub fn values(mut self, values: CsvFieldValues) -> Self {
        self.values = Some(values);
        self
    }

    pub fn build(self) -> Result<CsvRecord, BuildError> {
        Ok(CsvRecord {
            values: self.values.ok_or(BuildError::MissingField("values"))?,
        })
    }
}

#[derive(Debug, Clone)]
pub struct CsvLine {
    record: CsvRecord,
}

impl CsvLine {
    pub fn builder() -> CsvLineBuilder {
        CsvLineBuilder { record: None }
    }

    pub fn render(&self, out: &mut String) -> fmt::Result {
        self.record.render(out)
    }
}

pub struct CsvLineBuilder {
    record: Option<CsvRecord>,
}

impl CsvLineBuilder {
    pub fn record(mut self, record: CsvRecord) -> Self {
        self.record = Some(record);
        self
    }

    pub fn build(self) -> Result<CsvLine, BuildError> {
        Ok(CsvLine {
            record: self.record.ok_or(BuildError::MissingField("record"))?,
        })
    }
}

#[derive(Debug, Clone)]
pub struct CsvLines {
    items: Vec<CsvLine>,
}

impl CsvLines {
    pub fn builder() -> CsvLinesBuilder {
        CsvLinesBuilder { items: Vec::new() }
    }

    pub fn render_joined(&self, out: &mut String, eol: &str) -> fmt::Result {
        let mut first = true;
        for item in &self.items {
            if !first {
                out.push_str(eol);
            }
            first = false;
            item.render(out)?;
        }
        Ok(())
    }
}

pub struct CsvLinesBuilder {
    items: Vec<CsvLine>,
}

impl CsvLinesBuilder {
    pub fn push(mut self, line: CsvLine) -> Self {
        self.items.push(line);
        self
    }

    pub fn build(self) -> Result<CsvLines, BuildError> {
        if self.items.is_empty() {
            return Err(BuildError::CardinalityViolation("lines must not be empty"));
        }
        Ok(CsvLines { items: self.items })
    }
}

#[derive(Debug, Clone)]
pub struct CsvFile {
    lines: CsvLines,
}

impl CsvFile {
    pub fn builder() -> CsvFileBuilder {
        CsvFileBuilder { lines: None }
    }

    pub fn render(&self) -> Result<String, fmt::Error> {
        let mut out = String::new();
        self.lines.render_joined(&mut out, "\n")?;
        Ok(out)
    }
}

pub struct CsvFileBuilder {
    lines: Option<CsvLines>,
}

impl CsvFileBuilder {
    pub fn lines(mut self, lines: CsvLines) -> Self {
        self.lines = Some(lines);
        self
    }

    pub fn build(self) -> Result<CsvFile, BuildError> {
        Ok(CsvFile {
            lines: self.lines.ok_or(BuildError::MissingField("lines"))?,
        })
    }
}

#[derive(Debug)]
pub enum BuildError {
    MissingField(&'static str),
    CardinalityViolation(&'static str),
}
```

And using that generated API could look like this:

```rust
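// Assumes the generated items from the previous block are in scope.
fn main() -> Result<(), BuildError> {
    let record1 = CsvRecord::builder()
        .values(
            CsvFieldValues::builder()
                .push_string("Alice")
                .push_int64(42)
                .build()?,
        )
        .build()?;

    let record2 = CsvRecord::builder()
        .values(
            CsvFieldValues::builder()
                .push_string("Bob")
                .push_int64(37)
                .build()?,
        )
        .build()?;

    let csv = CsvFile::builder()
        .lines(
            CsvLines::builder()
                .push(CsvLine::builder().record(record1).build()?)
                .push(CsvLine::builder().record(record2).build()?)
                .build()?,
        )
        .build()?;

    // `render` returns `fmt::Error` rather than `BuildError`, so it is
    // handled separately from the `?` calls above.
    println!("{}", csv.render().expect("rendering into a String failed"));
    Ok(())
}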
```

Output:

```text
Alice, 42
Bob, 37
```

That is the API style Phenotyper is aiming for: generated Rust that follows the builder pattern, preserves the vocabulary of the phenotype, and makes invalid output harder to construct.

---

## Design principles

Phenotyper is being shaped around a few core principles.

### Human-readable first
The DSL should remain readable to humans and preserve the visible structure of the artifact family.

### Declarative structure
Phenotyper should describe **what a valid artifact looks like**, not become a general-purpose programming language.

### Generated tooling from one source of truth
A single source definition should drive builders, renderers, validators, and later parsers.

### Strong artifact identity
Collections, enums, union-like field choices, and render expressions should remain visible as first-class concepts.

### Documentation-friendly authoring
Phenotypes should be easy to explain inline, which is why Markdown-based source documents are first-class.

---

## Compiler pipeline

The v2 compiler pipeline:

1. Read source container (`.md` or `.pht`)
2. If Markdown, extract `pht` blocks and build a source map
3. Parse the Phenotyper language (GLR parser via Rustemo)
4. Build an AST with structural namespace, nested types, and `?` operators
5. Collect symbols (two-pass name resolution)
6. Normalize to an IR (flatten nested types, resolve parent context)
7. Validate structure, cardinality, and render-expression correctness
8. Generate Rust builders, types, and renderers
9. Later: generate parsers and reverse mappings

---

## Name resolution model

Phenotyper's name-resolution model is intentionally simple.

- every file belongs to exactly one structural namespace
- local names must be unique within that namespace
- nested phenotype names are scoped to their parent
- parent fields are referenced via qualified paths (e.g., `@(Parent/field)`)
- duplicate names within the same namespace are hard errors
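
The rules above can be sketched as a two-pass check over declared names. Everything here is illustrative: the function names, the `DuplicateName` type, and the flat `(name, line)` input are assumptions, and nested names would need to be keyed by their parent-qualified path.

```rust
use std::collections::HashMap;

/// Hard error reported when a name is declared twice in one namespace.
#[derive(Debug, PartialEq)]
pub struct DuplicateName {
    pub name: String,
    pub first_line: usize,
    pub second_line: usize,
}

/// Pass 1: record every declared name with its line, rejecting duplicates.
pub fn collect_symbols(
    declarations: &[(&str, usize)],
) -> Result<HashMap<String, usize>, DuplicateName> {
    let mut table: HashMap<String, usize> = HashMap::new();
    for &(name, line) in declarations {
        if let Some(&first_line) = table.get(name) {
            return Err(DuplicateName {
                name: name.to_string(),
                first_line,
                second_line: line,
            });
        }
        table.insert(name.to_string(), line);
    }
    Ok(table)
}

/// Pass 2: with all names known, list the references that do not resolve.
/// Running this after pass 1 lets declarations appear in any order.
pub fn unresolved<'a>(table: &HashMap<String, usize>, references: &[&'a str]) -> Vec<&'a str> {
    references
        .iter()
        .copied()
        .filter(|reference| !table.contains_key(*reference))
        .collect()
}
```

Separating collection from resolution is what allows a phenotype to refer to a type declared later in the same namespace.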

---

## Current status

Phenotyper v2 is a **working compiler** with a complete pipeline:

| Component | Status |
|-----------|--------|
| Lexer & source map | ✅ Tokenizer with markdown extraction |
| Parser | ✅ GLR grammar via Rustemo |
| Symbol table | ✅ Two-pass name resolution with nested scope support |
| Intermediate representation | ✅ Normalized IR with parent context tracking |
| Semantic validation | ✅ Type, render, nesting, and generation checks |
| Diagnostics | ✅ Rich human-readable and JSON output |
| Code generation | ✅ Idiomatic Rust with `render_with_parent` for nested types |
| CLI | ✅ `check`, `build`, `dump-ast`, `dump-ir` |
| Build integration | ✅ `phenotyper_core::compile()` API |
| Test suite | ✅ 257 tests (unit, e2e, CLI, compile, runtime) |

### Quick start

```bash
# Install from source
cargo install --path crates/phenotyper-cli

# Check a source file
phenotyper check path/to/file.pht

# Generate Rust code
phenotyper build path/to/file.pht --out generated/

# Use from build.rs
# See docs/howto/build_rs_integration.md
```

### Documentation

- [Authoring phenotypes](docs/howto/authoring_phenotypes.md): DSL syntax guide
- [Using generated code](docs/howto/using_generated_code.md): Rust API guide
- [Compiler usage](docs/howto/compiler_usage.md): CLI reference
- [Build script integration](docs/howto/build_rs_integration.md): `build.rs` guide
- [Worked examples](docs/examples/): CSV, prompt, config, report, javaclass

---

## Project goals

Phenotyper is a foundation for:
- structured prompt engineering
- robust artifact generation for AI systems
- typed textual interfaces
- reusable format definitions
- eventually, round-trippable artifact specifications

> Define the shape of an artifact once, then generate the tooling needed to create it correctly.

---

## License

Apache-2.0