mathlex 0.4.1

Mathematical expression parser for LaTeX and plain text notation, producing a language-agnostic AST
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
# mathlex — Mathcore Integration Specification

**Version:** draft-1 (2026-04-22)
**Status:** draft, pre-freeze
**Target release:** mathlex v0.5.0 (after MX-1..MX-10 land in v0.4.0)
**Companion document:** `thales_v0.9.0_requirements.md` (MX-1..MX-10)

This document specifies the second wave of mathlex work: active consumption
of unit and constant annotations to produce `AnnotatedExpression` (Expression
with unit/constant annotations resolved). These requirements are designated
MI-1..MI-N (Mathlex Integration).

The first wave (MX-1..MX-10) delivered the serde-capable `Expression` AST,
the `AnnotationSet` substrate, and variant-tag stability. That wave is a
prerequisite for every requirement here. Nothing in this document ships
before MX-1..MX-10 are complete.

---

## 1. Scope and Relationship to thales_v0.9.0_requirements.md

### 1.1 What the first wave (MX-1..MX-10) delivered

`thales_v0.9.0_requirements.md` covers the wire-format and substrate layer:

- **MX-1** — serde derives on `Expression`; lossless round-trip.
- **MX-2** — variant tag stability; golden-file regression tests.
- **MX-3**`AnnotationSet` substrate per RFC §M-R1; opaque to thales v0.9.0.
- **MX-4** — existing parsers unchanged; empty `AnnotationSet` by default.
- **MX-5..MX-9** — derivative/ODE stability, no silent breaking changes,
  fixture files, documentation, version alignment.
- **MX-10** — unit propagation and constant resolution explicitly deferred.

The first wave is complete when mathlex ships v0.4.0 and thales ships v0.9.0.

### 1.2 What this wave (MI-1..MI-N) adds

This document defines how mathlex actively consumes unit and constant
annotations to validate, transform, and annotate the `Expression` tree. The
result is an `AnnotatedExpression` carrying:

- The parsed `Expression` tree (unchanged in shape from MX-1 parse output).
- A populated `AnnotationSet` per node, recording resolved `UnitExpression`
  for unit-bearing sub-expressions and `ConstantId` for named-constant
  symbols.
- A root-level `output_unit` giving the factored result unit of the whole
  expression after all conversions are applied.
- A `warnings` list for non-fatal annotation issues.

### 1.3 What this wave does NOT cover

- Numeric evaluation of conversion factors. That is mathlex-eval's and
  thales's responsibility.
- Substitution of constant numeric values. mathlex keeps symbols symbolic
  per thales Rule 2 (Expression is the contract) and thales Rule 1
  (Arc<Expr> internals). Downstream decides whether to substitute.
- Inline unit syntax. Annotations are always a caller-supplied map, never
  embedded in the source text (see § 3.3).
- Changes to `Expression` shape. The `Expression` AST is frozen per MX-2.
  `AnnotatedExpression` is a wrapper that adds the resolved annotation layer
  on top of the existing AST without modifying it.

---

## 2. ParseContext and API Surface

### 2.1 `ParseContext`

```rust
use mathcore_units::{UnitExpression, ConstantId, System};
use std::collections::HashMap;

/// Caller-supplied context for annotation-aware parsing.
pub struct ParseContext {
    /// Unit annotation for each symbol that is unit-bearing in the expression.
    /// Keys are the exact symbol strings as they appear in the source text.
    /// Values are fully formed `UnitExpression` trees (e.g., m/s, kg·m·s⁻²).
    /// The caller is responsible for constructing these; mathlex resolves tokens
    /// via `unitalg::parse_token` during the annotation-resolution phase (§ 4).
    /// Passing a pre-constructed `UnitExpression` directly is also accepted;
    /// string tokens are resolved at parse time and are an alternative entry path.
    pub unit_annotations: HashMap<String, UnitExpression>,

    /// Constant annotation for each symbol that names a known physical or
    /// mathematical constant. Keys match symbol strings in the source text.
    /// Values are `ConstantId` variants from mathcore-units § 4.
    pub constant_annotations: HashMap<String, ConstantId>,

    /// Preferred output system. If `Some(sys)`, all unit annotations are
    /// converted to `sys` via `unitalg::convert`; a conversion factor is
    /// inserted into the `Expression` tree as a symbolic multiplication.
    /// If `None`, the system is inferred by `unitalg::choose_system`.
    pub target_system: Option<System>,
}
```

`ParseContext` is cheap to clone (all keys are `String`; values are enums or
tree types without heap sharing). The default constructor returns a context
with empty maps and `target_system: None`, which produces behavior identical
to the pre-annotation mathlex parser.

### 2.2 `AnnotatedExpression`

```rust
use mathcore_units::UnitExpression;
use crate::{Expression, AnnotationSet};

/// Output of `parse_with_annotations`.
/// Wraps an `Expression` with the resolved annotation layer.
pub struct AnnotatedExpression {
    /// The parsed AST. Shape is identical to what `parse()` would return.
    /// Conversion factors inserted by target-system enforcement (§ 7) appear
    /// as `Expression::Mul` nodes wrapping the original sub-expressions.
    pub expression: Expression,

    /// Per-node and root-level annotations.
    /// Each node's `AnnotationSet` carries:
    ///   key "unit"     → UnitExpression (serialized per mathcore-units § 7)
    ///   key "constant" → ConstantId  (for symbol nodes mapped to a constant)
    /// Root-level annotations (index = root node id) additionally carry:
    ///   key "output_unit" → UnitExpression (the factored result unit)
    pub annotations: AnnotationSet,

    /// The factored result unit of the whole expression, or `None` when the
    /// expression is dimensionless or all annotations are absent.
    pub output_unit: Option<UnitExpression>,

    /// Non-fatal warnings generated during annotation processing.
    pub warnings: Vec<Warning>,
}
```

### 2.3 `Warning`

```rust
#[derive(Debug, Clone)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[cfg_attr(feature = "serde", serde(tag = "kind", content = "value"))]
pub enum Warning {
    /// A symbol appeared in `unit_annotations` or `constant_annotations`
    /// but was not found anywhere in the parsed `Expression` tree.
    UnusedAnnotation { symbol: String },
}
```

Warnings are non-fatal. `parse_with_annotations` returns `Ok(AnnotatedExpression)`
even when warnings are present. Callers inspect `warnings` to surface
annotation hygiene issues to users.

### 2.4 Entry-point function

```rust
/// Parse `source` with annotation-aware resolution.
///
/// Steps (see § 3):
/// 1. Parse source text → Expression  (existing mathlex parse path, unchanged).
/// 2. Resolve unit annotation strings via unitalg::parse_token (§ 4).
/// 3. Resolve constant annotations via mathcore_constants::lookup_constant (§ 5).
/// 4. Run consistency checks (§ 6).
/// 5. Apply target-system enforcement (§ 7).
/// 6. Walk the Expression tree to compute and factor the output unit (§ 8).
/// 7. Assemble AnnotatedExpression (§ 9).
///
/// Returns `Err(ParseError)` only for hard failures: source text parse error,
/// annotation token parse failure, dimension mismatch, transcendental argument
/// error, unknown constant, or system incompatibility. Warnings are soft and
/// appear in `AnnotatedExpression::warnings`.
pub fn parse_with_annotations(
    source: &str,
    ctx: &ParseContext,
) -> Result<AnnotatedExpression, ParseError> { ... }
```

The existing `parse()` and `parse_latex()` entry points remain unchanged and
continue to return `Expression` without annotations. `parse_with_annotations`
is a new, additive entry point.

---

## 3. Parsing Pipeline with Annotations

### 3.1 Stage 1 — Text → Expression

Call the existing mathlex parser (LaTeX or plain text, detected from the
source format flag passed by the caller or auto-detected). This stage is
unchanged from the pre-annotation path (per MX-4). The returned `Expression`
may contain any variant; no annotation logic runs here.

Errors from this stage propagate immediately as `ParseError::Syntax`.

### 3.2 Stage 2 — Annotation token resolution

After Stage 1 succeeds, iterate `ctx.unit_annotations`. For each value:
- If the value is already a `UnitExpression` (the caller constructed it
  directly), accept it as-is.
- If the caller passed a string-form unit token (the alternate entry path
  `unit_annotations_raw: HashMap<String, String>` on `ParseContext`), call
  `unitalg::parse_token(token_str)` for each. Failures surface as
  `ParseError::AnnotationParseError`.

See § 4 for the token-resolution algorithm.

### 3.3 No inline unit syntax

mathlex does not introduce any grammar extension for inline unit annotation.
The caller always supplies annotations as a separate map. This decision
continues the policy established in the mathlex v0.4.0 RFC:

- `5 m/s` is parsed as `5 * m * s^-1`. The symbols `m` and `s` are
  `Expression::Variable` nodes. Their unit semantics are supplied by the
  annotation map, not by the source text.
- `[m/s]` bracket notation is NOT recognized as inline unit syntax. Bracket
  expressions are illegal unless the plain-text parser already supports them
  for another purpose.
- Constants such as `c`, `h`, `` are parsed as `Expression::Variable` nodes.
  The annotation map (via `constant_annotations`) declares their identity.
  No new grammar is introduced.

Rationale: grammar changes are breaking; annotation maps are additive. A
caller that does not supply a `ParseContext` gets the same parse result as
always (MX-4 backward-compat guarantee). This is the load-bearing design
decision.

### 3.4 Stage 3 — Constant annotation resolution

For each entry in `ctx.constant_annotations`, call
`mathcore_constants::lookup_constant(id)` to obtain the `ConstantSpec`. This
call never fails for valid `ConstantId` values (the mathcore-constants catalog
is exhaustive for v0.1.0, per MC-1). An `id` value that does not match any
`ConstantId` variant is a type error caught at compile time; no runtime
"unknown constant" case exists for well-typed callers.

The `ConstantSpec::unit` field (a `UnitExpression`) participates in
dimension checks alongside user-declared unit annotations (§ 6). See § 5 for
the full resolution algorithm.

### 3.5 Stage 4 — Consistency checks

Call the dimension check and transcendental-argument check algorithms. See § 6.

Failures surface as `ParseError` variants with source position information
drawn from the `Expression` node's position metadata (the position fields
threaded through the Expression AST in the existing parser).

### 3.6 Stage 5 — Target-system enforcement

If `ctx.target_system` is `Some(sys)`, convert all annotated unit values to
`sys`. See § 7.

### 3.7 Stage 6 — Output unit computation

Walk the `Expression` tree bottom-up, propagating unit annotations through
operators to derive the result unit of the root node. Call `unitalg::factor`
on the root unit. See § 8.

### 3.8 Stage 7 — Assembly

Populate `AnnotationSet` per node, set `output_unit`, collect `warnings`,
and return `AnnotatedExpression`. See § 9.

---

## 4. Unit Annotation Resolution

### 4.1 Token parsing

For the `unit_annotations_raw` path, each value string is passed to
`unitalg::parse_token(s)` (unitalg § 3.13). The parse_token algorithm:

1. Rejects known-reserved tokens (`"c"`, `"e"`, `"rem"`) with
   `ParseError::ReservedToken`.
2. Attempts direct alias match via `mathcore_units::alias::lookup_alias`.
3. Falls back to prefix-decomposition via `mathcore_units::alias::lookup_with_prefix`.
4. Checks `PrefixPolicy` admissibility for the resolved `(prefix, id)` pair.

On success, returns a `UnitExpression::Atom { id, prefix }`. Composite unit
strings (e.g., `"m/s"`, `"kg*m*s^-2"`) must be pre-parsed by the caller into
a `UnitExpression` tree using unitalg's expression-level parser (a function
separate from `parse_token`; see MI-12, open issue). For v0.5.0, callers
supply either single-token strings or pre-constructed `UnitExpression` trees.

Failures from `unitalg::parse_token` map to mathlex's
`ParseError::AnnotationParseError { symbol, token, source }`.

### 4.2 Symbol coverage check

After all `unit_annotations` values are resolved, walk the parsed `Expression`
tree collecting all `Expression::Variable` symbol strings. For each symbol `s`
in `ctx.unit_annotations` that does not appear in the collected set, append
`Warning::UnusedAnnotation { symbol: s }` to the warnings list. Do not raise
an error; unused annotations are a user-side hygiene issue, not a correctness
failure.

### 4.3 Annotation attachment

For each symbol in `ctx.unit_annotations` that appears in the `Expression`
tree, locate every `Expression::Variable(name)` node where `name == symbol`
and tag its `AnnotationSet` with key `"unit"` and value equal to the resolved
`UnitExpression` (serialized per mathcore-units § 7 wire format).

When the same symbol appears multiple times in the expression (e.g., `x + x`),
every occurrence receives the same unit annotation.

---

## 5. Constant Annotation Resolution

### 5.1 Lookup

For each `(symbol, id)` in `ctx.constant_annotations`:

1. Call `mathcore_constants::lookup_constant(id)` to obtain the
   `&'static ConstantSpec`. This is the only point in mathlex that imports
   `mathcore-constants`.
2. Locate every `Expression::Variable(name)` node in the tree where
   `name == symbol`.
3. Tag each such node's `AnnotationSet` with:
   - key `"constant"``id` (as `ConstantId`, serialized per
     mathcore-units § 4 wire format, i.e., the enum's tag string).
   - key `"unit"``spec.unit` (the constant's `UnitExpression`), which
     participates in dimension checks (§ 6) exactly as a user-declared unit
     annotation would.
4. If `symbol` does not appear in the `Expression` tree, append
   `Warning::UnusedAnnotation { symbol }`.

### 5.2 No numeric substitution

mathlex does NOT substitute `spec.value` into the `Expression` tree.
The `Expression::Variable(symbol)` node remains symbolic. The constant's
identity is recorded in the `AnnotationSet`; downstream consumers (thales
for symbolic computation, mathlex-eval for numeric evaluation) decide whether
and when to substitute the value.

This preserves thales Rule 1 (Arc<Expr> internals are `Arc<Expr>` exclusively)
and thales Rule 2 (Expression is the I/O contract): the Expression shape is
not modified by annotation resolution beyond the insertion of conversion-factor
multiplications required by target-system enforcement (§ 7).

### 5.3 Constant unit merging with user-declared units

When a symbol appears in both `ctx.unit_annotations` and
`ctx.constant_annotations`, the explicit user-declared unit takes precedence.
The constant's `spec.unit` is used for dimension checking but the annotation
key `"unit"` records the user-declared value. A warning is emitted if the
two units have different dimensions (this is almost always a user error):
`Warning::ConstantUnitConflict { symbol, declared_unit, constant_unit }` (see
§ 10 for the full error model; this is a warning variant).

---

## 6. Consistency Checks

All checks run after unit and constant annotations are resolved and attached.
Failures are hard errors returned as `ParseError` variants. The checks are
applied in the order listed.

### 6.1 Unused annotation check

Already performed in § 4.2 (unit) and § 5.1 (constant). Produces
`Warning::UnusedAnnotation`, not an error.

### 6.2 Additive site dimension check

Walk the `Expression` tree. For every `Expression::Add(lhs, rhs)` or
`Expression::Sub(lhs, rhs)` node:

1. Determine the effective unit of `lhs` and of `rhs` by looking up their
   resolved annotation in the per-node `AnnotationSet`. If a sub-expression
   has no unit annotation, it is treated as dimensionless.
2. Call `unitalg::combine_additive(u_lhs, u_rhs, ctx.target_system)`.
3. On `Err(DimError::IncompatibleDimensions { lhs: d1, rhs: d2 })`:
   - Surface as `ParseError::DimensionMismatch { op_pos, left_dim, right_dim }`.
   - `op_pos` is the source-position span of the operator node, obtained from
     the position metadata already present in the `Expression` AST.
4. On `Err(DimError::LogarithmicReferenceMismatch { .. })`:
   - Surface as `ParseError::LogarithmicMismatch { op_pos, lhs_ref, rhs_ref }`.
5. On `Ok(AdditiveResult)`:
   - If `factor_for_u1 != 1` or `factor_for_u2 != 1`: the operand(s) need a
     conversion. Wrap them in `Expression::Mul(factor, original_operand)`
     where `factor` is the symbolic conversion factor from `AdditiveResult`.
     This rewrites the tree in-place (the returned `expression` field of
     `AnnotatedExpression` will reflect this rewrite; see § 9.1).
   - Update the addition node's unit annotation to `AdditiveResult::unified_unit`.

This is the only stage where the `Expression` tree is rewritten. All other
stages only annotate; they do not modify node structure.

### 6.3 Transcendental argument check

For every function application node where the function name is one of `sin`,
`cos`, `tan`, `sec`, `csc`, `cot`, `arcsin`, `arccos`, `arctan`, `exp`,
`log`, `log10`, `log2`, `ln`:

1. Obtain the unit of the argument from its `AnnotationSet` key `"unit"`.
   If absent, treat as dimensionless (no check needed).
2. Call `unitalg::check_transcendental_argument(arg_unit)`.
3. On `Err(ArgError::DimensionedArgument { function, found })`:
   - Surface as `ParseError::TranscendentalArgumentError { fn_name, arg_dim, arg_pos }`.
   - `arg_pos` is the source span of the argument node.

Per unitalg § 3.12, `Radian` and `Degree` (dimension `{Angle: 1}`) and all
dimensionless units pass. Any other dimension fails.

### 6.4 Power with symbolic or non-integer exponent

For every `Expression::Pow(base, exponent)` node:

1. Obtain the unit of `base`.
2. If `exponent` is `Expression::Integer(n)`: call `unitalg::power(u_base, n)`.
   No dimension error is possible for integer powers.
3. If `exponent` is `Expression::Rational(num, den)`: call
   `unitalg::power_rational(u_base, num, den)`.
   On `Err(DimError::IncompatibleDimensions)`: surface as
   `ParseError::FractionalDimensionError { op_pos, base_unit }`.
4. If `exponent` is any other variant (a variable, a function call, etc.):
   - Require `u_base` to be dimensionless (empty Dimension). If it is not,
     append `Warning::SymbolicExponentWithUnit { op_pos, base_unit }`.
     This is a warning, not a hard error, because the CAS may still handle
     it symbolically in thales.

### 6.5 Incompleteness warning for addition sites

If a sub-expression appearing as an operand in `+` or `-` has no unit
annotation and the other operand does carry a unit annotation, emit
`Warning::UnannotatedAddend { op_pos, unannotated_side }`. This warns the
user that dimension checking was skipped for one arm of an additive expression,
which is a common source of latent errors.

---

## 7. Target-System Enforcement

Target-system enforcement runs after all consistency checks pass.

### 7.1 System selection

Call `unitalg::choose_system(all_unit_annotations, ctx.target_system)`:

- If `ctx.target_system = Some(sys)` and any annotation unit is incompatible
  with `sys`, return
  `ParseError::SystemIncompatible { target: sys, unresolvable_units: ... }`.
- If `ctx.target_system = None`, `choose_system` returns the inferred system
  (first-wins heuristic per unitalg § 5). If `NoCompatibleSystem`, that error
  bubbles up as `ParseError::SystemIncompatible`.

### 7.2 Conversion of annotation units

For each annotation unit `u` in the resolved `ctx.unit_annotations` (and the
constant-side `spec.unit` values), if `u`'s system differs from the selected
system `sys`:

1. Find the target unit `u_target` for `u`'s dimension in `sys` via
   `unitalg::choose_system(&[u], Some(sys))`.
2. Call `unitalg::convert(Expression::Integer(1), &u, &u_target)` to emit the
   symbolic conversion factor `f`.
3. For every `Expression::Variable(symbol)` node annotated with `u`, wrap it
   as `Expression::Mul(f.clone(), Expression::Variable(symbol))`.
4. Update the node's `AnnotationSet` key `"unit"` to `u_target`.

The returned `AnnotatedExpression::expression` field contains the rewritten
tree. This is the second (and last) stage that may rewrite the tree (the first
being additive site unification in § 6.2).

### 7.3 Constants and target system

Constants expressed in SI (as all `ConstantSpec::value` fields are) do not
have their numeric values converted by mathlex. mathlex tags the
`AnnotationSet` with a key `"target_system"` carrying the selected `System`
value, so downstream consumers (thales, mathlex-eval) know which system was
requested and can apply the conversion at evaluation time if needed. This
avoids numeric evaluation inside mathlex (Rule 1 of mathlex CLAUDE.md:
parser produces AST, never evaluates).

---

## 8. Output Unit Factoring

### 8.1 Bottom-up unit propagation

After all consistency checks and target-system conversions are applied, walk
the `Expression` tree bottom-up. For each node, compute its effective unit by
applying the following rules:

| Node type | Effective unit |
|---|---|
| `Variable(s)` | Look up `AnnotationSet["unit"]`; if absent, dimensionless. |
| `Integer`, `Float`, `Rational` | Dimensionless. |
| `Mul(a, b)` | `unitalg::multiply(unit(a), unit(b))` |
| `Div(a, b)` | `unitalg::divide(unit(a), unit(b))` |
| `Pow(base, Integer(n))` | `unitalg::power(unit(base), n)` |
| `Pow(base, Rational(p,q))` | `unitalg::power_rational(unit(base), p, q)` |
| `Add(a, b)`, `Sub(a, b)` | `unified_unit` from the additive result (§ 6.2); both sides must carry the same unit after unification. |
| `Neg(a)` | `unit(a)` |
| `sin`, `cos`, `tan`, `exp`, `ln`, `log` | Dimensionless (transcendental functions return dimensionless results). |
| `sqrt(a)` | `unitalg::power_rational(unit(a), 1, 2)` |
| `Derivative(expr, var, _)` | `unitalg::divide(unit(expr), unit(var))` |
| `Integral(expr, var, _, _)` | `unitalg::multiply(unit(expr), unit(var))` |
| All other nodes | Dimensionless (conservative default; annotate with a warning if a unit-bearing sub-expression is present). |

### 8.2 Root unit factoring

After propagation, obtain the unit of the root node. Call
`unitalg::factor(root_unit)` to fold the base-unit product to the most
specific named unit available in the catalog (per unitalg § 3.7). Assign the
result to `AnnotatedExpression::output_unit`.

If the root unit is dimensionless (empty `Dimension`), set `output_unit` to
`None`.

### 8.3 Per-node annotation population

Store the effective unit of each node in its `AnnotationSet` under key
`"unit"`. This completes the annotation population pass. Nodes that were
already annotated by § 4 (Variable nodes with direct unit annotations) already
have this key; the propagation pass fills in all remaining nodes that derive
their unit from their operands.

---

## 9. Wire Format

### 9.1 Expression tree rewrite policy

The `expression` field of `AnnotatedExpression` is the primary `Expression`
AST. Its shape is based on the output of Stage 1 (text parse), with two
possible categories of structural addition:

1. **Conversion multiplications** inserted at additive sites where unit
   conversion is needed (§ 6.2, `AdditiveResult`). These appear as
   `Expression::Mul(conversion_factor, original_operand)` nodes wrapping the
   operand that required scale adjustment.
2. **Target-system conversion multiplications** inserted when a variable's
   annotation unit does not match the selected system (§ 7.2). These also
   appear as `Expression::Mul(conversion_factor, Expression::Variable(sym))`
   nodes.

The `Expression` type is not modified; `Expression::Mul` is an existing
variant. The tree is structurally valid and round-trips through serde
unchanged (MX-1 guarantee applies).

### 9.2 AnnotationSet payload conventions

The `AnnotationSet` on each node is a map from string keys to typed payloads.
The keys defined by this specification:

| Key | Payload type | Scope | Notes |
|---|---|---|---|
| `"unit"` | `UnitExpression` (mathcore-units § 2.8 wire format) | Per node | Set on Variable nodes by § 4.3; propagated to all nodes by § 8.3 |
| `"constant"` | String (serialized `ConstantId` tag) | Per node | Set on Variable nodes by § 5.1 |
| `"output_unit"` | `UnitExpression` | Root node only | Set by § 8.2; same value as `AnnotatedExpression::output_unit` |
| `"target_system"` | String (serialized `System` tag) | Root node only | Set by § 7.3 |

All annotation payloads are stored as JSON-compatible values. `UnitExpression`
serializes per mathcore-units § 7 wire format (tag-and-content convention).
`ConstantId` and `System` serialize as their variant tag strings.

### 9.3 JSON example

Expression `m * a` with `m: Kilogram`, `a: Meter/Second²`. The root `Mul`
node's propagated unit folds to `Newton` via `unitalg::factor`.

```json
{
  "expression": {
    "kind": "Mul",
    "value": {
      "left":  {"kind": "Variable", "value": {"name": "m"},
                "annotations": {"unit": {"kind": "Atom", "value": {"id": "Kilogram", "prefix": null}}}},
      "right": {"kind": "Variable", "value": {"name": "a"},
                "annotations": {"unit": {"kind": "Binary", "value": {
                  "op": {"kind": "Div"},
                  "left":  {"kind": "Atom", "value": {"id": "Meter", "prefix": null}},
                  "right": {"kind": "Binary", "value": {
                    "op":    {"kind": "Pow"},
                    "left":  {"kind": "Atom",    "value": {"id": "Second", "prefix": null}},
                    "right": {"kind": "Literal", "value": "2"}}}}}}},
      "annotations": {
        "unit":        {"kind": "Atom", "value": {"id": "Newton", "prefix": null}},
        "output_unit": {"kind": "Atom", "value": {"id": "Newton", "prefix": null}}
      }
    }
  },
  "output_unit": {"kind": "Atom", "value": {"id": "Newton", "prefix": null}},
  "warnings": []
}
```

### 9.4 Serde feature gating

`AnnotatedExpression` serde derives are gated behind the `serde` feature,
consistent with MX-1. All types referenced by `AnnotatedExpression`
(`UnitExpression`, `ConstantId`, `System`) require `mathcore-units/serde`
to be enabled. The feature dependency chain:

```toml
[features]
serde = [
    "dep:serde",
    "mathcore-units/serde",
    "mathcore-constants/serde",
    "unitalg/serde",
]
```

Enabling `mathlex/serde` without also enabling `mathcore-units/serde` is a
compile error (the annotation payload types are not independently serde-capable).

---

## 10. Error Model

All errors from annotation processing are variants of mathlex's existing
`ParseError` enum. The following new variants are added in v0.5.0. All
variants are `#[non_exhaustive]`; future minor versions may add variants.

```rust
// Additions to the existing ParseError enum:

/// A unit token in `ParseContext::unit_annotations_raw` could not be resolved
/// by `unitalg::parse_token`.
AnnotationParseError {
    /// The symbol for which the annotation was declared.
    symbol: String,
    /// The token string that failed to parse.
    token: String,
    /// The underlying `unitalg::ParseError`.
    source: unitalg::ParseError,
},

/// An additive operator (`+` or `-`) has operands with incompatible dimensions.
DimensionMismatch {
    /// Source span of the operator node.
    op_pos: SourceSpan,
    /// Dimension of the left operand.
    left_dim: mathcore_units::Dimension,
    /// Dimension of the right operand.
    right_dim: mathcore_units::Dimension,
},

/// An additive operator mixes a logarithmic unit (dB family) with a linear
/// unit of the same underlying dimension (e.g., dBm + Watt).
LogarithmicMismatch {
    op_pos: SourceSpan,
    lhs_ref: Option<mathcore_units::UnitId>,
    rhs_ref: Option<mathcore_units::UnitId>,
},

/// A transcendental function received an argument with a non-dimensionless,
/// non-angle unit (e.g., sin(x) where x has unit Meter).
TranscendentalArgumentError {
    fn_name: String,
    arg_dim: mathcore_units::Dimension,
    arg_pos: SourceSpan,
},

/// A `Pow` node has a rational exponent that would produce fractional
/// dimension exponents (e.g., sqrt(Newton) → {L:0.5, M:0.5, T:-1}).
FractionalDimensionError {
    op_pos: SourceSpan,
    base_unit: mathcore_units::UnitExpression,
},

/// The requested `target_system` cannot accommodate one or more of the
/// annotation units (e.g., target = Imperial, unit = Tesla).
SystemIncompatible {
    target: mathcore_units::System,
    unresolvable_units: Vec<mathcore_units::UnitExpression>,
},
```

The following are `Warning` variants (non-fatal, returned in
`AnnotatedExpression::warnings`):

```rust
/// A symbol in `unit_annotations` or `constant_annotations` did not appear
/// in the parsed `Expression` tree.
Warning::UnusedAnnotation { symbol: String },

/// An addend in `+` or `-` had no unit annotation while the other addend did.
Warning::UnannotatedAddend { op_pos: SourceSpan, unannotated_side: Side },

/// A `Pow` node had a symbolic (non-literal) exponent and the base had a
/// non-dimensionless unit. The check was skipped; thales will handle this.
Warning::SymbolicExponentWithUnit {
    op_pos: SourceSpan,
    base_unit: mathcore_units::UnitExpression,
},

/// A symbol appeared in both `unit_annotations` and `constant_annotations`;
/// the explicit user-declared unit took precedence, but the constant's
/// cataloged unit has a different dimension.
Warning::ConstantUnitConflict {
    symbol: String,
    declared_unit: mathcore_units::UnitExpression,
    constant_unit: mathcore_units::UnitExpression,
},
```

---

## 11. Backward Compatibility

### 11.1 Existing parse paths unchanged

The existing `parse(source: &str) -> Result<Expression, ParseError>` and
`parse_latex(source: &str) -> Result<Expression, ParseError>` entry points
are unchanged. They return a plain `Expression` with an empty `AnnotationSet`
on every node, exactly as in v0.4.0. No caller that uses the pre-annotation
API is affected by this wave of work.

This guarantee is the continuation of the MX-4 commitment. Specifically:

- No existing test fixture in mathlex's test suite requires modification.
- The `Expression` enum gains no new variants in v0.5.0 for the annotation
  integration (the existing `Variable`, `Mul`, etc. are sufficient).
- `AnnotationSet` remains present on every `Expression` node as an empty set
  when `parse()` is called, exactly as established in v0.4.0.

### 11.2 `AnnotatedExpression` is a new type

`AnnotatedExpression` is additive. It does not replace `Expression`; it wraps
it. Consumers that have not migrated to `parse_with_annotations` continue to
work without any changes.

### 11.3 MI-N note

Every requirement tagged `Blocker` in § 13 gates mathlex v0.5.0. Requirements
tagged `Required` must ship in v0.5.0 alongside the Blocker items.
Requirements tagged `Deferred` are explicitly out of scope for v0.5.0 and
tracked for v0.6.0 or later.

---

## 12. Test Strategy

### 12.1 Unit annotation resolution tests

For each `unitalg::ParseError` variant, a test in mathlex's test suite calls
`parse_with_annotations` with a deliberately malformed unit annotation and
asserts the correct `ParseError::AnnotationParseError` is returned. Cover at
minimum: `UnknownToken`, `PrefixWithoutAtom`, `ReservedToken`,
`PrefixNotAllowed`.

### 12.2 Dimension check tests

For each dimension-check path (additive site, transcendental argument,
rational power):

- One positive test: annotation-consistent expression returns `Ok`.
- One negative test: annotation-inconsistent expression returns the
  correct `ParseError` variant with accurate `op_pos` or `arg_pos`.

Representative cases:

```rust
// Dimension mismatch: adding meters to seconds
parse_with_annotations(
    "x + y",
    &ParseContext {
        unit_annotations: [("x", meter()), ("y", second())],
        ..Default::default()
    },
) == Err(ParseError::DimensionMismatch { left_dim: {Length:1}, right_dim: {Time:1}, .. })

// Compatible additive: meters and feet → conversion factor inserted
parse_with_annotations(
    "x + y",
    &ParseContext {
        unit_annotations: [("x", meter()), ("y", foot())],
        target_system: Some(System::SI),
        ..Default::default()
    },
) == Ok(AnnotatedExpression { output_unit: Some(Meter), warnings: [], .. })
// expression.right is wrapped: Mul(Rational(3048, 10000), Variable("y"))

// Transcendental: sin(x_meters) fails
parse_with_annotations(
    "sin(x)",
    &ParseContext {
        unit_annotations: [("x", meter())],
        ..Default::default()
    },
) == Err(ParseError::TranscendentalArgumentError { fn_name: "sin", arg_dim: {Length:1}, .. })

// Transcendental: sin(theta_rad) passes
parse_with_annotations(
    "sin(theta)",
    &ParseContext {
        unit_annotations: [("theta", radian())],
        ..Default::default()
    },
) == Ok(..)
```

### 12.3 Constant annotation tests

- Positive: `c` annotated as `ConstantId::SpeedOfLight` produces a
  `AnnotationSet` on the Variable node with `"constant": "SpeedOfLight"` and
  `"unit"` matching the `ConstantSpec::unit` for `SpeedOfLight`.
- Negative: symbol in `constant_annotations` not present in expression
  produces `Warning::UnusedAnnotation`.
- No-substitute: verify that `Expression::Variable("c")` is NOT replaced by
  the numeric value `299792458`; the Variable node is preserved.

### 12.4 Target-system enforcement tests

- Mixed SI + Imperial in an additive expression with `target_system: Some(SI)`:
  verify the Imperial operand is wrapped in a conversion factor.
- `target_system: Some(Imperial)` with a Tesla annotation:
  verify `ParseError::SystemIncompatible`.
- No target system with all-SI annotations:
  verify `choose_system` selects SI and no conversions are emitted.

### 12.5 Output unit tests

- `F = m * a` with `m: Kilogram`, `a: Meter/Second²`:
  `output_unit = Some(Newton)` (factored by unitalg).
- `v = d / t` with `d: Meter`, `t: Second`:
  `output_unit = Some(Atom { id: Meter, prefix: None }) / Atom { id: Second, prefix: None }`
  — no named unit match, stays as base-unit product.
- `E = m * c^2` with `m: Kilogram`, `c` annotated as `SpeedOfLight`
  (constant, `unit: m/s`):
  effective unit of `c^2` = `m²/s²`; `m * c^2``kg·m²·s⁻²` = `Joule`.

### 12.6 Integration tests (cross-crate)

Integration tests live under `mathlex/tests/integration/` with
`dev-dependencies` on `unitalg`, `mathcore-units`, and `mathcore-constants`.

Required cases:

- **Solar mass additive**: `1.5 * M_sun + 0.3 * M_earth` (both annotated as
  constants and mass units). Dimension check passes; `FromConstant` conversion
  factors emitted; `output_unit = Kilogram`.
- **Speed-of-light energy**: `m * c^2`, `m: Kilogram`, `c: SpeedOfLight`
  constant. `output_unit = Joule`.
- **dBm + Watt**: `x + y`, `x: DecibelMilliwatt`, `y: Watt`.
  `ParseError::LogarithmicMismatch`.
- **Temperature conversion**: `T1 + T2`, `T1: DegreeCelsius`,
  `T2: DegreeFahrenheit`, `target_system: Some(SI)`. Affine factors emitted;
  `output_unit = Kelvin`.

### 12.7 Backward compatibility regression tests

Run the full existing mathlex test suite (all tests that existed before
v0.5.0) against the v0.5.0 build. Zero failures permitted. This is a CI gate.
Additionally, run the MX-7 golden fixture round-trip tests to confirm the
`Expression` serde format is unchanged.

### 12.8 Unused annotation warning tests

- Annotate a symbol that does not appear in the expression: verify exactly one
  `Warning::UnusedAnnotation` entry.
- Annotate all symbols that appear in the expression: verify zero warnings.

---

## 13. Requirements (MI-1..MI-N)

| ID | Requirement | Severity |
|---|---|---|
| MI-1 | `parse_with_annotations(source, ctx)` is a new public entry point returning `Result<AnnotatedExpression, ParseError>`; the existing `parse()` and `parse_latex()` entry points are unchanged | Blocker |
| MI-2 | `ParseContext` carries `unit_annotations: HashMap<String, UnitExpression>`, `constant_annotations: HashMap<String, ConstantId>`, and `target_system: Option<System>` | Blocker |
| MI-3 | Unit annotation tokens are resolved via `unitalg::parse_token`; failures surface as `ParseError::AnnotationParseError` carrying the symbol, the token, and the underlying `unitalg::ParseError` | Blocker |
| MI-4 | Symbols in `unit_annotations` that do not appear in the `Expression` tree emit `Warning::UnusedAnnotation` (not an error) | Required |
| MI-5 | Every `Expression::Variable(s)` node where `s` appears in `unit_annotations` receives a `AnnotationSet["unit"]` entry with the resolved `UnitExpression` | Blocker |
| MI-6 | `mathcore_constants::lookup_constant(id)` is called for each entry in `constant_annotations`; the resulting `ConstantSpec::unit` participates in dimension checks | Blocker |
| MI-7 | Constant-annotated Variable nodes receive `AnnotationSet["constant"]` with the serialized `ConstantId` tag; their `AnnotationSet["unit"]` is set from `ConstantSpec::unit` | Blocker |
| MI-8 | mathlex does NOT substitute `ConstantSpec::value` into the `Expression` tree; Variable nodes remain symbolic | Blocker |
| MI-9 | Additive (`+`/`-`) sites call `unitalg::combine_additive`; incompatible dimensions produce `ParseError::DimensionMismatch`; compatible but different-unit cases wrap the out-of-system operand in a symbolic `Expression::Mul` conversion | Blocker |
| MI-10 | Transcendental function sites (`sin`, `cos`, `tan`, `exp`, `log`, `ln`, and equivalents) call `unitalg::check_transcendental_argument` on the argument's resolved unit; non-dimensionless non-angle units produce `ParseError::TranscendentalArgumentError` | Blocker |
| MI-11 | Rational-exponent `Pow` sites call `unitalg::power_rational`; fractional dimension results produce `ParseError::FractionalDimensionError`; symbolic exponents with unit-bearing bases produce `Warning::SymbolicExponentWithUnit` | Required |
| MI-12 | `ctx.target_system = Some(sys)` causes all annotation units incompatible with `sys` to be converted via `unitalg::convert`; the conversion factor is inserted as a symbolic `Expression::Mul` node and the annotation is updated to the target-system unit | Blocker |
| MI-13 | `ParseError::SystemIncompatible` is returned when `target_system` is set and at least one annotation unit cannot be converted to that system | Blocker |
| MI-14 | Bottom-up unit propagation walks the entire `Expression` tree after checks and conversions; each node receives an `AnnotationSet["unit"]` entry derived from its operator and its children's units | Required |
| MI-15 | `unitalg::factor` is called on the root node's propagated unit; the result is stored in `AnnotatedExpression::output_unit` and in the root node's `AnnotationSet["output_unit"]` | Required |
| MI-16 | `AnnotatedExpression` carries `expression: Expression`, `annotations: AnnotationSet`, `output_unit: Option<UnitExpression>`, and `warnings: Vec<Warning>` | Blocker |
| MI-17 | All new `ParseError` variants use `#[non_exhaustive]` and `#[serde(tag = "kind", content = "value")]` when the `serde` feature is enabled | Required |
| MI-18 | The `serde` feature for `AnnotatedExpression` requires `mathcore-units/serde`, `mathcore-constants/serde`, and `unitalg/serde` to be co-enabled; enabling `mathlex/serde` alone without them is a compile error | Required |
| MI-19 | All existing parse paths (`parse()`, `parse_latex()`, etc.) return `Expression` with an empty `AnnotationSet` on every node; no existing test fixture requires modification for v0.5.0 | Blocker |
| MI-20 | CI integration tests cover: (a) dimension mismatch at additive site, (b) transcendental argument with dimensional unit, (c) constant annotation with no substitution, (d) target-system conversion factor insertion, (e) output unit factored to a named unit, (f) backward-compat golden fixture round-trip | Blocker |

---

## 14. Resolved Decisions and Open Issues

### 14.1 Resolved decisions

**D-1: No inline unit grammar.** The decision from the v0.4.0 RFC period is
confirmed: annotations are always caller-supplied maps; mathlex introduces
no bracket syntax or other inline annotation grammar. Rationale: grammar
additions are breaking changes; the map API is purely additive and keeps
existing parse inputs working without modification.

**D-2: No numeric substitution at parse time.** mathlex keeps constants
symbolic. The `Expression::Variable(sym)` node for a constant survives into
the output AST unchanged. thales Rule 1 and mathlex Rule 1 (parser produces
AST, never evaluates) both require this. Downstream consumers substitute when
they decide to evaluate.

**D-3: `AnnotatedExpression` is a wrapper, not a replacement.** `Expression`
shape is frozen by MX-2. `AnnotatedExpression` is a new struct that holds
`Expression` and adds the resolution layer. No new `Expression` variants are
needed for v0.5.0.

**D-4: Conversion factors use `Expression::Mul` (existing variant).** Tree
rewrites for unit conversion insert standard `Expression::Mul` nodes carrying
a `Rational` or `Variable` literal as the conversion factor. No new
`Expression` variant is needed.

**D-5: Additive site with `LogarithmicReferenceMismatch` is a hard error.**
Mixing `dBm + Watt` is rejected at parse time. It is not a warning because
the CAS cannot represent the result symbolically without unwrapping the
logarithmic relationship, which is thales's responsibility, not mathlex's. If
the caller intends a purely symbolic expression without dimensional semantics,
they should not supply unit annotations for those symbols.

**D-6: `Warning::UnannotatedAddend` is a warning, not an error.** An
unannotated addend might legitimately be a dimensionless constant. Forcing
annotation on all addends would break expressions like `E = mc^2 + 0` where
the `0` is unannotated. A warning surfaces the issue without hard-failing
legitimate expressions.

**D-7: `mathcore-constants` is a direct dependency of mathlex starting in
v0.5.0.** The `lookup_constant` call in § 5.1 requires the dependency.
In v0.4.0, mathlex had no dependency on mathcore-constants (constant
annotation was deferred per MX-10). Adding this dependency in v0.5.0 is a
`[dependencies]` addition; it is not a breaking change to the public API.

**D-8: `unitalg` is a direct dependency of mathlex starting in v0.5.0.**
Same rationale as D-7. `parse_token`, `combine_additive`,
`check_transcendental_argument`, `convert`, `factor`, and `choose_system` are
all called by the annotation pipeline. In v0.4.0 mathlex had no dependency on
unitalg.

### 14.2 Open issues

**MI-ISSUE-1: Composite unit token parsing API in unitalg.**
`unitalg::parse_token` handles single atomic tokens. Composite unit strings
such as `"m/s"` or `"kg*m*s^-2"` are not handled by `parse_token`. The
`unit_annotations_raw` path in `ParseContext` (§ 4.1) promises that callers
can pass string tokens for resolution, but for composite strings this requires
an expression-level unit parser in unitalg that does not yet exist in the
unitalg specification. Options: (a) restrict `unit_annotations_raw` to single
tokens only and require callers to pre-construct `UnitExpression` trees for
composite units; (b) add a `parse_unit_expression(s: &str)` function to
unitalg that tokenizes and builds a `UnitExpression` tree. Option (a) is
conservative; option (b) is the right long-term answer but requires a unitalg
spec amendment. This issue must be resolved before the v0.5.0 API is frozen.
**Blocking for MI-3 acceptance criteria.**

**MI-ISSUE-2: `AnnotationSet` key type for `UnitExpression` payload.**
The `AnnotationSet` substrate from MX-3 carries opaque `serde_json::Value`
payloads keyed by strings. When mathlex v0.5.0 populates `"unit"` keys with
`UnitExpression` values, the payload is a strongly typed `UnitExpression`
on the Rust side but a JSON object on the wire. The question is whether
`AnnotationSet` should carry typed variant payloads (via an enum) or continue
as a string-keyed `serde_json::Value` map. A typed enum would allow compile-time
correctness checking; a value map is more flexible for future extension.
The decision affects the `AnnotationSet` type shape defined in MX-3, which is
already shipped in v0.4.0. Changing the shape would be a breaking change.
This must be resolved in coordination with thales before v0.5.0 scope is
locked. **Non-blocking for v0.5.0 implementation start but must be resolved
before the v0.5.0-rc wire format is considered provisional-final.**

**MI-ISSUE-3: `SourceSpan` type in new `ParseError` variants.**
The new `ParseError` variants (`DimensionMismatch`, `TranscendentalArgumentError`,
etc.) carry a `SourceSpan` field encoding the source position of the error
site. The current mathlex `ParseError` type uses a position representation
that may need to be exposed as a public type alias or struct for the new
variants. If the existing position type is internal, it must be made public
in v0.5.0. This is a minor API addition but must be confirmed with the
mathlex CLAUDE.md backward-compat policy. **Non-blocking; resolved during
v0.5.0 PRD authoring.**