runmat-hir 0.2.2

High-level IR for RunMat with type inference and lowering utilities
# RunMat HIR

High-level Intermediate Representation for MATLAB code. HIR is the semantic hub between parsing and execution (interpreter/JIT). It resolves identifiers to `VarId`s, attaches static types, normalizes constructs, and runs early semantic validations so downstream components can be simpler and faster.

## Goals

- Provide a typed, SSA-friendly structure for the engine
- Preserve MATLAB semantics (indexing, cells, classes, methods, metaclass)
- Enable flow-sensitive inference and optimizations (constant folding, dispatch)
- Catch structural and attribute errors early (classdef attributes, imports)

## Core data structures

- `VarId(usize)`: stable variable identifiers after name binding
- `Type` (from `runmat-builtins`):
  - `Int`, `Num`, `Bool`, `String`
  - `Tensor { shape: Option<Vec<Option<usize>>> }` (column-major semantics)
  - `Cell { element_type: Option<Box<Type>>, length: Option<usize> }`
  - `Function { params: Vec<Type>, returns: Box<Type> }`
  - `Struct { known_fields: Option<Vec<String>> }` (inference-only)
  - `Void`, `Unknown`, `Union(Vec<Type>)`
- `HirExpr { kind, ty }` (selected variants):
  - Literals and names: `Number`, `String`, `Var(VarId)`, `Constant`
  - Ops: `Unary`, `Binary`
  - Aggregates: `Tensor`, `Cell`, `Range`, `Colon`, `End`
  - Indexing: `Index`, `IndexCell`
  - Calls and members: `FuncCall`, `FuncHandle`, `AnonFunc`, `Member`, `MemberDynamic`, `MethodCall`
  - Metaclass: `MetaClass("pkg.Class")`
- `HirStmt` (selected variants):
  - `ExprStmt(expr, suppressed)` (semicolon suppression)
  - `Assign(VarId, expr, suppressed)`
  - `MultiAssign(Vec<Option<VarId>>, expr, suppressed)` with `~` as `None`
  - `AssignLValue(HirLValue, expr, suppressed)` where `HirLValue` ∈ { `Var`, `Member`, `MemberDynamic`, `Index`, `IndexCell` }
  - Control flow: `If`, `While`, `For`, `Switch`, `TryCatch`
  - Declarations: `Function { name, params, outputs, body, has_varargin, has_varargout }`, `Global`, `Persistent`
  - Flow control: `Break`, `Continue`, `Return`
  - Class: `ClassDef { name, super_class, members }`
  - Imports: `Import { path: Vec<String>, wildcard: bool }`
- `HirClassMember`: `Properties`, `Methods`, `Events`, `Enumeration`, `Arguments` (each block carries its `parser::Attr` attributes)
- `HirProgram { body }`
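
As a rough illustration of how these shapes fit together, the sketch below uses hypothetical, heavily trimmed stand-ins for `VarId`, `Type`, `HirExpr` and `HirStmt`. These are not the real `runmat-hir` definitions, and the helpers `tilde_multi_assign` and `bound_vars` are invented for illustration. It shows `[~, b] = f();` stored as a `MultiAssign` with `None` in the `~` slot.

```rust
// Hypothetical, simplified mirror of the HIR shapes described above.
// NOT the real runmat-hir definitions; variants and fields are trimmed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct VarId(pub usize);

#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Num,
    Bool,
    String,
    Tensor { shape: Option<Vec<Option<usize>>> },
    Unknown,
}

#[derive(Debug, Clone, PartialEq)]
pub enum HirExprKind {
    Number(String),
    Var(VarId),
    FuncCall(String, Vec<HirExpr>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct HirExpr {
    pub kind: HirExprKind,
    pub ty: Type,
}

#[derive(Debug, Clone, PartialEq)]
pub enum HirStmt {
    // The bool records semicolon suppression.
    Assign(VarId, HirExpr, bool),
    // `~` placeholders are stored as None.
    MultiAssign(Vec<Option<VarId>>, HirExpr, bool),
}

/// Builds the statement for `[~, b] = f();` with `b` bound to `b_id`.
pub fn tilde_multi_assign(b_id: VarId) -> HirStmt {
    let call = HirExpr {
        kind: HirExprKind::FuncCall("f".to_string(), Vec::new()),
        ty: Type::Unknown,
    };
    HirStmt::MultiAssign(vec![None, Some(b_id)], call, true)
}

/// Returns the VarIds a statement actually binds, skipping `~` slots.
pub fn bound_vars(stmt: &HirStmt) -> Vec<VarId> {
    match stmt {
        HirStmt::Assign(v, _, _) => vec![*v],
        HirStmt::MultiAssign(lhs, _, _) => lhs.iter().flatten().copied().collect(),
    }
}
```

Representing `~` as `None` keeps the LHS vector positionally aligned with the callee's outputs, so later passes can index outputs by slot without a separate placeholder variant.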

## Lowering (AST → HIR)

- `Ctx` manages scopes, binds names to `VarId`, and maintains `var_types` for flow typing.
- Variables shadow constants; bare identifiers that are known functions lower to `FuncCall(name, [])`.
- Indexing vs calls is already disambiguated by the parser; HIR keeps `Index`/`IndexCell` and `FuncCall` distinct.
- L-values lower to `HirLValue` for dot/paren/brace writes; an indexed assignment such as `A(…) = v` lowers to `AssignLValue`.
- `Function` statements record `has_varargin`/`has_varargout` flags.
- `ClassDef` lowers structurally into `HirClassMember` blocks with attributes preserved.
- `Import` lowers to a dedicated `HirStmt::Import` (no runtime effect; used by name resolution/validation).
- Metaclass `?Qualified.Name` lowers to `HirExprKind::MetaClass("Qualified.Name")`; postfix static member/method access on it is handled in the compiler.
- Function-level `arguments ... end` blocks (when present) are parsed; their names are accepted and exposed to later validation. Constraint checking (types/defaults/ranges) happens at HIR/VM time rather than at parse time.
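
The name-binding rules above can be sketched with a small scoped context. Everything here (`Ctx`, `Lowered`, `bind`, `resolve`) is a hypothetical simplification, not the crate's actual `Ctx` API. It assumes that only the innermost scope is consulted (each MATLAB function has its own workspace) and that constants are checked before functions, which the text above does not specify.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct VarId(pub usize);

/// Hypothetical result of lowering a bare identifier.
#[derive(Debug, PartialEq)]
pub enum Lowered {
    Var(VarId),
    Constant(String),
    FuncCall(String), // zero-argument call: FuncCall(name, [])
    Unbound(String),
}

/// Hypothetical lowering context: a stack of scopes mapping names to VarIds.
pub struct Ctx {
    scopes: Vec<HashMap<String, VarId>>,
    next_id: usize,
    constants: HashSet<String>,
    functions: HashSet<String>,
}

impl Ctx {
    pub fn new(constants: &[&str], functions: &[&str]) -> Self {
        Ctx {
            scopes: vec![HashMap::new()],
            next_id: 0,
            constants: constants.iter().map(|s| s.to_string()).collect(),
            functions: functions.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Enter a new function workspace.
    pub fn push_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    pub fn pop_scope(&mut self) {
        self.scopes.pop();
    }

    /// Binds `name` in the innermost scope, reusing its id if already bound there.
    pub fn bind(&mut self, name: &str) -> VarId {
        let next = &mut self.next_id;
        let scope = self.scopes.last_mut().expect("at least one scope");
        *scope.entry(name.to_string()).or_insert_with(|| {
            let id = VarId(*next);
            *next += 1;
            id
        })
    }

    /// Variables shadow constants; known functions become zero-argument calls.
    pub fn resolve(&self, name: &str) -> Lowered {
        if let Some(id) = self.scopes.last().and_then(|s| s.get(name)) {
            Lowered::Var(*id)
        } else if self.constants.contains(name) {
            Lowered::Constant(name.to_string())
        } else if self.functions.contains(name) {
            Lowered::FuncCall(name.to_string())
        } else {
            Lowered::Unbound(name.to_string())
        }
    }
}
```

For example, after `pi = 3;` a later `pi` resolves to the variable rather than the constant, while inside a freshly pushed function scope it resolves to the constant again.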

## Early validations and helpers

- `validate_classdefs(&HirProgram)` runs during `lower()`:
  - Detects duplicate properties/methods and name conflicts between them
  - Enforces attribute constraints (e.g., Methods: `Abstract` combined with `Sealed` is invalid; Properties: `Static` combined with `Dependent` is invalid; `Access`/`GetAccess`/`SetAccess` values are limited to `public|private`)
  - Performs basic sanity checks for `Events`, `Enumeration`, and `Arguments` (unique names; no conflicts with props/methods)
- Imports:
  - `collect_imports(&HirProgram)`
  - `normalize_imports(&HirProgram) -> Vec<NormalizedImport>` (each entry has `path`, `wildcard`, `unqualified`)
  - `validate_imports(&HirProgram)` checks for duplicate imports and for ambiguity between specific (non-wildcard) imports that share the same unqualified name
- Multi-LHS structural validation: lowering rejects invalid LHS shapes early (e.g., empty LHS vectors, unsupported mixed forms); shape/size rules are enforced by the interpreter at assignment.
- Globals/Persistents: a per-program symbol set is collected across units to model lifetimes and name binding consistently.
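
The import checks above can be approximated as follows. `NormalizedImport` here only mirrors the three field names listed; its field types, the `normalize` and `validate` helpers, and the error strings are assumptions for illustration, not the crate's API. A specific import's unqualified name is taken to be its last path segment.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical normalized import; field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct NormalizedImport {
    pub path: String,                // e.g. "pkg.sub.Class"
    pub wildcard: bool,              // `import pkg.*`
    pub unqualified: Option<String>, // last segment, specific imports only
}

/// Normalizes one `import` path.
pub fn normalize(path: &[&str], wildcard: bool) -> NormalizedImport {
    NormalizedImport {
        path: path.join("."),
        wildcard,
        unqualified: if wildcard {
            None
        } else {
            path.last().map(|s| s.to_string())
        },
    }
}

/// Rejects exact duplicates, then specific imports whose unqualified names collide.
pub fn validate(imports: &[NormalizedImport]) -> Result<(), String> {
    let mut seen: HashSet<(String, bool)> = HashSet::new();
    let mut by_name: HashMap<String, String> = HashMap::new();
    for imp in imports {
        if !seen.insert((imp.path.clone(), imp.wildcard)) {
            return Err(format!("duplicate import: {}", imp.path));
        }
        if let Some(name) = &imp.unqualified {
            if let Some(prev) = by_name.insert(name.clone(), imp.path.clone()) {
                return Err(format!("ambiguous import '{}': {} vs {}", name, prev, imp.path));
            }
        }
    }
    Ok(())
}
```

With this sketch, `import pkg.A` and `import other.A` in the same unit are reported as ambiguous, while mixing `import pkg.A` with `import pkg.*` passes.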

## Type inference (expressions)

- Numbers/strings/booleans map to `Num`/`String`/`Bool`.
- Arithmetic/elementwise ops: if any operand is `Tensor`, result is `Tensor` (shape may unify when known).
- Range/colon produce `Tensor`.
- Indexing computes output type conservatively. For tensors with known rank, scalar indices drop dimensions.
- Cells compute a unified element type across literals when possible.
- Member/Method calls are `Unknown` by default (value-dependent at runtime).
- Metaclass expressions have type `String`.
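
The arithmetic rule above ("any `Tensor` operand gives a `Tensor`") might look like the sketch below. `binary_result` and `unify_shape` are hypothetical helpers, and the shape rule is an assumption: equal known dimensions are kept, differing or unknown ones become `None`, a scalar operand preserves the tensor's shape, and broadcasting is not modeled.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Num,
    Bool,
    String,
    Tensor { shape: Option<Vec<Option<usize>>> },
    Unknown,
}

/// Keeps dimensions both sides agree on; anything else becomes unknown.
fn unify_shape(
    a: &Option<Vec<Option<usize>>>,
    b: &Option<Vec<Option<usize>>>,
) -> Option<Vec<Option<usize>>> {
    match (a, b) {
        (Some(x), Some(y)) if x.len() == y.len() => Some(
            x.iter()
                .zip(y)
                .map(|(p, q)| if p == q { *p } else { None })
                .collect(),
        ),
        _ => None,
    }
}

/// Hypothetical result typing for an elementwise/arithmetic binary op.
pub fn binary_result(lhs: &Type, rhs: &Type) -> Type {
    match (lhs, rhs) {
        (Type::Tensor { shape: a }, Type::Tensor { shape: b }) => Type::Tensor {
            shape: unify_shape(a, b),
        },
        // A scalar operand leaves the tensor's shape unchanged.
        (Type::Tensor { shape }, Type::Num | Type::Bool)
        | (Type::Num | Type::Bool, Type::Tensor { shape }) => Type::Tensor {
            shape: shape.clone(),
        },
        // Any other tensor pairing: still a Tensor, but shape is unknown.
        (Type::Tensor { .. }, _) | (_, Type::Tensor { .. }) => Type::Tensor { shape: None },
        (Type::Num | Type::Bool, Type::Num | Type::Bool) => Type::Num,
        _ => Type::Unknown,
    }
}
```

Dropping to `shape: None` whenever the other operand is not a known scalar keeps the result conservative, in line with the optimization-only role of HIR types.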

## Flow-sensitive inference

Two complementary passes exist:

1) Inter-procedural return summaries

- `infer_function_output_types(&HirProgram) -> HashMap<String, Vec<Type>>`
  - Gathers all function names (top-level and class methods)
  - Seeds summaries from each function's own exits/fallthrough, then iterates toward a small fixed point (capped at 3 iterations)
  - Merges types at joins: `Unknown ⊔ T = T`; otherwise the two types are unified
  - Uses an internal `analyze_stmts(outputs, …, func_returns)` whose env joins propagate return types

2) Per-function variable environments

- `infer_function_variable_types(&HirProgram) -> HashMap<String, HashMap<VarId, Type>>`
  - Similar dataflow that produces a final environment for each function
  - Uses return summaries from (1) to type `FuncCall`
  - Includes a simple callsite fallback for direct callees: when a callee's summary is missing or `Unknown`, a single-pass analysis of the callee body (with parameter types seeded conservatively) infers types from direct output assignments. This stabilizes per-position types for `[a,b]=f(...)` at callers.
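
The return-summary pass in (1) can be sketched as a small fixed-point loop over a toy program model. In this sketch each function is reduced to a list of exit paths, and each output on a path is either a literal type or "output `k` of callee `g`". `Src`, `join` and `infer_outputs` are hypothetical. `join` implements `Unknown ⊔ T = T` and stands in for `Type::unify` by forming a flattened, de-duplicated `Union`, which may differ from the real unification.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Num,
    Bool,
    Unknown,
    Union(Vec<Type>),
}

/// Join at control-flow merges: Unknown ⊔ T = T; otherwise a flattened Union.
pub fn join(a: &Type, b: &Type) -> Type {
    match (a, b) {
        (Type::Unknown, t) | (t, Type::Unknown) => t.clone(),
        (x, y) if x == y => x.clone(),
        (x, y) => {
            let mut items: Vec<Type> = Vec::new();
            for t in [x, y] {
                let parts = match t {
                    Type::Union(ts) => ts.clone(),
                    other => vec![other.clone()],
                };
                for p in parts {
                    if !items.contains(&p) {
                        items.push(p);
                    }
                }
            }
            if items.len() == 1 { items.pop().unwrap() } else { Type::Union(items) }
        }
    }
}

/// Where one output value on one exit path comes from (toy model).
#[derive(Debug, Clone)]
pub enum Src {
    Lit(Type),
    Call(String, usize), // k-th output of a callee
}

/// Fixed-point return summaries, capped at 3 rounds.
pub fn infer_outputs(funcs: &HashMap<String, Vec<Vec<Src>>>) -> HashMap<String, Vec<Type>> {
    // Seed every declared output with Unknown.
    let mut summaries: HashMap<String, Vec<Type>> = funcs
        .iter()
        .map(|(name, exits)| {
            let n = exits.first().map_or(0, |e| e.len());
            (name.clone(), vec![Type::Unknown; n])
        })
        .collect();
    for _ in 0..3 {
        let mut next = summaries.clone();
        for (name, exits) in funcs {
            let out = next.get_mut(name).expect("seeded above");
            for exit in exits {
                for (k, src) in exit.iter().enumerate().take(out.len()) {
                    let t = match src {
                        Src::Lit(t) => t.clone(),
                        // Callee summaries are read from the previous round.
                        Src::Call(callee, j) => summaries
                            .get(callee)
                            .and_then(|v| v.get(*j))
                            .cloned()
                            .unwrap_or(Type::Unknown),
                    };
                    out[k] = join(&out[k], &t);
                }
            }
        }
        if next == summaries {
            break;
        }
        summaries = next;
    }
    summaries
}
```

With `f` returning `g`'s first output and `g` returning `Num` on one path and `Unknown` on another, `g` settles at `Num` in the first round and `f` picks that up in the second; the iteration cap bounds the cost when summaries keep changing.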

### Struct-field flow inference

- HIR uses `Type::Struct { known_fields: Option<Vec<String>> }` to conservatively track observed fields on variables.
- The analysis refines struct knowledge in two ways:
  - Writes: `s.field = expr` marks `s` as Struct and adds `"field"` to `known_fields`.
  - Conditions (then-branch refinement): detect any of the following and add asserted fields:
    - `isfield(s, 'x')`
    - `ismember('x', fieldnames(s))` or `ismember(fieldnames(s), 'x')`
    - `strcmp(fieldnames(s), 'x')` / `strcmpi(…)`, including `any(strcmp(…))` or `all(strcmp(…))`
    - Conjunctions using `&&` or `&` are traversed; negations are ignored (no refinement)
- Refinements are applied to the then-branch env only and merged back at joins using `Type::unify` for Structs.
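
The condition patterns listed above can be recognized with a short recursive walk. The expression type here is a hypothetical toy AST, not `HirExpr`, and `asserted_fields` is an illustrative name. It collects the field names a condition asserts on a given variable, descends through conjunctions, and ignores negations.

```rust
#[derive(Debug, Clone)]
pub enum Expr {
    Var(String),
    Str(String),
    Call(String, Vec<Expr>),
    And(Box<Expr>, Box<Expr>), // `&&` or `&`
    Not(Box<Expr>),
}

fn is_var(e: &Expr, name: &str) -> bool {
    matches!(e, Expr::Var(v) if v == name)
}

fn is_fieldnames_of(e: &Expr, var: &str) -> bool {
    matches!(e, Expr::Call(f, args)
        if f == "fieldnames" && args.len() == 1 && is_var(&args[0], var))
}

fn str_lit(e: &Expr) -> Option<&str> {
    if let Expr::Str(s) = e { Some(s.as_str()) } else { None }
}

/// Collects fields asserted on `var` by `cond` (then-branch refinement only).
pub fn asserted_fields(cond: &Expr, var: &str, out: &mut Vec<String>) {
    match cond {
        Expr::And(a, b) => {
            asserted_fields(a, var, out);
            asserted_fields(b, var, out);
        }
        Expr::Not(_) => {} // negations produce no refinement
        Expr::Call(f, args) => match (f.as_str(), args.as_slice()) {
            ("isfield", [s, x]) if is_var(s, var) => out.extend(str_lit(x).map(String::from)),
            // Either argument order: ismember('x', fieldnames(s)), strcmp(fieldnames(s), 'x'), ...
            ("ismember", [a, b]) | ("strcmp", [a, b]) | ("strcmpi", [a, b]) => {
                if is_fieldnames_of(b, var) {
                    out.extend(str_lit(a).map(String::from));
                } else if is_fieldnames_of(a, var) {
                    out.extend(str_lit(b).map(String::from));
                }
            }
            // any(strcmp(...)) / all(strcmp(...)) wrap the same test.
            ("any", [inner]) | ("all", [inner]) => {
                if let Expr::Call(g, _) = inner {
                    if g == "strcmp" || g == "strcmpi" {
                        asserted_fields(inner, var, out);
                    }
                }
            }
            _ => {}
        },
        _ => {}
    }
}
```

The collected names would then be added to `known_fields` in the then-branch environment only.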

## Multi-assign typing

- `[a,b] = f(...)` is typed per-position using the callee's return summary when available.
- If a summary is incomplete or missing, a simple fallback (single-pass over the callee) infers direct assignments to outputs and fills Unknowns conservatively.
- Mixed forms like `[~,b] = f(...)` are handled by storing `None` in the LHS vector and skipping the slot.
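
A per-position typing step along these lines is sketched below. `type_multi_assign` and its parameters are hypothetical. It assumes the callee's summary is preferred slot by slot, that the single-pass fallback fills slots where the summary is missing or `Unknown`, and that `~` slots are skipped.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct VarId(pub usize);

#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Num,
    Bool,
    String,
    Unknown,
}

/// Types `[a, b, ...] = f(...)` slot by slot into `env`.
/// `summary`: the callee's return summary (may be missing or short).
/// `fallback`: types from a single pass over the callee's direct output assignments.
pub fn type_multi_assign(
    lhs: &[Option<VarId>],
    summary: Option<&[Type]>,
    fallback: &[Type],
    env: &mut HashMap<VarId, Type>,
) {
    for (k, slot) in lhs.iter().enumerate() {
        // `~` placeholders are stored as None and skipped.
        let var = match slot {
            Some(v) => *v,
            None => continue,
        };
        // Prefer a non-Unknown summary entry, then the fallback, else Unknown.
        let from_summary = summary
            .and_then(|s| s.get(k))
            .filter(|t| **t != Type::Unknown);
        let ty = from_summary
            .or_else(|| fallback.get(k))
            .cloned()
            .unwrap_or(Type::Unknown);
        env.insert(var, ty);
    }
}
```

For `[~, b, c] = f(...)` with a summary of `[Num, Unknown]`, the sketch types `b` from the fallback and `c` from the fallback as well, since the summary has no third entry.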

## Function call typing

- Builtins: signatures come from the registry (`runmat-builtins`).
- User functions: return summaries and the per-position logic above are used for accurate call result typing in both expression and `MultiAssign` contexts.

## Remapping utilities

- `remapping::create_function_var_map`, `create_complete_function_var_map`
- `remapping::remap_function_body` / `remap_stmt` / `remap_expr` to rewrite `VarId`s for local execution frames
- `remapping::collect_function_variables` scans bodies to compute complete maps
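
The remapping idea (program-wide `VarId`s rewritten into a dense, frame-local numbering) can be sketched on a toy body. `build_var_map`, `remap_expr` and `remap_body` are hypothetical stand-ins for the `remapping` functions above, and the numbering order (parameters first, then first appearance in the body, assignment targets before right-hand sides) is an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct VarId(pub usize);

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Num(f64),
    Var(VarId),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    Assign(VarId, Expr),
}

/// Gives `v` the next dense local id unless it already has one.
fn note(v: VarId, map: &mut HashMap<VarId, VarId>) {
    let next = VarId(map.len());
    map.entry(v).or_insert(next);
}

fn visit_expr(e: &Expr, map: &mut HashMap<VarId, VarId>) {
    match e {
        Expr::Num(_) => {}
        Expr::Var(v) => note(*v, map),
        Expr::Add(a, b) => {
            visit_expr(a, map);
            visit_expr(b, map);
        }
    }
}

/// Global -> local map: parameters first, then first appearance in the body.
pub fn build_var_map(params: &[VarId], body: &[Stmt]) -> HashMap<VarId, VarId> {
    let mut map = HashMap::new();
    for p in params {
        note(*p, &mut map);
    }
    for stmt in body {
        match stmt {
            Stmt::Assign(v, e) => {
                note(*v, &mut map);
                visit_expr(e, &mut map);
            }
        }
    }
    map
}

pub fn remap_expr(e: &Expr, map: &HashMap<VarId, VarId>) -> Expr {
    match e {
        Expr::Num(n) => Expr::Num(*n),
        Expr::Var(v) => Expr::Var(map[v]),
        Expr::Add(a, b) => Expr::Add(Box::new(remap_expr(a, map)), Box::new(remap_expr(b, map))),
    }
}

pub fn remap_body(body: &[Stmt], map: &HashMap<VarId, VarId>) -> Vec<Stmt> {
    body.iter()
        .map(|s| match s {
            Stmt::Assign(v, e) => Stmt::Assign(map[v], remap_expr(e, map)),
        })
        .collect()
}
```

Dense local ids let an execution frame store variables in a plain vector indexed by `VarId` instead of hashing program-wide ids.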

## Public entry points

- `lower(&AstProgram) -> Result<HirProgram, String>`: lowers AST, runs return-summary inference (for seeding), then validates classes
- `lower_with_context` / `lower_with_full_context`: lowering for REPL with preexisting variables/functions
- Validation helpers: `validate_classdefs`, `collect_imports`, `normalize_imports`, `validate_imports`
- Inference helpers: `infer_function_output_types`, `infer_function_variable_types`

## Testing

- Mirrors parser coverage for syntax constructs; adds HIR-specific tests:
  - L-value lowering (member/paren/brace), multi-assign and `~` placeholder
  - Control-flow joins across if/elseif/else, switch/otherwise, while/for loops, try/catch
  - Class attribute validation (invalid combos, duplicates, conflicts)
  - Import normalization/ambiguity checks
  - Fuzz seeds for lowering edge cases

## Notes and differences from MATLAB

- MATLAB is dynamically typed; HIR attaches conservative static types for optimization only. Programs acceptable to MATLAB remain acceptable, and `Unknown` is used when there is insufficient information.
- Column-major Tensor semantics are preserved throughout indexing/slicing/shape operations.
- Class blocks are carried structurally; access/attribute validations run during lowering; advanced OOP attributes may be validated by future passes.
- Metaclass expressions are represented explicitly; postfix static member/method usage is compiled appropriately downstream.

## Roadmap / future enhancements

- Inter-procedural propagation of struct field knowledge across calls
- Deeper OOP attribute validations (Hidden/Constant/Transient interplay; static/instance access rules)
- Richer import resolution summaries for static method/property lookup in the HIR stage
- Shape reasoning improvements for Tensor broadcasting and indexing

## Remaining edges

- Arguments metadata: carry `arguments ... end` declared names/constraints (when available from the parser) and surface them to runtime validation. The current parser accepts names; HIR will add optional metadata structs without breaking the existing format.
- Multi-LHS validation: the parser structurally restricts targets to identifiers/`~`; shape semantics are enforced by the interpreter at runtime. Additional unit tests exist; no further work is blocking.
- Globals/Persistents: cross-unit name binding is wired; additional tests around nested functions/closures will be added.

## Minimal example

MATLAB:

```
function y = f(s)
  if isfield(s, 'x') && any(strcmp(fieldnames(s), 'y'))
    s.y = 1;
  end
  y = g(s.x);
end
```

HIR sketch:

```
Function { name: "f", params: [s], outputs: [y], ... }
  If { cond: Binary(FuncCall("isfield", [Var(s), String("x")]), &&,
                    FuncCall("any", [FuncCall("strcmp", [FuncCall("fieldnames", [Var(s)]), String("y")])])),
       then: [ AssignLValue(Member(Var(s), "y"), Number(1), true) ] }
  Assign(y, FuncCall("g", [Member(Var(s), "x")]), true)
```

Return summaries supply the type of `g`'s first output, if available; variable analysis refines `s` as a Struct with fields `{x,y}` along the then-branch.