infr 0.2.2

# Infr — A Gradually Typed Superset of R

Infr (pronounced "infer") adds optional type annotations, `const`/`let` bindings, and static type checking to R. All valid R code is valid Infr code — zero annotations means zero errors. Infr transpiles to plain R with no runtime overhead.

See [infr-spec.md](infr-spec.md) for the full language specification.

## Quick Start

```bash
# Build the CLI
cargo build --release

# Type-check a file
./target/release/infr check script.infr

# Type-check and emit .R
./target/release/infr build script.infr

# Batch transpile a directory
./target/release/infr build src/ -o R/

# Watch mode (re-checks on save)
./target/release/infr watch src/

# Initialize a new project
./target/release/infr init
```

## Example

```r
# script.infr
const greet <- function(name: character, excited: logical = FALSE) -> character {
  if (excited) {
    paste0("Hello, ", name, "!")
  } else {
    paste("Hello", name)
  }
}

const msg: character <- greet("Alice", excited = TRUE)

# Data frame with typed columns
const df: data.frame<{id: integer, name: character, score: numeric}> <- data.frame(
  id = 1:3L,
  name = c("Alice", "Bob", "Charlie"),
  score = c(95.5, 87.0, 92.3)
)

df$score      # OK — returns numeric
df$nonexistent  # Error: Column `nonexistent` does not exist
```

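Running `infr build script.infr` strips the `const` keywords and type annotations and emits plain R. For the `greet` and `msg` definitions above, the output is: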

```r
greet <- function(name, excited = FALSE) {
  if (excited) {
    paste0("Hello, ", name, "!")
  } else {
    paste("Hello", name)
  }
}

msg <- greet("Alice", excited = TRUE)
```

## Features

### Bindings
- **`const`** — prevents reassignment (`const x <- 5; x <- 10` → error)
- **`let`** — explicitly mutable binding
- **Bare `<-`** — behavior depends on strictness level
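
The `const` rule above can be pictured as a small scope table that remembers how each name was first bound. The sketch below is only an illustration under that assumption: `BindingKind`, `Scope`, and the error text are hypothetical names, not taken from Infr's checker.

```rust
use std::collections::HashMap;

/// How a name was introduced (illustrative names, not Infr's internals).
#[derive(Debug, Clone, Copy, PartialEq)]
enum BindingKind {
    Const,
    Let,
    Bare,
}

/// Tracks the binding kind of each name in a single scope.
#[derive(Default)]
struct Scope {
    bindings: HashMap<String, BindingKind>,
}

impl Scope {
    /// Records an assignment to `name`.
    /// Returns an error message if the name is already bound as `const`.
    fn assign(&mut self, name: &str, kind: BindingKind) -> Result<(), String> {
        if self.bindings.get(name) == Some(&BindingKind::Const) {
            return Err(format!("cannot reassign const binding `{}`", name));
        }
        // The first assignment records the kind; later ones leave it unchanged.
        self.bindings.entry(name.to_string()).or_insert(kind);
        Ok(())
    }
}
```

With this model, `const x <- 5` followed by `x <- 10` corresponds to `assign("x", BindingKind::Const)` succeeding and a later `assign("x", BindingKind::Bare)` returning `Err`, while a name first bound with `let` can be assigned again.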

### Type System
- **Primitives**: `numeric`, `integer`, `character`, `logical`, `complex`, `raw`
- **Nullable**: `numeric?` (shorthand for `numeric | NULL`)
- **Unions**: `numeric | character`
- **Typed lists**: `list<{name: character, age: numeric}>`
- **Data frames**: `data.frame<{id: integer, name: character}>`
- **Function types**: `(numeric, numeric) -> numeric`
- **S4 classes**: `S4<ClassName>{slot: type}`
- **Sized vectors**: `numeric[3]`
- **Readonly**: `readonly numeric`
- **Generics**: `function<T>(x: T) -> T`

### Type Checking
- Type inference from literals, operators, and known functions
- Function signature checking (argument types, return types, arity)
- Data frame column access checking
- Typed list field access checking
- Type narrowing via `is.*()` and `is.null()` in conditionals
- `inherits()` narrowing
- Pipe (`|>`) type propagation
- S4 slot access checking
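
To make nullable types, unions, and `is.null()` narrowing more concrete, here is a minimal sketch of how such rules could be modeled. It assumes a drastically reduced type representation; the `Type` enum and the function names are hypothetical and are not Infr's actual `types` module.

```rust
/// A tiny model of Infr types (hypothetical, far smaller than the real type system).
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Any,
    Null,
    Prim(&'static str),
    Union(Vec<Type>),
}

/// `T?` is shorthand for `T | NULL`.
fn nullable(inner: Type) -> Type {
    Type::Union(vec![inner, Type::Null])
}

/// Can a value of type `from` be stored where `to` is expected?
fn is_assignable(from: &Type, to: &Type) -> bool {
    match (from, to) {
        // `any` opts out of checking in both directions.
        (Type::Any, _) | (_, Type::Any) => true,
        // Every member of a source union must fit the target.
        (Type::Union(members), _) => members.iter().all(|m| is_assignable(m, to)),
        // A target union accepts anything that fits one of its members.
        (_, Type::Union(members)) => members.iter().any(|m| is_assignable(from, m)),
        _ => from == to,
    }
}

/// Narrowing for the branch where `is.null(x)` is false: drop `NULL` from the type.
fn narrow_non_null(t: &Type) -> Type {
    match t {
        Type::Union(members) => {
            let mut rest: Vec<Type> = members
                .iter()
                .filter(|m| **m != Type::Null)
                .cloned()
                .collect();
            // A union with one remaining member collapses to that member.
            if rest.len() == 1 {
                rest.remove(0)
            } else {
                Type::Union(rest)
            }
        }
        other => other.clone(),
    }
}
```

In this model `numeric` is assignable to `numeric?` but not the other way around, and narrowing `numeric?` in the non-`NULL` branch yields plain `numeric`. Any conversions between primitives that the real checker may allow are deliberately left out.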

### Escape Hatches
- **`any`** type — opts out of checking
- **`# @infr-ignore`** — suppresses the next diagnostic
- **`# @infr-nocheck`** — disables checking for the entire file

### Strictness Levels

| Level | Bare `<-` | Nullable access |
|---|---|---|
| `relaxed` (default) | No warning | No diagnostic |
| `moderate` | Warning | Warning |
| `strict` | Error | Error |

Configure in `infr.toml`:
```toml
[check]
strictness = "moderate"
warn_implicit_any = true
warn_unused_const = true
```
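
The strictness table can be read as a mapping from the configured level to an optional diagnostic severity. The following sketch encodes just that table; the enum and function names are illustrative assumptions, not the actual `config` or `checker` API.

```rust
/// Strictness levels accepted in `infr.toml` (enum names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Strictness {
    Relaxed,
    Moderate,
    Strict,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Severity {
    Warning,
    Error,
}

/// Parses the `strictness` string from the config; `None` for unknown values.
fn parse_strictness(value: &str) -> Option<Strictness> {
    match value {
        "relaxed" => Some(Strictness::Relaxed),
        "moderate" => Some(Strictness::Moderate),
        "strict" => Some(Strictness::Strict),
        _ => None,
    }
}

/// Severity reported for a bare `<-` or a nullable access.
/// Per the table, both columns follow the same mapping; `None` means no diagnostic.
fn severity(level: Strictness) -> Option<Severity> {
    match level {
        Strictness::Relaxed => None,
        Strictness::Moderate => Some(Severity::Warning),
        Strictness::Strict => Some(Severity::Error),
    }
}
```

With `strictness = "moderate"` as in the config above, `severity` returns a warning for both kinds of diagnostics.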

## Project Structure

```
src/
  lexer/          # Tokenizer — R + Infr extensions (const, let, :, ->)
    mod.rs
    token.rs
  parser/         # Recursive descent parser — full R grammar + Infr types
    mod.rs
    ast.rs
  types/          # Type definitions and assignability rules
    mod.rs
  checker/        # Type checker — inference, narrowing, diagnostics
    mod.rs
    builtins.rs   # ~350 built-in function signatures (base, stats, utils, grDevices)
  transpiler/     # Emits clean R — strips const/let, type annotations
    mod.rs
  config/         # TOML configuration (infr.toml)
    mod.rs
  declarations/   # .d.infr declaration file parser
    mod.rs
  cli/            # CLI commands: check, build, watch, init, lsp
    mod.rs
  lsp/            # LSP server (diagnostics, completion, hover)
    mod.rs
  main.rs         # Entry point

declarations/     # Built-in .d.infr type declarations
  base.d.infr
  stats.d.infr
  dplyr.d.infr
  tidyr.d.infr
  purrr.d.infr
  ggplot2.d.infr
  readr.d.infr
  stringr.d.infr

editors/
  vscode/         # VS Code extension (syntax highlighting + LSP client)

tests/
  integration_tests.rs   # CLI-level integration tests
  conformance/           # .infr files with expected diagnostics in #> comments
  snapshots/             # Transpilation input/output pairs (.infr → .R)
```

## Development

### Prerequisites
- Rust 1.85 or later (the crate uses edition 2024)
- For VS Code extension: Node.js + npm

### Building

```bash
cargo build           # Debug build
cargo build --release # Release build
```

### Running Tests

```bash
# Run all tests (unit + integration + conformance + snapshots)
cargo test

# Run only unit tests
cargo test --lib

# Run only integration tests
cargo test --test integration_tests

# Run a specific test
cargo test test_const_reassignment

# Run with output visible
cargo test -- --nocapture
```

### Test Architecture

Tests are organized at four levels:

**1. Unit tests** (`#[cfg(test)]` in each module):
- Lexer: tokenization of R and Infr syntax
- Parser: AST construction for all statement/expression types
- Type system: assignability rules, narrowing operations
- Checker: type checking for all features (const/let, annotations, inference, narrowing, pipes, S4, etc.)
- Transpiler: output correctness for all Infr constructs
- Config: TOML parsing and defaults

**2. Integration tests** (`tests/integration_tests.rs`):
- End-to-end CLI tests using the compiled binary
- Tests every feature through `infr check` and `infr build`
- Includes zero-false-positive tests on plain R code

**3. Conformance tests** (`tests/conformance/*.infr`):
- Self-contained `.infr` files with expected diagnostics as `#>` comments
- Format: `expr  #> Error [infr]: message pattern`, `expr  #> Warning [infr]: message pattern`, or `expr  #> OK`
- Run automatically as part of `cargo test`
- Easy to add new test cases — just create a new `.infr` file

**4. Snapshot tests** (`tests/snapshots/`):
- Pairs of `.infr` input and expected `.R` output
- Verifies transpilation produces exact expected output

### Adding a Conformance Test

Create a `.infr` file in `tests/conformance/`:

```r
# tests/conformance/my_feature.infr

# Lines with #> OK expect no error
const x: numeric <- 5  #> OK

# Lines with #> Error expect a matching error
const y: numeric <- "hello"  #> Error [infr]: Type mismatch

# Lines with #> Warning expect a matching warning
const df <- data.frame(a = 1)
df$b <- 2  #> Warning [infr]: Mutating const binding
```
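
For readers curious how a runner might interpret these `#>` markers, here is a rough sketch of extracting the expectation from a single line. It is an assumption about one possible approach, not the project's test harness; `Expectation` and `parse_expectation` are hypothetical names.

```rust
/// What a conformance line expects from the checker (hypothetical type).
#[derive(Debug, PartialEq)]
enum Expectation {
    NoDiagnostic,
    Warning(String),
    Error(String),
}

/// Extracts the expectation from one source line.
/// Returns `None` for comment lines and for lines without a recognized `#>` marker.
fn parse_expectation(line: &str) -> Option<Expectation> {
    // Pure comment lines may mention `#>` in prose; they carry no expectation.
    if line.trim_start().starts_with('#') {
        return None;
    }
    let (_, marker) = line.split_once("#>")?;
    let marker = marker.trim();
    if marker == "OK" {
        Some(Expectation::NoDiagnostic)
    } else if let Some(pattern) = marker.strip_prefix("Error [infr]:") {
        Some(Expectation::Error(pattern.trim().to_string()))
    } else if let Some(pattern) = marker.strip_prefix("Warning [infr]:") {
        Some(Expectation::Warning(pattern.trim().to_string()))
    } else {
        None
    }
}
```

This simple split would misread a string literal that happens to contain `#>`, so a real runner would more likely work from the lexer's comment tokens.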

### VS Code Extension

```bash
cd editors/vscode
npm install
npm run compile

# To test: open VS Code, press F5 to launch Extension Development Host
# The extension connects to `infr lsp` for diagnostics
```

### LSP Server

```bash
# Start the LSP server (used by the VS Code extension)
./target/release/infr lsp
```

## Declaration Files

Type declarations for R packages use `.d.infr` files (similar to TypeScript's `.d.ts`):

```r
# types/mypackage.d.infr
my_function <- function(x: numeric, y: character) -> logical
another_func <- function(...: any) -> data.frame
```

Include them in `infr.toml`:
```toml
[declarations]
include = ["types/mypackage.d.infr"]
```

Built-in declarations are provided for: base, stats, dplyr, tidyr, purrr, ggplot2, readr, stringr.

## Architecture

The pipeline is: **Source → Lexer → Parser → AST → Type Checker → Transpiler → R output**

- **Lexer** tokenizes R syntax plus Infr extensions (`const`, `let`, `:` for type annotations, `->` for return types)
- **Parser** builds a full AST using recursive descent with precedence climbing for expressions
- **Type Checker** walks the AST, infers types, checks constraints, and emits diagnostics
- **Transpiler** walks the AST and emits clean R, stripping all Infr-specific syntax
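
The transpiler step is the easiest to illustrate in miniature: once the source is an AST, emitting R amounts to not writing out the Infr-only parts. The sketch below assumes a toy AST with hypothetical `Binding` and `Param` nodes and keeps right-hand sides as ready-made R text; the real AST in `parser/ast.rs` is certainly richer.

```rust
/// Binding keyword as written in the source (illustrative).
#[allow(dead_code)]
enum Keyword {
    Const,
    Let,
    Bare,
}

/// A binding statement; `value` is kept as already-emitted R text for brevity.
#[allow(dead_code)] // `keyword` and `annotation` exist only to show what gets dropped
struct Binding {
    keyword: Keyword,
    name: String,
    annotation: Option<String>,
    value: String,
}

/// A function parameter with an optional type annotation and default value.
#[allow(dead_code)] // `annotation` is intentionally never emitted
struct Param {
    name: String,
    annotation: Option<String>,
    default: Option<String>,
}

/// Emits a binding as plain R: the keyword and annotation are not written out.
fn emit_binding(binding: &Binding) -> String {
    format!("{} <- {}", binding.name, binding.value)
}

/// Emits a parameter list as plain R, keeping defaults but dropping annotations.
fn emit_params(params: &[Param]) -> String {
    params
        .iter()
        .map(|p| match &p.default {
            Some(default) => format!("{} = {}", p.name, default),
            None => p.name.clone(),
        })
        .collect::<Vec<_>>()
        .join(", ")
}
```

Applied to the example near the top of this README, the parameters `name: character, excited: logical = FALSE` come out as `name, excited = FALSE`, and `const msg: character <- ...` comes out as `msg <- ...`.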

The type system is **gradual**: unannotated code is unchecked (resolves to `any`), so existing R code passes through with zero errors. Types are checked only where annotations are present.

## License

See the project license file for details.