archscript 0.2.2

ArchScript programming language — Python-like syntax, Haskell-inspired features, Arch Linux integration
# CLAUDE.md — ArchScript Project Reference

## Quick Start

```bash
cargo build                          # build
cargo test                           # run all 123 tests
cargo run -- eval "2 + 3 * 4"       # eval expression (prints: 14)
cargo run -- run examples/hello.as   # run a script
cargo run -- run examples/archlinux.as # run stdlib demo
cargo run -- parse examples/hello.as # dump AST
cargo run -- repl                    # start interactive REPL
just check                           # fmt + lint + test
just test-stdlib                     # run stdlib tests only
just repl                            # start REPL (shorthand)
```

File extension: `.as`

## Project Overview

ArchScript is a programming language designed around Arch Linux, combining Python-like syntax, Haskell-inspired functional features, and first-class Arch ecosystem integration. The implementation is written in Rust and uses a Pest PEG parser.

**Status**: Working interpreter for core subset with Arch Linux standard library (pacman, systemd, AUR, file I/O). No compiler/bytecode yet.

**Domain**: archscript.org (AWS Route53, zone Z06425983KLFNTEL8SI3D)

## Architecture

```
source.as → Pest PEG parser → AST → Tree-walking interpreter → output
              (archscript.pest)   (ast.rs)   (interpreter.rs)
                                              stdlib modules
                                    (pacman, systemd, aur, fs)
```

Pipeline: `parser::parse(source) -> Node::Module(Vec<Node>)` then `Interpreter::new().run(&ast) -> Value`

Stdlib modules are registered as Dict values in the global environment at interpreter creation. Member access (`pacman.install`) resolves to `BuiltinFn` values which are dispatched through `stdlib::call()`.
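
The registration and lookup described above can be pictured with a small self-contained model. This is a sketch, not the project's code: the reduced `Value` enum, `make_module`, `member_access`, and `split_dotted` are hypothetical names, and the real `stdlib::call()` signature is not shown in this reference.

```rust
// Reduced stand-in for the interpreter's Value enum (assumption: only the
// variants needed for this illustration).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Dict(Vec<(Value, Value)>),
    BuiltinFn(String),
}

// Build a module Dict: each entry maps a function name to a BuiltinFn that
// carries the dotted name, e.g. "install" -> BuiltinFn("pacman.install").
fn make_module(module: &str, functions: &[&str]) -> Value {
    Value::Dict(
        functions
            .iter()
            .map(|f| {
                (
                    Value::Str(f.to_string()),
                    Value::BuiltinFn(format!("{}.{}", module, f)),
                )
            })
            .collect(),
    )
}

// Member access on a Dict: linear search for a matching string key.
fn member_access(target: &Value, field: &str) -> Option<Value> {
    match target {
        Value::Dict(entries) => entries
            .iter()
            .find(|(k, _)| matches!(k, Value::Str(s) if s.as_str() == field))
            .map(|(_, v)| v.clone()),
        _ => None,
    }
}

// Hypothetical first step of a dispatcher: split "module.function" so the
// call can be routed to the right stdlib module.
fn split_dotted(name: &str) -> Result<(&str, &str), String> {
    name.split_once('.')
        .ok_or_else(|| format!("not a stdlib function: {}", name))
}
```

Under these assumptions, `member_access(&make_module("pacman", &["install"]), "install")` yields `BuiltinFn("pacman.install")`, and `split_dotted` separates that name into module and function for routing.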

## File Map

| File | Role | Key types/functions |
|------|------|-------------------|
| `src/archscript.pest` | PEG grammar (Pest) | 193 lines, top rule: `archscript` |
| `src/ast.rs` | AST node definitions | `Node`, `Expr`, `BinOp`, `Pattern`, `FunctionDecl` |
| `src/parser.rs` | Pest tree → AST | `parse(source) -> Result<Node, ParseError>` |
| `src/interpreter.rs` | Tree-walking interpreter | `Interpreter::run(&Node) -> Result<Value, RuntimeError>` |
| `src/main.rs` | CLI (clap) | Subcommands: `run`, `parse`, `eval`, `repl` |
| `src/repl.rs` | Interactive REPL | `run_repl()`, `run_repl_with_io()`, REPL commands |
| `src/lib.rs` | Module exports | `pub mod ast, interpreter, parser, repl, stdlib` |
| `src/stdlib/mod.rs` | Stdlib registry + dispatch | `register_modules()`, `call()`, `command_result()` |
| `src/stdlib/pacman.rs` | Pacman package management | `install`, `remove`, `update`, `search`, `list`, `info`, `clean` |
| `src/stdlib/systemd.rs` | Systemd service management | `start`, `stop`, `restart`, `enable`, `disable`, `status` |
| `src/stdlib/aur.rs` | AUR helper wrapper | `install`, `search`, `update`, `info` |
| `src/stdlib/fs.rs` | File system operations | `read`, `write`, `exists`, `ls`, `mkdir`, `remove` |
| `tests/integration.rs` | 48 integration tests | Uses `parser::parse` + `Interpreter` directly |
| `examples/hello.as` | Hello world | Variables, arithmetic, lists, println |
| `examples/functions.as` | Functions demo | def, recursion, lambda |
| `examples/archlinux.as` | Arch Linux stdlib demo | pacman, systemd, aur, fs usage |
| `justfile` | Task runner recipes | build, test, test-stdlib, lint, run, ci |

## Dependencies

```toml
pest = "2.7"          # PEG parser
pest_derive = "2.7"   # derive macro for grammar
thiserror = "1"        # error types
clap = "4"             # CLI argument parsing
```

## Grammar Summary (archscript.pest)

**Top-level items** (separated by newlines or `;`):
- `import_statement`: `import path`, `import {a,b} from path`, `import path as alias`
- `variable_declaration`: `var name: Type = expr` (type annotation optional)
- `function_declaration`: `def name(params) -> RetType: body` (return type optional, `:` or `=` before body)
- `type_definition`: `type Name = TypeExpr`
- `data_definition`: `data Name = Con1(fields) | Con2`
- `trait_definition`: `trait Name { def ... }`
- `instance_definition`: `instance TraitName for Type { def ... }`
- `expression_statement`: any expression

**Expression precedence** (lowest to highest):
1. Pipe: `|>`
2. Assignment: `= += -= *= /= %= **=`
3. Logical OR: `|| or`
4. Logical AND: `&& and`
5. Equality: `== !=`
6. Relational: `< <= > >=`
7. Additive: `+ -`
8. Multiplicative: `* / // %`
9. Power: `**` (right-associative)
10. Unary: `- + ! not`
11. Postfix: `f()` `a[i]` `a.b`
12. Primary: literals, identifiers, parenthesized, if/for/while/match/lambda/comprehension
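
To make the effect of the precedence levels concrete, here is a tiny hand-written precedence-climbing evaluator over integers. It is not how ArchScript parses (the project uses a Pest grammar); it is a swapped-in technique that covers only `+`, `-`, `*`, and right-associative `**`, and every name in it is hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Num(i64),
    Plus,
    Minus,
    Star,
    Pow,
}

// Split the source into tokens; whitespace and unknown characters are skipped.
fn tokenize(src: &str) -> Vec<Tok> {
    let chars: Vec<char> = src.chars().collect();
    let mut toks = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if let Some(d) = c.to_digit(10) {
            let mut n = d as i64;
            i += 1;
            while let Some(d) = chars.get(i).and_then(|c| c.to_digit(10)) {
                n = n * 10 + d as i64;
                i += 1;
            }
            toks.push(Tok::Num(n));
            continue;
        }
        match c {
            '+' => toks.push(Tok::Plus),
            '-' => toks.push(Tok::Minus),
            '*' if chars.get(i + 1) == Some(&'*') => {
                toks.push(Tok::Pow);
                i += 1; // consume the second '*'
            }
            '*' => toks.push(Tok::Star),
            _ => {}
        }
        i += 1;
    }
    toks
}

// (binding power, right-associative?) for each binary operator.
fn binding(tok: Tok) -> Option<(u8, bool)> {
    match tok {
        Tok::Plus | Tok::Minus => Some((1, false)),
        Tok::Star => Some((2, false)),
        Tok::Pow => Some((3, true)),
        Tok::Num(_) => None,
    }
}

// Parse and evaluate in one pass; operators weaker than min_bp end the loop.
fn parse_expr(toks: &[Tok], pos: &mut usize, min_bp: u8) -> i64 {
    let mut lhs = match toks.get(*pos) {
        Some(Tok::Num(n)) => {
            *pos += 1;
            *n
        }
        _ => 0, // simplification: a missing operand evaluates to 0
    };
    while let Some(&op) = toks.get(*pos) {
        let (bp, right_assoc) = match binding(op) {
            Some(b) => b,
            None => break,
        };
        if bp < min_bp {
            break;
        }
        *pos += 1;
        // Right-associative operators let the same level recurse on the right.
        let next_min = if right_assoc { bp } else { bp + 1 };
        let rhs = parse_expr(toks, pos, next_min);
        lhs = match op {
            Tok::Plus => lhs + rhs,
            Tok::Minus => lhs - rhs,
            Tok::Star => lhs * rhs,
            Tok::Pow => lhs.pow(rhs as u32),
            Tok::Num(_) => lhs,
        };
    }
    lhs
}

fn eval(src: &str) -> i64 {
    parse_expr(&tokenize(src), &mut 0, 0)
}
```

With this sketch, `eval("2 + 3 * 4")` gives 14 (multiplication binds tighter than addition), `eval("2 ** 3 ** 2")` gives 512 (power groups to the right), and `eval("10 - 4 - 3")` gives 3 (subtraction groups to the left).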

**Literals**: `int_literal` (42), `float_literal` (3.14, 1e5), `string_literal` ("..." or '...'), `boolean_literal` (True/False/true/false)

**Collections**: `[1, 2, 3]` (list), `{"k": "v"}` (dict), `(1, 2)` (tuple), `[x*2 for x in list if cond]` (comprehension)

**Whitespace**: `WHITESPACE = _{ " " | "\t" }` (implicit, auto-consumed). `NEWLINE` is separate and used as statement separator.

**Block indentation**: Uses Pest `PUSH/PEEK_ALL/DROP` for indent-based blocks (`block` rule: 4 spaces or tab).

**Keywords**: import, from, as, var, def, if, elif, else, for, while, in, match, type, data, trait, instance, and, or, not, is, return, True, true, False, false, lambda

**Lambda**: `lambda x, y: expr` (uses `lambda_params` not `param_list` to avoid `:` conflict with type annotations)

## AST Node Types

```
Node::Module(Vec<Node>)
Node::Import(Import::{Simple, Selective, Aliased})
Node::VariableDeclaration(name, Option<type>, Box<Expr>)
Node::FunctionDeclaration(FunctionDecl{name, params, return_type, body})
Node::TypeDefinition(name, TypeExpr)
Node::DataDefinition(name, Vec<DataConstructor>)
Node::TraitDefinition(name, Vec<FunctionDecl>)
Node::InstanceDefinition(name, TypeExpr, Vec<FunctionDecl>)
Node::Expression(Expr)
Node::Return(Option<Expr>)
```

**Expr variants** (26 total):
- Literals: `Integer(i64)`, `Float(f64)`, `StringLit(String)`, `Boolean(bool)`, `Identifier(String)`
- Collections: `List(Vec<Expr>)`, `Dict(Vec<(Expr,Expr)>)`, `Tuple(Vec<Expr>)`
- Operations: `BinaryOp(BinOp, lhs, rhs)`, `UnaryOp(UnaryOp, expr)`, `Pipe(lhs, rhs)`
- Access: `FunctionCall(callee, args)`, `Index(expr, idx)`, `MemberAccess(expr, field)`
- Control: `If(cond, then, elifs, else)`, `For(var, iter, body)`, `While(cond, body)`, `Match(subject, arms)`
- Functions: `Lambda(params, body)`, `ListComprehension(expr, var, iter, filter)`
- Blocks: `Block(Vec<Node>)`
- Assignment: `Assign(target, value)`, `CompoundAssign(op, target, value)`

**BinOp**: Add, Sub, Mul, Div, IntDiv, Mod, Pow, Eq, Neq, Lt, Lte, Gt, Gte, And, Or
**UnaryOp**: Neg, Pos, Not
**Pattern**: Wildcard, Literal(Expr), Identifier(String), Tuple(Vec), List(Vec, Option<rest>), Constructor(name, Vec)

## Interpreter Runtime

**Value enum**: Integer(i64), Float(f64), String, Boolean, List(Vec<Value>), Dict(Vec<(Value,Value)>), Tuple(Vec<Value>), Function(FuncValue), BuiltinFn(String), None

**Environment**: `Vec<HashMap<String, Value>>` — stack of scopes. Methods: `get(name)`, `set(name, val)` (updates existing or creates in current), `define(name, val)` (always current scope), `push_scope()`, `pop_scope()`.
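
A minimal sketch of such a scope stack, assuming the semantics listed above. The `Value` alias is a stand-in (the real interpreter stores its `Value` enum), and the choice to never pop the global scope is an assumption of this sketch.

```rust
use std::collections::HashMap;

// Stand-in value type for illustration only.
type Value = i64;

struct Environment {
    scopes: Vec<HashMap<String, Value>>,
}

impl Environment {
    // Start with a single global scope.
    fn new() -> Self {
        Environment { scopes: vec![HashMap::new()] }
    }

    fn push_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    // Assumption: the global scope is never popped.
    fn pop_scope(&mut self) {
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    // Look up a name from the innermost scope outward.
    fn get(&self, name: &str) -> Option<&Value> {
        self.scopes.iter().rev().find_map(|scope| scope.get(name))
    }

    // Always bind in the current (innermost) scope.
    fn define(&mut self, name: &str, val: Value) {
        if let Some(scope) = self.scopes.last_mut() {
            scope.insert(name.to_string(), val);
        }
    }

    // Update the nearest existing binding, or create one in the current scope.
    fn set(&mut self, name: &str, val: Value) {
        for scope in self.scopes.iter_mut().rev() {
            if let Some(slot) = scope.get_mut(name) {
                *slot = val;
                return;
            }
        }
        self.define(name, val);
    }
}
```

The difference between `set` and `define` matters for shadowing: `set("x", ...)` inside a nested scope updates an outer `x` if one exists, while `define("x", ...)` always creates a new binding that disappears on `pop_scope()`.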

**Function calls**: Closure-based. `FuncValue` stores `name: Option<String>`, params, body expr, closure env. On call: clone closure, push scope, bind function name for recursion, bind params, swap env, eval body, restore env.

**Signal enum**: `Value(Value)` | `Return(Value)` — used internally for early return propagation.
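
One way to picture the early-return propagation is the sketch below. It assumes a toy statement type (`Stmt`) and integer values in place of the real AST and `Value`; `eval_block` and `call_function` are hypothetical names.

```rust
// Result of evaluating a statement: a normal value or an early return.
#[derive(Debug, Clone, PartialEq)]
enum Signal {
    Value(i64),
    Return(i64),
}

// Toy statements for illustration: an expression that yields a value,
// or an explicit return.
enum Stmt {
    Expr(i64),
    Return(i64),
}

// Evaluate statements in order; a Return stops the block immediately and
// is passed up unchanged so enclosing blocks can stop as well.
fn eval_block(stmts: &[Stmt]) -> Signal {
    let mut last = Signal::Value(0);
    for stmt in stmts {
        last = match stmt {
            Stmt::Expr(v) => Signal::Value(*v),
            Stmt::Return(v) => return Signal::Return(*v),
        };
    }
    last
}

// The function-call boundary is where a Return is unwrapped into a value.
fn call_function(body: &[Stmt]) -> i64 {
    match eval_block(body) {
        Signal::Value(v) | Signal::Return(v) => v,
    }
}
```

Keeping `Return` distinct from `Value` until the call boundary is what lets nested blocks and loops unwind without special-casing each construct.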

**Built-in functions** (12):
| Function | Signature | Description |
|----------|-----------|-------------|
| `print` | `print(args...)` | Output to stdout (captured in `interpreter.output`) |
| `println` | `println(args...)` | Same as print (no newline difference in capture mode) |
| `len` | `len(collection)` | Length of list, string, or dict |
| `range` | `range(end)` / `range(start, end)` / `range(start, end, step)` | Generate integer list |
| `str` | `str(value)` | Convert to string |
| `int` | `int(value)` | Convert to integer |
| `float` | `float(value)` | Convert to float |
| `type` | `type(value)` | Return type name as string |
| `map` | `map(fn, list)` | Apply function to each element |
| `filter` | `filter(fn, list)` | Keep elements where fn returns truthy |
| `sum` | `sum(list)` | Sum numeric list |
| `append` | `append(list, item)` | Return new list with item appended |

**Truthiness**: False=falsy, 0=falsy, 0.0=falsy, ""=falsy, []=falsy, None=falsy, everything else truthy.
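
The truthiness rule maps directly onto a `match`. The following is a sketch over a reduced `Value` enum (assumption: functions, dicts, and tuples are omitted, and the string variant is named `Str` here); `is_truthy` is a hypothetical name.

```rust
// Reduced stand-in for the interpreter's Value enum.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Str(String),
    Boolean(bool),
    List(Vec<Value>),
    None,
}

// False, 0, 0.0, "", [] and None are falsy; everything else is truthy.
fn is_truthy(v: &Value) -> bool {
    match v {
        Value::Boolean(b) => *b,
        Value::Integer(n) => *n != 0,
        Value::Float(f) => *f != 0.0,
        Value::Str(s) => !s.is_empty(),
        Value::List(items) => !items.is_empty(),
        Value::None => false,
    }
}
```

Note that a non-empty string such as `"0"` is truthy under this rule, since only the empty string is listed as falsy.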

**Type coercion**: Int+Float -> Float, Int/Int -> Float (division always produces float), String+String -> concat, List+List -> concat, String*Int -> repeat.
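
These coercion rules could be sketched as follows. The reduced `Value` enum and the helpers `add`, `div`, and `repeat` are hypothetical; treating division by zero as an error and clamping negative repeat counts to zero are assumptions of this sketch, not behavior stated in this reference.

```rust
// Reduced stand-in for the interpreter's Value enum.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Str(String),
    List(Vec<Value>),
}

// `+`: numeric addition with Int -> Float promotion, or concatenation.
fn add(lhs: &Value, rhs: &Value) -> Result<Value, String> {
    use Value::*;
    match (lhs, rhs) {
        (Integer(a), Integer(b)) => Ok(Integer(a + b)),
        (Integer(a), Float(b)) => Ok(Float(*a as f64 + b)),
        (Float(a), Integer(b)) => Ok(Float(a + *b as f64)),
        (Float(a), Float(b)) => Ok(Float(a + b)),
        (Str(a), Str(b)) => Ok(Str(format!("{}{}", a, b))),
        (List(a), List(b)) => Ok(List(a.iter().chain(b.iter()).cloned().collect())),
        _ => Err("unsupported operand types for +".to_string()),
    }
}

// `/`: both operands are promoted to f64, so the result is always a Float.
fn div(lhs: &Value, rhs: &Value) -> Result<Value, String> {
    let as_f64 = |v: &Value| match v {
        Value::Integer(n) => Some(*n as f64),
        Value::Float(f) => Some(*f),
        _ => None,
    };
    match (as_f64(lhs), as_f64(rhs)) {
        // Assumption: division by zero is reported as an error.
        (Some(_), Some(b)) if b == 0.0 => Err("division by zero".to_string()),
        (Some(a), Some(b)) => Ok(Value::Float(a / b)),
        _ => Err("unsupported operand types for /".to_string()),
    }
}

// String * Int: repeat the string; negative counts are clamped to zero
// (assumption).
fn repeat(s: &str, n: i64) -> String {
    s.repeat(n.max(0) as usize)
}
```

For example, `div(&Value::Integer(7), &Value::Integer(2))` yields `Float(3.5)` here, matching the rule that `/` always produces a float even for two integers.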

## Known Bugs Fixed

1. **`var y = 13.1` parse failure**: Fixed by ensuring `float_literal` is in `primary_expression` and reachable via the expression chain from `variable_declaration`. The root cause was that the original grammar's `expression` did not reach `literal` via `primary_expression`.

2. **Lambda `:` ambiguity**: `lambda x: expr` conflicted with the param type annotation `param: Type`. Fixed by using a separate `lambda_params = { identifier ~ ("," ~ identifier)* }` instead of `param_list`.

3. **Recursive function undefined**: `FuncValue.closure` captured the env before the function was defined. Fixed by adding `name: Option<String>` to `FuncValue` and binding the function itself in the call env: `call_env.define(name, func.clone())`.

4. **Postfix `call_args` not reaching parser**: `postfix_op` was non-silent, wrapping `call_args`/`index_access`/`member_access` inside a `Rule::postfix_op`, while the parser code matched on `Rule::call_args` directly. Fixed by making `postfix_op` silent: `postfix_op = _{ ... }`.

## Test Coverage

- **20 unit tests** in `src/parser.rs` and `src/interpreter.rs`
- **26 unit tests** in `src/repl.rs` (REPL session tests, continuation detection)
- **29 unit tests** in `src/stdlib/` (pacman: 9, systemd: 8, aur: 6, fs: 6)
- **48 integration tests** in `tests/integration.rs`
- **123 total tests**
- Tests cover: all literal types, variable declarations (including the float bug), arithmetic with precedence, string concat, comparisons, logical operators, all built-in functions, function def/call, recursion, lambdas, imports, multi-statement programs, stdlib module access, pacman/systemd/aur command generation, fs read/write/exists/ls/mkdir/remove, stdlib error handling, REPL session persistence, REPL commands, multiline continuation detection

## Design Goals (from original spec)

1. Python-like syntax, Haskell-inspired functional features
2. First-class Arch Linux integration (pacman, systemctl, AUR wrappers in stdlib)
3. Consumer-driven contract testing (CDCT) support
4. Microservices and distributed systems first-class support
5. Actor-based concurrency model
6. WebAssembly compile target
7. LSP support
8. ArchAgent integration (GPT-4 agent that generates/runs ArchScript)

## Standard Library — Arch Linux Integration

The stdlib provides four modules registered as Dict values in the global scope. Each module's functions are accessible via member access (e.g., `pacman.install("vim")`). Functions that wrap system commands execute them via `std::process::Command` and return a result Dict with `{success, output, code, command}` fields.

### `pacman` — Package Management

| Function | Command | Description |
|----------|---------|-------------|
| `pacman.install(name)` | `sudo pacman -S --noconfirm <name>` | Install a package |
| `pacman.remove(name)` | `sudo pacman -R --noconfirm <name>` | Remove a package |
| `pacman.update()` | `sudo pacman -Syu --noconfirm` | Full system update |
| `pacman.search(query)` | `pacman -Ss <query>` | Search packages |
| `pacman.list()` | `pacman -Q` | List installed packages |
| `pacman.info(name)` | `pacman -Qi <name>` | Get package info |
| `pacman.clean()` | `sudo pacman -Scc --noconfirm` | Clean package cache |

### `systemd` — Service Management

| Function | Command | Description |
|----------|---------|-------------|
| `systemd.start(service)` | `sudo systemctl start <service>` | Start a service |
| `systemd.stop(service)` | `sudo systemctl stop <service>` | Stop a service |
| `systemd.restart(service)` | `sudo systemctl restart <service>` | Restart a service |
| `systemd.enable(service)` | `sudo systemctl enable <service>` | Enable at boot |
| `systemd.disable(service)` | `sudo systemctl disable <service>` | Disable at boot |
| `systemd.status(service)` | `systemctl status <service>` | Check status |

### `aur` — AUR Package Management

Uses `yay` as the default AUR helper.

| Function | Command | Description |
|----------|---------|-------------|
| `aur.install(name)` | `yay -S --noconfirm <name>` | Install AUR package |
| `aur.search(query)` | `yay -Ss <query>` | Search AUR |
| `aur.update()` | `yay -Sua --noconfirm` | Update AUR packages |
| `aur.info(name)` | `yay -Qi <name>` | Get AUR package info |

### `fs` — File System Operations

Uses Rust's `std::fs` for safe, cross-platform file I/O. Returns values directly (not command result Dicts).

| Function | Returns | Description |
|----------|---------|-------------|
| `fs.read(path)` | `String` | Read file contents |
| `fs.write(path, content)` | `Boolean` (true) | Write string to file |
| `fs.exists(path)` | `Boolean` | Check if path exists |
| `fs.ls(path)` | `List[String]` | List directory entries |
| `fs.mkdir(path)` | `Boolean` (true) | Create directory (with parents) |
| `fs.remove(path)` | `Boolean` (true) | Remove file or directory |

### Stdlib Implementation Notes

- Modules are registered in `Environment::new()` via `stdlib::register_modules()`
- Each module is a `Value::Dict` containing `Value::BuiltinFn` entries
- Member access (`pacman.install`) resolves through the interpreter's `MemberAccess` → `Dict` lookup
- System command functions use `std::process::Command` (no shell interpolation — safe from injection)
- The `call_builtin` method delegates to `stdlib::call()` for dotted names (e.g., `"pacman.install"`)
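
The injection-safety point can be illustrated with a sketch that builds an argument vector and passes it to `std::process::Command` without going through a shell. `CommandResult`, `pacman_install_argv`, and `run` are hypothetical names; the real stdlib returns a Dict with the same four fields, and capturing only stdout in `output` is an assumption.

```rust
use std::process::Command;

// Mirrors the documented result fields {success, output, code, command}.
struct CommandResult {
    success: bool,
    output: String,
    code: i32,
    command: String,
}

// Each argument is its own argv entry, so a package name containing shell
// metacharacters is passed through literally rather than interpreted.
fn pacman_install_argv(name: &str) -> Vec<String> {
    ["sudo", "pacman", "-S", "--noconfirm", name]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

// Run argv directly (no shell) and collect the outcome.
fn run(argv: &[String]) -> CommandResult {
    let command = argv.join(" ");
    let (program, args) = match argv.split_first() {
        Some(parts) => parts,
        None => {
            return CommandResult {
                success: false,
                output: "empty command".to_string(),
                code: -1,
                command,
            }
        }
    };
    match Command::new(program).args(args).output() {
        Ok(out) => CommandResult {
            success: out.status.success(),
            output: String::from_utf8_lossy(&out.stdout).into_owned(),
            // code() is None when the process was killed by a signal.
            code: out.status.code().unwrap_or(-1),
            command,
        },
        Err(e) => CommandResult {
            success: false,
            output: e.to_string(),
            code: -1,
            command,
        },
    }
}
```

With this shape, `pacman_install_argv("vim; rm -rf /")` still produces exactly five arguments, the last being the literal string `vim; rm -rf /`, which pacman would reject as an unknown package instead of a shell executing it.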

### Stdlib Usage Example

```
// Package management
var result = pacman.install("vim")
println(result.command)    // "sudo pacman -S --noconfirm vim"
println(result.success)    // True/False

// Service management
systemd.enable("sshd")
systemd.start("sshd")
var status = systemd.status("sshd")

// AUR packages
aur.install("visual-studio-code-bin")

// File system
fs.write("/tmp/hello.txt", "Hello from ArchScript!")
var content = fs.read("/tmp/hello.txt")
var exists = fs.exists("/tmp/hello.txt")
var files = fs.ls("/tmp")
fs.mkdir("/tmp/mydir")
fs.remove("/tmp/hello.txt")
```

## Interactive REPL

The REPL (Read-Eval-Print Loop) provides an interactive interpreter session via `archscript repl` or `just repl`.

### REPL Commands

| Command | Description |
|---------|-------------|
| `:help`, `:h`, `:?` | Show help message |
| `:quit`, `:exit`, `:q` | Exit the REPL |
| `:env` | Show all user-defined variables |
| `:reset` | Reset the interpreter environment |
| `:ast <expr>` | Show the AST for an expression |

### REPL Features

- **Persistent state**: Variables, functions, and values persist across lines within a session
- **Multiline input**: Lines ending with `:`, `(`, `[`, `{`, or `\` automatically continue to the next line. An empty line in multiline mode submits the accumulated input.
- **Error resilience**: Parse errors and runtime errors are displayed inline without terminating the session
- **Result display**: Non-None expression results are automatically printed
- **Testable I/O**: `run_repl_with_io<R: BufRead, W: Write>()` accepts generic I/O for testing
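
The value of the generic I/O signature is that a test can drive the loop from an in-memory buffer. The sketch below shows the idea with a hypothetical `echo_repl_with_io` that echoes input instead of evaluating it; the real function's parameters and return type are not shown in this reference.

```rust
use std::io::{self, BufRead, Write};

// Hypothetical stand-in for the REPL loop: reads lines from any BufRead,
// writes to any Write, and stops on a quit command. Evaluation is replaced
// by echoing the trimmed input.
fn echo_repl_with_io<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    for line in input.lines() {
        let line = line?;
        match line.trim() {
            ":quit" | ":exit" | ":q" => break,
            "" => continue,
            expr => writeln!(output, "=> {}", expr)?,
        }
    }
    Ok(())
}
```

In a test, `std::io::Cursor::new("1 + 2\n:quit\n")` can serve as the input and a `Vec<u8>` as the output, so the session transcript can be asserted on without touching stdin or stdout.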

### REPL Implementation Notes

- `src/repl.rs` contains the full REPL implementation
- `run_repl()` is the stdin/stdout entry point; `run_repl_with_io()` is the testable generic version
- The interpreter instance is reused across inputs for state persistence
- `needs_continuation()` detects incomplete expressions by tracking bracket balance and trailing tokens
- REPL commands (`:` prefix) are parsed before attempting expression evaluation
- The `:env` command filters out built-in functions and stdlib modules, showing only user-defined names
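
Continuation detection of the kind described above might look like the following sketch. It is a simplified guess at the logic, not the project's `needs_continuation()`: it counts brackets without skipping string literals and checks only the trailing `:` and `\` tokens.

```rust
// Returns true when the input looks incomplete: unbalanced opening brackets,
// or a trailing ':' or '\'. Simplification: brackets inside string literals
// are counted too.
fn needs_continuation(input: &str) -> bool {
    let mut depth: i32 = 0;
    for ch in input.chars() {
        match ch {
            '(' | '[' | '{' => depth += 1,
            ')' | ']' | '}' => depth -= 1,
            _ => {}
        }
    }
    let trimmed = input.trim_end();
    depth > 0 || trimmed.ends_with(':') || trimmed.ends_with('\\')
}
```

Tracking overall bracket depth also covers lines that end in `(`, `[`, or `{`, since an opening bracket at the end of a line leaves the depth positive.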

## Planned Features (Not Yet Implemented)

### Language
- Indentation-based block scoping (partially: `block` rule exists but not fully wired)
- Class definitions (`class` keyword per BNF spec)
- Async/await
- Generator expressions
- Set literals
- Decorators/macros
- Type checking / gradual typing
- `try`/`catch`/`throw` error handling

### Standard Library (Planned Extensions)
- `postgresql.setup({...})` — service configuration
- `ssh.configure({...})` — SSH hardening
- `disk.optimize({...})` — disk operations
- `backup.create({...})` — backup management
- Networking, crypto modules

### Tooling
- LSP server
- WASM compilation target
- Package manager for ArchScript modules
- CI/CD templates (GitHub Actions)

### ArchAgent Integration
- ArchAgent is a GPT-4 agent that outputs ArchScript
- Contract: ArchAgent generates calls like `postgresql.setup({...})`, `pacman.install({...})`
- Stdlib APIs should match those intents
- Vanilla variant uses raw Arch/bash commands instead

## Coding Conventions

- Rust edition 2021, formatted with `rustfmt`
- `cargo clippy -- -D warnings` must pass
- Grammar changes in `archscript.pest` require rebuild (Pest derive macro)
- Parser functions named `build_<rule>` matching grammar rules
- Tests inline in modules (`#[cfg(test)] mod tests`) + separate `tests/integration.rs`
- No `unwrap()` in production code paths; use `?` or proper error handling
- `ParseError(String)` and `RuntimeError(String)` for error types

## Operator Reference

| Operator | Type | Notes |
|----------|------|-------|
| `+` `-` `*` `/` `%` | Arithmetic | `/` always returns float |
| `//` | Integer division | Integers only |
| `**` | Power | Right-associative |
| `==` `!=` | Equality | Cross-type int/float comparison works |
| `<` `<=` `>` `>=` | Relational | Numeric and string comparison |
| `and` `&&` | Logical AND | Short-circuit |
| `or` `\|\|` | Logical OR | Short-circuit |
| `not` `!` | Logical NOT | |
| `\|>` | Pipe | `expr \|> fn` calls `fn(expr)` |
| `=` | Assignment | |
| `+=` `-=` `*=` `/=` `%=` `**=` | Compound assignment | |

## Example ArchScript Code

```
// Variables and arithmetic
var x = 42
var pi = 3.14159
var greeting = "Hello, " + "World!"

// Functions
def add(a, b) = a + b
def factorial(n) = if n <= 1: 1
    else: n * factorial(n - 1)

// Lambda and higher-order
var double = lambda x: x * 2
var nums = range(10)
var evens = filter(lambda x: x % 2 == 0, nums)

// Pattern matching
match value {
    0 => println("zero"),
    x if x > 0 => println("positive"),
    _ => println("other")
}

// Pipe operator
result = data |> transform |> summarize

// Type definitions
data Color = Red | Green | Blue | Custom(r, g, b)
type Point = Tuple(Float, Float)

// Imports
import math
import { sqrt, pi } from math
import math as m

// Arch Linux stdlib
var result = pacman.install("vim")
println(result.command)
systemd.enable("sshd")
var content = fs.read("/etc/hostname")
```