cas-parser 0.1.0

Parser for the CalcScript language
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
Parser for the CalcScript language.

`cas-rs` utilizes a custom scripting language called "CalcScript" to enable
interaction with all of its features. CalcScript is a mostly imperative,
expression-oriented language. It attempts to be as close to common mathematical
notation as possible, and keep syntax and visual noise minimal and readable,
while still adding useful features. See the [`examples/`](../examples) directory
for examples of basic programs written in CalcScript.

# Usage

This crate is responsible for making sense of code written in CalcScript, before
passing the parsed code to `cas-compute` to be evaluated, so it's typical to use
`cas-parser` and `cas-compute` together. Here is an example of how to parse
CalcScript code:

```rust
use cas_parser::parser::{ast::Stmt, Parser};

let code = "f(x) = x^2 + 5x + 6; f(25)";
let expr = Parser::new(code).try_parse_full_many::<Stmt>().unwrap();
println!("{:#?}", expr);

// output (simplified):
// [
//     Stmt {
//         expr: Assign(Assign {
//             target: Func(FuncHeader {
//                 name: LitSym { name: "f", span: 0..1 },
//                 params: [
//                     Symbol(LitSym { name: "x", span: 2..3 }),
//                 ],
//                 span: 0..4,
//             }),
//             op: AssignOp { kind: Assign, span: 5..6 },
// ...
```

See `cas-compute`'s documentation for examples of running CalcScript programs.

# Language guide

Below is a guide to the language. To try out the examples below quickly, use the
included REPL. Clone this repositroy and run the following command to start it.

```sh
cargo run --example repl
```

User-input is prefixed with `> `.

## Operators

Common operators used in math and programming are supported, including:

- `+`, `-`, `*`, `/`, `%` (modulo)
- `^` (exponentiation)
- `!` (factorial)
- `>`, `<`, `>=`, `<=`, `==`, `!=`, `&&`, `||`, `not`
- `~==`, `~!=` (approximate equality)
- `&`, `|`, `<<`, `>>`, `~` (bitwise operators)

```text
> 1 + 3
4

> 2 * 3 > 5
true

> 41!
3.3452526613163807108170062053440751665152 × 10 ^ 49
```

Implicit multiplication (writing multiplication without the `*` operator) is
also reasonably supported:

```text
> x = 2
2

> 2x
4

> 2(x + 3(x + 4))
40
```

See [here](#implicit-multiplication) for more information (and caveats) about
implicit multiplication.

## Assignment to variables and functions

Variables and functions are created using the `=` operator. Compound assignments
are also supported:

```text
> x = 2
2

> f(x) = x^2 + 2x + 1
()

> f(x)
9

> x *= x
4

> x ^= x
256

> f(x)
66049
```

Functions also support default parameters:

```text
> log(100)
2

> log(8, 2)
3

> f(x, factor = 1) = x * factor
()

> f(5)
5

> f(5, 2)
10
```

## Expression-oriented and types

CalcScript is an expression-oriented language. This means that all statements
are expressions that evaluate to some value. For example, these are all valid
expressions that yield integers or floating-point numbers:

```text
> 1 + 2
3

> 3(4 + 5)
27

> 6 * 7 / 8
5.25
```

These are also valid expressions:

```text
> t = 0
0

> (x = 2) + (y = 3)
5

> while t < 10 then {
  x *= y % 416
  t += 1
}; x
118098
```

The block expression, `{ ... }`, can contain multiple statements inside of it.
It evaluates to the last statement within it:

```text
> {
  x = 2
  y = 3
  x + y
}
5
```

Being an expression, a block can be used in any place where an expression would
be expected. As an example:

```text
> a = 3; b = 4
4

> if { if a > b then a else b } == a then a + b else a - b
-1
```

While this example is contrived, it provides a good example of how expressive
CalcScript can be.

### Unit type

The unit type `()` is a special type that has only one value, also called `()`.
It is used to indicate that a value is not particularly useful, and is the
return type of functions that don't have an explicit `return` expression. This
is similar to `void` in C-like languages, `None` in Python, `undefined` in
JavaScript, and `()` in Rust.

Adding a semicolon to the end of an expression will evaluate, then discard the
value of that expression and return `()` instead.

Using most operators with `()` will result in an evaluation error, with the
exception of comparison-based operators, such as `==`, `!=`, `>`, `<`, etc. This
can be useful for checking if a function call succeeded or not:

```test
quadratic_formula(a, b, c, plus = true) = {
    discriminant = b^2 - 4 a c
    if discriminant >= 0 then {
        left = -b / (2a)
        right = sqrt(discriminant) / (2a)
        if plus then left + right else left - right
    }
}

if quadratic_formula(1, 2, 3) == () then {
    // has no real roots
} else {
    // has real roots
}
```

## Comments

Comments in CalcScript are denoted by `//` and continue until the end of the
line. Comments can be placed anywhere in the code, and any text following `//`
will be ignored by the parser.

Comments are typically used to describe or explain the reasoning behind your
code, or to temporarily disable a line of code for debugging purposes:

```test
// the x and y-position of a point, in meters
x = 2
y = 3

// computes the distance from the origin
// distance = sqrt(x^2 + y^2)
distance = hypot(x, y) // faster than sqrt(x^2 + y^2)

distance
```

## Programming constructs

`cas-rs` supports usual programming constructs, such as `if` / `else`
statements, `loop`s, and `while` loops.

In the case of `if` / `else` statements, you often will not need to enclose
conditions or branches with any special syntax (you can do so with curly braces
or parentheses if needed):

```test
my_abs(x) = if x < 0 then -x else x
quadratic_formula(a, b, c, plus = true) = {
    discriminant = b^2 - 4 a c
    if discriminant >= 0 then {
        left = -b / (2a)
        right = sqrt(discriminant) / (2a)
        if plus then left + right else left - right
    }
}
```

`loop`s and `while` loops are also supported. A `loop` expression will execute
its body forever, while a `while` expression will run its body for as long as
the given condition is true. Within the scope of a `loop` / `while` expression,
the `break` and `continue` keywords can be used to break out of the loop or skip
to the next iteration, respectively:

```test
my_factorial(n) = {
    i = 1
    result = 1
    while i < n then {
        i += 1
        result *= i
    }
    result
}
```

The `break` keyword can also be used to exit a loop while also returning a value
from the loop. For example, the following function returns the least common
multiple of two numbers:

```test
lcm(a, b) = {
    i = 0
    loop {
        i += 1
        if i % a == 0 && i % b == 0 then {
            break i
        }
    }
}
```

## Radix notation

Radix notation is CalcScript's standard method of writing integers in bases
other than base-10. To type a number in radix notation, type the base, followed
by a single quote, followed by the digits of the number. For example, this is
the number 1072, expressed in various different bases:

```text
> a = 2'10000110000
1072

> b = 8'2060
1072

> c = 25'1hm
1072

> d = 32'11g
1072

> f = 47'mC
1072
```

Each base is defined in terms of the following alphabet:

```text
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/
```

## Implicit multiplication

CalcScript features implicit multiplication as a convenience. This means in many
cases, you can omit the `*` symbol when multiplying two expressions, as one
might in commonly accepted mathematical notation. For example, the following
code is valid:

```text
> x = 2
2

> 2(x + 3)
10
```

However, there are some important things to note about implicit multiplication:

### Shares precedence with explicit multiplication

In CalcScript, implicitly inserted multiplication has the **same precedence** as
explicit multiplication, division, and remainder division. Adding an explicit
multiplication operator in the place of implicit multiplication will **always**
evaluate to the same result.

This is contrary to some calculators and mathematical literature, which will
often treat implicit multiplication as having higher precedence than explicit
multiplication. For example, running this example on some other calculators
would result in `f` and `g` having the same value:

```text
// !!! THIS IS NOT THE BEHAVIOR OF CAS-RS! !!!

a = 4

f = 1 / 2a
g = 1 / (2a)
```

In the following CalcScript example, `f`, `g`, and `h` evaluate to the **same
value** (`1 / 2 * 4 = 2`):

```text
a = 4

f = 1 / 2a
g = (1 / 2)a
h = 1 / 2 * a
```

It's important to remember this distinction when copying mathematical notation
into CalcScript.

### Whitespace

CalcScript is parsed deterministically, meaning that the parser will always
produce the same result for the same input. However, implicit multiplication and
whitespace can have unexpected interactions that may appear ambiguous.

#### Between symbols

In the below example, there must be whitespace between `a` and `c`, otherwise
the parser will treat `ac` as a single symbol.

```text
discriminant(a, b, c) = b^2 - 4a c
```

This example would fail when calling the function at runtime, due to the
variable `ac` not being defined:

```text
discriminant(a, b, c) = b^2 - 4ac
```

#### Function calls

Currently, any expression like `f(x)` is interepted as a function call, not `f`
multiplied by `x`. Additionally, `f (x)` (with one or more spaces in between) is
_also_ interpreted as a function call.

I am considering changing this behavior in the future; see
[this issue](https://github.com/ElectrifyPro/cas-rs/issues/3) where I've
weighed multiple alternatives.

#### Newlines

Implicit multiplication is restricted to **individual lines**.

This may seem like an obvious choice, but in the past, this wasn't the case,
which made it incredibly easy to write ambiguous code that produced unexpected
results. For example, today, this code will output `true`, as expected:

```text
my_factorial(n) = {
    out = n
    while n > 1 then {
        n -= 1
        out *= n
    }
    out
}

my_factorial(14) == 14!
```

But in the past, this code would not have compiled due to implicit
multiplication being inserted everywhere (literally). A semicolon (`;`) was
required if one wanted to avoid this behavior:

```text
my_factorial(n) = {
    out = n;
    while n > 1 then {
        n -= 1;
        out *= n;
    };
    out
};

my_factorial(14) == 14!
```

Today, these semicolons are optional, and are only necessary if you want to
write multiple statements on a single line.

## High-quality error reporting

It is a design goal to make the parser as helpful as possible. For example, this
is the generated error if the user inputs incomplete [radix
notation](#radix-notation):

```text
> 2' + 3
Error: missing value in radix notation
   ╭─[input:1:2]
   │
 1 │ 2' + 3
   │  ┬
   │  ╰── I was expecting to see a number in base 2, directly after this quote
   │
   │ Help: base 2 uses these digits (from lowest to highest value): 01
───╯
```

Here is a variant of the above error:

```text
> 2'+ 3
Error: invalid digits in radix notation: `+`
   ╭─[input:1:3]
   │
 1 │ 2'+ 3
   │   ┬
   │   ╰── if you're trying to add two values, add a space between each value and this operator
   │
   │ Help: base 2 uses these digits (from lowest to highest value): 01
───╯
```

There is a lot of room for improvement in these error messages, but this is a
good start.