# Profiling tsrun

This document describes how to profile the tsrun interpreter to identify performance bottlenecks.

## Build Profiles

The project has three build profiles:

| Profile | Command | Use Case |
|---------|---------|----------|
| `dev` | `cargo build` | Development, debugging |
| `release` | `cargo build --release` | Production, benchmarks |
| `profiling` | `cargo build --profile profiling` | Performance analysis with symbols |

The `profiling` profile inherits from `release` but keeps debug symbols and doesn't strip the binary, which is essential for meaningful profiler output.
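
A minimal sketch of what such a profile section looks like in `Cargo.toml` (the exact settings in this project may differ):

```toml
[profile.profiling]
inherits = "release"  # keep all release optimizations
debug = true          # but retain debug symbols for the profiler
strip = false         # and do not strip them from the binary
```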

## Quick Performance Check

Use GNU time (`/usr/bin/time -v`; the shell built-in `time` does not support `-v`) for a quick overview of execution time and memory usage:

```bash
# Build release binary
cargo build --release

# Run with timing
/usr/bin/time -v ./target/release/tsrun examples/memory-management/gc-no-cycles.ts
```

Example output:
```
User time (seconds): 0.09
Maximum resident set size (kbytes): 4284
Minor (reclaiming a frame) page faults: 344
```

Key metrics to watch:
- **User time (seconds)**: CPU time spent in user mode
- **Maximum resident set size (kbytes)**: Peak memory usage
- **Minor page faults**: Memory allocation activity

## Profiling with perf

### Basic CPU Profiling

```bash
# Build with profiling profile (includes debug symbols)
cargo build --profile profiling

# Record performance data
perf record -g ./target/profiling/tsrun examples/memory-management/gc-no-cycles.ts

# View flat profile (top functions by self time)
perf report --stdio --sort=symbol --no-children | head -50

# View call graph
perf report --stdio --sort=symbol | head -80
```
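
If `perf record` fails with a permissions error as a non-root user, you may need to relax the kernel's perf restrictions (distro-dependent; `1` is a common value):

```bash
# Allow non-root users to profile their own processes (resets on reboot)
sudo sysctl kernel.perf_event_paranoid=1
```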

### Hardware Performance Counters

```bash
# Get detailed CPU statistics
perf stat -e cycles,instructions,cache-references,cache-misses \
    ./target/profiling/tsrun examples/memory-management/gc-no-cycles.ts
```

Example output:
```
   492,057,481      cycles:u
 1,544,622,706      instructions:u                   #    3.14  insn per cycle
    19,850,244      cache-references:u
        79,108      cache-misses:u                   #    0.40% of all cache refs
```

Key metrics:
- **insn per cycle (IPC)**: instructions divided by cycles; higher is better (max ~4-6 on modern CPUs). Here: 1,544,622,706 / 492,057,481 ≈ 3.14.
- **cache-misses %**: misses divided by cache-references; lower is better (<5% is good). Here: 79,108 / 19,850,244 ≈ 0.40%.

## Generating Flamegraphs

Flamegraphs provide a visual representation of where time is spent: callees stack on top of callers, and each frame's width is proportional to the time sampled in it, so wide plateaus mark the hotspots:

```bash
# Install flamegraph tool (one-time)
cargo install flamegraph

# Generate flamegraph (requires profiling build for good symbols)
cargo flamegraph --profile profiling --bin tsrun \
    -o flamegraph.svg -- examples/memory-management/gc-no-cycles.ts

# Open in browser
firefox flamegraph.svg
```

## Interpreting Profiler Output

### Example perf report output

From profiling `gc-no-cycles.ts` (50K iterations, 150K objects):

```
# Overhead  Symbol
# ........  .......................................
    20.82%  tsrun::interpreter::bytecode_vm::BytecodeVM::run
     8.79%  tsrun::interpreter::bytecode_vm::BytecodeVM::execute_op
     8.67%  tsrun::interpreter::Interpreter::env_get
     5.10%  tsrun::gc::Guard<T>::alloc
     4.41%  tsrun::value::PropertyStorage::iter
     4.12%  tsrun::interpreter::bytecode_vm::BytecodeVM::set_reg
     3.58%  core::ptr::drop_in_place<Rc<RefCell<Space>>>
     2.83%  <PropertyStorageIter as Iterator>::next
     2.38%  alloc::vec::Vec<T,A>::push
     2.32%  alloc::rc::Weak<T,A>::upgrade
```

### Common hotspots and their meanings

| Symbol | What it does | Optimization hints |
|--------|--------------|-------------------|
| `BytecodeVM::run` | Main VM execution loop | Core work, hard to optimize |
| `BytecodeVM::execute_op` | Opcode dispatch | Core work |
| `env_get` | Variable lookup | Reduce scope chain depth |
| `Guard::alloc` | GC object allocation | Reduce allocations, reuse objects |
| `PropertyStorage::iter` | Property iteration | Reduce property access in loops |
| `set_reg` | VM register writes | Core work |
| `drop_in_place<Rc>` | Reference counting cleanup | Reduce Rc usage in hot paths |
| `Weak::upgrade` | Weak reference upgrade | GC bookkeeping |
| `malloc` / `cfree` | Memory allocation | Reduce allocations, use arena allocators |
| `clone` | Value cloning | Use `Rc` for shared ownership |

## Benchmarking Best Practices

### Consistent benchmarking environment

```bash
# Disable CPU frequency scaling (requires root)
sudo cpupower frequency-set --governor performance

# Run multiple times and take the median
for i in {1..5}; do
    /usr/bin/time -f "%e" ./target/release/tsrun \
        examples/memory-management/gc-no-cycles.ts 2>&1 | tail -1
done
```
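
To pick the median automatically, one option is to sort the five timings and take the middle line:

```bash
# Sketch: numerically sort the five wall-clock times, print the 3rd (median)
for i in {1..5}; do
    /usr/bin/time -f "%e" ./target/release/tsrun \
        examples/memory-management/gc-no-cycles.ts 2>&1 | tail -1
done | sort -n | sed -n '3p'
```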

### Comparing before/after changes

```bash
# Save baseline
git stash  # or checkout baseline commit
cargo build --release
/usr/bin/time -v ./target/release/tsrun \
    examples/memory-management/gc-no-cycles.ts 2>&1 | tee baseline.txt

# Apply changes
git stash pop  # or checkout new commit
cargo build --release
/usr/bin/time -v ./target/release/tsrun \
    examples/memory-management/gc-no-cycles.ts 2>&1 | tee optimized.txt

# Compare
diff baseline.txt optimized.txt
```

## Test Files for Profiling

The `examples/memory-management/` directory contains good profiling targets:

| File | What it tests |
|------|---------------|
| `gc-no-cycles.ts` | Baseline: 150K simple objects without cycles (recommended starting point) |
| `gc-scale-test.ts` | Same object count WITH circular references (compare to baseline) |
| `stress-test.ts` | Object creation, property access, closures |
| `main.ts` | Comprehensive test (imports all other tests) |
| `object-churn.ts` | Heavy object allocation/deallocation |
| `closure-lifetime.ts` | Closure creation and capture |
| `circular-refs.ts` | Circular reference handling |
| `scope-cleanup.ts` | Scope/environment management |

## Memory Profiling with Valgrind

```bash
# Build debug binary (more accurate but slower)
cargo build

# Check for memory leaks
valgrind --leak-check=full \
    ./target/debug/tsrun examples/memory-management/gc-no-cycles.ts

# Profile memory allocation patterns
valgrind --tool=massif \
    ./target/debug/tsrun examples/memory-management/gc-no-cycles.ts

# View massif output
ms_print massif.out.*

# Clean up
rm -f massif.out.*
```

### Example massif output

From profiling `gc-no-cycles.ts`:

```
    KB
343.0^#
     |#:::::::::::@:::@:::::::@:::::::@::::::@::::::@::::::@
   0 +----------------------------------------------------------------------->
```

**Summary:**
| Metric | Value |
|--------|-------|
| Peak heap usage | 343 KB |
| Total allocations | 357,032 |
| Total bytes allocated | 57 MB |
| Memory leaks | 0 bytes |

**Allocation breakdown at peak:**
| % | Source |
|---|--------|
| 91% | `regex_automata` - Lexer regex compilation (one-time startup) |
| 4% | `Rc<Expression>` - AST nodes |
| 3% | `JsString` - String interning |
| 1% | `Guard::alloc` - GC object allocation |
| 1% | Parser vectors |

The flat memory profile confirms the GC is working: 150K objects are created over the run, yet peak heap usage stays at 343 KB.

## Profiling Specific Operations

### Profile only parsing

```bash
# Create a test that only parses (no execution)
cat > /tmp/parse-only.rs << 'EOF'
use tsrun::parser::parse;
use std::fs;

fn main() {
    let source = fs::read_to_string("examples/memory-management/gc-no-cycles.ts").unwrap();
    for _ in 0..1000 {
        let _ = parse(&source);
    }
}
EOF
```
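
The snippet above only writes the file. To compile it against the crate, one approach (assuming a standard Cargo layout, like the profiling binaries described below) is to place it under `src/bin/` and build it as a binary target:

```bash
# Assumption: src/bin/<name>.rs becomes a Cargo binary target of that name
cp /tmp/parse-only.rs src/bin/parse_only.rs
cargo build --profile profiling --bin parse_only
perf record -g ./target/profiling/parse_only
perf report --stdio
```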

### Profile with specific workload

```bash
# Create a focused test case
cat > /tmp/test.ts << 'EOF'
// Test specific feature
for (let i = 0; i < 100000; i++) {
    const obj = { x: i, y: i * 2 };
    const sum = obj.x + obj.y;
}
EOF

cargo build --profile profiling
perf record -g ./target/profiling/tsrun /tmp/test.ts
perf report --stdio
```

## Optimization Checklist

When profiling reveals a bottleneck, consider:

1. **Hashing overhead (>10%)**
   - Switch from `std::collections::HashMap` to `rustc_hash::FxHashMap` (see the sketch after this list)
   - Pre-compute hash values for frequently used keys

2. **String operations (>5%)**
   - Use `Rc<str>` (JsString) instead of `String`
   - Avoid `to_string()` in hot paths
   - Use direct comparison methods like `eq_str()`

3. **Memory allocation (>5%)**
   - Use object pools or arenas
   - Reduce cloning with `Rc` shared ownership
   - Pre-allocate vectors with known capacity

4. **Property lookup (>10%)**
   - Cache prototype chain lookups
   - Use inline caches for repeated property access
   - Consider shape/hidden class optimization
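
A minimal sketch of the `FxHashMap` swap from item 1. `rustc_hash` is a real crate; the `Environment` struct here is hypothetical, for illustration only:

```rust
// FxHash trades DoS resistance for raw speed, which is acceptable for
// interpreter-internal keys that untrusted input cannot choose freely.
use rustc_hash::FxHashMap;

struct Environment {
    bindings: FxHashMap<String, u64>,
}

fn main() {
    // FxHashMap is a drop-in alias for HashMap with a faster hasher
    let mut env = Environment { bindings: FxHashMap::default() };
    env.bindings.insert("answer".to_string(), 42);
    assert_eq!(env.bindings.get("answer"), Some(&42));
}
```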

## Continuous Performance Monitoring

Add performance tests to CI:

```bash
# In CI script
cargo build --release
BASELINE=0.10  # seconds (gc-no-cycles.ts baseline)
ACTUAL=$(/usr/bin/time -f "%e" ./target/release/tsrun \
    examples/memory-management/gc-no-cycles.ts 2>&1 | tail -1)

if (( $(echo "$ACTUAL > $BASELINE * 1.2" | bc -l) )); then
    echo "Performance regression: ${ACTUAL}s > ${BASELINE}s * 1.2"
    exit 1
fi
```

## Criterion Benchmarks

The project includes criterion-based microbenchmarks for the lexer and parser.
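
As a rough sketch of the structure such a benchmark file follows (the `tokenize` function below is a hypothetical stand-in, not tsrun's real lexer API):

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

// Hypothetical stand-in for the crate's lexer entry point.
fn tokenize(source: &str) -> usize {
    source.split_whitespace().count()
}

fn bench_lexer(c: &mut Criterion) {
    // ~16KB of repetitive source to tokenize per iteration
    let source = "let x = 1 + 2 ;\n".repeat(1_000);
    c.bench_function("lexer/smoke", |b| {
        b.iter(|| tokenize(black_box(&source)))
    });
}

criterion_group!(benches, bench_lexer);
criterion_main!(benches);
```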

### Running Benchmarks

```bash
# Run all benchmarks
cargo bench

# Run only lexer benchmarks
cargo bench --bench lexer

# Run only parser benchmarks
cargo bench --bench parser

# Run specific benchmark group
cargo bench --bench lexer -- lexer/throughput
cargo bench --bench parser -- parser/individual

# Quick benchmark run (fewer samples)
cargo bench -- --quick
```

### Benchmark Groups

**Lexer benchmarks** (`benches/lexer.rs`):
| Group | What it measures |
|-------|------------------|
| `lexer/individual` | Tokenization of specific code patterns (strings, operators, classes, etc.) |
| `lexer/throughput` | Throughput at different source sizes (1KB to 500KB) |
| `lexer/token_types` | Performance of tokenizing specific token types |

**Parser benchmarks** (`benches/parser.rs`):
| Group | What it measures |
|-------|------------------|
| `parser/individual` | Parsing specific code patterns |
| `parser/throughput` | Throughput at different source sizes |
| `parser/expression_depth` | Binary expression tree parsing at different depths |
| `parser/statements` | Parsing many statements (lets, functions, classes) |
| `parser/string_interning` | String dictionary performance with repeated vs unique identifiers |

### Viewing Results

Criterion generates HTML reports in `target/criterion/`:

```bash
# Open the report index
firefox target/criterion/report/index.html
```

### Comparing Against Baseline

```bash
# Save current as baseline
cargo bench -- --save-baseline main

# Make changes, then compare
cargo bench -- --baseline main
```

## Dedicated Profiling Binaries

For detailed profiling with `perf` or `flamegraph`, use the dedicated profiling binaries:

### Lexer Profiling

```bash
# Build with profiling profile
cargo build --profile profiling --bin profile_lexer

# Run with custom size and iterations
# Usage: profile_lexer [size_bytes] [iterations]
./target/profiling/profile_lexer 500000 50

# Profile with perf
perf record -g ./target/profiling/profile_lexer 500000 50
perf report --stdio --sort=symbol --no-children | head -50

# Generate flamegraph
cargo flamegraph --profile profiling --bin profile_lexer -o lexer-flamegraph.svg -- 500000 50
```

### Parser Profiling

```bash
# Build with profiling profile
cargo build --profile profiling --bin profile_parser

# Run with custom size and iterations
# Usage: profile_parser [size_bytes] [iterations]
./target/profiling/profile_parser 500000 10

# Profile with perf
perf record -g ./target/profiling/profile_parser 500000 10
perf report --stdio --sort=symbol --no-children | head -50

# Generate flamegraph
cargo flamegraph --profile profiling --bin profile_parser -o parser-flamegraph.svg -- 500000 10
```

### Example Output

```
$ ./target/profiling/profile_lexer 500000 50
Generating 488KB source...
Source size: 500202 bytes
Running 50 iterations of lexer...
Done in 319.13ms
Total tokens: 6211350
Throughput: 78.37 MB/s

$ ./target/profiling/profile_parser 500000 10
Generating 488KB source...
Source size: 500137 bytes
Running 10 iterations of parser...
Done in 217.46ms
Total statements: 22530
Throughput: 23.00 MB/s
```
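
As a sanity check, throughput is just bytes processed over elapsed time: for the lexer run, 500,202 bytes × 50 iterations / 0.31913 s ≈ 78.37 MB/s, matching the reported figure.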

## Lexer/Parser Optimization Guide

### Common Lexer Hotspots

| Symbol | Typical % | What it does | Optimization hints |
|--------|-----------|--------------|-------------------|
| `Lexer::next_token` | 15-20% | Main tokenization loop | Core work, hard to optimize |
| `__memcmp` / `str::eq` | 10-15% | Keyword matching | Use perfect hash or trie |
| `String::push` | 5-10% | Building identifier/string tokens | Pre-allocate capacity |
| `Lexer::advance` | 5-10% | Character iteration | Core work |
| `drop_in_place<Token>` | 5-10% | Token cleanup | Reduce token allocations |

### Common Parser Hotspots

| Symbol | Typical % | What it does | Optimization hints |
|--------|-----------|--------------|-------------------|
| `Parser::advance` | 5-10% | Token advancement | Core work |
| `Parser::parse_*` | varies | Parse functions | Reduce backtracking |
| `Rc::new` / `drop<Rc>` | 5-10% | AST node allocation | Use arena allocator |
| `malloc` / `free` | 5-10% | Memory allocation | Reduce allocations |
| `Hash::hash` | 2-5% | String interning | Already using FxHashMap |

### Red Flags in Profiler Output

Watch for these patterns that indicate problems:

1. **`Lexer::restore` > 1%**: The checkpoint/restore mechanism should be O(1). If it shows up, the lexer is recreating iterators inefficiently.

2. **`Peekable::peek` > 5%** in parser: Excessive lookahead or backtracking.

3. **`clone` > 10%**: Too much cloning. Use `Rc` or references.

4. **`CharIndices::next` > 10%** outside lexer: Iterator being recreated instead of reused.

## Hyperfine for Quick Comparisons

For quick A/B comparisons between commits:

```bash
# Install hyperfine
cargo install hyperfine

# Compare two versions
git stash
cargo build --release
cp target/release/tsrun /tmp/baseline

git stash pop
cargo build --release

hyperfine \
    '/tmp/baseline examples/memory-management/gc-no-cycles.ts' \
    './target/release/tsrun examples/memory-management/gc-no-cycles.ts' \
    --warmup 3
```