```
█████╗ ██╗ ██╗██╗ ██╗██████╗ ███████╗
██╔══██╗██║ ██║██║ ██╔╝██╔══██╗██╔════╝
███████║██║ █╗ ██║█████╔╝ ██████╔╝███████╗
██╔══██║██║███╗██║██╔═██╗ ██╔══██╗╚════██║
██║ ██║╚███╔███╔╝██║ ██╗██║ ██║███████║
╚═╝ ╚═╝ ╚══╝╚══╝ ╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝
```
[](https://github.com/MenkeTechnologies/awkrs/actions/workflows/ci.yml)
[](https://crates.io/crates/awkrs)
[](https://crates.io/crates/awkrs)
[](https://docs.rs/awkrs)
[](https://menketechnologies.github.io/awkrs/)
[](https://opensource.org/licenses/MIT)
### `[WORLDS FASTEST AWK BYTECODE ENGINE // PARALLEL RECORD PROCESSOR // RUST CORE]`
> *"Pattern. Action. Domination."*
`awkrs` runs **pattern → action** programs over input records like POSIX `awk` / GNU `gawk` / `mawk`, with a fused-superinstruction bytecode VM (plus a default-on [`fusevm`](https://crates.io/crates/fusevm)/Cranelift offload path for eligible numeric chunks), parallel record processing, and a CLI that accepts the union of POSIX, gawk, and mawk options.
┌──────────────────────────────────────────────────────────────┐
│ STATUS: ONLINE THREAT LEVEL: NEON SIGNAL: ████████░░ │
└──────────────────────────────────────────────────────────────┘
### [`Read the Docs`](https://menketechnologies.github.io/awkrs/) · [`Engineering Report`](https://menketechnologies.github.io/awkrs/report.html) · [`strykelang`](https://github.com/MenkeTechnologies/strykelang) · [`zshrs`](https://github.com/MenkeTechnologies/zshrs) · [`fusevm`](https://github.com/MenkeTechnologies/fusevm)
---
## Table of Contents
- [\[0x00\] System Scan](#0x00-system-scan)
- [\[0x01\] System Requirements](#0x01-system-requirements)
- [\[0x02\] Installation](#0x02-installation)
- [\[0x03\] Language Coverage](#0x03-language-coverage)
- [Compatibility matrix (BSD awk / mawk / gawk)](docs/COMPATIBILITY.md)
- [\[0x04\] Multithreading](#0x04-multithreading--parallel-execution-grid)
- [\[0x05\] Bytecode VM](#0x05-bytecode-vm--execution-core)
- [\[0x06\] Benchmarks](#0x06-benchmarks--combat-metrics-vs-awk--gawk--mawk)
- [\[0x07\] Build](#0x07-build--compile-the-payload)
- [\[0x08\] Test](#0x08-test--integrity-verification)
- [\[0x09\] Documentation](#0x09-documentation--rendered-html--markdown)
- [\[0xFF\] License](#0xff-license)
---
## [0x00] SYSTEM SCAN
**Positioning:** POSIX awk + the gawk extensions that show up in real scripts (`BEGINFILE`/`ENDFILE`, coprocess `|&`, CSV mode, `PROCINFO`/`SYMTAB`/`FUNCTAB`, `@include`/`@load`/`@namespace`, `/inet/tcp|udp`, MPFR via `-M`). Performance goal: beat `awk`/`mawk`/`gawk` on supported workloads — see [§0x06](#0x06-benchmarks--combat-metrics-vs-awk--gawk--mawk).
**Bytecode cache:** `-f script.awk` runs memoize compiled bytecode to `~/.awkrs/scripts.rkyv` — repeat runs skip lex/parse/compile entirely. Source-file mtime and the awkrs-binary mtime invalidate entries silently. `AWKRS_CACHE=0` disables it. Details in [§0x05](#0x05-bytecode-vm--execution-core).
**Implemented gawk-style CLI flags** (where they differ from gawk, the gap is documented):
| Flag | Behavior |
|---|---|
| `-d`/`--dump-variables` | Dump globals after run (stdout, `-`, or file) |
| `-D`/`--debug` | Static rule/function listing — **not** gawk's interactive debugger |
| `-p`/`--profile` | Wall-clock summary + per-record-rule hit counts (`-j 1` only) — **not** gawk's per-line profiler |
| `-o`/`--pretty-print` | AST pretty-print — **not** gawk's canonical reformatter |
| `-g`/`--gen-pot` | Print and exit before execution |
| `-L`/`-t`/`LINT` | Static lint (extension rules, uninit-var hints, `printf` format checks); when **`LINT`** is truthy at runtime, also emit **`awkrs: warning:`** on stderr for `sqrt`/`log` domain issues (negative / zero args) |
| `-S`/`--sandbox` | Block `system()`, file redirects, pipes, coprocesses, inet I/O |
| `-l name` | Load `name.awk` from `AWKPATH` (default `.`) |
| `-b` | Byte length for `length`/`substr`/`index` |
| `-n` | `strtonum`-style hex/octal coercion |
| `-s`/`--no-optimize` | Disable peephole/JIT optimization (forces the plain bytecode interpreter) |
| `-c`/`-P` | Stored on runtime; minimal effect today |
| `-r`/`--re-interval` | Parsed; no runtime effect (regex crate already supports `{m,n}`) |
| `-N`/`--use-lc-numeric` | Locale decimal radix and `%'` grouping in `sprintf`/`printf`/`print`. Does **not** affect string→number parsing |
**Gawk parity gaps to know:**
- **`RS`** — newline by default; one (UTF-8) char = literal delimiter; `RS=""` = paragraph mode; multi-char = gawk regex (`RT` is the matched text). `FIELDWIDTHS` selects fixed-width when non-empty.
- **`PROCINFO`** — refreshed before and after `BEGIN`. Includes gawk-style `platform` (`posix`/`mingw`/`vms`, **not** Rust's `macos`/`linux`), `version`, ids, `errno`, `api_major`/`api_minor`, `argv`, `identifiers`, `FS` (active split mode), `strftime`, `pgrpid`, `groupN`, `mb_cur_max` (Linux `sysconf`), per-input `READ_TIMEOUT`/`RETRY` composite keys with fallback chain → global `PROCINFO["READ_TIMEOUT"]` → `GAWK_READ_TIMEOUT` env. Unix primary record reads `poll` when a timeout applies. With `-M`: `gmp_version`, `mpfr_version`, `prec_min`, `prec_max`. User-set keys persist across the post-`BEGIN` refresh.
- **`PROCINFO["sorted_in"]`** — `@ind_*`/`@val_*` modes, plus user comparator function (2-arg = index sort, 4-arg `(i1, v1, i2, v2)` = value sort). Returns negative/zero/positive like `qsort`.
- **`SYMTAB`** — assignment, `for-in`, `length(SYMTAB)` like gawk's global introspection (not GNU's variable-object references).
- **`@load`** — non-`.awk` paths only accepted for **gawk's bundled extension names** (`filefuncs`, `readdir`, `time`, …) as no-ops; the builtins are native. Arbitrary `.so`/gawkapi modules error at parse time.
- **`-M`/`--bignum`** — MPFR via `rug` (default 256 bits, `PROCINFO["prec"]`/`["roundmode"]` apply). Arithmetic, `sprintf`/`printf` integer formats (no f64/i64 clamp), `int`/`intdiv`/`strtonum`/`++`/`--`, bit ops, transcendentals, `srand` (low 32 bits of previous seed), `CONVFMT`/`OFMT`/`%s`/concat/regex coercion all use MPFR. Default `CONVFMT`-style number→string for scalars uses each `Float`'s own precision for the MPFR `sprintf` path (so raising `PROCINFO["prec"]` is not undermined by a hardcoded bit count at display time). JIT is disabled in `-M` mode.
- **Unicode vs bytes:** `-b` honored for `length`/`substr`/`index`. Full multibyte field-splitting parity is not audited.
#### HELP // SYSTEM INTERFACE
<img src="assets/awkrs-help.png" alt="awkrs -h cyberpunk help (termshot)" width="100%">
---
## [0x01] SYSTEM REQUIREMENTS
- Rust toolchain (`rustc` + `cargo`)
- A C compiler and `make` for `gmp-mpfr-sys` (pulled in by `rug` for `-M`); typical macOS/Linux setups already satisfy this.
---
## [0x02] INSTALLATION
```sh
brew tap MenkeTechnologies/menketech # one-time
brew install awkrs # via Homebrew tap
cargo install awkrs # from crates.io
git clone https://github.com/MenkeTechnologies/awkrs # from source
cd awkrs && cargo build --release
```
[awkrs on Crates.io](https://crates.io/crates/awkrs)
**Zsh completion:**
```sh
fpath=(/path/to/awkrs/completions $fpath)
autoload -Uz compinit && compinit
```
---
## [0x03] LANGUAGE COVERAGE
**Compatibility matrix:** [BSD awk, mawk, and gawk vs awkrs](docs/COMPATIBILITY.md).
┌──────────────────────────────────────────────────────────────┐
│ SUBSYSTEM: LEXER ████ PARSER ████ COMPILER ████ VM ████ │
└──────────────────────────────────────────────────────────────┘
- **Rules:** `BEGIN`, `END`, `BEGINFILE`/`ENDFILE`, empty pattern, `/regex/`, expression patterns, range patterns (`/a/,/b/` or `NR==1,NR==5`). Like gawk, the four special patterns **must** use `{ … }`; record rules may omit braces for the default `{ print $0 }`.
- **Statements:** `if`/`while`/`do…while`/`for` (C-style and `for (i in arr)`), `switch`/`case`/`default` (gawk-style: no fall-through, regex `case /re/`), `print`/`printf` (with `>`, `>>`, `|`, `|&` redirection), `break`, `continue`, `next`, `nextfile`, `exit`, `delete`, `return`, `getline` (primary, `< file`, `<& cmd`, `expr | getline [var]`).
- **`getline` as expression:** value `1` (read), `0` (EOF), `-1` (error), `-2` (gawk retryable I/O when `PROCINFO[input,"RETRY"]` is set).
- **Operators:** arithmetic, comparison, string concat, ternary, `in`, `~`/`!~`, `++`/`--` (prefix/postfix on vars, `$n`, `a[k]`), `^`/`**` (right-associative; unary `+`/`-`/`!` bind looser, so `-2^2` = `-(2^2)`). **Division by zero** (`/`, compound `/=`) is a **fatal error** (gawk-style: `division by zero attempted`), not infinity.
- **Primary `/`:** The lexer may emit `/` as division when `regex_mode` is false (e.g. after `=`). At a **primary** position `/` cannot start division (division is a binary operator), so the parser re-reads it as `/regex/`. In **expression** context a bare `/re/` means **`$0 ~ /re/`** (POSIX), so e.g. `!/foo/`, `if (/foo/)`, and `x = /foo/` use a match against the current record, not a string literal. The **RHS** of `~` / `!~` and regexp arguments to `gsub`/`sub`/`gensub`/`match`/`split`/`patsplit` still treat `/re/` as the **pattern** only (so `b/c` in a replacement string stays division).
- **gawk regexp constants:** `@/pattern/` yields a **regexp** value (`typeof` reports `regexp`); `~` uses the pattern as a regex.
- **Data:** fields, scalars, associative arrays (`a[k]`, `a[i,j]` with `SUBSEP`), `ARGC`/`ARGV` (set before `BEGIN`; `ARGV[0]` is the executable, `ARGV[1..]` are file paths). `FS` (regex when multi-char), `FPAT` (gawk-style: non-empty splits by regex match), `split`/`patsplit` (3rd arg accepts regex; `patsplit` 4-arg form populates `seps`). **POSIX record model:** `NF = n` truncates or extends fields and rebuilds `$0` with `OFS`; `$0 = "…"` re-splits and updates `NF`. **`FS`/`FPAT` from literals:** bytecode may store source `"…"` as an internal literal string, but `cached_fs` sync on read still tracks those assignments. **Whole array in a scalar context** (`print a`, string concat, `printf` args, `~` operands, etc.) is a **fatal runtime error** (gawk-style), not a silent empty string. **Scalars:** uninitialized variables compare like numeric `0` where POSIX expects dual 0/`""`. **String constants vs input:** program string literals are not *numeric strings* for `<`/`<=`/`>`/`>=` the way `$n` can be; arithmetic still uses longest-prefix string→number (`"3.14abc"+0` → `3.14`). **`split("", arr, fs)`** returns `0` (no empty pseudo-field).
- **Records & env:** `RS`/`RT` as documented above. `ENVIRON`, `CONVFMT`, `OFMT`, `FIELDWIDTHS`, `IGNORECASE` (case-insensitive regex + `==`/`!=`/ordering via `strcoll`), `ARGIND`, `ERRNO`, `LINT`, `TEXTDOMAIN`, `BINMODE`. `PROCINFO`/`FUNCTAB`/`SYMTAB` as in [§0x00](#0x00-system-scan).
- **CLI extensions:** `-k`/`--csv` enables CSV mode (RFC-style quoting, `""` escape) — sets `FS`/`FPAT` and uses a dedicated parser aligned with `gawk --csv`.
- **Builtins:** `length`, `index` (empty needle → `1`, matching gawk), `substr` (**gawk rule**, not POSIX: if **`start < 1`**, clamp to **`1`** and leave **`length`** unchanged — POSIX shortens `length` by `1 - start`), `intdiv`, `mkbool`, `split`, `sprintf`/`printf` (flags, `*` and `%n$` positional, gawk `%'`, conversions `%s %d %i %u %o %x %X %f %e %E %g %G %c %%` — `%e`/`%E` use signed two-digit exponents; `%c` uses a string’s first character), `gsub`/`sub`/`match`, `gensub`, `isarray`, `tolower`/`toupper`, `int`, math (`sin` `cos` `atan2` `exp` `log` `sqrt`), `rand`/`srand`, `systime`, `strftime` (0–3 args), `mktime`, `system`, `close`, `fflush`, bit ops (`and` `or` `xor` `lshift` `rshift` `compl`), `strtonum`, `asort`/`asorti`. User-defined `function` with parameter locals.
- **Static checks:** Before bytecode emission, the compiler rejects parenthesized comma lists except in `print`/`printf` arguments and `(… ) in arr` keys (e.g. `(1,2)` alone as a statement is an error). `gsub`/`sub` require at least two arguments; `split`/`match` at least two; `patsplit` two to four; `gensub` three or four — otherwise a clear **runtime**-style error is returned at compile load time. User-defined recursion is capped (256 nested calls in release builds; lower in unit tests) so pathological self-calls fail with an error instead of overflowing the host stack.
- **Expressions:** integer literals use gawk rules in source — `0x`/`0X` hex; leading `0` octal when all digits are `0`–`7` (otherwise decimal, e.g. `01238` → 1238); floats with a `.` use a decimal integer part (`077.5` → 77.5). Multidimensional membership `(i,j) in arr` uses a parenthesized comma list (gawk); it may appear alone as a `print` argument to emit several fields.
- **I/O model:** main record loop and unredirected `getline` share one `BufReader` so line order matches POSIX. `exit` from `BEGIN` or a pattern action still runs `END` rules, then exits with the requested code.
- **Locale & pipes:** Unix string compare/order uses `strcoll` (`LC_COLLATE`/`LC_ALL`). `|&` and `<&` run under `sh -c` (mixing `|` and `|&` on the same command is an error). With `-N`, `LC_NUMERIC` applies to `sprintf`/`printf` floats and `%'` grouping; without `-N`, `%'` still uses `localeconv()`'s thousands separator (fallback `,`). `-N` does **not** affect parsing of numeric strings from input.
- **Gawk extras:** `@include`, `@load "*.awk"`, `@namespace "…"` (default identifier prefixing; built-ins exempt), indirect calls (`@name(…)` / `@(expr)(…)`), `/inet/tcp/…` and `/inet/udp/…` client sockets, gettext builtins (`bindtextdomain`, `dcgettext`, `dcngettext` with `.mo` catalogs via the `gettext` crate), `-M`/`--bignum` MPFR.
---
## [0x04] MULTITHREADING // PARALLEL EXECUTION GRID
```
┌─────────────────────────────────────────────┐
│ WORKER 0 ▓▓ CHUNK 0 ██ REORDER QUEUE │
│ WORKER 1 ▓▓ CHUNK 1 ██ ──────────────>│
│ WORKER 2 ▓▓ CHUNK 2 ██ DETERMINISTIC │
│ WORKER N ▓▓ CHUNK N ██ OUTPUT STREAM │
└─────────────────────────────────────────────┘
```
Default `-j`/`--threads` is **1**. Pass a higher value when the program is **parallel-safe** (static check: no range patterns, no `exit`/`nextfile`/`delete`, no primary `getline`, no pipe/coproc `getline`, no `asort`/`asorti`, no indirect calls, no `print`/`printf` redirection, no cross-record assignments). Records are processed in parallel via **rayon** and output is **reordered to input order** within each batch so pipelines stay deterministic.
**Regular files** are memory-mapped (`memmap2`) and scanned with the same `RS` rules as the sequential path — no `read()` copy of the whole file. **Stdin** parallel chunks up to `--read-ahead` lines (default 1024) per batch, dispatches to workers, emits in order, then refills.
Workers run the same bytecode VM as the sequential path. The compiled program is shared via `Arc<CompiledProgram>` (one compile, cheap refcount per worker) with per-worker runtime state.
**Fallback:** non-parallel-safe programs run sequentially with a warning when `-j > 1`. Programs that use primary `getline` (including in `BEGIN`) also run sequentially for file input. `END` only sees post-`BEGIN` global state — record-rule mutations from parallel workers are not merged.
---
## [0x05] BYTECODE VM // EXECUTION CORE
┌──────────────────────────────────────────────────────────────┐
│ ARCHITECTURE: STACK VM OPTIMIZATION: PEEPHOLE FUSED │
└──────────────────────────────────────────────────────────────┘
awkrs compiles AWK programs into a flat bytecode instruction stream and runs them on a stack VM. Short-circuit `&&`/`||`, control flow, and range patterns resolve to jump-patched offsets at compile time. The string pool interns variable names and string constants for cheap `u32` indexing.
**fusevm offload (on by default):** eligible **numeric** bytecode chunks are lowered to [`fusevm`](https://crates.io/crates/fusevm) — the shared bytecode VM also used by `zshrs` and `strykelang` — and run on `fusevm::VM`. Set `AWKRS_FUSEVM=0` to force the bytecode interpreter for every chunk. `src/fusevm_bridge.rs` translates an eligible chunk (`is_fusevm_eligible`) into a `fusevm::Chunk` (`build_numeric_chunk`), runs it, and writes modified slots back into the awkrs runtime. Each per-record chunk is a **stable** chunk (no baked-in seed preamble); the accumulator's prior value is seeded as *data* into the VM's base frame before `run()`, so the chunk — and therefore its op-hash — is identical across every record, which is what lets the JIT-compiled native code be reused across records and across processes. `int(x)` is admitted into the numeric chunk and lowers to the native `fusevm::Op::AwkInt` (Cranelift `trunc`) — the first AWK builtin reachable end-to-end from awkrs into fusevm's native AWK-op set and JIT. The transcendental math builtins `sin`/`cos`/`exp`/`atan2` are likewise admitted, lowering to native `fusevm::Op::AwkSin`/`AwkCos`/`AwkExp`/`AwkAtan2` — Cranelift libcalls to small Rust helpers that canonicalize a NaN result to `+nan`, matching awkrs/gawk's NaN-sign normalization. The gawk bitwise builtins `and`/`or`/`xor` (variadic, ≥2 args) are admitted too, lowering to native `fusevm::Op::AwkAnd`/`AwkOr`/`AwkXor` (Cranelift `band`/`bor`/`bxor` over a **saturating** `f64`→`i64` conversion that matches awkrs's `num_to_u64` for huge/NaN operands). `sqrt`/`log` stay interpreter-side because they emit a host stderr warning on a negative argument that a pure native op cannot reproduce; `lshift`/`rshift`/`compl` stay interpreter-side because they raise a fatal on a negative argument (a value-dependent trap), unlike the trap-free `and`/`or`/`xor`. **Eager block-JIT compilation for offloaded loops:** a `BEGIN`/`END` loop chunk offloaded via `run_fusevm_region` calls `fusevm::VM::run()` exactly once, so under fusevm's normal warm-up threshold (compile after N invocations) the block JIT would *never* compile it and the whole loop would run on fusevm's slower interpreter. Because `eligible_loop_prefix` only selects a region that contains a backward jump (a genuinely hot loop), the bridge forces the block-JIT threshold to `0` (compile on the first invocation) around that single `run()`, so the loop compiles to native code immediately. Measured: a 20M-iteration `s += sin(i)` loop dropped from ~12.8s (interpreter) to ~0.15s, and an `x = and(x, i) + 1` loop from ~14.0s to ~0.13s — ~85–110× — both bit-for-bit identical to `AWKRS_FUSEVM=0`. Division and modulo **are** offloaded and block-JIT-compiled: a chunk containing `/`, `%`, or compound `/=`/`%=` lowers to the native trapping ops `fusevm::Op::AwkDivJit`/`AwkModJit`, which the block JIT compiles with a **guarded zero-divisor early-exit** (`fcmp eq divisor, 0.0` → a trap libcall that raises the fatal `division by zero attempted` / `…in `%'` for modulo, else `fdiv`/`fmod`). So a hot division loop runs as native code — a 20M-iteration division loop dropped from ~8.0s (interpreter) to ~0.12s, ~68× — while a zero divisor reached inside the compiled loop still raises the POSIX fatal instead of producing `inf`/`NaN` or hanging. The trap libcall is not a registered host-helper id, so `AwkDivJit`/`AwkModJit` chunks JIT in-process only and skip the on-disk cache (no schema impact). fusevm also still defines the interpreter-only trapping `Op::AwkDiv`/`AwkMod` (distinct from its shell-arithmetic `Op::Div`/`Op::Mod` used by `zshrs`/`strykelang`); awkrs emits the `*Jit` variants.
**fusevm persistent JIT cache (`~/.cache/fusevm-jit`):** awkrs builds `fusevm` with the `jit-disk-cache` feature (see [`Cargo.toml`](Cargo.toml)), so when the fusevm path is active fusevm's Cranelift tiers persist native-compiled code to `~/.cache/fusevm-jit` and reload it across processes — the same on-disk JIT cache `zshrs` uses. Two tiers write there: the **block JIT** compiles a fully-eligible per-record numeric chunk to native code (`*.blk.fjit`), and the **tracing JIT** compiles hot in-chunk loop traces (`*.trc.fjit`). This is distinct from awkrs's own `~/.awkrs/scripts.rkyv` *bytecode* cache (below): one caches fusevm-emitted **machine code** keyed by chunk op-hash (and slot-kind hash), the other caches awkrs **bytecode** keyed by source. Override the directory with `FUSEVM_JIT_CACHE_DIR` (`off` disables); cap it with `FUSEVM_JIT_CACHE_MAX_BYTES`. fusevm's block JIT round-trips awk's `f64` slots through its `SlotKind::Float` bit-pattern model (a slot is an `i64` holding the raw `f64` bits; `GetSlot`/`SetSlot` bitcast through `f64`), so an `x = int(x + c)` accumulator now block-JIT-compiles natively and caches a `*.blk.fjit` artifact reused on the next run — verified producing the artifact and matching the bytecode interpreter bit-for-bit. Coverage is still **partial** — the tracing JIT does not yet hot-trace every top-tested `for`/`while` shape, and only the numeric chunk set below is eligible. So the cache engages for the chunks fusevm can compile and is inert for the rest. The offload is **on by default** (`AWKRS_FUSEVM=0` disables it); with the block JIT's `f64`-slot support (below) and eager compilation of offloaded loop regions (above) it is a large win on JIT-compilable numeric loops — measured 14–110× over the bytecode interpreter for pure-arithmetic and builtin (`sin`/`and`/…) accumulator loops — while staying bit-for-bit faithful to the bytecode interpreter, and ineligible chunks run on the interpreter exactly as before. The **default** execution path for ineligible chunks is the bytecode interpreter with peephole-fused superinstructions; `AWKRS_JIT=0` (or `-s`/`--no-optimize`) disables peephole optimization too.
**fusevm bridge per-record caches (in-process):** the persistent on-disk cache above caches *compiled native code* keyed by fusevm-op-hash; the bridge ALSO maintains in-process caches on `Runtime` that catch the upstream work the disk cache can't touch. (1) `fuse_chunk_cache: HashMap<(chunk_ptr, bignum), Option<Arc<(fusevm::Chunk, Vec<u16>)>>>` caches the built `fusevm::Chunk` per awkrs Chunk so `build_numeric_chunk`'s eligibility check + 2-pass op→fusevm translation only runs once per (chunk, bignum), not per record. (2) `fuse_last_chunk_key`/`fuse_last_chunk_value` form a single-slot side-table that hoists the HashMap lookup out of the per-record path entirely for the common single-rule case (one tuple compare + `Arc::clone`, no HashMap touch). (3) `fuse_vm_pool: fusevm::VMPool` recycles `fusevm::VM` instances across records — `VM::reset(chunk)` preserves Vec capacities (stack, frames, slot_buf, globals) so subsequent records reuse the underlying allocations. (4) The cache value's `Vec<u16>` is the precomputed write-slot set — per-record writeback only walks `Op::SetSlot` targets instead of all N runtime slots. (5) Slot seeding goes direct from `ctx.rt.slots` into `vm.set_slot` with no intermediate `Vec<f64>` allocation. Cumulative win on the awkrs JIT path: 11% on `count_gt_5m` over 10M records, 6% on `compound_pred` (release best-of-3). The JIT path is still net-slower than the awkrs interpreter on tight one-op-body micro-benches — the remaining gap is fusevm's per-call VM setup (chunk move into the VM, JIT entry overhead) which would need fusevm-side API changes (Arc-aware `VM::reset`) to cut further.
**Eligible chunks (fusevm offload):** pure-numeric bodies — constants, arithmetic/comparisons, `++`/`--`, compound assignments (`+=`/`-=`/`*=`/`/=`/`%=`/`^=`), division/modulo (`/`/`%`, native trapping `Op::AwkDivJit`/`AwkModJit` with a guarded zero-divisor early-exit), jumps and fused loop tests, scalar slot reads/writes whose values stay numeric, `int(x)` (native `Op::AwkInt`), the transcendentals `sin`/`cos`/`exp`/`atan2` (native `Op::AwkSin`/`AwkCos`/`AwkExp`/`AwkAtan2`, NaN→`+nan`), and the bitwise builtins `and`/`or`/`xor` (native `Op::AwkAnd`/`AwkOr`/`AwkXor`, saturating `f64`→`i64`). When such a chunk is a hot loop offloaded as a region, the block JIT is eager-compiled on the first `run()` (see above) so the native code is used immediately rather than after a warm-up. Chunks that touch strings, fields, arrays, regexes, `getline`, `print`, user calls, or other builtins (including `sqrt`/`log`, which warn on negative args, and `lshift`/`rshift`/`compl`, which raise a fatal on negative args) are ineligible. `-M`/`--bignum` disables the path entirely (MPFR values can't be represented as `f64` slots). Consulted by default (`AWKRS_FUSEVM=0` forces everything onto the bytecode interpreter).
**Peephole fusion** combines common sequences into single opcodes:
- `print $N` → `PrintFieldStdout` (zero-alloc field write)
- `s += $N` → `AddFieldToSlot` (in-place numeric parse)
- `i = i + 1` / `i++` / `++i` → `IncrSlot` (one numeric add, no stack traffic)
- `s += i` between slots → `AddSlotToSlot`
- `$1 "," $2` literal concat → `ConcatPoolStr`
- `NR++` HashMap-path → `IncrVar`
**Inline fast paths** bypass `VmCtx` entirely for single-rule programs with one fused opcode (`{ print $1 }`, `{ s += $1 }`). Memory-mapped files also recognize `{ gsub("lit", "repl"); print }` with literal pattern: when the needle is absent, the loop writes each line from the mapped buffer with `ORS` and skips the VM.
**Bytecode cache:** `-f script.awk` invocations memoize the compiled `CompiledProgram` to `~/.awkrs/scripts.rkyv` — an rkyv-archived shard with `mmap` + zero-copy `ArchivedHashMap` lookup on the read path (`check_archived_root` validation) and `flock`-serialized atomic-rename writes. Each entry's inner `CompiledProgram` blob is bincode (rkyv outer, bincode inner — same architecture as zshrs/stryke). Repeat runs skip lex/parse/compile entirely — only the matched entry's blob is decoded. Entries are invalidated on source-file mtime change or when the running `awkrs` binary is newer than the cached entry (any rebuild silently rebuilds the cache). Disable with `AWKRS_CACHE=0`. The cache only engages for the simple `-f script.awk` form — inline `-e`/`--source`, `-E`, `-i`/`--include`, `-l`/`--load`, `--debug`, `--lint`, `--pretty-print`, and `--gen-pot` skip the cache because they need the AST.
Based on a survey of the major public awk implementations (BWK awk, gawk, mawk, goawk, [frawk](https://github.com/ezrosent/frawk), [zawk](https://github.com/linux-china/zawk)), awkrs appears to be the **first awk implementation to pair a bytecode VM with a persistent on-disk bytecode cache**. frawk is the closest prior art on JIT — it has VM + Cranelift/LLVM JIT — but re-compiles on every invocation; its [overview](https://github.com/ezrosent/frawk/blob/master/info/overview.md) and [README](https://github.com/ezrosent/frawk/blob/master/README.md) contain no mention of disk-persisted compiled artifacts. gawk's [pm-gawk](https://www.gnu.org/software/gawk/manual/pm-gawk/pm-gawk.html) persists script-defined *variables and functions* across runs, not compiled bytecode — different feature. (awkrs's own fusevm/Cranelift offload is on by default for eligible numeric chunks and, with eager block-JIT compilation of offloaded loop regions, is now a large win on those loops — 14–110× over the bytecode interpreter — though it does not engage for the string/field-dominated programs most awk workloads run, so it is not claimed as the headline feature here; note the fusevm offload additionally carries its own separate persistent *machine-code* cache at `~/.cache/fusevm-jit`, distinct from the bytecode cache claimed here.)
| Implementation | Bytecode VM | JIT | Persistent bytecode cache |
|---|---|---|---|
| BWK awk (one-true-awk) | ✗ tree-walker | ✗ | ✗ |
| gawk | ✓ | ✗ | ✗ (`pm-gawk` is for vars) |
| mawk | ✓ | ✗ | ✗ |
| goawk | ✓ | ✗ | ✗ |
| frawk | ✓ | ✓ Cranelift + LLVM | ✗ |
| zawk (frawk fork) | ✓ | ✓ Cranelift + LLVM | ✗ |
| **awkrs** | **✓** | ◐ fusevm/Cranelift (on by default; block + tracing tiers, `~/.cache/fusevm-jit`) | **✓** |
**Raw byte field extraction:** `print $N` with default `FS` scans raw bytes in the mapped file buffer to find the Nth whitespace field, writes it to the output buffer, and appends `Runtime::ors_bytes` — no record copy, no UTF-8 validation.
**Other optimizations:**
- **Indexed slots:** scalars get `u16` slot indices; reads/writes are flat-array indexing instead of `HashMap` lookups (specials like `NR`/`FS`/`OFS` and array names stay on the HashMap path).
- **Zero-copy fields:** fields stored as `(u32, u32)` byte ranges into the record string; owned `String`s only on `set_field`.
- **Direct-to-buffer print:** stdout writes go straight into a 64 KB `Vec<u8>` (flushed at file boundaries) — no per-record `String`, `format!()`, or stdout locking.
- **Cached separators:** `OFS`/`ORS` bytes cached on the runtime, updated only on assignment. The direct-to-buffer stdout `print` path uses the full `ofs_bytes`/`ors_bytes` slices (arbitrary length; not capped at 64 bytes).
- **Byte-level input:** `read_until(b'\n')` into a reusable `Vec<u8>` skips per-line UTF-8 validation.
- **Regex cache:** compiled `Regex` objects cached in a `HashMap<String, Regex>`.
- **`sub`/`gsub`:** when target is `$0`, applies the new record in one step. Literal needles reuse a cached `memmem::Finder`. Constant string operands pass via `Cow` (no per-call alloc).
- **`parse_number`:** fast-paths plain decimal integer field text before falling back to `str::parse::<f64>()`.
- **Slurped input:** newline scanning uses `memchr`.
- **Parallel:** compiled program shared via `Arc` across rayon workers (zero-copy).
---
## [0x06] BENCHMARKS // COMBAT METRICS (vs awk / gawk / mawk)
┌──────────────────────────────────────────────────────────────┐
│ HARDWARE: APPLE M5 MAX OS: macOS ARCH: arm64 │
└──────────────────────────────────────────────────────────────┘
Measured with [hyperfine](https://github.com/sharkdp/hyperfine). BSD awk (`/usr/bin/awk`), GNU gawk 5.4.0, mawk 1.3.4, awkrs (see [`Cargo.toml`](Cargo.toml) for current version). **Relative** = mean ÷ fastest mean in that table. awkrs has two rows: default (JIT attempted) vs `AWKRS_JIT=0` (bytecode only). Each table is one `hyperfine` invocation across all five commands on the same 1 M-line input, generated 2026-04-10 UTC by `./scripts/benchmark-vs-awk.sh` and copied verbatim from [`benchmarks/benchmark-results.md`](benchmarks/benchmark-results.md). For the awkrs-only JIT-vs-bytecode A/B see [`benchmarks/benchmark-readme-jit.md`](benchmarks/benchmark-readme-jit.md).
> **Caveat (2026-06-01):** the `awkrs (JIT)` rows below were generated against awkrs's *former* in-tree Cranelift module (`src/jit.rs`), which has since been removed; JIT now means the `fusevm`/Cranelift offload (on by default; `AWKRS_FUSEVM=0` disables it), which does not engage for these string/field programs anyway (they are ineligible). The default path for such programs is the fused-superinstruction bytecode interpreter — effectively the `awkrs (bytecode)` row. Re-run `./scripts/benchmark-vs-awk.sh` to regenerate.
### 1. Throughput: `{ print $1 }` over 1 M lines
| Command | Mean | Min | Max | Relative |
|:---|---:|---:|---:|---:|
| BSD awk | 195.0 ms | 179.8 ms | 221.6 ms | 12.43× |
| gawk | 100.8 ms | 92.8 ms | 115.8 ms | 6.42× |
| mawk | 66.2 ms | 61.9 ms | 78.4 ms | 4.22× |
| awkrs (JIT) | 15.7 ms | 13.3 ms | 19.6 ms | **1.00×** |
| awkrs (bytecode) | 16.1 ms | 13.1 ms | 20.2 ms | 1.03× |
### 2. CPU-bound BEGIN (no input)
`BEGIN { s = 0; for (i = 1; i < 400001; i = i + 1) s += i; print s }`
| Command | Mean | Min | Max | Relative |
|:---|---:|---:|---:|---:|
| BSD awk | 15.8 ms | 14.0 ms | 18.6 ms | 1.71× |
| gawk | 20.7 ms | 18.8 ms | 22.9 ms | 2.24× |
| mawk | 9.7 ms | 8.3 ms | 11.4 ms | 1.06× |
| awkrs (JIT) | 9.2 ms | 8.4 ms | 12.0 ms | **1.00×** |
| awkrs (bytecode) | 9.6 ms | 8.2 ms | 12.0 ms | 1.04× |
### 3. Sum first column (`{ s += $1 } END { print s }`, 1 M lines)
Cross-record state is not parallel-safe, so awkrs stays single-threaded here. On regular-file input, awkrs uses a **raw byte** path: parses the Nth whitespace field directly from the mmap'd buffer.
| Command | Mean | Min | Max | Relative |
|:---|---:|---:|---:|---:|
| BSD awk | 158.5 ms | 147.0 ms | 172.7 ms | 12.27× |
| gawk | 62.9 ms | 58.4 ms | 68.9 ms | 4.87× |
| mawk | 37.5 ms | 33.7 ms | 39.9 ms | 2.90× |
| awkrs (JIT) | 13.0 ms | 11.9 ms | 15.4 ms | 1.01× |
| awkrs (bytecode) | 12.9 ms | 11.5 ms | 16.1 ms | **1.00×** |
### 4. Multi-field print (`{ print $1, $3, $5 }`, 1 M lines, 5 fields/line)
| Command | Mean | Min | Max | Relative |
|:---|---:|---:|---:|---:|
| BSD awk | 647.6 ms | 623.5 ms | 686.3 ms | 11.60× |
| gawk | 266.1 ms | 257.4 ms | 301.8 ms | 4.77× |
| mawk | 156.6 ms | 149.8 ms | 170.7 ms | 2.81× |
| awkrs (JIT) | 56.4 ms | 53.1 ms | 61.8 ms | 1.01× |
| awkrs (bytecode) | 55.8 ms | 53.4 ms | 61.6 ms | **1.00×** |
### 5. Regex filter (`/alpha/ { c += 1 } END { print c }`, 1 M lines, no matches)
| Command | Mean | Min | Max | Relative |
|:---|---:|---:|---:|---:|
| BSD awk | 191.8 ms | 180.1 ms | 208.9 ms | 17.31× |
| gawk | 351.4 ms | 342.7 ms | 363.3 ms | 31.72× |
| mawk | 19.3 ms | 17.5 ms | 21.8 ms | 1.74× |
| awkrs (JIT) | 11.1 ms | 9.5 ms | 13.5 ms | **1.00×** |
| awkrs (bytecode) | 11.1 ms | 9.5 ms | 14.6 ms | 1.00× |
### 6. Associative array (`{ a[$5] += 1 } END { for (k in a) print k, a[k] }`, 1 M lines)
| Command | Mean | Min | Max | Relative |
|:---|---:|---:|---:|---:|
| BSD awk | 826.2 ms | 792.2 ms | 896.0 ms | 2.43× |
| gawk | 342.4 ms | 330.6 ms | 362.5 ms | 1.01× |
| mawk | 610.0 ms | 588.9 ms | 648.7 ms | 1.79× |
| awkrs (JIT) | 340.0 ms | 324.2 ms | 377.7 ms | **1.00×** |
| awkrs (bytecode) | 343.7 ms | 323.5 ms | 356.7 ms | 1.01× |
### 7. Conditional field (`NR % 2 == 0 { print $2 }`, 1 M lines, 2 fields/line)
| Command | Mean | Min | Max | Relative |
|:---|---:|---:|---:|---:|
| BSD awk | 289.1 ms | 263.1 ms | 321.1 ms | 9.58× |
| gawk | 116.1 ms | 111.0 ms | 124.4 ms | 3.85× |
| mawk | 71.1 ms | 66.9 ms | 83.6 ms | 2.36× |
| awkrs (JIT) | 30.2 ms | 28.1 ms | 34.0 ms | **1.00×** |
| awkrs (bytecode) | 30.7 ms | 28.0 ms | 35.5 ms | 1.02× |
### 8. Field computation (`{ sum += $1 * $2 } END { print sum }`, 1 M lines, 2 fields/line)
On regular-file input with default FS, awkrs extracts both fields in a single byte scan and parses them as numbers directly from the mmap'd buffer.
| Command | Mean | Min | Max | Relative |
|:---|---:|---:|---:|---:|
| BSD awk | 261.8 ms | 251.4 ms | 280.8 ms | 13.96× |
| gawk | 100.5 ms | 95.3 ms | 109.5 ms | 5.36× |
| mawk | 57.7 ms | 54.5 ms | 61.1 ms | 3.08× |
| awkrs (JIT) | 19.0 ms | 17.6 ms | 23.0 ms | 1.01× |
| awkrs (bytecode) | 18.8 ms | 17.5 ms | 22.8 ms | **1.00×** |
### 9. String concat print (`{ print $3 "-" $5 }`, 1 M lines, 5 fields/line)
| Command | Mean | Min | Max | Relative |
|:---|---:|---:|---:|---:|
| BSD awk | 640.8 ms | 611.9 ms | 689.3 ms | 12.68× |
| gawk | 182.2 ms | 168.1 ms | 197.2 ms | 3.61× |
| mawk | 121.0 ms | 113.6 ms | 128.1 ms | 2.39× |
| awkrs (JIT) | 51.0 ms | 49.2 ms | 53.8 ms | 1.01× |
| awkrs (bytecode) | 50.5 ms | 48.8 ms | 54.8 ms | **1.00×** |
### 10. gsub (`{ gsub("alpha", "ALPHA"); print }`, 1 M lines, no matches)
Lines do not contain `alpha`, so this measures **no-match** `gsub` plus `print`. On regular-file input, awkrs uses a **slurp inline** path: byte `memmem` scan + `print` with no VM or per-line `set_field_sep_split` when the literal is absent.
| Command | Mean | Min | Max | Relative |
|:---|---:|---:|---:|---:|
| BSD awk | 291.5 ms | 282.3 ms | 300.4 ms | 21.15× |
| gawk | 436.3 ms | 425.7 ms | 459.3 ms | 31.66× |
| mawk | 74.3 ms | 68.8 ms | 84.2 ms | 5.39× |
| awkrs (JIT) | 13.8 ms | 12.8 ms | 16.2 ms | **1.00×** |
| awkrs (bytecode) | 13.9 ms | 12.7 ms | 17.6 ms | 1.01× |
```bash
./scripts/benchmark-vs-awk.sh # cross-engine §1–§10 (1 M lines)
AWKRS_BENCH_LINES=5000000 ./scripts/benchmark-vs-awk.sh # 5 M line sweep
./scripts/benchmark-readme-jit-vs-vm.sh # awkrs-only JIT vs bytecode A/B
```
### Demo scripts
Quick tours over the feature surface — runnable against the debug build (`cargo build`); each script auto-builds if the binary is missing.
```bash
./scripts/demo-quickstart.sh # 16-section feature tour (CSV, FIELDWIDTHS, FPAT, RS regex,
# gensub, match() capture, pipes, getline, parallel, JIT toggle)
./scripts/demo-log-parsing.sh # applied: access log → status dist, CSV → per-host stats,
# ps → top RSS, /etc/passwd → shell histogram
LINES=200000 ./scripts/demo-parallel-jit.sh # /usr/bin/time -p sweep of -j 1 vs -j N and
# JIT default vs AWKRS_JIT=0 bytecode
```
`NO_COLOR=1` strips ANSI from the demo output. The applied demo writes its inputs to `$TMPDIR` and cleans up on exit.
### Deep examples (`examples/*.awk`)
Substantive standalone awk programs that exercise recursion, multidim arrays via `SUBSEP`, `PROCINFO["sorted_in"]`, three-arg `match()`, `asort`, and bit ops. Each `<name>.awk` ships with `<name>.in` (stdin input). Every example is byte-for-byte verified against gawk by the parity job in CI (`bash parity/run_parity.sh gawk`), so they double as gawk-extension regression tests.
| File | What it shows |
| --- | --- |
| `bst.awk` | BST insert + inorder / preorder / postorder traversals via recursion |
| `heap_sort.awk` | Min-heap push / pop → heapsort |
| `trie.awk` | Trie membership + prefix-count using `SUBSEP` two-level keys |
| `levenshtein.awk` | O(la·lb) edit-distance DP table held in a `SUBSEP` 2D array |
| `calc_rd.awk` | Recursive-descent arithmetic parser (`+ - * / % ^` unary `( )`, right-assoc `^`) |
| `topo_sort.awk` | Kahn's algorithm topological sort + cycle detection |
| `brainfuck.awk` | Brainfuck interpreter (precomputed bracket map, modulo-256 cells) |
| `rpn.awk` | Postfix calculator (dup / swap / drop / neg + arith) on an explicit stack |
| `json_pretty.awk` | JSON tokeniser → 2-space indented pretty-printer |
| `hexdump.awk` | `xxd`-style hex + ASCII dump (16-byte rows with offset, hex, gutter, ascii) |
| `csv_pivot.awk` | CSV → per-group min / max / mean / total aggregation |
| `graph_bfs.awk` | Undirected BFS from a source + path reconstruction |
| `markov.awk` | Bigram model → top-3 continuations + deterministic 12-step walk |
| `sql_like.awk` | Mini-SQL on CSV: `SELECT … WHERE … GROUP BY … SUM/AVG/COUNT … ORDER BY …` |
| `sudoku.awk` | 9×9 Sudoku solver via recursive backtracking (row/col/box witness sets) |
| `regex_engine.awk` | Recursive regex matcher (`. * + ? ^ $ [...] \`) written *in* awk |
| `dijkstra.awk` | Single-source shortest paths with a binary min-heap PQ |
| `kruskal.awk` | MST via union-find (path halving + union-by-rank) |
| `maze_bfs.awk` | BFS shortest path through an ASCII maze; path overlaid with `*` |
| `diff_lcs.awk` | LCS-based unified diff with recursive traceback |
| `n_queens.awk` | N-queens backtracking with col / diag1 / diag2 constant-time witnesses |
| `kmp.awk` | Knuth-Morris-Pratt substring search (failure function + scan) |
| `intervals.awk` | Merge overlapping closed intervals; report total covered length |
| `roman.awk` | Roman numerals ↔ integers, subtractive form, range 1..3999 |
| `knapsack.awk` | 0/1 knapsack DP table + traceback to recover chosen items |
| `prime_sieve.awk` | Sieve of Eratosthenes up to N + ten-per-row pretty printing |
| `floyd_warshall.awk` | All-pairs shortest paths (negative weights allowed) |
| `bellman_ford.awk` | Single-source shortest paths + negative-cycle detection |
| `scc_tarjan.awk` | Tarjan's strongly connected components (recursion + lowlink) |
| `conway.awk` | Conway's Game of Life — fixed-grid evolution for N generations |
| `a_star.awk` | A* on an ASCII grid (Manhattan heuristic, tuple-keyed min-heap PQ) |
| `base64.awk` | RFC-4648 base64 encode + decode (no external tools) |
| `base_conv.awk` | Integer base conversion 2..36 in either direction |
| `rle.awk` | Run-length encode + decode (whitespace preserved) |
| `vigenere.awk` | Vigenère cipher (encrypt + decrypt; case preserved) |
| `bigint_mul.awk` | Arbitrary-precision multiplication via schoolbook on digit arrays |
| `lru_cache.awk` | LRU cache: O(1) get + put via doubly-linked list + hash map |
| `segment_tree.awk` | Iterative segment tree (point update + range-sum query) |
| `fenwick.awk` | Fenwick / Binary Indexed Tree (prefix + range sums) |
| `convex_hull.awk` | Convex hull via Andrew's monotone chain + shoelace area |
| `lis.awk` | Longest increasing subsequence in O(n log n) with traceback |
| `huffman.awk` | Huffman coding — build prefix tree, encode + round-trip decode |
| `manacher.awk` | Manacher's longest palindromic substring in O(n) |
| `subset_sum.awk` | Subset-sum DP + reconstruction of one valid subset |
| `permutations.awk` | Heap's algorithm — enumerate all n! permutations |
| `tsp_dp.awk` | Held-Karp bitmask DP for travelling salesman (≤15 cities) |
| `anagrams.awk` | Group anagrams by sorted-letter signature |
| `rule30.awk` | Wolfram Rule 30 elementary 1D cellular automaton |
| `aho_corasick.awk` | Aho-Corasick multi-pattern search (goto/fail/output trie) |
| `z_function.awk` | Z-array — linear-time prefix-match table + pattern search |
| `rabin_karp.awk` | Rolling-hash substring search (Rabin-Karp) |
| `shunting_yard.awk` | Dijkstra's shunting-yard infix → postfix → evaluate |
| `modexp.awk` | Modular exponentiation + deterministic Miller-Rabin primality |
| `gcd_extended.awk` | Extended Euclidean algorithm + modular inverse |
| `mandelbrot.awk` | ASCII Mandelbrot escape-time render |
| `ini_parser.awk` | INI config parser with sections / comments / globals |
| `url_parser.awk` | URL decomposer (scheme / user / pass / host / port / path / query / fragment) |
| `tictactoe.awk` | Minimax tic-tac-toe solver — best move + outcome from any position |
| `coin_change.awk` | Min-coins DP + reconstruction of one optimal combination |
| `prim_mst.awk` | Prim's MST via lazy linear frontier scan |
| `boyer_moore.awk` | Boyer-Moore substring search (bad-character heuristic) |
| `suffix_array.awk` | Suffix array + LCP (lex-sort-based) for each input line |
| `avl_tree.awk` | AVL self-balancing BST with insert, inorder, height, balance-check |
| `quickselect.awk` | Kth smallest element via Hoare partitioning |
| `horner.awk` | Horner's method: evaluate, derivative, synthetic division |
| `pollard_rho.awk` | Pollard's rho integer factorization + Miller-Rabin |
| `lzw.awk` | LZW compression encode + decode (256-byte dictionary start) |
| `markdown_basic.awk` | Markdown → HTML for a subset (headings, lists, code, links) |
| `date_calc.awk` | Day-of-week (Zeller), calendar generator, date difference |
| `gauss_elim.awk` | Gaussian elimination with partial pivoting (Ax = b) |
| `twenty48.awk` | 2048 board: apply L/R/U/D moves, merge tiles, track score |
| `email_extract.awk` | Find emails in free text + sorted unique tally |
Run any example directly:
```bash
./target/release/awkrs -f examples/calc_rd.awk <examples/calc_rd.in
./target/release/awkrs -f examples/brainfuck.awk <examples/brainfuck.in
```
---
## [0x07] BUILD // COMPILE THE PAYLOAD
```bash
cargo build --release
```
`awkrs --help` / `-h` prints a cyberpunk HUD (ASCII banner, status box, taglines, footer) in the style of MenkeTechnologies `tp -h`. ANSI colors apply when stdout is a TTY; set `NO_COLOR` to force plain text.
Regenerate the help screenshot after UI changes: `./scripts/gen-help-screenshot.sh` (needs [termshot](https://github.com/homeport/termshot) on `PATH` and a prior `cargo build`). The capture runs on a PTY with `NO_COLOR` unset and renders at 256 columns.
---
## [0x08] TEST // INTEGRITY VERIFICATION
```bash
cargo test
```
CI runs on pushes and pull requests to `main` via [GitHub Actions](.github/workflows/ci.yml): one Ubuntu lint job (`cargo fmt --check`, `cargo clippy -D warnings`, `cargo doc` with `RUSTDOCFLAGS=-D warnings`) plus a build/test matrix on Ubuntu and macOS.
Coverage spans library unit tests for every module (lexer, parser, format, builtins, interp, vm, jit, compiler, runtime, locale, cli, cyber_help) and integration suites under `tests/` that exercise the gawk-style additions, the slurped-input path, parallel record behavior, and the full CLI surface. Cross-feature combinations (CSV + ENDFILE, paragraph `RS=""` + getline, `FIELDWIDTHS` + NF reassignment, ...) live in [`tests/cross_feature_integration.rs`](tests/cross_feature_integration.rs).
---
## [0x09] DOCUMENTATION // RENDERED HTML + MARKDOWN
`docs/` is published to GitHub Pages on every push to `main` and is the authoritative source for the rendered reference + engineering report.
| Doc | Source | Live URL |
|---|---|---|
| User reference (quickstart, builtins, variables, examples, cache + parallel notes) | [`docs/index.html`](docs/index.html) | <https://menketechnologies.github.io/awkrs/> |
| Engineering report (architecture, module table, perf stack, divergence ledger, competitive matrix) | [`docs/report.html`](docs/report.html) | <https://menketechnologies.github.io/awkrs/report.html> |
| Compatibility matrix vs BSD awk / mawk / gawk | [`docs/COMPATIBILITY.md`](docs/COMPATIBILITY.md) | renders on GitHub |
| Benchmarks vs BSD awk / mawk / gawk (hyperfine, 1 M lines) | [`benchmarks/benchmark-results.md`](benchmarks/benchmark-results.md) | renders on GitHub |
| JIT-on vs JIT-off A/B (awkrs-only) | [`benchmarks/benchmark-readme-jit.md`](benchmarks/benchmark-readme-jit.md) | renders on GitHub |
| Rust API docs (autogenerated) | `cargo doc --open` | <https://docs.rs/awkrs> |
The HUD-themed HTML docs (`docs/index.html`, `docs/report.html`) share `hud-static.css`, `hud-theme.js`, and `tutorial.css` — open them locally via `file://` or browse the GitHub Pages URL above.
---
## [0xFF] LICENSE
┌──────────────────────────────────────────────────────────────┐
│ MIT LICENSE // UNAUTHORIZED REPRODUCTION WILL BE MET │
│ WITH FULL ICE │
└──────────────────────────────────────────────────────────────┘
---
```
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
░░ >>> JACK IN. MATCH THE PATTERN. EXECUTE THE ACTION. <<< ░░
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
```
##### created by [MenkeTechnologies](https://github.com/MenkeTechnologies)