awkrs 0.4.14

  █████╗ ██╗    ██╗██╗  ██╗██████╗ ███████╗
 ██╔══██╗██║    ██║██║ ██╔╝██╔══██╗██╔════╝
 ███████║██║ █╗ ██║█████╔╝ ██████╔╝███████╗
 ██╔══██║██║███╗██║██╔═██╗ ██╔══██╗╚════██║
 ██║  ██║╚███╔███╔╝██║  ██╗██║  ██║███████║
 ╚═╝  ╚═╝ ╚══╝╚══╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚══════╝

`[WORLDS FASTEST AWK BYTECODE ENGINE // PARALLEL RECORD PROCESSOR // RUST CORE]`

"Pattern. Action. Domination."

awkrs runs pattern → action programs over input records like POSIX awk / GNU gawk / mawk, with a fused-superinstruction bytecode VM (plus a default-on fusevm/Cranelift offload path for eligible numeric chunks), parallel record processing, and a CLI that accepts the union of POSIX, gawk, and mawk options.

┌──────────────────────────────────────────────────────────────┐ │ STATUS: ONLINE THREAT LEVEL: NEON SIGNAL: ████████░░ │ └──────────────────────────────────────────────────────────────┘

`Read the Docs` · `Engineering Report` · `strykelang` · `zshrs` · `fusevm`

[0x00] System Scan
[0x01] System Requirements
[0x02] Installation
[0x03] Language Coverage
- Compatibility matrix (BSD awk / mawk / gawk)
[0x04] Multithreading
[0x05] Bytecode VM
[0x06] Benchmarks
[0x07] Build
[0x08] Test
[0x09] Documentation
[0xFF] License

[0x00] SYSTEM SCAN

Positioning: POSIX awk + the gawk extensions that show up in real scripts (BEGINFILE/ENDFILE, coprocess |&, CSV mode, PROCINFO/SYMTAB/FUNCTAB, @include/@load/@namespace, /inet/tcp|udp, MPFR via -M). Performance goal: beat awk/mawk/gawk on supported workloads — see §0x06.

Bytecode cache: -f script.awk runs memoize compiled bytecode to ~/.awkrs/scripts.rkyv — repeat runs skip lex/parse/compile entirely. Source-file mtime and the awkrs-binary mtime invalidate entries silently. AWKRS_CACHE=0 disables it. Details in §0x05.

Implemented gawk-style CLI flags (where they differ from gawk, the gap is documented):

Flag	Behavior
`-d`/`--dump-variables`	Dump globals after run (stdout, `-`, or file)
`-D`/`--debug`	Static rule/function listing — not gawk's interactive debugger
`-p`/`--profile`	Wall-clock summary + per-record-rule hit counts (`-j 1` only) — not gawk's per-line profiler
`-o`/`--pretty-print`	AST pretty-print — not gawk's canonical reformatter
`-g`/`--gen-pot`	Print and exit before execution
`-L`/`-t`/`LINT`	Static lint (extension rules, uninit-var hints, `printf` format checks); when `LINT` is truthy at runtime, also emit `awkrs: warning:` on stderr for `sqrt`/`log` domain issues (negative / zero args)
`-S`/`--sandbox`	Block `system()`, file redirects, pipes, coprocesses, inet I/O
`-l name`	Load `name.awk` from `AWKPATH` (default `.`)
`-b`	Byte length for `length`/`substr`/`index`
`-n`	`strtonum`-style hex/octal coercion
`-s`/`--no-optimize`	Disable peephole/JIT optimization (forces the plain bytecode interpreter)
`-c`/`-P`	Stored on runtime; minimal effect today
`-r`/`--re-interval`	Parsed; no runtime effect (regex crate already supports `{m,n}`)
`-N`/`--use-lc-numeric`	Locale decimal radix and `%'` grouping in `sprintf`/`printf`/`print`. Does not affect string→number parsing

Gawk parity gaps to know:

RS — newline by default; one (UTF-8) char = literal delimiter; RS="" = paragraph mode; multi-char = gawk regex (RT is the matched text). FIELDWIDTHS selects fixed-width when non-empty.
PROCINFO — refreshed before and after BEGIN. Includes gawk-style platform (posix/mingw/vms, not Rust's macos/linux), version, ids, errno, api_major/api_minor, argv, identifiers, FS (active split mode), strftime, pgrpid, groupN, mb_cur_max (Linux sysconf), per-input READ_TIMEOUT/RETRY composite keys with fallback chain → global PROCINFO["READ_TIMEOUT"] → GAWK_READ_TIMEOUT env. Unix primary record reads poll when a timeout applies. With -M: gmp_version, mpfr_version, prec_min, prec_max. User-set keys persist across the post-BEGIN refresh.
PROCINFO["sorted_in"] — @ind_*/@val_* modes, plus user comparator function (2-arg = index sort, 4-arg (i1, v1, i2, v2) = value sort). Returns negative/zero/positive like qsort.
SYMTAB — assignment, for-in, length(SYMTAB) like gawk's global introspection (not GNU's variable-object references).
@load — non-.awk paths only accepted for gawk's bundled extension names (filefuncs, readdir, time, …) as no-ops; the builtins are native. Arbitrary .so/gawkapi modules error at parse time.
-M/--bignum — MPFR via rug (default 256 bits, PROCINFO["prec"]/["roundmode"] apply). Arithmetic, sprintf/printf integer formats (no f64/i64 clamp), int/intdiv/strtonum/++/--, bit ops, transcendentals, srand (low 32 bits of previous seed), CONVFMT/OFMT/%s/concat/regex coercion all use MPFR. Default CONVFMT-style number→string for scalars uses each Float's own precision for the MPFR sprintf path (so raising PROCINFO["prec"] is not undermined by a hardcoded bit count at display time). JIT is disabled in -M mode.
Unicode vs bytes: -b honored for length/substr/index. Full multibyte field-splitting parity is not audited.

HELP // SYSTEM INTERFACE

[0x01] SYSTEM REQUIREMENTS

Rust toolchain (rustc + cargo)
A C compiler and make for gmp-mpfr-sys (pulled in by rug for -M); typical macOS/Linux setups already satisfy this.

[0x02] INSTALLATION

brew tap MenkeTechnologies/menketech                   # one-time
brew install awkrs                                     # via Homebrew tap

cargo install awkrs                                    # from crates.io

git clone https://github.com/MenkeTechnologies/awkrs   # from source
cd awkrs && cargo build --release

awkrs on Crates.io

Zsh completion:

fpath=(/path/to/awkrs/completions $fpath)
autoload -Uz compinit && compinit

[0x03] LANGUAGE COVERAGE

Compatibility matrix: BSD awk, mawk, and gawk vs awkrs.

┌──────────────────────────────────────────────────────────────┐ │ SUBSYSTEM: LEXER ████ PARSER ████ COMPILER ████ VM ████ │ └──────────────────────────────────────────────────────────────┘

Rules: BEGIN, END, BEGINFILE/ENDFILE, empty pattern, /regex/, expression patterns, range patterns (/a/,/b/ or NR==1,NR==5). Like gawk, the four special patterns must use { … }; record rules may omit braces for the default { print $0 }.
Statements: if/while/do…while/for (C-style and for (i in arr)), switch/case/default (gawk-style: no fall-through, regex case /re/), print/printf (with >, >>, |, |& redirection), break, continue, next, nextfile, exit, delete, return, getline (primary, < file, <& cmd, expr | getline [var]).
getline as expression: value 1 (read), 0 (EOF), -1 (error), -2 (gawk retryable I/O when PROCINFO[input,"RETRY"] is set).
Operators: arithmetic, comparison, string concat, ternary, in, ~/!~, ++/-- (prefix/postfix on vars, $n, a[k]), ^/** (right-associative; unary +/-/! bind looser, so -2^2 = -(2^2)). Division by zero (/, compound /=) is a fatal error (gawk-style: division by zero attempted), not infinity.
Primary /: The lexer may emit / as division when regex_mode is false (e.g. after =). At a primary position / cannot start division (division is a binary operator), so the parser re-reads it as /regex/. In expression context a bare /re/ means $0 ~ /re/ (POSIX), so e.g. !/foo/, if (/foo/), and x = /foo/ use a match against the current record, not a string literal. The RHS of ~ / !~ and regexp arguments to gsub/sub/gensub/match/split/patsplit still treat /re/ as the pattern only (so b/c in a replacement string stays division).
gawk regexp constants: @/pattern/ yields a regexp value (typeof reports regexp); ~ uses the pattern as a regex.
Data: fields, scalars, associative arrays (a[k], a[i,j] with SUBSEP), ARGC/ARGV (set before BEGIN; ARGV[0] is the executable, ARGV[1..] are file paths). FS (regex when multi-char), FPAT (gawk-style: non-empty splits by regex match), split/patsplit (3rd arg accepts regex; patsplit 4-arg form populates seps). POSIX record model: NF = n truncates or extends fields and rebuilds $0 with OFS; $0 = "…" re-splits and updates NF. FS/FPAT from literals: bytecode may store source "…" as an internal literal string, but cached_fs sync on read still tracks those assignments. Whole array in a scalar context (print a, string concat, printf args, ~ operands, etc.) is a fatal runtime error (gawk-style), not a silent empty string. Scalars: uninitialized variables compare like numeric 0 where POSIX expects dual 0/"". String constants vs input: program string literals are not numeric strings for </<=/>/>= the way $n can be; arithmetic still uses longest-prefix string→number ("3.14abc"+0 → 3.14). split("", arr, fs) returns 0 (no empty pseudo-field).
Records & env: RS/RT as documented above. ENVIRON, CONVFMT, OFMT, FIELDWIDTHS, IGNORECASE (case-insensitive regex + ==/!=/ordering via strcoll), ARGIND, ERRNO, LINT, TEXTDOMAIN, BINMODE. PROCINFO/FUNCTAB/SYMTAB as in §0x00.
CLI extensions: -k/--csv enables CSV mode (RFC-style quoting, "" escape) — sets FS/FPAT and uses a dedicated parser aligned with gawk --csv.
Builtins: length, index (empty needle → 1, matching gawk), substr (gawk rule, not POSIX: if start < 1, clamp to 1 and leave length unchanged — POSIX shortens length by 1 - start), intdiv, mkbool, split, sprintf/printf (flags, * and %n$ positional, gawk %', conversions %s %d %i %u %o %x %X %f %e %E %g %G %c %% — %e/%E use signed two-digit exponents; %c uses a string’s first character), gsub/sub/match, gensub, isarray, tolower/toupper, int, math (sin cos atan2 exp log sqrt), rand/srand, systime, strftime (0–3 args), mktime, system, close, fflush, bit ops (and or xor lshift rshift compl), strtonum, asort/asorti. User-defined function with parameter locals.
Static checks: Before bytecode emission, the compiler rejects parenthesized comma lists except in print/printf arguments and (… ) in arr keys (e.g. (1,2) alone as a statement is an error). gsub/sub require at least two arguments; split/match at least two; patsplit two to four; gensub three or four — otherwise a clear runtime-style error is returned at compile load time. User-defined recursion is capped (256 nested calls in release builds; lower in unit tests) so pathological self-calls fail with an error instead of overflowing the host stack.
Expressions: integer literals use gawk rules in source — 0x/0X hex; leading 0 octal when all digits are 0–7 (otherwise decimal, e.g. 01238 → 1238); floats with a . use a decimal integer part (077.5 → 77.5). Multidimensional membership (i,j) in arr uses a parenthesized comma list (gawk); it may appear alone as a print argument to emit several fields.
I/O model: main record loop and unredirected getline share one BufReader so line order matches POSIX. exit from BEGIN or a pattern action still runs END rules, then exits with the requested code.
Locale & pipes: Unix string compare/order uses strcoll (LC_COLLATE/LC_ALL). |& and <& run under sh -c (mixing | and |& on the same command is an error). With -N, LC_NUMERIC applies to sprintf/printf floats and %' grouping; without -N, %' still uses localeconv()'s thousands separator (fallback ,). -N does not affect parsing of numeric strings from input.
Gawk extras: @include, @load "*.awk", @namespace "…" (default identifier prefixing; built-ins exempt), indirect calls (@name(…) / @(expr)(…)), /inet/tcp/… and /inet/udp/… client sockets, gettext builtins (bindtextdomain, dcgettext, dcngettext with .mo catalogs via the gettext crate), -M/--bignum MPFR.

[0x04] MULTITHREADING // PARALLEL EXECUTION GRID

 ┌─────────────────────────────────────────────┐
 │  WORKER 0  ▓▓  CHUNK 0   ██ REORDER QUEUE  │
 │  WORKER 1  ▓▓  CHUNK 1   ██ ──────────────>│
 │  WORKER 2  ▓▓  CHUNK 2   ██  DETERMINISTIC │
 │  WORKER N  ▓▓  CHUNK N   ██  OUTPUT STREAM  │
 └─────────────────────────────────────────────┘

Default -j/--threads is 1. Pass a higher value when the program is parallel-safe (static check: no range patterns, no exit/nextfile/delete, no primary getline, no pipe/coproc getline, no asort/asorti, no indirect calls, no print/printf redirection, no cross-record assignments). Records are processed in parallel via rayon and output is reordered to input order within each batch so pipelines stay deterministic.

Regular files are memory-mapped (memmap2) and scanned with the same RS rules as the sequential path — no read() copy of the whole file. Stdin parallel chunks up to --read-ahead lines (default 1024) per batch, dispatches to workers, emits in order, then refills.

Workers run the same bytecode VM as the sequential path. The compiled program is shared via Arc<CompiledProgram> (one compile, cheap refcount per worker) with per-worker runtime state.

Fallback: non-parallel-safe programs run sequentially with a warning when -j > 1. Programs that use primary getline (including in BEGIN) also run sequentially for file input. END only sees post-BEGIN global state — record-rule mutations from parallel workers are not merged.

[0x05] BYTECODE VM // EXECUTION CORE

┌──────────────────────────────────────────────────────────────┐ │ ARCHITECTURE: STACK VM OPTIMIZATION: PEEPHOLE FUSED │ └──────────────────────────────────────────────────────────────┘

awkrs compiles AWK programs into a flat bytecode instruction stream and runs them on a stack VM. Short-circuit &&/||, control flow, and range patterns resolve to jump-patched offsets at compile time. The string pool interns variable names and string constants for cheap u32 indexing.

fusevm offload (on by default): eligible numeric bytecode chunks are lowered to fusevm — the shared bytecode VM also used by zshrs and strykelang — and run on fusevm::VM. Set AWKRS_FUSEVM=0 to force the bytecode interpreter for every chunk. src/fusevm_bridge.rs translates an eligible chunk (is_fusevm_eligible) into a fusevm::Chunk (build_numeric_chunk), runs it, and writes modified slots back into the awkrs runtime. Each per-record chunk is a stable chunk (no baked-in seed preamble); the accumulator's prior value is seeded as data into the VM's base frame before run(), so the chunk — and therefore its op-hash — is identical across every record, which is what lets the JIT-compiled native code be reused across records and across processes. int(x) is admitted into the numeric chunk and lowers to the native fusevm::Op::AwkInt (Cranelift trunc) — the first AWK builtin reachable end-to-end from awkrs into fusevm's native AWK-op set and JIT. The transcendental math builtins sin/cos/exp/atan2 are likewise admitted, lowering to native fusevm::Op::AwkSin/AwkCos/AwkExp/AwkAtan2 — Cranelift libcalls to small Rust helpers that canonicalize a NaN result to +nan, matching awkrs/gawk's NaN-sign normalization. The gawk bitwise builtins and/or/xor (variadic, ≥2 args) are admitted too, lowering to native fusevm::Op::AwkAnd/AwkOr/AwkXor (Cranelift band/bor/bxor over a saturating f64→i64 conversion that matches awkrs's num_to_u64 for huge/NaN operands). sqrt/log stay interpreter-side because they emit a host stderr warning on a negative argument that a pure native op cannot reproduce; lshift/rshift/compl stay interpreter-side because they raise a fatal on a negative argument (a value-dependent trap), unlike the trap-free and/or/xor. Eager block-JIT compilation for offloaded loops: a BEGIN/END loop chunk offloaded via run_fusevm_region calls fusevm::VM::run() exactly once, so under fusevm's normal warm-up threshold (compile after N invocations) the block JIT would never compile it and the whole loop would run on fusevm's slower interpreter. Because eligible_loop_prefix only selects a region that contains a backward jump (a genuinely hot loop), the bridge forces the block-JIT threshold to 0 (compile on the first invocation) around that single run(), so the loop compiles to native code immediately. Measured: a 20M-iteration s += sin(i) loop dropped from ~12.8s (interpreter) to ~0.15s, and an x = and(x, i) + 1 loop from ~14.0s to ~0.13s — ~85–110× — both bit-for-bit identical to AWKRS_FUSEVM=0. Division and modulo are offloaded and block-JIT-compiled: a chunk containing /, %, or compound /=/%= lowers to the native trapping ops fusevm::Op::AwkDivJit/AwkModJit, which the block JIT compiles with a guarded zero-divisor early-exit (fcmp eq divisor, 0.0 → a trap libcall that raises the fatal division by zero attempted / …in %'for modulo, elsefdiv/fmod). So a hot division loop runs as native code — a 20M-iteration division loop dropped from ~8.0s (interpreter) to ~0.12s, ~68× — while a zero divisor reached inside the compiled loop still raises the POSIX fatal instead of producing inf/NaNor hanging. The trap libcall is not a registered host-helper id, soAwkDivJit/AwkModJitchunks JIT in-process only and skip the on-disk cache (no schema impact). fusevm also still defines the interpreter-only trappingOp::AwkDiv/AwkMod(distinct from its shell-arithmeticOp::Div/Op::Modused byzshrs/strykelang); awkrs emits the *Jit` variants.

fusevm persistent JIT cache (~/.cache/fusevm-jit): awkrs builds fusevm with the jit-disk-cache feature (see Cargo.toml), so when the fusevm path is active fusevm's Cranelift tiers persist native-compiled code to ~/.cache/fusevm-jit and reload it across processes — the same on-disk JIT cache zshrs uses. Two tiers write there: the block JIT compiles a fully-eligible per-record numeric chunk to native code (*.blk.fjit), and the tracing JIT compiles hot in-chunk loop traces (*.trc.fjit). This is distinct from awkrs's own ~/.awkrs/scripts.rkyv bytecode cache (below): one caches fusevm-emitted machine code keyed by chunk op-hash (and slot-kind hash), the other caches awkrs bytecode keyed by source. Override the directory with FUSEVM_JIT_CACHE_DIR (off disables); cap it with FUSEVM_JIT_CACHE_MAX_BYTES. fusevm's block JIT round-trips awk's f64 slots through its SlotKind::Float bit-pattern model (a slot is an i64 holding the raw f64 bits; GetSlot/SetSlot bitcast through f64), so an x = int(x + c) accumulator now block-JIT-compiles natively and caches a *.blk.fjit artifact reused on the next run — verified producing the artifact and matching the bytecode interpreter bit-for-bit. Coverage is still partial — the tracing JIT does not yet hot-trace every top-tested for/while shape, and only the numeric chunk set below is eligible. So the cache engages for the chunks fusevm can compile and is inert for the rest. The offload is on by default (AWKRS_FUSEVM=0 disables it); with the block JIT's f64-slot support (below) and eager compilation of offloaded loop regions (above) it is a large win on JIT-compilable numeric loops — measured 14–110× over the bytecode interpreter for pure-arithmetic and builtin (sin/and/…) accumulator loops — while staying bit-for-bit faithful to the bytecode interpreter, and ineligible chunks run on the interpreter exactly as before. The default execution path for ineligible chunks is the bytecode interpreter with peephole-fused superinstructions; AWKRS_JIT=0 (or -s/--no-optimize) disables peephole optimization too.

fusevm bridge per-record caches (in-process): the persistent on-disk cache above caches compiled native code keyed by fusevm-op-hash; the bridge ALSO maintains in-process caches on Runtime that catch the upstream work the disk cache can't touch. (1) fuse_chunk_cache: HashMap<(chunk_ptr, bignum), Option<Arc<(fusevm::Chunk, Vec<u16>)>>> caches the built fusevm::Chunk per awkrs Chunk so build_numeric_chunk's eligibility check + 2-pass op→fusevm translation only runs once per (chunk, bignum), not per record. (2) fuse_last_chunk_key/fuse_last_chunk_value form a single-slot side-table that hoists the HashMap lookup out of the per-record path entirely for the common single-rule case (one tuple compare + Arc::clone, no HashMap touch). (3) fuse_vm_pool: fusevm::VMPool recycles fusevm::VM instances across records — VM::reset(chunk) preserves Vec capacities (stack, frames, slot_buf, globals) so subsequent records reuse the underlying allocations. (4) The cache value's Vec<u16> is the precomputed write-slot set — per-record writeback only walks Op::SetSlot targets instead of all N runtime slots. (5) Slot seeding goes direct from ctx.rt.slots into vm.set_slot with no intermediate Vec<f64> allocation. Cumulative win on the awkrs JIT path: 11% on count_gt_5m over 10M records, 6% on compound_pred (release best-of-3). The JIT path is still net-slower than the awkrs interpreter on tight one-op-body micro-benches — the remaining gap is fusevm's per-call VM setup (chunk move into the VM, JIT entry overhead) which would need fusevm-side API changes (Arc-aware VM::reset) to cut further.

Eligible chunks (fusevm offload): pure-numeric bodies — constants, arithmetic/comparisons, ++/--, compound assignments (+=/-=/*=//=/%=/^=), division/modulo (//%, native trapping Op::AwkDivJit/AwkModJit with a guarded zero-divisor early-exit), jumps and fused loop tests, scalar slot reads/writes whose values stay numeric, int(x) (native Op::AwkInt), the transcendentals sin/cos/exp/atan2 (native Op::AwkSin/AwkCos/AwkExp/AwkAtan2, NaN→+nan), and the bitwise builtins and/or/xor (native Op::AwkAnd/AwkOr/AwkXor, saturating f64→i64). When such a chunk is a hot loop offloaded as a region, the block JIT is eager-compiled on the first run() (see above) so the native code is used immediately rather than after a warm-up. Chunks that touch strings, fields, arrays, regexes, getline, print, user calls, or other builtins (including sqrt/log, which warn on negative args, and lshift/rshift/compl, which raise a fatal on negative args) are ineligible. -M/--bignum disables the path entirely (MPFR values can't be represented as f64 slots). Consulted by default (AWKRS_FUSEVM=0 forces everything onto the bytecode interpreter).

Peephole fusion combines common sequences into single opcodes:

print $N → PrintFieldStdout (zero-alloc field write)
s += $N → AddFieldToSlot (in-place numeric parse)
i = i + 1 / i++ / ++i → IncrSlot (one numeric add, no stack traffic)
s += i between slots → AddSlotToSlot
$1 "," $2 literal concat → ConcatPoolStr
NR++ HashMap-path → IncrVar

Inline fast paths bypass VmCtx entirely for single-rule programs with one fused opcode ({ print $1 }, { s += $1 }). Memory-mapped files also recognize { gsub("lit", "repl"); print } with literal pattern: when the needle is absent, the loop writes each line from the mapped buffer with ORS and skips the VM.

Bytecode cache: -f script.awk invocations memoize the compiled CompiledProgram to ~/.awkrs/scripts.rkyv — an rkyv-archived shard with mmap + zero-copy ArchivedHashMap lookup on the read path (check_archived_root validation) and flock-serialized atomic-rename writes. Each entry's inner CompiledProgram blob is bincode (rkyv outer, bincode inner — same architecture as zshrs/stryke). Repeat runs skip lex/parse/compile entirely — only the matched entry's blob is decoded. Entries are invalidated on source-file mtime change or when the running awkrs binary is newer than the cached entry (any rebuild silently rebuilds the cache). Disable with AWKRS_CACHE=0. The cache only engages for the simple -f script.awk form — inline -e/--source, -E, -i/--include, -l/--load, --debug, --lint, --pretty-print, and --gen-pot skip the cache because they need the AST.

Based on a survey of the major public awk implementations (BWK awk, gawk, mawk, goawk, frawk, zawk), awkrs appears to be the first awk implementation to pair a bytecode VM with a persistent on-disk bytecode cache. frawk is the closest prior art on JIT — it has VM + Cranelift/LLVM JIT — but re-compiles on every invocation; its overview and README contain no mention of disk-persisted compiled artifacts. gawk's pm-gawk persists script-defined variables and functions across runs, not compiled bytecode — different feature. (awkrs's own fusevm/Cranelift offload is on by default for eligible numeric chunks and, with eager block-JIT compilation of offloaded loop regions, is now a large win on those loops — 14–110× over the bytecode interpreter — though it does not engage for the string/field-dominated programs most awk workloads run, so it is not claimed as the headline feature here; note the fusevm offload additionally carries its own separate persistent machine-code cache at ~/.cache/fusevm-jit, distinct from the bytecode cache claimed here.)

Implementation	Bytecode VM	JIT	Persistent bytecode cache
BWK awk (one-true-awk)	✗ tree-walker	✗	✗
gawk	✓	✗	✗ (`pm-gawk` is for vars)
mawk	✓	✗	✗
goawk	✓	✗	✗
frawk	✓	✓ Cranelift + LLVM	✗
zawk (frawk fork)	✓	✓ Cranelift + LLVM	✗
awkrs	✓	◐ fusevm/Cranelift (on by default; block + tracing tiers, `~/.cache/fusevm-jit`)	✓

Raw byte field extraction: print $N with default FS scans raw bytes in the mapped file buffer to find the Nth whitespace field, writes it to the output buffer, and appends Runtime::ors_bytes — no record copy, no UTF-8 validation.

Other optimizations:

Indexed slots: scalars get u16 slot indices; reads/writes are flat-array indexing instead of HashMap lookups (specials like NR/FS/OFS and array names stay on the HashMap path).
Zero-copy fields: fields stored as (u32, u32) byte ranges into the record string; owned Strings only on set_field.
Direct-to-buffer print: stdout writes go straight into a 64 KB Vec<u8> (flushed at file boundaries) — no per-record String, format!(), or stdout locking.
Cached separators: OFS/ORS bytes cached on the runtime, updated only on assignment. The direct-to-buffer stdout print path uses the full ofs_bytes/ors_bytes slices (arbitrary length; not capped at 64 bytes).
Byte-level input: read_until(b'\n') into a reusable Vec<u8> skips per-line UTF-8 validation.
Regex cache: compiled Regex objects cached in a HashMap<String, Regex>.
sub/gsub: when target is $0, applies the new record in one step. Literal needles reuse a cached memmem::Finder. Constant string operands pass via Cow (no per-call alloc).
parse_number: fast-paths plain decimal integer field text before falling back to str::parse::<f64>().
Slurped input: newline scanning uses memchr.
Parallel: compiled program shared via Arc across rayon workers (zero-copy).

[0x06] BENCHMARKS // COMBAT METRICS (vs awk / gawk / mawk)

┌──────────────────────────────────────────────────────────────┐ │ HARDWARE: APPLE M5 MAX OS: macOS ARCH: arm64 │ └──────────────────────────────────────────────────────────────┘

Measured with hyperfine. BSD awk (/usr/bin/awk), GNU gawk 5.4.0, mawk 1.3.4, awkrs (see Cargo.toml for current version). Relative = mean ÷ fastest mean in that table. awkrs has two rows: default (JIT attempted) vs AWKRS_JIT=0 (bytecode only). Each table is one hyperfine invocation across all five commands on the same 1 M-line input, generated 2026-04-10 UTC by ./scripts/benchmark-vs-awk.sh and copied verbatim from benchmarks/benchmark-results.md. For the awkrs-only JIT-vs-bytecode A/B see benchmarks/benchmark-readme-jit.md.

Caveat (2026-06-01): the awkrs (JIT) rows below were generated against awkrs's former in-tree Cranelift module (src/jit.rs), which has since been removed; JIT now means the fusevm/Cranelift offload (on by default; AWKRS_FUSEVM=0 disables it), which does not engage for these string/field programs anyway (they are ineligible). The default path for such programs is the fused-superinstruction bytecode interpreter — effectively the awkrs (bytecode) row. Re-run ./scripts/benchmark-vs-awk.sh to regenerate.

1. Throughput: `{ print $1 }` over 1 M lines

Command	Mean	Min	Max	Relative
BSD awk	195.0 ms	179.8 ms	221.6 ms	12.43×
gawk	100.8 ms	92.8 ms	115.8 ms	6.42×
mawk	66.2 ms	61.9 ms	78.4 ms	4.22×
awkrs (JIT)	15.7 ms	13.3 ms	19.6 ms	1.00×
awkrs (bytecode)	16.1 ms	13.1 ms	20.2 ms	1.03×

2. CPU-bound BEGIN (no input)

BEGIN { s = 0; for (i = 1; i < 400001; i = i + 1) s += i; print s }

Command	Mean	Min	Max	Relative
BSD awk	15.8 ms	14.0 ms	18.6 ms	1.71×
gawk	20.7 ms	18.8 ms	22.9 ms	2.24×
mawk	9.7 ms	8.3 ms	11.4 ms	1.06×
awkrs (JIT)	9.2 ms	8.4 ms	12.0 ms	1.00×
awkrs (bytecode)	9.6 ms	8.2 ms	12.0 ms	1.04×

3. Sum first column (`{ s += $1 } END { print s }`, 1 M lines)

Cross-record state is not parallel-safe, so awkrs stays single-threaded here. On regular-file input, awkrs uses a raw byte path: parses the Nth whitespace field directly from the mmap'd buffer.

Command	Mean	Min	Max	Relative
BSD awk	158.5 ms	147.0 ms	172.7 ms	12.27×
gawk	62.9 ms	58.4 ms	68.9 ms	4.87×
mawk	37.5 ms	33.7 ms	39.9 ms	2.90×
awkrs (JIT)	13.0 ms	11.9 ms	15.4 ms	1.01×
awkrs (bytecode)	12.9 ms	11.5 ms	16.1 ms	1.00×

4. Multi-field print (`{ print $1, $3, $5 }`, 1 M lines, 5 fields/line)

Command	Mean	Min	Max	Relative
BSD awk	647.6 ms	623.5 ms	686.3 ms	11.60×
gawk	266.1 ms	257.4 ms	301.8 ms	4.77×
mawk	156.6 ms	149.8 ms	170.7 ms	2.81×
awkrs (JIT)	56.4 ms	53.1 ms	61.8 ms	1.01×
awkrs (bytecode)	55.8 ms	53.4 ms	61.6 ms	1.00×

5. Regex filter (`/alpha/ { c += 1 } END { print c }`, 1 M lines, no matches)

Command	Mean	Min	Max	Relative
BSD awk	191.8 ms	180.1 ms	208.9 ms	17.31×
gawk	351.4 ms	342.7 ms	363.3 ms	31.72×
mawk	19.3 ms	17.5 ms	21.8 ms	1.74×
awkrs (JIT)	11.1 ms	9.5 ms	13.5 ms	1.00×
awkrs (bytecode)	11.1 ms	9.5 ms	14.6 ms	1.00×

6. Associative array (`{ a[$5] += 1 } END { for (k in a) print k, a[k] }`, 1 M lines)

Command	Mean	Min	Max	Relative
BSD awk	826.2 ms	792.2 ms	896.0 ms	2.43×
gawk	342.4 ms	330.6 ms	362.5 ms	1.01×
mawk	610.0 ms	588.9 ms	648.7 ms	1.79×
awkrs (JIT)	340.0 ms	324.2 ms	377.7 ms	1.00×
awkrs (bytecode)	343.7 ms	323.5 ms	356.7 ms	1.01×

7. Conditional field (`NR % 2 == 0 { print $2 }`, 1 M lines, 2 fields/line)

Command	Mean	Min	Max	Relative
BSD awk	289.1 ms	263.1 ms	321.1 ms	9.58×
gawk	116.1 ms	111.0 ms	124.4 ms	3.85×
mawk	71.1 ms	66.9 ms	83.6 ms	2.36×
awkrs (JIT)	30.2 ms	28.1 ms	34.0 ms	1.00×
awkrs (bytecode)	30.7 ms	28.0 ms	35.5 ms	1.02×

8. Field computation (`{ sum += $1 * $2 } END { print sum }`, 1 M lines, 2 fields/line)

On regular-file input with default FS, awkrs extracts both fields in a single byte scan and parses them as numbers directly from the mmap'd buffer.

Command	Mean	Min	Max	Relative
BSD awk	261.8 ms	251.4 ms	280.8 ms	13.96×
gawk	100.5 ms	95.3 ms	109.5 ms	5.36×
mawk	57.7 ms	54.5 ms	61.1 ms	3.08×
awkrs (JIT)	19.0 ms	17.6 ms	23.0 ms	1.01×
awkrs (bytecode)	18.8 ms	17.5 ms	22.8 ms	1.00×

9. String concat print (`{ print $3 "-" $5 }`, 1 M lines, 5 fields/line)

Command	Mean	Min	Max	Relative
BSD awk	640.8 ms	611.9 ms	689.3 ms	12.68×
gawk	182.2 ms	168.1 ms	197.2 ms	3.61×
mawk	121.0 ms	113.6 ms	128.1 ms	2.39×
awkrs (JIT)	51.0 ms	49.2 ms	53.8 ms	1.01×
awkrs (bytecode)	50.5 ms	48.8 ms	54.8 ms	1.00×

10. gsub (`{ gsub("alpha", "ALPHA"); print }`, 1 M lines, no matches)

Lines do not contain alpha, so this measures no-match gsub plus print. On regular-file input, awkrs uses a slurp inline path: byte memmem scan + print with no VM or per-line set_field_sep_split when the literal is absent.

Command	Mean	Min	Max	Relative
BSD awk	291.5 ms	282.3 ms	300.4 ms	21.15×
gawk	436.3 ms	425.7 ms	459.3 ms	31.66×
mawk	74.3 ms	68.8 ms	84.2 ms	5.39×
awkrs (JIT)	13.8 ms	12.8 ms	16.2 ms	1.00×
awkrs (bytecode)	13.9 ms	12.7 ms	17.6 ms	1.01×

./scripts/benchmark-vs-awk.sh                              # cross-engine §1–§10 (1 M lines)
AWKRS_BENCH_LINES=5000000 ./scripts/benchmark-vs-awk.sh    # 5 M line sweep
./scripts/benchmark-readme-jit-vs-vm.sh                    # awkrs-only JIT vs bytecode A/B

Demo scripts

Quick tours over the feature surface — runnable against the debug build (cargo build); each script auto-builds if the binary is missing.

./scripts/demo-quickstart.sh        # 16-section feature tour (CSV, FIELDWIDTHS, FPAT, RS regex,
                                    # gensub, match() capture, pipes, getline, parallel, JIT toggle)
./scripts/demo-log-parsing.sh       # applied: access log → status dist, CSV → per-host stats,
                                    # ps → top RSS, /etc/passwd → shell histogram
LINES=200000 ./scripts/demo-parallel-jit.sh   # /usr/bin/time -p sweep of -j 1 vs -j N and
                                              # JIT default vs AWKRS_JIT=0 bytecode

NO_COLOR=1 strips ANSI from the demo output. The applied demo writes its inputs to $TMPDIR and cleans up on exit.

Deep examples (`examples/*.awk`)

Substantive standalone awk programs that exercise recursion, multidim arrays via SUBSEP, PROCINFO["sorted_in"], three-arg match(), asort, and bit ops. Each <name>.awk ships with <name>.in (stdin input). Every example is byte-for-byte verified against gawk by the parity job in CI (bash parity/run_parity.sh gawk), so they double as gawk-extension regression tests.

File	What it shows
`bst.awk`	BST insert + inorder / preorder / postorder traversals via recursion
`heap_sort.awk`	Min-heap push / pop → heapsort
`trie.awk`	Trie membership + prefix-count using `SUBSEP` two-level keys
`levenshtein.awk`	O(la·lb) edit-distance DP table held in a `SUBSEP` 2D array
`calc_rd.awk`	Recursive-descent arithmetic parser (`+ - * / % ^` unary `( )`, right-assoc `^`)
`topo_sort.awk`	Kahn's algorithm topological sort + cycle detection
`brainfuck.awk`	Brainfuck interpreter (precomputed bracket map, modulo-256 cells)
`rpn.awk`	Postfix calculator (dup / swap / drop / neg + arith) on an explicit stack
`json_pretty.awk`	JSON tokeniser → 2-space indented pretty-printer
`hexdump.awk`	`xxd`-style hex + ASCII dump (16-byte rows with offset, hex, gutter, ascii)
`csv_pivot.awk`	CSV → per-group min / max / mean / total aggregation
`graph_bfs.awk`	Undirected BFS from a source + path reconstruction
`markov.awk`	Bigram model → top-3 continuations + deterministic 12-step walk
`sql_like.awk`	Mini-SQL on CSV: `SELECT … WHERE … GROUP BY … SUM/AVG/COUNT … ORDER BY …`
`sudoku.awk`	9×9 Sudoku solver via recursive backtracking (row/col/box witness sets)
`regex_engine.awk`	Recursive regex matcher (`. * + ? ^ $ [...] \`) written in awk
`dijkstra.awk`	Single-source shortest paths with a binary min-heap PQ
`kruskal.awk`	MST via union-find (path halving + union-by-rank)
`maze_bfs.awk`	BFS shortest path through an ASCII maze; path overlaid with `*`
`diff_lcs.awk`	LCS-based unified diff with recursive traceback
`n_queens.awk`	N-queens backtracking with col / diag1 / diag2 constant-time witnesses
`kmp.awk`	Knuth-Morris-Pratt substring search (failure function + scan)
`intervals.awk`	Merge overlapping closed intervals; report total covered length
`roman.awk`	Roman numerals ↔ integers, subtractive form, range 1..3999
`knapsack.awk`	0/1 knapsack DP table + traceback to recover chosen items
`prime_sieve.awk`	Sieve of Eratosthenes up to N + ten-per-row pretty printing
`floyd_warshall.awk`	All-pairs shortest paths (negative weights allowed)
`bellman_ford.awk`	Single-source shortest paths + negative-cycle detection
`scc_tarjan.awk`	Tarjan's strongly connected components (recursion + lowlink)
`conway.awk`	Conway's Game of Life — fixed-grid evolution for N generations
`a_star.awk`	A* on an ASCII grid (Manhattan heuristic, tuple-keyed min-heap PQ)
`base64.awk`	RFC-4648 base64 encode + decode (no external tools)
`base_conv.awk`	Integer base conversion 2..36 in either direction
`rle.awk`	Run-length encode + decode (whitespace preserved)
`vigenere.awk`	Vigenère cipher (encrypt + decrypt; case preserved)
`bigint_mul.awk`	Arbitrary-precision multiplication via schoolbook on digit arrays
`lru_cache.awk`	LRU cache: O(1) get + put via doubly-linked list + hash map
`segment_tree.awk`	Iterative segment tree (point update + range-sum query)
`fenwick.awk`	Fenwick / Binary Indexed Tree (prefix + range sums)
`convex_hull.awk`	Convex hull via Andrew's monotone chain + shoelace area
`lis.awk`	Longest increasing subsequence in O(n log n) with traceback
`huffman.awk`	Huffman coding — build prefix tree, encode + round-trip decode
`manacher.awk`	Manacher's longest palindromic substring in O(n)
`subset_sum.awk`	Subset-sum DP + reconstruction of one valid subset
`permutations.awk`	Heap's algorithm — enumerate all n! permutations
`tsp_dp.awk`	Held-Karp bitmask DP for travelling salesman (≤15 cities)
`anagrams.awk`	Group anagrams by sorted-letter signature
`rule30.awk`	Wolfram Rule 30 elementary 1D cellular automaton
`aho_corasick.awk`	Aho-Corasick multi-pattern search (goto/fail/output trie)
`z_function.awk`	Z-array — linear-time prefix-match table + pattern search
`rabin_karp.awk`	Rolling-hash substring search (Rabin-Karp)
`shunting_yard.awk`	Dijkstra's shunting-yard infix → postfix → evaluate
`modexp.awk`	Modular exponentiation + deterministic Miller-Rabin primality
`gcd_extended.awk`	Extended Euclidean algorithm + modular inverse
`mandelbrot.awk`	ASCII Mandelbrot escape-time render
`ini_parser.awk`	INI config parser with sections / comments / globals
`url_parser.awk`	URL decomposer (scheme / user / pass / host / port / path / query / fragment)
`tictactoe.awk`	Minimax tic-tac-toe solver — best move + outcome from any position
`coin_change.awk`	Min-coins DP + reconstruction of one optimal combination
`prim_mst.awk`	Prim's MST via lazy linear frontier scan
`boyer_moore.awk`	Boyer-Moore substring search (bad-character heuristic)
`suffix_array.awk`	Suffix array + LCP (lex-sort-based) for each input line
`avl_tree.awk`	AVL self-balancing BST with insert, inorder, height, balance-check
`quickselect.awk`	Kth smallest element via Hoare partitioning
`horner.awk`	Horner's method: evaluate, derivative, synthetic division
`pollard_rho.awk`	Pollard's rho integer factorization + Miller-Rabin
`lzw.awk`	LZW compression encode + decode (256-byte dictionary start)
`markdown_basic.awk`	Markdown → HTML for a subset (headings, lists, code, links)
`date_calc.awk`	Day-of-week (Zeller), calendar generator, date difference
`gauss_elim.awk`	Gaussian elimination with partial pivoting (Ax = b)
`twenty48.awk`	2048 board: apply L/R/U/D moves, merge tiles, track score
`email_extract.awk`	Find emails in free text + sorted unique tally

Run any example directly:

./target/release/awkrs -f examples/calc_rd.awk <examples/calc_rd.in
./target/release/awkrs -f examples/brainfuck.awk <examples/brainfuck.in

[0x07] BUILD // COMPILE THE PAYLOAD

cargo build --release

awkrs --help / -h prints a cyberpunk HUD (ASCII banner, status box, taglines, footer) in the style of MenkeTechnologies tp -h. ANSI colors apply when stdout is a TTY; set NO_COLOR to force plain text.

Regenerate the help screenshot after UI changes: ./scripts/gen-help-screenshot.sh (needs termshot on PATH and a prior cargo build). The capture runs on a PTY with NO_COLOR unset and renders at 256 columns.

[0x08] TEST // INTEGRITY VERIFICATION

cargo test

CI runs on pushes and pull requests to main via GitHub Actions: one Ubuntu lint job (cargo fmt --check, cargo clippy -D warnings, cargo doc with RUSTDOCFLAGS=-D warnings) plus a build/test matrix on Ubuntu and macOS.

Coverage spans library unit tests for every module (lexer, parser, format, builtins, interp, vm, jit, compiler, runtime, locale, cli, cyber_help) and integration suites under tests/ that exercise the gawk-style additions, the slurped-input path, parallel record behavior, and the full CLI surface. Cross-feature combinations (CSV + ENDFILE, paragraph RS="" + getline, FIELDWIDTHS + NF reassignment, ...) live in tests/cross_feature_integration.rs.

[0x09] DOCUMENTATION // RENDERED HTML + MARKDOWN

docs/ is published to GitHub Pages on every push to main and is the authoritative source for the rendered reference + engineering report.

Doc	Source	Live URL
User reference (quickstart, builtins, variables, examples, cache + parallel notes)	`docs/index.html`	https://menketechnologies.github.io/awkrs/
Engineering report (architecture, module table, perf stack, divergence ledger, competitive matrix)	`docs/report.html`	https://menketechnologies.github.io/awkrs/report.html
Compatibility matrix vs BSD awk / mawk / gawk	`docs/COMPATIBILITY.md`	renders on GitHub
Benchmarks vs BSD awk / mawk / gawk (hyperfine, 1 M lines)	`benchmarks/benchmark-results.md`	renders on GitHub
JIT-on vs JIT-off A/B (awkrs-only)	`benchmarks/benchmark-readme-jit.md`	renders on GitHub
Rust API docs (autogenerated)	`cargo doc --open`	https://docs.rs/awkrs

The HUD-themed HTML docs (docs/index.html, docs/report.html) share hud-static.css, hud-theme.js, and tutorial.css — open them locally via file:// or browse the GitHub Pages URL above.

[0xFF] LICENSE

┌──────────────────────────────────────────────────────────────┐ │ MIT LICENSE // UNAUTHORIZED REPRODUCTION WILL BE MET │ │ WITH FULL ICE │ └──────────────────────────────────────────────────────────────┘

░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
░░ >>> JACK IN. MATCH THE PATTERN. EXECUTE THE ACTION. <<< ░░
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

`[WORLDS FASTEST AWK BYTECODE ENGINE // PARALLEL RECORD PROCESSOR // RUST CORE]`

`Read the Docs` · `Engineering Report` · `strykelang` · `zshrs` · `fusevm`

Table of Contents

[0x00] SYSTEM SCAN

HELP // SYSTEM INTERFACE

[0x01] SYSTEM REQUIREMENTS

[0x02] INSTALLATION

[0x03] LANGUAGE COVERAGE

[0x04] MULTITHREADING // PARALLEL EXECUTION GRID

[0x05] BYTECODE VM // EXECUTION CORE

[0x06] BENCHMARKS // COMBAT METRICS (vs awk / gawk / mawk)

1. Throughput: `{ print $1 }` over 1 M lines

2. CPU-bound BEGIN (no input)

3. Sum first column (`{ s += $1 } END { print s }`, 1 M lines)

4. Multi-field print (`{ print $1, $3, $5 }`, 1 M lines, 5 fields/line)

5. Regex filter (`/alpha/ { c += 1 } END { print c }`, 1 M lines, no matches)

6. Associative array (`{ a[$5] += 1 } END { for (k in a) print k, a[k] }`, 1 M lines)

7. Conditional field (`NR % 2 == 0 { print $2 }`, 1 M lines, 2 fields/line)

8. Field computation (`{ sum += $1 * $2 } END { print sum }`, 1 M lines, 2 fields/line)

9. String concat print (`{ print $3 "-" $5 }`, 1 M lines, 5 fields/line)

10. gsub (`{ gsub("alpha", "ALPHA"); print }`, 1 M lines, no matches)

Demo scripts

Deep examples (`examples/*.awk`)

[0x07] BUILD // COMPILE THE PAYLOAD

[0x08] TEST // INTEGRITY VERIFICATION

[0x09] DOCUMENTATION // RENDERED HTML + MARKDOWN

[0xFF] LICENSE

created by MenkeTechnologies

awkrs 0.4.14

[WORLDS FASTEST AWK BYTECODE ENGINE // PARALLEL RECORD PROCESSOR // RUST CORE]

Read the Docs · Engineering Report · strykelang · zshrs · fusevm

Table of Contents

[0x00] SYSTEM SCAN

HELP // SYSTEM INTERFACE

[0x01] SYSTEM REQUIREMENTS

[0x02] INSTALLATION

[0x03] LANGUAGE COVERAGE

[0x04] MULTITHREADING // PARALLEL EXECUTION GRID

[0x05] BYTECODE VM // EXECUTION CORE

[0x06] BENCHMARKS // COMBAT METRICS (vs awk / gawk / mawk)

1. Throughput: { print $1 } over 1 M lines

2. CPU-bound BEGIN (no input)

3. Sum first column ({ s += $1 } END { print s }, 1 M lines)

4. Multi-field print ({ print $1, $3, $5 }, 1 M lines, 5 fields/line)

5. Regex filter (/alpha/ { c += 1 } END { print c }, 1 M lines, no matches)

6. Associative array ({ a[$5] += 1 } END { for (k in a) print k, a[k] }, 1 M lines)

7. Conditional field (NR % 2 == 0 { print $2 }, 1 M lines, 2 fields/line)

8. Field computation ({ sum += $1 * $2 } END { print sum }, 1 M lines, 2 fields/line)

9. String concat print ({ print $3 "-" $5 }, 1 M lines, 5 fields/line)

10. gsub ({ gsub("alpha", "ALPHA"); print }, 1 M lines, no matches)

Demo scripts

Deep examples (examples/*.awk)

[0x07] BUILD // COMPILE THE PAYLOAD

[0x08] TEST // INTEGRITY VERIFICATION

[0x09] DOCUMENTATION // RENDERED HTML + MARKDOWN

[0xFF] LICENSE

created by MenkeTechnologies

`[WORLDS FASTEST AWK BYTECODE ENGINE // PARALLEL RECORD PROCESSOR // RUST CORE]`

`Read the Docs` · `Engineering Report` · `strykelang` · `zshrs` · `fusevm`

1. Throughput: `{ print $1 }` over 1 M lines

3. Sum first column (`{ s += $1 } END { print s }`, 1 M lines)

4. Multi-field print (`{ print $1, $3, $5 }`, 1 M lines, 5 fields/line)

5. Regex filter (`/alpha/ { c += 1 } END { print c }`, 1 M lines, no matches)

6. Associative array (`{ a[$5] += 1 } END { for (k in a) print k, a[k] }`, 1 M lines)

7. Conditional field (`NR % 2 == 0 { print $2 }`, 1 M lines, 2 fields/line)

8. Field computation (`{ sum += $1 * $2 } END { print sum }`, 1 M lines, 2 fields/line)

9. String concat print (`{ print $3 "-" $5 }`, 1 M lines, 5 fields/line)

10. gsub (`{ gsub("alpha", "ALPHA"); print }`, 1 M lines, no matches)

Deep examples (`examples/*.awk`)