# awk-rs - AWK Implementation in Rust
A 100% POSIX-compatible AWK implementation in Rust, with comprehensive GNU AWK (gawk) extension support.
## 🎉 Version 0.1.0 Released
**awk-rs is now 100% POSIX AWK compliant** with comprehensive GNU AWK extension support. All core AWK functionality is implemented and tested.
| POSIX AWK | ✅ Complete |
| GNU AWK Extensions | ✅ Complete |
| Test Coverage | 639 tests, 86% coverage |
| Release Status | 🚀 v0.1.0 |
---
## Implementation Status
### ✅ COMPLETED - Full POSIX Compliance
#### Phase 1: Foundation & Core Infrastructure
- [x] Error types and result handling (`src/error.rs`)
- [x] CLI argument parsing with `-f`, `-F`, `-v`, `--` options
- [x] Multiple input files and stdin handling
- [x] FILENAME updates per input file
- [x] Complete lexer with all token types
- [x] String escape sequences (standard, hex `\xNN`, octal `\NNN`)
- [x] Regex literals vs division disambiguation
- [x] Line/column tracking for errors
- [x] Line continuation with backslash
- [x] Complete AST definition
- [x] Recursive descent parser with correct operator precedence
- [x] Print/printf argument lists
- [x] All getline variants (`getline`, `getline var`, `getline < file`, `cmd | getline`)
#### Phase 2: Runtime & Interpreter
- [x] Dynamic value system (String, Number, Uninitialized, NumericString)
- [x] Automatic type coercion rules
- [x] Numeric string semantics
- [x] Symbol table for global variables
- [x] Local function scope
- [x] Associative arrays with multi-dimensional subscripts
- [x] Array passing by reference to functions
- [x] `in` operator for membership testing
- [x] `delete` for array elements AND entire arrays
- [x] All special variables: `$0`, `$1`..`$NF`, `NF`, `NR`, `FNR`, `FS`, `RS`, `OFS`, `ORS`, `FILENAME`, `ARGC`, `ARGV`, `ENVIRON`, `CONVFMT`, `OFMT`, `RSTART`, `RLENGTH`, `SUBSEP`
- [x] Field splitting with `FS` (single char, space, regex)
- [x] Paragraph mode (`RS = ""`)
- [x] Field assignment and `$0` rebuilding
- [x] `NF` modification
- [x] All expression types
- [x] All statement types (if/else, while, for, for-in, do-while, break, continue, next, nextfile, exit, return)
- [x] All pattern types (expression, regex, range, BEGIN, END, compound)
#### Phase 3: Built-in Functions
- [x] String: `length`, `substr`, `index`, `split`, `sub`, `gsub`, `match`, `sprintf`, `tolower`, `toupper`
- [x] Math: `sin`, `cos`, `atan2`, `exp`, `log`, `sqrt`, `int`, `rand`, `srand`
- [x] I/O: `print`, `printf`, `getline`, `close`, `fflush`, `system`
- [x] Output redirection: `>`, `>>`, `|`
- [x] All printf format specifiers with width/precision/flags
#### Phase 4: Advanced Features
- [x] User-defined functions with local scope
- [x] Recursion support
- [x] Regex matching with `regex` crate (POSIX ERE)
- [x] `&` replacement in `sub`/`gsub`
- [x] Regex literals as function arguments
- [x] Unicode/UTF-8 support for string functions
---
### ✅ COMPLETED - GNU AWK Extensions
#### Patterns
- [x] `BEGINFILE` - execute before each input file
- [x] `ENDFILE` - execute after each input file
#### String Functions (gawk)
- [x] `gensub(regexp, replacement, how [, target])` - general substitution
- [x] `patsplit(string, array, fieldpat [, seps])` - split by pattern
#### Array Functions (gawk)
- [x] `asort(source [, dest])` - sort array by values
- [x] `asorti(source [, dest])` - sort array by indices
#### Time Functions (gawk)
- [x] `systime()` - current time as seconds since epoch
- [x] `mktime(datespec)` - convert date string to timestamp
- [x] `strftime(format [, timestamp])` - format timestamp
---
#### Field Splitting Extensions (gawk)
- [x] `FPAT` - field pattern for content-based splitting
- [x] `FIELDWIDTHS` - fixed-width field splitting
#### System Information (gawk)
- [x] `PROCINFO` array - process/system information
#### CLI Options
- [x] `--posix` / `-P` - strict POSIX mode (disable extensions)
- [x] `--traditional` / `-c` - traditional AWK mode (disable extensions)
---
### 🔄 REMAINING WORK (Optional Enhancements)
#### Advanced Features (Not Commonly Used)
- [ ] Two-way pipes (`|&`)
- [ ] `@include` directive
- [ ] Network I/O (`/inet/tcp`, `/inet/udp`)
---
## Test Coverage
```
Unit Tests: 170 (lexer, parser, interpreter, value system, error handling)
E2E Tests: 412 (complete AWK programs)
CLI Tests: 19 (command-line interface)
Compat Tests: 34 (gawk comparison)
Doc Tests: 4 (API examples)
Total: 639 tests
Coverage: 86% (library code, excluding CLI main.rs)
```
All tests pass with 100% success rate.
---
## Performance
The implementation includes:
- Optimized field splitting (byte-based for single-char FS)
- Compiled regex caching
- Unicode-aware string operations
- Release profile with LTO and single codegen unit
- Criterion benchmarks for profiling
---
## CI/CD
- GitHub Actions CI (Linux, macOS, Windows)
- Automated releases with binary builds
- Benchmark tracking
- CodeQL security scanning
- Dependency updates via Dependabot
---
## License
Dual-licensed under MIT and Apache 2.0.
Copyright (c) 2026 Pegasus Heavy Industries LLC.
---
## References
- [POSIX AWK Specification](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html)
- [GNU AWK Manual](https://www.gnu.org/software/gawk/manual/)
- [The AWK Programming Language (book)](https://en.wikipedia.org/wiki/The_AWK_Programming_Language)
- [One True AWK Source](https://github.com/onetrueawk/awk)