awk-rs 0.1.0

A 100% POSIX-compatible AWK implementation in Rust
Documentation
# awk-rs - AWK Implementation in Rust

A 100% POSIX-compatible AWK implementation in Rust, with comprehensive GNU AWK (gawk) extension support.

## 🎉 Version 0.1.0 Released

**awk-rs is now 100% POSIX AWK compliant** with comprehensive GNU AWK extension support. All core AWK functionality is implemented and tested.

| Category | Status |
|----------|--------|
| POSIX AWK | ✅ Complete |
| GNU AWK Extensions | ✅ Complete |
| Test Coverage | 639 tests, 86% coverage |
| Release Status | 🚀 v0.1.0 |

---

## Implementation Status

### ✅ COMPLETED - Full POSIX Compliance

#### Phase 1: Foundation & Core Infrastructure
- [x] Error types and result handling (`src/error.rs`)
- [x] CLI argument parsing with `-f`, `-F`, `-v`, `--` options
- [x] Multiple input files and stdin handling
- [x] FILENAME updates per input file
- [x] Complete lexer with all token types
- [x] String escape sequences (standard, hex `\xNN`, octal `\NNN`)
- [x] Regex literals vs division disambiguation
- [x] Line/column tracking for errors
- [x] Line continuation with backslash
- [x] Complete AST definition
- [x] Recursive descent parser with correct operator precedence
- [x] Print/printf argument lists
- [x] All getline variants (`getline`, `getline var`, `getline < file`, `cmd | getline`)

#### Phase 2: Runtime & Interpreter
- [x] Dynamic value system (String, Number, Uninitialized, NumericString)
- [x] Automatic type coercion rules
- [x] Numeric string semantics
- [x] Symbol table for global variables
- [x] Local function scope
- [x] Associative arrays with multi-dimensional subscripts
- [x] Array passing by reference to functions
- [x] `in` operator for membership testing
- [x] `delete` for array elements AND entire arrays
- [x] All special variables: `$0`, `$1`..`$NF`, `NF`, `NR`, `FNR`, `FS`, `RS`, `OFS`, `ORS`, `FILENAME`, `ARGC`, `ARGV`, `ENVIRON`, `CONVFMT`, `OFMT`, `RSTART`, `RLENGTH`, `SUBSEP`
- [x] Field splitting with `FS` (single char, space, regex)
- [x] Paragraph mode (`RS = ""`)
- [x] Field assignment and `$0` rebuilding
- [x] `NF` modification
- [x] All expression types
- [x] All statement types (if/else, while, for, for-in, do-while, break, continue, next, nextfile, exit, return)
- [x] All pattern types (expression, regex, range, BEGIN, END, compound)

#### Phase 3: Built-in Functions
- [x] String: `length`, `substr`, `index`, `split`, `sub`, `gsub`, `match`, `sprintf`, `tolower`, `toupper`
- [x] Math: `sin`, `cos`, `atan2`, `exp`, `log`, `sqrt`, `int`, `rand`, `srand`
- [x] I/O: `print`, `printf`, `getline`, `close`, `fflush`, `system`
- [x] Output redirection: `>`, `>>`, `|`
- [x] All printf format specifiers with width/precision/flags

#### Phase 4: Advanced Features
- [x] User-defined functions with local scope
- [x] Recursion support
- [x] Regex matching with `regex` crate (POSIX ERE)
- [x] `&` replacement in `sub`/`gsub`
- [x] Regex literals as function arguments
- [x] Unicode/UTF-8 support for string functions

---

### ✅ COMPLETED - GNU AWK Extensions

#### Patterns
- [x] `BEGINFILE` - execute before each input file
- [x] `ENDFILE` - execute after each input file

#### String Functions (gawk)
- [x] `gensub(regexp, replacement, how [, target])` - general substitution
- [x] `patsplit(string, array, fieldpat [, seps])` - split by pattern

#### Array Functions (gawk)
- [x] `asort(source [, dest])` - sort array by values
- [x] `asorti(source [, dest])` - sort array by indices

#### Time Functions (gawk)
- [x] `systime()` - current time as seconds since epoch
- [x] `mktime(datespec)` - convert date string to timestamp
- [x] `strftime(format [, timestamp])` - format timestamp

---

#### Field Splitting Extensions (gawk)
- [x] `FPAT` - field pattern for content-based splitting
- [x] `FIELDWIDTHS` - fixed-width field splitting

#### System Information (gawk)
- [x] `PROCINFO` array - process/system information

#### CLI Options
- [x] `--posix` / `-P` - strict POSIX mode (disable extensions)
- [x] `--traditional` / `-c` - traditional AWK mode (disable extensions)

---

### 🔄 REMAINING WORK (Optional Enhancements)

#### Advanced Features (Not Commonly Used)
- [ ] Two-way pipes (`|&`)
- [ ] `@include` directive
- [ ] Network I/O (`/inet/tcp`, `/inet/udp`)

---

## Test Coverage

```
Unit Tests:      170 (lexer, parser, interpreter, value system, error handling)
E2E Tests:       412 (complete AWK programs)
CLI Tests:        19 (command-line interface)
Compat Tests:     34 (gawk comparison)
Doc Tests:         4 (API examples)
Total:           639 tests

Coverage:        86% (library code, excluding CLI main.rs)
```

All tests pass with 100% success rate.

---

## Performance

The implementation includes:
- Optimized field splitting (byte-based for single-char FS)
- Compiled regex caching
- Unicode-aware string operations
- Release profile with LTO and single codegen unit
- Criterion benchmarks for profiling

---

## CI/CD

- GitHub Actions CI (Linux, macOS, Windows)
- Automated releases with binary builds
- Benchmark tracking
- CodeQL security scanning
- Dependency updates via Dependabot

---

## License

Dual-licensed under MIT and Apache 2.0.
Copyright (c) 2026 Pegasus Heavy Industries LLC.

---

## References

- [POSIX AWK Specification]https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
- [GNU AWK Manual]https://www.gnu.org/software/gawk/manual/
- [The AWK Programming Language (book)]https://en.wikipedia.org/wiki/The_AWK_Programming_Language
- [One True AWK Source]https://github.com/onetrueawk/awk