>_AWKRS REFERENCE
A fast AWK implementation written in Rust. Bytecode VM with optional Cranelift JIT, parallel record processing with rayon, and broad CLI compatibility with gawk, mawk, and nawk. Drop-in replacement for text processing pipelines.
Quickstart
Install from crates.io or build from source, then use aw (short) or awkrs:
# install
cargo install awkrs
# from source
git clone https://github.com/MenkeTechnologies/awkrs
cd awkrs && cargo build
# one-liners
aw 'BEGIN { print "hello, world" }'
aw -F: '{ print $1 }' /etc/passwd
aw '{ sum += $1 } END { print sum }' numbers.txt
echo "1 2 3" | aw '{ print $1 + $2 + $3 }'
# field processing
ls -l | aw 'NR > 1 { total += $5 } END { print total }'
# pattern matching
aw '/error/i { print FILENAME ":" NR ":" $0 }' *.log
Full installation and usage instructions are in the README.
Why awkrs — Feature Comparison
| Feature | awkrs | gawk | mawk | nawk |
|---|---|---|---|---|
| Parallel records | ✓ | ✗ | ✗ | ✗ |
| JIT compilation | Cranelift | ✗ | ✗ | ✗ |
| Bytecode VM | ✓ | ✓ | ✓ | ✗ |
| Unicode support | ✓ | ✓ | partial | ✗ |
| CSV mode | ✓ | ✓ | ✗ | ✗ |
| Regex backrefs | ✓ | ✓ | ✗ | ✗ |
| Time functions | ✓ | ✓ | ✗ | ✗ |
| I18N (gettext) | ✓ | ✓ | ✗ | ✗ |
| Network I/O | ✓ | ✓ | ✗ | ✗ |
| Single binary | ~8MB | pkg | ~200KB | pkg |
| Memory safety | Rust | C | C | C |
Overview
- Parser & compiler — recursive-descent parser producing an AST, compiled to bytecode for the VM. Hot paths can be JIT-compiled via Cranelift.
- Values — AWK values (string/number/uninitialized) with automatic coercion. Arrays are associative (hash maps).
- Regex — three-tier engine: Rust regex → fancy-regex (backrefs) → pcre2 (advanced).
- Parallelism — -P flag enables parallel record processing via rayon work-stealing.
- Binary size — ~8MB stripped with LTO.
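The value model in the overview (automatic coercion between strings and numbers, string-keyed associative arrays) can be illustrated with a few lines of standard AWK. This is a minimal sketch rather than anything specific to awkrs internals; the `AWK` shell variable is a hypothetical convenience that defaults to the system `awk` and can be set to `aw` to run the same program under awkrs.

```shell
# AWK is a hypothetical convenience variable: set AWK=aw to run under awkrs.
AWK="${AWK:-awk}"

coerced=$("$AWK" 'BEGIN {
    s = "3"              # a string value
    n = s + 4            # numeric context coerces "3" to 3
    t = n ""             # concatenation coerces 7 to the string "7"
    u = unset + 0        # an uninitialized variable acts as 0 here
    count["apple"]++     # array subscripts are strings
    count["apple"]++
    print n, length(t), u, count["apple"]
}')
echo "$coerced"
```

The four printed values are the numeric sum, the length of the coerced string, the value of the uninitialized variable in numeric context, and the counter stored under the `"apple"` key.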
Built-in Variables
| Variable | Description |
|---|---|
| $0 | Current input record (entire line) |
| $1, $2, ... | Fields of the current record |
| NF | Number of fields in current record |
| NR | Total number of records read so far |
| FNR | Record number in current file |
| FILENAME | Name of current input file |
| FS | Input field separator (default: space) |
| RS | Input record separator (default: newline) |
| OFS | Output field separator |
| ORS | Output record separator |
| OFMT | Output format for numbers |
| CONVFMT | Conversion format for numbers |
| SUBSEP | Subscript separator for arrays |
| RSTART | Start of match from match() |
| RLENGTH | Length of match from match() |
| ARGC, ARGV | Command-line argument count and array |
| ENVIRON | Environment variables array |
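
As a quick illustration of how several of these variables interact, the sketch below sets `FS` and `OFS` in a `BEGIN` block and prints `NR`, `NF`, the first field, and the last field of each record. The sample input is made up, and `AWK` is a hypothetical variable defaulting to the system `awk` (set `AWK=aw` to try awkrs).

```shell
# AWK is a hypothetical convenience variable: set AWK=aw to run under awkrs.
AWK="${AWK:-awk}"

vars_out=$(printf 'alice:30:admin\nbob:25\n' | "$AWK" '
    BEGIN { FS = ":"; OFS = " | " }   # split input on ":", join output with " | "
    { print NR, NF, $1, $NF }         # record number, field count, first and last field
')
echo "$vars_out"
```

Because the comma in `print` inserts `OFS`, changing the output separator needs no change to the print statement itself.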
Built-in Functions
String
length gsub sub match split substr index sprintf tolower toupper
Math
sin cos atan2 exp log sqrt int rand srand
I/O
print printf getline close fflush system
Time (gawk)
systime mktime strftime
Bit ops (gawk)
and or xor compl lshift rshift
Type (gawk)
typeof isarray
Array (gawk)
asort asorti delete
Regex (gawk)
gensub patsplit
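
A short sketch of how a few of the string functions listed above combine. It sticks to functions from the String group so it should behave the same under any POSIX AWK; the input string is invented, and `AWK` is a hypothetical variable defaulting to the system `awk`.

```shell
# AWK is a hypothetical convenience variable: set AWK=aw to run under awkrs.
AWK="${AWK:-awk}"

funcs_out=$("$AWK" 'BEGIN {
    s = "2024-01-15 disk full"
    n = split(substr(s, 1, 10), d, "-")   # d[1]="2024", d[2]="01", d[3]="15"
    msg = toupper(substr(s, 12))          # text after the date, upper-cased
    gsub(/ /, "_", msg)                   # replace every space with "_"
    print sprintf("%d parts, month %d, %s at %d", n, d[2], msg, index(s, "disk"))
}')
echo "$funcs_out"
```

Note that `substr` and `index` are 1-based, and that `gsub` modifies its third argument in place and returns the number of substitutions.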
Examples
Field extraction
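A minimal sketch of field extraction, assuming passwd-style colon-separated input (the two sample records are made up). `$(NF-1)` and `$NF` address fields relative to the end of the record. `AWK` is a hypothetical variable defaulting to the system `awk`; set `AWK=aw` to use awkrs.

```shell
# AWK is a hypothetical convenience variable: set AWK=aw to run under awkrs.
AWK="${AWK:-awk}"

fields=$(printf 'root:x:0:0:root:/root:/bin/sh\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n' \
    | "$AWK" -F: '{ print $1, $(NF-1), $NF }')   # user, home directory, shell
echo "$fields"
```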
Aggregation
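A sketch of per-key aggregation with associative arrays, using invented `name value` records. Iteration order of `for (k in sum)` is unspecified in AWK, so the output is piped through `sort` to make it deterministic. `AWK` is a hypothetical variable defaulting to the system `awk`.

```shell
# AWK is a hypothetical convenience variable: set AWK=aw to run under awkrs.
AWK="${AWK:-awk}"

totals=$(printf 'web 120\ndb 300\nweb 80\ncache 50\ndb 100\n' \
    | "$AWK" '{ sum[$1] += $2; n[$1]++ }          # running total and count per key
              END { for (k in sum) printf "%s %d %.1f\n", k, sum[k], sum[k] / n[k] }' \
    | sort)
echo "$totals"
```

Each output line holds the key, its total, and its mean.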
Pattern matching
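A sketch of a pattern-action rule on made-up log lines. It matches case-insensitively by lower-casing the record before applying the regex, which is one portable way to do it in standard AWK. `AWK` is a hypothetical variable defaulting to the system `awk`.

```shell
# AWK is a hypothetical convenience variable: set AWK=aw to run under awkrs.
AWK="${AWK:-awk}"

matches=$(printf 'INFO start\nError: disk full\nWARN slow\nerror: timeout\n' \
    | "$AWK" 'tolower($0) ~ /^error/ { print NR ": " $0 }')   # line number plus original text
echo "$matches"
```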
Text transformation
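A sketch of in-place field rewriting on invented name records. Assigning to a field makes AWK rebuild `$0` using `OFS`, which also collapses the irregular spacing in the input. `AWK` is a hypothetical variable defaulting to the system `awk`.

```shell
# AWK is a hypothetical convenience variable: set AWK=aw to run under awkrs.
AWK="${AWK:-awk}"

transformed=$(printf 'john   smith\njane doe\n' \
    | "$AWK" '{
        $1 = toupper(substr($1, 1, 1)) substr($1, 2)   # capitalise the first name
        $2 = toupper($2)                               # upper-case the surname
        print                                          # $0 was rebuilt with OFS
    }')
echo "$transformed"
```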
Multi-file processing
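A sketch of the common `NR == FNR` idiom for joining two files: while the first file is being read the two counters are equal, so that rule builds a lookup table, and the second rule only runs on the second file. The file names and contents are made up for illustration, and `AWK` is a hypothetical variable defaulting to the system `awk`.

```shell
# AWK is a hypothetical convenience variable: set AWK=aw to run under awkrs.
AWK="${AWK:-awk}"

dir=$(mktemp -d)
printf '1 alice\n2 bob\n' > "$dir/users.txt"
printf '2 login\n1 logout\n3 login\n' > "$dir/events.txt"

joined=$("$AWK" '
    NR == FNR  { name[$1] = $2; next }    # first file: map id -> name
    $1 in name { print name[$1], $2 }     # second file: keep events with a known id
' "$dir/users.txt" "$dir/events.txt")
rm -rf "$dir"
echo "$joined"
```

The event for id 3 is dropped because no user with that id exists in the first file.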
CLI Flags
-f FILE       # read program from file
-F FS         # set field separator
-v VAR=VAL    # set variable before execution
-b            # binary mode (no UTF-8)
-c            # CSV mode
-d            # debug: dump variables
-e PROG       # program text (multiple allowed)
-E FILE       # like -f, but different variable handling
-g            # GNU regex mode
-i FILE       # include file (library)
-k            # CSV mode with header
-l LIB        # load extension library
-M            # arbitrary precision math
-n            # no implicit input loop
-N            # decimal context for -M
-o FILE       # pretty-print to file
-O            # optimize (enable JIT)
-p FILE       # profile output
-P            # POSIX mode
-r            # extended regex (ERE)
-s            # sandbox mode
-S            # sandbox + safe mode
-t            # lint-old compatibility warnings
-V            # version
-W OPT        # gawk-style option
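
As a sketch of how the three most common flags combine, the example below writes a small program to a temporary file and runs it with `-f`, sets the field separator with `-F`, and passes a threshold with `-v`. The program, the variable name `min`, and the sample data are invented for illustration; `AWK` is a hypothetical variable defaulting to the system `awk` (set `AWK=aw` to use awkrs).

```shell
# AWK is a hypothetical convenience variable: set AWK=aw to run under awkrs.
AWK="${AWK:-awk}"

prog=$(mktemp)
cat > "$prog" <<'EOF'
# Print the first field of records whose second field exceeds min (set via -v).
$2 > min { print $1 }
EOF

flagged=$(printf 'a,5\nb,15\nc,25\n' | "$AWK" -F, -v min=10 -f "$prog")
rm -f "$prog"
echo "$flagged"
```

Values passed with `-v` are assigned before the `BEGIN` block runs, so they are visible everywhere in the program.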
Parallel Processing
Use -P or --parallel to enable parallel record processing. Each record is processed independently using rayon work-stealing across all CPU cores.
# process large file in parallel
aw -P '{ complex_computation($0) }' huge_file.txt
# parallel aggregation (thread-safe)
aw -P '{ sum += $1 } END { print sum }' data.txt
Note: Parallel mode may reorder output. Use -P -s for sorted output by record number.
gawk Extensions
awkrs implements many gawk extensions for compatibility:
- BEGINFILE / ENDFILE — run before/after each input file
- nextfile — skip to next input file
- @include — include another awk file
- @namespace — namespace support
- Typed regex — @/regex/ strongly typed regex constants
- Indirect function calls — @func_name()
- Two-way pipes — |& for coprocess communication
- Network I/O — /inet/tcp/... special files
- Time functions — systime(), mktime(), strftime()
- Bit operations — and(), or(), xor(), etc.
Repository & Links
- Source — github.com/MenkeTechnologies/awkrs
- Crate — crates.io/crates/awkrs (cargo install awkrs)
- Rust API docs — docs.rs/awkrs
- Issues — github.com/MenkeTechnologies/awkrs/issues
- Parity tests — parity/ contains test cases comparing awkrs output against gawk.