// AWKRS — AWK IN RUST

awkrs v0.1.39 · Rust-powered · Parallel records · Cranelift JIT · gawk/mawk/nawk compatible

>_AWKRS REFERENCE

A fast AWK implementation written in Rust. Bytecode VM with optional Cranelift JIT, parallel record processing with rayon, and broad CLI compatibility with gawk, mawk, and nawk. Drop-in replacement for text processing pipelines.

Quickstart

Install from crates.io or build from source, then use aw (short) or awkrs:

# install
cargo install awkrs

# from source
git clone https://github.com/MenkeTechnologies/awkrs
cd awkrs && cargo build

# one-liners
aw 'BEGIN { print "hello, world" }'
aw -F: '{ print $1 }' /etc/passwd
aw '{ sum += $1 } END { print sum }' numbers.txt
echo "1 2 3" | aw '{ print $1 + $2 + $3 }'

# field processing
ls -l | aw 'NR > 1 { total += $5 } END { print total }'

# pattern matching
aw 'tolower($0) ~ /error/ { print FILENAME ":" FNR ":" $0 }' *.log

Full install + usage live in the README.

Why awkrs — Feature Comparison

Feature  awkrs  gawk  mawk  nawk
Parallel records
JIT compilation  Cranelift
Bytecode VM
Unicode support  partial
CSV mode
Regex backrefs
Time functions
I18N (gettext)
Network I/O
Single binary  ~8MB  pkg  ~200KB  pkg
Memory safety  Rust  C  C  C

Overview

  • Parser & compiler — recursive-descent parser producing an AST, compiled to bytecode for the VM. Hot paths can be JIT-compiled via Cranelift.
  • Values — AWK values (string/number/uninitialized) with automatic coercion. Arrays are associative (hash maps).
  • Regex — three-tier engine: Rust regex → fancy-regex (backrefs) → pcre2 (advanced).
  • Parallelism — -P flag enables parallel record processing via rayon work-stealing.
  • Binary size — ~8MB stripped with LTO.
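
The coercion rules and associative arrays mentioned above can be sketched with standard AWK semantics. The AWK shell variable is an assumption introduced here: it picks aw when installed and falls back to the system awk, so the snippet does not depend on any awkrs-specific behavior.

```shell
# Assumption: use aw if it is on PATH, otherwise the system awk.
AWK=$(command -v aw || command -v awk)

coerce_out=$("$AWK" 'BEGIN {
    print "3x" + 2              # leading numeric prefix is used in arithmetic
    print 10 " apples"          # concatenation converts the number to a string
    print never_set + 0         # uninitialized value acts as 0 in numeric context
    print "[" never_set "]"     # and as the empty string in string context
    n["a"] = 1; n[2] = "two"    # array keys are strings, so 2 is stored as "2"
    has2 = ("2" in n); hasb = ("b" in n)
    print has2, hasb
}')
echo "$coerce_out"
```

The same value can behave as a number or a string depending on the operator applied to it, which is why `n[2]` and `n["2"]` refer to the same element.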

Built-in Variables

Variable      Description
$0            Current input record (entire line)
$1, $2, ...   Fields of the current record
NF            Number of fields in current record
NR            Total number of records read so far
FNR           Record number in current file
FILENAME      Name of current input file
FS            Input field separator (default: space)
RS            Input record separator (default: newline)
OFS           Output field separator
ORS           Output record separator
OFMT          Output format for numbers
CONVFMT       Conversion format for numbers
SUBSEP        Subscript separator for arrays
RSTART        Start of match from match()
RLENGTH       Length of match from match()
ARGC, ARGV    Command-line argument count and array
ENVIRON       Environment variables array
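
A short sketch of how NR, FNR, NF and FILENAME relate when more than one file is read. The file names one.txt and two.txt are hypothetical and created in a temporary directory. The AWK variable is an assumption that selects aw if present and the system awk otherwise.

```shell
# Assumption: use aw if it is on PATH, otherwise the system awk.
AWK=$(command -v aw || command -v awk)

# Two throwaway input files in a temp directory.
tmp=$(mktemp -d)
printf 'a b\nc d e\n' > "$tmp/one.txt"
printf 'f\n' > "$tmp/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
vars_out=$(cd "$tmp" && "$AWK" '{ print FILENAME, NR, FNR, NF, $NF }' one.txt two.txt)
echo "$vars_out"
rm -rf "$tmp"
```

Each output line shows the file name, the global record number, the per-file record number, the field count, and the last field.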

Built-in Functions

String

length gsub sub match split substr index sprintf tolower toupper

Math

sin cos atan2 exp log sqrt int rand srand

I/O

print printf getline close fflush system

Time (gawk)

systime mktime strftime

Bit ops (gawk)

and or xor compl lshift rshift

Type (gawk)

typeof isarray

Array (gawk)

asort asorti delete

Regex (gawk)

gensub patsplit
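
As an illustration of the string functions listed above, the following sketch uses only POSIX functions (split, index, substr, match, gsub, toupper), so it should behave the same under any conforming AWK. The sample string and the AWK variable are assumptions made for the example.

```shell
# Assumption: use aw if it is on PATH, otherwise the system awk.
AWK=$(command -v aw || command -v awk)

str_out=$("$AWK" 'BEGIN {
    s = "user=alice;id=42"
    n = split(s, parts, ";")               # returns the number of pieces
    print n, parts[2]
    print index(s, "id"), substr(s, 6, 5)  # positions are 1-based
    if (match(s, /[0-9]+/))                # match() sets RSTART and RLENGTH
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    t = s; k = gsub(/=/, ": ", t)          # gsub() returns the replacement count
    print k, toupper(t)
}')
echo "$str_out"
```

Note that gsub modifies its third argument in place, which is why the sketch copies s into t first.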

Examples

Field extraction

aw -F: '{ print $1, $3 }' /etc/passwd # username and UID
aw '{ print $NF }' file.txt # last field of each line

Aggregation

aw '{ sum += $1 } END { print sum }' numbers.txt
aw '{ count[$1]++ } END { for (k in count) print k, count[k] }' data.txt

Pattern matching

aw '/^#/ { next } { print }' config.txt # skip comments
aw 'NR == 1 || /error/' log.txt # header + error lines

Text transformation

aw '{ gsub(/foo/, "bar"); print }' file.txt
aw 'BEGIN { OFS="," } { $1=$1; print }' file.txt # to CSV

Multi-file processing

aw 'FNR == 1 { print "--- " FILENAME " ---" } { print }' *.txt

CLI Flags

-f FILE            # read program from file
-F FS              # set field separator
-v VAR=VAL         # set variable before execution
-b                 # binary mode (no UTF-8)
-c                 # CSV mode
-d                 # debug: dump variables
-e PROG            # program text (multiple allowed)
-E FILE            # like -f, but different variable handling
-g                 # GNU regex mode
-i FILE            # include file (library)
-k                 # CSV mode with header
-l LIB             # load extension library
-M                 # arbitrary precision math
-n                 # no implicit input loop
-N                 # decimal context for -M
-o FILE            # pretty-print to file
-O                 # optimize (enable JIT)
-p FILE            # profile output
-P                 # POSIX mode
-r                 # extended regex (ERE)
-s                 # sandbox mode
-S                 # sandbox + safe mode
-t                 # lint-old compatibility warnings
-V                 # version
-W OPT             # gawk-style option
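
The three POSIX flags in the list (-f, -F, -v) are commonly combined. A minimal sketch, assuming a hypothetical program file written to a temporary path and a hypothetical variable named min. The AWK variable selects aw if installed and the system awk otherwise.

```shell
# Assumption: use aw if it is on PATH, otherwise the system awk.
AWK=$(command -v aw || command -v awk)

# Hypothetical program file, created only for this demo.
prog=$(mktemp)
cat > "$prog" <<'EOF'
# Print the first field of rows whose second field exceeds the threshold.
$2 > min { print $1 }
EOF

# -F sets the field separator, -v sets min before the program runs,
# -f reads the program text from the file.
flags_out=$(printf 'a:5\nb:12\nc:30\n' | "$AWK" -F: -v min=10 -f "$prog")
echo "$flags_out"
rm -f "$prog"
```

Passing the threshold with -v keeps the program file reusable across different inputs.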

Parallel Processing

Use -P or --parallel to enable parallel record processing. Each record is processed independently using rayon work-stealing across all CPU cores.

# process large file in parallel
aw -P '{ complex_computation($0) }' huge_file.txt

# parallel aggregation (thread-safe)
aw -P '{ sum += $1 } END { print sum }' data.txt

Note: Parallel mode may reorder output. Use -P -s for sorted output by record number.
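
If output order matters, one flag-independent option is to tag each output line with its record number, sort, and strip the tag again. The sketch below assumes NR still reflects each record's input position under parallel mode, which the text above implies but does not state. It runs without any parallel flag so that it also works with a plain awk.

```shell
# Assumption: use aw if it is on PATH, otherwise the system awk.
AWK=$(command -v aw || command -v awk)

# Prefix each result with NR, sort numerically on that prefix, then drop it.
ordered_out=$(printf 'b\na\nc\n' \
    | "$AWK" '{ print NR "\t" toupper($0) }' \
    | sort -n -k1,1 \
    | cut -f2-)
echo "$ordered_out"
```

The tab separator keeps the record number apart from the payload, so cut can remove it without touching the rest of the line.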

gawk Extensions

awkrs implements many gawk extensions for compatibility:

  • BEGINFILE / ENDFILE — run before/after each input file
  • nextfile — skip to next input file
  • @include — include another awk file
  • @namespace — namespace support
  • Typed regex — @/regex/ strongly typed regex constants
  • Indirect function calls — @func_name()
  • Two-way pipes — |& for coprocess communication
  • Network I/O — /inet/tcp/... special files
  • Time functions — systime(), mktime(), strftime()
  • Bit operations — and(), or(), xor(), etc.

Repository & Links