// FUSEVM — ENGINEERING REPORT

Language-agnostic bytecode VM · Fused superinstructions · Cranelift 0.130 three-tier JIT (linear / block / tracing with side-exits + frame materialization · auto-dispatched from VM::run())

>_EXECUTIVE SUMMARY

fusevm is a language-agnostic bytecode virtual machine written in Rust. Any frontend that compiles to the shared Op enum (201 variants) gets four things for free: fused hot-loop dispatch, extension opcode tables, stack + slot execution, and an optional three-tier Cranelift JIT.

Tier 1 is a straight-line linear JIT that compiles on first call. Tier 2 is a block-level JIT over the chunk's CFG (warmup threshold 10). Tier 3 is a tracing JIT (loop-header threshold 50) with full side-exit machinery:

- cross-call inlining (depth ≤ 4)
- caller- and callee-frame branches
- frame materialization on deopt
- abstract-stack reconstruction (Int + Float)
- per-trace side-exit counters with auto-blacklist
- persistent TraceMetadata export/import
- side-trace stitching from hot deopt sites

When tracing is enabled, the JIT is auto-dispatched from VM::run(), so the interpreter and the JIT are one execution path, not two. By the numbers: 17,304 production Rust lines, 7,514 #[test] functions (7,283 integration + 231 inline), 8 fused superinstructions, 29 first-class shell ops, and 140 shell builtin IDs. One shared engine, three live frontends.

17,304
Production Rust
123,238
Test Lines
201
Opcodes
7,514
#[test] Functions
3
JIT Tiers
8
Fused Superinstructions
29
Shell Ops
140
Shell Builtin IDs
3
Live Frontends
11
Direct Deps

Source Distribution: 142,331 total lines

17,304 production / 123,238 tests / 1,789 benches · 12.2% production

Production: 10 files under src/. Tests: 52 integration modules under tests/ (7,283 #[test] fns) plus 231 inline #[cfg(test)] fns in src/. Benches: 5 Criterion harnesses under benches/. Test-to-production ratio: 7.1×, so every production line is shadowed by more than 7× its weight in test code.


~SCALE & POSITION

Reference comparison against other embeddable bytecode VMs and managed-language runtimes. fusevm is intentionally narrower than the others: it ships no parser, no GC, no stdlib — only the dispatch loop, the JIT bridge, and the extension hooks. The frontends layer everything else on top. The compactness is the point: a VM you can read end-to-end in an afternoon.

VM | Language | Core source | Native JIT | Embeddable | Multi-frontend
fusevm | Rust | 17,304 (10 files) | Cranelift 0.130 (3-tier) | crate (cargo add fusevm) | yes, 3 live
Lua 5.4 | C | ~13,000 | no (LuaJIT is separate) | yes (liblua) | single-frontend
LuaJIT | C + asm | ~85,000 | tracing | yes | single-frontend
QuickJS | C | ~70,000 | no | yes | single-frontend (JS)
Wren | C | ~9,000 | no | yes | single-frontend
Wasmtime (Cranelift) | Rust | ~300,000 | Cranelift | yes | wasm only
CPython ceval | C | ~12,000 (ceval.c) | no (3.13 experimental) | libpython | single-frontend
Perl 5 pp_* | C | ~50,000 (pp*.c) | no | libperl | single-frontend

Multi-Frontend by Design

Every other entry in the table above grew its VM as the runtime for exactly one language. fusevm inverts the relationship: the Op enum is the spec, and frontends register language-specific ops through Extended(u16, u8) and ExtendedWide(u16, usize) against a handler table. Three frontends ship today: strykelang (~450 ext ops), zshrs (~20 ext ops), and awkrs (~95 ext ops). Their ext-op IDs don't conflict, because each VM instance carries its own handler table.

By Per-File Density

The whole VM is 10 files. jit.rs at 6,925 lines hosts all three JIT tiers + deopt machinery + side-trace stitching. vm.rs at 4,653 lines is the entire match-dispatch interpreter including frame management, builtin dispatch, and host routing. The dispatch core is one match over Op.

By Test Surface

7,283 integration tests in tests/ + 231 inline tests in src/ = 7,514 #[test] functions against 17,304 production lines. tests/jit_trace.rs alone is 1,948 lines pinning the tracing-JIT recorder, deopt path, frame materialization, side-trace stitching, and the persistent-metadata round-trip. A separate differential-fuzz harness (tests/jit_fuzz.rs) generates random valid bytecode and asserts interpreter and tracing-JIT produce identical results on every chunk.

Op Density

201 universal opcodes covering arithmetic, comparison, control flow, scope, I/O, collections, higher-order blocks, fused superinstructions, builtins, and extension points. The count also includes 29 first-class shell ops, promoted out of the extension space because multiple frontends need them: pipelines, redirects, here-docs, glob, file tests, traps, parameter expansion, regex / glob match, and scoped redirection blocks. The Op enum as a whole stays ≤ 24 bytes for cache-friendly dispatch.


#SUBSYSTEM BREAKDOWN

Production source partitioned by role. The JIT dominates at 42.1%. The tracing tier alone carries cross-call inlining (depth ≤ 4), caller / callee-frame side-exits, frame + abstract-stack materialization, side-exit auto-blacklist, persistent metadata, and side-trace stitching. The interpreter sits at 28.3%. Everything else is bookkeeping: opcode definitions, the value enum, chunk encoding, the host trait, and the builtin-ID table.

Subsystem | File | Lines | % Share | Description
JIT (Cranelift) | src/jit.rs | 6,925 | 42.1% | Three-tier compiler: compile_linear (straight-line, instant), compile_block (whole-chunk CFG, threshold 10), compile_trace (hot-loop body, threshold 50). Tracing covers Phases 1–9: loop bodies, cross-call inlining, caller- and callee-frame branches with side-exits, frame materialization (DeoptFrame), abstract-stack reconstruction (STACK_KIND_INT / FLOAT), per-trace side-exit counter with auto-blacklist (default cap 50), TraceMetadata export / import, bounded recursion (depth ≤ 4), side-trace stitching (cap 4). TraceJitConfig exposes every threshold to callers.
Interpreter VM | src/vm.rs | 4,653 | 28.3% | Match-dispatch loop over Op. Stack + frame slots, builtin handler table, extension handler table (narrow + wide), shell-host routing, VMPool for VM reuse (avoids per-script allocator churn). enable_tracing_jit() wires the tracing tier into VM::run(), which then auto-dispatches into traces on hot backedges. ~195 Op::* arms in the dispatch match.
Shell Builtins (IDs) | src/shell_builtins.rs | 1,072 | 6.5% | 140 stable BUILTIN_*: u16 constants partitioned into ranges: 0–19 core (cd, pwd, echo, print, printf, export, unset, source, exit, return, true, false, test, :, .), 20–29 typeset (local, declare, typeset, readonly, integer, float), 30–39 I/O (read, mapfile), 40–49 loop control (break, continue), plus the rest. Frontends register handlers against these stable IDs via VM::register_builtin(id, handler).
Op Enum | src/op.rs | 1,155 | 7.0% | 201 variants in 20 sections: Constants, Stack, Variables, Arrays, Hashes, Arithmetic, String, Comparison (numeric), Comparison (string), Logical / Bitwise, Control flow, Functions, Scope, I/O, Collections, Higher-order, Fused superinstructions, Builtins, Extension point, Shell ops. file_test / redirect_op / param_mod constant modules for sub-byte operand encoding. Manual Hash impl over discriminants + payload bytes.
awk Host | src/awk_host.rs | 780 | 4.7% | awkrs-specific host bindings: integrates ShellHost contract with awk semantics (field state, NR/NF, RS/FS, getline plumbing). Used by awkrs's fusevm bridge for the offloaded numeric-chunk path.
Value System | src/value.rs | 630 | 3.8% | 10-variant enum: Undef, Bool, Int(i64), Float(f64), Str(Arc<String>), Array, Hash, Status(i32), Ref, NativeFn(u16). Arc'd strings for cheap closure clone. Coercion API (to_int, to_float, to_str, as_str_cow, is_truthy) keeps the dispatch loop allocation-light.
Chunk + Builder | src/chunk.rs | 457 | 2.8% | The compilation unit. Chunk holds the op array, constant pool, name pool, line-number table, slot count, block-range table, and sub-chunk table. ChunkBuilder emits ops one at a time, resolves forward jumps with patch_jump, and finalizes via build(). Serde-serializable for ahead-of-time bytecode caching.
Shell Host Trait | src/host.rs | 420 | 2.6% | trait ShellHost: Send with ~25 methods covering everything the VM can't do itself: glob, tilde / brace / word / parameter expansion, command + process substitution, redirects, here-docs, here-strings, pipelines, subshells, traps, scoped redirection blocks, function call dispatch, exec / exec_bg, regex / glob match. DefaultHost ships sensible no-op defaults so a frontend without shell ambitions doesn't have to implement them.
awk Builtins | src/awk_builtins.rs | 269 | 1.6% | awkrs-specific builtin handlers wired through fusevm's builtin-ID table for the offloaded JIT path.
Public API Root | src/lib.rs | 71 | 0.4% | Module declarations + the public re-export set: Chunk, ChunkBuilder, DefaultHost, ShellHost, Op, Value, VM, VMPool, VMResult, Frame, plus the JIT surface (JitCompiler, JitExtension, NativeCode, SlotKind, TraceJitConfig, TraceLookup, TraceMetadata, DeoptFrame, DeoptInfo).
Tests | tests/*.rs + src/*.rs | 123,238 | n/a | 52 integration modules (7,283 #[test] fns) + inline #[cfg(test)] in src/ (231 fns) = 7,514 total #[test] functions. Differential-fuzz harness (tests/jit_fuzz.rs) compares interpreter and tracing-JIT results on randomized chunks. The 123,238-line test corpus is excluded from the 17,304 production total.
Benches | benches/*.rs | 1,789 | n/a | 5 Criterion harnesses: vm_bench (560), classic (471), jit_vs_interp (247, requires jit), jit_trace (216, requires jit), jit_crossover (136, requires jit). HTML reports via Criterion's built-in renderer.
PRODUCTION TOTAL | | 17,304 | 100% | The ten files listed sum to 16,432 lines, and the shares above are relative to that sum. Tests and benches are counted separately: 123,238 + 1,789 = 125,027 lines, or 142,331 including production.

$TOP TEST MODULES

The integration-test corpus partitioned by file. JIT-tracing tests are the single biggest module (1,948 lines, 56 #[test] fns); the rest splits between VM behavior, host routing, shell-op dispatch, and op-by-op exhaustive coverage. Every fused superinstruction has dedicated coverage in fused_ops.rs / slot_and_fused_ops.rs.

File | Lines | #[test] | Role
tests/jit_trace.rs | 1,948 | 56 | Tracing JIT: header detection, recorder, deopt, frame materialization, abstract-stack reconstruction, side-trace stitching, TraceMetadata round-trip
tests/vm_integration.rs | 1,050 | 65 | End-to-end programs: arithmetic, control flow, function calls, scope, higher-order blocks (MapBlock / GrepBlock / SortBlock / ForEachBlock)
tests/host_ext_and_more_ops.rs | 907 | 34 | Host trait method coverage, extension dispatch (narrow + wide), default-host fallthroughs
tests/edge_cases.rs | 864 | 52 | Edge cases: empty stack, type coercion at op boundaries, undef propagation, divide-by-zero, out-of-range index, jump-target validation
tests/shell_ops_with_host.rs | 798 | 45 | Shell-op dispatch with a real ShellHost: pipelines, redirects, here-docs, command substitution, process substitution, traps
tests/op_exhaustive_and_vm_lifecycle.rs | 775 | 41 | Per-op smoke tests + VM lifecycle: new / reset / run / VMPool acquire / release
tests/shell_op_routing.rs | 669 | 26 | Routing: which shell ops fall through to the host, which terminate in the VM, how WithRedirectsBegin/End scopes are restored on early return
tests/host_routing_and_reset.rs | 650 | 53 | VM↔host boundary: shell-host swap mid-run, reset preserves handlers, extension handler replacement
tests/slot_and_fused_ops.rs | 585 | 43 | Slot-indexed fast paths + every fused superinstruction (AccumSumLoop, ConcatConstLoop, PushIntRangeLoop, SlotIncLtIntJumpBack, …)
tests/stack_arith_misc_ops.rs | 578 | 55 | Stack manipulation (Dup / Dup2 / Swap / Rot) + arithmetic + miscellaneous ops with full coercion matrix
tests/jumps_ext_builtins_files.rs | 567 | 46 | Jump targets, extension dispatch, builtin invocation, file-test ops (12 test types via TestFile(u8))
tests/testfile_builtin_dispatch.rs | 542 | 41 | TestFile dispatch matrix: -f, -d, -r, -w, -x, -e, -s, -L, -S, -p, -b, -c
tests/functions_vars_stack.rs | 544 | 43 | Call / Return / ReturnValue / PushFrame / PopFrame semantics + slot-vs-name lookup precedence
tests/collections_and_concat.rs | 543 | 41 | Array / hash construction (MakeArray / MakeHash), Range / RangeStep, Concat / StringRepeat
tests/jit_fuzz.rs | n/a | n/a | Differential fuzz: random valid chunks compared interpreter vs tracing JIT, asserts identical results. Gated behind --features jit. Catches latent recorder / deopt / IR bugs that curated tests miss.
TOP MODULES SUBTOTAL | 11,020 (the 14 modules with listed line counts) | | 8.9% of the 123,238-line test corpus

@EXECUTION PIPELINE

Frontend → ChunkBuilder → Chunk → VM::run(). The interpreter is the spine. When the jit feature is enabled and tracing is turned on, hot backedges auto-dispatch into Cranelift-compiled native code. Type-guard misses deopt back to the same bytecode offset with frame + stack state reconstructed.

  Frontend source (.stk / .zsh / .awk)
       │
       ▼
  ┌─────────────────────────┐
  │   Frontend compiler     │   stryke: ~450 ext ops
  │   (lexer → parser →     │   zshrs:  ~20  ext ops
  │   AST → bytecode)       │   awkrs:  ~95  ext ops
  └────────┬────────────────┘
           │  b.emit(Op, line)
           ▼
  ┌─────────────────────────┐
  │   ChunkBuilder          │   add_constant / add_name
  │   (src/chunk.rs)        │   add_block_range / add_sub_chunk
  │                         │   patch_jump / build()
  └────────┬────────────────┘
           │
           ▼
  ┌─────────────────────────┐     ┌─────────────────────────┐
  │   Chunk                 │────▶│   VMPool (optional)     │
  │   • ops:    Vec<Op>     │     │   acquire / release     │
  │   • consts: Vec<Value>  │     └─────────────────────────┘
  │   • names:  Vec<String> │
  │   • lines:  Vec<u32>    │
  │   • slots:  usize       │
  │   • blocks: Vec<Range>  │
  │   • subs:   Vec<Chunk>  │
  └────────┬────────────────┘
           │  VM::new(chunk)
           ▼
  ┌─────────────────────────────────────────────────────────┐
  │   VM::run()  (src/vm.rs)                                │
  │   match-dispatch over Op                                │
  │   stack + frame slots                                   │
  │                                                         │
  │   ┌───────────────────────────────────────────────┐     │
  │   │  Extension hook                               │     │
  │   │  Op::Extended(id, arg)      ──▶ ext_handler   │     │
  │   │  Op::ExtendedWide(id, payload) ──▶ wide       │     │
  │   │  Op::CallBuiltin(id, argc)  ──▶ builtin tbl   │     │
  │   └───────────────────────────────────────────────┘     │
  │                                                         │
  │   ┌───────────────────────────────────────────────┐     │
  │   │  Shell-host hook                              │     │
  │   │  Op::Exec / Pipeline / Redirect / Glob /...   │     │
  │   │  ──▶ ShellHost::glob / pipeline_begin / ...   │     │
  │   └───────────────────────────────────────────────┘     │
  │                                                         │
  │   ┌───────────────────────────────────────────────┐     │
  │   │  Tracing-JIT hot-backedge dispatch            │     │
  │   │  (feature = "jit", VM::enable_tracing_jit())  │     │
  │   │  ──▶ try_run_trace ──▶ native fn ptr          │     │
  │   │       │ side-exit (type guard miss)           │     │
  │   │       └──▶ DeoptInfo: resume_ip + frames +    │     │
  │   │            stack-kind tags ──▶ back to match  │     │
  │   └───────────────────────────────────────────────┘     │
  └─────────────────────────────────────────────────────────┘
           │
           ▼
       VMResult::Ok(Value) | Error(String) | Halted
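The ChunkBuilder step in this pipeline relies on a standard backpatching technique for forward jumps:

1. Emit the jump with a placeholder target.
2. Remember the jump's index.
3. Rewrite the target once the destination exists.

The sketch below shows that technique on a toy op set. MiniOp, MiniBuilder, and their method signatures are hypothetical stand-ins named after the diagram, not fusevm's actual ChunkBuilder API.

```rust
/// Toy op set with just enough variants to show forward-jump backpatching.
#[derive(Debug, Clone, PartialEq)]
enum MiniOp {
    LoadInt(i64),
    JumpIfFalse(usize), // absolute index into `ops`
    Jump(usize),
    Print,
}

/// Hypothetical builder mirroring the emit / patch_jump / build flow.
#[derive(Default)]
struct MiniBuilder {
    ops: Vec<MiniOp>,
    lines: Vec<u32>, // one source line per op, like a line-number table
}

impl MiniBuilder {
    /// Append an op and return its index so a jump can be patched later.
    fn emit(&mut self, op: MiniOp, line: u32) -> usize {
        self.ops.push(op);
        self.lines.push(line);
        self.ops.len() - 1
    }

    /// Point the jump at index `at` to the next op that will be emitted.
    fn patch_jump(&mut self, at: usize) {
        let target = self.ops.len();
        match &mut self.ops[at] {
            MiniOp::JumpIfFalse(t) | MiniOp::Jump(t) => *t = target,
            other => panic!("patch_jump on a non-jump op: {other:?}"),
        }
    }

    fn build(self) -> (Vec<MiniOp>, Vec<u32>) {
        (self.ops, self.lines)
    }
}

/// Compile `if 0 { print 1 } else { print 2 }` using two forward jumps.
fn compile_if_else() -> Vec<MiniOp> {
    let mut b = MiniBuilder::default();
    b.emit(MiniOp::LoadInt(0), 1); // condition
    let to_else = b.emit(MiniOp::JumpIfFalse(usize::MAX), 1); // target not known yet
    b.emit(MiniOp::LoadInt(1), 2);
    b.emit(MiniOp::Print, 2);
    let to_end = b.emit(MiniOp::Jump(usize::MAX), 2);
    b.patch_jump(to_else); // else-branch starts at the next op
    b.emit(MiniOp::LoadInt(2), 3);
    b.emit(MiniOp::Print, 3);
    b.patch_jump(to_end); // one past the last op: fall off the end
    let (ops, _lines) = b.build();
    ops
}

fn main() {
    for (i, op) in compile_if_else().iter().enumerate() {
        println!("{i:>2}: {op:?}");
    }
}
```

Both placeholder targets get rewritten: the conditional jump lands on the else-branch at index 5, and the unconditional jump lands one past the last op.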
      

&OPCODE INVENTORY

201 variants of Op across 20 sections. Each op is a tagged enum case. Payload operands take one of three forms:

- pool indices (u16, so up to 64k names / constants per pool)
- jump targets (usize)
- sub-byte fields encoded via the file_test / redirect_op / param_mod constant modules

The enum stays ≤ 24 bytes for cache-friendly dispatch.
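A compile-time assertion is one common way to hold an op enum to a size budget like this. The enum below is a hypothetical subset with the payload shapes just described: u16 pool indices, usize jump targets, u8 sub-byte operands, and i64 / f64 immediates. Whether fusevm enforces its bound this way is an assumption.

```rust
use std::mem::size_of;

/// Hypothetical subset of an op enum using the payload shapes described above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum Op {
    Nop,
    LoadInt(i64),
    LoadFloat(f64),
    LoadConst(u16),           // constant-pool index
    GetVar(u16),              // name-pool index
    Jump(usize),              // absolute jump target
    TestFile(u8),             // sub-byte operand (file-test kind)
    Extended(u16, u8),        // narrow extension op
    ExtendedWide(u16, usize), // wide extension op
}

// Compile-time guard: the build fails if any variant pushes Op past 24 bytes.
const _: () = assert!(size_of::<Op>() <= 24);

fn main() {
    println!("size_of::<Op>() = {} bytes", size_of::<Op>());
}
```

Because the check runs at compile time, adding a variant with an oversized payload (say, an inline String) turns into a build error rather than a silent cache regression.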

201
Op Variants
20
Sections
8
Fused Superinstructions
29
Shell Ops
140
Builtin IDs
~195
Dispatch Arms in vm.rs
12
File-Test Predicates
18
Param-Expansion Mods

// OPCODE CATEGORIES

Constants (7)

Nop, LoadInt(i64), LoadFloat(f64), LoadConst(u16), LoadTrue, LoadFalse, LoadUndef

Stack (5)

Pop, Dup, Dup2, Swap, Rot

Variables (7)

GetVar / SetVar / DeclareVar (name-pool indexed), GetSlot / SetSlot (slot-indexed fast path), SlotArrayGet / SlotArraySet (slot-resident array indexing — no extra GetSlot)

Arrays (10)

GetArray, SetArray, DeclareArray, ArrayGet, ArraySet, ArrayPush, ArrayPop, ArrayShift, ArrayLen, MakeArray

Hashes (10)

GetHash, SetHash, DeclareHash, HashGet, HashSet, HashDelete, HashExists, HashKeys, HashValues, MakeHash

Arithmetic (9)

Add, Sub, Mul, Div, Mod, Pow, Negate, Inc, Dec — int / float dispatch with wrapping fast path

String (3)

Concat, StringRepeat, StringLen

Numeric Compare (7)

NumEq, NumNe, NumLt, NumGt, NumLe, NumGe, Spaceship (<=> → -1 / 0 / 1)

String Compare (7)

StrEq, StrNe, StrLt, StrGt, StrLe, StrGe, StrCmp

Logical / Bitwise (9)

LogNot, LogAnd, LogOr, BitAnd, BitOr, BitXor, BitNot, Shl, Shr. LogAnd / LogOr evaluate both sides; short-circuit lives in the JumpIfTrueKeep / FalseKeep pair.

Control Flow (5)

Jump, JumpIfTrue, JumpIfFalse, JumpIfTrueKeep (short-circuit ||), JumpIfFalseKeep (short-circuit &&)
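The division of labor between eager LogAnd / LogOr and the Keep jumps shows up clearly in a tiny stack machine. This sketch assumes the following Keep semantics:

- When a Keep jump branches, it leaves the tested value on the stack.
- When it falls through, it pops the value.

So a && b yields whichever operand decided the result. The op names follow this section, but the interpreter and the Effect op are hypothetical.

```rust
/// Hypothetical mini stack machine for `&&` / `||` via the Keep jumps.
#[derive(Debug, Clone, Copy)]
enum Op {
    LoadInt(i64),
    JumpIfFalseKeep(usize), // falsy top: branch and keep it; else pop and continue
    JumpIfTrueKeep(usize),  // truthy top: branch and keep it; else pop and continue
    Effect,                 // stand-in for an operand with a side effect; pushes 1
}

/// Returns (value left on the stack, number of times `Effect` ran).
fn run(ops: &[Op]) -> (i64, u32) {
    let mut stack: Vec<i64> = Vec::new();
    let mut effects = 0;
    let mut ip = 0;
    while ip < ops.len() {
        match ops[ip] {
            Op::LoadInt(n) => stack.push(n),
            Op::JumpIfFalseKeep(target) => {
                if stack.last() == Some(&0) {
                    ip = target; // short-circuit: the left operand is the result
                    continue;
                }
                stack.pop(); // left operand was truthy: discard it, evaluate the right
            }
            Op::JumpIfTrueKeep(target) => {
                if matches!(stack.last(), Some(&v) if v != 0) {
                    ip = target;
                    continue;
                }
                stack.pop();
            }
            Op::Effect => {
                effects += 1;
                stack.push(1);
            }
        }
        ip += 1;
    }
    (stack.pop().expect("empty stack"), effects)
}

fn main() {
    // `lhs && effect` / `lhs || effect`: right operand at index 2, end at 3.
    for lhs in [0, 7] {
        let and = run(&[Op::LoadInt(lhs), Op::JumpIfFalseKeep(3), Op::Effect]);
        let or = run(&[Op::LoadInt(lhs), Op::JumpIfTrueKeep(3), Op::Effect]);
        println!("{lhs} && effect = {and:?}, {lhs} || effect = {or:?}");
    }
}
```

The right operand's side effect runs only when the left operand does not decide the result. That is the property an eager LogAnd / LogOr cannot provide.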

Functions (3)

Call(name_idx, argc), Return, ReturnValue

Scope (2)

PushFrame, PopFrame

I/O (3)

Print(n), PrintLn(n), ReadLine

Collections (2)

Range ([from, to] → array), RangeStep ([from, to, step] → array)

Higher-Order (5)

MapBlock(idx), GrepBlock(idx), SortBlock(idx), SortDefault, ForEachBlock(idx). Each idx resolves to a block range inside the chunk.

Fused Superinstructions (8)

PreIncSlot, SlotLtIntJumpIfFalse, SlotIncLtIntJumpBack, AccumSumLoop, ConcatConstLoop, PushIntRangeLoop, AddAssignSlotVoid, PreIncSlotVoid — see next section

Builtins (1)

CallBuiltin(id: u16, argc: u8) — routes to the handler registered by VM::register_builtin(id, handler). Builtin IDs come from src/shell_builtins.rs (140 stable constants).

Extension Point (2)

Extended(u16, u8) for narrow ops (inline byte operand), ExtendedWide(u16, usize) for wide ops (jump targets, large indices). Frontends register a narrow handler fn(&mut VM, u16, u8) via set_extension_handler and a wide handler, which receives the usize payload instead, via set_extension_wide_handler.

Shell Ops (29)

Exec, ExecBg, PipelineBegin / Stage / End, Redirect(fd, op), HereDoc, HereString, CmdSubst, SubshellBegin / End, ProcessSubIn / Out, Glob, GlobRecursive, TestFile(u8), SetStatus / GetStatus, TrapSet / TrapCheck, ExpandParam(u8), WordSplit, BraceExpand, TildeExpand, CallFunction(name, argc), StrMatch, RegexMatch, WithRedirectsBegin / End

// FUSED SUPERINSTRUCTIONS

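This is where most of the interpreter's speed comes from. The frontend compiler detects hot loop patterns and emits a single op in place of a multi-op sequence. Fusing an N-op sequence removes N−1 dispatches from the hot path, along with the intermediate stack pushes and pops. Each dispatch removed is also one fewer indirect branch that can mispredict.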

Fused Op | Replaces | Effect
AccumSumLoop(sum, i, limit) | GetSlot + GetSlot + Add + SetSlot + PreInc + NumLt + JumpIfFalse | Entire counted sum loop in one dispatch
SlotIncLtIntJumpBack(slot, limit, target) | PreIncSlot + SlotLtIntJumpIfFalse | Loop backedge in one dispatch
ConcatConstLoop(const, s, i, limit) | LoadConst + ConcatAppendSlot + SlotIncLtIntJumpBack | String-append loop in one dispatch
PushIntRangeLoop(arr, i, limit) | GetSlot + PushArray + ArrayLen + Pop + SlotIncLtIntJumpBack | Array push loop in one dispatch
AddAssignSlotVoid(a, b) | GetSlot + GetSlot + Add + SetSlot | Void-context add-assign, no stack traffic
PreIncSlotVoid(slot) | GetSlot + Inc + SetSlot | Void-context increment, no stack traffic
SlotLtIntJumpIfFalse(slot, int, target) | GetSlot + LoadInt + NumLt + JumpIfFalse | Fused compare + branch, no stack traffic
PreIncSlot(slot) | GetSlot + Inc + SetSlot + GetSlot | Slot pre-increment with push
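To make the dispatch-count argument concrete, here is a toy slot-based interpreter that runs the same counted sum loop two ways: as a generic op sequence and as a single fused op, counting dispatches. The op names echo the table. The operand layout and the loop shape (do { sum += i; i += 1 } while i < limit) are assumptions of this sketch, not fusevm's encoding.

```rust
/// Toy slot machine: a generic counted-sum loop vs. one fused op.
#[derive(Debug, Clone, Copy)]
enum Op {
    GetSlot(usize),
    SetSlot(usize), // pops into the slot
    LoadInt(i64),
    Add,
    NumLt,
    PreIncSlot(usize), // slot += 1, then push the new value
    JumpIfTrue(usize),
    // Fused form of the whole loop: do { sum += i; i += 1 } while i < limit
    AccumSumLoop { sum: usize, i: usize, limit: i64 },
}

const SUM: usize = 0;
const I: usize = 1;

/// The unfused loop body + backedge (assumes i < limit on entry).
fn generic_sum_loop(limit: i64) -> Vec<Op> {
    vec![
        Op::GetSlot(SUM),
        Op::GetSlot(I),
        Op::Add,
        Op::SetSlot(SUM),
        Op::PreIncSlot(I),
        Op::LoadInt(limit),
        Op::NumLt,
        Op::JumpIfTrue(0), // backedge to the loop head
    ]
}

/// Runs `ops` against `slots` and returns how many dispatches it took.
fn run(ops: &[Op], slots: &mut [i64]) -> usize {
    let mut stack: Vec<i64> = Vec::new();
    let (mut ip, mut dispatches) = (0, 0);
    while ip < ops.len() {
        dispatches += 1;
        match ops[ip] {
            Op::GetSlot(s) => stack.push(slots[s]),
            Op::SetSlot(s) => slots[s] = stack.pop().unwrap(),
            Op::LoadInt(n) => stack.push(n),
            Op::Add => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a.wrapping_add(b));
            }
            Op::NumLt => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push((a < b) as i64);
            }
            Op::PreIncSlot(s) => {
                slots[s] += 1;
                stack.push(slots[s]);
            }
            Op::JumpIfTrue(t) => {
                if stack.pop().unwrap() != 0 {
                    ip = t;
                    continue;
                }
            }
            Op::AccumSumLoop { sum, i, limit } => loop {
                // Same semantics as generic_sum_loop, with no stack traffic.
                slots[sum] = slots[sum].wrapping_add(slots[i]);
                slots[i] += 1;
                if slots[i] >= limit {
                    break;
                }
            },
        }
        ip += 1;
    }
    dispatches
}

fn main() {
    let mut a = [0i64, 0];
    let generic = run(&generic_sum_loop(10), &mut a);
    let mut b = [0i64, 0];
    let fused = run(&[Op::AccumSumLoop { sum: SUM, i: I, limit: 10 }], &mut b);
    println!("generic: sum={} dispatches={generic}", a[SUM]);
    println!("fused:   sum={} dispatches={fused}", b[SUM]);
}
```

With limit = 10, both versions leave 45 in the sum slot. The generic version needs 80 dispatches (8 ops × 10 iterations); the fused version needs 1.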

!VALUE SYSTEM

Every value in the VM is a Value. 10-variant enum, designed to stay cache-friendly (small discriminant + 1–2 word payload). Frontends convert their native types to / from Value at the boundary; the dispatch loop only sees this shape.

Variant | Payload | Used For
Undef | (none) | Uninitialized / no value; Default for Value
Bool | bool | Conditionals, [[ ]] tests, StrMatch / RegexMatch results
Int | i64 | Numeric scalars; fast path for all Op::Add / Sub / Mul / Div
Float | f64 | IEEE-754 scalars, mixed-type arith with promotion
Str | Arc<String> | Heap-allocated string; Arc for cheap clone in closures and across pipeline stages
Array | Vec<Value> | Ordered array; in-place mutation on slot-resident arrays via SlotArraySet
Hash | HashMap<String, Value> | Key-value associative array
Status | i32 | Exit status code (shell-specific but universal enough that every frontend can produce one)
Ref | Box<Value> | Pass-by-reference, nested structures, AST-style sharing
NativeFn | u16 | Native function handle (a builtin dispatch ID, not a raw pointer); allows first-class function values without trait objects

Coercion API kept allocation-light: to_int, to_float, to_str (owned String), as_str_cow (borrowed Cow<str> for the hot path where the value is already a string), is_truthy (Perl-style: 0 / "" / "0" / Undef → false), len, is_empty. Constructor shortcuts: Value::int(n), Value::float(f), Value::str(s), Value::bool(b), Value::array(v), Value::hash(m), Value::status(code).
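A reduced Value with the coercion rules named in this paragraph can be sketched as follows. The sketch makes several simplifications and assumptions:

- Only five of the ten variants are modeled.
- Strings coerce to numbers by parsing a leading integer, otherwise 0, in Perl style.
- Bool(true) stringifies as "1" and Bool(false) as "".

fusevm's exact coercion table may differ.

```rust
use std::borrow::Cow;
use std::sync::Arc;

#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Value {
    Undef,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(Arc<String>),
}

impl Value {
    fn str(s: &str) -> Value {
        Value::Str(Arc::new(s.to_string()))
    }

    /// Perl-style truthiness: Undef, false, 0, 0.0, "" and "0" are false.
    fn is_truthy(&self) -> bool {
        match self {
            Value::Undef => false,
            Value::Bool(b) => *b,
            Value::Int(n) => *n != 0,
            Value::Float(f) => *f != 0.0,
            Value::Str(s) => !(s.is_empty() || s.as_str() == "0"),
        }
    }

    /// Borrow when the value already is a string; allocate only otherwise.
    fn as_str_cow(&self) -> Cow<'_, str> {
        match self {
            Value::Str(s) => Cow::Borrowed(s.as_str()),
            Value::Undef => Cow::Borrowed(""),
            Value::Bool(b) => Cow::Borrowed(if *b { "1" } else { "" }),
            Value::Int(n) => Cow::Owned(n.to_string()),
            Value::Float(f) => Cow::Owned(f.to_string()),
        }
    }

    /// Numeric coercion: strings parse a leading (optionally signed) integer, else 0.
    fn to_int(&self) -> i64 {
        match self {
            Value::Undef => 0,
            Value::Bool(b) => *b as i64,
            Value::Int(n) => *n,
            Value::Float(f) => *f as i64,
            Value::Str(s) => {
                let t = s.trim_start();
                // Every accepted char is ASCII, so the char count is also a byte index.
                let end = t
                    .char_indices()
                    .take_while(|&(i, c)| c.is_ascii_digit() || (i == 0 && (c == '-' || c == '+')))
                    .count();
                t[..end].parse().unwrap_or(0)
            }
        }
    }
}

fn main() {
    for v in [Value::Undef, Value::Int(0), Value::str("0"), Value::str("0.0"), Value::Float(1.5)] {
        println!("{v:?}: truthy={} int={} str={:?}", v.is_truthy(), v.to_int(), v.as_str_cow());
    }
}
```

as_str_cow returns Cow::Borrowed for the common case where the value already is a Str, which is what keeps the hot path free of allocations.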


~CRANELIFT JIT — 3 TIERS

All three tiers share Cranelift 0.130 (same IR backend as Wasmtime) behind the jit feature flag. Same Value shape across interpreter and JIT, so deopt is a frame swap, not a re-marshal. TraceJitConfig exposes every threshold; JitCompiler::set_config(…) applies it to subsequent calls from the current thread.

Tier 1 — Linear JIT (instant)

compile_linear(chunk: &Chunk) -> Option<CompiledLinear>. Compiles straight-line bytecode on first call: no warmup, no profile, no CFG. Use case: tiny chunks where any interpreter overhead is the bottleneck. On any unsupported op it returns None, and the chunk stays in the interpreter.

Tier 2 — Block JIT (CFG, threshold 10)

compile_block(chunk: &Chunk) -> Option<CompiledBlock>. Whole-chunk control-flow graph compilation. Triggered after a chunk's hot_count crosses 10 invocations. Better steady-state than linear for non-trivial control flow.

Tier 3 — Tracing JIT (loop body, threshold 50)

Hot-backedge detection: every backward branch is a candidate loop header. When a header's hot_count crosses trace_threshold (default 50), the recorder runs one iteration, captures the linear trace, and lowers it to Cranelift IR with type guards at every op boundary. Side-exits deopt back to the interpreter.
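The hot-backedge rule can be sketched independently of Cranelift. Any jump whose target is at or before the jumping instruction marks a candidate loop header, and a per-header counter trips at the threshold. HotCounter and on_jump are hypothetical names; only the default of 50 comes from the text.

```rust
use std::collections::HashMap;

/// Hypothetical per-header hot counter (the text's default threshold is 50).
struct HotCounter {
    threshold: u32,
    counts: HashMap<usize, u32>, // loop-header ip -> backedges taken
}

impl HotCounter {
    fn new(threshold: u32) -> Self {
        Self { threshold, counts: HashMap::new() }
    }

    /// Called on every taken jump. Returns Some(header) exactly once,
    /// on the backedge that makes that header hot (time to start recording).
    fn on_jump(&mut self, from_ip: usize, target_ip: usize) -> Option<usize> {
        if target_ip > from_ip {
            return None; // forward jump: not a loop backedge
        }
        let n = self.counts.entry(target_ip).or_insert(0);
        *n += 1;
        (*n == self.threshold).then_some(target_ip)
    }
}

fn main() {
    let mut hot = HotCounter::new(50);
    // A loop whose body spans ips 3..=9, with the backedge at ip 9.
    for iter in 1..=60 {
        if let Some(header) = hot.on_jump(9, 3) {
            println!("header {header} became hot on iteration {iter}; start recording");
        }
    }
}
```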

Cross-Call Inlining

Phase 2: tracing inlines through Call for callees that are branchless within the inlined window. Phase 8 extends this to bounded recursion, up to depth ≤ 4 (max_inline_recursion). Recursive calls past depth 4 abort the trace.

Caller- and Callee-Frame Branches

Phase 3: caller-frame if / else with side-exits. Phase 4: callee-frame branches with frame materialization. The trace emits DeoptFrame records (caller→callee order) for every inlined frame; on side-exit the VM rebuilds vm.frames to match what the bytecode would naturally have at the deopt IP. Capacity: MAX_DEOPT_FRAMES = 4, MAX_DEOPT_SLOTS_PER_FRAME = 16.

Abstract-Stack Reconstruction

Phases 5 + 5b: the trace tracks the abstract value stack and writes (kind, value) pairs into DeoptInfo.stack_buf on side-exit. Capacity: MAX_DEOPT_STACK = 32 entries. Tags: STACK_KIND_INT (0) for Value::Int(i64), STACK_KIND_FLOAT (1) for Value::Float(f64). The VM pushes them onto the live stack before resuming at resume_ip.
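Decoding the side-exit stack buffer is a small loop over tagged words. The tag values 0 and 1, the 32-entry cap, and resume_ip come from the text. The rest is assumed:

- Each entry is a (kind, raw i64 bits) pair.
- A Float is stored as its IEEE-754 bit pattern.
- The buffer layout and the function name are hypothetical.

```rust
const STACK_KIND_INT: u8 = 0;
const STACK_KIND_FLOAT: u8 = 1;
const MAX_DEOPT_STACK: usize = 32;

#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
}

/// Rebuild interpreter values from the first `len` (kind, bits) entries a trace wrote.
fn reconstruct_stack(buf: &[(u8, i64)], len: usize, live: &mut Vec<Value>) -> Result<(), String> {
    if len > MAX_DEOPT_STACK || len > buf.len() {
        return Err(format!("deopt stack length {len} out of range"));
    }
    for &(kind, bits) in &buf[..len] {
        let v = match kind {
            STACK_KIND_INT => Value::Int(bits),
            STACK_KIND_FLOAT => Value::Float(f64::from_bits(bits as u64)),
            other => return Err(format!("unknown stack kind {other}")),
        };
        live.push(v); // bottom-to-top order; the VM then resumes at resume_ip
    }
    Ok(())
}

fn main() {
    let buf = [(STACK_KIND_INT, 7), (STACK_KIND_FLOAT, 2.5f64.to_bits() as i64)];
    let mut live = Vec::new();
    reconstruct_stack(&buf, 2, &mut live).expect("valid buffer");
    println!("{live:?}");
}
```

Storing floats as raw bits lets the native side write every slot as a plain 64-bit integer. The tag alone tells the VM how to reinterpret each slot.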

Side-Exit Counter + Auto-Blacklist

Phase 6: every side-exit bumps entry.side_exit_count. When it crosses max_side_exits (default 50) the trace is auto-blacklisted — future invocations skip it and stay in the interpreter. Prevents pathological retry loops.
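The counter-and-blacklist policy reduces to a few lines. TraceEntry, Policy, and their fields are hypothetical stand-ins, and the default cap of 50 is from the text. Treating "crosses" as reaching the cap (>=) is a choice made in this sketch.

```rust
/// Hypothetical per-trace bookkeeping for the side-exit policy described above.
struct TraceEntry {
    side_exit_count: u32,
    blacklisted: bool,
}

struct Policy {
    max_side_exits: u32,
}

impl Policy {
    /// Record one side-exit; returns true only on the exit that blacklists the trace.
    fn on_side_exit(&self, e: &mut TraceEntry) -> bool {
        e.side_exit_count += 1;
        if !e.blacklisted && e.side_exit_count >= self.max_side_exits {
            e.blacklisted = true;
            return true;
        }
        false
    }

    /// Entry check on a hot backedge: blacklisted traces stay in the interpreter.
    fn should_enter(&self, e: &TraceEntry) -> bool {
        !e.blacklisted
    }
}

fn main() {
    let policy = Policy { max_side_exits: 50 };
    let mut entry = TraceEntry { side_exit_count: 0, blacklisted: false };
    for n in 1..=60 {
        if policy.on_side_exit(&mut entry) {
            println!("trace blacklisted after {n} side-exits");
        }
    }
    println!("enter trace? {}", policy.should_enter(&entry));
}
```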

Persistent TraceMetadata

Phase 7: JitCompiler::export_trace_metadata() serializes all known traces; import_trace_metadata(…) warms a fresh compiler. Embedders that re-run the same script repeatedly skip the recorder warm-up after the first run.

Side-Trace Stitching

Phase 9: when a side-exit fires often enough to qualify as its own hot site, the JIT records a side trace starting from that deopt IP and stitches it to the parent. max_trace_chain = 4 caps the chain depth.

Trace-Length Cap

max_trace_len = 256 (default). Recording aborts past 256 ops — long traces underperform shorter, retypeable ones. Tunable per workload.


%EXTENSION MECHANISM

Universal ops live in the Op enum. Language-specific ops are dispatched through frontend-registered handler tables. Each frontend owns its own ID space — stryke's op 42 and zshrs's op 42 don't collide because the handlers run in different VM instances.

Narrow: Extended(u16, u8)

16-bit op ID + 8-bit inline operand. Common case: a frontend op that fits in one byte of payload (flag bit, enum tag, small index). Registered via VM::set_extension_handler(Box::new(|vm, id, arg| {…})).

Wide: ExtendedWide(u16, usize)

16-bit op ID + usize payload. For jump targets, large indices, or anything that won't fit in a byte. Registered via VM::set_extension_wide_handler(…).
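A per-VM handler table for the two extension shapes can be sketched with boxed closures. The method names mirror the ones quoted in this section (set_extension_handler, set_extension_wide_handler). The Vm type, its i64 stack, and the exact closure signatures are simplified assumptions, not fusevm's definitions.

```rust
type NarrowHandler = Box<dyn FnMut(&mut Vm, u16, u8)>;
type WideHandler = Box<dyn FnMut(&mut Vm, u16, usize)>;

#[derive(Debug, Clone, Copy)]
enum Op {
    LoadInt(i64),
    Extended(u16, u8),        // frontend op id + inline byte operand
    ExtendedWide(u16, usize), // frontend op id + word-sized payload
}

/// Simplified VM: an i64 stack plus one handler per extension shape.
#[derive(Default)]
struct Vm {
    stack: Vec<i64>,
    ext: Option<NarrowHandler>,
    ext_wide: Option<WideHandler>,
}

impl Vm {
    fn set_extension_handler(&mut self, h: NarrowHandler) {
        self.ext = Some(h);
    }

    fn set_extension_wide_handler(&mut self, h: WideHandler) {
        self.ext_wide = Some(h);
    }

    fn run(&mut self, ops: &[Op]) {
        for &op in ops {
            match op {
                Op::LoadInt(n) => self.stack.push(n),
                Op::Extended(id, arg) => {
                    // Take the handler out so it can receive `&mut self`.
                    let mut h = self.ext.take().expect("no narrow handler set");
                    h(self, id, arg);
                    self.ext = Some(h);
                }
                Op::ExtendedWide(id, payload) => {
                    let mut h = self.ext_wide.take().expect("no wide handler set");
                    h(self, id, payload);
                    self.ext_wide = Some(h);
                }
            }
        }
    }
}

fn main() {
    let mut vm = Vm::default();
    // Hypothetical narrow op 7: push the inline byte operand.
    vm.set_extension_handler(Box::new(|vm: &mut Vm, id: u16, arg: u8| {
        if id == 7 {
            vm.stack.push(arg as i64);
        }
    }));
    // Hypothetical wide op 1: add the payload to the top of the stack.
    vm.set_extension_wide_handler(Box::new(|vm: &mut Vm, id: u16, payload: usize| {
        if let (1, Some(top)) = (id, vm.stack.last_mut()) {
            *top += payload as i64;
        }
    }));
    vm.run(&[Op::LoadInt(1), Op::Extended(7, 41), Op::ExtendedWide(1, 1_000)]);
    println!("{:?}", vm.stack);
}
```

The run in main leaves [1, 1041] on the stack. The take-and-restore step is one way to let a handler borrow the whole VM mutably without aliasing the box it lives in. Since each Vm owns its handlers, two frontends can reuse the same op IDs in separate instances.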

Builtin Dispatch: CallBuiltin(u16, u8)

Universal call into a registered builtin by stable u16 ID. Frontend registers handlers via VM::register_builtin(id, handler). IDs come from shell_builtins (140 reserved constants) or the frontend's own space — the table is per-VM, no global registry.

Shell Host Dispatch

Shell ops in the Op enum route to a Box<dyn ShellHost> set via VM::set_shell_host(…). DefaultHost ships sensible no-ops for frontends that need shell-op syntax (regex match, file tests) without needing real process control.

Builtin ID Ranges

Conventional partitioning in shell_builtins.rs: 0–19 core (cd, pwd, echo, …), 20–29 typeset, 30–39 I/O, 40–49 loop control, 50+ frontend-specific. Frontends can claim unused slots without coordinating — the type system enforces nothing, but the comment-banded ranges keep the convention readable.
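The range convention and the per-VM table are simple to model. The ranges below are the ones listed above. The BuiltinTable type, its methods, and the i64-only handler signature are hypothetical simplifications.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum BuiltinRange {
    Core,
    Typeset,
    Io,
    LoopControl,
    Frontend,
}

/// The conventional banding described above; nothing in the type system enforces it.
fn range_of(id: u16) -> BuiltinRange {
    match id {
        0..=19 => BuiltinRange::Core,
        20..=29 => BuiltinRange::Typeset,
        30..=39 => BuiltinRange::Io,
        40..=49 => BuiltinRange::LoopControl,
        _ => BuiltinRange::Frontend,
    }
}

type Handler = Box<dyn Fn(&[i64]) -> i64>;

/// Hypothetical per-VM table: two VMs may bind the same ID to different handlers.
#[derive(Default)]
struct BuiltinTable {
    handlers: HashMap<u16, Handler>,
}

impl BuiltinTable {
    fn register_builtin(&mut self, id: u16, h: Handler) {
        self.handlers.insert(id, h);
    }

    /// CallBuiltin(id, argc): pop the top `argc` values and pass them in stack order.
    fn call(&self, id: u16, argc: u8, stack: &mut Vec<i64>) -> Option<i64> {
        let h = self.handlers.get(&id)?;
        let start = stack.len().checked_sub(argc as usize)?;
        let args: Vec<i64> = stack.drain(start..).collect();
        Some(h(&args[..]))
    }
}

fn main() {
    let mut table = BuiltinTable::default();
    // Hypothetical frontend builtin in the 50+ band: sum its arguments.
    table.register_builtin(60, Box::new(|args: &[i64]| -> i64 { args.iter().sum() }));
    let mut stack = vec![1, 2, 3];
    println!("{:?} {:?}", range_of(60), table.call(60, 2, &mut stack));
}
```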

Hooks Without Wrappers

Every extension hook is a Box<dyn Fn(&mut VM, …)>. No newtype, no trait object hierarchy. The frontend writes one closure per op or one big match — both shapes are equally fast under the dispatch loop.


^SHELL HOST TRAIT

ShellHost: Send — the boundary between the VM and the host's process-control surface. Every shell op routes through one of these methods; DefaultHost ships no-op defaults so a non-shell frontend can ignore them.

Expansion

glob(pattern, recursive), tilde_expand(s), brace_expand(s), word_split(s), expand_param(name, modifier, args) (18 modifier types via param_mod), array_index(name, idx)

Substitution

cmd_subst(sub: &Chunk) -> String, process_sub_in(sub) -> String (returns FIFO path), process_sub_out(sub) -> String

Redirection

redirect(fd, op, target) (9 op types via redirect_op), heredoc(content), herestring(content), with_redirects_begin(count), with_redirects_end() — scoped redirection blocks restore fd state on early return
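The restore-on-early-return guarantee is usually implemented as a save stack keyed to scope depth. This sketch models fds as a plain map from fd to target name. RedirectState, begin_scope, end_scope, and unwind_to are hypothetical names; in particular, they do not reproduce the with_redirects_begin(count) signature. A real host would dup2 file descriptors rather than edit a map.

```rust
use std::collections::HashMap;

/// Hypothetical model of scoped redirections: fd -> target, plus a save stack.
#[derive(Default)]
struct RedirectState {
    fds: HashMap<i32, String>,
    saved: Vec<Vec<(i32, Option<String>)>>, // one frame per open scope
}

impl RedirectState {
    fn begin_scope(&mut self) {
        self.saved.push(Vec::new());
    }

    /// Redirect `fd`; inside a scope, remember the previous target for restore.
    fn redirect(&mut self, fd: i32, target: &str) {
        let prev = self.fds.insert(fd, target.to_string());
        if let Some(frame) = self.saved.last_mut() {
            frame.push((fd, prev));
        }
    }

    /// Close the innermost scope, undoing its redirects in reverse order.
    fn end_scope(&mut self) {
        if let Some(frame) = self.saved.pop() {
            for (fd, prev) in frame.into_iter().rev() {
                match prev {
                    Some(t) => {
                        self.fds.insert(fd, t);
                    }
                    None => {
                        self.fds.remove(&fd);
                    }
                }
            }
        }
    }

    /// Early return: close every scope opened above `depth`.
    fn unwind_to(&mut self, depth: usize) {
        while self.saved.len() > depth {
            self.end_scope();
        }
    }
}

fn main() {
    let mut r = RedirectState::default();
    r.redirect(1, "tty"); // outside any scope: permanent
    let depth = r.saved.len();
    r.begin_scope();
    r.redirect(1, "out.log");
    r.redirect(2, "err.log");
    r.begin_scope();
    r.redirect(1, "inner.log");
    r.unwind_to(depth); // `return` from inside both scopes
    println!("fd 1 -> {:?}, fd 2 -> {:?}", r.fds.get(&1), r.fds.get(&2));
}
```

Undoing in reverse order matters when one fd is redirected several times in a scope. The oldest saved target is restored last, so it wins.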

Pipelines + Subshells

pipeline_begin(n), pipeline_stage(), pipeline_end() -> i32, subshell_begin(), subshell_end() -> Option<i32> (Some(status) propagates a deferred subshell exit into the parent VM’s last_status)

Traps

trap_set(sig, handler: &Chunk), trap_check() — the compiler inserts TrapCheck between ops; the host decides which signals deliver and runs the registered Chunk

Execution

call_function(name, args) -> Option<i32> (user-defined function lookup), exec(args) -> i32, exec_bg(args) -> i32

Matching

str_match(s, pat) -> bool (glob-pattern match for [[ x = pat ]] and case arms), regex_match(s, regex) -> bool (=~)


*BENCHMARKS

5 Criterion harnesses under benches/. Two are interpreter-only (run without features); three require --features jit. HTML reports via Criterion's built-in renderer (target/criterion/report/index.html).

Bench | LOC | Requires | Measures
benches/vm_bench.rs | 560 | (none) | Core interpreter throughput: arithmetic, control flow, function calls, scope, collections; the baseline every JIT tier is measured against
benches/classic.rs | 471 | (none) | Classic interpreter workloads: fibonacci, sum-N, array push loops, string concat loops; the inputs every fused superinstruction was designed for
benches/jit_vs_interp.rs | 247 | jit | Head-to-head: same chunk through pure interpreter vs JIT-enabled VM. Measures the speedup the JIT actually delivers, per workload
benches/jit_trace.rs | 216 | jit | Tracing-specific: hot-loop trace latency, deopt cost, side-trace stitching overhead, TraceMetadata import speedup
benches/jit_crossover.rs | 136 | jit | Crossover: the chunk size / hot-count where JIT compile + execution starts to beat pure interpretation. Calibrates the trace_threshold default.
TOTAL | 1,789 | | The five harnesses listed account for 1,630 lines. Run with cargo bench (interpreter benches) or cargo bench --features jit (all)

+DEPENDENCIES

Intentionally minimal. 3 always-on runtime dependencies + 5 optional Cranelift crates (gated behind jit) + 3 dev dependencies. Every crate is foundational — serde, tracing, glob, the Cranelift family, criterion — chosen to survive a 2030+ rebuild without churn.

Crate | Version | Role | Gating
serde | 1 | Derive macros (features: derive, rc) for Op, Value, Chunk; enables bytecode caching and TraceMetadata export/import | always
tracing | 0.1 | Structured logging for diagnostic events: tracing::debug! at every JIT compile, deopt, and side-exit | always
glob | 0.3 | Shell-glob pattern matching for DefaultHost::glob and StrMatch | always
cranelift-jit | 0.130 | Cranelift JIT memory allocator + symbol resolver | jit
cranelift-codegen | 0.130 | IR → machine code (x86-64 + aarch64) | jit
cranelift-frontend | 0.130 | FunctionBuilder; builds IR from bytecode | jit
cranelift-native | 0.130 | ISA target detection at runtime | jit
cranelift-module | 0.130 | Module + linker abstraction for JIT-emitted code | jit
serde_json | 1 | Round-trip tests for Chunk / Op / Value serialization | dev
bincode | 1 | Compact binary serialization round-trip tests | dev
criterion | 0.5 | Statistical benchmarking + HTML reports | dev

No rand, no tokio, no parking_lot, no regex (frontends bring their own), no libc: the VM core stays pure Rust. The JIT sits behind the jit feature (jit-disk-cache implies it). Interpreter-only builds therefore skip the entire Cranelift toolchain (~1M LOC of transitive dependencies).


;PUBLIC API SURFACE

fusevm ships as an embeddable Rust crate (cargo add fusevm, optionally --features jit). The re-export set in src/lib.rs is the entire public surface; everything else is implementation detail.

Surface | Count | Notes
Public modules | 9 | awk_builtins, awk_host, chunk, host, jit, op, shell_builtins, value, vm
Re-exported types | 19 | Chunk, ChunkBuilder, DefaultHost, ShellHost, Op, Value, VM, VMPool, VMResult, Frame, JitCompiler, JitExtension, NativeCode, SlotKind, TraceJitConfig, TraceLookup, TraceMetadata, DeoptFrame, DeoptInfo
Public functions (across modules) | 124 | Includes constructors, builders, JIT entry points, host trait methods. jit.rs alone exposes 66 pub fn.
Builtin ID constants | 140 | Stable BUILTIN_*: u16 values in shell_builtins.rs; frontends register handlers against these
Op variants | 201 | The contract every frontend compiles against
Feature flags | 2 | jit enables the entire Cranelift family + the three JIT tiers. jit-disk-cache persists compiled native code to ~/.cache/fusevm-jit so codegen is skipped across process restarts; it implies jit, and the cache is active by default once the feature is enabled.

?KEY DESIGN DECISIONS

Why fusevm looks the way it does. Each call-out is a decision the implementation could have gone either way on, with the rationale for the path taken.

Language-Agnostic by Construction

Every other embeddable VM in the reference table grew its bytecode as the runtime for one language. fusevm starts from the opposite end: the Op enum is the spec; the frontends register against it. The result is three live frontends (stryke / zshrs / awkrs) sharing one JIT, one fused-loop table, one deopt path. Every perf improvement compounds across all of them.

Bytecode, Not Tree-Walker

An AST interpreter is simpler but pays virtual-call cost per node. Bytecode collapses dispatch into a tight match loop the branch predictor can warm up to. Same call-out as strykelang — same execution model.

Cranelift, Not LLVM

LLVM gives slightly better steady-state code but compiles 10× slower and pulls in a giant C++ dependency. Cranelift is pure Rust, fast to compile, and production-tested in Wasmtime. Sharing a backend with Wasmtime is itself a durability argument: the code generator is exercised and bug-fixed against a huge surface area.

Three Tiers, Auto-Dispatched

Each tier covers one case:

- Linear: "first call, want native immediately."
- Block: "warm chunk, want CFG-aware code."
- Tracing: "tight inner loop, want type-specialized native."

Tier choice is auto-dispatched from VM::run(), so callers don't choose a tier. The VM picks based on observed hot-counts and promotes code up the tiers as warm-up data accumulates.
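The promotion policy can be written as a pure function of counters the VM already keeps. Tier, pick_tier, and the straight_line flag are hypothetical. The triggers (first call, 10 invocations, 50 backedges) come from this report's Tier 1–3 descriptions. Real dispatch also has to check whether each compile attempt actually succeeded.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Tier {
    Interpreter,
    Linear,
    Block,
    Trace,
}

const BLOCK_THRESHOLD: u32 = 10; // chunk invocations (Tier 2)
const TRACE_THRESHOLD: u32 = 50; // backedges at one loop header (Tier 3)

/// Pick the highest tier whose trigger has fired.
/// `straight_line` means the chunk has no backward jumps (a linear-JIT candidate).
fn pick_tier(invocations: u32, hottest_backedge: u32, straight_line: bool) -> Tier {
    if hottest_backedge >= TRACE_THRESHOLD {
        Tier::Trace
    } else if invocations >= BLOCK_THRESHOLD {
        Tier::Block
    } else if straight_line && invocations >= 1 {
        Tier::Linear
    } else {
        Tier::Interpreter
    }
}

fn main() {
    for (calls, backedges, straight) in [(1, 0, true), (1, 0, false), (12, 0, false), (3, 64, false)] {
        println!("calls={calls} backedges={backedges} -> {:?}", pick_tier(calls, backedges, straight));
    }
}
```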

Fused Superinstructions Over Generic Inliner

The fused-op table (AccumSumLoop, ConcatConstLoop, PushIntRangeLoop, …) is hand-curated against measured hot patterns. A general inliner would catch more but cost more compile time and code-cache pressure. Eight fused ops absorb the dominant hot loops every frontend produces; the rest stays in the dispatch loop.

Side-Exits Over Recompile-On-Type-Drift

The tracing JIT chooses to deopt on type-guard miss rather than recompile with a new shape. Recompile pays per-shape compile time forever; deopt pays once per anomalous iteration and goes back to the interpreter. With max_side_exits as a backstop, the trace either stabilizes or auto-blacklists.

Persistent TraceMetadata + native disk cache

For embedders that run the same script repeatedly (CI scripts, REPL re-evals), recorder warmup is wasted on every run. export_trace_metadata / import_trace_metadata serializes the recorded set so cold start picks up where warm shutdown left off. The jit-disk-cache feature goes further: it persists the finished native code for all three tiers to ~/.cache/fusevm-jit, active by default once the feature is enabled. Restarts then skip Cranelift codegen entirely: a cached block load takes ~35 µs vs ~152 µs cold.

Pool, Not Allocator Trick

VMPool reuses VM instances across script runs so the allocator isn't asked to rebuild the same frame buffers, slot vectors, and handler tables on every invocation. No jemalloc, no mimalloc, no unsafe — a plain pool that frontends acquire / release per script.
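A pool in this style needs only a free list and a reset that keeps capacity. PooledVm, Pool, acquire, and release are hypothetical shapes, not fusevm's VMPool API. The key move is calling clear() instead of dropping, so the Vec buffers keep their allocations for the next script.

```rust
/// Hypothetical reusable VM state: buffers that are expensive to regrow.
#[derive(Default)]
struct PooledVm {
    stack: Vec<i64>,
    slots: Vec<i64>,
}

impl PooledVm {
    /// Reset for the next script without freeing the allocations.
    fn reset(&mut self) {
        self.stack.clear();
        self.slots.clear();
    }
}

#[derive(Default)]
struct Pool {
    free: Vec<PooledVm>,
}

impl Pool {
    /// Reuse a released VM if one is available, else build a fresh one.
    fn acquire(&mut self) -> PooledVm {
        self.free.pop().unwrap_or_default()
    }

    fn release(&mut self, mut vm: PooledVm) {
        vm.reset();
        self.free.push(vm);
    }
}

fn main() {
    let mut pool = Pool::default();
    let mut vm = pool.acquire();
    vm.stack.extend(0..1000); // first script grows the stack
    pool.release(vm);
    let vm = pool.acquire(); // second script reuses the same buffer
    println!("len={} capacity={}", vm.stack.len(), vm.stack.capacity());
}
```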

Shell Ops as First-Class Variants

Pipelines, redirects, here-docs, glob, file tests are common enough across stryke / zshrs / awkrs to live in the universal Op enum rather than as ext ops. The cost: extra variants in the match loop. The win: every frontend gets them with zero registration, and the JIT can specialize them.

Three Runtime Deps, Cranelift Opt-In

There are three always-on crates: serde, tracing, and glob. Cranelift is opt-in via the jit feature. There is no regex, no tokio, and no parking_lot; frontends bring their own. Result: interpreter-only fusevm builds in seconds with just three direct runtime dependencies.