>_EXECUTIVE SUMMARY
fusevm is a language-agnostic bytecode virtual machine written in Rust. Any frontend compiles to the same 201-variant Op enum and gets fused hot-loop dispatch, extension opcode tables, stack+slot execution, and an optional three-tier Cranelift JIT at no extra cost. Tier 1 is a straight-line linear JIT (compile on first call). Tier 2 is a block-level JIT over the chunk's CFG (warmup threshold 10). Tier 3 is a tracing JIT (loop-header threshold 50) with full side-exit machinery: cross-call inlining (depth ≤ 4), caller- and callee-frame branches, frame materialization on deopt, abstract-stack reconstruction (Int + Float), per-trace side-exit counters with auto-blacklist, persistent TraceMetadata export/import, and side-trace stitching from hot deopt sites. Compiled code is auto-dispatched from VM::run() when tracing is enabled, so the interpreter and the JIT are one execution path, not two. In numbers: 17,304 production Rust lines, 7,514 #[test] functions (7,283 integration + 231 inline), 8 fused superinstructions, 29 first-class shell ops, and 140 shell builtin IDs. One shared engine, three live frontends.
Source Distribution — 131,654 total lines
Production: 10 files under src/. Tests: 52 integration modules under tests/ (7,283 #[test] fns) plus 231 inline #[cfg(test)] fns in src/. Benches: 5 Criterion harnesses under benches/. Test-to-production ratio: 7.5× — every production line is shadowed by >7× its weight in test code.
~SCALE & POSITION
Reference comparison against other embeddable bytecode VMs and managed-language runtimes. fusevm is intentionally narrower than the others: it ships no parser, no GC, no stdlib — only the dispatch loop, the JIT bridge, and the extension hooks. The frontends layer everything else on top. The compactness is the point: a VM you can read end-to-end in an afternoon.
| VM | Language | Core source | Native JIT | Embeddable | Multi-frontend |
|---|---|---|---|---|---|
| fusevm | Rust | 17,304 (10 files) | Cranelift 0.130 (3-tier) | crate (cargo add fusevm) | yes (3 live) |
| Lua 5.4 | C | ~13,000 | no (LuaJIT separate) | yes (libluacore) | single-frontend |
| LuaJIT | C + asm | ~85,000 | tracing | yes | single-frontend |
| QuickJS | C | ~70,000 | no | yes | single-frontend (JS) |
| Wren | C | ~9,000 | no | yes | single-frontend |
| Wasmtime (Cranelift) | Rust | ~300,000 | Cranelift | yes | wasm only |
| CPython ceval | C | ~12,000 (ceval.c) | no (3.13 experimental) | libpython | single-frontend |
| Perl 5 pp_* | C | ~50,000 (pp*.c) | no | libperl | single-frontend |
Multi-Frontend by Design
Every other entry in the table above grew its VM as the runtime for exactly one language. fusevm inverts the relationship: the Op enum is the spec, frontends register language-specific ops through Extended(u16, u8) + ExtendedWide(u16, usize) against a handler table. Three frontends ship today — strykelang (~450 ext ops), zshrs (~20 ext ops), awkrs (~95 ext ops) — and they don't conflict.
By Per-File Density
The whole VM is 10 files. jit.rs at 6,925 lines hosts all three JIT tiers + deopt machinery + side-trace stitching. vm.rs at 4,653 lines is the entire match-dispatch interpreter including frame management, builtin dispatch, and host routing. The dispatch core is one match over Op.
By Test Surface
7,283 integration tests in tests/ + 231 inline tests in src/ = 7,514 #[test] functions against 17,304 production lines. tests/jit_trace.rs alone is 1,948 lines pinning the tracing-JIT recorder, deopt path, frame materialization, side-trace stitching, and the persistent-metadata round-trip. A separate differential-fuzz harness (tests/jit_fuzz.rs) generates random valid bytecode and asserts interpreter and tracing-JIT produce identical results on every chunk.
Op Density
201 universal opcodes — arithmetic, comparison, control flow, scope, I/O, collections, higher-order blocks, fused superinstructions, builtins, extension points, plus 29 first-class shell ops promoted out of the extension space because multiple frontends need them (pipelines, redirects, here-docs, glob, file tests, traps, parameter expansion, regex / glob match, scoped redirection blocks). Every variant is ≤ 24 bytes for cache-friendly dispatch.
#SUBSYSTEM BREAKDOWN
Production source partitioned by role. The JIT dominates at 42.1%: the tracing tier alone carries cross-call inlining (depth ≤ 4), caller / callee-frame side-exits, frame + abstract-stack materialization, side-exit auto-blacklist, persistent metadata, and side-trace stitching. The interpreter sits at 28.3%. Everything else is bookkeeping: opcode definitions, the value enum, chunk encoding, the host trait, and the builtin-ID table.
| Subsystem | File | Lines | % | Description | Share |
|---|---|---|---|---|---|
| JIT (Cranelift) | src/jit.rs | 6,925 | 42.1% | Three-tier compiler: compile_linear (straight-line, instant), compile_block (whole-chunk CFG, threshold 10), compile_trace (hot-loop body, threshold 50). Tracing covers Phases 1–9: loop bodies, cross-call inlining, caller- and callee-frame branches with side-exits, frame materialization (DeoptFrame), abstract-stack reconstruction (STACK_KIND_INT / FLOAT), per-trace side-exit counter with auto-blacklist (default cap 50), TraceMetadata export / import, bounded recursion (depth ≤ 4), side-trace stitching (cap 4). TraceJitConfig exposes every threshold to callers. | |
| Interpreter VM | src/vm.rs | 4,653 | 28.3% | Match-dispatch loop over Op. Stack + frame slots, builtin handler table, extension handler table (narrow + wide), shell-host routing, VMPool for VM reuse (avoids per-script allocator churn). enable_tracing_jit() wires the tracing tier into VM::run(); auto-dispatch from the interpreter loop on hot backedges. | |
| Shell Builtins (IDs) | src/shell_builtins.rs | 1,072 | 6.5% | 140 stable BUILTIN_*: u16 constants partitioned into ranges: 0–19 core (cd, pwd, echo, print, printf, export, unset, source, exit, return, true, false, test, :, .), 20–29 typeset (local, declare, typeset, readonly, integer, float), 30–39 I/O (read, mapfile), 40–49 loop control (break, continue), plus the rest. Frontend registers handlers against these stable IDs via VM::register_builtin(id, handler). | |
| Op Enum | src/op.rs | 1,155 | 7.0% | 201 variants in 20 sections: Constants, Stack, Variables, Arrays, Hashes, Arithmetic, String, Comparison (numeric), Comparison (string), Logical / Bitwise, Control flow, Functions, Scope, I/O, Collections, Higher-order, Fused superinstructions, Builtins, Extension point, Shell ops. file_test / redirect_op / param_mod constant modules for sub-byte operand encoding. Manual Hash impl over discriminants + payload bytes. | |
| awk Host | src/awk_host.rs | 780 | 4.7% | awkrs-specific host bindings: integrates ShellHost contract with awk semantics (field state, NR/NF, RS/FS, getline plumbing). Used by awkrs's fusevm bridge for the offloaded numeric-chunk path. | |
| Value System | src/value.rs | 630 | 3.8% | 10-variant enum: Undef, Bool, Int(i64), Float(f64), Str(Arc<String>), Array, Hash, Status(i32), Ref, NativeFn(u16). Arc'd strings for cheap closure clone. Coercion API (to_int, to_float, to_str, as_str_cow, is_truthy) keeps the dispatch loop allocation-light. | |
| Chunk + Builder | src/chunk.rs | 457 | 2.8% | The compilation unit. Chunk holds the op array, constant pool, name pool, line-number table, slot count, block-range table, and sub-chunk table. ChunkBuilder emits ops one at a time, resolves forward jumps with patch_jump, and finalizes via build(). Serde-serializable for ahead-of-time bytecode caching. | |
| Shell Host Trait | src/host.rs | 420 | 2.6% | trait ShellHost: Send with ~25 methods covering everything the VM can't do itself: glob, tilde / brace / word / parameter expansion, command + process substitution, redirects, here-docs, here-strings, pipelines, subshells, traps, scoped redirection blocks, function call dispatch, exec / exec_bg, regex / glob match. DefaultHost ships sensible no-op defaults so a frontend without shell ambitions doesn't have to implement them. | |
| awk Builtins | src/awk_builtins.rs | 269 | 1.6% | awkrs-specific builtin handlers wired through fusevm's builtin-ID table for the offloaded JIT path. | |
| Public API Roof | src/lib.rs | 71 | 0.4% | Module declarations + the public re-export set: Chunk, ChunkBuilder, DefaultHost, ShellHost, Op, Value, VM, VMPool, VMResult, Frame, plus the JIT surface (JitCompiler, JitExtension, NativeCode, SlotKind, TraceJitConfig, TraceLookup, TraceMetadata, DeoptFrame, DeoptInfo). | |
| Tests | tests/*.rs + src/*.rs | 123,238 | — | 52 integration modules (7,283 #[test] fns) + inline #[cfg(test)] in src/ (231 fns) = 7,514 total #[test] functions. Differential-fuzz harness (tests/jit_fuzz.rs) compares interpreter and tracing JIT op-by-op on randomized chunks. The 123,238-line test corpus is excluded from the 17,304 production total. | |
| Benches | benches/*.rs | 1,789 | — | 5 Criterion harnesses: vm_bench (560), classic (471), jit_vs_interp (247, requires jit), jit_trace (216, requires jit), jit_crossover (136, requires jit). HTML reports via Criterion's built-in renderer. | |
| PRODUCTION TOTAL | | 17,304 | 100% | Tests and benches counted separately (123,238 + 1,789 lines, 132,715 total). | |
$TOP TEST MODULES
The integration-test corpus partitioned by file. JIT-tracing tests are the single biggest module (1,948 lines, 56 #[test] fns); the rest splits between VM behavior, host routing, shell-op dispatch, and op-by-op exhaustive coverage. Every fused superinstruction has dedicated coverage in fused_ops.rs / slot_and_fused_ops.rs.
| File | Lines | #[test] | Role |
|---|---|---|---|
| tests/jit_trace.rs | 1,948 | 56 | Tracing JIT: header detection, recorder, deopt, frame materialization, abstract-stack reconstruction, side-trace stitching, TraceMetadata round-trip |
| tests/vm_integration.rs | 1,050 | 65 | End-to-end programs: arithmetic, control flow, function calls, scope, higher-order blocks (MapBlock / GrepBlock / SortBlock / ForEachBlock) |
| tests/host_ext_and_more_ops.rs | 907 | 34 | Host trait method coverage, extension dispatch (narrow + wide), default-host fallthroughs |
| tests/edge_cases.rs | 864 | 52 | Edge cases: empty stack, type coercion at op boundaries, undef propagation, divide-by-zero, out-of-range index, jump-target validation |
| tests/shell_ops_with_host.rs | 798 | 45 | Shell-op dispatch with a real ShellHost: pipelines, redirects, here-docs, command substitution, process substitution, traps |
| tests/op_exhaustive_and_vm_lifecycle.rs | 775 | 41 | Per-op smoke tests + VM lifecycle: new / reset / run / VMPool acquire / release |
| tests/shell_op_routing.rs | 669 | 26 | Routing: which shell ops fall through to the host, which terminate in the VM, how WithRedirectsBegin/End scopes are restored on early return |
| tests/host_routing_and_reset.rs | 650 | 53 | VM↔host boundary: shell-host swap mid-run, reset preserves handlers, extension handler replacement |
| tests/slot_and_fused_ops.rs | 585 | 43 | Slot-indexed fast paths + every fused superinstruction (AccumSumLoop, ConcatConstLoop, PushIntRangeLoop, SlotIncLtIntJumpBack, …) |
| tests/stack_arith_misc_ops.rs | 578 | 55 | Stack manipulation (Dup / Dup2 / Swap / Rot) + arithmetic + miscellaneous ops with full coercion matrix |
| tests/jumps_ext_builtins_files.rs | 567 | 46 | Jump targets, extension dispatch, builtin invocation, file-test ops (12 test types via TestFile(u8)) |
| tests/testfile_builtin_dispatch.rs | 542 | 41 | TestFile dispatch matrix: -f, -d, -r, -w, -x, -e, -s, -L, -S, -p, -b, -c |
| tests/functions_vars_stack.rs | 544 | 43 | Call / Return / ReturnValue / PushFrame / PopFrame semantics + slot-vs-name lookup precedence |
| tests/collections_and_concat.rs | 543 | 41 | Array / hash construction (MakeArray / MakeHash), Range / RangeStep, Concat / StringRepeat |
| tests/jit_fuzz.rs | — | diff | Differential fuzz: random valid chunks compared interpreter vs tracing JIT, asserts identical results. Gated behind --features jit. Catches latent recorder / deopt / IR bugs that curated tests miss. |
| TOP 15 MODULES SUBTOTAL | 11,012 | — | 9.2% of 123,238-line test corpus |
@EXECUTION PIPELINE
Frontend → ChunkBuilder → Chunk → VM::run(). The interpreter is the spine. When the jit feature is enabled and tracing is turned on, hot backedges auto-dispatch into Cranelift-compiled native code; type-guard misses deopt back to the same bytecode offset with frame + stack state reconstructed.
Frontend source (.stk / .zsh / .awk)
│
▼
┌─────────────────────────┐
│ Frontend compiler │ stryke: ~450 ext ops
│ (lexer → parser → │ zshrs: ~20 ext ops
│ AST → bytecode) │ awkrs: ~95 ext ops
└────────┬────────────────┘
│ b.emit(Op, line)
▼
┌─────────────────────────┐
│ ChunkBuilder │ add_constant / add_name
│ (src/chunk.rs) │ add_block_range / add_sub_chunk
│ │ patch_jump / build()
└────────┬────────────────┘
│
▼
┌─────────────────────────┐ ┌─────────────────────────┐
│ Chunk │────▶│ VMPool (optional) │
│ • ops: Vec<Op> │ │ acquire / release │
│ • consts: Vec<Value> │ └─────────────────────────┘
│ • names: Vec<String> │
│ • lines: Vec<u32> │
│ • slots: usize │
│ • blocks: Vec<Range> │
│ • subs: Vec<Chunk> │
└────────┬────────────────┘
│ VM::new(chunk)
▼
┌─────────────────────────────────────────────────────────┐
│ VM::run() (src/vm.rs) │
│ match-dispatch over Op │
│ stack + frame slots │
│ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Extension hook │ │
│ │ Op::Extended(id, arg) ──▶ ext_handler │ │
│ │ Op::ExtendedWide(id, payload) ──▶ wide │ │
│ │ Op::CallBuiltin(id, argc) ──▶ builtin tbl │ │
│ └───────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Shell-host hook │ │
│ │ Op::Exec / Pipeline / Redirect / Glob /... │ │
│ │ ──▶ ShellHost::glob / pipeline_begin / ... │ │
│ └───────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Tracing-JIT hot-backedge dispatch │ │
│ │ (feature = "jit", VM::enable_tracing_jit()) │ │
│ │ ──▶ try_run_trace ──▶ native fn ptr │ │
│ │ │ side-exit (type guard miss) │ │
│ │ └──▶ DeoptInfo: resume_ip + frames + │ │
│ │ stack-kind tags ──▶ back to match │ │
│ └───────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
│
▼
VMResult::Ok(Value) | Error(String) | Halted
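The flow in the diagram can be illustrated with a toy model. The sketch below is not fusevm's code: `MiniOp`, `MiniChunk`, `MiniBuilder`, and `run` are hypothetical stand-ins that only borrow the names `emit`, `patch_jump`, and `build` from the text. It shows a forward jump being emitted with a placeholder target, patched once the target is known, and then executed by a single match-dispatch loop.

```rust
// Hypothetical miniature of the builder -> chunk -> run flow.
// None of these types are fusevm's; they only mirror names used in the text.

#[derive(Debug, Clone, Copy, PartialEq)]
enum MiniOp {
    LoadInt(i64),
    Add,
    NumLt,
    JumpIfFalse(usize),
    Jump(usize),
    Halt,
}

#[derive(Debug, Default)]
struct MiniChunk {
    ops: Vec<MiniOp>,
    lines: Vec<u32>,
}

#[derive(Default)]
struct MiniBuilder {
    chunk: MiniChunk,
}

impl MiniBuilder {
    /// Append one op with its source line and return the op's index.
    fn emit(&mut self, op: MiniOp, line: u32) -> usize {
        self.chunk.ops.push(op);
        self.chunk.lines.push(line);
        self.chunk.ops.len() - 1
    }

    /// Point a previously emitted jump at the next op to be emitted.
    fn patch_jump(&mut self, at: usize) {
        let target = self.chunk.ops.len();
        match &mut self.chunk.ops[at] {
            MiniOp::Jump(t) | MiniOp::JumpIfFalse(t) => *t = target,
            other => panic!("op at {at} is not a jump: {other:?}"),
        }
    }

    fn build(self) -> MiniChunk {
        self.chunk
    }
}

/// Match-dispatch loop: one `match` over the op at `ip`.
fn run(chunk: &MiniChunk) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut ip = 0;
    while ip < chunk.ops.len() {
        match chunk.ops[ip] {
            MiniOp::LoadInt(n) => stack.push(n),
            MiniOp::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_add(b));
            }
            MiniOp::NumLt => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push((a < b) as i64);
            }
            MiniOp::JumpIfFalse(t) => {
                if stack.pop()? == 0 {
                    ip = t;
                    continue;
                }
            }
            MiniOp::Jump(t) => {
                ip = t;
                continue;
            }
            MiniOp::Halt => break,
        }
        ip += 1;
    }
    stack.pop()
}

/// Compile `if 1 < 2 { 10 + 5 } else { 99 }` using two forward jumps.
fn compile_example() -> MiniChunk {
    let mut b = MiniBuilder::default();
    b.emit(MiniOp::LoadInt(1), 1);
    b.emit(MiniOp::LoadInt(2), 1);
    b.emit(MiniOp::NumLt, 1);
    let to_else = b.emit(MiniOp::JumpIfFalse(0), 1); // target not known yet
    b.emit(MiniOp::LoadInt(10), 2);
    b.emit(MiniOp::LoadInt(5), 2);
    b.emit(MiniOp::Add, 2);
    let to_end = b.emit(MiniOp::Jump(0), 2); // target not known yet
    b.patch_jump(to_else); // the else branch starts at the next op
    b.emit(MiniOp::LoadInt(99), 3);
    b.patch_jump(to_end); // the join point is the next op
    b.emit(MiniOp::Halt, 4);
    b.build()
}
```

Calling `run(&compile_example())` takes the true branch and leaves the sum of 10 and 5 on the stack; the two placeholder targets end up pointing at the else branch and the join point respectively.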
&OPCODE INVENTORY
201 variants of Op across 20 sections. Each op is a tagged enum case; payload operands are pool indices (u16, 64k names / constants), jump targets (usize), or sub-byte fields encoded via the file_test / redirect_op / param_mod constant modules. Every variant is ≤ 24 bytes for cache-friendly dispatch.
// OPCODE CATEGORIES
Constants (7)
Nop, LoadInt(i64), LoadFloat(f64), LoadConst(u16), LoadTrue, LoadFalse, LoadUndef
Stack (5)
Pop, Dup, Dup2, Swap, Rot
Variables (7)
GetVar / SetVar / DeclareVar (name-pool indexed), GetSlot / SetSlot (slot-indexed fast path), SlotArrayGet / SlotArraySet (slot-resident array indexing — no extra GetSlot)
Arrays (10)
GetArray, SetArray, DeclareArray, ArrayGet, ArraySet, ArrayPush, ArrayPop, ArrayShift, ArrayLen, MakeArray
Hashes (10)
GetHash, SetHash, DeclareHash, HashGet, HashSet, HashDelete, HashExists, HashKeys, HashValues, MakeHash
Arithmetic (9)
Add, Sub, Mul, Div, Mod, Pow, Negate, Inc, Dec — int / float dispatch with wrapping fast path
String (3)
Concat, StringRepeat, StringLen
Numeric Compare (7)
NumEq, NumNe, NumLt, NumGt, NumLe, NumGe, Spaceship (<=> → -1 / 0 / 1)
String Compare (7)
StrEq, StrNe, StrLt, StrGt, StrLe, StrGe, StrCmp
Logical / Bitwise (9)
LogNot, LogAnd, LogOr, BitAnd, BitOr, BitXor, BitNot, Shl, Shr. LogAnd / LogOr evaluate both sides; short-circuit evaluation lives in the JumpIfTrueKeep / JumpIfFalseKeep pair under Control Flow.
Control Flow (5)
Jump, JumpIfTrue, JumpIfFalse, JumpIfTrueKeep (short-circuit ||), JumpIfFalseKeep (short-circuit &&)
Functions (3)
Call(name_idx, argc), Return, ReturnValue
Scope (2)
PushFrame, PopFrame
I/O (3)
Print(n), PrintLn(n), ReadLine
Collections (2)
Range ([from, to] → array), RangeStep ([from, to, step] → array)
Higher-Order (5)
MapBlock(idx), GrepBlock(idx), SortBlock(idx), SortDefault, ForEachBlock(idx) — idx resolves to a block range inside the chunk
Fused Superinstructions (8)
PreIncSlot, SlotLtIntJumpIfFalse, SlotIncLtIntJumpBack, AccumSumLoop, ConcatConstLoop, PushIntRangeLoop, AddAssignSlotVoid, PreIncSlotVoid — see next section
Builtins (1)
CallBuiltin(id: u16, argc: u8) — routes to the handler registered by VM::register_builtin(id, handler). Builtin IDs come from src/shell_builtins.rs (140 stable constants).
Extension Point (2)
Extended(u16, u8) for narrow ops (inline byte operand), ExtendedWide(u16, usize) for wide ops (jump targets, large indices). The frontend registers a narrow handler taking (&mut VM, u16, u8) via set_extension_handler and a wide handler, which receives the usize payload in place of the u8, via set_extension_wide_handler.
Shell Ops (29)
Exec, ExecBg, PipelineBegin / Stage / End, Redirect(fd, op), HereDoc, HereString, CmdSubst, SubshellBegin / End, ProcessSubIn / Out, Glob, GlobRecursive, TestFile(u8), SetStatus / GetStatus, TrapSet / TrapCheck, ExpandParam(u8), WordSplit, BraceExpand, TildeExpand, CallFunction(name, argc), StrMatch, RegexMatch, WithRedirectsBegin / End
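The split between the eager LogAnd and the short-circuit JumpIfFalseKeep can be made concrete with a small model. This is a sketch under stated assumptions, not fusevm's implementation: the `Op` enum here is a hypothetical four-op subset, `Operand` is an invented op that records which operands were evaluated, and `JumpIfFalseKeep` is assumed to peek at the top of the stack without popping it, with the compiler emitting an explicit `Pop` on the fall-through path.

```rust
// Hypothetical four-op subset; only the op names LogAnd, Pop and
// JumpIfFalseKeep come from the text, their exact semantics are assumed.
#[derive(Clone, Copy)]
enum Op {
    /// Push a boolean operand and record that operand `id` was evaluated.
    Operand(u8, bool),
    Pop,
    /// Assumed: peek the top of stack and jump if it is false; never pops.
    JumpIfFalseKeep(usize),
    /// Eager AND: pops two operands that were both already evaluated.
    LogAnd,
}

/// Runs the ops and returns (result, ids of operands that were evaluated).
fn run(ops: &[Op]) -> (bool, Vec<u8>) {
    let mut stack = Vec::new();
    let mut evaluated = Vec::new();
    let mut ip = 0;
    while ip < ops.len() {
        match ops[ip] {
            Op::Operand(id, v) => {
                evaluated.push(id);
                stack.push(v);
            }
            Op::Pop => {
                stack.pop();
            }
            Op::JumpIfFalseKeep(target) => {
                if !*stack.last().expect("operand on stack") {
                    ip = target;
                    continue;
                }
            }
            Op::LogAnd => {
                let b = stack.pop().expect("rhs");
                let a = stack.pop().expect("lhs");
                stack.push(a && b);
            }
        }
        ip += 1;
    }
    (stack.pop().expect("result"), evaluated)
}

/// `a AND b` with both sides always evaluated.
fn eager_and(a: bool, b: bool) -> Vec<Op> {
    vec![Op::Operand(0, a), Op::Operand(1, b), Op::LogAnd]
}

/// `a && b` where `b` is skipped when `a` is false.
fn short_circuit_and(a: bool, b: bool) -> Vec<Op> {
    vec![
        Op::Operand(0, a),
        Op::JumpIfFalseKeep(4), // a is false: keep it as the result, skip b
        Op::Pop,                // a is true: discard it, the result is b
        Op::Operand(1, b),
    ]
}
```

With a false left operand, `run(&eager_and(false, true))` reports both operands as evaluated, while `run(&short_circuit_and(false, true))` reports only the first; both yield `false`.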
// FUSED SUPERINSTRUCTIONS
The main performance lever. The compiler detects hot loop patterns and emits a single op instead of a multi-op sequence. A fused op that replaces an N-op sequence removes N−1 dispatches from the hot path, together with the intermediate stack pushes and the branch mispredictions those dispatches incur.
| Fused Op | Replaces | Effect |
|---|---|---|
| AccumSumLoop(sum, i, limit) | GetSlot + GetSlot + Add + SetSlot + PreInc + NumLt + JumpIfFalse | Entire counted sum loop in one dispatch |
| SlotIncLtIntJumpBack(slot, limit, target) | PreIncSlot + SlotLtIntJumpIfFalse | Loop backedge in one dispatch |
| ConcatConstLoop(const, s, i, limit) | LoadConst + ConcatAppendSlot + SlotIncLtIntJumpBack | String-append loop in one dispatch |
| PushIntRangeLoop(arr, i, limit) | GetSlot + PushArray + ArrayLen + Pop + SlotIncLtIntJumpBack | Array push loop in one dispatch |
| AddAssignSlotVoid(a, b) | GetSlot + GetSlot + Add + SetSlot | Void-context add-assign, no stack traffic |
| PreIncSlotVoid(slot) | GetSlot + Inc + SetSlot | Void-context increment, no stack traffic |
| SlotLtIntJumpIfFalse(slot, int, target) | GetSlot + LoadInt + NumLt + JumpIfFalse | Fused compare + branch, no stack traffic |
| PreIncSlot(slot) | GetSlot + Inc + SetSlot + GetSlot | Slot pre-increment with push |
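To see what "one dispatch" buys, the sketch below runs the same counted sum twice in a toy interpreter and counts trips through the dispatch loop. It is a hypothetical model: the op names are taken from the table, but the operand layouts, the loop shape of the unfused program, and the semantics of `AccumSumLoop` (sum every `i` below `limit`) are assumptions made for illustration.

```rust
// Toy interpreter that counts dispatches. Op names follow the table above;
// operand layouts and exact semantics are assumptions of this sketch.
const SUM: usize = 0;
const I: usize = 1;

#[derive(Clone, Copy)]
enum Op {
    GetSlot(usize),
    SetSlot(usize),
    Add,
    PreIncSlotVoid(usize),
    SlotLtIntJumpIfFalse(usize, i64, usize),
    Jump(usize),
    AccumSumLoop(usize, usize, i64),
}

/// Executes `ops` against `slots` and returns the number of dispatches.
fn run(ops: &[Op], slots: &mut [i64]) -> u64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut dispatches = 0u64;
    let mut ip = 0;
    while ip < ops.len() {
        dispatches += 1;
        match ops[ip] {
            Op::GetSlot(s) => stack.push(slots[s]),
            Op::SetSlot(s) => slots[s] = stack.pop().expect("value"),
            Op::Add => {
                let b = stack.pop().expect("rhs");
                let a = stack.pop().expect("lhs");
                stack.push(a.wrapping_add(b));
            }
            Op::PreIncSlotVoid(s) => slots[s] = slots[s].wrapping_add(1),
            Op::SlotLtIntJumpIfFalse(s, limit, target) => {
                if slots[s] >= limit {
                    ip = target;
                    continue;
                }
            }
            Op::Jump(target) => {
                ip = target;
                continue;
            }
            // The whole loop runs inside a single dispatch.
            Op::AccumSumLoop(sum, i, limit) => {
                while slots[i] < limit {
                    slots[sum] = slots[sum].wrapping_add(slots[i]);
                    slots[i] += 1;
                }
            }
        }
        ip += 1;
    }
    dispatches
}

/// `while i < limit { sum += i; i += 1 }` as seven separate ops.
fn unfused_sum(limit: i64) -> Vec<Op> {
    vec![
        Op::SlotLtIntJumpIfFalse(I, limit, 7),
        Op::GetSlot(SUM),
        Op::GetSlot(I),
        Op::Add,
        Op::SetSlot(SUM),
        Op::PreIncSlotVoid(I),
        Op::Jump(0),
    ]
}

/// The same loop as a single fused op.
fn fused_sum(limit: i64) -> Vec<Op> {
    vec![Op::AccumSumLoop(SUM, I, limit)]
}
```

For a limit of 100 the unfused program goes through the dispatch loop seven times per iteration plus one final failing check, whereas the fused program is dispatched once; both leave identical slot contents.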
!VALUE SYSTEM
Every value in the VM is a Value. 10-variant enum, designed to stay cache-friendly (small discriminant + 1–2 word payload). Frontends convert their native types to / from Value at the boundary; the dispatch loop only sees this shape.
| Variant | Payload | Used For |
|---|---|---|
| Undef | none | Uninitialized / no value; the Default for Value |
| Bool | bool | Conditionals, [[ ]] tests, StrMatch / RegexMatch results |
| Int | i64 | Numeric scalars; fast path for all Op::Add / Sub / Mul / Div |
| Float | f64 | IEEE-754 scalars, mixed-type arith with promotion |
| Str | Arc<String> | Heap-allocated string; Arc for cheap clone in closures and across pipeline stages |
| Array | Vec<Value> | Ordered array; in-place mutation on slot-resident arrays via SlotArraySet |
| Hash | HashMap<String, Value> | Key-value associative array |
| Status | i32 | Exit status code (shell-specific but universal enough that every frontend can produce one) |
| Ref | Box<Value> | Pass-by-reference, nested structures, AST-style sharing |
| NativeFn | u16 | Native function pointer (builtin dispatch ID); allows first-class function values without trait objects |
Coercion API kept allocation-light: to_int, to_float, to_str (owned String), as_str_cow (borrowed Cow<str> for the hot path where the value is already a string), is_truthy (Perl-style: 0 / "" / "0" / Undef → false), len, is_empty. Constructor shortcuts: Value::int(n), Value::float(f), Value::str(s), Value::bool(b), Value::array(v), Value::hash(m), Value::status(code).
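The allocation-light idea behind as_str_cow and the Perl-style truthiness rule can be sketched on a cut-down enum. This is an illustrative reimplementation, not the crate's source: it covers only five of the ten variants, and the string forms chosen for Undef and Bool are assumptions.

```rust
use std::borrow::Cow;
use std::sync::Arc;

// Cut-down, hypothetical version of the Value enum described above.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Undef,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(Arc<String>),
}

impl Value {
    /// Constructor shortcut in the style of `Value::str(s)`.
    fn str(s: &str) -> Value {
        Value::Str(Arc::new(s.to_string()))
    }

    /// Borrows when the value is already a string; allocates only for numbers.
    /// The string forms of Undef and Bool are assumptions of this sketch.
    fn as_str_cow(&self) -> Cow<'_, str> {
        match self {
            Value::Str(s) => Cow::Borrowed(s.as_str()),
            Value::Undef => Cow::Borrowed(""),
            Value::Bool(b) => Cow::Borrowed(if *b { "1" } else { "" }),
            Value::Int(n) => Cow::Owned(n.to_string()),
            Value::Float(f) => Cow::Owned(f.to_string()),
        }
    }

    /// Perl-style truthiness: 0, "", "0" and Undef are false.
    fn is_truthy(&self) -> bool {
        match self {
            Value::Undef => false,
            Value::Bool(b) => *b,
            Value::Int(n) => *n != 0,
            Value::Float(f) => *f != 0.0,
            Value::Str(s) => !(s.is_empty() || s.as_str() == "0"),
        }
    }
}
```

The hot path benefits because a `Value::Str` never copies its bytes to be read as a string, and cloning one only bumps the Arc reference count.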
~CRANELIFT JIT — 3 TIERS
All three tiers share Cranelift 0.130 (same IR backend as Wasmtime) behind the jit feature flag. Same Value shape across interpreter and JIT, so deopt is a frame swap, not a re-marshal. TraceJitConfig exposes every threshold; JitCompiler::set_config(…) applies it to subsequent calls from the current thread.
Tier 1 — Linear JIT (instant)
compile_linear(chunk: &Chunk) -> Option<CompiledLinear>. Compiles straight-line bytecode on first call — no warmup, no profile, no CFG. Use case: tiny chunks where any interpreter overhead is the bottleneck. Falls back to interpreter on any unsupported op.
Tier 2 — Block JIT (CFG, threshold 10)
compile_block(chunk: &Chunk) -> Option<CompiledBlock>. Whole-chunk control-flow graph compilation. Triggered after a chunk's hot_count crosses 10 invocations. Better steady-state than linear for non-trivial control flow.
Tier 3 — Tracing JIT (loop body, threshold 50)
Hot-backedge detection: the target of every backward branch is a candidate loop header. When a header's hot_count crosses trace_threshold (default 50), the recorder runs one iteration, captures the linear trace, and lowers it to Cranelift IR with type guards at every op boundary. Side-exits deopt back to the interpreter.
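A minimal sketch of the hot-counter bookkeeping, assuming a per-header counter keyed by instruction index. `HotCounters` and `on_jump` are hypothetical names, and the choice to fire exactly once, on the hit that reaches the threshold, is an assumption about what "crosses" means.

```rust
use std::collections::HashMap;

/// Default trace threshold quoted in the text.
const TRACE_THRESHOLD: u32 = 50;

/// Hypothetical per-chunk table of loop-header hit counts.
#[derive(Default)]
struct HotCounters {
    /// Loop-header ip -> number of backedges that have landed on it.
    hits: HashMap<usize, u32>,
}

impl HotCounters {
    /// Called on every taken jump. Returns true on the one hit where the
    /// header reaches the threshold, which is when recording would start.
    fn on_jump(&mut self, from_ip: usize, target_ip: usize) -> bool {
        if target_ip > from_ip {
            return false; // forward branch: not a loop backedge
        }
        let count = self.hits.entry(target_ip).or_insert(0);
        *count += 1;
        *count == TRACE_THRESHOLD
    }
}
```

Forward branches never touch the table, so straight-line code and if/else chains pay nothing beyond the comparison.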
Cross-Call Inlining
Phase 2: tracing inlines through Call for callees that are branchless within the inlined window. Phase 8 bumps it: bounded recursion to depth ≤ 4 (max_inline_recursion). Recursive calls past 4 abort the trace.
Caller- and Callee-Frame Branches
Phase 3: caller-frame if / else with side-exits. Phase 4: callee-frame branches with frame materialization. The trace emits DeoptFrame records (caller→callee order) for every inlined frame; on side-exit the VM rebuilds vm.frames to match what the bytecode would naturally have at the deopt IP. Capacity: MAX_DEOPT_FRAMES = 4, MAX_DEOPT_SLOTS_PER_FRAME = 16.
Abstract-Stack Reconstruction
Phases 5 + 5b: the trace tracks the abstract value stack and writes (kind, value) pairs into DeoptInfo.stack_buf on side-exit. Capacity: MAX_DEOPT_STACK = 32 entries. Tags: STACK_KIND_INT (0) for Value::Int(i64), STACK_KIND_FLOAT (1) for Value::Float(f64). The VM pushes them onto the live stack before resuming at resume_ip.
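The (kind, value) scheme can be sketched as a fixed-size buffer of tagged 64-bit words. The constant names and values come from the text; everything else is assumed for illustration, in particular the `DeoptStack` type and the choice to carry floats as their raw IEEE-754 bit pattern.

```rust
// Tag values and capacity as quoted in the text.
const STACK_KIND_INT: u8 = 0;
const STACK_KIND_FLOAT: u8 = 1;
const MAX_DEOPT_STACK: usize = 32;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
}

/// Hypothetical stand-in for the stack portion of DeoptInfo.
struct DeoptStack {
    buf: [(u8, u64); MAX_DEOPT_STACK],
    len: usize,
}

impl DeoptStack {
    fn new() -> Self {
        DeoptStack { buf: [(0, 0); MAX_DEOPT_STACK], len: 0 }
    }

    /// What compiled code would do on a side-exit: spill one entry.
    /// Floats are stored as raw bits (an assumption of this sketch).
    fn spill(&mut self, v: Value) -> Result<(), &'static str> {
        if self.len == MAX_DEOPT_STACK {
            return Err("deopt stack full");
        }
        self.buf[self.len] = match v {
            Value::Int(n) => (STACK_KIND_INT, n as u64),
            Value::Float(f) => (STACK_KIND_FLOAT, f.to_bits()),
        };
        self.len += 1;
        Ok(())
    }

    /// What the interpreter would do before resuming: rebuild typed values
    /// and push them onto the live stack in their original order.
    fn materialize(&self, live_stack: &mut Vec<Value>) -> Result<(), &'static str> {
        for &(kind, raw) in &self.buf[..self.len] {
            live_stack.push(match kind {
                STACK_KIND_INT => Value::Int(raw as i64),
                STACK_KIND_FLOAT => Value::Float(f64::from_bits(raw)),
                _ => return Err("unknown stack kind"),
            });
        }
        Ok(())
    }
}
```

A fixed-size buffer keeps the side-exit path free of allocation, at the cost of refusing entries once the cap of 32 is reached.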
Side-Exit Counter + Auto-Blacklist
Phase 6: every side-exit bumps entry.side_exit_count. When it crosses max_side_exits (default 50) the trace is auto-blacklisted — future invocations skip it and stay in the interpreter. Prevents pathological retry loops.
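The backstop is simple enough to sketch directly. `TraceEntry` here is a hypothetical struct that borrows the field name side_exit_count from the text; whether the real cutoff is "greater than" or "greater than or equal to" max_side_exits is not stated, so the strict comparison below is an assumption.

```rust
/// Default cap quoted in the text.
const MAX_SIDE_EXITS: u32 = 50;

/// Hypothetical per-trace bookkeeping record.
#[derive(Default)]
struct TraceEntry {
    side_exit_count: u32,
    blacklisted: bool,
}

impl TraceEntry {
    /// The dispatcher would consult this before entering native code.
    fn should_run(&self) -> bool {
        !self.blacklisted
    }

    /// Called each time the trace bails out through a failed guard.
    /// Assumed: the trace is blacklisted once the count exceeds the cap.
    fn record_side_exit(&mut self) {
        self.side_exit_count += 1;
        if self.side_exit_count > MAX_SIDE_EXITS {
            self.blacklisted = true;
        }
    }
}
```

Once `should_run` returns false the loop simply keeps executing in the interpreter, which bounds the total cost of a trace whose guards never stabilize.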
Persistent TraceMetadata
Phase 7: JitCompiler::export_trace_metadata() serializes all known traces; import_trace_metadata(…) warms a fresh compiler. Embedders that re-run the same script repeatedly skip the recorder warm-up after the first run.
Side-Trace Stitching
Phase 9: when a side-exit fires often enough to qualify as its own hot site, the JIT records a side trace starting from that deopt IP and stitches it to the parent. max_trace_chain = 4 caps the chain depth.
Trace-Length Cap
max_trace_len = 256 (default). Recording aborts past 256 ops — long traces underperform shorter, retypeable ones. Tunable per workload.
%EXTENSION MECHANISM
Universal ops live in the Op enum. Language-specific ops are dispatched through frontend-registered handler tables. Each frontend owns its own ID space — stryke's op 42 and zshrs's op 42 don't collide because the handlers run in different VM instances.
Narrow: Extended(u16, u8)
16-bit op ID + 8-bit inline operand. Common case: a frontend op that fits in one byte of payload (flag bit, enum tag, small index). Registered via VM::set_extension_handler(Box::new(|vm, id, arg| {…})).
Wide: ExtendedWide(u16, usize)
16-bit op ID + usize payload. For jump targets, large indices, or anything that won't fit in a byte. Registered via VM::set_extension_wide_handler(…).
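A rough sketch of how narrow and wide handlers could be wired into a dispatch loop. `MiniVm` is a hypothetical stand-in, not fusevm's VM: the method names set_extension_handler and set_extension_wide_handler are taken from the text, while the handler signatures, the take-and-restore trick used to call a handler stored on the VM itself, and the two example op IDs are assumptions.

```rust
// Assumed handler shapes: narrow gets a u8 operand, wide gets a usize payload.
type NarrowHandler = Box<dyn Fn(&mut MiniVm, u16, u8)>;
type WideHandler = Box<dyn Fn(&mut MiniVm, u16, usize)>;

#[derive(Clone, Copy)]
enum Op {
    LoadInt(i64),
    Extended(u16, u8),
    ExtendedWide(u16, usize),
}

#[derive(Default)]
struct MiniVm {
    stack: Vec<i64>,
    ip: usize,
    ext: Option<NarrowHandler>,
    ext_wide: Option<WideHandler>,
}

impl MiniVm {
    fn set_extension_handler(&mut self, h: NarrowHandler) {
        self.ext = Some(h);
    }

    fn set_extension_wide_handler(&mut self, h: WideHandler) {
        self.ext_wide = Some(h);
    }

    fn run(&mut self, ops: &[Op]) {
        self.ip = 0;
        while self.ip < ops.len() {
            let op = ops[self.ip];
            self.ip += 1; // handlers may overwrite ip to jump
            match op {
                Op::LoadInt(n) => self.stack.push(n),
                Op::Extended(id, arg) => {
                    // Take the handler out so it can borrow the VM mutably.
                    let h = self.ext.take().expect("no narrow handler");
                    h(self, id, arg);
                    self.ext = Some(h);
                }
                Op::ExtendedWide(id, payload) => {
                    let h = self.ext_wide.take().expect("no wide handler");
                    h(self, id, payload);
                    self.ext_wide = Some(h);
                }
            }
        }
    }
}

/// A made-up frontend: ext op 1 scales the top of stack, wide op 2 jumps.
fn demo() -> Vec<i64> {
    let mut vm = MiniVm::default();
    vm.set_extension_handler(Box::new(|vm: &mut MiniVm, id: u16, arg: u8| match id {
        1 => {
            let top = vm.stack.last_mut().expect("operand");
            *top *= i64::from(arg);
        }
        _ => panic!("unknown narrow ext op {id}"),
    }));
    vm.set_extension_wide_handler(Box::new(|vm: &mut MiniVm, id: u16, target: usize| match id {
        2 => vm.ip = target,
        _ => panic!("unknown wide ext op {id}"),
    }));
    let ops = [
        Op::LoadInt(6),
        Op::Extended(1, 7),
        Op::ExtendedWide(2, 4), // jump past the next op
        Op::LoadInt(-1),
    ];
    vm.run(&ops);
    vm.stack
}
```

In `demo`, the narrow op multiplies 6 by its inline operand 7 and the wide op uses its payload as a jump target, so the trailing `LoadInt(-1)` never executes.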
Builtin Dispatch: CallBuiltin(u16, u8)
Universal call into a registered builtin by stable u16 ID. Frontend registers handlers via VM::register_builtin(id, handler). IDs come from shell_builtins (140 reserved constants) or the frontend's own space — the table is per-VM, no global registry.
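The "per-VM table, no global registry" point can be illustrated with two toy VMs that bind the same ID to different handlers. This is a sketch under assumptions: `MiniVm`, the handler signature, and the numeric value given to `BUILTIN_ECHO` are all hypothetical; only the method name register_builtin and the (id, argc) calling shape come from the text.

```rust
use std::collections::HashMap;

// Illustrative ID in the style of the BUILTIN_* constants; the value is made up.
const BUILTIN_ECHO: u16 = 2;

/// Assumed handler shape: arguments in, output buffer, exit status out.
type Builtin = Box<dyn Fn(&[String], &mut String) -> i32>;

#[derive(Default)]
struct MiniVm {
    stack: Vec<String>,
    out: String,
    builtins: HashMap<u16, Builtin>,
}

impl MiniVm {
    fn register_builtin(&mut self, id: u16, handler: Builtin) {
        self.builtins.insert(id, handler);
    }

    /// Models CallBuiltin(id, argc): pop argc arguments, call the handler.
    /// Returns None if the ID is unregistered or the stack is too short.
    fn call_builtin(&mut self, id: u16, argc: u8) -> Option<i32> {
        let handler = self.builtins.get(&id)?;
        let split = self.stack.len().checked_sub(argc as usize)?;
        let args = self.stack.split_off(split);
        Some(handler(&args, &mut self.out))
    }
}

/// A shell-flavoured echo: joins arguments with spaces.
fn shell_echo() -> Builtin {
    Box::new(|args: &[String], out: &mut String| {
        out.push_str(&args.join(" "));
        out.push('\n');
        0
    })
}

/// A different frontend binding the same ID to its own behaviour.
fn loud_echo() -> Builtin {
    Box::new(|args: &[String], out: &mut String| {
        out.push_str(&args.join(" ").to_uppercase());
        out.push('\n');
        0
    })
}
```

Because each VM owns its table, registering `loud_echo` under `BUILTIN_ECHO` in one instance has no effect on another instance that registered `shell_echo` under the same ID.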
Shell Host Dispatch
Shell ops in the Op enum route to a Box<dyn ShellHost> set via VM::set_shell_host(…). DefaultHost ships sensible no-ops for frontends that need shell-op syntax (regex match, file tests) without needing real process control.
Builtin ID Ranges
Conventional partitioning in shell_builtins.rs: 0–19 core (cd, pwd, echo, …), 20–29 typeset, 30–39 I/O, 40–49 loop control, 50+ frontend-specific. Frontends can claim unused slots without coordinating — the type system enforces nothing, but the comment-banded ranges keep the convention readable.
Hooks Without Wrappers
Every extension hook is a Box<dyn Fn(&mut VM, …)>. No newtype, no trait object hierarchy. The frontend writes one closure per op or one big match — both shapes are equally fast under the dispatch loop.
^SHELL HOST TRAIT
ShellHost: Send — the boundary between the VM and the host's process-control surface. Every shell op routes through one of these methods; DefaultHost ships no-op defaults so a non-shell frontend can ignore them.
Expansion
glob(pattern, recursive), tilde_expand(s), brace_expand(s), word_split(s), expand_param(name, modifier, args) (18 modifier types via param_mod), array_index(name, idx)
Substitution
cmd_subst(sub: &Chunk) -> String, process_sub_in(sub) -> String (returns FIFO path), process_sub_out(sub) -> String
Redirection
redirect(fd, op, target) (9 op types via redirect_op), heredoc(content), herestring(content), with_redirects_begin(count), with_redirects_end() — scoped redirection blocks restore fd state on early return
Pipelines + Subshells
pipeline_begin(n), pipeline_stage(), pipeline_end() -> i32, subshell_begin(), subshell_end() -> Option<i32> (Some(status) propagates a deferred subshell exit into the parent VM’s last_status)
Traps
trap_set(sig, handler: &Chunk), trap_check() — the compiler inserts TrapCheck between ops; the host decides which signals deliver and runs the registered Chunk
Execution
call_function(name, args) -> Option<i32> (user-defined function lookup), exec(args) -> i32, exec_bg(args) -> i32
Matching
str_match(s, pat) -> bool (glob-pattern match for [[ x = pat ]] and case arms), regex_match(s, regex) -> bool (=~)
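The default-method pattern behind DefaultHost can be sketched with a four-method trait. `MiniHost` is a hypothetical miniature of the trait described above; the method names mirror the text, but the signatures and the specific default return values (such as status 127 from `exec`) are assumptions, not the crate's behaviour.

```rust
/// Hypothetical miniature of a shell-host trait with default methods.
trait MiniHost: Send {
    /// Assumed default: no filesystem access, return the pattern unchanged.
    fn glob(&mut self, pattern: &str, _recursive: bool) -> Vec<String> {
        vec![pattern.to_string()]
    }

    /// Assumed default: no expansion.
    fn tilde_expand(&mut self, s: &str) -> String {
        s.to_string()
    }

    /// Assumed default: nothing can be executed, report "command not found".
    fn exec(&mut self, _args: &[String]) -> i32 {
        127
    }

    /// Assumed default: literal comparison instead of glob matching.
    fn str_match(&mut self, s: &str, pat: &str) -> bool {
        s == pat
    }
}

/// A host that overrides nothing.
struct DefaultHost;
impl MiniHost for DefaultHost {}

/// A host that overrides only the one method it cares about.
struct HomeHost {
    home: String,
}

impl MiniHost for HomeHost {
    fn tilde_expand(&mut self, s: &str) -> String {
        match s.strip_prefix('~') {
            Some(rest) => format!("{}{}", self.home, rest),
            None => s.to_string(),
        }
    }
}

/// Toy VM that routes a shell op to whichever host is installed.
struct MiniVm {
    host: Box<dyn MiniHost>,
}

impl MiniVm {
    fn set_shell_host(&mut self, host: Box<dyn MiniHost>) {
        self.host = host;
    }

    fn op_tilde_expand(&mut self, s: &str) -> String {
        self.host.tilde_expand(s)
    }
}
```

A frontend with no shell ambitions installs `DefaultHost` and writes no method bodies at all; one that needs a single behaviour, as `HomeHost` does here, overrides just that method.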
*BENCHMARKS
5 Criterion harnesses under benches/. Two are interpreter-only (run without features); three require --features jit. HTML reports via Criterion's built-in renderer (target/criterion/report/index.html).
| Bench | LOC | Requires | Measures |
|---|---|---|---|
| benches/vm_bench.rs | 560 | — | Core interpreter throughput: arithmetic, control flow, function calls, scope, collections — the baseline every JIT tier is measured against |
| benches/classic.rs | 471 | — | Classic interpreter workloads: fibonacci, sum-N, array push loops, string concat loops — the inputs every fused superinstruction was designed for |
| benches/jit_vs_interp.rs | 247 | jit | Head-to-head: same chunk through pure interpreter vs JIT-enabled VM. Measures the speedup the JIT actually delivers, per workload |
| benches/jit_trace.rs | 216 | jit | Tracing-specific: hot-loop trace latency, deopt cost, side-trace stitching overhead, TraceMetadata import speedup |
| benches/jit_crossover.rs | 136 | jit | Crossover: the chunk size / hot-count where JIT compile + execution starts to beat pure interpretation. Calibrates the trace_threshold default. |
| TOTAL | 1,789 | — | Run with cargo bench (interpreter benches) or cargo bench --features jit (all) |
+DEPENDENCIES
Intentionally minimal. 3 always-on runtime dependencies + 5 optional Cranelift crates (gated behind jit) + 3 dev dependencies. Every crate is foundational — serde, tracing, glob, the Cranelift family, criterion — chosen to survive a 2030+ rebuild without churn.
| Crate | Version | Role | Gating |
|---|---|---|---|
| serde | 1 | Derive macros (derive, rc) for Op, Value, Chunk; enables bytecode caching and TraceMetadata export/import | always |
| tracing | 0.1 | Structured logging for diagnostic events; tracing::debug! at every JIT compile, deopt, and side-exit | always |
| glob | 0.3 | Shell-glob pattern matching for DefaultHost::glob and StrMatch | always |
| cranelift-jit | 0.130 | Cranelift JIT memory allocator + symbol resolver | jit |
| cranelift-codegen | 0.130 | IR → machine code (x86-64 + aarch64) | jit |
| cranelift-frontend | 0.130 | FunctionBuilder: builds IR from bytecode | jit |
| cranelift-native | 0.130 | ISA target detection at runtime | jit |
| cranelift-module | 0.130 | Module + linker abstraction for JIT-emitted code | jit |
| serde_json | 1 | Round-trip tests for Chunk / Op / Value serialization | dev |
| bincode | 1 | Compact binary serialization round-trip tests | dev |
| criterion | 0.5 | Statistical benchmarking + HTML reports | dev |
No rand, no tokio, no parking_lot, no regex (frontends bring their own), no libc — the VM core stays pure Rust. JIT is a single feature flag; interpreter-only builds skip the entire Cranelift toolchain (~1M LOC of transitive C / Rust deps).
;PUBLIC API SURFACE
fusevm ships as an embeddable Rust crate (cargo add fusevm, optionally --features jit). The re-export set in src/lib.rs is the entire public surface; everything else is implementation detail.
| Surface | Count | Notes |
|---|---|---|
| Public modules | 9 | awk_builtins, awk_host, chunk, host, jit, op, shell_builtins, value, vm |
| Re-exported types | 19 | Chunk, ChunkBuilder, DefaultHost, ShellHost, Op, Value, VM, VMPool, VMResult, Frame, JitCompiler, JitExtension, NativeCode, SlotKind, TraceJitConfig, TraceLookup, TraceMetadata, DeoptFrame, DeoptInfo |
| Public functions (across modules) | 124 | Includes constructors, builders, JIT entry points, host trait methods. jit.rs alone exposes 66 pub fn. |
| Builtin ID constants | 140 | Stable BUILTIN_*: u16 values in shell_builtins.rs — frontends register handlers against these |
| Op variants | 201 | The contract every frontend compiles against |
| Feature flags | 2 | jit: enables the entire Cranelift family + the three JIT tiers. jit-disk-cache: persists compiled native code to ~/.cache/fusevm-jit so codegen is skipped across process restarts (implies jit; caching is on by default once the feature is enabled) |
?KEY DESIGN DECISIONS
Why fusevm looks the way it does. Each call-out is a decision the implementation could have gone either way on, with the rationale for the path taken.
Language-Agnostic by Construction
Every other embeddable VM in the reference table grew its bytecode as the runtime for one language. fusevm starts from the opposite end: the Op enum is the spec; the frontends register against it. The result is three live frontends (stryke / zshrs / awkrs) sharing one JIT, one fused-loop table, one deopt path. Every perf improvement compounds across all of them.
Bytecode, Not Tree-Walker
An AST interpreter is simpler but pays virtual-call cost per node. Bytecode collapses dispatch into a tight match loop the branch predictor can warm up to. Same call-out as strykelang — same execution model.
Cranelift, Not LLVM
LLVM gives slightly better steady-state code but compiles 10× slower and pulls in a giant C++ dependency. Cranelift is pure Rust, fast to compile, and production-tested in Wasmtime. Sharing a backend with Wasmtime is itself a durability argument: the code generator has been bug-fixed against a huge surface area.
Three Tiers, Auto-Dispatched
Linear catches the "first call, want native immediately" case. Block catches the "warm chunk, want CFG-aware code" case. Tracing catches the "tight inner loop, want type-specialized native" case. Auto-dispatched from VM::run(): callers don't choose a tier; the VM picks based on observed hot-counts and promotes code to the higher tiers as warm-up data accumulates.
Fused Superinstructions Over Generic Inliner
The fused-op table (AccumSumLoop, ConcatConstLoop, PushIntRangeLoop, …) is hand-curated against measured hot patterns. A general inliner would catch more but cost more compile time and code-cache pressure. Eight fused ops absorb the dominant hot loops every frontend produces; the rest stays in the dispatch loop.
Side-Exits Over Recompile-On-Type-Drift
The tracing JIT chooses to deopt on type-guard miss rather than recompile with a new shape. Recompile pays per-shape compile time forever; deopt pays once per anomalous iteration and goes back to the interpreter. With max_side_exits as a backstop, the trace either stabilizes or auto-blacklists.
Persistent TraceMetadata + native disk cache
For embedders that run the same script repeatedly (CI scripts, REPL re-evals), recorder warmup is wasted on every run. export_trace_metadata / import_trace_metadata serializes the recorded set so cold start picks up where warm shutdown left off. The jit-disk-cache feature goes further: it persists the finished native code for all three tiers to ~/.cache/fusevm-jit (on by default once enabled), so restarts skip Cranelift codegen entirely — a cached block load is ~35 µs vs ~152 µs cold.
Pool, Not Allocator Trick
VMPool reuses VM instances across script runs so the allocator isn't asked to rebuild the same frame buffers, slot vectors, and handler tables on every invocation. No jemalloc, no mimalloc, no unsafe — a plain pool that frontends acquire / release per script.
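The pooling idea needs nothing beyond a Vec of idle instances. The sketch below is a hypothetical model, not the crate's VMPool: `MiniVm`, `MiniPool`, and the `created` counter are invented for illustration, and only the acquire / release vocabulary comes from the text.

```rust
/// Toy VM whose buffers are worth keeping between runs.
#[derive(Default)]
struct MiniVm {
    stack: Vec<i64>,
    slots: Vec<i64>,
}

impl MiniVm {
    /// Clears state but keeps the allocated capacity for the next script.
    fn reset(&mut self) {
        self.stack.clear();
        self.slots.clear();
    }
}

/// Hypothetical pool: a plain Vec of idle VMs, no unsafe, no custom allocator.
#[derive(Default)]
struct MiniPool {
    idle: Vec<MiniVm>,
    /// Number of VMs constructed from scratch (for demonstration only).
    created: usize,
}

impl MiniPool {
    /// Hands out an idle VM if one exists, otherwise builds a fresh one.
    fn acquire(&mut self) -> MiniVm {
        match self.idle.pop() {
            Some(vm) => vm,
            None => {
                self.created += 1;
                MiniVm::default()
            }
        }
    }

    /// Resets the VM and parks it for the next caller.
    fn release(&mut self, mut vm: MiniVm) {
        vm.reset();
        self.idle.push(vm);
    }
}
```

Since `Vec::clear` keeps capacity, a VM that grew its stack during one script hands that already-sized buffer to the next script that acquires it.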
Shell Ops as First-Class Variants
Pipelines, redirects, here-docs, glob, file tests are common enough across stryke / zshrs / awkrs to live in the universal Op enum rather than as ext ops. The cost: extra variants in the match loop. The win: every frontend gets them with zero registration, and the JIT can specialize them.
Minimal Always-On Deps, Cranelift Opt-In
Three always-on crates: serde, tracing, glob. Cranelift is opt-in via the jit feature. No regex, no tokio, no parking_lot. Frontends bring their own. Result: interpreter-only fusevm builds in seconds with a 4-crate dep graph.