// STRYKE-SPARK — ENGINEERING REPORT

Opt-in stryke connector package · helper binary stryke-spark-helper (publish=false) · CLI launcher spark · NDJSON-over-pipe protocol

>_EXECUTIVE SUMMARY

stryke-spark is one of the 14 opt-in connector packages in the stryke ecosystem. It is an Apache Spark client for stryke that targets Spark Connect (Spark 3.4 and later) over gRPC, so no JVM is required on the client side.

Spark Connect splits driver from client; the client speaks gRPC + protobuf. This package wraps that protocol so stryke pipelines can submit SQL or DataFrame operations without bundling the JVM.

CLI launcher: spark
Tier: opt-in
Helper protocol: NDJSON
Sibling stryke-* packages: 14

~ARCHITECTURE

Two-process design: the stryke side is a thin .stk library that pipes calls to a sidecar helper. The helper is a regular Rust binary with the connector's full dependency tree; stryke core never links against it.

Layer | Implementation
stryke library (lib/*.stk) | Thin wrapper exposing typed functions; serializes args to NDJSON, deserializes responses
Helper binary (stryke-spark-helper) | Rust sync; reads NDJSON from stdin, dispatches, writes NDJSON to stdout
Process model | One helper subprocess per stryke session; lives as long as the stryke runtime needs it
Build | Cargo with publish = false on the helper crate; the package itself ships via s pkg install -g .
Install path | ~/.stryke/bin/spark after make install; PATH-resolvable from any shell
Tests | zunit-style under t/ with live-service variants when applicable
CI | GitHub Actions .github/workflows/ci.yml: cargo fmt + clippy + test, plus stryke pkg install verification
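
The helper's read, dispatch, write loop can be sketched as follows. This is an illustration in Python only; the real helper is a Rust binary, and the handler table, the version payload, and the error codes unknown_op and bad_request are assumptions made for the example, not taken from the package.

```python
import io
import json


def handle_version(_request):
    # Hypothetical handler: the real helper's version payload is not documented here.
    return {"name": "stryke-spark-helper"}


# Hypothetical dispatch table mapping the "op" field to a handler function.
HANDLERS = {"version": handle_version}


def serve(stdin, stdout):
    """Read one JSON request per line and write one JSON response per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between requests
        try:
            request = json.loads(line)
        except json.JSONDecodeError as exc:
            # Assumed error code; the source only shows that a "code" field exists.
            response = {"error": f"invalid JSON: {exc.msg}", "code": "bad_request"}
        else:
            if not isinstance(request, dict):
                response = {"error": "request must be a JSON object", "code": "bad_request"}
            else:
                handler = HANDLERS.get(request.get("op"))
                if handler is None:
                    response = {"error": f"unknown op: {request.get('op')!r}", "code": "unknown_op"}
                else:
                    response = {"ok": True, "result": handler(request)}
        # Exactly one response line per request line, flushed so the peer never stalls.
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()
```

Calling serve(sys.stdin, sys.stdout) would turn this into a pipe-driven process. Passing io.StringIO objects instead makes the loop easy to exercise without spawning anything, which is one reason a line-oriented protocol suits a sidecar design.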

$WHY OPT-IN (NOT BUILTIN)

Spark Connect support brings a gRPC and protobuf client stack with it. Shipping that stack as a separate package means only the pipelines that actually talk to Spark carry it.

The trade-off is intentional. The core stryke binary stays under ~40 MB precisely because each connector ships separately. Daily-driver work (one-liners, awk replacement, data scripting) doesn't need MongoDB drivers, AWS SDKs, or Spark Connect bindings linked in.


&HELPER PROTOCOL

NDJSON over stdin/stdout. Each request is a single line holding a JSON object with an op field plus op-specific args. Each response is a single-line JSON object on stdout. Errors are JSON objects with an error field holding a human-readable message, plus a code field.

# manual invocation (debugging)
echo '{"op":"version"}' | stryke-spark-helper

# typical request shape
{"op":"", "...": ...}

# typical response shape
{"ok": true, "result": ...}
{"error": "...", "code": "..."}
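
A caller on the other end of the pipe might look like the sketch below. It assumes the request and response shapes shown above and keeps one helper subprocess alive for the whole session. HelperSession and HelperError are hypothetical names introduced for illustration; the actual stryke-side wrapper lives in lib/*.stk and is not reproduced here.

```python
import json
import subprocess
import sys


class HelperError(Exception):
    """Raised when the helper answers with an error object (hypothetical name)."""

    def __init__(self, message, code=None):
        super().__init__(message)
        self.code = code


class HelperSession:
    """One long-lived helper subprocess; one JSON line per request and per response."""

    def __init__(self, argv):
        # argv is the helper command line, e.g. ["stryke-spark-helper"].
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered, matching the line-oriented protocol
        )

    def call(self, op, **args):
        # Serialize the request as a single line and flush so the helper sees it.
        self.proc.stdin.write(json.dumps({"op": op, **args}) + "\n")
        self.proc.stdin.flush()
        line = self.proc.stdout.readline()
        if not line:
            raise HelperError("helper closed its stdout")
        response = json.loads(line)
        if "error" in response:
            raise HelperError(response["error"], response.get("code"))
        return response.get("result")

    def close(self):
        # Closing stdin signals end of input; then wait for the helper to exit.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

With the real binary on PATH, usage would be along the lines of session = HelperSession(["stryke-spark-helper"]), then session.call("version"), then session.close(). Any op other than version, and any argument names, would need to be checked against the README.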

/SCOPE

See the README's "Why this is a package" and "CLI: spark" sections for the authoritative scope. The package is intentionally narrower than its underlying SDK / driver: the goal is "useful from a shell pipeline", not "complete API coverage".


#PROJECT METADATA

Item | Value
License | MIT
Author | MenkeTechnologies
Repository | github.com/MenkeTechnologies/stryke-spark
Parent language | strykelang
Meta umbrella | MenkeTechnologiesMeta
Issues | github.com/MenkeTechnologies/stryke-spark/issues