>_EXECUTIVE SUMMARY
stryke-spark is one of the 14 opt-in connector packages in the stryke ecosystem. It is an Apache Spark client for stryke that targets Spark Connect (Spark 3.4 and later) over gRPC, so no JVM is required on the client side.
Spark Connect separates the client from the Spark driver: the client talks to a remote Spark Connect server over gRPC using protobuf messages. This package wraps that protocol so stryke pipelines can submit SQL or DataFrame operations without bundling a JVM.
~ARCHITECTURE
Two-process design: the stryke side is a thin .stk library that pipes calls to a sidecar helper. The helper is a regular Rust binary with the connector's full dependency tree; stryke core never links against it.
| Layer | Implementation |
|---|---|
| stryke library (lib/*.stk) | Thin wrapper exposing typed functions; serializes args to NDJSON, deserializes responses |
| Helper binary (stryke-spark-helper) | Rust, synchronous; reads NDJSON from stdin, dispatches, writes NDJSON to stdout |
| Process model | One helper subprocess per stryke session; lives as long as the stryke runtime needs it |
| Build | Cargo with publish = false on the helper crate; the package itself ships via s pkg install -g . |
| Install path | ~/.stryke/bin/spark after make install; PATH-resolvable from any shell |
| Tests | zunit-style under t/ with live-service variants when applicable |
| CI | GitHub Actions .github/workflows/ci.yml — cargo fmt + clippy + test, plus stryke pkg install verification |
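
To make the two-process design in the table concrete, here is a minimal sketch of the wrapper side: one long-lived subprocess, one JSON request per line on its stdin, one JSON response per line on its stdout. It is written in Python purely for illustration; the real wrapper is a .stk library and the real helper is a Rust binary. `HelperSession`, `FAKE_HELPER`, and the `sql`/`query` names in the usage note are hypothetical and only stand in for the documented behavior.

```python
import json
import subprocess
import sys

# Stand-in helper used only for this sketch: a tiny Python child process that
# answers every NDJSON request line with {"ok": true, "result": <op name>}.
# This is NOT the real stryke-spark-helper, which is a Rust binary.
FAKE_HELPER = [
    sys.executable, "-u", "-c",
    "import sys, json\n"
    "for line in sys.stdin:\n"
    "    req = json.loads(line)\n"
    "    print(json.dumps({'ok': True, 'result': req['op']}), flush=True)\n",
]


class HelperSession:
    """One helper subprocess per session, spoken to over NDJSON (hypothetical wrapper)."""

    def __init__(self, argv):
        # Line-buffered text pipes so each request/response is exactly one line.
        self.proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True, bufsize=1
        )

    def call(self, op, **args):
        # One request per line: an "op" field plus op-specific arguments.
        self.proc.stdin.write(json.dumps({"op": op, **args}) + "\n")
        self.proc.stdin.flush()
        line = self.proc.stdout.readline()
        if not line:
            raise RuntimeError("helper exited without responding")
        return json.loads(line)

    def close(self):
        # Closing stdin signals end of input; the helper then exits on its own.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

With a session such as `s = HelperSession(FAKE_HELPER)`, a call like `s.call("sql", query="SELECT 1")` writes a single request line and blocks until the matching response line arrives. Keeping the subprocess alive between calls is what lets one helper serve a whole session.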
$WHY OPT-IN (NOT BUILTIN)
Shipping Spark support as a separate package keeps the Spark Connect bindings and their dependency tree out of the core binary. The trade-off is intentional: the core stryke binary stays under ~40 MB precisely because each connector ships separately. Daily-driver work (one-liners, awk replacement, data scripting) doesn't need MongoDB drivers, AWS SDKs, or Spark Connect bindings linked in.
&HELPER PROTOCOL
NDJSON over stdin/stdout. Each request line is a JSON object with an op field plus op-specific args. Each response is a JSON object on stdout (one per line). Errors are JSON objects with an error field plus a human-readable message.
```
# manual invocation (debugging)
echo '{"op":"version"}' | stryke-spark-helper
# typical request shape
{"op":"", "...": ...}
# typical response shape
{"ok": true, "result": ...}
{"error": "...", "code": "..."}
```
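
The request/response shapes above can be sketched as a small dispatch loop. This is an illustrative Python model of the protocol, not the helper's actual Rust code: the `echo` op, the placeholder version string, and the error codes `bad_json`, `unknown_op`, and `op_failed` are assumptions made up for the example.

```python
import io
import json


def handle_version(_req):
    # Placeholder value; the real helper's version payload is not documented here.
    return "0.0.0-sketch"


def handle_echo(req):
    # Hypothetical op that returns its "value" argument unchanged.
    return req.get("value")


HANDLERS = {"version": handle_version, "echo": handle_echo}


def serve(inp, out, handlers=HANDLERS):
    """Read NDJSON requests from inp and write one NDJSON response per request to out."""
    for line in inp:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between requests
        try:
            req = json.loads(line)
        except json.JSONDecodeError as exc:
            resp = {"error": f"invalid JSON: {exc.msg}", "code": "bad_json"}
        else:
            op = req.get("op") if isinstance(req, dict) else None
            handler = handlers.get(op) if isinstance(op, str) else None
            if handler is None:
                resp = {"error": f"unknown op: {op!r}", "code": "unknown_op"}
            else:
                try:
                    resp = {"ok": True, "result": handler(req)}
                except Exception as exc:  # report failures as error objects
                    resp = {"error": str(exc), "code": "op_failed"}
        out.write(json.dumps(resp) + "\n")
        out.flush()


# Usage: feed two request lines through the loop using in-memory streams.
demo_out = io.StringIO()
serve(io.StringIO('{"op":"version"}\n{"op":"nope"}\n'), demo_out)
print(demo_out.getvalue(), end="")
```

Writing exactly one response line per request line, including for malformed input, keeps the caller's read loop simple: it can always pair the next line on stdout with the request it just sent.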
/SCOPE
See the README's "Why this is a package" and "CLI" sections for the authoritative scope. The package is intentionally narrower than its underlying SDK / driver: the goal is "useful from a shell pipeline", not "complete API coverage".
#PROJECT METADATA
| Item | Value |
|---|---|
| License | MIT |
| Author | MenkeTechnologies |
| Repository | github.com/MenkeTechnologies/stryke-spark |
| Parent language | strykelang |
| Meta umbrella | MenkeTechnologiesMeta |
| Issues | github.com/MenkeTechnologies/stryke-spark/issues |