>_STRYKE-SPARK
Distributed compute from a stryke one-liner. An Apache Spark client for stryke: an opt-in package that targets Spark Connect (Spark 3.4 and later) over gRPC, so no JVM is required on the client side.
Install
# build the helper binary, install as a stryke package
cd ~/projects/stryke-spark
cargo build --release
s pkg install -g .

# one-liner
make install

# verify
spark --help
After install, spark --help works from any directory, provided ~/.stryke/bin/ is on PATH. The stryke library is auto-discoverable by any project that depends on the package via [deps] spark = { path = "..." } or, when published, by name.
CLI: spark
| Task | Command |
|---|---|
| connect + run SQL | spark sql 'SELECT count(*) FROM my_table' --remote sc://spark-connect:15002 |
| submit a DataFrame op | spark df read parquet s3://bucket/path \| spark df filter 'qty > 100' \| spark df show |
| list tables | spark tables list |
| describe a table schema | spark table describe my_table |
The full flag matrix lives in the README "CLI" section.
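The DataFrame row in the table chains three spark invocations with shell pipes. When driving the same chain from a program, it can help to pass each stage as an argument list so that a predicate such as 'qty > 100' needs no shell quoting. The following is a minimal Python sketch of that idea; `run_pipeline` is a hypothetical helper name, not part of this package, and it assumes only that each stage reads from stdin and writes to stdout, as the pipe syntax in the table suggests.

```python
import subprocess


def run_pipeline(stages):
    """Run each argv list in `stages`, feeding one stage's stdout to the next stage's stdin.

    Returns the final stage's stdout decoded as UTF-8.
    `run_pipeline` is an illustrative name, not an API of stryke-spark.
    """
    if not stages:
        raise ValueError("at least one stage is required")

    procs = []
    upstream = None  # stdout pipe of the previous stage, if any
    for argv in stages:
        proc = subprocess.Popen(
            argv,
            stdin=upstream if upstream is not None else subprocess.DEVNULL,
            stdout=subprocess.PIPE,
        )
        if upstream is not None:
            # Drop the parent's copy so the upstream stage sees a closed
            # pipe if this stage exits early.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)

    # Collect the last stage's output, then reap every stage.
    output, _ = procs[-1].communicate()
    for argv, proc in zip(stages, procs):
        if proc.wait() != 0:
            raise RuntimeError(f"stage {argv!r} exited with status {proc.returncode}")
    return output.decode("utf-8")


# The stages from the table, written as argv lists (requires `spark` on PATH):
# run_pipeline([
#     ["spark", "df", "read", "parquet", "s3://bucket/path"],
#     ["spark", "df", "filter", "qty > 100"],
#     ["spark", "df", "show"],
# ])
```

Passing argv lists instead of one shell string keeps the filter expression intact as a single argument, which is the main reason to prefer this shape over building a command string.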
Why a package, not a builtin
Spark Connect splits driver from client; the client speaks gRPC + protobuf. This package wraps that protocol so stryke pipelines can submit SQL or DataFrame operations without bundling the JVM.
The stryke side is a thin NDJSON-pipe wrapper; the heavy code lives in the stryke-spark-helper sidecar binary, which is spawned on demand. Core stryke is never linked against this package's dependencies.
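To give a feel for how thin such a wrapper can be, here is a rough sketch written in Python rather than stryke. Only two things are taken from this page: the helper binary is called stryke-spark-helper, and it exchanges newline-delimited JSON on stdin/stdout. The function names (`encode_request`, `decode_responses`, `call_helper`), the timeout, and the assumption that the helper exits once its stdin is closed are all illustrative guesses, not the package's actual API.

```python
import json
import subprocess


def encode_request(request):
    """Serialize one request as a single NDJSON line.

    json.dumps escapes newlines inside strings, so the result is always
    exactly one line.
    """
    return (json.dumps(request, separators=(",", ":")) + "\n").encode("utf-8")


def decode_responses(raw):
    """Parse helper stdout: one JSON value per non-empty line."""
    return [json.loads(line) for line in raw.split(b"\n") if line.strip()]


def call_helper(request, helper_argv=("stryke-spark-helper",), timeout=30.0):
    """Spawn the helper for one call, send one request, return all response records.

    Assumption: the helper reads stdin until EOF, writes its responses to
    stdout, and exits with status 0 on success.
    """
    completed = subprocess.run(
        list(helper_argv),
        input=encode_request(request),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=False,
    )
    if completed.returncode != 0:
        message = completed.stderr.decode("utf-8", errors="replace").strip()
        raise RuntimeError(f"helper exited with status {completed.returncode}: {message}")
    return decode_responses(completed.stdout)


# Example call (requires the helper on PATH):
# call_helper({"op": "version"})
```

Because the process is started only inside `call_helper`, nothing is spawned or linked until a caller actually needs Spark, which mirrors the on-demand design described above.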
Helper protocol
The stryke-spark-helper sidecar speaks newline-delimited JSON over stdin/stdout. The stryke library shells out per call and pipes structured data both ways. This keeps stryke startup small while making the package's surface area available on demand.
# manual invocation (debugging only)
echo '{"op":"version"}' | stryke-spark-helper
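For readers who have not worked with an NDJSON sidecar before, the loop on the receiving end of that echo might look roughly like the sketch below. It is written in Python purely for illustration (the real helper is a Rust binary), and everything about the response is hypothetical: the `ok`, `version`, and `error` fields, the `HELPER_VERSION` value, and the behaviour for unknown ops are assumptions, not the helper's documented output. The only detail taken from this page is the `{"op":"version"}` request shape.

```python
import json

HELPER_VERSION = "0.0.0-example"  # hypothetical placeholder, not the real version


def handle(request):
    """Dispatch one decoded request on its "op" field (response shape is assumed)."""
    op = request.get("op")
    if op == "version":
        return {"ok": True, "version": HELPER_VERSION}
    return {"ok": False, "error": f"unknown op: {op!r}"}


def serve(stdin, stdout):
    """Read one JSON request per line and write one JSON response per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            request = json.loads(line)
        except json.JSONDecodeError as exc:
            response = {"ok": False, "error": f"invalid JSON: {exc.msg}"}
        else:
            if isinstance(request, dict):
                response = handle(request)
            else:
                response = {"ok": False, "error": "request must be a JSON object"}
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()  # keep the pipe responsive; one response per request


# To run as a stand-alone process:
# import sys; serve(sys.stdin, sys.stdout)
```

Answering every input line with exactly one output line, including malformed ones, keeps requests and responses aligned so the calling side never has to guess which reply belongs to which request.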
Layout
stryke-spark/
├── Cargo.toml        # bin = stryke-spark-helper (publish = false)
├── src/
│   └── main.rs       # helper binary entry point
├── lib/              # stryke .stk wrapper(s)
├── stryke.toml       # stryke package manifest
├── t/                # zunit-style tests
├── examples/         # runnable .stk examples
├── Makefile          # `make install` builds + installs
└── docs/             # this site (GitHub Pages)
Sibling packages
Part of the stryke connector family. Browse the others via the MenkeTechnologiesMeta umbrella repo (Tier 2):
- stryke-arrow — Apache Arrow / Parquet / Feather / arrow-CSV/JSON
- stryke-aws — S3, DynamoDB, SQS, Lambda, STS
- stryke-docker — Docker daemon API
- stryke-duckdb — embedded DuckDB
- stryke-gcp — Cloud Storage + Pub/Sub
- stryke-grpc — reflection-based gRPC client
- stryke-k8s — Kubernetes
- stryke-kafka — Apache Kafka
- stryke-mongo — MongoDB
- stryke-mysql — MySQL / MariaDB
- stryke-parquet — Parquet file inspector
- stryke-postgres — PostgreSQL
- stryke-redis — Redis / Valkey
- stryke-spark — Spark Connect (no JVM)