| 1 | +--- |
| 2 | +name: blacksmith-testbox |
| 3 | +description: Run Blacksmith Testbox for CI-parity checks, secrets, hosted services, migrations, or builds local cannot reproduce. |
| 4 | +--- |
| 5 | + |
| 6 | +# Blacksmith Testbox |
| 7 | + |
| 8 | +## Scope |
| 9 | + |
| 10 | +Use Testbox when you need remote CI parity, injected secrets, hosted services, |
| 11 | +or an OS/runtime image that your local machine cannot provide cheaply. |
| 12 | + |
| 13 | +Do not default to Testbox for every local test/build loop. If the repo has |
| 14 | +documented local commands for normal iteration, use those first so you keep |
| 15 | +warm caches, local build state, and fast feedback. |
| 16 | + |
| 17 | +Testbox is the expensive path. Reach for it deliberately. |
| 18 | + |
| 19 | +## Install the CLI |
| 20 | + |
| 21 | +If `blacksmith` is not installed, install it: |
| 22 | + |
| 23 | + curl -fsSL https://get.blacksmith.sh | sh |
| 24 | + |
| 25 | +For the canary channel (bleeding-edge): |
| 26 | + |
| 27 | + BLACKSMITH_CHANNEL=canary sh -c 'curl -fsSL https://get.blacksmith.sh | sh' |
| 28 | + |
| 29 | +Then authenticate: |
| 30 | + |
| 31 | + blacksmith auth login |
| 32 | + |
| 33 | +## Agent-triggered browser auth (non-interactive) |
| 34 | + |
| 35 | +When an agent needs to ensure the user is authenticated before running testbox |
| 36 | +commands (e.g. warmup, run), use browser-based auth with non-interactive mode. |
| 37 | +This opens the browser for the user to sign in; the agent does not interact with |
| 38 | +the browser. The org selector in the dashboard is skipped, so the user only sees |
| 39 | +the sign-in flow. |
| 40 | + |
| 41 | +**Required command** (`--organization` is required with `--non-interactive`): |
| 42 | + |
| 43 | + blacksmith auth login --non-interactive --organization <org-slug> |
| 44 | + |
| 45 | +The org slug can come from `BLACKSMITH_ORG` env var or the `--org` global flag. |
| 46 | +If neither is set, the agent should use the project's known org (e.g. from repo |
| 47 | +config or user context). Example: |
| 48 | + |
| 49 | + blacksmith auth login --non-interactive --organization acme-corp |
| 50 | + blacksmith --org acme-corp auth login --non-interactive --organization acme-corp |
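
The fallback order described above can be sketched as a small shell helper. This is only an illustration: `resolve_org` is a hypothetical name, not part of the Blacksmith CLI, and the fallback slug is whatever the agent already knows from repo config or user context.

```shell
# Hypothetical helper (not part of the Blacksmith CLI): pick the org slug.
# Prefers the BLACKSMITH_ORG env var, then the fallback passed as $1.
resolve_org() {
  local fallback="${1:-}"
  local org="${BLACKSMITH_ORG:-$fallback}"
  if [ -z "$org" ]; then
    echo "resolve_org: no org slug available" >&2
    return 1
  fi
  printf '%s\n' "$org"
}

# Usage sketch (opens a browser, so it is left commented out here):
#   blacksmith auth login --non-interactive --organization "$(resolve_org acme-corp)"
```

Failing loudly when no slug is available avoids passing an empty `--organization` value to the CLI.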
| 51 | + |
| 52 | +**Flow**: The CLI starts a local callback server, opens the browser to the |
| 53 | +dashboard auth page, and blocks for up to 2 minutes. The user completes sign-in |
| 54 | +and authorization in the browser. The dashboard redirects to localhost with the |
| 55 | +token; the CLI saves credentials and exits. The agent then proceeds. |
| 56 | + |
| 57 | +**Do not use** `--api-token` for this flow — that is for headless/token-based |
| 58 | +auth. This skill focuses on browser-based auth when the user prefers signing in |
| 59 | +via the web UI. |
| 60 | + |
| 61 | +Optional flags: |
| 62 | + |
| 63 | +- `--dashboard-url <url>` — Override dashboard URL (e.g. for staging) |
| 64 | + |
| 65 | +## Decide first: local or Testbox |
| 66 | + |
| 67 | +Before warming anything up, check the repo's own instructions. |
| 68 | + |
| 69 | +Prefer local commands when: |
| 70 | + |
| 71 | +- the repo documents a supported local test/build workflow |
| 72 | +- you are iterating on unit tests, lint, typecheck, formatting, or other |
| 73 | + local-only validation |
| 74 | +- the value comes from warm local caches and fast repeat runs |
| 75 | +- the command does not need remote secrets, hosted services, or CI-only images |
| 76 | + |
| 77 | +Prefer Testbox when: |
| 78 | + |
| 79 | +- the repo explicitly requires CI-parity or remote validation |
| 80 | +- the command needs secrets, service containers, or provisioned infra |
| 81 | +- you are reproducing CI-only failures |
| 82 | +- you need the exact workflow image/job environment from GitHub Actions |
| 83 | + |
| 84 | +For OpenClaw specifically, normal local iteration should stay local: |
| 85 | + |
| 86 | +- `pnpm check:changed` |
| 87 | +- `pnpm test:changed` |
| 88 | +- `pnpm test <path-or-filter>` |
| 89 | +- `pnpm test:serial` |
| 90 | +- `pnpm build` |
| 91 | + |
| 92 | +Only use Testbox in OpenClaw when the user explicitly wants CI-parity or the |
| 93 | +check truly depends on remote secrets/services that the local repo loop cannot |
| 94 | +provide. |
| 95 | + |
| 96 | +## Setup: Warmup before coding |
| 97 | + |
| 98 | +If you decided Testbox is actually warranted, warm one up early. This returns |
| 99 | +an ID instantly and boots the CI environment in the background while you work: |
| 100 | + |
| 101 | + blacksmith testbox warmup ci-check-testbox.yml |
| 102 | + # → tbx_01jkz5b3t9... |
| 103 | + |
| 104 | +Save this ID. You need it for every `run` command. |
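
One way to keep the ID around is to capture it in a shell variable. The sketch below rests on two assumptions the text above does not confirm: that `warmup` prints the ID on stdout, and that IDs consist of `tbx_` followed by lowercase letters and digits. `extract_testbox_id` is a hypothetical helper name.

```shell
# Hypothetical helper: pull the first testbox ID out of warmup output.
# ASSUMPTION: IDs look like "tbx_" followed by lowercase letters and digits.
extract_testbox_id() {
  grep -oE 'tbx_[0-9a-z]+' | head -n 1
}

# Usage sketch (needs the blacksmith CLI and an authenticated session):
#   TESTBOX_ID="$(blacksmith testbox warmup ci-check-testbox.yml | extract_testbox_id)"
#   blacksmith testbox run --id "$TESTBOX_ID" "npm test"
```

If the real output format differs, adjust the pattern rather than relying on this one.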
| 105 | + |
| 106 | +Warmup dispatches a GitHub Actions workflow that provisions a VM with the |
| 107 | +full CI environment: dependencies installed, services started, secrets |
| 108 | +injected, and a clean checkout of the repo at the dispatched ref (the default branch unless `--ref` is set).
| 109 | + |
| 110 | +Options: |
| 111 | + |
| 112 | + --ref <branch> Git ref to dispatch against (default: repo's default branch) |
| 113 | + --job <name> Specific job within the workflow (if it has multiple) |
| 114 | + --idle-timeout <min> Idle timeout in minutes (default: 30) |
| 115 | + |
| 116 | +## CRITICAL: Always run from the repo root |
| 117 | + |
| 118 | +ALWAYS invoke `blacksmith testbox` commands from the **root of the git |
| 119 | +repository**. The CLI syncs the current working directory to the testbox |
| 120 | +using rsync with `--delete`. If you run from a subdirectory (e.g. |
| 121 | +`cd backend && blacksmith testbox run ...`), rsync will mirror only that |
| 122 | +subdirectory and **delete everything else** on the testbox — wiping other |
| 123 | +directories like `dashboard/`, `cli/`, etc. |
| 124 | + |
| 125 | + # CORRECT — run from repo root, use paths in the command |
| 126 | + blacksmith testbox run --id <ID> "cd backend && php artisan test" |
| 127 | + blacksmith testbox run --id <ID> "cd dashboard && npm test" |
| 128 | + |
| 129 | + # WRONG — do NOT cd into a subdirectory before invoking the CLI |
| 130 | + cd backend && blacksmith testbox run --id <ID> "php artisan test" |
| 131 | + |
| 132 | +If your shell is in a subdirectory, `cd` back to the repo root first: |
| 133 | + |
| 134 | + cd "$(git rev-parse --show-toplevel)" |
| 135 | + blacksmith testbox run --id <ID> "cd backend && php artisan test" |
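
To make the repo-root rule hard to get wrong, the `cd` can be folded into a wrapper. This is a sketch only: `tbx_run` is a hypothetical function name, and it assumes nothing about the CLI beyond the `run --id <ID> "<command>"` form shown above.

```shell
# Hypothetical wrapper: always invoke `blacksmith testbox run` from the
# root of the current git work tree, whatever directory the caller is in.
tbx_run() {
  local id="$1" cmd="$2" root
  # Resolve the top of the work tree; fail if we are not inside one.
  root="$(git rev-parse --show-toplevel 2>/dev/null)" || {
    echo "tbx_run: not inside a git repository" >&2
    return 1
  }
  # The subshell keeps the caller's working directory unchanged.
  (cd "$root" && blacksmith testbox run --id "$id" "$cmd")
}

# Usage sketch:
#   tbx_run <ID> "cd backend && php artisan test"
```

Because the `cd` happens in a subshell, calling the wrapper from `backend/` leaves your shell in `backend/` afterwards.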
| 136 | + |
| 137 | +## Running commands |
| 138 | + |
| 139 | + blacksmith testbox run --id <ID> "<command>" |
| 140 | + |
| 141 | +The `run` command automatically waits for the testbox to become ready if |
| 142 | +it is still booting, so you can call `run` immediately after warmup without |
| 143 | +needing to check status first. |
| 144 | + |
| 145 | +## Downloading files from a testbox |
| 146 | + |
| 147 | +Use the `download` command to retrieve files or directories from a running |
| 148 | +testbox to your local machine. This is useful for fetching build artifacts, |
| 149 | +test results, coverage reports, or any output generated on the testbox. |
| 150 | + |
| 151 | + blacksmith testbox download --id <ID> <remote-path> [local-path] |
| 152 | + |
| 153 | +The remote path is relative to the testbox working directory (same as `run`). |
| 154 | +If no local path is specified, the file is saved to the current directory |
| 155 | +using the same base name. |
| 156 | + |
| 157 | +To download a directory, append a trailing `/` to the remote path — this |
| 158 | +triggers recursive mode: |
| 159 | + |
| 160 | + # Download a single file |
| 161 | + blacksmith testbox download --id <ID> coverage/report.html |
| 162 | + |
| 163 | + # Download a file to a specific local path |
| 164 | + blacksmith testbox download --id <ID> build/output.tar.gz ./output.tar.gz |
| 165 | + |
| 166 | + # Download an entire directory |
| 167 | + blacksmith testbox download --id <ID> test-results/ ./results/ |
| 168 | + |
| 169 | +Options: |
| 170 | + |
| 171 | + --ssh-private-key <path> Path to SSH private key (if warmup used --ssh-public-key) |
| 172 | + |
| 173 | +## How file sync works |
| 174 | + |
| 175 | +Understanding this model is critical for using Testbox correctly. |
| 176 | + |
| 177 | +When you call `run`, the CLI performs a **delta sync** of your local changes |
| 178 | +to the remote testbox before executing your command: |
| 179 | + |
| 180 | +1. The testbox VM starts from a clean `actions/checkout` at the warmup ref. |
| 181 | + The workflow's setup steps (e.g. `npm install`, `pip install`, `composer install`) |
| 182 | + run during warmup and populate dependency directories on the remote VM. |
| 183 | + |
| 184 | +2. On each `run`, the CLI uses **git** to detect which files changed locally
| 185 | + since the last sync. It syncs ONLY tracked files and untracked non-ignored
| 186 | + files (roughly what `git ls-files --cached --others --exclude-standard` reports; plain `git ls-files` lists tracked files only).
| 187 | + |
| 188 | +3. **`.gitignore`'d directories are never synced.** This means directories |
| 189 | + like `node_modules/`, `vendor/`, `.venv/`, `build/`, `dist/`, etc. are |
| 190 | + NOT transferred from your local machine. The testbox uses its own copies |
| 191 | + of those directories, populated during the warmup workflow steps. |
| 192 | + |
| 193 | +4. If nothing has changed since the last sync (same git commit and working |
| 194 | + tree state), the sync is skipped entirely for speed. |
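
To preview locally which files fall into the synced set described above, git can be asked directly. This is an approximation for inspection, not necessarily the command the CLI runs internally; `sync_candidates` is a hypothetical helper name.

```shell
# Approximation of the synced file set: tracked files plus untracked files
# that are not ignored. Paths matched by .gitignore never appear.
sync_candidates() {
  git ls-files --cached --others --exclude-standard
}

# Usage sketch: check whether a given path would be considered for sync.
#   sync_candidates | grep -Fx "src/new-file.ts"
```

If a file you expect on the testbox is missing from this list, it is most likely covered by an ignore rule.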
| 195 | + |
| 196 | +### Why this matters |
| 197 | + |
| 198 | +- **Changing dependencies**: If you modify `package.json`, `requirements.txt`, |
| 199 | + `composer.json`, `go.mod`, or similar dependency manifests, the lock/manifest |
| 200 | + file will be synced but the actual dependency directory will NOT. You must |
| 201 | + re-run the install command on the testbox: |
| 202 | + |
| 203 | + blacksmith testbox run --id <ID> "npm install && npm test" |
| 204 | + blacksmith testbox run --id <ID> "pip install -r requirements.txt && pytest" |
| 205 | + blacksmith testbox run --id <ID> "composer install && phpunit" |
| 206 | + |
| 207 | +- **Generated/build artifacts**: If your tests depend on a build step (e.g. |
| 208 | + `npm run build`, `make`), and you changed source files that affect the build |
| 209 | + output, re-run the build on the testbox before testing. |
| 210 | + |
| 211 | +- **New untracked files**: New files you create locally ARE synced (as long as |
| 212 | + they are not gitignored). You do not need to `git add` them first. |
| 213 | + |
| 214 | +- **Deleted files**: Files you delete locally are also deleted on the remote |
| 215 | + testbox. The sync model keeps the remote in lockstep with your local managed |
| 216 | + file set. |
| 217 | + |
| 218 | +## CRITICAL: Do not ban local tests |
| 219 | + |
| 220 | +Do not assume local validation is forbidden. Many repos intentionally invest in |
| 221 | +fast, warm local loops, and forcing every run through Testbox destroys that |
| 222 | +advantage. |
| 223 | + |
| 224 | +Use Testbox for the checks that actually need it: remote parity, secrets, |
| 225 | +services, CI-only runners, or reproducibility against the workflow image. |
| 226 | + |
| 227 | +If the repo says local tests/builds are the normal path, follow the repo. |
| 228 | + |
| 229 | +## When to use |
| 230 | + |
| 231 | +Use Testbox when: |
| 232 | + |
| 233 | +- running database migrations or destructive environment checks |
| 234 | +- running commands that depend on secrets or environment variables not present locally |
| 235 | +- reproducing CI-only failures or validating against the workflow image |
| 236 | +- validating behavior that needs provisioned services or remote runners |
| 237 | +- doing a final parity check before commit/push when the repo or user wants that |
| 238 | + |
| 239 | +Trim that list based on repo guidance. If the repo documents supported local |
| 240 | +tests/builds, prefer local for routine iteration and keep Testbox for the |
| 241 | +checks that need parity or remote state. |
| 242 | + |
| 243 | +## Workflow |
| 244 | + |
| 245 | +1. Decide whether the repo's local loop is the right default. |
| 246 | +2. Only if Testbox is warranted, warm up early: |
| 247 | + `blacksmith testbox warmup ci-check-testbox.yml` → save the ID |
| 248 | +3. Write code while the testbox boots in the background. |
| 249 | +4. Run the remote command when needed: |
| 250 | + `blacksmith testbox run --id <ID> "npm test"` |
| 251 | +5. If tests fail, fix code and re-run against the same warm box. |
| 252 | +6. If you changed dependency manifests (`package.json`, etc.), prepend
| 253 | + the install command: `blacksmith testbox run --id <ID> "npm install && npm test"` |
| 254 | +7. If you need artifacts (coverage reports, build outputs, etc.), download them: |
| 255 | + `blacksmith testbox download --id <ID> coverage/ ./coverage/` |
| 256 | +8. Once green, commit and push. |
| 257 | + |
| 258 | +## OpenClaw full test suite |
| 259 | + |
| 260 | +For OpenClaw, use the repo package manager and the measured stable full-suite |
| 261 | +profile below. It keeps six Vitest project shards active while limiting each |
| 262 | +shard to one worker to avoid worker OOMs on Testbox: |
| 263 | + |
| 264 | + blacksmith testbox run --id <ID> "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test" |
| 265 | + |
| 266 | +Observed full-suite time on Blacksmith Testbox is about 3-4 minutes: |
| 267 | + |
| 268 | +- 173-180s on a warmed box |
| 269 | +- 219s on a fresh 32-vCPU box |
| 270 | + |
| 271 | +When validating before commit/push, run `pnpm check:changed` first when |
| 272 | +appropriate, then the full suite with the profile above if broad confidence is |
| 273 | +needed. |
| 274 | + |
| 275 | +## Examples |
| 276 | + |
| 277 | + blacksmith testbox warmup ci-check-testbox.yml |
| 278 | + # → tbx_01jkz5b3t9... |
| 279 | + |
| 280 | + # Run tests |
| 281 | + blacksmith testbox run --id <ID> "npm test -- --testPathPattern=handler.test" |
| 282 | + blacksmith testbox run --id <ID> "go test ./pkg/api/... -run TestHandler -v" |
| 283 | + blacksmith testbox run --id <ID> "python -m pytest tests/test_api.py -k test_auth" |
| 284 | + |
| 285 | + # Re-install deps after changing package.json, then test |
| 286 | + blacksmith testbox run --id <ID> "npm install && npm test" |
| 287 | + |
| 288 | + # Build and test |
| 289 | + blacksmith testbox run --id <ID> "npm run build && npm test" |
| 290 | + |
| 291 | + # Download artifacts from the testbox |
| 292 | + blacksmith testbox download --id <ID> coverage/lcov-report/ ./coverage/ |
| 293 | + blacksmith testbox download --id <ID> build/output.tar.gz |
| 294 | + |
| 295 | +## Waiting for the testbox to be ready |
| 296 | + |
| 297 | +The `run` command automatically waits for the testbox, so explicit waiting is |
| 298 | +usually unnecessary. If you do need to check readiness separately (e.g. before |
| 299 | +a series of runs), use `status` with the `--wait` flag. Do NOT use a sleep-and-recheck loop.
| 300 | + |
| 301 | +Correct: block until ready with a timeout: |
| 302 | + |
| 303 | + blacksmith testbox status --id <ID> --wait [--wait-timeout 5m] |
| 304 | + |
| 305 | +Wrong: never use sleep + status in a loop: |
| 306 | + |
| 307 | + # BAD — do not do this |
| 308 | + sleep 30 && blacksmith testbox status --id <ID> |
| 309 | + while ! blacksmith testbox status --id <ID> | grep ready; do sleep 5; done |
| 310 | + |
| 311 | +`--wait` polls the status and exits as soon as the testbox is ready (or when the |
| 312 | +timeout is reached). Default timeout is 5m; use `--wait-timeout` for longer |
| 313 | +(e.g. `10m`, `1h`). |
| 314 | + |
| 315 | +## Managing testboxes |
| 316 | + |
| 317 | + # Check status of a specific testbox |
| 318 | + blacksmith testbox status --id <ID> |
| 319 | + |
| 320 | + # List all active testboxes for the current repo |
| 321 | + blacksmith testbox list |
| 322 | + |
| 323 | + # Stop a testbox when you're done (frees resources) |
| 324 | + blacksmith testbox stop --id <ID> |
| 325 | + |
| 326 | +Testboxes automatically shut down after being idle (default: 30 minutes). |
| 327 | +If you need a longer session, increase the timeout at warmup time: |
| 328 | + |
| 329 | + blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 60 |
| 330 | + |
| 331 | +## Examples with options
| 332 | + |
| 333 | + blacksmith testbox warmup ci-check-testbox.yml --ref main |
| 334 | + blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 60 |
| 335 | + blacksmith testbox run --id <ID> "go test ./..." |