Skip to content

ci(construct): scope BUILDX_BUILDER to the build job, not workflow-wide#266

Merged
donbeave merged 1 commit into
mainfrom
ci/fix-construct-publish-manifest-builder
May 9, 2026
Merged

ci(construct): scope BUILDX_BUILDER to the build job, not workflow-wide#266
donbeave merged 1 commit into
mainfrom
ci/fix-construct-publish-manifest-builder

Conversation

@donbeave

@donbeave donbeave commented May 9, 2026

Copy link
Copy Markdown
Member

Summary

Hotfix the construct-image workflow break introduced by #256. After that PR merged, the Construct Image / publish manifest job on main failed with:

ERROR: no builder "jackin-construct" found
error: Recipe `construct-publish-manifest` failed with exit code 1

Root cause

#256 hoisted BUILDX_BUILDER: jackin-construct into the workflow-level env: block when it dropped the redundant setup-buildx-action / Bootstrap buildx pair from the publish-manifest job (correct on the merits — docker buildx imagetools create / inspect are registry-side ops that need no local builder). The unintended consequence: BUILDX_BUILDER is read by the docker buildx CLI as the default-builder selection. With the env var present and no builder of that name created in the publish-manifest job, every docker buildx invocation fails with "no builder found".

The break did not surface in PR CI because publish-manifest is gated to push to main || (workflow_dispatch && main) — neither runs on a pull_request event, so the publish path was effectively untested before merge.

Fix

Move BUILDX_BUILDER out of the workflow-level env: block into the build job's job-level env: block. publish-manifest no longer sees the env var, falls back to the default builder, and imagetools runs as designed. The build job's setup-buildx-action with: name: ${{ env.BUILDX_BUILDER }} resolves correctly because step-level env lookups walk the job-level env first.

Verify locally

Checkout

export TIRITH=0
mkdir -p "$HOME/Projects/jackin-project/test"
cd "$HOME/Projects/jackin-project/test"

if [ ! -d jackin/.git ]; then
  git clone https://github.com/jackin-project/jackin.git
fi

cd jackin
mise trust
git fetch -f origin ci/fix-construct-publish-manifest-builder:refs/remotes/origin/ci/fix-construct-publish-manifest-builder
git checkout -B ci/fix-construct-publish-manifest-builder refs/remotes/origin/ci/fix-construct-publish-manifest-builder

Static checks

yq . .github/workflows/construct.yml >/dev/null

The push to main after this PR merges will be the actual fidelity test — publish-manifest should run and succeed.

Hardening follow-ups

The class of break (env-var leak into a third-party CLI's default-selection semantics, surfacing only on push-to-main) was invisible to PR-time CI and to all three review passes. Two hardening PRs follow this one to make the same mistake catchable in PR next time:

  • A jackin PR adding workflow-env-scoping guidance to AGENTS.md and a "smoke-test push-only jobs via gh workflow run from the feature branch" rule to PULL_REQUESTS.md.
  • A jackin-github-terraform PR expanding the required-status-check list for jackin to include the new ci-required, construct-required, docs-required, and validate aggregators, and turning on strict_required_status_checks_policy plus a single required reviewer.

The post-merge run of #256 broke the construct image's
`publish-manifest` job:

    ERROR: no builder "jackin-construct" found
    error: Recipe `construct-publish-manifest` failed with exit code 1

Cause: #256 hoisted `BUILDX_BUILDER: jackin-construct` into the
workflow-level `env:` block when it dropped the redundant
`setup-buildx-action`/`Bootstrap buildx` pair from
`publish-manifest`. That env var is honored by `docker buildx` as
the default-builder selection, so on `publish-manifest` (which
intentionally does not create a builder, since
`docker buildx imagetools create`/`inspect` are registry-side ops)
buildx looks up the named builder, fails to find it, and the recipe
exits 1.

Fix: move `BUILDX_BUILDER` out of the workflow-level `env:` block
into the `build` job's job-level `env:` block, where it actually
belongs. `publish-manifest` no longer sees the env var, falls back
to the default builder, and `imagetools` runs as designed.

Verified locally: yaml parses, the env var only flows into the build
job. The `build` job's `uses: setup-buildx-action with: name: ${{ env.BUILDX_BUILDER }}`
still resolves correctly because step-level `env` lookups walk the
job-level `env` block first.

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
@donbeave donbeave merged commit af7c077 into main May 9, 2026
18 checks passed
@donbeave donbeave deleted the ci/fix-construct-publish-manifest-builder branch May 9, 2026 09:56
donbeave added a commit that referenced this pull request May 9, 2026
… rules (#267)

Codify two CI-merge-safety rules in `PULL_REQUESTS.md` plus a one-line cross-reference in `AGENTS.md`. The recent #256#266 incident exposed a class of break invisible to PR-time CI: a workflow-level env var (`BUILDX_BUILDER`) leaked into the `publish-manifest` job, where docker buildx read it as the default-builder selection and failed at runtime — but only on push to main, because that job is push-only and never runs on a `pull_request` event.

Three rules now live in `PULL_REQUESTS.md`. First, third-party-CLI env vars (`BUILDX_BUILDER`, `DOCKER_BUILDKIT`, `GH_TOKEN`/`GITHUB_TOKEN`, `KUBECONFIG`, `AWS_PROFILE`, `RUSTUP_TOOLCHAIN`, `npm_config_*`) must be scoped to the consuming job; workflow-level `env:` is reserved for in-house naming/paths. Second, registry / production publishing must hard-gate on main — the `is_publish` flag pattern from `construct.yml` is the canonical shape, and PRs / feature-branch dispatches must build-and-verify only, never publish. Third, push-only / main-only / `workflow_run`-gated jobs (`build-validator`, `publish-manifest`, `deploy`, `publish-preview`) must be smoke-tested via `gh workflow run --ref <feature-branch>` before requesting merge, since PR-time CI structurally cannot exercise them.

A companion terraform PR (jackin-project/jackin-github-terraform#14) raises the matching protection-rule changes — required-status-check list, single approving review, strict up-to-date branch policy.

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
donbeave added a commit that referenced this pull request May 9, 2026
… env-var leaks (#273)

Tier 3 follow-up to the post-#266 hardening. Add a PR-time `publish-manifest-rehearsal` job to `construct.yml` that runs the same docker buildx CLI plumbing `publish-manifest` depends on, without the side-effect steps (Docker Hub login, `imagetools create`, `imagetools inspect`).

The previous hardening rounds were process- (#267 docs rules) and gate-side (#14 / #15 terraform branch protection). This is the workflow-side companion. The #266 break (`ERROR: no builder "jackin-construct" found`) was caused by hoisting `BUILDX_BUILDER` into workflow-level env, which docker buildx reads as the default-builder selection — surfacing only post-merge on main because `publish-manifest` is push-only and never runs on a `pull_request` event. `docker buildx ls` evaluates `BUILDX_BUILDER` at startup and exits non-zero on a missing-builder reference, regardless of whether the rest of the command would have hit the network. Running it on PR + feature-branch dispatch reproduces the failure shape without registry credentials.

Two checks: `docker buildx ls` (catches workflow-level env-var leaks for any current or future buildx-controlling env var) and `docker buildx imagetools --help >/dev/null` (smoke-test that the buildx CLI plugin is bundled and loadable on the runner). The `construct-required` aggregator now lists `publish-manifest-rehearsal` in `needs:` so the rehearsal's result is rolled up into the same single check name branch protection requires.

Equivalent rehearsals for `build-validator`, `deploy`, and `publish-preview` are deferred — those side-effect surfaces don't have the same network-free reproducibility shape `buildx ls` provides for docker, so the fidelity-vs-cost trade is worse. Reopen if a #266-class break surfaces in any of them.

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
donbeave added a commit that referenced this pull request May 18, 2026
…de (#266)

Hotfix the construct-image break introduced by #256. After that PR merged, the `Construct Image / publish manifest` job on `main` failed with `ERROR: no builder "jackin-construct" found` because the refactor hoisted `BUILDX_BUILDER: jackin-construct` into the workflow-level `env:` block when it dropped the redundant `setup-buildx-action` / `Bootstrap buildx` pair from `publish-manifest`. `BUILDX_BUILDER` is read by the `docker buildx` CLI as the default-builder selection, so on `publish-manifest` (which intentionally creates no builder, since `imagetools create` / `inspect` are registry-side operations) every `docker buildx` invocation looked up a builder by that name and exited.

Move `BUILDX_BUILDER` out of workflow-level `env:` and into the `build` job's job-level `env:` block where the matching `setup-buildx-action` actually creates the builder. `publish-manifest` no longer sees the env var, falls back to the default builder, and `imagetools` runs as designed.

The break did not surface in PR-time CI because `publish-manifest` is gated to `push to main || (workflow_dispatch && main)` — neither runs on a `pull_request` event, so the publish path was effectively untested before merge. Two follow-up PRs harden against the same class of bug: a documentation PR adding a workflow-env-scoping rule and a smoke-test-from-feature-branch rule to PULL_REQUESTS.md, and a terraform PR raising the required-status-check list for jackin to include the new aggregator jobs.

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
donbeave added a commit that referenced this pull request May 18, 2026
… rules (#267)

Codify two CI-merge-safety rules in `PULL_REQUESTS.md` plus a one-line cross-reference in `AGENTS.md`. The recent #256#266 incident exposed a class of break invisible to PR-time CI: a workflow-level env var (`BUILDX_BUILDER`) leaked into the `publish-manifest` job, where docker buildx read it as the default-builder selection and failed at runtime — but only on push to main, because that job is push-only and never runs on a `pull_request` event.

Three rules now live in `PULL_REQUESTS.md`. First, third-party-CLI env vars (`BUILDX_BUILDER`, `DOCKER_BUILDKIT`, `GH_TOKEN`/`GITHUB_TOKEN`, `KUBECONFIG`, `AWS_PROFILE`, `RUSTUP_TOOLCHAIN`, `npm_config_*`) must be scoped to the consuming job; workflow-level `env:` is reserved for in-house naming/paths. Second, registry / production publishing must hard-gate on main — the `is_publish` flag pattern from `construct.yml` is the canonical shape, and PRs / feature-branch dispatches must build-and-verify only, never publish. Third, push-only / main-only / `workflow_run`-gated jobs (`build-validator`, `publish-manifest`, `deploy`, `publish-preview`) must be smoke-tested via `gh workflow run --ref <feature-branch>` before requesting merge, since PR-time CI structurally cannot exercise them.

A companion terraform PR (jackin-project/jackin-github-terraform#14) raises the matching protection-rule changes — required-status-check list, single approving review, strict up-to-date branch policy.

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
donbeave added a commit that referenced this pull request May 18, 2026
… env-var leaks (#273)

Tier 3 follow-up to the post-#266 hardening. Add a PR-time `publish-manifest-rehearsal` job to `construct.yml` that runs the same docker buildx CLI plumbing `publish-manifest` depends on, without the side-effect steps (Docker Hub login, `imagetools create`, `imagetools inspect`).

The previous hardening rounds were process- (#267 docs rules) and gate-side (#14 / #15 terraform branch protection). This is the workflow-side companion. The #266 break (`ERROR: no builder "jackin-construct" found`) was caused by hoisting `BUILDX_BUILDER` into workflow-level env, which docker buildx reads as the default-builder selection — surfacing only post-merge on main because `publish-manifest` is push-only and never runs on a `pull_request` event. `docker buildx ls` evaluates `BUILDX_BUILDER` at startup and exits non-zero on a missing-builder reference, regardless of whether the rest of the command would have hit the network. Running it on PR + feature-branch dispatch reproduces the failure shape without registry credentials.

Two checks: `docker buildx ls` (catches workflow-level env-var leaks for any current or future buildx-controlling env var) and `docker buildx imagetools --help >/dev/null` (smoke-test that the buildx CLI plugin is bundled and loadable on the runner). The `construct-required` aggregator now lists `publish-manifest-rehearsal` in `needs:` so the rehearsal's result is rolled up into the same single check name branch protection requires.

Equivalent rehearsals for `build-validator`, `deploy`, and `publish-preview` are deferred — those side-effect surfaces don't have the same network-free reproducibility shape `buildx ls` provides for docker, so the fidelity-vs-cost trade is worse. Reopen if a #266-class break surfaces in any of them.

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant