Skip to content

docs(rules): codify workflow env scoping and push-only-job smoke-test rules#267

Merged
donbeave merged 1 commit into
mainfrom
docs/workflow-merge-safety-rules
May 9, 2026
Merged

docs(rules): codify workflow env scoping and push-only-job smoke-test rules#267
donbeave merged 1 commit into
mainfrom
docs/workflow-merge-safety-rules

Conversation

@donbeave

@donbeave donbeave commented May 9, 2026

Copy link
Copy Markdown
Member

Summary

Codify two CI-merge-safety rules that the recent #256 / #266 incident exposed: workflow-level env vars that target third-party CLIs (notably BUILDX_BUILDER) leak into every job in the file and silently break unrelated jobs at runtime; and push-only / main-only jobs (build-validator, publish-manifest, deploy, publish-preview) never run on a pull_request event, so PR-time CI is structurally unable to verify them. Both failure modes fell through PR-time CI and three review passes during #256 because the green PR did not — and could not — exercise the post-merge code path. The merged change went red on main 30 seconds after it landed.

This PR is documentation-only. It adds three rules to PULL_REQUESTS.md and a one-line cross-reference in AGENTS.md:

  1. Scope third-party-CLI env vars to the consuming job. Workflow-level env: is reserved for in-house naming and paths. Any env var that a third-party CLI dereferences as default-selection (BUILDX_BUILDER, DOCKER_BUILDKIT, GH_TOKEN / GITHUB_TOKEN, KUBECONFIG, AWS_PROFILE, RUSTUP_TOOLCHAIN, npm_config_*) must be set at the job level only.
  2. Hard-gate registry / production publishing to main. Codifies the is_publish flag pattern from construct.yml as the canonical shape for every workflow that writes to a public registry, tag, release, or Homebrew formula. PRs and feature-branch dispatches must build/verify only — never publish. The contract spans login, push-by-digest, digest upload, and manifest publish; all gate on is_publish == 'true', while the local-only build path gates on != 'true' and runs from any branch.
  3. Smoke-test push-only / main-only jobs before requesting merge. When a PR touches such a job, its if: clause, its needs: chain, an env var it consumes, the recipe it ultimately runs, or a composite action it depends on — the agent must run gh workflow run <workflow.yml> --ref <branch> against the PR's feature branch and confirm the touched job succeeded, before asking the operator to merge. Workflows that hard-gate to main-only (e.g. publish-preview is locked to dispatch-from-main because the tap is public) require the PR body to call out the gap explicitly.

What's deferred (follow-up PRs)

  • A companion jackin-github-terraform PR raising the required-status-check list for jackin to include the new ci-required, construct-required, docs-required, and validate aggregator jobs, plus enabling strict_required_status_checks_policy and required_approving_review_count = 1. That change lives outside this repo because branch-protection rules are managed via Terraform.
  • Reusable-workflow dry-run: true inputs for the four push-only jobs so the full job chain can be exercised in PR with side effects mocked. Higher cost / higher fidelity than the smoke-test rule shipped here; deferred until the smoke-test rule plus the terraform-side checks prove insufficient.

Verify locally

Checkout

export TIRITH=0
mkdir -p "$HOME/Projects/jackin-project/test"
cd "$HOME/Projects/jackin-project/test"

if [ ! -d jackin/.git ]; then
  git clone https://github.com/jackin-project/jackin.git
fi

cd jackin
mise trust
git fetch -f origin docs/workflow-merge-safety-rules:refs/remotes/origin/docs/workflow-merge-safety-rules
git checkout -B docs/workflow-merge-safety-rules refs/remotes/origin/docs/workflow-merge-safety-rules

Static checks

markdownlint AGENTS.md PULL_REQUESTS.md 2>&1 | head -10 || true
diff <(grep -c '^## ' PULL_REQUESTS.md) <(echo 14)

(Markdownlint optional; the second diff confirms PULL_REQUESTS.md picked up the new top-level section.)

… rules

The #256 refactor and its hotfix #266 surfaced a
class of CI break that was invisible to PR-time CI and to all three
review passes: an env var hoisted to workflow level (`BUILDX_BUILDER`)
is read by docker buildx as the default-builder selection, so the
gated job that did not create that builder fails at runtime — but
only on push to main, because the job is gated to push-to-main and
never runs on a pull_request event.

Add two rules to PULL_REQUESTS.md so the next agent does not have to
re-derive them.

"Scope third-party-CLI env vars to the consuming job" — names the
env-var-leak failure mode, lists the canonical offenders
(`BUILDX_BUILDER`, `DOCKER_BUILDKIT`, `GH_TOKEN` /
`GITHUB_TOKEN`, `KUBECONFIG`, `AWS_PROFILE`,
`RUSTUP_TOOLCHAIN`, `npm_config_*`), and points at the canonical
example (#266). Workflow-level `env:` is reserved for in-house
naming where the value has no runtime effect on third-party tooling.

"Hard-gate registry / production publishing to main" — codifies
the `is_publish` flag pattern from construct.yml as the canonical
shape for any workflow that writes to a public registry, tag,
release, or Homebrew formula. PRs and feature-branch dispatches must
build / verify only, never publish. The contract spans login, push
by digest, digest upload, and manifest publish — all gate on
`is_publish == 'true'`; the local-only build path gates on
`is_publish != 'true'` and runs from any branch.

"Smoke-test push-only / main-only jobs before requesting merge" —
requires the agent to run `gh workflow run --ref <branch>` against
the PR's feature branch and confirm the touched job succeeds before
asking the operator to merge. Lists the four current jackin
push-only paths (`build-validator`, `publish-manifest`,
`deploy`, `publish-preview`). Calls out the harder case where a
job is hard-gated to dispatch-from-main (e.g. `publish-preview`
publishes to a public Homebrew tap) and requires the PR description
to acknowledge the gap.

AGENTS.md picks up a one-line bullet under the existing
"Pull requests" section pointing at the new PULL_REQUESTS.md
section, so an agent reading AGENTS.md first sees the rule named
without having to scroll the longer doc.

These are documentation-only changes — no behavior change, no
schema change, no migration. The rules apply prospectively to PRs
that touch `.github/workflows/**` or `.github/actions/**`.

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
@donbeave donbeave merged commit cab67f2 into main May 9, 2026
17 checks passed
@donbeave donbeave deleted the docs/workflow-merge-safety-rules branch May 9, 2026 10:01
donbeave added a commit to jackin-project/jackin-github-terraform that referenced this pull request May 9, 2026
…iewer; strict up-to-date branch policy (#14)

Raise the merge gate for the `jackin` repo to match the path-aware-workflow refactor in jackin-project/jackin#256. Three changes.

`variables.tf` replaces the single `docs-link-check` entry with the four aggregator-style checks (`ci-required`, `construct-required`, `docs-required`, plus the still-direct `docs-link-check`), the new Renovate validate job (`validate`), and the existing `DCO` gate. The aggregators were added in #256 and roll up the path-aware-gated jobs in each workflow, so adding or removing a gated job inside any of those workflows in the future does not require a terraform change.

`branch-protection.tf` flips two policy knobs on the `protect_main` ruleset: `required_approving_review_count` from 0 to 1 (every PR now needs a human eyeball before merge) and `strict_required_status_checks_policy` from false to true (the PR branch must be up-to-date with main before merge so the green CI run reflects the exact code that lands).

This is the second half of the post-#266 hardening; the first half (the documentation rules in jackin-project/jackin#267) addresses what an agent reviewer should check at PR time, and this terraform change addresses what GitHub itself will let merge.

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
donbeave added a commit that referenced this pull request May 9, 2026
… env-var leaks (#273)

Tier 3 follow-up to the post-#266 hardening. Add a PR-time `publish-manifest-rehearsal` job to `construct.yml` that runs the same docker buildx CLI plumbing `publish-manifest` depends on, without the side-effect steps (Docker Hub login, `imagetools create`, `imagetools inspect`).

The previous hardening rounds were process- (#267 docs rules) and gate-side (#14 / #15 terraform branch protection). This is the workflow-side companion. The #266 break (`ERROR: no builder "jackin-construct" found`) was caused by hoisting `BUILDX_BUILDER` into workflow-level env, which docker buildx reads as the default-builder selection — surfacing only post-merge on main because `publish-manifest` is push-only and never runs on a `pull_request` event. `docker buildx ls` evaluates `BUILDX_BUILDER` at startup and exits non-zero on a missing-builder reference, regardless of whether the rest of the command would have hit the network. Running it on PR + feature-branch dispatch reproduces the failure shape without registry credentials.

Two checks: `docker buildx ls` (catches workflow-level env-var leaks for any current or future buildx-controlling env var) and `docker buildx imagetools --help >/dev/null` (smoke-test that the buildx CLI plugin is bundled and loadable on the runner). The `construct-required` aggregator now lists `publish-manifest-rehearsal` in `needs:` so the rehearsal's result is rolled up into the same single check name branch protection requires.

Equivalent rehearsals for `build-validator`, `deploy`, and `publish-preview` are deferred — those side-effect surfaces don't have the same network-free reproducibility shape `buildx ls` provides for docker, so the fidelity-vs-cost trade is worse. Reopen if a #266-class break surfaces in any of them.

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
donbeave added a commit that referenced this pull request May 18, 2026
… rules (#267)

Codify two CI-merge-safety rules in `PULL_REQUESTS.md` plus a one-line cross-reference in `AGENTS.md`. The recent #256#266 incident exposed a class of break invisible to PR-time CI: a workflow-level env var (`BUILDX_BUILDER`) leaked into the `publish-manifest` job, where docker buildx read it as the default-builder selection and failed at runtime — but only on push to main, because that job is push-only and never runs on a `pull_request` event.

Three rules now live in `PULL_REQUESTS.md`. First, third-party-CLI env vars (`BUILDX_BUILDER`, `DOCKER_BUILDKIT`, `GH_TOKEN`/`GITHUB_TOKEN`, `KUBECONFIG`, `AWS_PROFILE`, `RUSTUP_TOOLCHAIN`, `npm_config_*`) must be scoped to the consuming job; workflow-level `env:` is reserved for in-house naming/paths. Second, registry / production publishing must hard-gate on main — the `is_publish` flag pattern from `construct.yml` is the canonical shape, and PRs / feature-branch dispatches must build-and-verify only, never publish. Third, push-only / main-only / `workflow_run`-gated jobs (`build-validator`, `publish-manifest`, `deploy`, `publish-preview`) must be smoke-tested via `gh workflow run --ref <feature-branch>` before requesting merge, since PR-time CI structurally cannot exercise them.

A companion terraform PR (jackin-project/jackin-github-terraform#14) raises the matching protection-rule changes — required-status-check list, single approving review, strict up-to-date branch policy.

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
donbeave added a commit that referenced this pull request May 18, 2026
… env-var leaks (#273)

Tier 3 follow-up to the post-#266 hardening. Add a PR-time `publish-manifest-rehearsal` job to `construct.yml` that runs the same docker buildx CLI plumbing `publish-manifest` depends on, without the side-effect steps (Docker Hub login, `imagetools create`, `imagetools inspect`).

The previous hardening rounds were process- (#267 docs rules) and gate-side (#14 / #15 terraform branch protection). This is the workflow-side companion. The #266 break (`ERROR: no builder "jackin-construct" found`) was caused by hoisting `BUILDX_BUILDER` into workflow-level env, which docker buildx reads as the default-builder selection — surfacing only post-merge on main because `publish-manifest` is push-only and never runs on a `pull_request` event. `docker buildx ls` evaluates `BUILDX_BUILDER` at startup and exits non-zero on a missing-builder reference, regardless of whether the rest of the command would have hit the network. Running it on PR + feature-branch dispatch reproduces the failure shape without registry credentials.

Two checks: `docker buildx ls` (catches workflow-level env-var leaks for any current or future buildx-controlling env var) and `docker buildx imagetools --help >/dev/null` (smoke-test that the buildx CLI plugin is bundled and loadable on the runner). The `construct-required` aggregator now lists `publish-manifest-rehearsal` in `needs:` so the rehearsal's result is rolled up into the same single check name branch protection requires.

Equivalent rehearsals for `build-validator`, `deploy`, and `publish-preview` are deferred — those side-effect surfaces don't have the same network-free reproducibility shape `buildx ls` provides for docker, so the fidelity-vs-cost trade is worse. Reopen if a #266-class break surfaces in any of them.

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant