feat(ci): AI upstream release analyzer with draft PR batching #1
Introduce a 3-pass analyzer pipeline that classifies every commit in an
upstream release window as GOOD / NEEDS_REVIEW / SLOP, then cherry-picks
each batch onto its own branch and opens one draft PR per non-empty
batch. All merges remain manual in Phase 1 - advisory only.
Pipeline:
* Pass 1: per-commit classification via gpt-4.1-mini (low tier,
150 req/day on Copilot Student)
* Pass 2: slop verification of first-pass SLOP hits via gpt-5-mini
with automatic fallback to gpt-4.1 then gpt-4.1-mini when the
5-mini tier exhausts its 12 req/day quota or rejects on context
length (4K cap)
* Pass 3: release-level cost-benefit synthesis via gpt-4.1
Trigger:
* upstream-tag-watcher.yml (cron */15) polls upstream ls-remote and
fires repository_dispatch when a new stable tag appears, acting as
a pseudo-webhook since we cannot subscribe to a repo we do not own
* upstream-analyzer.yml handles the dispatch and also runs manually
via workflow_dispatch for ad-hoc release analysis
Output:
* One analysis issue with cost-benefit report, slop ratio, breaking
change assessment, and action items
* Up to 3 draft PRs (sync/upstream-<tag>-{good,needs-review,slop})
each labeled upstream-sync + batch:<verdict> + upstream-tag:<tag>
* All classification JSON artifacts uploaded per workflow run
Prompts live in .github/prompts/*.md and are loaded at runtime so tone
and rubric can be tuned without touching workflow YAML or TS code.
Disable the old sync-upstream.yml workflow:
* Scheduled trigger removed - only manual dispatch behind a typed
confirmation input remains (preserves history for reference)
* Replace printf >> GITHUB_OUTPUT with brace-grouped redirect to
silence shellcheck SC2129
Token budget on Copilot Student tier covers a ~50-commit release with
headroom; the fallback chain keeps the pipeline functional even when
gpt-5-mini's tight quota is exhausted.
Pull request overview
Adds an AI-driven upstream release analyzer pipeline that classifies commits into batches (GOOD / NEEDS_REVIEW / SLOP), cherry-picks each batch onto its own branch, and opens draft PRs plus an analysis issue; the legacy upstream sync workflow is gated/disabled.
Changes:
- Introduces new scheduled upstream tag watcher + analyzer workflows to dispatch analysis and open draft PRs per batch.
- Adds a TypeScript analyzer implementation (classification, verification with model fallback, synthesis, batch branch creation, artifacts).
- Disables legacy scheduled upstream sync by removing the schedule trigger and gating manual runs behind a typed confirmation input.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `script/upstream-analyzer/types.ts` | Defines shared types for classifications, verifications, synthesis, config, and pipeline output. |
| `script/upstream-analyzer/slop-verifier.ts` | Implements second-pass verification for SLOP commits with model fallback behavior. |
| `script/upstream-analyzer/run.ts` | CLI/entrypoint that runs the pipeline and emits workflow artifacts + GitHub outputs. |
| `script/upstream-analyzer/release-synthesizer.ts` | Pass-3 release synthesis (diffstat + dependency diff + classifications) into a recommendation JSON. |
| `script/upstream-analyzer/prompt-loader.ts` | Loads prompts from .github/prompts for use by the pipeline. |
| `script/upstream-analyzer/pipeline.ts` | Orchestrates the 3 passes, batch building/pushing, and writes JSON artifacts. |
| `script/upstream-analyzer/issue-report.ts` | Renders the analysis issue body, labels, and batch PR bodies from pipeline output. |
| `script/upstream-analyzer/index.ts` | Re-exports public API surface for the analyzer modules. |
| `script/upstream-analyzer/github-models-client.ts` | Wraps GitHub Models chat completion calls, JSON parsing, and fallback-chain inference. |
| `script/upstream-analyzer/git-inspector.ts` | Git plumbing for commit listing, diffs, tag existence, cherry-picks, and pushing branches. |
| `script/upstream-analyzer/commit-classifier.ts` | Pass-1 per-commit classifier using a prompt + commit diff. |
| `script/upstream-analyzer/cli-config.ts` | Builds analyzer configuration from workflow/CLI environment variables. |
| `script/upstream-analyzer/batch-builder.ts` | Creates batch branches from a base tag and cherry-picks commits per verdict. |
| `.github/workflows/upstream-tag-watcher.yml` | New cron-based watcher that detects upstream tags and dispatches the analyzer workflow. |
| `.github/workflows/upstream-analyzer.yml` | New workflow that runs the analyzer, uploads artifacts, creates the analysis issue, and opens draft PRs. |
| `.github/workflows/sync-upstream.yml` | Disables scheduled legacy sync and gates manual invocation behind typed confirmation; fixes output writes. |
| `.github/prompts/slop-verify.md` | Prompt rubric for second-pass SLOP verification. |
| `.github/prompts/release-synthesis.md` | Prompt rubric for final release-window synthesis recommendation. |
| `.github/prompts/commit-classify.md` | Prompt rubric for first-pass per-commit classification. |
    # Optional watcher-specific state (survives even if upstream-version.txt is ahead)
    if [ -f "${WATCHER_STATE_FILE}" ]; then
      LAST_SEEN=$(tr -d '[:space:]' < "${WATCHER_STATE_FILE}")
    else
      LAST_SEEN="$STORED"
    fi
The watcher reads WATCHER_STATE_FILE / upstream-version.txt but never updates either after dispatching. As a result, once upstream advances past LAST_SEEN, this cron job will keep dispatching the analyzer every 15 minutes (and repeatedly attempt to open duplicate issues/PRs), potentially burning the model budget. Persist the new LATEST value (e.g., write it to ${WATCHER_STATE_FILE} and commit/push it, or store it in a repo/environment variable) after a successful dispatch so subsequent runs can short-circuit.
    - name: Open draft PRs for batches
      uses: actions/github-script@v7
      env:
        ISSUE_NUMBER: ${{ steps.issue.outputs.issue_number }}
        TO_TAG: ${{ steps.inputs.outputs.to_tag }}
      with:
This workflow always attempts to open draft PRs, even when push_branches is false (dry-run). In that case the batch branches won't exist on origin, so PR creation will fail noisily. Consider guarding the PR-opening step with an if: on steps.inputs.outputs.push_branches == 'true' (and optionally steps.analyze.outputs.has_commits == 'true') to keep dry-runs clean.
    }

    export function buildIssueLabels(synthesis: SynthesisResult, toTag: string): string[] {
      const recommendationLabel = `upstream:${synthesis.recommendation.toLowerCase().replace("_", "-")}`
buildIssueLabels only replaces the first underscore in the recommendation string. For HOLD_FOR_HUMAN this produces upstream:hold-for_human (still contains an underscore) instead of upstream:hold-for-human, which will prevent expected label matching/creation. Use a global replacement (e.g., replaceAll / regex) so all underscores become dashes.
    - const recommendationLabel = `upstream:${synthesis.recommendation.toLowerCase().replace("_", "-")}`
    + const recommendationLabel = `upstream:${synthesis.recommendation.toLowerCase().replace(/_/g, "-")}`
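The difference between the two forms can be checked in isolation. The following is a minimal sketch, not the PR's code: `toLabelSlug` is a hypothetical helper name introduced here only to contrast first-match replacement with a global regex replacement.

```typescript
// Hypothetical helper (not part of the PR) mirroring the label construction above.
function toLabelSlug(recommendation: string): string {
  // A regex with the `g` flag replaces every underscore, not just the first one.
  return recommendation.toLowerCase().replace(/_/g, "-")
}

// String.prototype.replace with a string pattern only touches the first match.
const firstOnly = "HOLD_FOR_HUMAN".toLowerCase().replace("_", "-")
const allUnderscores = toLabelSlug("HOLD_FOR_HUMAN")

console.log(firstOnly) // hold-for_human
console.log(allUnderscores) // hold-for-human
```

The regex form is used here rather than `replaceAll` only because it does not depend on an ES2021 compile target; either should behave the same for this input.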
    }

    export function buildIssueLabels(synthesis: SynthesisResult, toTag: string): string[] {
      const recommendationLabel = `upstream:${synthesis.recommendation.toLowerCase().replace("_", "-")}`
🔴 replace("_", "-") only replaces the first underscore, producing malformed label for HOLD_FOR_HUMAN
String.replace with a string (non-regex) argument only replaces the first occurrence. For the HOLD_FOR_HUMAN recommendation, "hold_for_human".replace("_", "-") produces "hold-for_human" instead of the intended "hold-for-human". This creates an inconsistent GitHub label upstream:hold-for_human.
Confirmed by running: node -e "console.log('HOLD_FOR_HUMAN'.toLowerCase().replace('_', '-'))" → hold-for_human.
    - const recommendationLabel = `upstream:${synthesis.recommendation.toLowerCase().replace("_", "-")}`
    + const recommendationLabel = `upstream:${synthesis.recommendation.toLowerCase().replaceAll("_", "-")}`
    - name: Fire repository_dispatch to analyzer
      if: steps.decide.outputs.dispatch == 'true'
      uses: actions/github-script@v7
      with:
        script: |
          const payload = {
            from_tag: '${{ steps.decide.outputs.from_tag }}',
            to_tag: '${{ steps.decide.outputs.to_tag }}',
            upstream_repo: process.env.UPSTREAM_REPO,
            detected_at: new Date().toISOString(),
          };
          await github.rest.repos.createDispatchEvent({
            owner: context.repo.owner,
            repo: context.repo.repo,
            event_type: 'upstream-tag-detected',
            client_payload: payload,
          });
          core.notice(`Dispatched analyzer for ${payload.from_tag} -> ${payload.to_tag}`);
🔴 Upstream tag watcher never persists seen-tag state, causing infinite repeated analyzer dispatches
The watcher reads LAST_SEEN from .upstream-watcher-seen-tag or upstream-version.txt (line 62-70), but after dispatching the analyzer it never writes the newly-detected tag back to either file. The old sync-upstream.yml (now disabled) was the only workflow that updated upstream-version.txt. Since the new analyzer pipeline also doesn't update it, every 15-minute cron run will re-detect the same upstream tag ($LATEST != $LAST_SEEN remains true) and fire another upstream-tag-detected dispatch. The analyzer's concurrency group limits parallel runs per to_tag, but completed runs will be followed by queued duplicates indefinitely — creating duplicate issues and draft PRs.
Prompt for agents
The upstream-tag-watcher.yml workflow detects new upstream tags and dispatches the analyzer, but it never persists the detected tag. After dispatching at line 121, the workflow should write the new tag to a state file so subsequent cron runs see that this tag was already dispatched.
Two possible approaches:
1. Commit the tag to upstream-version.txt or .upstream-watcher-seen-tag and push (requires contents: write permission; the workflow currently has only contents: read).
2. Use GitHub Actions cache to persist the last-seen tag between runs (avoids needing write access to the repo).
The simplest fix would be to add a step after 'Fire repository_dispatch' that writes the LATEST tag to .upstream-watcher-seen-tag, commits it, and pushes to dev. This would also require changing permissions.contents from 'read' to 'write'.
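The read-compare-write cycle described above could look roughly like the sketch below. It is an illustration under assumptions, not the watcher's implementation: `shouldDispatch` and `recordDispatched` are hypothetical names, the tag `v1.2.3` is made up, and in a real workflow the written file would still have to be committed or cached to survive between runs.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs"
import { tmpdir } from "node:os"
import { join } from "node:path"

// Hypothetical helpers; the names are illustrative and do not come from the PR.

/** Returns true when `latest` differs from the tag recorded in `stateFile`. */
function shouldDispatch(stateFile: string, latest: string): boolean {
  if (!existsSync(stateFile)) return true // nothing recorded yet
  const lastSeen = readFileSync(stateFile, "utf8").replace(/\s+/g, "")
  return lastSeen !== latest
}

/** Records `latest` as handled; call this only after the dispatch has succeeded. */
function recordDispatched(stateFile: string, latest: string): void {
  writeFileSync(stateFile, `${latest}\n`, "utf8")
}

// Usage: simulate two consecutive cron runs that see the same upstream tag.
const stateFile = join(mkdtempSync(join(tmpdir(), "watcher-")), ".upstream-watcher-seen-tag")
const firstRun = shouldDispatch(stateFile, "v1.2.3")
if (firstRun) recordDispatched(stateFile, "v1.2.3")
const secondRun = shouldDispatch(stateFile, "v1.2.3")
console.log(firstRun, secondRun) // true false
```

Writing the state only after a successful dispatch means a failed dispatch is retried on the next cron run instead of being silently skipped.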
    function renderCommitLines(commits: CommitClassification[]): string {
      if (commits.length === 0) return "_(none)_"
      return commits
        .map((c) => `- \`${c.shortSha}\` **${c.subject}** — ${c.reason}`)
🔴 Em dash (U+2014) in generated content violates AGENTS.md anti-pattern rule
AGENTS.md explicitly states: "Never use em dashes, en dashes, or AI filler phrases in generated content". The renderCommitLines function uses an em dash (—, U+2014) as a separator in generated issue body content. Should use -- or - instead.
    - .map((c) => `- \`${c.shortSha}\` **${c.subject}** — ${c.reason}`)
    + .map((c) => `- \`${c.shortSha}\` **${c.subject}** - ${c.reason}`)
    if (verifications.length === 0) return ""
    const lines = verifications.map(
      (v) =>
        `- \`${v.sha.slice(0, 7)}\` ${v.verifyResult} (behavior delta: ${v.behaviorDelta}) — ${v.reasoning}`,
🔴 Em dash (U+2014) in generated verification content violates AGENTS.md anti-pattern rule
AGENTS.md explicitly states: "Never use em dashes, en dashes, or AI filler phrases in generated content". The renderVerifications function uses an em dash (—, U+2014) as a separator in generated issue body content.
    - `- \`${v.sha.slice(0, 7)}\` ${v.verifyResult} (behavior delta: ${v.behaviorDelta}) — ${v.reasoning}`,
    + `- \`${v.sha.slice(0, 7)}\` ${v.verifyResult} (behavior delta: ${v.behaviorDelta}) - ${v.reasoning}`,
    return [
      "---",
      "",
      "_Generated by `upstream-analyzer` workflow. This issue is advisory — no merges or publishes happen automatically. Phase 1: draft PRs only. The prompt lives in `.github/prompts/` and is editable without touching workflow code._",
🔴 Em dash (U+2014) in generated footer content violates AGENTS.md anti-pattern rule
AGENTS.md explicitly states: "Never use em dashes, en dashes, or AI filler phrases in generated content". The buildFooter function uses an em dash (—, U+2014) in the generated issue footer text: "This issue is advisory — no merges...".
    - "_Generated by `upstream-analyzer` workflow. This issue is advisory — no merges or publishes happen automatically. Phase 1: draft PRs only. The prompt lives in `.github/prompts/` and is editable without touching workflow code._",
    + "_Generated by `upstream-analyzer` workflow. This issue is advisory -- no merges or publishes happen automatically. Phase 1: draft PRs only. The prompt lives in `.github/prompts/` and is editable without touching workflow code._",
    lines.push(`issue_title<<EOF_TITLE`, artifacts.issueTitle, `EOF_TITLE`)
    lines.push(`issue_labels<<EOF_LABELS`, JSON.stringify(artifacts.issueLabels), `EOF_LABELS`)
    lines.push(`batch_specs<<EOF_SPECS`, JSON.stringify(artifacts.batchPrSpecs), `EOF_SPECS`)
    lines.push(`has_commits=${artifacts.batchPrSpecs.some((s) => !s.skipped) ? "true" : "false"}`)
    await appendFile(outputPath, `${lines.join("\n")}\n`, "utf8")
📝 Info: GitHub Actions heredoc delimiters could theoretically collide with content
In run.ts:60-62, the emitGithubOutputs function uses fixed heredoc delimiters (EOF_TITLE, EOF_LABELS, EOF_SPECS) for multi-line GitHub Actions outputs. If artifacts.issueTitle ever contained the literal string EOF_TITLE on its own line, the heredoc would terminate prematurely and corrupt the output. In practice, the issue title is built from tag names and recommendation enums (issue-report.ts:168), making this collision astronomically unlikely. Not a bug, but worth noting if the title format ever changes to include user-controlled content.
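If the title format ever does start carrying less controlled content, one way to rule out the collision is to generate the delimiter per output instead of hard-coding it. The sketch below is an assumption about how that might look, not code from the PR; `formatMultilineOutput` and the `ghadelimiter_` prefix are hypothetical choices.

```typescript
import { randomUUID } from "node:crypto"

// Hypothetical helper: builds the three lines of one multi-line $GITHUB_OUTPUT entry
// (name<<DELIM, value, DELIM) with a delimiter that cannot appear in the value.
function formatMultilineOutput(name: string, value: string): string[] {
  let delimiter = `ghadelimiter_${randomUUID()}`
  // Regenerate in the (extremely unlikely) case the value already contains the delimiter.
  while (value.includes(delimiter)) delimiter = `ghadelimiter_${randomUUID()}`
  return [`${name}<<${delimiter}`, value, delimiter]
}

// Usage: a value containing the old fixed delimiter no longer terminates the block early.
const outputLines = formatMultilineOutput("issue_title", "Upstream sync\nEOF_TITLE\nstill part of the title")
console.log(outputLines.join("\n"))
```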
    .map((line) => {
      const [sha, shortSha, subject, author, date] = line.split("\t")
      return { sha, shortSha, subject, author, date }
    })
📝 Info: Commit subject parsing could break on tab characters in git log output
In git-inspector.ts:26, line.split("\t") splits on tab characters, but the %s format specifier (commit subject) could theoretically contain tabs. If it did, the destructuring would misalign author and date. In practice, git commit subjects almost never contain tab characters (and many tools strip them), so this is not a practical concern for upstream commit analysis.
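Should tabs in subjects ever become a concern, a field separator that is far less likely to occur in commit text would sidestep the problem. The sketch below assumes the log format could be switched to git's `%x1f` (ASCII unit separator) placeholder; `GIT_LOG_FORMAT`, `CommitInfo`, and `parseCommitLine` are hypothetical names, and the exact fields the PR requests from `git log` are not shown in the excerpt.

```typescript
// Hypothetical record shape matching the five destructured fields above.
interface CommitInfo {
  sha: string
  shortSha: string
  subject: string
  author: string
  date: string
}

// Assumed format string: %x1f makes git emit the ASCII unit separator between fields.
const GIT_LOG_FORMAT = "%H%x1f%h%x1f%s%x1f%an%x1f%aI"
const FIELD_SEP = "\x1f"

/** Parses one log line; returns null when the field count is not exactly five. */
function parseCommitLine(line: string): CommitInfo | null {
  const parts = line.split(FIELD_SEP)
  if (parts.length !== 5) return null
  const [sha, shortSha, subject, author, date] = parts
  return { sha, shortSha, subject, author, date }
}

// Usage: a subject containing a tab no longer shifts the author and date fields.
const sample = ["abc1234def", "abc1234", "fix:\ttabbed subject", "Alice", "2024-01-01T00:00:00+00:00"].join(FIELD_SEP)
const parsedCommit = parseCommitLine(sample)
console.log(parsedCommit?.author) // Alice
```

Rejecting lines with an unexpected field count also turns silent misalignment into an explicit `null` that the caller can log or skip.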
    export async function inferJsonWithFallback(
      req: FallbackInferenceRequest,
    ): Promise<InferenceResult> {
      const chain = [req.primaryModel, ...req.fallbackModels]
      let lastError: unknown
      for (let i = 0; i < chain.length; i++) {
        const model = chain[i]
        try {
          return await inferJson({
            model,
            systemPrompt: req.systemPrompt,
            userPrompt: req.userPrompt,
            maxTokens: req.maxTokens,
            temperature: req.temperature,
          })
        } catch (err) {
          lastError = err
          const isLast = i === chain.length - 1
          if (isLast || !shouldFallback(err)) throw err
          const reason = err instanceof GithubModelsError ? `${err.status} ${err.body.slice(0, 160)}` : String(err)
          req.onFallback?.(model, chain[i + 1], reason)
        }
      }
      throw lastError ?? new Error("inferJsonWithFallback exhausted without error (unreachable)")
    }
📝 Info: Fallback chain correctly re-throws non-recoverable errors
Examined the inferJsonWithFallback logic carefully (github-models-client.ts:136-154). The fallback loop only continues to the next model when shouldFallback(err) returns true (quota or context-length errors). For all other errors (network failures, auth errors, 500s), it correctly re-throws immediately at line 149 (if (isLast || !shouldFallback(err)) throw err). The lastError variable at line 154 is technically unreachable code (the loop always either returns or throws), but the defensive fallback throw is harmless.
    import { mkdir, writeFile } from "node:fs/promises"
    import { join } from "node:path"
📝 Info: Pipeline uses node:fs/promises imports despite Bun-only runtime convention
AGENTS.md states the runtime is "Bun only" and types should use bun-types not @types/node. Several files import from node:fs/promises and node:path (pipeline.ts:1-2, run.ts:3-4, prompt-loader.ts:1-2). Bun fully supports node: protocol imports and these modules work correctly. The git-inspector.ts file correctly uses Bun's $ shell API. While using Bun-native file APIs (like Bun.write()) would be more idiomatic, the node:fs/promises imports are functional and this is a script rather than the core plugin code. Not flagging as a bug since the convention is about runtime/types, not import source.
    permissions:
      contents: read
      actions: write
🚩 Upstream tag watcher permissions may be insufficient for dispatch + state commit
The watcher workflow has permissions: contents: read and actions: write. The github.rest.repos.createDispatchEvent call requires write access. While actions: write may cover workflow dispatch in some configurations, if the fix for BUG-0002 involves committing a state file back to the repo, the permissions would need to be upgraded to contents: write. Worth verifying that the current permissions are sufficient even for just the dispatch event.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7626cd72fc
    contents: read
    actions: write
Grant contents:write for repository dispatch
The watcher calls github.rest.repos.createDispatchEvent, but this job only grants contents: read; GitHub requires write-level repository contents permission for repository dispatch, so this step will fail with a 403 and the analyzer workflow will never be triggered automatically. Update this workflow permission to contents: write (least-privilege can still be preserved by keeping other scopes minimal).
    - name: Fire repository_dispatch to analyzer
      if: steps.decide.outputs.dispatch == 'true'
      uses: actions/github-script@v7
Persist watcher state after successful dispatch
This workflow reads WATCHER_STATE_FILE to suppress repeated alerts, but never writes the newly detected tag after dispatching; with upstream-version.txt unchanged, every 15-minute run will keep redispatching the same range and repeatedly create duplicate analysis issues/PR attempts. Persist LATEST (or otherwise advance state) after a successful dispatch so each upstream tag is analyzed once.
Summary
Adds a 3-pass AI pipeline that classifies every commit in an upstream release window (GOOD / NEEDS_REVIEW / SLOP), cherry-picks each batch onto its own branch, and opens one draft PR per non-empty batch. Phase 1 posture: advisory only. No auto-merge. No auto-publish.
Replaces the auto-scheduling on `sync-upstream.yml` (now gated behind a typed confirmation input, kept for reference).
Architecture
Fallback semantics
The new `inferJsonWithFallback` in `github-models-client.ts` detects 429 (rate limit), 403 with rate-limit body, and 400 context-length errors, then walks a model chain. Default verifier chain:
```
openai/gpt-5-mini → openai/gpt-4.1 → openai/gpt-4.1-mini
```
Any of these can be overridden via `workflow_dispatch` inputs (`model_slop_verify` and `model_slop_verify_fallbacks`).
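The error classification described above (429, 403 with a rate-limit body, 400 context-length) might be sketched as follows. This is a hedged illustration, not the PR's `shouldFallback`: the `GithubModelsError` class here is a stand-in with only the `status` and `body` fields the review excerpt shows, and the body substrings being matched are assumptions.

```typescript
// Stand-in for the PR's error type; only `status` and `body` are known from the excerpt.
class GithubModelsError extends Error {
  readonly status: number
  readonly body: string
  constructor(status: number, body: string) {
    super(`GitHub Models request failed with status ${status}`)
    this.status = status
    this.body = body
    // Keeps `instanceof` working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype)
  }
}

/** Returns true only for errors where trying the next model in the chain makes sense. */
function shouldFallback(err: unknown): boolean {
  if (!(err instanceof GithubModelsError)) return false // network errors etc. are re-thrown
  const body = err.body.toLowerCase()
  if (err.status === 429) return true // rate limit or quota exhausted
  if (err.status === 403) return body.includes("rate limit") // assumed body wording
  if (err.status === 400) return body.includes("context length") || body.includes("too large") // assumed wording
  return false // auth failures, 5xx, and everything else
}

console.log(shouldFallback(new GithubModelsError(429, ""))) // true
console.log(shouldFallback(new GithubModelsError(500, "internal error"))) // false
```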
The verifier truncates diffs to 10K chars (gpt-5-mini caps at ~4K input tokens on Copilot Pro/Student tier; fallback models tolerate larger payloads).
Phase 1 posture: what's blocked
No upstream commits can be lost: every commit is either cherry-picked into some batch branch or recorded in `batches.json`'s `conflictCommits` when cherry-pick fails.
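That invariant lends itself to a simple check over the emitted artifact. The sketch below assumes a shape for `batches.json`; only `conflictCommits` is named in the description, so `batches`, `verdict`, `pickedShas`, and `findUnaccountedCommits` are hypothetical.

```typescript
// Assumed shape of batches.json; only `conflictCommits` is named in the description.
interface BatchesFile {
  batches: { verdict: string; pickedShas: string[] }[]
  conflictCommits: string[]
}

/** Returns the SHAs that are neither cherry-picked into a batch nor recorded as conflicts. */
function findUnaccountedCommits(allShas: string[], file: BatchesFile): string[] {
  const accounted = new Set<string>(file.conflictCommits)
  for (const batch of file.batches) {
    for (const sha of batch.pickedShas) accounted.add(sha)
  }
  return allShas.filter((sha) => !accounted.has(sha))
}

// Usage with made-up SHAs: one commit is missing from both batches and conflicts.
const exampleFile: BatchesFile = {
  batches: [
    { verdict: "GOOD", pickedShas: ["aaa1111"] },
    { verdict: "SLOP", pickedShas: ["bbb2222"] },
  ],
  conflictCommits: ["ccc3333"],
}
const missing = findUnaccountedCommits(["aaa1111", "bbb2222", "ccc3333", "ddd4444"], exampleFile)
console.log(JSON.stringify(missing)) // ["ddd4444"]
```

An empty result would confirm the "no commits lost" property for a given run.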
Files
New
Modified
Verification
Testing plan
Merge this PR to `dev`, then from the Actions UI manually trigger Upstream Analyzer with:
Expected output:
If the pipeline proves useful on this first run, we'll tune the prompts and consider flipping to Phase 2 (auto-merge GOOD batch on `MERGE_CLEAN` verdict).
Out of scope
Summary by cubic
Adds a 3-pass upstream release analyzer that classifies commits (GOOD / NEEDS_REVIEW / SLOP), cherry-picks them into batch branches, and opens draft PRs per batch. Disables the scheduled legacy sync; Phase 1 is advisory only (no auto-merge or publish).
New Features
- `upstream-tag-watcher.yml` polls `code-yeongyu/oh-my-openagent` for new stable tags and dispatches analysis.
- `upstream-analyzer.yml` runs 3 passes: per-commit classify (`openai/gpt-4.1-mini`), SLOP verify with fallback (`openai/gpt-5-mini` → `openai/gpt-4.1` → `openai/gpt-4.1-mini`), and release synthesis (`openai/gpt-4.1`).
- … `sync/upstream-<tag>-{good|needs-review|slop}`, and uploads JSON artifacts.
- Prompts live in `.github/prompts/`; scripts live in `script/upstream-analyzer/`. Conflicting cherry-picks are logged in `batches.json`.
Migration
- `Sync Upstream` is disabled and only runs with the typed input `yes-legacy-force-sync`.
- … `from_tag`, `to_tag`, and `push_branches: true`. Expect one analysis issue and up to 3 draft PRs.
Written for commit 7626cd7. Summary will update on new commits.