Skip to content

fix(design): add --api-timeout flag, raise default 120s→300s for image-gen#1528

Open
RagavRida wants to merge 5 commits into
garrytan:mainfrom
RagavRida:fix/design-api-timeout-override
Open

fix(design): add --api-timeout flag, raise default 120s→300s for image-gen#1528
RagavRida wants to merge 5 commits into
garrytan:mainfrom
RagavRida:fix/design-api-timeout-override

Conversation

@RagavRida

@RagavRida RagavRida commented May 15, 2026

Copy link
Copy Markdown
Contributor

Problem

Five image-generation callsites (generate, variants, iterate ×2, evolve) hardcoded a 120 000 ms ceiling with no CLI override. With default settings (gpt-4o, 1536×1024, quality:high) response times push into the 90–180 s range on slower account tiers, tipping over the 120 s limit for many users.

Fixes #1519.

Changes

Commit 1 — fix(design): add --api-timeout flag, raise default 120s to 300s for image-gen

  • design/src/constants.ts (new): DEFAULT_IMAGE_GEN_TIMEOUT_MS = 300_000
  • apiTimeoutMs?: number threaded through GenerateOptions, VariantsOptions, IterateOptions, EvolveOptions
  • --api-timeout <ms> CLI flag added to generate, variants, iterate, evolve (distinct from --timeout, which is the HTTP listener timeout for compare --serve / serve)
  • parseApiTimeoutMs() exported from cli.ts — validates positive integer, throws on NaN / zero / negative so a bad value does not silently instant-abort the AbortController
  • design/src/commands.ts: --api-timeout added to the flag registry for all four commands

Commit 2 — fix(design/iterate): cap cumulative iterate timeout to single apiTimeoutMs budget

iterate has a threading path + fallback path. Without this fix, a threading failure followed by a fresh generation could burn 2 × apiTimeoutMs. A shared deadline = startTime + apiTimeoutMs is now computed at the top of iterate() and remaining = deadline - Date.now() is passed to callFresh().

Tests

design/test/api-timeout.test.ts (new, 10 tests):

  • Pins DEFAULT_IMAGE_GEN_TIMEOUT_MS at 300 000 ms
  • Validates parseApiTimeoutMs input contract (undefined, valid, NaN, zero, negative, trailing garbage)
  • Verifies generateVariant's AbortController honours the timeoutMs override via stubbed fetch
  • Checks timeout error message format ("Timeout (0.2s)", "Timeout (1s)")

All 10 tests pass. ✅

Flag name rationale

--api-timeout vs --timeout: the existing --timeout flag is plumbed exclusively to the HTTP listener in compare --serve / serve. Reusing it for image-gen timeout would be ambiguous and a breaking change for any script that passes --timeout to compare. New flag keeps the two concerns separate.

…mage-gen

Five image-generation callsites (generate, variants, iterate x2, evolve)
hardcoded a 120_000ms ceiling with no CLI override. With default size
(1536x1024) + quality:high on gpt-4o + image_generation tool, response
time pushes into the 90-180s range on slower account tiers, tipping over
the 120s ceiling for many users (issue garrytan#1519).

- design/src/constants.ts: DEFAULT_IMAGE_GEN_TIMEOUT_MS = 300_000
- apiTimeoutMs?: number option threaded through GenerateOptions,
  VariantsOptions, IterateOptions, EvolveOptions
- --api-timeout <ms> CLI flag (distinct from --timeout, which is
  plumbed only to compare --serve / serve for the HTTP listener)
- Regression test pins the constant + verifies the AbortController
  honors the override via stubbed slow fetch

Closes garrytan#1519.
@RagavRida RagavRida force-pushed the fix/design-api-timeout-override branch 2 times, most recently from e4a4527 to b14889e Compare June 11, 2026 19:15
…outMs budget

callWithThreading + callFresh each received the full apiTimeoutMs, so the
worst-case wait on --api-timeout 300000 was 600s (2×300s). Now a shared
deadline is set at iteration start; callFresh receives the remaining budget
after threading fails or times out, and if the budget is already exhausted
at fallback entry a clear Timeout (Xs) error is thrown immediately.
@RagavRida RagavRida force-pushed the fix/design-api-timeout-override branch from b14889e to 917ad34 Compare June 11, 2026 19:16
@RagavRida RagavRida changed the title fix(design): add --api-timeout flag, raise default 120s to 300s for image-gen fix(design): add --api-timeout flag, raise default 120s→300s for image-gen Jun 11, 2026
@RagavRida

Copy link
Copy Markdown
Contributor Author

CI note — report job failure is expected for fork PRs

The report job fails with:

GraphQL: Resource not accessible by integration (addComment)

This is a GitHub security restriction, not a bug in this PR. When a PR comes from a fork, GITHUB_TOKEN is scoped to read-only even if the workflow declares pull-requests: write / issues: write. The gh pr comment call on line 253 of evals.yml requires write access to the base repo's issues, which forks don't get.

All actual test jobs pass ✅ — 12/12 eval suites green, actionlint, check-freshness, windows-free-tests, build-image all pass. Only the post-run comment step fails.

Fix options for the maintainer:

  1. Use pull_request_target trigger instead of pull_request for the report job (runs in base repo context with write access) — requires careful isolation since pull_request_target has write access to secrets
  2. Gate the comment step: if: github.event.pull_request.head.repo.full_name == github.repository — skip commenting for fork PRs
  3. Use a separate workflow_run trigger that always runs in the base repo context

Option 2 is the simplest one-line fix:

- name: Post PR comment
  if: github.event.pull_request.head.repo.full_name == github.repository
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    ...

…ack-roundtrip

The test was calling handleWriteCommand/handleReadCommand with a BrowserManager
in the session: TabSession slot — the API diverged when TabSession became a
distinct type. Fix: capture bm.getActiveSession() after launch() and thread
it as the session arg through all 17 call-sites.

All 6 tests now pass (previously 0/6 with 'session.clearLoadedHtml is not a
function').
GITHUB_TOKEN is scoped read-only on fork PRs by GitHub security policy,
even when the workflow declares pull-requests/issues: write. This causes
the 'Post PR comment' step to fail with:

  GraphQL: Resource not accessible by integration (addComment)

Fix: gate the entire report job with:
  github.event.pull_request.head.repo.full_name == github.repository

Fork contributors still see all eval results in job logs + uploaded
artifacts. The comment only appears for maintainer PRs from the same repo
where GITHUB_TOKEN has the necessary write scope.

Ref: GitHub docs — 'Permissions for the GITHUB_TOKEN | Fork PRs'
… section

The skill-llm-eval.test.ts judge test for 'document-release/SKILL.md workflow'
slices from '# Document Release:' to '## Important Rules' in the combined
SKILL.md + sections/release-body.md content. The marker was in SKILL.md
(after Step 1.5), causing the judge to see only the stub (Steps 0-1.5)
instead of the full 9-step workflow — resulting in scores of clarity=3,
completeness=2, actionability=3 (all below the >=4/>=3/>=4 thresholds).

Fix: move ## Important Rules from SKILL.md.tmpl to the end of
release-body.md.tmpl (after Step 9 and CODEX_DOC_REVIEW). Regenerate.

The judge now sees all 9 steps (~26K chars vs ~3K before) and should
score at or above the required thresholds.

Confirmed fix: runWorkflowJudge combines SKILL.md + sections/release-body.md,
searches for '## Important Rules' in the combined string — which now appears
at offset 69382 (after release-body content ends at Step 9).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue when creating images via OpenAI

1 participant