fix(design): add --api-timeout flag, raise default 120s→300s for image-gen#1528
Open
RagavRida wants to merge 5 commits into
Open
fix(design): add --api-timeout flag, raise default 120s→300s for image-gen#1528RagavRida wants to merge 5 commits into
RagavRida wants to merge 5 commits into
Conversation
…mage-gen Five image-generation callsites (generate, variants, iterate x2, evolve) hardcoded a 120_000ms ceiling with no CLI override. With default size (1536x1024) + quality:high on gpt-4o + image_generation tool, response time pushes into the 90-180s range on slower account tiers, tipping over the 120s ceiling for many users (issue garrytan#1519). - design/src/constants.ts: DEFAULT_IMAGE_GEN_TIMEOUT_MS = 300_000 - apiTimeoutMs?: number option threaded through GenerateOptions, VariantsOptions, IterateOptions, EvolveOptions - --api-timeout <ms> CLI flag (distinct from --timeout, which is plumbed only to compare --serve / serve for the HTTP listener) - Regression test pins the constant + verifies the AbortController honors the override via stubbed slow fetch Closes garrytan#1519.
e4a4527 to
b14889e
Compare
…outMs budget callWithThreading + callFresh each received the full apiTimeoutMs, so the worst-case wait on --api-timeout 300000 was 600s (2×300s). Now a shared deadline is set at iteration start; callFresh receives the remaining budget after threading fails or times out, and if the budget is already exhausted at fallback entry a clear Timeout (Xs) error is thrown immediately.
b14889e to
917ad34
Compare
Contributor
Author
CI note —
|
…ack-roundtrip The test was calling handleWriteCommand/handleReadCommand with a BrowserManager in the session: TabSession slot — the API diverged when TabSession became a distinct type. Fix: capture bm.getActiveSession() after launch() and thread it as the session arg through all 17 call-sites. All 6 tests now pass (previously 0/6 with 'session.clearLoadedHtml is not a function').
GITHUB_TOKEN is scoped read-only on fork PRs by GitHub security policy, even when the workflow declares pull-requests/issues: write. This causes the 'Post PR comment' step to fail with: GraphQL: Resource not accessible by integration (addComment) Fix: gate the entire report job with: github.event.pull_request.head.repo.full_name == github.repository Fork contributors still see all eval results in job logs + uploaded artifacts. The comment only appears for maintainer PRs from the same repo where GITHUB_TOKEN has the necessary write scope. Ref: GitHub docs — 'Permissions for the GITHUB_TOKEN | Fork PRs'
… section The skill-llm-eval.test.ts judge test for 'document-release/SKILL.md workflow' slices from '# Document Release:' to '## Important Rules' in the combined SKILL.md + sections/release-body.md content. The marker was in SKILL.md (after Step 1.5), causing the judge to see only the stub (Steps 0-1.5) instead of the full 9-step workflow — resulting in scores of clarity=3, completeness=2, actionability=3 (all below the >=4/>=3/>=4 thresholds). Fix: move ## Important Rules from SKILL.md.tmpl to the end of release-body.md.tmpl (after Step 9 and CODEX_DOC_REVIEW). Regenerate. The judge now sees all 9 steps (~26K chars vs ~3K before) and should score at or above the required thresholds. Confirmed fix: runWorkflowJudge combines SKILL.md + sections/release-body.md, searches for '## Important Rules' in the combined string — which now appears at offset 69382 (after release-body content ends at Step 9).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Five image-generation callsites (
generate,variants,iterate×2,evolve) hardcoded a 120 000 ms ceiling with no CLI override. With default settings (gpt-4o,1536×1024,quality:high) response times push into the 90–180 s range on slower account tiers, tipping over the 120 s limit for many users.Fixes #1519.
Changes
Commit 1 —
fix(design): add --api-timeout flag, raise default 120s to 300s for image-gendesign/src/constants.ts(new):DEFAULT_IMAGE_GEN_TIMEOUT_MS = 300_000apiTimeoutMs?: numberthreaded throughGenerateOptions,VariantsOptions,IterateOptions,EvolveOptions--api-timeout <ms>CLI flag added togenerate,variants,iterate,evolve(distinct from--timeout, which is the HTTP listener timeout forcompare --serve/serve)parseApiTimeoutMs()exported fromcli.ts— validates positive integer, throws on NaN / zero / negative so a bad value does not silently instant-abort the AbortControllerdesign/src/commands.ts:--api-timeoutadded to the flag registry for all four commandsCommit 2 —
fix(design/iterate): cap cumulative iterate timeout to single apiTimeoutMs budgetiteratehas a threading path + fallback path. Without this fix, a threading failure followed by a fresh generation could burn2 × apiTimeoutMs. A shareddeadline = startTime + apiTimeoutMsis now computed at the top ofiterate()andremaining = deadline - Date.now()is passed tocallFresh().Tests
design/test/api-timeout.test.ts(new, 10 tests):DEFAULT_IMAGE_GEN_TIMEOUT_MSat 300 000 msparseApiTimeoutMsinput contract (undefined, valid, NaN, zero, negative, trailing garbage)generateVariant'sAbortControllerhonours thetimeoutMsoverride via stubbed fetch"Timeout (0.2s)","Timeout (1s)")All 10 tests pass. ✅
Flag name rationale
--api-timeoutvs--timeout: the existing--timeoutflag is plumbed exclusively to the HTTP listener incompare --serve/serve. Reusing it for image-gen timeout would be ambiguous and a breaking change for any script that passes--timeouttocompare. New flag keeps the two concerns separate.