Fix skills-install prompt getting stuck with concurrent wrangler instances#14220
Fix skills-install prompt getting stuck with concurrent wrangler instances#14220dario-piotrowicz wants to merge 1 commit into
Conversation
🦋 Changeset detectedLatest commit: 5328d68 The changes in this PR will be included in the next version bump. This PR includes changesets to release 4 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Codeowners approval required for this PR:
Show detailed file reviewers
|
|
UnknownError: ProviderInitError |
|
@dario-piotrowicz Bonk workflow failed. Check the logs for details. View workflow run · To retry, trigger Bonk again. |
|
✅ All changesets look good |
| if (!force) { | ||
| writeSkillsInstallMetadataFile({ | ||
| version: 1, | ||
| accepted: "unanswered", | ||
| date: new Date().toISOString(), | ||
| pid: process.pid, | ||
| }); | ||
| } |
There was a problem hiding this comment.
🔴 "unanswered" metadata written before all skip conditions, permanently suppressing the prompt on transient failures
The "unanswered" metadata file is written at line 93-99, before agent detection (line 104), the no-agents check (line 113), and the non-interactive terminal check (line 121). If any of these checks cause an early return, the "unanswered" file remains on disk. On every subsequent run, readSkillsInstallMetadataFile() returns the existing config at agents-skills-install.ts:60-61, so the function enters the early-return block at line 61 and returns at line 75 — permanently suppressing the prompt.
This is a regression from the old code (agents-skills-install.ts:71-92 in the base commit) which only wrote metadata after the user answered the prompt. Affected scenarios:
- Non-interactive terminal: User runs wrangler once in a non-TTY environment (e.g., piped output, IDE task runner). The
"unanswered"file is created. On the next interactive run, the prompt is silently skipped. - Agent detection failure: WASM load fails once (a transient error). The prompt is permanently suppressed even after agents are installed.
- No agents detected: User runs wrangler before installing any AI coding agent. After installing one, they still never see the prompt.
The fix is to move the "unanswered" write to just before the prompt (after all skip conditions), or delete the metadata file in the early-return paths.
Prompt for agents
The writeSkillsInstallMetadataFile call at lines 93-100 is placed too early in the function — before agent detection (line 104), the no-agents check (line 113), and the non-interactive terminal check (line 121). If any of these subsequent checks triggers an early return, the 'unanswered' metadata file is left on disk, permanently suppressing the prompt on future runs.
The intent of writing 'unanswered' early is to (a) handle concurrent instances and (b) survive interruptions during the slow agent-detection WASM load. But the current placement also covers cases where the prompt was never going to be shown (no agents, non-interactive terminal, detection failure), which is a regression from the old behavior.
Approach 1 (recommended): Move the writeSkillsInstallMetadataFile call to just before the prompts() call — after the agent detection, no-agents check, and isInteractive check all pass. This preserves the concurrency protocol (the PID check before the prompt) while not leaving stale metadata when the prompt wouldn't be shown. The tradeoff is that a crash during agent detection won't leave a marker, but agent detection is a local WASM operation that's unlikely to be interrupted.
Approach 2: Keep the early write but delete or overwrite the metadata file in each of the early-return paths (agent detection failure at line 110, no agents at line 117, non-interactive at line 125). This would require calling something like deleteSkillsInstallMetadataFile() or writing metadata with accepted: false in each path.
Files to modify: packages/wrangler/src/agents-skills-install.ts (move or add cleanup in the early return paths). Also update the pre-prompt PID re-read at line 137-140 if the write is moved later.
Was this helpful? React with 👍 or 👎 to provide feedback.
| if (existingConfig.accepted === "unanswered") { | ||
| // Another wrangler instance is currently showing the prompt (or | ||
| // was interrupted before it could answer). Overwrite the file | ||
| // with this process's PID so the first instance can detect that | ||
| // a concurrent process started and skip its own prompt, avoiding | ||
| // a stuck/corrupted TTY. | ||
| writeSkillsInstallMetadataFile({ | ||
| version: 1, | ||
| accepted: "unanswered", | ||
| date: new Date().toISOString(), | ||
| pid: process.pid, | ||
| }); | ||
| } | ||
| return; |
There was a problem hiding this comment.
🚩 Concurrent instance protocol leaves prompt permanently suppressed after concurrency resolves
By design, when two wrangler instances run concurrently, neither shows the prompt. The metadata file ends up with accepted: "unanswered" and the second instance's PID. On ALL subsequent single-instance runs, the function sees the existing "unanswered" metadata at agents-skills-install.ts:61, overwrites the PID at line 68-73, and returns at line 75 — the user is never prompted again. This is the stated intent of the PR ("any interruption leaves behind a file that prevents the prompt from resurfacing"), but it means a single concurrent-run incident permanently disables the feature. Users must manually run wrangler --install-skills to recover. Consider whether a TTL or retry mechanism for "unanswered" state would be more user-friendly.
Was this helpful? React with 👍 or 👎 to provide feedback.
| if (!concurrentInstanceDetected) { | ||
| process.nextTick(() => { | ||
| process.exit(1); | ||
| }); | ||
| } |
There was a problem hiding this comment.
🚩 Metrics are lost on Ctrl+C due to process.exit(1) via nextTick
The old code (base commit) sent a "User dismissed (SIGINT)" metrics event and awaited allMetricsDispatchesCompleted() before calling process.exit(1), ensuring the metrics dispatch completed. The new code schedules process.exit(1) via process.nextTick in the onState callback at line 189-191, which fires before the async continuation can send any metrics event. This means Ctrl+C dismissals are no longer tracked in telemetry at all. The allMetricsDispatchesCompleted import was also removed (line 16 in the diff). This may be acceptable since the "unanswered" state conflates multiple scenarios anyway, but it's a loss of telemetry signal compared to the old code.
Was this helpful? React with 👍 or 👎 to provide feedback.
create-cloudflare
@cloudflare/deploy-helpers
@cloudflare/kv-asset-handler
miniflare
@cloudflare/pages-shared
@cloudflare/unenv-preset
@cloudflare/vite-plugin
@cloudflare/vitest-pool-workers
@cloudflare/workers-auth
@cloudflare/workers-editor-shared
@cloudflare/workers-utils
wrangler
@cloudflare/wrangler-bundler
commit: |
|
Closing in favour of #14264 |
This fixes the concurrency bug described in #14116 (comment) while trying to simplify the pre-existing flow where we specifically listen for Ctrl+C events.
A picture of a cute animal (not mandatory, but encouraged)