Skip to content

[Bug]: Flaky test in bun runner: isolated-agent model override precedence expects 'openai' but gets 'anthropic' #32084

@velamints2

Description

@velamints2

Description

The test applies model overrides with correct precedence in src/cron/isolated-agent.uses-last-non-empty-agent-text-as.test.ts fails intermittently under the bun test runner but passes consistently under node.

Failure

AssertionError: expected 'anthropic' to be 'openai' // Object.is equality

Expected: "openai"
Received: "anthropic"

❯ expectEmbeddedProviderModel src/cron/isolated-agent.uses-last-non-empty-agent-text-as.test.ts:104:26
❯ src/cron/isolated-agent.uses-last-non-empty-agent-text-as.test.ts:424:7

Analysis

The test runs three sequential turns inside a single withTempHome scope:

  1. Turn 1runCronTurn with model: "openai/gpt-4.1-mini" in job payload → expects provider: "openai"
  2. Turn 2runTurnWithStoredModelOverride (session store has providerOverride: "openai", modelOverride: "gpt-4.1-mini") → expects provider: "openai"gets anthropic
  3. Turn 3runTurnWithStoredModelOverride with model: "anthropic/claude-opus-4-5" → expects provider: "anthropic"

The failure in Turn 2 suggests the session model override path (cronSession.sessionEntry.modelOverride) is not being applied, causing fallback to the default model from makeCfg which is anthropic/claude-opus-4-5.

Suspected root cause

loadModelCatalog is mocked via vi.fn(), and resolveAllowedModelRef uses the catalog to validate the override. In Turn 2, the test calls vi.mocked(loadModelCatalog).mockResolvedValue(deterministicCatalog) before the turn, but the stored session override goes through a different code path (resolveCronSession → sessionEntry.modelOverride) that calls resolveAllowedModelRef with await loadCatalog().

Under bun, the lazy loadCatalog closure may retain a stale (empty) catalog from a previous mock clear, causing resolveAllowedModelRef to return an error (model not in catalog) and silently falling back to the default anthropic provider.

CI Reference

This was observed in PR #31972 CI run:

  • Job: checks (bun, test, pnpm canvas:a2ui:bundle && bunx vitest run --config vitest.unit.config.ts)
  • The node runner job passed all tests.

Environment

  • bun runner: bunx vitest
  • node runner: pnpm test (passes)
  • Test file: src/cron/isolated-agent.uses-last-non-empty-agent-text-as.test.ts:424
  • Test harness default model: anthropic/claude-opus-4-5 (in makeCfg)

Suggested Fix

Consider resetting the internal catalog cache variable in the loadCatalog closure between turns, or adding explicit vi.mocked(loadModelCatalog).mockClear() before each mock setup call. Alternatively, investigate whether bun's vitest mock behavior differs from node's for mockResolvedValue on already-resolved promises.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions