Skip to content

fix(workflows): Don't leak the previous user's workflows after a login or logout#6944

Merged
facumenzella merged 6 commits into
mainfrom
facu/workflow-list-cache-guard
Jun 9, 2026
Merged

fix(workflows): Don't leak the previous user's workflows after a login or logout#6944
facumenzella merged 6 commits into
mainfrom
facu/workflow-list-cache-guard

Conversation

@facumenzella

@facumenzella facumenzella commented Jun 8, 2026

Copy link
Copy Markdown
Member

Follow-up to #6917.

What

#6917 added a cross-user generation guard to the workflow detail cache, and a partial guard on the list fetch (it drops a late list response from the previous user on identity change). But a narrow window remained: between the success-path currentCacheGeneration() == generation re-check and the actual, still-unguarded cache(workflowsList:) write, a clearCache() on a login/logout could land and the previous user's offeringId → workflowId map would repopulate the cleared cache.

That map is the user-targeted part (targeting/experiments can map the same offering to a different workflow per user), so the residual matters even though it self-heals on the next offerings refresh.

This folds the check into the write so there's no window:

  • cache(workflowsList:ifGeneration:) does the generation check, the in-memory + disk writes, and the detail prune atomically under detailsLock, and returns whether it landed. The manager's separate "re-check then write" collapses into that one guarded call, and it skips prefetching the details of a dropped list.
  • restoreWorkflowsListFromDisk() and the list clear in clearCache() now run under the same lock too, so clear / write / restore are all mutually atomic (one generation bump covers list + details).
  • Split pruneWorkflowDetails into a locked wrapper + a lock-free …Locked body, since the guarded write already holds the non-recursive Lock and would otherwise self-deadlock.

forceWorkflowsListCacheStale() stays lock-free on purpose: it only marks an existing entry stale, it never writes a new list, so it can't reintroduce a previous user's map.

The race, and why the lock closes it

The fix works because the list write and clearCache() both serialize on detailsLock, so the generation check and the write can't be split by a clear. Either the clear lands first (write sees the bumped generation and drops) or the write lands first (the clear wipes it right after). No interleaving in between.

sequenceDiagram
    participant M as WorkflowManager
    participant C as WorkflowsCache
    participant ID as IdentityManager

    Note over M: getWorkflowsList(user A) issued
    M->>C: currentCacheGeneration() → G
    Note over M: backend.getWorkflows(A) in flight…

    ID->>C: login/logout → clearCache()
    Note over C: under detailsLock:<br/>generation G → G+1,<br/>list + details cleared

    Note over M: A's list response arrives
    M->>C: cache(workflowsList: A, ifGeneration: G)
    Note over C: under detailsLock:<br/>G ≠ G+1 → drop write, return false
    C-->>M: false
    M->>M: skip prefetch, fire onComplete
    Note over C: user B's cache stays clean
Loading

Before this PR, the manager did currentCacheGeneration() == generation and then cache(workflowsList:) as two separate steps, so a clearCache() landing between them would write A's offeringId → workflowId map into B's freshly cleared cache.

Plus: the prefetched detail's in-memory write had the same split

The list write and the prefetched detail's disk write are both guarded by the generation captured when the list fetch was issued, so a clearCache() correctly drops them. The matching in-memory detail write wasn't: getWorkflow re-captured currentCacheGeneration() at its own call time. From the prefetch path that read happens after the list is cached, so a clearCache() landing in that window makes the capture read the already-bumped generation, the write passes its own guard (new == new), and the previous user's (user-scoped) detail repopulates the cleared in-memory cache.

getWorkflow now takes an optional ifGeneration:; prefetch forwards the list's generation so the in-memory and disk writes share one token. On-demand callers pass nil and keep capturing fresh (their request really is issued at that moment), so their behavior is unchanged.

Scope

Workflows cache only. The offerings cache (DeviceCache.offeringsCachedObject) is intentionally left as-is.

AI session context

Metadata

Goal

Close the cross-user generation-guard gaps the workflows cache still had after #6917, scoped to the workflows cache only (offerings explicitly out of scope). Two gaps: the list-cache check-then-write window, and the prefetched detail's in-memory write re-capturing the generation instead of using the list's.

Initial Prompt

Two threads. (1) The list-cache window was a follow-up flagged during #6917 review (vegaro: "the list of workflows should probably be protected with a similar lock from cross-user leaks… Can come in a future PR"); the human asked to see the workflows-only fix and confirmed "we must only touch workflows." (2) On a fresh pass over the diff, the human identified that getWorkflow doing let generation = self.workflowsCache.currentCacheGeneration() was itself the bug for the prefetch in-memory write, and asked how to fix it.

Important Follow-up Prompts

  • Workflows-only; do not touch the offerings cache.
  • "creo que el problema es que getWorkflow hace let generation = self.workflowsCache.currentCacheGeneration()" — pinned the in-memory write gap to getWorkflow's own re-capture.

Agent Contribution

  • List-cache fix: cache(workflowsList:ifGeneration:) -> Bool (generation check + memory/disk write + prune under detailsLock); moved restoreWorkflowsListFromDisk() and the list clear under the lock; split pruneWorkflowDetails into a locked wrapper + lock-free body; gated prefetch on the guarded write's result.
  • In-memory write fix: added ifGeneration: Int? = nil to WorkflowManager.getWorkflow; the in-memory write now uses generation ?? currentCacheGeneration(); prefetchWorkflows forwards the list's generation so the disk and in-memory writes share one token. Updated the stale doc comment that claimed each getWorkflow "guards its own in-memory write the same way".
  • Tests: TDD on the in-memory fix — added testPrefetchInMemoryWriteIsGuardedByListGenerationNotAFreshCapture, watched it fail (write landed) before the fix and pass after. Earlier list-cache work added a seedWorkflowsList helper + two guard tests.
  • Review cleanup (vegaro): collapsed the dead pruneWorkflowDetails wrapper and the lock-free pruneWorkflowDetailsLocked body into a single pruneWorkflowDetails whose doc states the detailsLock precondition, instead of encoding a misleading Locked suffix in the name.

Human Decisions

  • Workflows-only. The offerings cache is intentionally not touched.
  • Identified the in-memory write root cause (getWorkflow's fresh re-capture) directly.

Key Implementation Decisions

  • Decision: fold the list generation check into the write under one lock.
    • Rationale: makes the update atomic w.r.t. clearCache() (no check-then-write window).
  • Decision: forward the list's generation into the prefetch's in-memory write rather than re-capturing in getWorkflow.
    • Rationale: the list's generation is the token cache(workflowsList:ifGeneration:) already validated against; a re-capture reads an already-bumped value if a clear landed in between. On-demand callers keep the fresh capture via the nil default, so their behavior is unchanged.
    • Rejected: making workflowId(forOfferingId:) / the getters take the lock — they are intentionally lock-free reads; the guarantee is eventual consistency, not serializing every getter.

Files / Symbols Touched

  • Sources/Caching/WorkflowsCache.swiftcache(workflowsList:ifGeneration:), pruneWorkflowDetails (single lock-free body, caller holds detailsLock), restoreWorkflowsListFromDisk (under lock), clearCache (list clear under lock).
  • Sources/Purchasing/WorkflowManager.swiftgetWorkflow(…, ifGeneration:) and its in-memory write; prefetchWorkflows forwards the list generation.
  • Tests/UnitTests/Caching/WorkflowsCacheTests.swiftseedWorkflowsList helper, call-site migration, two guard tests.
  • Tests/UnitTests/Purchasing/WorkflowManagerTests.swifttestPrefetchInMemoryWriteIsGuardedByListGenerationNotAFreshCapture.

Dependencies / Config / Migrations

  • None. WorkflowManager is internal; the added parameter is optional, so no public swiftinterface change.

Validation

  • Commands run:
    • xcodebuild test … -only-testing:UnitTests/WorkflowManagerTests -only-testing:UnitTests/WorkflowsCacheTests (iPhone 15 sim): 84 passing, 0 failures (42 + 42).
    • swiftlint on changed files: no violations.
  • Manual verification: Not run.
  • CI: pending on the PR at time of writing.

Validation Gaps

  • The new test drives getWorkflow directly with the list's generation, so it covers getWorkflow honoring a forwarded generation. The prefetch call site forwarding that generation is verified by inspection: the clear landing between caching the list and the prefetch's synchronous generation capture is not deterministically raceable without a production-only test seam, which was not added.
  • The cross-user interleaving is closed by the lock + generation token; the deterministic unit tests cover the guards' accept/drop behavior, not a true concurrent race.

Review Focus

  • Lock ordering stays detailsLock → deviceCache (no inversion); the prune split avoids self-deadlock on the non-recursive lock.
  • clearCache / guarded write / restore are mutually atomic.
  • getWorkflow's ifGeneration default (nil → fresh capture) preserves on-demand behavior; only prefetch forwards the list generation.

Risks / Reviewer Notes

  • Risk: workflowId(forOfferingId:) is an intentionally lock-free read, so a clear and a read can still race.
    • Evidence: it was never synchronized with clearCache; the guarantee is eventual consistency, and during an identity switch the new user's data doesn't exist yet.
    • Mitigation: out of scope by design; called out so it isn't re-litigated.

Non-goals / Out of Scope

  • The offerings cache. Doing the same guard there is deliberately not part of this PR.

Omitted Context

  • Raw transcript, unrelated exploration, and chain-of-thought content omitted.

#6917 dropped a late list fetch from the previous user on identity
change, but a narrow window remained between the success-path
generation re-check and the still-unguarded cache(workflowsList:)
write. Fold the check into a generation-guarded
cache(workflowsList:ifGeneration:) that does the check, the memory and
disk writes, and the detail prune atomically under detailsLock, and put
restoreWorkflowsListFromDisk and the list clear under the same lock.
The guarded write reports whether it landed so the manager skips
prefetching a dropped list's details.

Scope is the workflows cache only; the offerings cache is intentionally
left as-is.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@facumenzella facumenzella added the pr:fix A bug fix label Jun 8, 2026
@facumenzella facumenzella changed the title fix(workflows): close the residual list-cache cross-user race Don't leak the previous user's workflows after a login or logout Jun 8, 2026
@facumenzella facumenzella marked this pull request as ready for review June 8, 2026 16:43
@facumenzella facumenzella requested a review from a team as a code owner June 8, 2026 16:43
@facumenzella facumenzella changed the title Don't leak the previous user's workflows after a login or logout fix(workflows): Don't leak the previous user's workflows after a login or logout Jun 8, 2026

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 5201a6c. Configure here.

Comment thread Sources/Caching/WorkflowsCache.swift
@facumenzella facumenzella added pr:other and removed pr:fix A bug fix labels Jun 9, 2026
Comment thread Sources/Caching/WorkflowsCache.swift Outdated
Comment thread Sources/Caching/WorkflowsCache.swift
Comment thread Sources/Purchasing/WorkflowManager.swift
getWorkflow re-captured currentCacheGeneration() at its own call time. From
the prefetch path that's too late: a clearCache() landing between caching the
list and the prefetch issuing its getWorkflow makes that fresh capture read the
already-bumped generation, so the in-memory write passes its own guard and the
previous user's (user-scoped) detail repopulates the cleared cache. The disk
write was already correct because it used the list's generation.

Forward the list generation into getWorkflow's in-memory write (on-demand
callers still capture fresh via the nil default), so both the disk and
in-memory writes are guarded by the same token.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address review: the standalone pruneWorkflowDetails wrapper was dead (the
guarded list write is the only pruner and already holds detailsLock), and the
"...Locked" suffix on the body read as if it took the lock. Collapse to a single
pruneWorkflowDetails that documents the lock precondition instead of encoding a
misleading suffix in the name.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@facumenzella facumenzella enabled auto-merge (squash) June 9, 2026 09:31
@facumenzella facumenzella merged commit 40029c5 into main Jun 9, 2026
18 of 20 checks passed
@facumenzella facumenzella deleted the facu/workflow-list-cache-guard branch June 9, 2026 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants