Skip to content

feat(workflows): persist prefetched workflow detail on disk#6917

Merged
facumenzella merged 6 commits into
mainfrom
facu/workflow-persist-detail-cache
Jun 8, 2026
Merged

feat(workflows): persist prefetched workflow detail on disk#6917
facumenzella merged 6 commits into
mainfrom
facu/workflow-persist-detail-cache

Conversation

@facumenzella

@facumenzella facumenzella commented Jun 5, 2026

Copy link
Copy Markdown
Member

Port of RevenueCat/purchases-android#3537.

What

#6882/#6883 persist the /workflows list to disk and restore the offeringId → workflowId map when the backend is down. But each prefetched workflow's detail was kept in memory only, so a cold start with the backend down could restore the map yet not render a single workflow. The list disk cache was half a resilience feature on its own.

This persists each prefetched workflow's detail too, so a cold start with the backend down renders the workflows we told the SDK mattered.

Flow

flowchart TD
    Start([getWorkflowsList]) --> Stale{list cache stale?}
    Stale -->|no| Done([onComplete])
    Stale -->|yes| Fetch[backend.getWorkflows]

    Fetch -->|success| CacheList[cache list: memory + disk]
    CacheList --> Prune[prune disk details<br/>not in latest list]
    CacheList --> Prefetch[prefetch each prefetch=true workflow]
    Prefetch --> GW["getWorkflow: capture cacheGeneration,<br/>resolve, cache in memory if gen still matches"]
    GW --> Accum[accumulate resolved results]
    Accum --> LastOne{last prefetch landed?}
    LastOne -->|yes| Persist["batch-persist details to disk<br/>if gen matches, filtered to current list ids"]
    Persist --> Done

    Fetch -->|failure| RestoreList["restore list from disk → stale<br/>(drives refetch when backend is back)"]
    RestoreList --> RestoreDetails["restore details from disk → fresh<br/>(fill only ids missing in memory)"]
    RestoreDetails --> Done2([onComplete])
    RestoreDetails -.enables.-> Offline["later getWorkflow → in-memory hit,<br/>no backend round-trip"]

    Clear([clearCache on identity change]) ==> Bump["under detailsLock: bump cacheGeneration,<br/>clear in-memory + on-disk details"]

    classDef new fill:#e8f5e9,stroke:#43a047,color:#1b5e20;
    class CacheList,Prune,Accum,Persist,RestoreDetails,Offline,Bump new
Loading

Green = added/changed by this PR. Everything else is the existing list-cache flow from #6882/#6883.

Divergences from Android

The Android PR persists the envelope (url/hash/enrolled_variants) and re-resolves it offline, re-reading the CDN file. On iOS the envelope never reaches WorkflowManager: the networking layer consumes the raw response and hands back a fully-resolved WorkflowDataResult, for consistency with how every other endpoint works on iOS. So iOS persists the resolved result and restores it directly.

Consequences of that fork:

  • For use_cdn workflows the compiled JSON is duplicated on disk (the prefetch set is small). Android avoids this; iOS can't cheaply.
  • Restore needs no CDN round-trip or hash re-verification. It's our own cache and the doc was verified when first fetched. Android's re-resolve can still hit the CDN (and fail) while the backend is down.

Where iOS is deliberately stricter than Android (driven by Bugbot + review on this PR):

  • Cross-user guard. Android's cacheWorkflowsList intentionally leaves the identity-transition race unguarded (last-write-wins, self-heals on the next fetch, same as the offerings cache). iOS guards the detail writes with a cacheGeneration token so an in-flight fetch for the previous user can't repopulate memory or disk after clearCache(). Workflow detail is user-scoped (subscribers/{appUserID}/workflows/{id}), so this closes a real wrong-user window.
  • Restore is fill-gaps. Android's restore overwrites in-memory entries; iOS fills only ids missing from memory, so a fresher on-demand fetch isn't clobbered by an older disk snapshot.
  • Persist is list-filtered. Android prunes only on list-write; iOS also filters the persisted map to the current list ids, so a slow prefetch from an earlier list can't revive a since-pruned id.

Same as Android: prefetch-only persistence (on-demand stays memory-only, LRU cap is a follow-up), prune-to-list on each list write, clear-on-identity, details restored fresh, and onComplete firing exactly once.

Known follow-up (not in this PR): a late list fetch from a previous user is now dropped on identity change (the generation is captured when the list request is issued), so the common race no longer caches the wrong user's list or persists their details. A narrow residual window remains between the success-path generation re-check and the still-unguarded cache(workflowsList:) write; it only affects the offeringId → workflowId map (details stay fully guarded) and self-heals on the next offerings refresh / list TTL. Closing it fully means a generation-guarded list write under detailsLock, done consistently with the offerings cache rather than bolted on here.

AI session context

Metadata

Goal

Port purchases-android#3537 (persist prefetched workflow detail for backend-down recovery) to iOS, then address Bugbot/review findings and keep the branch current with main.

Initial Prompt

"let's port purchases-android#3537 to iOS. Take #6905 into account when integrating that. Most likely useful to branch off #6905."

Important Follow-up Prompts

  • Asked how iOS already persists other things to disk before choosing an approach (drove persisting a resolved Codable model via the existing large-item cache).
  • "let's go with 2 then" — chose Approach 2 (persist resolved WorkflowDataResult) over strict Android parity (raw envelope + re-resolve).
  • "vegaro has made a bunch of comments, let's analyse them" → chose: keep restore-fresh, full generation-token fix for the logout race, and refactors (move restore into the cache, batched cache(workflows:), prefetch uses the batched path).
  • "Let's address cursorbot" → fixed the three open Bugbot findings (+ a fourth instance found during review).
  • "make sure we're up-to-date with main, there are conflicts" → rebased onto main after feat(workflows): synchronously seed workflow paywall from warm cache #6905 merged.
  • "validate behavior against android" + add a diagram → this writeup.
  • Bugbot r3374267695 (High: "Late fetch persists prior user details") → confirmed real with a RED test, then "keep it proportionate" (vs. the heavier lock-guarded list write): manager-level drop of a late list fetch + thread the issue-time generation into the prefetch persist.

Agent Contribution

  • Read the Android diff and the iOS workflow stack (WorkflowManager, WorkflowsCache, DeviceCache, WorkflowDetailProcessor, WorkflowsAPI, GetWorkflowOperation) and identified the architectural fork: the envelope never reaches WorkflowManager on iOS.
  • Implemented Approach 2: new DeviceCache workflow-details region, WorkflowsCache disk methods (persist/merge/prune/restore/clear), prefetch-path persistence in WorkflowManager, disk restore on list-fetch failure.
  • Added Codable to WorkflowDataResult with a populated-AnyDecodable encode→decode round-trip test to prove lossless persistence.
  • Addressed Bugbot findings: extended the cacheGeneration guard to the in-memory write and moved the in-memory clear under detailsLock (cross-user race); made restore fill-gaps-only; filtered the persisted map to the current list ids; closed the same race on the restore path by holding the disk-read + memory-write under the lock.
  • Re-validated against the merged Android PR and rebased the branch onto main.
  • Closed Bugbot's late-list-fetch race: the prefetch path re-captured the generation after the list fetch completed, so a clearCache() landing while the list request was in flight was already baked into that generation and the prior user's detail writes passed the guard. Moved the capture to list-issue time, added a success-path generation re-check that drops the response (no list cache, no prefetch) while still firing onComplete, and threaded the issue-time generation into the prefetch disk persist. Proven RED then GREEN with a new manager test (+ a MockWorkflowsAPI before-completion hook seam).

Human Decisions

  • Approach 2 (persist resolved result) over strict Android parity.
  • Restore details fresh, restore list stale.
  • Full cross-user fix via the generation token (rather than accepting Android's self-heal).
  • Refactor shape: restore lives in WorkflowsCache, batched cache(workflows:), prefetch uses the batched path.

Key Implementation Decisions

  • Persist the resolved WorkflowDataResult, not the raw envelope. Rationale: iOS never surfaces the envelope to the manager; matches every existing cache region; self-contained offline render with no CDN dependency. Rejected: persist raw envelope + re-resolve (faithful to Android but forces signature churn through the networking layer).
  • Restore details fresh, list stale. Fresh details let getWorkflow serve offline; the stale list drives a refetch once the backend is back.
  • Generation guard (cacheGeneration + detailsLock) covers both memory and disk detail writes; bumped on clearCache() so a fetch issued for the previous user is dropped. Scope is the detail cache only (see the list follow-up note above).
  • Late-list-fetch fix: capture the generation when the list request is issued (not when prefetch starts), re-check it on the success path and drop the whole response if it changed (still firing onComplete), and pass that issue-time generation into the prefetch persist. Rationale: the prior re-capture happened after any in-flight clear, so it could not detect it. De-scoped (per "keep it proportionate"): a fully airtight, generation-guarded cache(workflowsList:ifGeneration:) under detailsLock (would also need pruneWorkflowDetails de-nested to avoid a re-entrant deadlock and the list clear moved under the lock). The residual list-map TOCTOU is pre-existing and narrow; details are unaffected.

Files / Symbols Touched

  • Sources/Networking/Responses/WorkflowsResponse.swiftWorkflowDataResult: Codable. Review: lossless round-trip of AnyDecodable-backed fields.
  • Sources/Caching/DeviceCache.swiftCacheKey.workflowDetails, cachedWorkflowDetails() / cache(workflowDetails:) / clearWorkflowDetailsCache().
  • Sources/Caching/WorkflowsCache.swiftcache(workflows:), cache(workflow:workflowId:ifGeneration:), persistWorkflowDetailsToDisk(_:ifGeneration:), currentCacheGeneration(), restoreWorkflowDetailsFromDisk() (fill-gaps, under lock), pruneWorkflowDetails(toListIds:), cachedListWorkflowIds(), cacheGeneration/detailsLock, clear-on-identity. Review: lock scope, list-id filter on persist, fill-gaps restore.
  • Sources/Purchasing/WorkflowManager.swiftgetWorkflow captures currentCacheGeneration() and writes via the ifGeneration: path; getWorkflowsList now captures the generation at request-issue time and drops the success path if it changed; prefetch accumulates results and batch-persists once the last lands (now takes the issue-time generation instead of re-capturing); restore on list-fetch failure. Review: the success-path re-check + the threaded generation.
  • Tests/UnitTests/Caching/WorkflowsCacheTests.swift, Tests/UnitTests/Caching/DeviceCacheTests.swift, Tests/UnitTests/Purchasing/WorkflowManagerTests.swift (+ testGetWorkflowsListDropsListAndPrefetchWhenIdentityChangesWhileListFetchInFlight), Tests/UnitTests/Mocks/MockDeviceCache.swift, Tests/UnitTests/Mocks/MockWorkflowsAPI.swift (onGetWorkflowsBeforeCompletion hook).

Dependencies / Config / Migrations

  • None. No new dependencies, flags, or schema. (Package.resolved touched only locally by tooling and reverted; not in the diff.)

Validation

  • swift build: success.
  • swiftlint on changed files: no violations.
  • xcodebuild test -scheme UnitTests for WorkflowsCacheTests (41) + WorkflowManagerTests (36): 77 passing, 0 failures, on iPhone 16 (latest sim), after rebasing onto main.
  • Late-list-fetch fix: WorkflowManagerTests ran RED (5 failures: wrong-user list cached, prefetch ran, detail persisted to disk + memory, offering map populated) before the fix, then GREEN (37/37, including the two existing mid-prefetch identity tests) after. swiftlint clean on the changed files.
  • CI: green on the prior commit except the Emerge size gate (action_required, +0.09% download / +0.1% install) which is a manual size-approval, not a failure.

Validation Gaps

  • End-to-end "cold start with backend unreachable renders a prefetched workflow" not exercised on device; covered at the unit level (restore into memory → served without backend round-trip).
  • The cross-user races are covered by deterministic unit tests at the cache layer and a mid-flight manager test; the true concurrent interleaving is closed by the lock, not reproduced under threads.

Review Focus

  • Restore avoids serving stale data once the backend is back (list restored stale → refetch; details fresh → served offline meanwhile).
  • Prune-on-list + filter-on-persist keep on-disk details a subset of the latest list.
  • detailsLock scope: generation-check + memory/disk writes atomic against clearCache(); no nested-lock deadlock (lock order detailsLock → cachedWorkflows).
  • Interaction with feat(workflows): synchronously seed workflow paywall from warm cache #6905's cachedWorkflow(forOfferingId:): after recovery the list is stale, so the sync seed bails to the async path.

Risks / Reviewer Notes

  • Risk: CDN-compiled JSON duplicated on disk for prefetched use_cdn workflows. Mitigation: pruned to the current list on each list write; cleared on identity changes; prefetch set is small.
  • Risk: a narrow list-map TOCTOU remains between the success-path generation re-check and the unguarded cache(workflowsList:) write. Evidence: those two statements are adjacent and synchronous (no await between them), so the window is a few instructions; details are unaffected (their writes re-check the generation under detailsLock). Mitigation: self-heals on the next offerings refresh / list TTL. Tracked as the follow-up noted above. The common race (clear before the list completion) is now closed by the success-path drop.

Non-goals / Out of Scope

  • On-demand (non-prefetch) detail persistence + LRU cap.
  • Fully closing the narrow residual list-map TOCTOU with a generation-guarded cache(workflowsList:ifGeneration:) under detailsLock (do it consistently with the offerings cache). The common identity-transition race is handled in this PR.

Omitted Context

  • Raw transcript, unrelated exploration, sensitive details, and chain-of-thought content were omitted.

Note

Medium Risk
Touches paywall workflow caching, cross-user identity races, and persisted user-targeted workflow JSON; mitigated by generation guards, pruning, and broad unit tests, but wrong-user list cache remains unguarded as noted in the PR.

Overview
Adds disk persistence for prefetched workflow details so a cold start with the backend down can still render paywalls after the workflows list is restored from disk.

DeviceCache gains a non-user-scoped workflowDetails large-item region (workflowIdWorkflowDataResult), cleared on identity changes like the workflows list.

WorkflowsCache batches prefetched details to disk after prefetch completes, restores them into memory (fresh TTL, fill-gaps only) when the list fetch fails, and prunes on-disk details when the list no longer includes those ids. A cacheGeneration token plus detailsLock drop in-flight memory/disk writes after clearCache() so the previous user’s user-scoped workflow payloads cannot repopulate the cache mid login/logout.

WorkflowManager accumulates successful prefetch results into one disk write, skips persisting on-demand getWorkflow fetches, drops stale list responses if identity changed mid-flight, and restores list + details from disk on list failure.

WorkflowDataResult is Codable for JSON persistence (with round-trip tests for AnyDecodable fields).

Reviewed by Cursor Bugbot for commit c06fba9. Bugbot is set up for automated code reviews on this repo. Configure here.

@facumenzella facumenzella added the pr:feat A new feature label Jun 5, 2026
@facumenzella facumenzella changed the title feat(workflows): persist prefetched workflow detail for backend-down recovery feat(workflows): persist prefetched workflow detail on disk Jun 5, 2026
@facumenzella facumenzella marked this pull request as ready for review June 5, 2026 08:58
@facumenzella facumenzella requested a review from a team as a code owner June 5, 2026 08:58
Comment thread Sources/Caching/WorkflowsCache.swift
@facumenzella facumenzella force-pushed the facu/workflow-persist-detail-cache branch from 8310741 to 2fd53e1 Compare June 5, 2026 09:00
Comment thread Sources/Caching/WorkflowsCache.swift Outdated
Comment thread Sources/Purchasing/WorkflowManager.swift Outdated
Comment thread Sources/Caching/WorkflowsCache.swift Outdated
Comment thread Sources/Purchasing/WorkflowManager.swift Outdated
Comment thread Sources/Purchasing/WorkflowManager.swift Outdated
Comment thread Sources/Purchasing/WorkflowManager.swift Outdated
Comment thread Sources/Purchasing/WorkflowManager.swift
Comment thread Sources/Caching/WorkflowsCache.swift Outdated
Base automatically changed from facu/workflow-sync-seed to main June 8, 2026 11:58
@facumenzella facumenzella requested review from a team as code owners June 8, 2026 11:58
@facumenzella facumenzella force-pushed the facu/workflow-persist-detail-cache branch from 50489ad to 16be480 Compare June 8, 2026 12:43
@facumenzella facumenzella requested a review from vegaro June 8, 2026 13:09
Comment thread Sources/Caching/WorkflowsCache.swift
Comment thread Sources/Purchasing/WorkflowManager.swift
Comment thread Sources/Caching/WorkflowsCache.swift Outdated
Comment thread Sources/Caching/WorkflowsCache.swift
Comment thread Sources/Caching/WorkflowsCache.swift
@facumenzella facumenzella requested a review from vegaro June 8, 2026 15:01
@facumenzella

Copy link
Copy Markdown
Member Author

Ready for another pass whenever you can @vegaro

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 51484d0. Configure here.

Comment thread Sources/Purchasing/WorkflowManager.swift
Comment thread Sources/Caching/WorkflowsCache.swift
facumenzella and others added 6 commits June 8, 2026 17:34
…recovery

Prefetched workflow detail was kept in memory only, so a cold start with
the backend down could restore the offeringId -> workflowId map but couldn't
render a single workflow. This persists each prefetched workflow's resolved
WorkflowDataResult to a new DeviceCache region, mirroring how the workflows
list is stored, and restores it into the in-memory cache on a list-fetch
failure so a later getWorkflow is a cache hit with no failed network call.

- Persist on the prefetch path only (persistDetail flag), after a successful
  fetch, so a persisted detail is always renderable offline.
- Prune the on-disk detail store to the current list on each list write, and
  clear it on identity transitions alongside the list disk cache.
- Restore details fresh so getWorkflow serves them offline; the list restores
  stale so it refetches once the backend is back.

Port of RevenueCat/purchases-android#3537

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Move workflow-detail disk restore into WorkflowsCache (next to the list
  restore) so all disk-restore logic lives in one place.
- Add a batched in-memory cache(workflows:) and persist prefetched details
  to disk in a single write at the end of prefetchWorkflows instead of one
  write per workflow.
- Guard the disk write with a generation token bumped on clearCache, so an
  in-flight prefetch from a since-logged-out user can't write its details
  back after the store was cleared (cross-user leak). The clear + generation
  bump run under the same lock as the write.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address Bugbot findings on the detail-cache persistence:

- Extend the generation guard to the in-memory write so an in-flight
  fetch for the previous user can't repopulate memory after an identity
  change (workflow detail is user-scoped). The in-memory clear now runs
  under the same lock as the generation bump.
- restoreWorkflowDetailsFromDisk fills only ids missing from memory, so
  a stale disk snapshot can't clobber a fresher on-demand fetch, and
  holds the disk-read + memory-write under the lock so a clear can't
  interleave.
- persistWorkflowDetailsToDisk filters the persisted map to the current
  list ids, keeping the on-disk store a subset of the latest list when a
  slower prefetch from an earlier list lands after a prune.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cache(workflow:workflowId:) and cache(workflows:) have no production
callers since getWorkflow writes via the generation-guarded
cache(workflow:workflowId:ifGeneration:) and restore inlines its own
fill-gaps write. Remove both and seed the in-memory cache in tests
through the guarded path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The restored details are stamped fresh so they serve offline, but the
old comment implied the stale-list restore drives their refresh. It
doesn't: once the backend is back, getWorkflow cache-hits the fresh
details (no backend call) until their own foreground TTL expires. The
stale list only drives the list/map refetch. Clarify both comments.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tity change

If a workflows list request issued for the previous user finished after an
identity change cleared the cache, the success path still cached that list and
prefetched its details, persisting the prior user's targeted workflow details
to the shared on-disk map for the new session.

Capture the cache generation when the list request is issued and, on success,
drop the response (no list cache, no prefetch) when the generation no longer
matches, still firing onComplete so callers aren't blocked. Thread that
issue-time generation into the prefetch disk persist instead of re-capturing it
after the clear, so a clear landing mid-prefetch is also caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@facumenzella facumenzella force-pushed the facu/workflow-persist-detail-cache branch from 51484d0 to c06fba9 Compare June 8, 2026 15:41
@facumenzella facumenzella enabled auto-merge (squash) June 8, 2026 15:44
@facumenzella facumenzella merged commit f936d3d into main Jun 8, 2026
17 of 20 checks passed
@facumenzella facumenzella deleted the facu/workflow-persist-detail-cache branch June 8, 2026 15:52
facumenzella added a commit that referenced this pull request Jun 9, 2026
…n or logout (#6944)

* fix(workflows): close the residual list-cache cross-user race

#6917 dropped a late list fetch from the previous user on identity
change, but a narrow window remained between the success-path
generation re-check and the still-unguarded cache(workflowsList:)
write. Fold the check into a generation-guarded
cache(workflowsList:ifGeneration:) that does the check, the memory and
disk writes, and the detail prune atomically under detailsLock, and put
restoreWorkflowsListFromDisk and the list clear under the same lock.
The guarded write reports whether it landed so the manager skips
prefetching a dropped list's details.

Scope is the workflows cache only; the offerings cache is intentionally
left as-is.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): guard prefetch in-memory write with the list generation

getWorkflow re-captured currentCacheGeneration() at its own call time. From
the prefetch path that's too late: a clearCache() landing between caching the
list and the prefetch issuing its getWorkflow makes that fresh capture read the
already-bumped generation, so the in-memory write passes its own guard and the
previous user's (user-scoped) detail repopulates the cleared cache. The disk
write was already correct because it used the list's generation.

Forward the list generation into getWorkflow's in-memory write (on-demand
callers still capture fresh via the nil default), so both the disk and
in-memory writes are guarded by the same token.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): drop dead prune wrapper, rename lock-free prune body

Address review: the standalone pruneWorkflowDetails wrapper was dead (the
guarded list write is the only pruner and already holds detailsLock), and the
"...Locked" suffix on the body read as if it took the lock. Collapse to a single
pruneWorkflowDetails that documents the lock precondition instead of encoding a
misleading suffix in the name.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants