Persist workflow detail envelopes for recovery when backend is down #3537

Merged
vegaro merged 9 commits into main from cesar/atlanta-v1 on Jun 8, 2026

vegaro commented Jun 4, 2026 (Member)

This PR persists the workflow detail envelopes to disk, mirroring what we do with OfferingsCache, so a cold start with the backend down can render the prefetched workflows. We were already saving the list of workflows to disk; this part was missing.


Note

Medium Risk
Changes paywall/workflow delivery and disk cache behavior on network failure and cold start; scope is bounded (prefetch + prune) but incorrect recovery could affect paywall rendering.

Overview
Adds disk persistence of workflow detail envelopes (alongside the existing workflows list cache) so prefetched paywalls can be re-resolved offline after restart or when the backend is unreachable.

Prefetch-only writes: getWorkflow gains persistEnvelopeOnResolve (default false). Only the workflows-list prefetch path sets it true after a successful resolve, storing the raw WorkflowDetailResponse in DeviceCache via WorkflowsCache (merge-by-workflow-id map). On-demand fetches do not persist, to avoid unbounded disk use.
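A minimal sketch of the prefetch-only write rule. Only the `persistEnvelopeOnResolve` name comes from this PR; `Envelope` (standing in for `WorkflowDetailResponse`), `EnvelopeStore` (standing in for the `DeviceCache`-backed store), `WorkflowManagerSketch`, and the synchronous `fetch` callback are hypothetical simplifications, not the SDK's actual types.

```kotlin
// Hypothetical stand-in for the raw WorkflowDetailResponse.
data class Envelope(val workflowId: String, val json: String)

// Hypothetical disk store: a map keyed by workflow id, merged on write.
class EnvelopeStore {
    private val stored = mutableMapOf<String, Envelope>()

    @Synchronized
    fun merge(envelopes: Map<String, Envelope>) {
        stored.putAll(envelopes) // merge-by-workflow-id: newer entries replace older ones
    }

    @Synchronized
    fun snapshot(): Map<String, Envelope> = stored.toMap()
}

// Hypothetical, synchronous simplification of the manager.
class WorkflowManagerSketch(
    private val store: EnvelopeStore,
    private val fetch: (String) -> Envelope?, // hypothetical backend call; null means failure
) {
    private val memory = mutableMapOf<String, Envelope>()

    fun getWorkflow(
        workflowId: String,
        persistEnvelopeOnResolve: Boolean = false, // on-demand callers keep the default
    ): Envelope? {
        memory[workflowId]?.let { return it } // in-memory cache hit
        val envelope = fetch(workflowId) ?: return null
        memory[workflowId] = envelope
        // Persist only after a successful resolve, and only when the caller opted in.
        if (persistEnvelopeOnResolve) store.merge(mapOf(workflowId to envelope))
        return envelope
    }

    // Prefetch path: the only caller that opts in to persistence.
    fun prefetch(ids: List<String>) {
        ids.forEach { getWorkflow(it, persistEnvelopeOnResolve = true) }
    }
}
```

Defaulting the flag to `false` means a new call site cannot start writing to disk by accident, which is what keeps disk use bounded by the prefetch list.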

Backend-down recovery: When getWorkflowsList fails, the manager still restores the list from disk; if persisted envelopes exist, it re-resolves them in parallel into the in-memory workflow cache (restoreWorkflowFromEnvelope), so subsequent getWorkflow can be a cache hit without calling the backend. Failures per envelope are logged and isolated; onComplete still runs once.
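The recovery fan-out could look roughly like this, assuming a hypothetical callback-style `resolve` function and a simplified `Workflow` type (neither is the SDK's real signature). Every envelope is resolved independently, a failure is logged and skipped, and an atomic countdown makes `onComplete` fire exactly once after the last callback, including when there are no envelopes at all.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical resolved workflow; the real type lives in the SDK.
data class Workflow(val id: String)

// Pass a thread-safe map (e.g. ConcurrentHashMap) if callbacks run on different threads.
fun restoreFromEnvelopes(
    envelopes: Map<String, String>, // workflowId -> raw persisted envelope JSON
    resolve: (String, (Result<Workflow>) -> Unit) -> Unit, // hypothetical async resolver
    cache: MutableMap<String, Workflow>,
    log: (String) -> Unit,
    onComplete: () -> Unit,
) {
    if (envelopes.isEmpty()) {
        onComplete()
        return
    }
    val remaining = AtomicInteger(envelopes.size)
    envelopes.forEach { (id, raw) ->
        // All resolves are started up front; completion order does not matter.
        resolve(raw) { result ->
            result
                .onSuccess { cache[id] = it }
                .onFailure { log("Failed to restore workflow $id: ${it.message}") }
            // Only the last callback brings the counter to zero, so onComplete runs once.
            if (remaining.decrementAndGet() == 0) onComplete()
        }
    }
}
```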

Cache hygiene: cacheWorkflowsList prunes stored envelopes to IDs in the latest list; clearCache clears the envelope store on identity transitions. WorkflowJsonParser adds parsing for the envelope map.
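The two hygiene rules reduce to a retain-by-id and a full clear. A sketch, assuming a hypothetical in-memory `WorkflowEnvelopeStore` in place of the real `DeviceCache`-backed store:

```kotlin
// Hypothetical envelope store; the real one is backed by DeviceCache.
class WorkflowEnvelopeStore {
    private val envelopes = mutableMapOf<String, String>() // workflowId -> raw JSON

    @Synchronized
    fun put(workflowId: String, raw: String) {
        envelopes[workflowId] = raw
    }

    // Mirrors the prune in cacheWorkflowsList: keep only ids present in the latest list.
    @Synchronized
    fun pruneTo(latestListIds: Set<String>) {
        envelopes.keys.retainAll(latestListIds) // removing keys also removes their entries
    }

    // Mirrors clearCache on identity transitions.
    @Synchronized
    fun clear() {
        envelopes.clear()
    }

    @Synchronized
    fun ids(): Set<String> = envelopes.keys.toSet()
}
```

Pruning on every list write means the store can never hold more envelopes than the latest list has entries.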

Reviewed by Cursor Bugbot for commit bc39f94. Bugbot is set up for automated code reviews on this repo.

@vegaro vegaro added the pr:feat A new feature label Jun 4, 2026
codecov Bot commented Jun 4, 2026
Codecov Report

❌ Patch coverage is 95.65217% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.15%. Comparing base (5ddbc02) to head (bc39f94).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ecat/purchases/common/workflows/WorkflowManager.kt | 90.47% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3537      +/-   ##
==========================================
+ Coverage   80.11%   80.15%   +0.04%     
==========================================
  Files         371      371              
  Lines       15166    15210      +44     
  Branches     2100     2110      +10     
==========================================
+ Hits        12150    12192      +42     
- Misses       2166     2168       +2     
  Partials      850      850              


@vegaro vegaro changed the title feat: persist workflow detail envelopes for backend-down recovery Persist workflow detail envelopes for recovery when backend is down Jun 4, 2026
@vegaro vegaro added pr:RevenueCatUI pr:other and removed pr:feat A new feature labels Jun 4, 2026
@vegaro vegaro marked this pull request as ready for review June 4, 2026 16:06
@vegaro vegaro requested a review from a team as a code owner June 4, 2026 16:06
vegaro and others added 9 commits June 5, 2026 08:06
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vegaro vegaro force-pushed the cesar/atlanta-v1 branch from b7b33bf to bc39f94 June 5, 2026 06:08

vegaro commented Jun 5, 2026 (Member, Author)

facumenzella added a commit to RevenueCat/purchases-ios that referenced this pull request Jun 5, 2026
…recovery

Prefetched workflow detail was kept in memory only, so a cold start with
the backend down could restore the offeringId -> workflowId map but couldn't
render a single workflow. This persists each prefetched workflow's resolved
WorkflowDataResult to a new DeviceCache region, mirroring how the workflows
list is stored, and restores it into the in-memory cache on a list-fetch
failure so a later getWorkflow is a cache hit with no failed network call.

- Persist on the prefetch path only (persistDetail flag), after a successful
  fetch, so a persisted detail is always renderable offline.
- Prune the on-disk detail store to the current list on each list write, and
  clear it on identity transitions alongside the list disk cache.
- Restore details fresh so getWorkflow serves them offline; the list restores
  stale so it refetches once the backend is back.

Port of RevenueCat/purchases-android#3537

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
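The "restore details fresh, restore the list stale" rule can be pictured with a timestamp-based TTL cache. This is only a sketch under that assumption; `Entry`, `TtlCache`, and the injected clock are hypothetical, and the actual port may track staleness differently.

```kotlin
// Hypothetical cache entry stamped with the time it was last considered fetched.
data class Entry<T>(val value: T, val fetchedAtMillis: Long)

class TtlCache<T>(private val ttlMillis: Long, private val now: () -> Long) {
    private val entries = mutableMapOf<String, Entry<T>>()

    fun isFresh(key: String): Boolean {
        val entry = entries[key] ?: return false
        return now() - entry.fetchedAtMillis < ttlMillis
    }

    fun get(key: String): T? = entries[key]?.value

    // Restored details: stamp "now" so getWorkflow serves them without a network call.
    fun restoreFresh(key: String, value: T) {
        entries[key] = Entry(value, now())
    }

    // Restored list: stamp it as already expired so the next list request refetches.
    fun restoreStale(key: String, value: T) {
        entries[key] = Entry(value, now() - ttlMillis)
    }
}
```

As the later commit on this port clarifies, the stale list only drives the list/map refetch; fresh details stay cache hits until their own TTL expires.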
facumenzella commented Jun 5, 2026 (Member)

Added two focused reproductions for the workflow envelope caching concerns in this PR:

  1. Stale envelope restore after prefetch is turned off: [codex] Reproduce stale workflow envelope restore #3548

    • Scenario: backend-down restore sees a cached workflows list where wf_1 is still present but now prefetch=false, while an older persisted envelope for wf_1 remains on disk.
    • Expected: the restore path should not re-resolve that stale envelope into the in-memory workflow cache.
    • Current behavior: the focused test fails at WorkflowManagerTest.kt:1280 because workflowsCache.cachedWorkflow("wf_1") is populated.
  2. Late prefetch completion after cache clear: other(workflows): reproduce late workflow envelope persistence #3547

    • Scenario: workflows-list prefetch starts for wf_1, WorkflowsCache.clearCache() runs before the detail callback completes, then the late detail callback resolves.
    • Expected: the late callback should not persist the old detail envelope back to DeviceCache after the clear.
    • Current behavior: the focused test fails at WorkflowManagerTest.kt:1262 because cacheWorkflowDetailEnvelopes(...) is called after clearCache().
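One possible fix for the first reproduction is a restore-time filter that only re-resolves envelopes whose workflow is still in the cached list and still marked for prefetch. A sketch, assuming a hypothetical `WorkflowListEntry` with a `prefetch` flag mirroring the `prefetch=false` field in the scenario; this is not what the merged PR does.

```kotlin
// Hypothetical entry in the cached workflows list.
data class WorkflowListEntry(val workflowId: String, val prefetch: Boolean)

// Returns the persisted envelopes that are still safe to restore into memory.
fun envelopesToRestore(
    cachedList: List<WorkflowListEntry>,
    persisted: Map<String, String>, // workflowId -> raw envelope JSON
): Map<String, String> {
    val prefetchIds = cachedList.filter { it.prefetch }.map { it.workflowId }.toSet()
    // Drops ids that left the list as well as ids whose prefetch flag was turned off.
    return persisted.filterKeys { it in prefetchIds }
}
```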

facumenzella left a comment (Member)

Just opened two PRs to expose some edge cases. I think they're valid, even though they're edge cases. Let me know what you think; we can totally tackle them later.

@vegaro vegaro added this pull request to the merge queue Jun 8, 2026
Merged via the queue into main with commit a9acd9f Jun 8, 2026
40 checks passed
@vegaro vegaro deleted the cesar/atlanta-v1 branch June 8, 2026 10:47
facumenzella added a commit to RevenueCat/purchases-ios that referenced this pull request Jun 8, 2026
facumenzella added a commit to RevenueCat/purchases-ios that referenced this pull request Jun 8, 2026
facumenzella added a commit to RevenueCat/purchases-ios that referenced this pull request Jun 8, 2026
* feat(workflows): persist prefetched workflow detail for backend-down recovery

Prefetched workflow detail was kept in memory only, so a cold start with
the backend down could restore the offeringId -> workflowId map but couldn't
render a single workflow. This persists each prefetched workflow's resolved
WorkflowDataResult to a new DeviceCache region, mirroring how the workflows
list is stored, and restores it into the in-memory cache on a list-fetch
failure so a later getWorkflow is a cache hit with no failed network call.

- Persist on the prefetch path only (persistDetail flag), after a successful
  fetch, so a persisted detail is always renderable offline.
- Prune the on-disk detail store to the current list on each list write, and
  clear it on identity transitions alongside the list disk cache.
- Restore details fresh so getWorkflow serves them offline; the list restores
  stale so it refetches once the backend is back.

Port of RevenueCat/purchases-android#3537

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(workflows): address review on detail-cache persistence

- Move workflow-detail disk restore into WorkflowsCache (next to the list
  restore) so all disk-restore logic lives in one place.
- Add a batched in-memory cache(workflows:) and persist prefetched details
  to disk in a single write at the end of prefetchWorkflows instead of one
  write per workflow.
- Guard the disk write with a generation token bumped on clearCache, so an
  in-flight prefetch from a since-logged-out user can't write its details
  back after the store was cleared (cross-user leak). The clear + generation
  bump run under the same lock as the write.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
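The generation-token guard described above can be sketched as follows. `GenerationGuardedStore` and its methods are hypothetical names; the point is that the clear, the generation bump, and the guarded write all take the same lock, so a write from before the clear always sees a mismatched token.

```kotlin
// Hypothetical detail store guarded by a generation token.
class GenerationGuardedStore {
    private val lock = Any()
    private var generation = 0L
    private val details = mutableMapOf<String, String>()

    // Capture the token when the request is issued.
    fun currentGeneration(): Long = synchronized(lock) { generation }

    // Identity change: clear and bump under the same lock as writes.
    fun clearCache() {
        synchronized(lock) {
            details.clear()
            generation++
        }
    }

    // A late write from before the clear carries a stale token and is dropped.
    fun persistIfGeneration(expected: Long, batch: Map<String, String>): Boolean =
        synchronized(lock) {
            if (expected != generation) return false
            details.putAll(batch)
            true
        }

    fun snapshot(): Map<String, String> = synchronized(lock) { details.toMap() }
}
```

Batching all prefetched details into one `persistIfGeneration` call also matches the "single write at the end of prefetchWorkflows" change.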

* fix(workflows): guard workflow detail cache against cross-user races

Address Bugbot findings on the detail-cache persistence:

- Extend the generation guard to the in-memory write so an in-flight
  fetch for the previous user can't repopulate memory after an identity
  change (workflow detail is user-scoped). The in-memory clear now runs
  under the same lock as the generation bump.
- restoreWorkflowDetailsFromDisk fills only ids missing from memory, so
  a stale disk snapshot can't clobber a fresher on-demand fetch, and
  holds the disk-read + memory-write under the lock so a clear can't
  interleave.
- persistWorkflowDetailsToDisk filters the persisted map to the current
  list ids, keeping the on-disk store a subset of the latest list when a
  slower prefetch from an earlier list lands after a prune.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
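The two restore/persist rules from this commit, sketched with a hypothetical `DetailCacheSketch` that keeps an in-memory map and a plain map standing in for the on-disk store:

```kotlin
// Hypothetical detail cache with an in-memory map and a stand-in for the disk map.
class DetailCacheSketch {
    private val lock = Any()
    private val memory = mutableMapOf<String, String>() // workflowId -> detail in memory
    private val disk = mutableMapOf<String, String>()   // workflowId -> detail "on disk"

    // Fill only ids missing from memory, so a stale disk snapshot never overwrites a
    // fresher on-demand fetch. Read and write share one lock so a clear can't interleave.
    fun restoreFromDisk() {
        synchronized(lock) {
            for ((id, detail) in disk) {
                if (id !in memory) memory[id] = detail
            }
        }
    }

    // Keep the disk map a subset of the latest list, even if a slower prefetch from an
    // earlier list lands after a prune.
    fun persist(batch: Map<String, String>, currentListIds: Set<String>) {
        synchronized(lock) {
            disk.putAll(batch.filterKeys { it in currentListIds })
        }
    }

    fun putInMemory(id: String, detail: String) {
        synchronized(lock) { memory[id] = detail }
    }

    fun memorySnapshot(): Map<String, String> = synchronized(lock) { memory.toMap() }

    fun diskSnapshot(): Map<String, String> = synchronized(lock) { disk.toMap() }
}
```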

* refactor(workflows): drop unused in-memory cache helpers

cache(workflow:workflowId:) and cache(workflows:) have no production
callers since getWorkflow writes via the generation-guarded
cache(workflow:workflowId:ifGeneration:) and restore inlines its own
fill-gaps write. Remove both and seed the in-memory cache in tests
through the guarded path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(workflows): correct restore-freshness comment

The restored details are stamped fresh so they serve offline, but the
old comment implied the stale-list restore drives their refresh. It
doesn't: once the backend is back, getWorkflow cache-hits the fresh
details (no backend call) until their own foreground TTL expires. The
stale list only drives the list/map refetch. Clarify both comments.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): drop a late list fetch from the previous user on identity change

If a workflows list request issued for the previous user finished after an
identity change cleared the cache, the success path still cached that list and
prefetched its details, persisting the prior user's targeted workflow details
to the shared on-disk map for the new session.

Capture the cache generation when the list request is issued and, on success,
drop the response (no list cache, no prefetch) when the generation no longer
matches, still firing onComplete so callers aren't blocked. Thread that
issue-time generation into the prefetch disk persist instead of re-capturing it
after the clear, so a clear landing mid-prefetch is also caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
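A sketch of dropping a late list response by its issue-time generation. `ListFetchSketch`, `fetchList`, and `prefetch` are hypothetical, and the failure path (restoring from disk) is omitted. The generation check and the list write share one lock, `onComplete` fires on every path, and the issue-time token is handed to prefetch rather than re-read after a possible clear.

```kotlin
// Hypothetical list fetch guarded by an issue-time generation token.
class ListFetchSketch(
    private val fetchList: ((List<String>?) -> Unit) -> Unit, // async call; null = failure
    private val prefetch: (List<String>, Long) -> Unit,       // receives the issue-time token
) {
    private val lock = Any()
    private var generation = 0L
    private val cachedList = mutableListOf<String>()

    fun clearCache() {
        synchronized(lock) {
            cachedList.clear()
            generation++ // any request issued before this point is now stale
        }
    }

    fun cachedListSnapshot(): List<String> = synchronized(lock) { cachedList.toList() }

    fun getWorkflowsList(onComplete: () -> Unit) {
        val issuedAt = synchronized(lock) { generation } // captured when the request is issued
        fetchList { ids ->
            if (ids != null) {
                // Check and write under one lock so a clear cannot slip in between.
                val accepted = synchronized(lock) {
                    if (issuedAt != generation) {
                        false
                    } else {
                        cachedList.clear()
                        cachedList.addAll(ids)
                        true
                    }
                }
                // Pass issuedAt so a clear landing mid-prefetch is caught by the disk write.
                if (accepted) prefetch(ids, issuedAt)
            }
            onComplete() // fired even when the response is dropped, so callers aren't blocked
        }
    }
}
```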

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>