fix: skip unsynced orgs in syncOrganizationMeta to prevent scheduler blocking #1602

Merged
TheLastCicada merged 2 commits into v2-rc2 from fix/sync-organization-meta-nonblocking on Apr 23, 2026
TheLastCicada commented Apr 23, 2026


Summary

Follow-up to #1600. Eliminates the remaining large-scale blocking case in the scheduler: syncOrganizationMeta (V1 and V2) loops sequentially over every subscribed org and awaits datalayer.getStoreIfUpdated per org. When an org's root hash has changed but the new data has not yet propagated to the local datalayer, getStoreIfUpdated descends into syncService.getStoreData's MAX_RETRIES=20 × 10 s loop — up to ~3.3 minutes per org. With N orgs updating near the same time (common after a downtime window or a batch publish), the scheduler can stall for N × 3.3 min before the task returns.
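The worst-case stall described above is simple arithmetic. The sketch below works it through; `MAX_RETRIES` and the 10 s wait come from the description, while `RETRY_DELAY_SECONDS` and `worstCaseStallMinutes` are hypothetical names introduced only for illustration.

```javascript
// Values taken from the PR description; not read from the real syncService.
const MAX_RETRIES = 20;
const RETRY_DELAY_SECONDS = 10; // hypothetical name for the 10 s wait between retries

// Worst-case time the sequential loop can be held up when `orgCount`
// orgs all hit the full retry loop in the same task run.
function worstCaseStallMinutes(orgCount) {
  const perOrgSeconds = MAX_RETRIES * RETRY_DELAY_SECONDS; // 200 s, about 3.3 min
  return (orgCount * perOrgSeconds) / 60;
}
```

With three orgs updating together, `worstCaseStallMinutes(3)` evaluates to 10, which is already two full 5-minute scheduler intervals.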

  • V1 Organization.syncOrganizationMeta and V2 OrganizationsV2.syncOrganizationMeta now call getDataLayerStoreSyncStatus per org and continue the loop if the store is not yet synced. The scheduler re-runs the task on its normal 5-minute cadence so skipped orgs catch up on the next pass.
  • The pre-check is wrapped in if (!USE_SIMULATOR) to preserve simulator-mode behavior. In simulator mode the sync-status RPC returns a hardcoded synced status anyway, so skipping it avoids unnecessary calls on a code path that already completes quickly.
  • Same pattern as the reconcileOrganization and governance sync() pre-checks shipped in #1600 (fix: make background sync non-blocking on unsynced stores).
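The skip-and-continue behavior in the bullets above can be sketched roughly as follows. This is not the code from the PR: `getSyncStatus` and `syncOne` are hypothetical injected stand-ins for getDataLayerStoreSyncStatus and the existing getStoreIfUpdated path, `storeId` is an assumed field name, and the sketch assumes the status call resolves to a truthy value only when the store is fully synced.

```javascript
// Minimal sketch under the assumptions stated above; all dependencies are
// injected so the loop can be exercised without a real datalayer.
async function syncOrganizationMetaSketch(orgs, deps) {
  const { getSyncStatus, syncOne, useSimulator } = deps;
  const skipped = [];

  for (const org of orgs) {
    if (!useSimulator) {
      let synced = false;
      try {
        // Cheap status check instead of entering the long retry loop.
        synced = Boolean(await getSyncStatus(org.storeId));
      } catch (err) {
        synced = false; // treat a failed status call like an unsynced store
      }
      if (!synced) {
        skipped.push(org.storeId); // picked up again on the next scheduled run
        continue;
      }
    }
    await syncOne(org); // only reached for synced stores (or in simulator mode)
  }

  return skipped;
}
```

Returning the skipped IDs is a choice made for this sketch so the behavior is easy to assert on; the PR itself only describes logging and continuing.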

Behavior change callouts

  • Orgs that are mid-sync (root changed, data still arriving) are skipped for the current 5-minute task run rather than blocking the task for up to 3.3 min each. Their metadata will be picked up on the next task run when their store is fully synced.
  • No change to any request-path endpoint. Only the background sync-organization-meta* tasks are affected.

Test plan

  • npm run test:v2 — 3 new tests plus the pre-existing 9 tests in sync-organization-meta-v2.spec.js all pass.
  • Verify on a real CADT deployment that sync-organization-meta* task runs complete within seconds even when several subscribed orgs have pending updates.

Note

Medium Risk
Changes background org-metadata sync to conditionally skip orgs when their datalayer store isn't fully synced, which could delay metadata updates if sync-status checks are wrong or flaky. Risk is limited to scheduled sync behavior (no request-path changes) and is guarded to preserve simulator-mode behavior.

Overview
Prevents the sync-organization-meta scheduler task (V1 and V2) from blocking for minutes per org by pre-checking datalayer sync status and skipping orgs whose org store is not yet fully synced.

When !USE_SIMULATOR, the loop in both Organization.syncOrganizationMeta and OrganizationsV2.syncOrganizationMeta now subscribes to the org store, then calls getDataLayerStoreSyncStatus/datalayer.getDataLayerStoreSyncStatus. On a subscription failure, a sync-status error, or an unsynced store, it logs and continues to the next org.

Adds an integration test (sync-organization-meta-unsynced.spec.js) asserting that both methods exist and return quickly without error in simulator mode, and that V2 still ignores unsubscribed orgs.

Reviewed by Cursor Bugbot for commit 037c5f1.

…blocking

syncOrganizationMeta (V1 and V2) loops over every subscribed org and
awaits datalayer.getStoreIfUpdated sequentially.  When an org's root
hash has changed but the new data has not yet propagated to the local
datalayer, getStoreIfUpdated descends into syncService.getStoreData's
MAX_RETRIES=20 x 10s retry loop — up to ~3.3 minutes per org.  With N
orgs updating near the same time, the scheduler can stall for N x 3.3
minutes before the task returns.

Add a lightweight getDataLayerStoreSyncStatus pre-check per org and
skip orgs whose store is not yet fully synced.  The scheduler re-runs
the task on its normal cadence so skipped orgs catch up on the next
pass.  Matches the pattern used in reconcileOrganization and governance
sync in the companion fix.
…zationMeta

Mirrors the subscribe-first pattern established in #1600: af06a65 (for
governance sync) and 6d41b71 (for reconcileOrganization).

Without subscribe-first, the getDataLayerStoreSyncStatus pre-check added
to syncOrganizationMeta has a permanent-skip-loop failure mode: if the
DataLayer node has no record of an org store (e.g. after a DataLayer DB
reset), the status check returns falsy, the loop iteration skips the
org, and no subsequent run ever subscribes — the org's metadata stops
syncing forever.

Also adds a falsy-return guard on subscribeToStoreOnDataLayer: the
wrapper in src/datalayer/persistance.js returns false (without throwing)
on the common failure paths (no storeId, getSubscriptions RPC failure).
Without the falsy check, a datalayer-unreachable failure would slip past
the try/catch and produce a misleading "not yet synced" log.

Applies to both V1 Organization.syncOrganizationMeta and V2
OrganizationsV2.syncOrganizationMeta.
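
The subscribe-first ordering and the falsy-return guard from this commit message can be sketched as a small pre-check helper. This is an illustration, not the merged code: `preCheckOrgStore`, the injected `subscribe` and `getSyncStatus` functions, and the reason strings are all hypothetical. `subscribe` stands in for subscribeToStoreOnDataLayer, which the message says can return false without throwing.

```javascript
// Returns null when the org can be synced now, otherwise a reason string
// the caller could log before continuing to the next org.
async function preCheckOrgStore(storeId, deps) {
  const { subscribe, getSyncStatus } = deps;

  // 1. Subscribe first, so a store unknown to the datalayer gets
  //    registered instead of being skipped on every run.
  try {
    const subscribed = await subscribe(storeId);
    if (!subscribed) {
      // The wrapper signalled failure by returning a falsy value.
      return 'subscribe-failed';
    }
  } catch (err) {
    return 'subscribe-failed';
  }

  // 2. Only then ask whether the store is fully synced.
  try {
    const synced = await getSyncStatus(storeId);
    if (!synced) {
      return 'not-yet-synced';
    }
  } catch (err) {
    return 'sync-status-error';
  }

  return null;
}
```

Keeping the reasons distinct is what prevents a datalayer-unreachable failure from being reported as "not yet synced".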
TheLastCicada merged commit 5c33a2b into v2-rc2 on Apr 23, 2026
26 checks passed
TheLastCicada deleted the fix/sync-organization-meta-nonblocking branch on April 23, 2026 23:34