fix: skip unsynced orgs in syncOrganizationMeta to prevent scheduler blocking #1602
Merged
…blocking
syncOrganizationMeta (V1 and V2) loops over every subscribed org and awaits datalayer.getStoreIfUpdated sequentially. When an org's root hash has changed but the new data has not yet propagated to the local datalayer, getStoreIfUpdated descends into syncService.getStoreData's MAX_RETRIES=20 x 10s retry loop (up to ~3.3 minutes per org). With N orgs updating near the same time, the scheduler can stall for N x 3.3 minutes before the task returns.
Add a lightweight getDataLayerStoreSyncStatus pre-check per org and skip orgs whose store is not yet fully synced. The scheduler re-runs the task on its normal cadence so skipped orgs catch up on the next pass.
Matches the pattern used in reconcileOrganization and governance sync in the companion fix.
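The stall arithmetic and the pre-check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `worstCaseStallSeconds`, `isStoreSynced`, `syncOrganizationMetaSketch`, the `orgUid` field, the single-argument `getStoreIfUpdated` call, and the `{ generation, target_generation }` status shape are all assumptions made for the example.

```javascript
// Hypothetical sketch. Only the names getDataLayerStoreSyncStatus,
// getStoreIfUpdated and MAX_RETRIES come from the PR text; everything
// else (helper names, argument lists, status shape) is assumed.

const MAX_RETRIES = 20;
const RETRY_DELAY_SECONDS = 10;

// Worst-case time a sequential loop can spend in the retry loop
// when every one of `orgCount` orgs has data that is not yet local.
function worstCaseStallSeconds(orgCount) {
  return orgCount * MAX_RETRIES * RETRY_DELAY_SECONDS;
}

// Assumed status shape: { generation, target_generation }.
// A falsy status is treated as "not synced".
function isStoreSynced(status) {
  return Boolean(status) && status.generation === status.target_generation;
}

// Loop over orgs, skipping any whose store is not fully synced so that
// the expensive getStoreIfUpdated call is never reached for them.
async function syncOrganizationMetaSketch(orgs, datalayer) {
  const synced = [];
  const skipped = [];

  for (const org of orgs) {
    let status;
    try {
      status = await datalayer.getDataLayerStoreSyncStatus(org.orgUid);
    } catch (error) {
      // A failing status check is treated like an unsynced store.
      skipped.push(org.orgUid);
      continue;
    }

    if (!isStoreSynced(status)) {
      // Picked up again on the next scheduled run.
      skipped.push(org.orgUid);
      continue;
    }

    await datalayer.getStoreIfUpdated(org.orgUid);
    synced.push(org.orgUid);
  }

  return { synced, skipped };
}
```

With these constants one unsynced org costs at most 20 x 10 s = 200 s, which is where the ~3.3 minute figure comes from. The sketch returns the skipped ids only to make the behavior easy to observe; the real task relies on the scheduler's next pass instead.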
…zationMeta
Mirrors the subscribe-first pattern established in #1600 af06a65 (for governance sync) and 6d41b71 (for reconcileOrganization). Without subscribe-first, the getDataLayerStoreSyncStatus pre-check added to syncOrganizationMeta has a permanent-skip-loop failure mode: if the DataLayer node has no record of an org store (e.g. after a DataLayer DB reset), the status check returns falsy, the loop iteration skips the org, and no subsequent run ever subscribes, so the org's metadata stops syncing forever.
Also adds a falsy-return guard on subscribeToStoreOnDataLayer: the wrapper in src/datalayer/persistance.js returns false (without throwing) on the common failure paths (no storeId, getSubscriptions RPC failure). Without the falsy check, a datalayer-unreachable failure would slip past the try/catch and produce a misleading "not yet synced" log.
Applies to both V1 Organization.syncOrganizationMeta and V2 OrganizationsV2.syncOrganizationMeta.
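The ordering and the falsy-return guard described above might look roughly like this. It is a sketch under assumptions rather than the merged code: `checkOrgStoreReady`, its string result codes, the injected `datalayer` and `log` parameters, and the `{ generation, target_generation }` status shape are hypothetical; only `subscribeToStoreOnDataLayer` and `getDataLayerStoreSyncStatus` are names taken from the text.

```javascript
// Hypothetical helper: decides whether one org store is ready to sync.
// Result codes are illustrative and exist only to make each skip reason
// distinguishable in logs.
async function checkOrgStoreReady(orgUid, datalayer, log = () => {}) {
  // 1. Subscribe first, so a store the node has never seen (for example
  //    after a DataLayer DB reset) is not skipped on every run.
  try {
    const subscribed = await datalayer.subscribeToStoreOnDataLayer(orgUid);
    // The wrapper is described as returning false without throwing, so
    // the return value has to be checked as well as the exception path.
    if (!subscribed) {
      log(`subscribe returned falsy for ${orgUid}, skipping`);
      return 'subscribe-failed';
    }
  } catch (error) {
    log(`subscribe threw for ${orgUid}: ${error.message}`);
    return 'subscribe-failed';
  }

  // 2. Only then ask for the sync status.
  let status;
  try {
    status = await datalayer.getDataLayerStoreSyncStatus(orgUid);
  } catch (error) {
    log(`sync status check failed for ${orgUid}: ${error.message}`);
    return 'status-error';
  }

  // Assumed status shape: { generation, target_generation }.
  if (!status || status.generation !== status.target_generation) {
    log(`store ${orgUid} not yet synced, skipping`);
    return 'not-synced';
  }

  return 'ready';
}
```

A caller would `continue` its loop on any result other than `'ready'`. Keeping the subscription failure separate from the unsynced case is what avoids the misleading "not yet synced" log when the datalayer is actually unreachable.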
Summary
Follow-up to #1600. Eliminates the remaining large-scale blocking case in the scheduler:
- syncOrganizationMeta (V1 and V2) loops sequentially over every subscribed org and awaits datalayer.getStoreIfUpdated per org. When an org's root hash has changed but the new data has not yet propagated to the local datalayer, getStoreIfUpdated descends into syncService.getStoreData's MAX_RETRIES=20 × 10 s loop (up to ~3.3 minutes per org). With N orgs updating near the same time (common after a downtime window or a batch publish), the scheduler can stall for N × 3.3 min before the task returns.
- V1 Organization.syncOrganizationMeta and V2 OrganizationsV2.syncOrganizationMeta now call getDataLayerStoreSyncStatus per org and continue the loop if the store is not yet synced. The scheduler re-runs the task on its normal 5-minute cadence so skipped orgs catch up on the next pass.
- Guarded by if (!USE_SIMULATOR) to preserve simulator-mode behavior. In simulator mode the sync-status RPC returns a hardcoded synced status anyway; skipping it avoids unnecessary calls on a code path that already completes fast.
- Matches the pattern used in reconcileOrganization and the governance sync() pre-check shipped in fix: make background sync non-blocking on unsynced stores #1600.
Behavior change callouts
- sync-organization-meta* tasks are affected.
Test plan
- npm run test:v2: 3 new tests plus the pre-existing 9 tests in sync-organization-meta-v2.spec.js all pass.
- sync-organization-meta* task runs complete within seconds even when several subscribed orgs have pending updates.
Note
Medium Risk
Changes background org-metadata sync to conditionally skip orgs when their datalayer store isn't fully synced, which could delay metadata updates if sync-status checks are wrong or flaky. Risk is limited to scheduled sync behavior (no request-path changes) and is guarded to preserve simulator-mode behavior.
Overview
Prevents the sync-organization-meta scheduler task (V1 and V2) from blocking for minutes per org by pre-checking datalayer sync status and skipping orgs whose org store is not yet fully synced.
In both Organization.syncOrganizationMeta and OrganizationsV2.syncOrganizationMeta, the loop now subscribes to the org store, then calls getDataLayerStoreSyncStatus / datalayer.getDataLayerStoreSyncStatus; on subscription failure, sync-status errors, or an unsynced store, it logs and continues to the next org (only when !USE_SIMULATOR).
Adds an integration test (sync-organization-meta-unsynced.spec.js) asserting both methods exist and return quickly/no-error in simulator mode, and that V2 still ignores unsubscribed orgs.
Reviewed by Cursor Bugbot for commit 037c5f1.