
Commit 6690930

fix: gate V1 org-creation data pushes on wallet readiness
V1 org creation sometimes failed during the data-push phase because batch_update RPCs were issued while the wallet was still processing store-creation confirmations. The wallet rejects these with "Wallet needs to be fully synced", and the orchestrator marked the entire creation as FAILED even though a background retry had been scheduled.

Add an explicit wallet-readiness gate (walletIsSynced + hasAnyUnconfirmedTransactions + spendable coins) as a pre-flight for _pushDataInParallel, and wrap each push in a bounded sync-retry loop that re-checks readiness before each attempt. A single transient desync no longer aborts the whole pipeline; the existing background retry and final throw are preserved as last-ditch fallbacks.

The worst-case wallet-wait budget per push is MAX_DATA_PUSH_SYNC_RETRIES (3) * DATA_PUSH_WALLET_SYNC_WAIT_MS (2 min) = 6 min, which stays well under STORE_CONFIRMATION_TIMEOUT_MS (30 min). This figure counts only the readiness waits; time spent inside the push RPC itself comes on top.
1 parent 8faec6d commit 6690930

2 files changed

Lines changed: 197 additions & 10 deletions


src/models/organizations/organizations.model.js

Lines changed: 180 additions & 10 deletions
@@ -639,17 +639,160 @@ class Organization extends Model {
     }
   }
 
+  /**
+   * Wait for the wallet to be ready for a batch_update RPC. The Chia wallet
+   * will reject batch_update with "Wallet needs to be fully synced" unless
+   * the wallet is caught up to the tip AND the relevant wallet ids have no
+   * unconfirmed transactions. We explicitly check all three signals here:
+   *
+   *   1. walletIsSynced() — wallet caught up to blockchain
+   *   2. hasAnyUnconfirmedTransactions() — standard + DL wallets settled
+   *   3. waitForSpendableCoins(requiredCoins) — at least N coins available
+   *
+   * NOTE: waitForSpendableCoins alone is insufficient — it only polls
+   * hasUnconfirmedTransactions('1') and coin records, not walletIsSynced().
+   * On a node replaying blocks with no pending txs, it would return success
+   * while the wallet still rejects transactions.
+   *
+   * Bounded by ORG_CREATION_CONFIG.DATA_PUSH_WALLET_SYNC_WAIT_MS. Throws on
+   * timeout; the caller decides whether that's fatal (pre-flight) or should
+   * fall through to the next retry attempt (per-push retry).
+   *
+   * No-ops in the simulator (no wallet).
+   *
+   * @param {number} requiredCoins - Minimum number of spendable coins needed.
+   * @returns {Promise<void>}
+   * @private
+   */
+  static async _waitForWalletReadyForPush(requiredCoins = 1) {
+    if (USE_SIMULATOR) return;
+
+    const totalTimeoutMs = ORG_CREATION_CONFIG.DATA_PUSH_WALLET_SYNC_WAIT_MS;
+    const deadline = Date.now() + totalTimeoutMs;
+    const pollIntervalMs = 5000;
+
+    // Phase 1: wait for the wallet to report synced=true.
+    while (Date.now() < deadline) {
+      if (await wallet.walletIsSynced()) break;
+      await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
+    }
+    if (!(await wallet.walletIsSynced())) {
+      throw new Error(
+        `wallet did not reach fully-synced state within ${totalTimeoutMs / 1000}s`,
+      );
+    }
+
+    // Phase 2: wait for all relevant wallets (standard + DL) to have no
+    // unconfirmed transactions. This is what pushChangesWhenStoreIsAvailable
+    // gates on, and it's stricter than waitForSpendableCoins's wallet_id=1
+    // check.
+    while (Date.now() < deadline) {
+      if (!(await wallet.hasAnyUnconfirmedTransactions())) break;
+      await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
+    }
+    if (await wallet.hasAnyUnconfirmedTransactions()) {
+      throw new Error(
+        `unconfirmed transactions did not clear within ${totalTimeoutMs / 1000}s`,
+      );
+    }
+
+    // Phase 3: verify at least requiredCoins spendable coins remain. We've
+    // already waited for unconfirmed txs, so this usually returns quickly;
+    // cap the remaining time so the overall helper respects totalTimeoutMs.
+    const remainingMs = Math.max(10000, deadline - Date.now());
+    await wallet.waitForSpendableCoins(requiredCoins, undefined, remainingMs);
+  }
+
+  /**
+   * Push a single store's changelist with a bounded synchronous retry loop.
+   *
+   * Each attempt is preceded by a wallet-sync wait so that a transient desync
+   * (e.g. wallet still processing store-creation confirmations) doesn't
+   * immediately fail the push. Returns true on success, false after all
+   * retries are exhausted. Throws on permanent errors (e.g. store not owned).
+   *
+   * Interaction with pushChangeListToDataLayer's internal retry loop:
+   *   - pushChangeListToDataLayer has its own 5-attempt loop, but only retries
+   *     on "Already have a pending root" and "Key already present" errors.
+   *   - The target failure mode here ("Wallet needs to be fully synced")
+   *     falls through to the final `return false` after a single attempt.
+   *   - Because we've already waited for unconfirmed txs to clear in
+   *     _waitForWalletReadyForPush, the "pending root" path is unlikely to
+   *     compound here in practice.
+   *
+   * No-op in the simulator -- caller writes via the in-memory store instead.
+   *
+   * @param {string} storeType
+   * @param {string} storeId
+   * @param {Array} changeList
+   * @param {Object} state
+   * @returns {Promise<boolean>}
+   * @private
+   */
+  static async _pushWithSyncRetry(storeType, storeId, changeList, state) {
+    const maxAttempts = ORG_CREATION_CONFIG.MAX_DATA_PUSH_SYNC_RETRIES;
+    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
+      try {
+        await Organization._waitForWalletReadyForPush(1);
+      } catch (waitError) {
+        const level = attempt === maxAttempts ? 'error' : 'warn';
+        logState(
+          state,
+          `Wallet-readiness wait before ${storeType} push attempt ${attempt}/${maxAttempts} timed out: ${waitError.message}`,
+          level,
+        );
+        if (attempt === maxAttempts) return false;
+        continue;
+      }
+
+      const success = await pushChangeListToDataLayer(storeId, changeList, {
+        skipTransactionWait: true,
+      });
+      if (success) {
+        if (attempt > 1) {
+          logState(
+            state,
+            `Push to ${storeType} store ${storeId} succeeded on attempt ${attempt}/${maxAttempts}`,
+          );
+        }
+        return true;
+      }
+
+      if (attempt < maxAttempts) {
+        logState(
+          state,
+          `Push to ${storeType} store ${storeId} returned false (attempt ${attempt}/${maxAttempts}); waiting for wallet readiness and retrying`,
+          'warn',
+        );
+      } else {
+        logState(
+          state,
+          `Push to ${storeType} store ${storeId} returned false after ${maxAttempts} sync-retry attempts; giving up on synchronous path`,
+          'error',
+        );
+      }
+    }
+    return false;
+  }
+
   /**
    * Push data to stores sequentially with a short delay between each.
    *
-   * Calls pushChangeListToDataLayer directly, bypassing the hasUnconfirmedTransactions
-   * gate in pushChangesWhenStoreIsAvailable. With coin splitting we maintain multiple
-   * coins specifically so concurrent/back-to-back transactions work; the unconfirmed-tx
-   * check is a legacy guard from single-coin days and would force a 30s retry delay
-   * on the second push. We already know stores are confirmed from the previous step.
+   * Before pushing, waits for the wallet to be synced with enough spendable
+   * coins (pre-flight gate). After store creation the wallet is temporarily
+   * desynced while processing confirmations; without this gate the first
+   * batch_update RPC often fails with "Wallet needs to be fully synced",
+   * which used to mark the entire creation as FAILED.
+   *
+   * Each push bypasses the legacy hasUnconfirmedTransactions gate in
+   * pushChangesWhenStoreIsAvailable (with coin splitting we maintain multiple
+   * coins specifically so back-to-back txs work), but is wrapped in a bounded
+   * sync-retry loop (_pushWithSyncRetry) that re-checks wallet readiness
+   * before each attempt. Only after those retries are exhausted do we fall
+   * back to the fire-and-forget background retry + record the push as failed.
    *
-   * Pushes are staggered by 2s so two batch_update RPCs don't hit the wallet at the
-   * exact same instant (avoids a coin-selection race in the Chia wallet).
+   * Pushes are staggered by 2s so two batch_update RPCs don't hit the wallet
+   * at the exact same instant (avoids a coin-selection race in the Chia wallet).
    *
    * @param {Object} state - Current state
    * @returns {Promise<Object>} Updated state
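The three-phase check in _waitForWalletReadyForPush is an instance of deadline-bounded polling: all phases share one deadline, so the helper as a whole is bounded by DATA_PUSH_WALLET_SYNC_WAIT_MS. The following is a minimal standalone sketch of that pattern, not the project's code. It assumes a wallet object exposing async walletIsSynced() and hasAnyUnconfirmedTransactions(), plus a hypothetical spendableCoinCount() standing in for waitForSpendableCoins; makeFakeWallet is a hypothetical in-memory fake so the sketch runs without a node.

```javascript
// Sketch of a deadline-bounded, three-phase wallet-readiness gate.
// ASSUMPTION: `wallet` exposes async walletIsSynced(),
// hasAnyUnconfirmedTransactions() and a hypothetical spendableCoinCount();
// the helper in the diff delegates phase 3 to waitForSpendableCoins instead.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `predicate` until it is truthy or `deadline` (epoch ms) passes.
// Sleeps are clamped to the deadline, and one final check runs after the
// loop so a condition that turns true right at the deadline still counts.
async function pollUntil(predicate, deadline, pollIntervalMs) {
  while (Date.now() < deadline) {
    if (await predicate()) return true;
    await sleep(Math.min(pollIntervalMs, Math.max(0, deadline - Date.now())));
  }
  return Boolean(await predicate());
}

async function waitForWalletReady(
  wallet,
  { totalTimeoutMs, pollIntervalMs = 5000, requiredCoins = 1 },
) {
  // One shared deadline: the three phases together respect totalTimeoutMs
  // instead of each phase getting a fresh budget.
  const deadline = Date.now() + totalTimeoutMs;

  // Phase 1: wallet caught up to the chain tip.
  const synced = await pollUntil(() => wallet.walletIsSynced(), deadline, pollIntervalMs);
  if (!synced) throw new Error(`wallet not synced within ${totalTimeoutMs}ms`);

  // Phase 2: no unconfirmed transactions on any relevant wallet.
  const settled = await pollUntil(
    async () => !(await wallet.hasAnyUnconfirmedTransactions()),
    deadline,
    pollIntervalMs,
  );
  if (!settled) throw new Error(`unconfirmed transactions did not clear within ${totalTimeoutMs}ms`);

  // Phase 3: enough spendable coins for the upcoming batch_update.
  const funded = await pollUntil(
    async () => (await wallet.spendableCoinCount()) >= requiredCoins,
    deadline,
    pollIntervalMs,
  );
  if (!funded) throw new Error(`fewer than ${requiredCoins} spendable coins within ${totalTimeoutMs}ms`);
}

// Hypothetical fake: synced after `syncedAfter` polls, reports unconfirmed
// transactions for the first `unconfirmedFor` polls, fixed coin count.
function makeFakeWallet({ syncedAfter = 0, unconfirmedFor = 0, coins = 1 } = {}) {
  let syncPolls = 0;
  let txPolls = 0;
  return {
    async walletIsSynced() { return ++syncPolls > syncedAfter; },
    async hasAnyUnconfirmedTransactions() { return ++txPolls <= unconfirmedFor; },
    async spendableCoinCount() { return coins; },
  };
}
```

Unlike the helper in the diff, this sketch clamps its sleeps to the deadline and gives phase 3 no 10 s floor, so it overruns totalTimeoutMs by at most one predicate call. The diff's floor presumably trades that strictness for always giving waitForSpendableCoins a usable window.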
@@ -665,6 +808,27 @@ class Organization extends Model {
 
     logState(state, `Pushing data to ${storesNeedingData.length} stores`);
 
+    // Pre-flight: wait for the wallet to be synced with at least one spendable
+    // coin before starting. This is a lightweight gate; the per-push retry
+    // handles re-checks for subsequent pushes and the coin-management task
+    // refills coins as they're consumed. Waiting for N coins here would be
+    // stricter than necessary and could race coin splitting.
+    if (!USE_SIMULATOR) {
+      try {
+        await Organization._waitForWalletReadyForPush(1);
+        logState(state, 'Wallet is synced and ready for data push');
+      } catch (waitError) {
+        logState(
+          state,
+          `Wallet did not become ready for data push within ${ORG_CREATION_CONFIG.DATA_PUSH_WALLET_SYNC_WAIT_MS / 1000}s: ${waitError.message}`,
+          'warn',
+        );
+        // Fall through: per-push _pushWithSyncRetry re-checks wallet
+        // readiness before each attempt, so a slow-to-settle wallet still
+        // has a chance to recover without aborting the whole creation.
+      }
+    }
+
     const orgUidStoreId = state.stores[STORE_TYPES.ORG_UID].id;
     const dataModelVersionStoreId = state.stores[STORE_TYPES.DATA_MODEL_VERSION].id;
     const registryStoreId = state.stores[STORE_TYPES.REGISTRY].id;
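The pre-flight in this hunk is deliberately soft: a timeout is logged as a warning and the pushes still run, because each push re-checks readiness on its own. Combined with the 2 s stagger described in the method's docstring, the control flow looks roughly like the sketch below. The callback names preflight, pushOne and log are hypothetical stand-ins for _waitForWalletReadyForPush(1), _pushWithSyncRetry and logState; the simulator path, background retry and failure bookkeeping of the real method are omitted.

```javascript
// Sketch: non-fatal pre-flight gate, then sequential staggered pushes.
// ASSUMPTION: preflight/pushOne/log are hypothetical injected callbacks,
// not the project's API.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pushStoresSequentially(storeIds, { preflight, pushOne, staggerMs = 2000, log }) {
  // The pre-flight is advisory: a timeout is logged, never thrown, because
  // each push re-checks wallet readiness before it runs.
  try {
    await preflight();
    log('info', 'wallet ready for data push');
  } catch (err) {
    log('warn', `pre-flight wait failed, continuing: ${err.message}`);
  }

  const results = {};
  for (const [index, storeId] of storeIds.entries()) {
    // Stagger consecutive pushes so two batch_update calls never reach the
    // wallet at the same instant.
    if (index > 0) await sleep(staggerMs);
    results[storeId] = await pushOne(storeId);
  }
  return results;
}
```

Pushing strictly one store at a time, rather than with Promise.all, is what makes the stagger meaningful: each push has already finished its own readiness wait and RPC before the next one starts.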
@@ -724,9 +888,15 @@ class Organization extends Model {
         await simPush(storeId, changeList);
         success = true;
       } else {
-        // Call persistance directly, skipping the legacy hasUnconfirmedTransactions gate.
-        // With coin splitting we have multiple coins so back-to-back txs are fine.
-        success = await pushChangeListToDataLayer(storeId, changeList, { skipTransactionWait: true });
+        // Sync-retry loop with a wallet-sync wait before each attempt. This
+        // absorbs transient "Wallet needs to be fully synced" errors that
+        // would otherwise fail the whole creation.
+        success = await Organization._pushWithSyncRetry(
+          storeType,
+          storeId,
+          changeList,
+          state,
+        );
       }
 
       if (success) {
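The call site now delegates to _pushWithSyncRetry. Its core shape is a bounded loop that waits for readiness before every attempt, treats a false result as retryable, and lets thrown errors escape as permanent failures. A chain-agnostic sketch of that shape follows; waitReady and push are hypothetical injected callbacks standing in for _waitForWalletReadyForPush and pushChangeListToDataLayer, and the log levels mirror the diff (warn for intermediate attempts, error on the last).

```javascript
// Sketch of a bounded "wait for readiness, then push" retry loop.
// ASSUMPTION: waitReady() rejects on timeout; push() resolves to a boolean
// (false = retryable failure) or throws on a permanent error. Both are
// hypothetical callbacks, not the project's API.

async function pushWithSyncRetry({ waitReady, push, maxAttempts = 3, log = () => {} }) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const level = attempt === maxAttempts ? 'error' : 'warn';

    // Re-check readiness before every attempt; a timeout consumes this
    // attempt but does not abort the loop.
    try {
      await waitReady();
    } catch (err) {
      log(level, `readiness wait ${attempt}/${maxAttempts} timed out: ${err.message}`);
      continue;
    }

    // Errors thrown by push() propagate to the caller unchanged.
    if (await push()) return true;
    log(level, `push returned false (attempt ${attempt}/${maxAttempts})`);
  }
  // Every attempt either timed out waiting or returned false.
  return false;
}
```

Returning false instead of throwing leaves the escalation decision (background retry, marking the push as failed) to the caller, which matches how the diff keeps those fallbacks in the orchestrating method.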

src/utils/organization-creation-state.js

Lines changed: 17 additions & 0 deletions
@@ -41,6 +41,23 @@ export const ORG_CREATION_CONFIG = {
   STORE_CONFIRMATION_TIMEOUT_MS: 30 * 60 * 1000, // 30 minutes
   CONFIRMATION_POLL_INTERVAL_MS: 30 * 1000, // 30 seconds
   META_KEY: 'pendingOrgCreation',
+
+  // Wallet-readiness gate for the V1 data-push phase. After store creation
+  // the wallet temporarily desyncs while processing confirmations;
+  // batch_update RPCs made during that window fail with "Wallet needs to be
+  // fully synced". _waitForWalletReadyForPush polls walletIsSynced +
+  // hasAnyUnconfirmedTransactions + spendable coins for up to
+  // DATA_PUSH_WALLET_SYNC_WAIT_MS before each push, and _pushWithSyncRetry
+  // retries a returns-false push up to MAX_DATA_PUSH_SYNC_RETRIES times, so
+  // a single transient desync no longer fails the entire creation pipeline.
+  //
+  // Worst-case budget per push is MAX_DATA_PUSH_SYNC_RETRIES *
+  // DATA_PUSH_WALLET_SYNC_WAIT_MS = 6 min. With up to 4 stores pushed
+  // sequentially that's ~24 min, which stays under
+  // STORE_CONFIRMATION_TIMEOUT_MS. Keep these values in sync if either
+  // bound changes.
+  DATA_PUSH_WALLET_SYNC_WAIT_MS: 2 * 60 * 1000, // 2 minutes
+  MAX_DATA_PUSH_SYNC_RETRIES: 3,
 };
 
 /**
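The budget arithmetic in this comment can be checked mechanically. Per push, the readiness waits cost at most MAX_DATA_PUSH_SYNC_RETRIES * DATA_PUSH_WALLET_SYNC_WAIT_MS = 3 * 2 min = 6 min; four sequential pushes give 24 min, and counting the 2 min pre-flight wait before the first push brings the total to 26 min, still under the 30 min STORE_CONFIRMATION_TIMEOUT_MS. The sketch below turns that into a hypothetical startup guard. The function names and the store count of 4 (taken from the comment's "up to 4 stores") are assumptions, and the figure ignores the RPC time itself, the 2 s stagger, and small overshoots (a few seconds of poll-sleep overrun and the 10 s floor on the spendable-coin wait).

```javascript
// Hypothetical guard checking the data-push wallet-wait budget against the
// store-confirmation timeout. Config values are copied from the diff.

const MINUTE_MS = 60 * 1000;

const config = {
  STORE_CONFIRMATION_TIMEOUT_MS: 30 * MINUTE_MS,
  DATA_PUSH_WALLET_SYNC_WAIT_MS: 2 * MINUTE_MS,
  MAX_DATA_PUSH_SYNC_RETRIES: 3,
};

// Wallet-wait time only: one optional pre-flight wait plus, per store,
// MAX_DATA_PUSH_SYNC_RETRIES full readiness waits.
function worstCaseWalletWaitMs(cfg, storeCount, { includePreflight = true } = {}) {
  const perPushMs = cfg.MAX_DATA_PUSH_SYNC_RETRIES * cfg.DATA_PUSH_WALLET_SYNC_WAIT_MS;
  const preflightMs = includePreflight ? cfg.DATA_PUSH_WALLET_SYNC_WAIT_MS : 0;
  return preflightMs + storeCount * perPushMs;
}

// Throws if the budget would reach the confirmation timeout; returns the
// computed budget otherwise.
function assertDataPushBudget(cfg, storeCount) {
  const totalMs = worstCaseWalletWaitMs(cfg, storeCount);
  if (totalMs >= cfg.STORE_CONFIRMATION_TIMEOUT_MS) {
    throw new Error(
      `data-push wallet-wait budget (${totalMs / MINUTE_MS} min) reaches ` +
        `STORE_CONFIRMATION_TIMEOUT_MS (${cfg.STORE_CONFIRMATION_TIMEOUT_MS / MINUTE_MS} min)`,
    );
  }
  return totalMs;
}
```

Raising MAX_DATA_PUSH_SYNC_RETRIES to 4 with the same wait would give 2 + 4 * 8 = 34 min and trip the guard, which is exactly the "keep these values in sync" hazard the comment warns about.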
