fix: add retry logic for transient wallet errors in parallel store creation #1521
Merged
The reclaim-home endpoint documentation was present in both V1 and V2 API docs but missing from their tables of contents, making it undiscoverable when browsing.
V1: add Get Organization Creation Status, Commit all projects in STAGING, and Commit specific STAGING records by UUID to the TOC. V2: add List/Filter project and unit GET examples (by orgUid, program data, marketplace units, tokenized) and Create tokenized unit on Chia POST example to the TOC.
…overnance heading
…eation
_createStoresInParallel had no retry logic for transient wallet errors, causing V2 org creation to fail permanently when the Chia wallet's DataLayer wallet was in a transitional state. This was especially likely with parallel store creation, since multiple simultaneous create_new_dl RPC calls compound the race condition. Add per-store retry logic (10 attempts, 30s delay) matching the existing pattern in addV2ToExistingGovernanceBody. Transient errors, including "DataLayer Wallet already exists" (a downstream symptom of the "DataLayerWallet not available" race), are now retried instead of causing immediate failure.
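A minimal sketch of the per-store retry described above, assuming a `createStore(label)` function that issues the `create_new_dl` RPC and an `isTransient(error)` predicate. Both names, the `label` argument and the `{ success, ... }` result shape are hypothetical, not the actual CADT code; only the defaults (10 attempts, 30s delay) come from the commit message.

```javascript
// Hypothetical sketch: createStore(label) stands in for the call that issues the
// create_new_dl RPC, and isTransient(error) for the transient-error matcher.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function createStoreWithRetry(createStore, label, isTransient, { maxRetries = 10, delayMs = 30_000 } = {}) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return { success: true, label, storeId: await createStore(label) };
    } catch (error) {
      if (isTransient(error) && attempt < maxRetries) {
        await sleep(delayMs); // give the wallet time to leave its transitional state
        continue;
      }
      // Non-transient error, or retries exhausted: report failure for this store only.
      return { success: false, label, error: error.message };
    }
  }
  // Only reachable when maxRetries < 1; never resolve to undefined.
  return { success: false, label, error: 'no attempts were made' };
}

// Every store gets its own retry loop, so a transient error on one store
// does not force stores that already succeeded to be recreated.
function createStoresInParallel(createStore, labels, isTransient, options) {
  return Promise.all(
    labels.map((label) => createStoreWithRetry(createStore, label, isTransient, options)),
  );
}
```

Because each store retries inside its own `map` callback, a transient failure on one store only delays that store; `Promise.all` resolves once every store has either succeeded or given up.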
The waitForSync loop in getSubscribedStoreData() could block indefinitely if a store never finishes syncing. Add a 10-minute deadline (matching the timeout used in the v2 getRegistryStoreIdFromSingleton) so a stuck store throws instead of causing an infinite blocking loop.
fix: add 10-minute timeout to getSubscribedStoreData sync wait loop
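A sketch of a deadline-bounded wait loop, assuming an `isSynced(storeId)` status check. The function names and the 5-second default polling interval are hypothetical; only the 10-minute timeout comes from the commit message.

```javascript
// Hypothetical sketch: isSynced(storeId) stands in for the real sync-status check.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const SYNC_TIMEOUT_MS = 10 * 60 * 1000; // 10 minutes, as in the commit message

async function waitForSync(storeId, isSynced, { timeoutMs = SYNC_TIMEOUT_MS, pollIntervalMs = 5000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  // Keep polling until the store reports synced or the deadline passes.
  while (!(await isSynced(storeId))) {
    if (Date.now() >= deadline) {
      // A stuck store now surfaces as an error instead of blocking forever.
      throw new Error(`Store ${storeId} did not finish syncing within ${timeoutMs} ms`);
    }
    await sleep(pollIntervalMs);
  }
}
```

Making the polling interval a parameter lets tests use a few milliseconds instead of waiting on real sync cycles.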
Update Managed Files
…r checks
The case-sensitive includes('wallet') check didn't match any of the
specific Chia error messages (which all use uppercase 'Wallet') and
instead acted as a catch-all for unrelated errors containing the
substring, causing unnecessary retries of up to 5 minutes.
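A quick illustration of why the removed check misbehaved. The first three strings are error fragments quoted in this PR; the fourth is a made-up message standing in for an unrelated error.

```javascript
const specificErrors = [
  'Wallet needs to be fully synced',
  'DataLayerWallet not available',
  'DataLayer Wallet already exists',
];
const unrelatedError = 'could not open wallet config file'; // hypothetical

// String.prototype.includes is case-sensitive, so 'wallet' never matches 'Wallet'.
const broadCheck = (message) => message.includes('wallet');

console.log(specificErrors.map(broadCheck)); // [ false, false, false ]
console.log(broadCheck(unrelatedError)); // true
```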
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Missing `DataLayer Wallet already exists` in `upgradeFromV1` retry
  - Added `DataLayer Wallet already exists` to `upgradeFromV1` transient retry checks so store creation now retries this known transitional wallet error.
Or push these changes by commenting:
@cursor push 559f3cde55
Preview (559f3cde55)
```diff
diff --git a/src/models/v2/organizations-v2.model.js b/src/models/v2/organizations-v2.model.js
--- a/src/models/v2/organizations-v2.model.js
+++ b/src/models/v2/organizations-v2.model.js
@@ -922,6 +922,7 @@
 const isTransient =
 error.message?.includes('Wallet needs to be fully synced') ||
 error.message?.includes('DataLayerWallet not available') ||
+error.message?.includes('DataLayer Wallet already exists') ||
 error.message?.includes('No spendable coins');
 if (isTransient && attempt < maxStoreCreateRetries) {
```
docs: add reclaim-home endpoint to table of contents
COIN_SIZE was hardcoded to 1,000,000 mojos while CADT operations require DEFAULT_COIN_AMOUNT + DEFAULT_FEE (typically 600,000,000 mojos). This caused a perpetual splitting loop where coins were created 600x too small to be usable, wasting fees and temporarily draining spendable balance. Set COIN_SIZE = DEFAULT_COIN_AMOUNT + DEFAULT_FEE so each split coin can independently fund one full DataLayer operation. Add a splitInProgress flag so mirror-check tasks log a warning instead of an error when balance is temporarily reduced during a split.
fix: size split coins to match operational requirements
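A sketch of the sizing rule and split flag, assuming hypothetical `splitCoins` and `logger` arguments. `isSplitInProgress()` is named in the Bugbot summary below, but everything else here is illustrative rather than the actual coin-management module.

```javascript
// Hypothetical sketch: splitCoins() stands in for the wallet call that splits a
// coin; logger is any object with warn() and error() methods.
let splitInProgress = false;

function isSplitInProgress() {
  return splitInProgress;
}

// Each split coin must be able to fund one full DataLayer operation by itself.
function coinSize(defaultCoinAmount, defaultFee) {
  return defaultCoinAmount + defaultFee;
}

async function runSplit(splitCoins) {
  splitInProgress = true;
  try {
    return await splitCoins();
  } finally {
    splitInProgress = false; // cleared even if the split throws
  }
}

// Mirror checks call this when the balance is too low to create a mirror.
function reportInsufficientFunds(logger, message) {
  if (isSplitInProgress()) {
    logger.warn(`${message} (coin split in progress, balance temporarily reduced)`);
  } else {
    logger.error(message);
  }
}
```

The `finally` block clears the flag even when the split fails. A single module-level boolean assumes only one split runs at a time; overlapping splits would need a counter instead.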
…etry check
The isTransient check in upgradeFromV1 was missing this error string
after the overly broad includes('wallet') was removed, causing the
v1-to-v2 upgrade path to fail permanently on this transient error
instead of retrying.
…nParallel
If the for loop ever finished without returning (e.g. if maxRetries were changed to 0), the async callback would return undefined, causing a TypeError when downstream code accesses result.success.
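A minimal sketch of the guard, using a hypothetical `createWithRetries` helper that retries any error (no transient filter, for brevity). The explicit return after the loop is what keeps callers from receiving `undefined`.

```javascript
// Hypothetical helper: retries any error up to maxRetries times and always
// resolves to a { success, ... } object.
async function createWithRetries(task, maxRetries) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return { success: true, value: await task() };
    } catch (error) {
      if (attempt === maxRetries) {
        return { success: false, error: error.message };
      }
    }
  }
  // With maxRetries <= 0 the loop body never runs. Without this return the
  // function would resolve to undefined and `result.success` would throw.
  return { success: false, error: `no attempts made (maxRetries=${maxRetries})` };
}
```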
Chia's default xch_spam_amount is 1,000,000 mojos. Coins smaller than this may be filtered out by the wallet's spam filter. Since this setting isn't available via RPC and CADT may run on a different host, use the default as a floor so split coins are never below the dust threshold.
Test used COIN_SIZE = MIN_USABLE_COIN_SIZE (3,300) but production computes COIN_SIZE = Math.max(MIN_USABLE_COIN_SIZE, DUST_FILTER_FLOOR) which equals 1,000,000. Updated constants and assertions to match the actual coin-splitting arithmetic.
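The floor arithmetic, sketched with numbers quoted in this PR: 3,300 mojos (the test's `MIN_USABLE_COIN_SIZE`), 1,000,000 mojos (Chia's default `xch_spam_amount`) and 600,000,000 mojos (the typical `DEFAULT_COIN_AMOUNT + DEFAULT_FEE`). The function name `computeCoinSize` is hypothetical.

```javascript
// Chia's default xch_spam_amount, used as a floor because the wallet's actual
// setting can't be read over RPC and CADT may run on a different host.
const DUST_FILTER_FLOOR = 1_000_000;

function computeCoinSize(minUsableCoinSize) {
  return Math.max(minUsableCoinSize, DUST_FILTER_FLOOR);
}

console.log(computeCoinSize(3_300)); // 1000000
console.log(computeCoinSize(600_000_000)); // 600000000
```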
Both _createStoresInParallel and upgradeFromV1 maintained independent copies of the transient wallet error list. This duplication already caused a drift bug caught during this PR. Extract to a single module-level helper.
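One possible shape for the shared helper, assuming the four error fragments listed in this PR; the names `TRANSIENT_WALLET_ERRORS` and `isTransientWalletError` are hypothetical. Keeping the list in one frozen array means a new transient string only has to be added once.

```javascript
// Hypothetical module-level list of transient Chia wallet error fragments.
const TRANSIENT_WALLET_ERRORS = Object.freeze([
  'Wallet needs to be fully synced',
  'DataLayerWallet not available',
  'DataLayer Wallet already exists',
  'No spendable coins',
]);

// Case-sensitive substring match against the known fragments; tolerates
// missing or non-string messages instead of throwing.
function isTransientWalletError(error) {
  const message = typeof error?.message === 'string' ? error.message : '';
  return TRANSIENT_WALLET_ERRORS.some((fragment) => message.includes(fragment));
}
```

Both `_createStoresInParallel` and `upgradeFromV1` would then call `isTransientWalletError(error)` instead of maintaining their own `includes(...)` chains.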
Cursor Bugbot has reviewed your changes and found 1 potential issue.


Summary
- `_createStoresInParallel` had no retry logic for transient wallet errors, causing V2 org creation to fail permanently when the Chia wallet's DataLayer wallet was in a transitional state
- Add per-store retry logic (10 attempts, 30s delay) matching the existing pattern in `addV2ToExistingGovernanceBody`, treating `DataLayer Wallet already exists`, `DataLayerWallet not available`, `Wallet needs to be fully synced`, and `No spendable coins` as retryable transient errors
- … `get_dl_wallet()` fails but the wallet actually exists, making this retry logic critical

Context
Observed in CI run #22746247842: all 3 parallel `create_new_dl` RPC calls failed at the same moment with `DataLayer Wallet already exists for this key`, a race condition in the Chia wallet where the DataLayer wallet exists but isn't yet accessible via `get_dl_wallet()`. The org creation failed permanently with no retry, while the wallet recovered moments later.

Test plan
Note
Medium Risk
Touches wallet/DataLayer operational flows (store creation, syncing, coin splitting, mirror creation), where timing and balance edge cases can affect production behavior. Changes are bounded to retry/timeout/guard logic but could impact org-creation latency and background tasks under failure modes.
Overview
Hardens V2 org creation against transient Chia wallet/DataLayer states. Parallel store creation now retries per-store (10 attempts, 30s delay) when known transient wallet errors occur, and reuses a centralized transient-error matcher.
Adds guardrails around DataLayer operations. `syncService` now enforces a 10-minute sync wait timeout with a configurable polling interval, and mirror creation’s “insufficient funds” path downgrades to a warning when a coin split is in progress.

Refactors coin management to avoid dust/spam-filter issues and expose split state. Coin size is derived from `DEFAULT_COIN_AMOUNT + fee` with a dust floor, split execution is centralized with a `splitInProgress` flag exported via `isSplitInProgress()`, and related integration tests/docs are updated; Dependabot pip reviewers are also adjusted.

Written by Cursor Bugbot for commit ffbf4a3. This will update automatically on new commits.