push_repo_memory.cjs has no retry/backoff, fails on concurrent pushes #19476

@samuelkahessay

Description

What happens

push_repo_memory.cjs performs a single pull-then-push sequence to update the repo-memory branch. If the push fails — typically because another concurrent workflow pushed to the same branch between the pull and push — the script calls core.setFailed() and exits immediately. There is no retry loop or exponential backoff.

In a parallel pipeline (multiple agents dispatched simultaneously), all agents that complete their work around the same time will race on the repo-memory push. The first push wins; every subsequent push fails with a non-fast-forward error and the workflow step is marked as failed.

What should happen

The push should retry with exponential backoff on non-fast-forward failures. The pattern is: pull (with merge strategy), push, and if push fails due to a concurrent update, pull again and retry. This is a standard optimistic concurrency pattern for git-based shared state.
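Retrying only on non-fast-forward failures implies telling them apart from other push errors, such as bad credentials. A minimal sketch of one way to do that follows. The helper name `isConcurrentUpdateRejection` is hypothetical, and it assumes the caller has captured git's push output as a string, which the current `stdio: "inherit"` call does not do. The matched substrings are the hints git prints on a rejected push; treat the matching as a heuristic rather than a stable interface.

```javascript
// Hints git prints when a push is rejected because the remote branch moved ahead.
const NON_FAST_FORWARD_MARKERS = ["non-fast-forward", "fetch first"];

// Hypothetical helper (not part of push_repo_memory.cjs): returns true when the
// captured push output looks like a rejection caused by a concurrent update.
function isConcurrentUpdateRejection(pushOutput) {
  const text = String(pushOutput).toLowerCase();
  return text.includes("[rejected]") && NON_FAST_FORWARD_MARKERS.some(marker => text.includes(marker));
}

// Example inputs: a rejected push versus an unrelated authentication failure.
const rejected = " ! [rejected]        HEAD -> memory/example (fetch first)\nerror: failed to push some refs";
const authFailure = "fatal: Authentication failed for 'https://example.invalid/owner/repo.git/'";
console.log(isConcurrentUpdateRejection(rejected)); // true
console.log(isConcurrentUpdateRejection(authFailure)); // false
```

With a check like this, errors that a re-pull cannot fix could fail immediately instead of consuming retry attempts.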

Where in the code

All references are to main at 99b2107.

Single pull:

  • push_repo_memory.cjs:372-380:
    try {
      const repoUrl = `https://x-access-token:${ghToken}@${serverHost}/${targetRepo}.git`;
      execGitSync(["pull", "--no-rebase", "-X", "ours", repoUrl, branchName], { stdio: "inherit" });
    } catch (error) {
      core.warning(`Pull failed (this may be expected): ${getErrorMessage(error)}`);
    }

Single push with hard failure:

  • push_repo_memory.cjs:382-390:
    try {
      const repoUrl = `https://x-access-token:${ghToken}@${serverHost}/${targetRepo}.git`;
      execGitSync(["push", repoUrl, `HEAD:${branchName}`], { stdio: "inherit" });
      core.info(`Successfully pushed changes to ${branchName} branch`);
    } catch (error) {
      core.setFailed(`Failed to push changes: ${getErrorMessage(error)}`);
      return;
    }

No retry logic: No loop, no backoff, no re-pull on push failure anywhere in the script.

Evidence

Source-level verification (2026-03-03):

  • Searched entire push_repo_memory.cjs for retry, backoff, loop, or re-attempt logic — none found
  • The catch block on push (lines 388-390) calls core.setFailed() and returns immediately
  • The pull step's catch block (lines 378-380) logs a warning but doesn't trigger a retry flow

Race condition analysis:

  • In a 4-agent parallel pipeline, all agents may finish within seconds of each other
  • Each agent's safe-outputs job runs push_repo_memory.cjs independently
  • All agents pull the same state, commit their changes locally, then push
  • First push succeeds; subsequent pushes get rejected (non-fast-forward) because their local branch is behind
  • Without retry, those pushes hard-fail and repo-memory updates are lost
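The race described in the list above can be reproduced with a toy model. This is a simplified sketch, not the real git protocol: `FakeRemote` is a hypothetical in-memory stand-in for the repo-memory branch that accepts a push only when the pusher's base matches the current tip, and the agents are run one after another rather than truly in parallel.

```javascript
// Hypothetical in-memory stand-in for the remote branch (illustration only).
class FakeRemote {
  constructor() {
    this.head = 0; // counter standing in for the commit at the branch tip
  }

  // Accept the push only if it was built on the current tip (a fast-forward).
  push(baseHead) {
    if (baseHead !== this.head) {
      return { ok: false, reason: "non-fast-forward" };
    }
    this.head += 1;
    return { ok: true, reason: null };
  }
}

// Current behaviour: every agent pulls the same tip, then each pushes once.
function simulateSingleAttempt(agentCount) {
  const remote = new FakeRemote();
  const bases = Array.from({ length: agentCount }, () => remote.head);
  return bases.map(base => remote.push(base));
}

// Behaviour with retry: after a rejection, re-pull the tip and push again.
// Returns the attempt number on which each agent succeeded, or null.
function simulateWithRepull(agentCount, maxAttempts) {
  const remote = new FakeRemote();
  const bases = Array.from({ length: agentCount }, () => remote.head);
  return bases.map(base => {
    let current = base;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      if (remote.push(current).ok) return attempt;
      current = remote.head; // stands in for pulling again
    }
    return null;
  });
}

console.log(simulateSingleAttempt(4).map(r => (r.ok ? "pushed" : r.reason)));
// [ 'pushed', 'non-fast-forward', 'non-fast-forward', 'non-fast-forward' ]
console.log(simulateWithRepull(4, 2)); // [ 1, 2, 2, 2 ]
```

In this model one re-pull is enough for every agent because the pushes never interleave. Real concurrent runs can collide again on the retry, which is why more than one retry and a delay between attempts are proposed.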

Proposed fix

Wrap the pull-push sequence in a retry loop with exponential backoff:

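const MAX_RETRIES = 3;
const BASE_DELAY_MS = 1000;
// Build the authenticated URL once; both the pull and the push use it.
const repoUrl = `https://x-access-token:${ghToken}@${serverHost}/${targetRepo}.git`;

for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
  // Merge in anything other agents pushed since the previous attempt.
  try {
    execGitSync(["pull", "--no-rebase", "-X", "ours", repoUrl, branchName], { stdio: "inherit" });
  } catch (error) {
    core.warning(`Pull failed (this may be expected): ${getErrorMessage(error)}`);
  }

  try {
    execGitSync(["push", repoUrl, `HEAD:${branchName}`], { stdio: "inherit" });
    core.info(`Successfully pushed changes to ${branchName} branch`);
    return; // success
  } catch (error) {
    if (attempt < MAX_RETRIES) {
      // Exponential backoff: 1000 ms, 2000 ms, 4000 ms.
      const delay = BASE_DELAY_MS * Math.pow(2, attempt);
      core.warning(`Push failed (attempt ${attempt + 1}/${MAX_RETRIES + 1}), retrying in ${delay}ms: ${getErrorMessage(error)}`);
      // Assumes the enclosing function is async so the wait does not block the event loop.
      await new Promise(resolve => setTimeout(resolve, delay));
    } else {
      core.setFailed(`Failed to push changes after ${MAX_RETRIES + 1} attempts: ${getErrorMessage(error)}`);
      return;
    }
  }
}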
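One caveat with a fixed schedule: agents that fail at the same moment compute the same delay, so they may collide again on the next attempt. A common mitigation is to add random jitter to each delay. The sketch below shows the schedule the loop above produces and a jittered variant. The function names and the injectable `random` parameter are illustrative assumptions, not part of the script.

```javascript
// Plain exponential backoff, matching BASE_DELAY_MS * Math.pow(2, attempt).
function exponentialDelayMs(attempt, baseDelayMs = 1000) {
  return baseDelayMs * Math.pow(2, attempt);
}

// "Equal jitter" variant: keep half of the exponential delay fixed and
// randomize the other half, so concurrent agents spread their retries out.
// `random` is injectable so the function can be tested deterministically.
function jitteredDelayMs(attempt, baseDelayMs = 1000, random = Math.random) {
  const half = exponentialDelayMs(attempt, baseDelayMs) / 2;
  return Math.floor(half + random() * half);
}

// Schedule for the three retries that MAX_RETRIES = 3 allows.
console.log([0, 1, 2].map(attempt => exponentialDelayMs(attempt))); // [ 1000, 2000, 4000 ]

// Jittered delays differ from run to run; each falls in [half, full) of the value above.
console.log([0, 1, 2].map(attempt => jitteredDelayMs(attempt)));
```

With the defaults, the worst-case total wait before the final failure is 1000 + 2000 + 4000 = 7000 ms, plus the time spent in the pull and push calls themselves.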

Impact

Frequency: Every parallel pipeline dispatch. In our 4-agent runs, at least 1-2 repo-memory pushes fail per batch.
Cost: Moderate — repo-memory updates are lost for the failing agents. The step failure also adds noise to the run summary. The data isn't critical (repo-memory is advisory), but the step failure can be confusing for operators triaging run results.
