fix(test): add retry logic for session lock in R3 exec smoke tests #9

Merged
Piboonsak merged 2 commits into main from copilot/fix-session-lock-retry-logic on Mar 8, 2026

Conversation

Copilot AI commented Mar 8, 2026

C1/C2 exec smoke tests fail intermittently when the deploy pipeline runs while the bot has an active session: C1 (exec date) times out and C2 (exec whoami) fails with session file locked (timeout 10000ms).

Summary

  • Problem: C1/C2 ran openclaw agent immediately post-deploy with no retry, causing failures when a prior session held the lock.
  • Why it matters: Flaky CI blocks every deploy even when the underlying system is healthy.
  • What changed: Added retry_exec helper (3 attempts, 10 s backoff) using openclaw exec --text; added 15 s settle wait after lock-file cleanup; rewired C1/C2 to use retry_exec.
  • What did NOT change: All other test categories (A–H), runtime config, bot code, docker setup.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

None. Test-only change; no runtime behavior affected.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: VPS Linux (Hostinger)
  • Runtime/container: openclaw-sgnl-openclaw-1
  • Model/provider: N/A
  • Integration/channel (if any): N/A
  • Relevant config (redacted): N/A

Steps

  1. Trigger deploy while bot has an active LINE/Signal session
  2. Deploy runs tests/r3-regression-tests.sh immediately after container restart
  3. C1 (exec date) or C2 (exec whoami) hits session lock

Expected

  • Both tests pass after retrying once the lock is released

Actual (before fix)

  • C1: timeout after 30 s
  • C2: session file locked (timeout 10000ms)

Evidence

Human Verification (required)

  • Verified scenarios: bash syntax validated (bash -n); logic traced manually for both pass and fail paths
  • Edge cases checked: all 3 retry attempts fail → test fails and reports the last captured output; first attempt exits successfully but its output contains an approval/permission error → C1 still fails, as intended
  • What you did not verify: live execution against a locked container (requires active session race)
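The pass and fail paths traced manually above can also be exercised offline. The following is a minimal sketch, not the merged script: it replaces docker with a shell function stub, so no container or real session lock is involved. The names COUNTER_FILE, FAIL_TIMES, RETRY_DELAY and simulate are hypothetical and exist only for this simulation; retry_exec follows the helper from the issue, with the delay made overridable so the run does not actually wait.

```shell
#!/usr/bin/env bash
# Offline simulation of retry_exec. ASSUMPTION: `docker` is stubbed by a shell
# function; COUNTER_FILE, FAIL_TIMES, RETRY_DELAY and simulate are hypothetical.

CONTAINER="stub-container"
RETRY_DELAY=0                 # skip the real 10 s backoff during simulation
FAIL_TIMES=0                  # number of leading calls that report a lock error
COUNTER_FILE=$(mktemp)        # call counter; a file survives the $(...) subshell
trap 'rm -f "$COUNTER_FILE"' EXIT

# Stub standing in for `docker exec "$CONTAINER" openclaw exec --text <cmd>`.
docker() {
  local calls
  calls=$(( $(cat "$COUNTER_FILE") + 1 ))
  echo "$calls" > "$COUNTER_FILE"
  if [ "$calls" -le "$FAIL_TIMES" ]; then
    echo "session file locked (timeout 10000ms)"
    return 1
  fi
  echo "stub-output"
}

# Same shape as the helper in the issue; only the delay is parameterised.
retry_exec() {
  local cmd="$1" max=3 delay="${RETRY_DELAY:-10}" attempt=1
  while [ "$attempt" -le "$max" ]; do
    result=$(docker exec "$CONTAINER" openclaw exec --text "$cmd" 2>&1) && return 0
    echo "  Attempt $attempt/$max failed, retrying in ${delay}s..."
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
}

# Run one scenario: the stub fails $1 times before succeeding.
simulate() {
  local status
  FAIL_TIMES="$1"
  echo 0 > "$COUNTER_FILE"
  if retry_exec "date" > /dev/null; then status=0; else status=1; fi
  echo "fail_times=$FAIL_TIMES status=$status calls=$(cat "$COUNTER_FILE") result=$result"
}

simulate 0   # succeeds on the first attempt
simulate 2   # lock held for two attempts, third succeeds
simulate 3   # lock never released, all attempts fail
```

In this simulation the first two scenarios end with status 0, and the third ends with status 1 with result holding the last lock error. It does not replace a live run against a locked container.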

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert tests/r3-regression-tests.sh to previous commit
  • Files/config to restore: tests/r3-regression-tests.sh only
  • Known bad symptoms: C1/C2 consistently fail → retry_exec itself may have a path issue; check openclaw exec --text availability in container

Risks and Mitigations

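  • Risk: The 15 s settle wait plus retry backoff adds ~45 s of worst-case sleep overhead to CI for one retried test (3 attempts × 10 s backoff + 15 s pre-wait); if C1 and C2 both exhaust their retries, the backoff portion is incurred twice.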
    • Mitigation: Only triggered post-deploy; acceptable tradeoff vs. flaky failures blocking every deploy.
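The overhead figure can be reproduced with a few lines of shell arithmetic. This is a sketch under two assumptions: the helper sleeps after every failed attempt, including the last (as in the snippet from the issue), and only sleep time is counted, not the time each exec call itself takes. The variable names are illustrative.

```shell
#!/usr/bin/env bash
# Worst-case sleep overhead, using the values quoted in this PR.
# ASSUMPTION: backoff is applied after every failed attempt, including the last.

settle_wait=15    # seconds of pre-test wait before Section C
max_attempts=3    # attempts per retry_exec call
backoff=10        # seconds slept after each failed attempt

per_call=$((max_attempts * backoff))          # one exhausted retry_exec call
one_test=$((settle_wait + per_call))          # settle wait + one exhausted test
both_tests=$((settle_wait + 2 * per_call))    # settle wait + C1 and C2 both exhausted

echo "per retry_exec call:           ${per_call}s"
echo "settle + one exhausted test:   ${one_test}s"
echo "settle + both tests exhausted: ${both_tests}s"
```

The second line matches the ~45 s quoted in the risk; the third shows the upper bound if both tests exhaust their retries in the same run.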
Original prompt

This section details the original issue you should resolve

<issue_title>fix(test): R3 exec smoke tests need retry logic for session lock</issue_title>
<issue_description>## Problem
R3 regression tests C1/C2 fail when deploy runs while bot has active session.

Errors:

  • C1: exec date timeout 30s
  • C2: session file locked (timeout 10000ms)

File to modify

tests/r3-regression-tests.sh

Changes needed

1. Add retry helper function (before Section C)

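# Run a command inside the container, retrying while the session lock is held.
# On return, $result holds the output of the last attempt.
retry_exec() {
  local cmd="$1" max=3 delay=10 attempt=1
  while [ "$attempt" -le "$max" ]; do
    # Stop at the first successful attempt.
    result=$(docker exec "$CONTAINER" openclaw exec --text "$cmd" 2>&1) && return 0
    echo "  ⟳ Attempt $attempt/$max failed, retrying in ${delay}s..."
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
}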

2. Add pre-test wait before Section C

echo "Waiting 15s for session to settle..."
sleep 15

3. Replace direct exec calls in C1/C2 with retry_exec
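One way the rewired C1/C2 checks could turn the outcome of retry_exec into a verdict is sketched below. The issue does not show the script's reporting helpers, so pass, fail and judge_exec are hypothetical names, and the approval|permission pattern is an assumption about what an error in otherwise successful output might look like. The sample outputs are made up for illustration.

```shell
#!/usr/bin/env bash
# Verdict logic for an exec smoke test. ASSUMPTION: pass, fail and judge_exec
# are hypothetical helpers; the real script's reporting functions are not shown.

pass() { echo "PASS $1"; }
fail() { echo "FAIL $1: $2"; }

# Decide the verdict from retry_exec's exit status and its captured output.
judge_exec() {
  local name="$1" status="$2" output="$3"
  if [ "$status" -ne 0 ]; then
    # All retries failed: report the last captured output.
    fail "$name" "$output"
  elif printf '%s' "$output" | grep -qiE 'approval|permission'; then
    # The command exited 0 but the output signals an approval/permission error.
    fail "$name" "$output"
  else
    pass "$name"
  fi
}

# In the test script this would follow a retry_exec call, for example:
#   retry_exec "date"; judge_exec "C1" "$?" "$result"

# Illustrative inputs (made-up output strings):
judge_exec "C1" 0 "Sun Mar  8 12:00:00 UTC 2026"
judge_exec "C1" 0 "exec requires approval"
judge_exec "C2" 1 "session file locked (timeout 10000ms)"
```

Keeping the verdict separate from the retry loop means a zero exit status alone never counts as a pass, which is the behaviour the approval/permission edge case relies on.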

Safety

  • ONLY modify tests/r3-regression-tests.sh
  • Do NOT touch any config, runtime, or bot files

Related

Comments on the Issue (you are @copilot in this section)


Co-authored-by: Piboonsak <17107792+Piboonsak@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix session lock issues in R3 exec smoke tests" to "fix(test): add retry logic for session lock in R3 exec smoke tests" on Mar 8, 2026
Owner

@Piboonsak Piboonsak left a comment

✅ LGTM! Changes match spec exactly. Only test script modified, retry logic + pre-wait added.

@Piboonsak Piboonsak marked this pull request as ready for review March 8, 2026 12:37
@Piboonsak Piboonsak merged commit 03d422d into main Mar 8, 2026

Development

Successfully merging this pull request may close these issues.

fix(test): R3 exec smoke tests need retry logic for session lock

2 participants