fix(e2e): stabilize nightly recovery coverage#5401
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR adds failed device-approval recovery in policy and watcher flows, expands Kimi inference compatibility handling for routed/live runs and nightly secrets, and simplifies one crash-loop recovery E2E guard assertion. ChangesOpenClaw approval recovery
Kimi inference compatibility coverage
Crash-loop recovery E2E assertion
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Code Coverage OverviewLanguages: TypeScript TypeScript / code-coverage/pluginThe overall coverage in the branch is 96%. Coverage data for the branch is not yet available. Show a code coverage summary of the most covered files.
TypeScript / code-coverage/cliThe overall coverage in the branch is 44%. Coverage data for the branch is not yet available. Show a code coverage summary of the most covered files.
Updated |
PR Review AdvisorFindings: 0 needs attention, 5 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
Vitest E2E Scenario RecommendationRequired Vitest E2E scenarios: Dispatch required Vitest E2E scenarios:
Full Vitest E2E advisor summaryVitest E2E Scenario AdvisorBase: Required Vitest E2E scenarios
Optional Vitest E2E scenarios
Relevant changed files
|
Selective E2E Results — ❌ Some jobs failedRun: 27481775942
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
scripts/nemoclaw-start.sh (1)
1882-2029: Run the sandbox boot E2Es for these entrypoint changes.These recovery branches only execute inside the sandbox entrypoint/watchers, so unit coverage will not exercise Landlock, non-root, and process-lifecycle behavior. Please run
sandbox-survival-e2e,sandbox-operations-e2e,cloud-e2e, andopenclaw-slack-pairing-e2ebefore merge.As per coding guidelines,
scripts/nemoclaw-start.sh: "This file is a sandbox entrypoint script. Changes affect every sandbox boot and are invisible to unit tests."Also applies to: 2424-2486
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/nemoclaw-start.sh` around lines 1882 - 2029, The new recovery logic in scripts/nemoclaw-start.sh should be validated with the sandbox boot end-to-end suites, since this entrypoint/watcher behavior is not covered by unit tests. Please run sandbox-survival-e2e, sandbox-operations-e2e, cloud-e2e, and openclaw-slack-pairing-e2e against the changes around run() and the pending-approval recovery branch to verify Landlock, non-root, and process-lifecycle behavior before merge.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/lib/openclaw_device_approval_policy.py`:
- Around line 211-213: The current code deletes the original pending entry using
request_id, but the pending dict is keyed by arbitrary labels (like 'original',
'replacement') not by request_id, leaving stale entries behind. In
scripts/lib/openclaw_device_approval_policy.py at lines 211-213, replace the
pending.pop(request_id, None) call to instead remove the original_key when it
exists, then conditionally remove recovery_key only if it differs from
original_key. In scripts/nemoclaw-start.sh at lines 2481-2482, mirror the same
key-based deletion logic in the inline shell recovery path so the interactive
openclaw devices approve command clears both stale entries consistently.
In `@test/e2e/test-issue-2478-crash-loop-recovery.sh`:
- Around line 227-253: Remove the stale third argument from every
gateway_guards_active call and delete the now-unused gateway_log_marker setup.
The function now only accepts pid and timeout, so update the call sites in
test/e2e/test-issue-2478-crash-loop-recovery.sh that still pass guard_log_start,
negative_guard_log_start, and restore_guard_log_start, and keep the
gateway_guards_active helper aligned with its 2-parameter signature.
In `@test/e2e/test-kimi-inference-compat.sh`:
- Around line 112-123: The ensure_public_nvidia_key() function currently
validates only NVIDIA_INFERENCE_API_KEY against the nvapi-* pattern, but it
should validate that at least one of the two alias variables
(NVIDIA_INFERENCE_API_KEY or NVIDIA_API_KEY) matches the pattern. After the
synchronization logic copies values between the two variables, update the final
validation check to return success if either NVIDIA_INFERENCE_API_KEY or
NVIDIA_API_KEY matches the nvapi-* pattern, ensuring both keys are treated as
valid alternatives for the validation.
---
Nitpick comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 1882-2029: The new recovery logic in scripts/nemoclaw-start.sh
should be validated with the sandbox boot end-to-end suites, since this
entrypoint/watcher behavior is not covered by unit tests. Please run
sandbox-survival-e2e, sandbox-operations-e2e, cloud-e2e, and
openclaw-slack-pairing-e2e against the changes around run() and the
pending-approval recovery branch to verify Landlock, non-root, and
process-lifecycle behavior before merge.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 05b49698-6094-4adc-8ec8-214535bfd4de
📒 Files selected for processing (12)
.github/workflows/nightly-e2e.yamlci/test-file-size-budget.jsonnemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.jsscripts/lib/openclaw_device_approval_policy.pyscripts/nemoclaw-start.shsrc/lib/actions/sandbox/auto-pair-approval.test.tssrc/lib/actions/sandbox/auto-pair-approval.tstest/e2e/test-issue-2478-crash-loop-recovery.shtest/e2e/test-kimi-inference-compat.shtest/kimi-inference-compat-plugin.test.tstest/nemoclaw-start.test.tstest/openclaw-device-approval-policy.test.ts
💤 Files with no reviewable changes (1)
- test/nemoclaw-start.test.ts
| pending.pop(request_id, None) | ||
| if recovery_key: | ||
| pending.pop(recovery_key, None) |
There was a problem hiding this comment.
Delete the original pending entry by its map key, not by requestId.
Both recovery implementations mutate pending.json as a dict keyed by arbitrary labels (original, replacement, ...), but the success path removes the original request with pending.pop(request_id, None). When OpenClaw leaves both the original pending entry and a replacement entry behind, the replacement branch reports recovery success while the original entry stays pending.
scripts/lib/openclaw_device_approval_policy.py#L211-L213: removeoriginal_keywhen it exists, then removerecovery_keyonly if it is different from the original key.scripts/nemoclaw-start.sh#L2481-L2482: mirror the same key-based deletion in the inline shell recovery path so interactiveopenclaw devices approveclears both stale entries too.
📍 Affects 2 files
scripts/lib/openclaw_device_approval_policy.py#L211-L213(this comment)scripts/nemoclaw-start.sh#L2481-L2482
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/lib/openclaw_device_approval_policy.py` around lines 211 - 213, The
current code deletes the original pending entry using request_id, but the
pending dict is keyed by arbitrary labels (like 'original', 'replacement') not
by request_id, leaving stale entries behind. In
scripts/lib/openclaw_device_approval_policy.py at lines 211-213, replace the
pending.pop(request_id, None) call to instead remove the original_key when it
exists, then conditionally remove recovery_key only if it differs from
original_key. In scripts/nemoclaw-start.sh at lines 2481-2482, mirror the same
key-based deletion logic in the inline shell recovery path so the interactive
openclaw devices approve command clears both stale entries consistently.
| gateway_guards_active() { | ||
| local pid="$1" | ||
| local timeout="${2:-30}" | ||
| local log_boundary="${3:-0}" | ||
| local elapsed=0 | ||
|
|
||
| if [ -z "$pid" ]; then | ||
| return 1 | ||
| fi | ||
|
|
||
| local env_contents | ||
| env_contents="$(proxy_env_contents)" | ||
| if ! echo "$env_contents" | grep -q 'nemoclaw-sandbox-safety-net'; then | ||
| echo " [guards] proxy-env.sh missing safety-net export" | ||
| return 1 | ||
| fi | ||
| if ! echo "$env_contents" | grep -q 'nemoclaw-ciao-network-guard'; then | ||
| echo " [guards] proxy-env.sh missing ciao-network-guard export" | ||
| return 1 | ||
| fi | ||
|
|
||
| while [ "$elapsed" -lt "$timeout" ]; do | ||
| if gateway_log_after_boundary "$log_boundary" | grep -Eq '\[sandbox-safety-net\] loaded \((openclaw-gateway|launcher)\)' \ | ||
| && gateway_log_after_boundary "$log_boundary" | grep -Eq '\[guard\] ciao-network-guard loaded \((openclaw-gateway|launcher)\)'; then | ||
| # Confirm gateway is still alive after guard activations. | ||
| if [ -n "$(gateway_pid)" ]; then | ||
| return 0 | ||
| fi | ||
| echo " [guards] guard fired but gateway no longer running" | ||
| return 1 | ||
| fi | ||
| # Backward-compatible proof for older images: this line is emitted by | ||
| # the ciao preload only when ciao calls os.networkInterfaces(). | ||
| if gateway_log_after_boundary "$log_boundary" | grep -Fq '[guard] os.networkInterfaces() failed:'; then | ||
| local env_contents | ||
| env_contents="$(proxy_env_contents)" | ||
| if echo "$env_contents" | grep -q 'nemoclaw-sandbox-safety-net' \ | ||
| && echo "$env_contents" | grep -q 'nemoclaw-ciao-network-guard'; then | ||
| if [ -n "$(gateway_pid)" ]; then | ||
| return 0 | ||
| fi | ||
| echo " [guards] guard fired but gateway no longer running" | ||
| echo " [guards] proxy-env.sh has guard exports but gateway no longer running" | ||
| return 1 | ||
| fi | ||
| sleep 3 | ||
| elapsed=$((elapsed + 3)) | ||
| done | ||
|
|
||
| echo " [guards] no fresh gateway-process guard activation signatures in gateway.log within ${timeout}s" | ||
| echo " [guards] proxy-env.sh missing safety-net or ciao guard exports within ${timeout}s" | ||
| return 1 | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Find all gateway_guards_active call sites and gateway_log_marker assignments
# Find all calls to gateway_guards_active
echo "=== gateway_guards_active call sites ==="
rg -n 'gateway_guards_active\s+' test/e2e/test-issue-2478-crash-loop-recovery.sh
# Find all gateway_log_marker assignments that capture into *_guard_log_start variables
echo ""
echo "=== gateway_log_marker assignments to *guard_log_start variables ==="
rg -n '(guard_log_start|negative_guard_log_start|restore_guard_log_start)=.*gateway_log_marker' test/e2e/test-issue-2478-crash-loop-recovery.shRepository: NVIDIA/NemoClaw
Length of output: 729
🏁 Script executed:
# Read the function definition to confirm its signature
sed -n '227,253p' test/e2e/test-issue-2478-crash-loop-recovery.shRepository: NVIDIA/NemoClaw
Length of output: 771
🏁 Script executed:
# Check if the marker variables are used anywhere beyond their assignment and the call sites
echo "=== Checking for other uses of guard_log_start ==="
rg -n 'guard_log_start' test/e2e/test-issue-2478-crash-loop-recovery.sh
echo ""
echo "=== Checking for other uses of negative_guard_log_start ==="
rg -n 'negative_guard_log_start' test/e2e/test-issue-2478-crash-loop-recovery.sh
echo ""
echo "=== Checking for other uses of restore_guard_log_start ==="
rg -n 'restore_guard_log_start' test/e2e/test-issue-2478-crash-loop-recovery.shRepository: NVIDIA/NemoClaw
Length of output: 1116
Function signature change incomplete — four call sites still pass the removed log_boundary parameter.
The function signature was updated to accept only 2 parameters (pid and timeout), but the following call sites still pass three arguments:
- Line 556:
gateway_guards_active "$new_pid" 30 "$guard_log_start" - Line 639:
gateway_guards_active "$NEGATIVE_PID" 30 "$negative_guard_log_start" - Line 665:
gateway_guards_active "$NEGATIVE_PID" 5 "$negative_guard_log_start" - Line 685:
gateway_guards_active "$SOAK_START_PID" 30 "$restore_guard_log_start"
While bash silently ignores extra arguments, the variables guard_log_start (line 528), negative_guard_log_start (line 600), and restore_guard_log_start (line 674) are computed via gateway_log_marker() but never used—they only serve as dead 3rd arguments.
Remove the third argument from all four call sites and delete the unused gateway_log_marker() invocations.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e/test-issue-2478-crash-loop-recovery.sh` around lines 227 - 253,
Remove the stale third argument from every gateway_guards_active call and delete
the now-unused gateway_log_marker setup. The function now only accepts pid and
timeout, so update the call sites in
test/e2e/test-issue-2478-crash-loop-recovery.sh that still pass guard_log_start,
negative_guard_log_start, and restore_guard_log_start, and keep the
gateway_guards_active helper aligned with its 2-parameter signature.
| ensure_public_nvidia_key() { | ||
| if [ -z "${NVIDIA_INFERENCE_API_KEY:-}" ] && [ -n "${NVIDIA_API_KEY:-}" ]; then | ||
| export NVIDIA_INFERENCE_API_KEY="$NVIDIA_API_KEY" | ||
| fi | ||
| if [ -z "${NVIDIA_API_KEY:-}" ] && [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ]; then | ||
| export NVIDIA_API_KEY="$NVIDIA_INFERENCE_API_KEY" | ||
| fi | ||
| if [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ] && [[ "${NVIDIA_INFERENCE_API_KEY}" == nvapi-* ]]; then | ||
| return 0 | ||
| fi | ||
| return 1 | ||
| } |
There was a problem hiding this comment.
Validate both NVIDIA key aliases against the same rule before returning success.
ensure_public_nvidia_key() currently succeeds/fails solely based on NVIDIA_INFERENCE_API_KEY after one-way fill. If both vars are set and only NVIDIA_API_KEY is valid, the function can fail despite the “either key” contract.
Suggested fix
ensure_public_nvidia_key() {
- if [ -z "${NVIDIA_INFERENCE_API_KEY:-}" ] && [ -n "${NVIDIA_API_KEY:-}" ]; then
- export NVIDIA_INFERENCE_API_KEY="$NVIDIA_API_KEY"
- fi
- if [ -z "${NVIDIA_API_KEY:-}" ] && [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ]; then
- export NVIDIA_API_KEY="$NVIDIA_INFERENCE_API_KEY"
- fi
- if [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ] && [[ "${NVIDIA_INFERENCE_API_KEY}" == nvapi-* ]]; then
- return 0
- fi
- return 1
+ local candidate=""
+ if [ -n "${NVIDIA_API_KEY:-}" ] && [[ "${NVIDIA_API_KEY}" == nvapi-* ]]; then
+ candidate="${NVIDIA_API_KEY}"
+ elif [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ] && [[ "${NVIDIA_INFERENCE_API_KEY}" == nvapi-* ]]; then
+ candidate="${NVIDIA_INFERENCE_API_KEY}"
+ else
+ return 1
+ fi
+ export NVIDIA_API_KEY="${candidate}"
+ export NVIDIA_INFERENCE_API_KEY="${candidate}"
+ return 0
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ensure_public_nvidia_key() { | |
| if [ -z "${NVIDIA_INFERENCE_API_KEY:-}" ] && [ -n "${NVIDIA_API_KEY:-}" ]; then | |
| export NVIDIA_INFERENCE_API_KEY="$NVIDIA_API_KEY" | |
| fi | |
| if [ -z "${NVIDIA_API_KEY:-}" ] && [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ]; then | |
| export NVIDIA_API_KEY="$NVIDIA_INFERENCE_API_KEY" | |
| fi | |
| if [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ] && [[ "${NVIDIA_INFERENCE_API_KEY}" == nvapi-* ]]; then | |
| return 0 | |
| fi | |
| return 1 | |
| } | |
| ensure_public_nvidia_key() { | |
| local candidate="" | |
| if [ -n "${NVIDIA_API_KEY:-}" ] && [[ "${NVIDIA_API_KEY}" == nvapi-* ]]; then | |
| candidate="${NVIDIA_API_KEY}" | |
| elif [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ] && [[ "${NVIDIA_INFERENCE_API_KEY}" == nvapi-* ]]; then | |
| candidate="${NVIDIA_INFERENCE_API_KEY}" | |
| else | |
| return 1 | |
| fi | |
| export NVIDIA_API_KEY="${candidate}" | |
| export NVIDIA_INFERENCE_API_KEY="${candidate}" | |
| return 0 | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e/test-kimi-inference-compat.sh` around lines 112 - 123, The
ensure_public_nvidia_key() function currently validates only
NVIDIA_INFERENCE_API_KEY against the nvapi-* pattern, but it should validate
that at least one of the two alias variables (NVIDIA_INFERENCE_API_KEY or
NVIDIA_API_KEY) matches the pattern. After the synchronization logic copies
values between the two variables, update the final validation check to return
success if either NVIDIA_INFERENCE_API_KEY or NVIDIA_API_KEY matches the nvapi-*
pattern, ensuring both keys are treated as valid alternatives for the
validation.
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Selective E2E Results — ❌ Some jobs failedRun: 27482229875
|
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Selective E2E Results — ❌ Some jobs failedRun: 27482361445
|
## Summary Declares `supportsStore: false` in the OpenClaw Kimi K2.6 model-specific setup manifest so managed NVIDIA/Kimi routes get the same compatibility shape as the previous custom mock path. This fixes the live `kimi-inference-compat-e2e` K2 config assertion that failed after #5401 moved the scenario to public NVIDIA Endpoints. ## Related Issue Related to #5401. ## Changes - Add `supportsStore: false` to `nemoclaw-blueprint/model-specific-setup/openclaw/kimi-k2.6-managed-inference.json`. - Update `test/generate-openclaw-config.test.ts` so registry-only Kimi compat includes the store flag. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [ ] Git hooks passed during commit and push, or `npx prek run --from-ref main --to-ref HEAD` passes - [x] Targeted tests pass for changed behavior - [ ] Full `npm test` passes (broad runtime changes only) - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `npm run docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) Targeted verification: - `npx biome check --write nemoclaw-blueprint/model-specific-setup/openclaw/kimi-k2.6-managed-inference.json test/generate-openclaw-config.test.ts` - `npx vitest run --project cli test/generate-openclaw-config.test.ts test/validate-config-schemas.test.ts --testNamePattern "Kimi|model-specific|registry compat"` Docs review: no user-facing docs changes needed; this only completes an existing model-specific OpenClaw compatibility manifest. Note: local broad pre-commit/pre-push hooks currently fail in unrelated runtime recovery preload tests because temp preload files are seen as group-writable (`mode=664`), matching the existing local hook failure observed on #5401 follow-up work. Targeted changed tests passed. --- Signed-off-by: Carlos Villela <cvillela@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Updated model-specific configuration to disable store support * **Tests** * Updated test case formatting <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Summary
Stabilizes the nightly E2E follow-up failures by keeping the Kimi scenario on the public NVIDIA Kimi endpoint, recognizing the generated routed Kimi model ref, and repairing narrow OpenClaw scope-approval failures that report nonzero after local state changes. It also aligns the crash-loop recovery assertion with the current gateway guard markers and keeps the test-size budget ratcheted after moving coverage into a focused policy test.
Related Issue
Related to #2478, #4462, #2620, #3046.
Changes
nvidia-prodwithmoonshotai/kimi-k2.6; retain the local mock only behindNEMOCLAW_KIMI_USE_MOCK=1and sanitize Kimi failure logs.inference/moonshotai/kimi-k2.6model ref in the Kimi compatibility plugin.operator.admin.test/nemoclaw-start.test.tssize budget downward.Type of Change
Verification
npx prek run --from-ref main --to-ref HEADpassesnpm testpasses (broad runtime changes only)npm run docsbuilds without warnings (doc changes only)Docs review found no user-facing docs changes were warranted;
docs/and generated.agents/skills/remained clean.Signed-off-by: Carlos Villela cvillela@nvidia.com
Summary by CodeRabbit
New Features
Bug Fixes
Security & CI
Tests / Chores