Skip to content

ci: prototype E2E advisor#1

Open
jyaunches wants to merge 23 commits into
mainfrom
ci/e2e-advisor-prototype
Open

ci: prototype E2E advisor#1
jyaunches wants to merge 23 commits into
mainfrom
ci/e2e-advisor-prototype

Conversation

@jyaunches

Copy link
Copy Markdown
Owner

Summary

  • add a Phase 1 static E2E advisor prototype workflow
  • add an E2E manifest that maps jobs/scripts to semantic risk domains
  • add deterministic rules that classify changed files into domains and emit JSON/Markdown artifacts

Safety model

  • dry-run only: uploads artifacts and writes the job summary
  • read-only permissions: contents and pull requests only
  • static analysis only: reads git diffs/metadata and does not execute PR-provided repo scripts
  • no PR comments, labels, workflow dispatch, model credentials, or privileged tokens

Outputs

  • artifacts/e2e-advisor/e2e-advisor-input.json
  • artifacts/e2e-advisor/e2e-advisor-result.json
  • artifacts/e2e-advisor/e2e-advisor-summary.md

Test

  • ran locally: node tools/e2e-advisor/advisor.mjs --base origin/main --head HEAD --out-dir /tmp/e2e-advisor-smoke

@github-actions

github-actions Bot commented May 8, 2026

Copy link
Copy Markdown

🚀 Docs preview ready!

https://NVIDIA.github.io/NemoClaw/pr-preview/pr-1/

@jyaunches jyaunches force-pushed the ci/e2e-advisor-prototype branch from 8432f66 to 01c4818 Compare May 8, 2026 18:01
jyaunches added 15 commits May 8, 2026 14:08
Addresses the two minor items from the PR NVIDIA#3289 review:

- tools/e2e-advisor/comment.mjs: move findExistingComment() inside the
  try/catch so a 403 from listing existing comments is treated as a
  permission error and skipped, matching the PATCH/POST error handling.
  Also coerce non-Error throwables to strings before logging.

- tools/e2e-advisor/pi-analyze.mjs: reject non-object Pi output (null,
  arrays, primitives) explicitly in normalizePiResult instead of letting
  property access on a non-object silently produce an empty result.

Refs: PR NVIDIA#3289 review by @ericksoa
Restructures the e2e-advisor workflow to fix the trusted-code boundary
violation flagged in PR NVIDIA#3289 review.

Before: the workflow checked out the PR workspace and then invoked
tools/e2e-advisor/pi-analyze.mjs and comment.mjs *from that same
workspace* while API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY,
PI_E2E_ADVISOR_API_KEY, etc.) and an issues-write token were in scope.
A future same-repo PR could have modified the advisor scripts
themselves and executed arbitrary Node in a secrets-bearing job.

After:

- Advisor code (pi-analyze.mjs, comment.mjs, schema.json,
  pi-models.template.json) is always checked out from NVIDIA/NemoClaw
  main into $GITHUB_WORKSPACE/advisor and is the only code executed
  in the job. Updates to the advisor flow through main review.

- PR content (or dispatch ref, or cloned target PR) is checked out as
  inert analysis data into $GITHUB_WORKSPACE/pr-workdir. The advisor
  scripts read it via git diff / fs.readFileSync; nothing from this
  directory is executed as code.

- Both checkouts use persist-credentials: false.

- @mariozechner/pi-coding-agent is pinned via PI_AGENT_VERSION (0.73.1)
  so a compromised upstream release cannot execute automatically;
  upgrades go through a normal code-review change to this file.

- Removed the top-level 'npm ci' install step — the advisor scripts
  use only Node built-ins, so the previous install brought the full
  repo devDependency tree into the secret-bearing job for no benefit.

Refs: PR NVIDIA#3289 review by @ericksoa; CodeRabbit comment on pin.
jyaunches added a commit that referenced this pull request May 11, 2026
Lands shared fixtures, helpers, assertion helpers, install-method
splits, conventions + lint, and the parity-compare CI harness that
unblock the per-wave migration phases (2\u201312).

Deliverables (per specs/2026-05-11_e2e-test-migration/spec.md Phase 1):

Fixtures (test/e2e/lib/fixtures/):
- fake-openai.sh: local OpenAI-compatible endpoint (Risk NVIDIA#2 mitigation)
- fake-{telegram,discord,slack}.sh: messaging stubs via shared
  _fake-http-stub.sh harness
- older-base-image.sh: tagged ghcr base-image Dockerfile generator

Helpers (test/e2e/lib/):
- logging.sh: canonical e2e_{section,info,pass,fail} with stable
  PASS:/FAIL:/=== Phase markers (absorbs reuse category #1)
- sandbox-exec.sh: canonical nemoclaw-shell wrapper with safe quoting,
  exit-code propagation, and dry-run short-circuit (category NVIDIA#10)
- env.sh: auto-sources logging.sh so every consumer gets it for free

Assertion helpers (test/e2e/lib/assert/):
- inference-works.sh: chat-completion round-trip
- no-credentials-leaked.sh: credential-pattern scan
- policy-preset-applied.sh: gateway policy preset verification
- messaging-bridge-reachable.sh: L7 proxy / bridge reachability

Install dispatcher splits (test/e2e/lib/setup/):
- install-{repo,curl,ollama,launchable}.sh (four profiles)
- install.sh: dispatcher routes by profile/method name (category NVIDIA#5)

Runtime probe wiring:
- run-scenario.sh: adds --validate-only flag (probe-only, no setup)
- resolver/index.ts: E2E_PROBE_OVERRIDES_JSON escape hatch for keys
  with embedded underscores (e.g. security.policy_engine)

Convention lint + parity harness:
- scripts/e2e/lint-conventions.ts: enforces 6 conventions on suite
  step scripts + requires parity-map.yaml entries for legacy scripts
- scripts/e2e/compare-parity.sh: diffs legacy vs scenario PASS/FAIL
  via parity-map.yaml; flaky: true marker supported (Risk NVIDIA#4)
- test/e2e/parity-map.yaml: seeded with one entry per existing legacy
  script; migration phases 2\u201312 append assertion mappings
- .github/workflows/e2e-parity-compare.yaml: dispatches legacy script
  + migrated scenario on same runner and diffs outcomes

Tests (all passing, 41 total):
- test/e2e-lib-helpers.test.ts: +18 tests (1.A\u20131.E)
- test/e2e-convention-lint.test.ts: +11 tests (1.G\u20131.H)
- test/e2e-expected-state-validator.test.ts: +2 tests (1.F)

No regressions: full cli Vitest project (3258 tests) still green.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant