fix: replace xpia.md with safe inline policy to prevent cyber_policy_violation in Codex workflows#1494
Conversation
There was a problem hiding this comment.
Pull request overview
Mitigates OpenAI cyber_policy_violation failures in Codex (OpenAI) workflows by replacing the xpia.md prompt inclusion with a neutral, inline policy block that preserves the same sandbox-boundary intent without using filter-triggering terminology.
Changes:
- Add a Codex-only post-processing pass to replace
cat .../xpia.mdwith a safe inline heredoc policy. - Apply the same replacement directly to the two Codex lock workflows so the fix takes effect immediately.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/ci/postprocess-smoke-workflows.ts | Adds Codex-only replacement logic for xpia.md via regex + heredoc content. |
| .github/workflows/smoke-codex.lock.yml | Replaces cat .../xpia.md with inline safe policy block in the generated prompt. |
| .github/workflows/secret-digger-codex.lock.yml | Same inline safe policy replacement, preserving >> "$GH_AW_PROMPT" behavior. |
Comments suppressed due to low confidence (1)
scripts/ci/postprocess-smoke-workflows.ts:120
- This comment block repeats the specific trigger terms the PR is trying to avoid. Even though it’s “just a comment”, it can still end up in Codex requests if the agent reads this file via tools. To keep the mitigation robust, prefer removing the explicit examples here as well.
// Replace the xpia.md cat command with a safe inline security policy.
// gh-aw v0.64.2+ includes xpia.md in the Codex prompt but the file contains
// specific cybersecurity attack terminology (e.g. "container escape",
// "DNS/ICMP tunneling", "port scanning", "exploit tools") that triggers
// OpenAI's cyber_policy_violation content filter, causing every model request
// to fail. This replacement expresses the same XPIA-prevention and access-
// control intent without using the triggering terms.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // introduced an xpia.md security policy that uses specific cybersecurity | ||
| // terminology (e.g. "container escape", "DNS/ICMP tunneling", "port scanning", | ||
| // "exploit tools") which triggers OpenAI's cyber_policy_violation content | ||
| // filter, causing every Codex model request to fail with: | ||
| // "This user's access to this model has been temporarily limited for | ||
| // potentially suspicious activity related to cybersecurity." | ||
| // The safe inline replacement achieves the same XPIA-prevention intent without | ||
| // using trigger terms. |
There was a problem hiding this comment.
The new explanatory comments include the exact cybersecurity attack terms that this PR notes are triggering OpenAI's cyber_policy_violation filter. If this file is ever surfaced to Codex via tool output (e.g., cat/search results), it could reintroduce the same failure mode. Consider removing the concrete examples (or replacing them with a neutral description like “specific offensive security terminology”) to avoid having those trigger strings anywhere in repo text that might be sent to the model.
This issue also appears on line 114 of the same file.
| // introduced an xpia.md security policy that uses specific cybersecurity | |
| // terminology (e.g. "container escape", "DNS/ICMP tunneling", "port scanning", | |
| // "exploit tools") which triggers OpenAI's cyber_policy_violation content | |
| // filter, causing every Codex model request to fail with: | |
| // "This user's access to this model has been temporarily limited for | |
| // potentially suspicious activity related to cybersecurity." | |
| // The safe inline replacement achieves the same XPIA-prevention intent without | |
| // using trigger terms. | |
| // introduced an xpia.md security policy that uses specific offensive security | |
| // terminology which triggers OpenAI's cyber_policy_violation content filter, | |
| // causing every Codex model request to fail with: | |
| // "This user's access to this model has been temporarily limited for | |
| // potentially suspicious activity related to cybersecurity." | |
| // The safe inline replacement achieves the same XPIA-prevention intent without | |
| // including those trigger terms. |
| const codexWorkflowPaths = [ | ||
| path.join(repoRoot, '.github/workflows/smoke-codex.lock.yml'), | ||
| path.join(repoRoot, '.github/workflows/secret-digger-codex.lock.yml'), | ||
| ]; |
There was a problem hiding this comment.
codexWorkflowPaths duplicates entries already present in workflowPaths (e.g., smoke-codex.lock.yml and secret-digger-codex.lock.yml). This duplication is easy to get out of sync when adding/removing workflows. Consider deriving codexWorkflowPaths from workflowPaths (or using a single source of truth + a predicate) so the Codex-only pass can’t accidentally miss a Codex workflow later.
…akage (#1493) * fix: always derive GH_HOST from GITHUB_SERVER_URL to prevent proxy leakage When --env-all passes through a proxy-rewritten GH_HOST (e.g. localhost:18443 from DIFC proxy), gh CLI fails with "none of the git remotes correspond to GH_HOST". Fix by always deriving GH_HOST from GITHUB_SERVER_URL (the canonical source injected by the Actions runner) instead of preserving leaked proxy values. For GHES/GHEC: overrides any leaked value with the correct hostname. For github.com: deletes any leaked GH_HOST so gh CLI uses its default. Closes #1492 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: replace xpia.md with safe inline policy to prevent cyber_policy_violation in Codex workflows (#1494) * Initial plan * fix: replace xpia.md with safe inline policy to fix cyber_policy_violation * fix: address code review feedback on xpia.md replacement logic Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/ecbda070-c667-4859-8ca2-8b15dee1e0a2 --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> * fix: use neutral policy language in SAFE_XPIA_CONTENT (#1495) Replace cybersecurity-triggering content (was using <security> tag, 'network access controls', 'circumventing', 'authentication tokens', 'sandboxed environment') with neutral operational policy language using a <policy> tag. These terms triggered OpenAI's cyber_policy_violation filter, causing Smoke Codex to fail with 'temporarily limited for potentially suspicious activity related to cybersecurity'. Also add xpiaSafeBlockRegex so the postprocess script is idempotent when SAFE_XPIA_CONTENT changes (can update already-replaced blocks without requiring a full recompile from .md source). Regenerated smoke-codex.lock.yml and secret-digger-codex.lock.yml. Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> * fix: clarify GH_HOST comment and fix misleading debug log message Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/6467baff-1e02-4ac7-ba24-841bc7081226 * chore: fix npm audit vulnerabilities (handlebars, brace-expansion) Run npm audit fix to resolve: - handlebars 4.0.0-4.7.8: critical (JS injection, prototype pollution) - brace-expansion 4.0.0-5.0.4: moderate (DoS via zero-step sequence) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: resolve CodeQL file-system-race in postprocess script Remove existsSync guard before readFileSync to eliminate TOCTOU race condition (js/file-system-race). Use try/catch on readFileSync instead, which atomically handles missing files. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
gh-aw v0.64.2introducedxpia.md, a security policy injected into every Codex agent prompt. It contains specific offensive security terminology ("container escape", "privilege escalation", "DNS/ICMP tunneling", "port scanning", "exploit tools") that triggers OpenAI'scyber_policy_violationcontent filter — causing everygpt-5.3-codexmodel request to fail immediately, before any tool calls.Changes
scripts/ci/postprocess-smoke-workflows.ts: Added a Codex-only post-processing pass that replaces thecat xpia.mdshell command in Codex lock files with a safe inline heredoc. The replacement expresses the same XPIA-prevention and sandbox-boundary intent using neutral language:Matches both path forms (
${RUNNER_TEMP}/gh-aw/prompts/xpia.mdin v0.64.2+ and/opt/gh-aw/prompts/xpia.mdin v0.58.x) and handles both redirect styles used across workflow versions. Applied only to Codex/OpenAI workflows — not Claude or Copilot.smoke-codex.lock.yml,secret-digger-codex.lock.yml: Applied the transformation directly so the fix takes effect immediately without waiting for a recompile.Warning
Firewall rules blocked me from connecting to one or more addresses (expand for details)
I tried to connect to the following addresses, but was blocked by firewall rules:
https://api.github.com/repos/github/gh-aw-firewall/actions/runs/23712406273/artifacts/usr/bin/gh gh run download 23712406273 --name agent --dir /tmp/agent-artifact(http block)If you need me to access, download, or install something from one of these locations, you can either:
Original prompt
📱 Kick off Copilot coding agent tasks wherever you are with GitHub Mobile, available on iOS and Android.