Safe-output sanitizer strips HTML comments — no structured metadata channel for downstream workflows #18992
Description
Context
Field report from running gh-aw v0.50.7 in production in samuelkahessay/prd-to-prod. Verified still present on main at 2d91393f3 (post-v0.51.1). This is a feature/documentation gap rather than a bug — the sanitizer is working as designed.
Problem
removeXmlComments() in sanitize_content_core.cjs:322-333 strips all <!-- ... --> content from safe-output bodies. This is a security feature (prevents injection via HTML comments) and is explicitly tested at sanitize_content.test.cjs:277.
The gap: agents have no sanitization-safe way to pass structured metadata through safe outputs. Workflow authors who use HTML comments for machine-readable markers (a common pattern in GitHub Actions) find them silently stripped with no error or warning.
Sanitization path:
- sanitize_content_core.cjs:322 - removeXmlComments() strips all <!-- ... -->
- sanitize_content_core.cjs:776 - called inside sanitizeContentCore()
- safe_output_type_validator.cjs:341,354 - all sanitize: true fields route through this
- Issue/PR/comment/review body fields are all marked sanitize: true (safe_output_type_validator.test.cjs:16-110)
Reproduction
const { sanitizeContentCore } = require("./sanitize_content_core.cjs");
const input = "Review complete. <!-- [PIPELINE-VERDICT] APPROVE --> All criteria pass.";
const output = sanitizeContentCore(input, 65000);
console.log(output.includes("[PIPELINE-VERDICT]")); // false: marker silently removed
Suggestion
Provide a structured metadata channel on safe-output types that is not subject to body sanitization. For example, a metadata field that accepts key-value pairs:
{
"type": "add_comment",
"body": "Review complete. All criteria pass.",
"metadata": {
"verdict": "APPROVE",
"criteria_passed": 5
}
}
The metadata could be rendered in the GitHub comment body as a fenced code block (which the sanitizer allows), or stored as a separate machine-readable artifact.
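To make the fenced-code-block option concrete, here is a minimal sketch of how metadata might be rendered into a comment body and recovered by a downstream workflow. The helper names renderWithMetadata and extractMetadata are hypothetical and do not exist in gh-aw, and the round trip has not been tested against the real sanitizer, which may apply other transformations to the block's contents.

```javascript
// Hypothetical helpers for illustration; none of these names exist in gh-aw.
// The fence is built indirectly so this example nests cleanly inside Markdown.
const FENCE = "`".repeat(3);

// Append metadata to a comment body as a fenced JSON block.
function renderWithMetadata(body, metadata) {
  const json = JSON.stringify(metadata, null, 2);
  return `${body}\n\n${FENCE}json\n${json}\n${FENCE}\n`;
}

// Recover the metadata from a rendered body; returns null if absent or invalid.
function extractMetadata(rendered) {
  const pattern = new RegExp(`${FENCE}json\\n([\\s\\S]*?)\\n${FENCE}`);
  const match = rendered.match(pattern);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // block was present but not valid JSON
  }
}

const rendered = renderWithMetadata("Review complete. All criteria pass.", {
  verdict: "APPROVE",
  criteria_passed: 5,
});
const metadata = extractMetadata(rendered);
console.log(metadata.verdict, metadata.criteria_passed);
```

Unlike an HTML comment, the block is visible to human readers of the comment, which is the trade-off for surviving sanitization.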
Alternatively, documenting the sanitization behavior prominently in the safe-outputs reference would help workflow authors avoid this pitfall.
Impact from production usage
| Metric | Value |
|---|---|
| Frequency | Every workflow using HTML comments for structured data in safe outputs |
| Debugging cost | Medium-high — silent data loss, no error. Requires reading sanitizer source to diagnose. |
| Workaround | Switched to plaintext [PIPELINE-VERDICT] markers, which is fragile but survives sanitization |
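For reference, the plaintext workaround can be consumed downstream with a small parser along these lines. This is a sketch under assumptions: parseVerdict is a hypothetical name, and the marker is assumed to be followed by a single uppercase verdict word, as in the reproduction string.

```javascript
// Hypothetical downstream parser for the plaintext marker workaround.
// Assumes the marker is followed by whitespace and one uppercase verdict word.
function parseVerdict(commentBody) {
  const match = commentBody.match(/\[PIPELINE-VERDICT\]\s+([A-Z_]+)/);
  return match ? match[1] : null; // null when no marker is present
}

console.log(parseVerdict("Review complete. [PIPELINE-VERDICT] APPROVE All criteria pass."));
console.log(parseVerdict("Review complete. All criteria pass."));
```

The fragility is visible here: any prose that happens to quote the marker text would also match, and the parser depends on the agent emitting the exact spelling and spacing.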
Environment
- gh-aw version at failure: v0.50.7
- Verified on: main at 2d91393f3
- Pipeline: Parallel agentic workflow (decompose → dispatch → implement → review → merge)
- Repo: samuelkahessay/prd-to-prod