Safe-output sanitizer strips HTML comments — no structured metadata channel for downstream workflows #18992

@samuelkahessay

Context

Field report from running gh-aw v0.50.7 in production in samuelkahessay/prd-to-prod. Verified still present on main at 2d91393f3 (post-v0.51.1). This is a feature/documentation gap rather than a bug — the sanitizer is working as designed.

Problem

removeXmlComments() in sanitize_content_core.cjs:322-333 strips all <!-- ... --> content from safe-output bodies. This is a security feature (prevents injection via HTML comments) and is explicitly tested at sanitize_content.test.cjs:277.

The gap: agents have no sanitization-safe way to pass structured metadata through safe outputs. Workflow authors who use HTML comments for machine-readable markers (a common pattern in GitHub Actions) find them silently stripped with no error or warning.

Sanitization path:

  • sanitize_content_core.cjs:322 — removeXmlComments() strips all <!-- ... -->
  • sanitize_content_core.cjs:776 — called inside sanitizeContentCore()
  • safe_output_type_validator.cjs:341,354 — all sanitize: true fields route through this
  • Issue/PR/comment/review body fields are all marked sanitize: true (safe_output_type_validator.test.cjs:16-110)
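To illustrate the effect of the first step in that path, here is a minimal sketch of comment stripping. stripHtmlComments is a hypothetical stand-in written for this example; it is not the actual removeXmlComments() source, which may handle edge cases this regex does not.

```javascript
// Hypothetical stand-in for removeXmlComments(); an assumption for
// illustration only, not the code in sanitize_content_core.cjs.
function stripHtmlComments(text) {
  // Non-greedy match so each <!-- ... --> pair is removed on its own,
  // including comments that span multiple lines.
  return text.replace(/<!--[\s\S]*?-->/g, "");
}

const body = "Review complete. <!-- [PIPELINE-VERDICT] APPROVE --> All criteria pass.";
const stripped = stripHtmlComments(body);

// The marker lived inside the comment, so it disappears with it.
console.log(stripped.includes("[PIPELINE-VERDICT]"));
```

Any marker placed inside a comment is removed together with the comment, and nothing in the return value signals that content was dropped.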

Reproduction

const { sanitizeContentCore } = require("./sanitize_content_core.cjs");
const input = "Review complete. <!-- [PIPELINE-VERDICT] APPROVE --> All criteria pass.";
const output = sanitizeContentCore(input, 65000);
console.log(output.includes("[PIPELINE-VERDICT]")); // false — marker silently removed

Suggestion

Provide a structured metadata channel on safe-output types that is not subject to body sanitization. For example, a metadata field that accepts key-value pairs:

{
  "type": "add_comment",
  "body": "Review complete. All criteria pass.",
  "metadata": {
    "verdict": "APPROVE",
    "criteria_passed": 5
  }
}

The metadata could be rendered in the GitHub comment body as a fenced code block (which the sanitizer allows), or stored as a separate machine-readable artifact.
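As a rough sketch of the fenced-code-block option, the helpers below append metadata to a body as a labeled JSON block and read it back downstream. renderWithMetadata, extractMetadata, and the pipeline-metadata label are all hypothetical names chosen for this example, and it assumes the sanitizer leaves the fence and its info string untouched.

```javascript
// Build the fence marker at runtime so this example can itself sit in a fence.
const FENCE = "`".repeat(3);
// Hypothetical info string used to find the block again downstream.
const LABEL = "json pipeline-metadata";

// Append the metadata to the visible body as a fenced JSON block.
function renderWithMetadata(body, metadata) {
  const json = JSON.stringify(metadata, null, 2);
  return `${body}\n\n${FENCE}${LABEL}\n${json}\n${FENCE}`;
}

// Recover the metadata object from a comment body, or null if the block
// is absent or does not contain valid JSON.
function extractMetadata(commentBody) {
  const pattern = new RegExp(FENCE + LABEL + "\\n([\\s\\S]*?)\\n" + FENCE);
  const match = commentBody.match(pattern);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null;
  }
}

const rendered = renderWithMetadata("Review complete. All criteria pass.", {
  verdict: "APPROVE",
  criteria_passed: 5,
});
console.log(extractMetadata(rendered));
```

The trade-off is that the metadata becomes visible to human readers of the comment, unlike an HTML comment; a separate machine-readable artifact would avoid that.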

Alternatively, documenting the sanitization behavior prominently in the safe-outputs reference would help workflow authors avoid this pitfall.

Impact from production usage

Metric | Value
--- | ---
Frequency | Every workflow that uses HTML comments for structured data in safe outputs
Debugging cost | Medium-high: silent data loss with no error or warning; diagnosing it requires reading the sanitizer source
Workaround | Switched to plaintext [PIPELINE-VERDICT] markers, which are fragile but survive sanitization
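For reference, a downstream step could read the plaintext marker from the workaround roughly as follows. parseVerdict is a hypothetical helper, and the assumption that the verdict is a single uppercase token following the marker is ours, not something the pipeline documents.

```javascript
// Hypothetical parser for the plaintext workaround marker.
// Assumes the verdict is one uppercase token right after the marker.
function parseVerdict(text) {
  const match = /\[PIPELINE-VERDICT\]\s+([A-Z_]+)/.exec(text);
  return match ? match[1] : null;
}

const comment = "Review complete. [PIPELINE-VERDICT] APPROVE All criteria pass.";
console.log(parseVerdict(comment));
```

This shows why the approach is fragile: the marker is visible in the rendered comment, and any text that quotes or paraphrases it (for example, a reply discussing the marker) can produce a false match.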

Environment

  • gh-aw version at failure: v0.50.7
  • Verified on: main at 2d91393f3
  • Pipeline: Parallel agentic workflow (decompose → dispatch → implement → review → merge)
  • Repo: samuelkahessay/prd-to-prod
