fix(security): wrap cron script output in XML tags to prevent prompt injection #5116
maymuneth wants to merge 2 commits into
Direction is right: XML tags are a modestly better framing than markdown code blocks for LLM prompt injection mitigation (Anthropic recommends this pattern in their prompt engineering docs). But the fix is incomplete.

The escape-sequence problem: Neither markdown code blocks nor XML tags provide real containment if the wrapped content can emit the closing delimiter. Consider a script whose output includes the closing `</data>` tag followed by instruction-like text. When that output is embedded into the prompt template, the injected text after the closing tag sits outside the wrapper.

Two fixes to pair with this PR:

1. Escape the closing tag in script output:

       safe_output = script_output.replace("</data>", "</data\u200b>")  # zero-width space

   Or use an unlikely-to-collide marker:

       WRAPPER_TAG = "script_output_7f3a"  # unique enough that scripts won't emit it
       f"<{WRAPPER_TAG}>\n{script_output}\n</{WRAPPER_TAG}>\n\n"

2. Truncate the script output to a reasonable size (e.g. 10k chars) before embedding. Long outputs give attackers more room to inject, and operators rarely need full stdout in the prompt anyway. Cron jobs that need huge output should log to disk and pass a summary.

Bonus concern: the error path is particularly dangerous:

    prompt = (
        "## Script Error\n"
        "The data-collection script failed. Report this to the user.\n\n"
        f"<data>\n{script_output}\n</data>\n\n"
        ...
    )

The template instructs the model to report the error content to the user. A failing script whose stderr contains injected instructions gets those instructions surfaced to the user (possibly with formatting that looks like a legitimate assistant message). Whoever controls the cron script, or can trigger a failure with attacker-controlled stderr, effectively controls what the user sees in the notification. Consider logging script errors separately and showing the user a generic "script failed, see logs" message instead of embedding raw stderr in the LLM prompt at all.

TL;DR: LGTM as one step, but ship it alongside escape-tag substitution + output length limit, or merge into a more comprehensive cron prompt-injection hardening PR.
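
The fixes suggested in this comment could be combined roughly as follows. This is a minimal sketch, not the project's code: `escape_closing_tag`, `wrap_untrusted`, `build_error_prompt`, `MAX_OUTPUT_CHARS`, and the logger name are all hypothetical, and the limit is simply the 10k figure mentioned above. As a variation on the fixed marker, the sketch draws a random tag suffix per call with `secrets.token_hex`, so a script cannot know the closing tag in advance.

```python
import logging
import secrets

logger = logging.getLogger("cron")  # hypothetical logger name

MAX_OUTPUT_CHARS = 10_000  # assumed limit, from the "10k chars" suggestion


def escape_closing_tag(script_output: str, tag: str = "data") -> str:
    """Option A: break a literal closing tag with a zero-width space."""
    return script_output.replace(f"</{tag}>", f"</{tag}\u200b>")


def wrap_untrusted(script_output: str, max_chars: int = MAX_OUTPUT_CHARS) -> str:
    """Option B: truncate, then wrap in a tag the script cannot predict."""
    # Truncate first so the attacker-controlled region stays bounded.
    if len(script_output) > max_chars:
        script_output = script_output[:max_chars] + "\n[truncated]"

    # Random per-call suffix instead of a fixed marker.
    tag = f"script_output_{secrets.token_hex(8)}"
    return f"<{tag}>\n{script_output}\n</{tag}>\n\n"


def build_error_prompt(script_output: str) -> str:
    """Log raw output for operators and keep it out of the prompt."""
    logger.error("cron script failed; raw output: %r", script_output)
    return (
        "## Script Error\n"
        "The data-collection script failed. Tell the user that the script "
        "failed and that details are in the logs.\n\n"
    )
```

Neither option makes the wrapped text trustworthy; they only make it harder for the output to close its own wrapper. The error-path helper goes further by never placing stderr in the prompt.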
Inside a markdown code block, LLMs can still interpret such content
as instructions — especially when the code block contains natural
language text rather than code.
Fix
Wrap script output in XML `<data>` tags instead of markdown code blocks. XML tags signal to the LLM that the content is structured
data to be processed, not instructions to be followed. This is a standard
prompt injection mitigation.
Applied to both the success path (line 300) and the error path
(line 312).
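
For illustration, the change on both paths might look something like the sketch below. The PR's actual diff is not shown here, so `wrap_in_data_tags`, `build_prompt`, the `failed` flag, and the success-path heading are assumptions; only the error-path text and the `<data>` wrapper come from this page.

```python
def wrap_in_data_tags(script_output: str) -> str:
    """Wrap raw script output in <data> tags, as the PR describes."""
    return f"<data>\n{script_output}\n</data>\n\n"


def build_prompt(script_output: str, failed: bool) -> str:
    """Hypothetical helper covering the success and error paths."""
    if failed:
        header = (
            "## Script Error\n"
            "The data-collection script failed. Report this to the user.\n\n"
        )
    else:
        header = "## Script Output\n"  # assumed wording, not from the PR
    return header + wrap_in_data_tags(script_output)
```

Note that this plain wrapper does nothing about output that itself contains `</data>`.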