Problem
The has_incomplete_scratchpad check in trajectory.py uses a simple string-contains check for <REASONING_SCRATCHPAD>. This causes false positives in several scenarios:
- Context contamination: When the conversation context contains code examples, error logs, or tool outputs that include the literal string <REASONING_SCRATCHPAD> (e.g., from searching source code), the model may reference this text in its response, triggering the detector.
- Retry loop: After the first false positive triggers a retry, the context does not change, so the model produces the same output, triggering another false positive. After verification_gate_retries attempts, the entire response is dropped.
- No XML structure validation: The check only looks for the opening tag, not whether it is actually an incomplete XML block at the root level of the response.
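The first two scenarios can be reproduced with a small sketch. Everything here is an assumption made for illustration: `naive_has_incomplete_scratchpad` is a stand-in for a plain substring check (the real code in trajectory.py may differ), and `run_with_retries` is a hypothetical retry gate, not the agent's actual loop.

```python
from typing import Callable, Optional

TAG = "<REASONING_SCRATCHPAD>"
FENCE = "`" * 3  # triple backtick, built indirectly to keep this snippet easy to embed


def naive_has_incomplete_scratchpad(text: str) -> bool:
    # Assumed shape of the current check: plain substring match.
    return TAG in text


def run_with_retries(generate: Callable[[], str], retries: int) -> Optional[str]:
    # Hypothetical retry gate: ask again until the check passes.
    for _ in range(retries):
        text = generate()
        if not naive_has_incomplete_scratchpad(text):
            return text
    return None  # every attempt was flagged, so the response is dropped


# A complete reply that only quotes the tag from a tool output.
response = (
    "The search found the tag definition:\n"
    f"{FENCE}\nSCRATCHPAD_OPEN = '{TAG}'\n{FENCE}\n"
    "Nothing in this reply is an unfinished scratchpad."
)

flagged = naive_has_incomplete_scratchpad(response)
# The context is unchanged between attempts, so the model is modeled as deterministic.
result = run_with_retries(lambda: response, retries=3)
print(flagged, result)  # True None
```

The reply is flagged although it contains no open scratchpad, and because each retry yields the same text, the loop exhausts its attempts and returns nothing.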
Suggested Fix
Strip fenced code blocks and inline code spans before checking, then compare the number of opening and closing tags. This prevents false positives from quoted source code:
import re

def has_incomplete_scratchpad(text: str) -> bool:
    # Drop fenced code blocks, then inline code spans, so quoted tags are ignored.
    cleaned = re.sub(r'```.*?```', '', text, flags=re.DOTALL)
    cleaned = re.sub(r'`[^`]*`', '', cleaned)
    # An unclosed scratchpad has more opening tags than closing tags.
    opens = len(re.findall(r'<REASONING_SCRATCHPAD>', cleaned))
    closes = len(re.findall(r'</REASONING_SCRATCHPAD>', cleaned))
    return opens > closes
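A few sample inputs show how this check might behave. This is a sketch, not a test suite from the repository: the function is repeated so the snippet runs on its own, it uses `str.count` and the equivalent pattern `` `{3} `` in place of a literal triple backtick, and the sample strings are invented.

```python
import re

OPEN_TAG = "<REASONING_SCRATCHPAD>"
CLOSE_TAG = "</REASONING_SCRATCHPAD>"
FENCE = "`" * 3  # triple backtick


def has_incomplete_scratchpad(text: str) -> bool:
    # Same logic as the suggested fix: strip fenced blocks, then inline spans.
    cleaned = re.sub(r"`{3}.*?`{3}", "", text, flags=re.DOTALL)
    cleaned = re.sub(r"`[^`]*`", "", cleaned)
    # Flag only when an opening tag has no matching closing tag.
    return cleaned.count(OPEN_TAG) > cleaned.count(CLOSE_TAG)


# Invented sample responses, one per scenario.
cases = {
    "fenced_quote": f"Found it:\n{FENCE}\nTAG = '{OPEN_TAG}'\n{FENCE}\nDone.",
    "inline_quote": f"The tag `{OPEN_TAG}` appears in the source.",
    "closed_block": f"{OPEN_TAG}plan{CLOSE_TAG}Final answer.",
    "unclosed_block": f"{OPEN_TAG}still thinking",
    "bare_prose_mention": f"The literal {OPEN_TAG} tag is documented there.",
}

results = {name: has_incomplete_scratchpad(text) for name, text in cases.items()}
for name, flagged in results.items():
    print(f"{name}: {flagged}")
```

Quoted tags inside fenced blocks or inline code are no longer flagged, and a genuinely unclosed block still is. One limitation remains: a tag mentioned in plain prose without backticks is still counted as an opening tag, so the third scenario (root-level structure validation) is only partly addressed.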
Environment
- Hermes Agent (latest main)
- Providers affected: zai (glm-5.1), MiniMax M2.7
Impact
Medium - causes intermittent response failures when context contains the literal string.