[Bug]: Azure OpenAI security filters block some Hermes requests due to trigger word #6576

@olafgeibig

Bug Description

When using Hermes Agent with the Azure OpenAI content filters “Default” and “DefaultV2”, requests can be rejected because the assembled prompt contains bracketed meta-instructions that Azure's heuristics treat as prompt-injection attempts.

In particular, Hermes injects bracketed markers like:
[SYSTEM: The user has invoked the "<skill>" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]

These appear in:

  • hermes-agent/agent/skill_commands.py
  • hermes-agent/cron/scheduler.py
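To check whether the marker occurs anywhere else in a checkout, a small scan can help. The following is only a sketch: `find_markers` is a hypothetical helper, not part of Hermes, and it assumes the marker always appears literally as `[SYSTEM:` in `.py` files.

```python
from pathlib import Path

# Literal prefix of the bracketed marker quoted above (assumed to be written
# verbatim in the source files).
MARKER = "[SYSTEM:"


def find_markers(root: str, marker: str = MARKER) -> list[tuple[str, int]]:
    """Return (relative path, line number) for every line containing marker."""
    hits: list[tuple[str, int]] = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # skip unreadable or non-UTF-8 files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if marker in line:
                hits.append((path.relative_to(root).as_posix(), lineno))
    return hits
```

Calling `find_markers("hermes-agent")` from the directory that contains the checkout would list every match, which makes it easier to confirm that the two files above are the only places that need attention.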

Steps to Reproduce

  • Configure Azure OpenAI models as a custom model using the new V1 API, with the models using the standard out-of-the-box content filters Default or DefaultV2
  • Invoke a skill with a slash command

Expected Behavior

The API request succeeds.

Actual Behavior

The request fails with HTTP 400 because Azure's content filter rejects the prompt (see the log below).

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

No response

Operating System

macOS 26.4

Python Version

3.11.15

Hermes Version

0.8.0

Relevant Logs / Traceback

⚠️  API call failed (attempt 1/3): BadRequestError [HTTP 400]
   🔌 Provider: custom  Model: gpt-5.2
   🌐 Endpoint: https://redacted.openai.azure.com/openai/v1
   📝 Error: HTTP 400: The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766
   📋 Details: {'message': "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", 'type': None, 'param'
⏳ Retrying in 2.422021883499886s (attempt 1/3)...

Root Cause Analysis (optional)

The bracketed [SYSTEM: ...] marker in the assembled prompt triggers the Azure OpenAI content filter.

Proposed Fix (optional)

For now I use a patch that modifies the Hermes source to replace [SYSTEM with [IMPORTANT; with that change the problem is gone.

I won't submit a PR because I don't want to interfere with your prompts.
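The same substitution could also be applied to the assembled messages just before they are sent, instead of editing the source files. This is a minimal sketch, not the patch itself: `soften_system_markers` and `soften_messages` are hypothetical names, and the code assumes OpenAI-style chat messages whose `content` is a plain string.

```python
import re

# Match "[SYSTEM" only when it is directly followed by ":" so that unrelated
# bracketed text is left alone.
_MARKER = re.compile(r"\[SYSTEM(?=:)")


def soften_system_markers(text: str, replacement: str = "[IMPORTANT") -> str:
    """Replace the "[SYSTEM" marker prefix with a less suspicious one."""
    return _MARKER.sub(replacement, text)


def soften_messages(messages: list[dict]) -> list[dict]:
    """Return a copy of messages with markers rewritten in string contents."""
    out = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            # Build a new dict so the caller's message is not mutated.
            message = {**message, "content": soften_system_markers(content)}
        out.append(message)
    return out
```

Whether [IMPORTANT keeps the model following the skill instructions as reliably as [SYSTEM is something only the maintainers can judge, so the replacement string is left as a parameter.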

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Metadata

    Labels

    type/bug: Something isn't working