Bug Description
🐛 Issue Summary
When using Hermes Agent for CLI chat or gateway messaging (Telegram/Discord/Slack), the agent frequently throws the error:
Error: Response truncated due to output length limit
when generating long-form responses. This truncates the output mid-stream, breaks the conversation flow, and prevents users from receiving complete, usable answers.
✅ Expected Behavior
- Long responses should be automatically split into chunks or streamed continuously to the user, rather than being hard-truncated with an error.
- The system should handle output length limits gracefully (e.g., auto-continue the response, prompt the user to request more, or adjust the context window dynamically) instead of interrupting the process.
- The fix should work consistently across all use cases: regular chat, tool calls, code generation, and multi-step reasoning.
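To make the expected behavior concrete, here is a minimal Python sketch of the two ideas above: auto-continuing a truncated response and splitting a long response into chunks. It is not Hermes Agent code. The names `split_message`, `generate_with_continuation`, and `DEFAULT_CHUNK_SIZE` are hypothetical, and the `generate` callable is assumed to keep conversation state and to return the text plus a flag saying whether the model stopped because of the output length limit.

```python
from typing import Callable, List, Tuple

# Hypothetical chunk size; real per-message limits differ by platform.
DEFAULT_CHUNK_SIZE = 2000


def split_message(text: str, limit: int = DEFAULT_CHUNK_SIZE) -> List[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to break at the last newline in the window, then at the last
    space, and falls back to a hard split when neither is available.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    remaining = text
    while len(remaining) > limit:
        window = remaining[:limit]
        cut = window.rfind("\n")
        if cut <= 0:
            cut = window.rfind(" ")
        if cut <= 0:
            # No natural break point in the window: hard split.
            chunks.append(window)
            remaining = remaining[limit:]
        else:
            chunks.append(remaining[:cut])
            remaining = remaining[cut + 1:]  # drop the separator itself
    if remaining:
        chunks.append(remaining)
    return chunks


def generate_with_continuation(
    generate: Callable[[str], Tuple[str, bool]],
    prompt: str,
    max_rounds: int = 5,
) -> str:
    """Call `generate` repeatedly until the response is no longer truncated.

    `generate` is a hypothetical callable returning (text, truncated).
    `max_rounds` bounds the loop so a model that never finishes cannot
    spin forever.
    """
    parts: List[str] = []
    next_prompt = prompt
    for _ in range(max_rounds):
        text, truncated = generate(next_prompt)
        parts.append(text)
        if not truncated:
            break
        next_prompt = "Continue exactly where you left off."
    return "".join(parts)
```

With something along these lines, the CLI could print the joined text directly, while a gateway could pass it through `split_message` and send each chunk as a separate message instead of surfacing the truncation error.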
Steps to Reproduce
🔍 Minimal Reproduction Steps
- Launch the Hermes Agent CLI with hermes chat (or start the gateway with hermes gateway start).
- Send a prompt that requires a long, detailed response, for example:
- "Please perform a full code review of this project, including architecture analysis, dependency issues, and optimization suggestions."
- "Write a complete Python automated trading strategy with backtesting, risk management, and logging modules."
- Observe the agent throw Error: Response truncated due to output length limit and cut off the response mid-generation.
- Repeat the test with different long-form prompts to confirm the issue is consistent.
Expected Behavior
📌 Additional Context
- This error occurs reliably when the model generates long text, complex tool call chains, or full code implementations.
- I have already updated to the latest version via hermes update and confirmed the bug persists.
- I have searched existing issues and found no open report that fully addresses this specific error scenario.
Actual Behavior
🧪 Environment Details
- Hermes Agent version: [FILL IN YOUR VERSION, e.g., v0.4.2, run hermes --version to check]
- OS/Environment: [FILL IN, e.g., macOS 15.3, Ubuntu 22.04, Docker]
- LLM Provider/Model: [FILL IN, e.g., OpenRouter GPT-4o, local Ollama Llama 3 70B]
- Trigger Scenario: [FILL IN, e.g., CLI chat, Telegram gateway, Discord bot]
Affected Component
CLI (interactive chat)
Messaging Platform (if gateway-related)
No response
Operating System
macOS
Python Version
3.11.9
Hermes Version
2.1.0
Relevant Logs / Traceback
Root Cause Analysis (optional)
No response
Proposed Fix (optional)
No response
Are you willing to submit a PR for this?