Problem or Use Case
When Hermes is used with `provider=custom` and `api_mode=codex_responses` against a codex-lb `/v1/responses` backend, the streaming path can fail if the backend emits `codex.rate_limits` before `response.created`.
Observed error:
No response received: Expected to have received response.created before codex.rate_limits
In regression testing, the core RuntimeError is:
Expected to have received `response.created` before `codex.rate_limits`
Hermes already handles one Responses streaming edge case by falling back when `response.completed` is missing. However, it does not currently handle another real-world compatibility case: a provider-specific prelude event arriving before `response.created`.
This makes the current stream event-order assumption too strict for some OpenAI-compatible backends such as codex-lb, causing the whole request to fail even though a valid final response could still be obtained through a fallback path.
Proposed Solution
Add a fallback branch in `_run_codex_stream(...)` for errors matching:
Expected to have received `response.created` before ...
Instead of aborting the conversation, Hermes should fall back to a non-streaming `responses.create(...)` call (that is, one made without `stream=True`).
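The detection step could look roughly like the sketch below. It is only an illustration: the helper name `prelude_event_from_error` and the regular expression are assumptions, not existing Hermes code, and the pattern is derived solely from the two error strings quoted in this issue (with and without backticks).

```python
import re
from typing import Optional

# Pattern based on the error text quoted in this issue. The event name after
# "before" differs per backend, so it is captured instead of hard-coded.
_PRELUDE_ERROR = re.compile(
    r"Expected to have received `?response\.created`? before `?([\w.]+)`?"
)


def prelude_event_from_error(exc: BaseException) -> Optional[str]:
    """Return the event that arrived before response.created, or None.

    Hypothetical helper: returns None for any error that is not an
    ordering mismatch, so callers can re-raise it unchanged.
    """
    match = _PRELUDE_ERROR.search(str(exc))
    return match.group(1) if match else None
```

Capturing the event name, rather than matching `codex.rate_limits` literally, keeps the check usable for other backends that emit a different prelude event, and makes it easy to log which event triggered the fallback.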
Suggested behavior:
- keep the existing fallback for missing `response.completed`
- add detection for `response.created` / prelude ordering mismatches
- fall back to non-stream `responses.create(...)` for this class of stream-protocol mismatch
- add a regression test covering the `codex.rate_limits` prelude case
This approach is low-risk, preserves existing behavior, and improves compatibility with OpenAI-compatible backends that emit provider-specific prelude events before response.created.
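As a rough illustration of the fallback and its regression test, here is a minimal, self-contained sketch. Everything in it is hypothetical: `run_with_stream_fallback` and `FakeResponses` are stand-ins invented for this example, not the actual `_run_codex_stream(...)` code or a real client, and the simple substring check is just one possible way to detect the error.

```python
from typing import Any, Callable


def run_with_stream_fallback(
    stream_call: Callable[[], Any],
    create_call: Callable[[], Any],
) -> Any:
    """Try the streaming path; on a response.created ordering error,
    retry once through the non-streaming path. Other errors propagate."""
    try:
        return stream_call()
    except RuntimeError as exc:
        message = str(exc)
        if "Expected to have received" in message and "response.created" in message:
            return create_call()
        raise


class FakeResponses:
    """Hypothetical stand-in for a backend that emits codex.rate_limits first."""

    def __init__(self) -> None:
        self.calls = []

    def stream(self, **kwargs: Any) -> Any:
        self.calls.append("stream")
        raise RuntimeError(
            "Expected to have received `response.created` before `codex.rate_limits`"
        )

    def create(self, **kwargs: Any) -> Any:
        self.calls.append("create")
        # The fallback request must not ask for streaming.
        assert "stream" not in kwargs
        return {"status": "completed", "output_text": "ok"}


def test_rate_limits_prelude_falls_back_to_create() -> None:
    responses = FakeResponses()
    result = run_with_stream_fallback(
        lambda: responses.stream(model="m", input="hi"),
        lambda: responses.create(model="m", input="hi"),
    )
    assert result["output_text"] == "ok"
    assert responses.calls == ["stream", "create"]
```

Passing the two paths as callables keeps the sketch independent of any particular client library. In Hermes itself the same branch would presumably sit next to the existing missing-`response.completed` fallback.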
Alternatives Considered
No response
Feature Type
Performance / reliability
Scope
None
Contribution
Debug Report (optional)