fix: backfill codex stream output from output_item.done events #5689

Merged
teknium1 merged 1 commit into main from hermes/hermes-db0c54fb on Apr 7, 2026

Conversation

@teknium1 commented Apr 7, 2026

Salvages the core fix from PR #5673 (egerev) onto current main.

Problem

The chatgpt.com/backend-api/codex endpoint streams valid output items via response.output_item.done events, but the OpenAI SDK's get_final_response() returns an empty output list. This caused every Codex response to be rejected as invalid with "response.output is empty".

Fix

  • Collect response.output_item.done events during streaming
  • After get_final_response(), backfill response.output from collected items when empty
  • Fall back to synthesizing from text deltas when no done events were received
  • Move synthesis out of the validation loop (where #5681 placed it, which is too late) into _run_codex_stream(), so it runs before the response leaves the streaming function
  • Simplify validation to just log diagnostics since recovery now happens upstream
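
The recovery order described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in _run_codex_stream(): the function name backfill_output, the use of SimpleNamespace stand-ins, the response.output_text.delta event name for text deltas, and the shape of the synthesized message item are all assumptions made for the example.

```python
from types import SimpleNamespace


def backfill_output(events, final_response):
    """Fill final_response.output when it came back empty.

    `events` is any iterable of objects with a `type` attribute, shaped like
    the streaming events named in this PR. Names here are illustrative only.
    """
    done_items = []  # items carried by response.output_item.done events
    text_parts = []  # text deltas, used only as a last resort

    for event in events:
        if event.type == "response.output_item.done":
            done_items.append(event.item)
        elif event.type == "response.output_text.delta":  # assumed event name
            text_parts.append(event.delta)

    if final_response.output:
        return final_response  # nothing to repair

    if done_items:
        # Preferred path: reuse the items the endpoint actually streamed.
        final_response.output = list(done_items)
    elif text_parts:
        # Fallback: synthesize one hypothetical message item from the deltas.
        text = "".join(text_parts)
        final_response.output = [
            SimpleNamespace(
                type="message",
                role="assistant",
                content=[SimpleNamespace(type="output_text", text=text)],
            )
        ]
    return final_response
```

If neither done events nor text deltas were seen, the output stays empty, which is the case the simplified validation step would still log.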

Credit

Core approach from PR #5673 by @egerev. Closes #5673.

Test plan

python -m pytest tests/test_run_agent_codex_responses.py -n0 -q — 33 passed

Salvages the core fix from PR #5673 (egerev) onto current main.

The chatgpt.com/backend-api/codex endpoint streams valid output items
via response.output_item.done events, but the OpenAI SDK's
get_final_response() returns an empty output list. This caused every
Codex response to be rejected as invalid.

Fix: collect output_item.done events during streaming and backfill
response.output when get_final_response() returns empty. Falls back
to synthesizing from text deltas when no done events were received.

Also moves the synthesis logic from the validation loop (too late, from
#5681) into _run_codex_stream() (before the response leaves the
streaming function), and simplifies the validation to just log
diagnostics since recovery now happens upstream.

Co-authored-by: Egor <egerev@users.noreply.github.com>
@teknium1 teknium1 merged commit 0e336b0 into main Apr 7, 2026
5 of 6 checks passed
Tommyeds pushed a commit to Tommyeds/hermes-agent that referenced this pull request Apr 12, 2026
…esearch#5689)

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
…esearch#5689)

atlas243 pushed a commit to atlas243/hermes-agent that referenced this pull request Apr 28, 2026
…tomizations

Brings in 30+ commits of upstream Hermes changes (including the codex
output[] backfill fix from NousResearch#5689 / commit 0e336b0) AND closes the
loop on the branch-safe update flow that left this branch silently
behind upstream for 11+ days.

Symptom that triggered this work: every gateway turn was failing with
"Invalid API response (attempt 1/3): response.output is empty" on
gpt-5.4 via openai-codex.  The fix landed upstream 2026-04-06; without
this merge it never reached the customizations branch because
``hermes update`` only updated origin/main and switched back to
blaize-customizations without merging main into it.

== Conflict resolution highlights ==

run_agent.py: kept main's _touch_activity(desc) API + main's codex
backfill in _run_codex_stream; preserved HEAD's _reasoning_deltas_fired
reset and the public touch_activity() wrapper for delegate_tool /
gateway/run.py callers (now delegates to _touch_activity for
description sync).  Guarded the cached-agent touch_activity() reset
with hasattr() so test mocks don't break.

hermes_cli/config.py: bumped _config_version 18 → 19 and added HEAD's
progress-aware-timeout migration as a new 18 → 19 step (idempotent
via "if 'timeout' not in config" guard, so users on either v12 or v18
land in a correct state).
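
An idempotent migration step of the kind described for hermes_cli/config.py might look like the sketch below. Only the "if 'timeout' not in config" guard and the 18 → 19 bump come from the message above; the function name migrate_18_to_19 and the placeholder default value are assumptions for illustration.

```python
def migrate_18_to_19(config: dict) -> dict:
    """Hypothetical 18 -> 19 step: add a timeout section only if it is missing."""
    migrated = dict(config)  # shallow copy so the caller's dict is untouched
    if "timeout" not in migrated:
        # Placeholder default; the real keys and values are not given here.
        migrated["timeout"] = {"progress_aware": True}
    migrated["_config_version"] = 19
    return migrated
```

Because of the guard, running the step a second time, or on a config that already carries a timeout section, leaves that section unchanged.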

hermes_cli/main.py: kept HEAD's branch-safety guards
(should_restore_original_branch, should_auto_restart_gateway) and
swapped in main's improved multi-profile gateway restart logic
(supports_systemd_services, find_gateway_pids, retry-on-die).

gateway/run.py: kept HEAD's per-channel overrides + two-threshold
progress-aware timeout monitor (CLAUDE.md documents this as the
intentional design); added main's _notify_long_running periodic
"Still working" notifications and main's service_tier /
request_overrides plumbing on cached agent reuse.

tools/cronjob_tools.py: restored both 'reason' (HEAD) and 'script'
(main) schema entries that the auto-merger had collided.  Restored
both timeout_seconds (HEAD) and script (main) function args.

hermes_cli/commands.py: kept HEAD's priority_skills reordering for
Telegram menus while taking main's _collect_gateway_skill_entries
refactor (priority_skills now applied as a post-processing step on
the helper's output).  Kept both new CommandDefs (restart-gateway
from HEAD, debug from main).

cron/scheduler.py: took main's inactivity-based timeout structure but
restored HEAD's per-job timeout_seconds lookup
(job.get("timeout_seconds")) so per-job overrides still work.

gateway/platforms/telegram.py: kept HEAD's _menu_config_mtime AND
main's _model_picker_state, _approval_state, plus all of main's
new helper methods.

== Update flow fix (prevents future drift) ==

hermes_cli/main.py cmd_update: after restoring the working tree to
the customizations branch, run ``git merge --no-edit origin/main``
into the customizations branch so it actually catches up to main.
On clean merge, log success and proceed.  On conflict, ``git merge
--abort`` so the working tree stays clean, surface the conflict to
the user, and force should_auto_restart_gateway = False.  Applies
in both the up-to-date-already path and the new-commits-pulled path.
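
A rough sketch of the merge-then-abort flow described above, assuming a helper named merge_main_into_branch (hypothetical, not the actual cmd_update code) and an injectable runner so it can be exercised without a real repository:

```python
import subprocess


def merge_main_into_branch(run=subprocess.run):
    """Merge origin/main into the current branch; abort cleanly on failure.

    Returns True on a clean merge, False if the merge was aborted.
    """
    result = run(
        ["git", "merge", "--no-edit", "origin/main"],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True
    # Conflict (or any other failure): put the working tree back as it was.
    run(["git", "merge", "--abort"], capture_output=True, text=True)
    return False
```

A caller would then surface the conflict to the user and set should_auto_restart_gateway to False whenever this returns False.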

Adds two regression tests in TestUpdateMergesMainIntoCustomizations:
  - test_clean_merge_runs_after_branch_restore: verifies the merge
    is invoked when on the customizations branch
  - test_conflict_aborts_merge_and_blocks_auto_restart: verifies
    merge --abort runs on conflict and launchd_restart is skipped

== Test fixes ==

tests/hermes_cli/test_config.py: bumped expected config version 18 → 19
tests/tools/test_browser_camofox_state.py: same bump

== Pre-existing upstream test failures (unrelated to this merge) ==

Verified failing on clean origin/main:
  - test_wsl_with_systemd: macOS lacks systemctl
  - test_concurrent_inserts_settle_at_cap: ~70s slow concurrent test
  - test_file_staleness::test_warning_when_file_modified_externally
  - test_file_staleness::test_patch_warns_on_stale_file
    (macOS treats /var/folders as a sensitive system path)
  - test_transcription::test_explicit_local_no_cloud_fallback
  - test_transcription::test_local_nothing_available

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…esearch#5689)

olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
…esearch#5689)

gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…esearch#5689)

Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…esearch#5689)
