Skip to content

Fix silent success for non-deliverable Bedrock Telegram turns#82905

Merged
joshavant merged 3 commits into
mainfrom
fix/82394-bedrock-empty-telegram
May 17, 2026
Merged

Fix silent success for non-deliverable Bedrock Telegram turns#82905
joshavant merged 3 commits into
mainfrom
fix/82394-bedrock-empty-telegram

Conversation

@joshavant

@joshavant joshavant commented May 17, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Treat terminal attempts with no deliverable/progress evidence as trajectory errors instead of silent success.
  • Extend Bedrock empty/reasoning-only retry eligibility for the reported model path.
  • Preserve legitimate non-text delivery evidence, including committed messaging sends, media-only replies, heartbeat responses, cron progress, yields, client tool calls, and tool errors.
  • Carry the latest-main Telegram message-cache type/lint repair needed to keep CI green after rebasing over fix(telegram): preserve bot reply-chain context #82863.

Verification

  • .agents/skills/codex-review/scripts/codex-review --mode local (clean; focused Vitest coverage passed: 7 files, 309 tests)
  • node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/attempt-trajectory-status.test.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts src/trajectory/export.test.ts src/trajectory/metadata.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts (passed: 8 files, 312 tests)
  • node scripts/run-tsgo.mjs -p tsconfig.extensions.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-pr82905.tsbuildinfo
  • node scripts/run-bundled-extension-oxlint.mjs
  • node scripts/check-extension-package-tsc-boundary.mjs --mode=compile
  • AWS Crabbox run_943fee6e6d01 / cbx_9bbc8d8db38c: original Telegram + Bedrock repro observed a Telegram reply after the patch
  • AWS Crabbox run_ca257471b04a / cbx_1081c2d5346a: pnpm check:changed passed after Node 22/Corepack setup
  • GitHub CI on ba1b76ccd933a6f379e5d64d98fbef5b8f02c28b: all checks green/skipped except Real behavior proof

Real behavior proof
Behavior addressed: A Bedrock reasoning/empty terminal turn for Telegram could complete with no Telegram reply while trajectory artifacts still reported success.
Real environment tested: AWS Crabbox direct provider cbx_9bbc8d8db38c, Telegram live bot group via Convex telegram credential, Bedrock model amazon-bedrock/openai.gpt-oss-120b-1:0.
Exact steps or command run after this patch: Installed dependencies, built the repo, started the QA live Telegram gateway, sent the original prompt from #82394, and waited for the SUT Telegram bot reply.
Evidence after fix: Crabbox run run_943fee6e6d01; sent Telegram message 8958; observed SUT reply message 8959.
Observed result after fix: replyObserved: true, observedSutMessageCount: 1, command exit 0.
What was not tested: Full repository test suite; the changed gate, focused tests, and PR CI were run instead.

Fixes #82394

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: L maintainer Maintainer-authored PR labels May 17, 2026
@clawsweeper

clawsweeper Bot commented May 17, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge.

Summary
The PR adds non-deliverable terminal-turn classification, Bedrock empty/reasoning-only retry eligibility, Telegram block-reply delivery counting, trajectory terminalError export, focused tests, and a changelog entry.

Reproducibility: yes. source-reproducible: current main can mark a no-delivery terminal attempt as trajectory success, and the retry allowlist omits bedrock-converse-stream. The linked issue gives the concrete Telegram + Bedrock prompt path, but I did not rerun it here.

Real behavior proof
Sufficient (live_output): The PR body includes after-fix real behavior proof from an AWS Crabbox live Telegram bot group using the reported Bedrock model, with observed SUT reply and run IDs.

Next step before merge
No ClawSweeper repair is needed; this recent protected PR should receive normal maintainer review and then be landed or revised by a human.

Security
Cleared: The diff does not change dependencies, CI, install scripts, secrets handling, permissions, or other security-sensitive execution surfaces.

Review details

Best possible solution:

Land a maintainer-reviewed version that keeps the retry and trajectory-status logic centralized, preserves non-text delivery evidence, and retains the focused tests plus recorded live Telegram/Bedrock proof.

Do we have a high-confidence way to reproduce the issue?

Yes, source-reproducible: current main can mark a no-delivery terminal attempt as trajectory success, and the retry allowlist omits bedrock-converse-stream. The linked issue gives the concrete Telegram + Bedrock prompt path, but I did not rerun it here.

Is this the best way to solve the issue?

Yes, the proposed direction is a narrow maintainable fix: classify terminal delivery status before trajectory recording and extend the existing non-visible retry guard to Bedrock Converse. The remaining step is maintainer review/landing, not an automated repair.

What I checked:

  • Protected label: The live PR metadata lists labels agents, maintainer, and size: L; repository policy keeps maintainer-labeled PRs open for explicit human handling. (12d2bc6a6412)
  • Current-main failure shape: Current main records trace.artifacts and session.ended as success whenever there is no prompt error, abort, or timeout, so a terminal no-delivery assistant turn can still be marked successful. (src/agents/pi-embedded-runner/run/attempt.ts:4280, ee10fe17f063)
  • Current-main Bedrock retry gap: The non-visible retry guard on main allows openai-completions, anthropic-messages, strict-agentic/Gemini/Ollama paths, but not bedrock-converse-stream. (src/agents/pi-embedded-runner/run/incomplete-turn.ts:623, ee10fe17f063)
  • PR implementation: The patch adds resolveAttemptTrajectoryTerminal, wires terminal status and terminalError into trajectory events, and extends Bedrock Converse retry eligibility while preserving delivery evidence checks. (src/agents/pi-embedded-runner/run/attempt-trajectory-status.ts:1, 12d2bc6a6412)
  • PR proof and tests: The PR body reports focused Vitest coverage, a successful pnpm check:changed Crabbox run, and AWS Crabbox live Telegram + Bedrock proof where the original prompt produced a Telegram reply after the patch. (12d2bc6a6412)
  • Maintainer note context: The Telegram maintainer note says Telegram behavior PRs that touch transport, streaming, topics, callbacks, authorization, or reply context need real Telegram proof; this PR body supplies a live Telegram proof path. (.agents/maintainer-notes/telegram.md:1, ee10fe17f063)

Likely related people:

  • steipete: Recent GitHub path history shows repeated work on incomplete-turn retry behavior, Telegram/media delivery evidence, and trajectory/runtime state files central to this PR. (role: feature-history owner; confidence: high; commits: be166b9ae48d, 55322d73012e, cf79689ca1bd; files: src/agents/pi-embedded-runner/run/incomplete-turn.ts, src/agents/pi-embedded-subscribe.ts, src/trajectory/metadata.ts)
  • joshavant: Besides authoring this PR, prior merged history shows recent work in the central embedded runner boundary touched by this fix. (role: recent adjacent contributor; confidence: medium; commits: fba250d4a26c; files: src/agents/pi-embedded-runner/run/attempt.ts)
  • vincentkoc: Local blame for the current trajectory status block and retry helper lines points at a recent current-main commit that touched the relevant runner files. (role: recent current-main contributor; confidence: medium; commits: 817f0cd6c838, 310c8530eb81; files: src/agents/pi-embedded-runner/run/attempt.ts, src/agents/pi-embedded-runner/run/incomplete-turn.ts)

Remaining risk / open question:

  • I did not rerun the focused Vitest suite or the live Crabbox Telegram/Bedrock scenario in this read-only review; I relied on source inspection plus the PR body's run IDs and observed output.

Codex review notes: model gpt-5.5, reasoning high; reviewed against ee10fe17f063.

@joshavant joshavant force-pushed the fix/82394-bedrock-empty-telegram branch from edc853b to 12d2bc6 Compare May 17, 2026 04:46
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. mantis: telegram-visible-proof Mantis should capture Telegram visible proof. labels May 17, 2026
@openclaw-barnacle openclaw-barnacle Bot added the channel: telegram Channel integration: telegram label May 17, 2026
@joshavant joshavant merged commit 422a137 into main May 17, 2026
120 of 126 checks passed
@joshavant joshavant deleted the fix/82394-bedrock-empty-telegram branch May 17, 2026 04:57
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 20, 2026
…aw#82905)

* fix: handle non-deliverable terminal turns

* chore: add changelog for non-deliverable turns

* fix: align telegram message cache types
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
…aw#82905)

* fix: handle non-deliverable terminal turns

* chore: add changelog for non-deliverable turns

* fix: align telegram message cache types
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
…aw#82905)

* fix: handle non-deliverable terminal turns

* chore: add changelog for non-deliverable turns

* fix: align telegram message cache types
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
…aw#82905)

* fix: handle non-deliverable terminal turns

* chore: add changelog for non-deliverable turns

* fix: align telegram message cache types
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
…aw#82905)

* fix: handle non-deliverable terminal turns

* chore: add changelog for non-deliverable turns

* fix: align telegram message cache types
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026
…aw#82905)

* fix: handle non-deliverable terminal turns

* chore: add changelog for non-deliverable turns

* fix: align telegram message cache types
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
…aw#82905)

* fix: handle non-deliverable terminal turns

* chore: add changelog for non-deliverable turns

* fix: align telegram message cache types
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
…aw#82905)

* fix: handle non-deliverable terminal turns

* chore: add changelog for non-deliverable turns

* fix: align telegram message cache types
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
…aw#82905)

* fix: handle non-deliverable terminal turns

* chore: add changelog for non-deliverable turns

* fix: align telegram message cache types
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
…aw#82905)

* fix: handle non-deliverable terminal turns

* chore: add changelog for non-deliverable turns

* fix: align telegram message cache types
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
…aw#82905)

* fix: handle non-deliverable terminal turns

* chore: add changelog for non-deliverable turns

* fix: align telegram message cache types
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
…aw#82905)

* fix: handle non-deliverable terminal turns

* chore: add changelog for non-deliverable turns

* fix: align telegram message cache types
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling channel: telegram Channel integration: telegram maintainer Maintainer-authored PR mantis: telegram-visible-proof Mantis should capture Telegram visible proof. proof: sufficient ClawSweeper judged the real behavior proof convincing. size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Empty assistant turn (thinking-only, no text, no toolCall) recorded as finalStatus="success" → user sees generic "Something went wrong" error

1 participant