fix: skip ingesting empty error/aborted assistant messages #172

Merged
jalehman merged 3 commits into Martian-Engineering:main from craigamcw:fix/skip-empty-error-messages on Apr 9, 2026

@craigamcw commented Mar 24, 2026

Summary

  • Ingestion guard: ingestSingle now skips assistant messages where stopReason is "error" or "aborted" and content is empty ([], "", null). Messages with partial content before the error are still preserved.
  • Assembly guard (defense-in-depth): resolveMessageItem skips empty assistant messages during context assembly when both the stored content text and message_parts are empty — catching any previously-ingested empty messages without affecting tool-call-only assistant turns.
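
A minimal TypeScript sketch of the ingest-time check described above. The AgentMessage shape and the shouldSkipIngest and isEmptyContent names are hypothetical, and treating undefined content like null is an assumption; this is an illustration of the rule, not the actual ingestSingle code.

```typescript
// Hypothetical message shape; the real OpenClaw/LCM types may differ.
type ContentBlock = { type: string; text?: string };

interface AgentMessage {
  role: "user" | "assistant" | "system" | "tool";
  stopReason?: string;
  content?: ContentBlock[] | string | null;
}

// Stop reasons that mean the provider call failed or was cancelled.
const FAILED_STOP_REASONS = new Set(["error", "aborted"]);

// True when content carries nothing: null/undefined (assumption), "", or [].
function isEmptyContent(content: AgentMessage["content"]): boolean {
  if (content === null || content === undefined) return true;
  return content.length === 0;
}

// Skip only failed assistant turns with no content; partial content
// produced before the error is still ingested.
function shouldSkipIngest(msg: AgentMessage): boolean {
  return (
    msg.role === "assistant" &&
    msg.stopReason !== undefined &&
    FAILED_STOP_REASONS.has(msg.stopReason) &&
    isEmptyContent(msg.content)
  );
}

const turns: AgentMessage[] = [
  { role: "assistant", stopReason: "error", content: [] },
  { role: "assistant", stopReason: "aborted", content: "" },
  { role: "assistant", stopReason: "error", content: [{ type: "text", text: "partial" }] },
  { role: "assistant", stopReason: "stop", content: [] },
];

const kept = turns.filter((m) => !shouldSkipIngest(m));
console.log(kept.length); // 2
```

The guard is deliberately narrow: an assistant turn that finished normally is never skipped here, even if its content is empty, so tool-call-only turns are not affected at ingest time.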

Problem

When a cloud LLM provider returns a transient 500 error, OpenClaw appends an assistant message with stopReason: "error" and empty content to the session JSONL. LCM ingests these into the database. On retry, the accumulated empty messages are assembled into context, creating a positive feedback loop:

  1. API returns 500 → empty error message appended to session
  2. LCM ingests the empty message into its database
  3. Next turn: assembler includes the empty message in context
  4. API receives increasingly large payload with many empty assistant turns
  5. API continues to fail → more empty messages ingested → repeat
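
The loop above can be reduced to a toy model to show why the payload keeps growing. Everything here is an assumption for illustration: every retry fails, each failure produces exactly one empty error turn, and isEmptyFailure and contextSizeAfter are hypothetical names. It counts turns, not bytes.

```typescript
// Toy model only: every retry fails and yields one empty error turn.
type Turn = { role: "user" | "assistant"; stopReason?: string; content: string };

// Hypothetical guard mirroring the PR description, for string content only.
const isEmptyFailure = (t: Turn): boolean =>
  t.role === "assistant" &&
  (t.stopReason === "error" || t.stopReason === "aborted") &&
  t.content === "";

// Number of turns the next request would carry after `retries` failures.
function contextSizeAfter(retries: number, guard: boolean): number {
  const stored: Turn[] = [{ role: "user", content: "hello" }];
  for (let i = 0; i < retries; i++) {
    // Steps 1-2: the provider returns a 500 and an empty error turn is appended.
    const failure: Turn = { role: "assistant", stopReason: "error", content: "" };
    // Ingest it unless the guard recognizes it as an empty failed turn.
    if (!guard || !isEmptyFailure(failure)) stored.push(failure);
  }
  // Step 3: the assembler builds the next request from everything stored.
  return stored.length;
}

console.log(contextSizeAfter(5, false)); // 6
console.log(contextSizeAfter(5, true)); // 1
```

Without the guard, the stored context grows by one empty assistant turn per failed retry; with it, the toy request stays at its original size.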

In production, this left an agent permanently broken: the LCM database had accumulated 175 messages (dozens of them empty or duplicated), and with that history on top of a 31KB system prompt and 32 tools, the cloud model API rejected every request with a 500. The only recovery was manual database surgery.

Test plan

  • New test "skips ingest for assistant messages with stopReason error and empty content": error turns with an empty array or empty string and an aborted turn with empty content are skipped, while a normal message and an error turn with partial content are still ingested
  • All 390 existing tests pass (no regressions)
  • Verified in production: agent recovered after clearing corrupted LCM data; fix prevents recurrence

🤖 Generated with Claude Code

craigamcw and others added 3 commits March 24, 2026 13:55
When an API call returns a 500 or similar transient error, OpenClaw
appends an assistant message with stopReason "error" and empty content
to the session. LCM ingests these into the database, and on retry the
accumulated empty messages are assembled into context — creating a
positive feedback loop where each retry sends a larger, malformed
payload that continues to fail.

This commit adds two defenses:

1. engine.ts (ingestSingle): Skip assistant messages where stopReason
   is "error" or "aborted" AND content is empty ([], "", null). Messages
   with actual partial content before the error are still preserved.

2. assembler.ts (resolveMessageItem): Defense-in-depth — skip empty
   assistant messages during context assembly when both the stored
   content text and message_parts are empty. This catches any
   previously-ingested empty messages without affecting legitimate
   assistant messages that have tool calls (which have empty text
   content but non-empty parts).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
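
The assembly-time guard from point 2 can be sketched as follows. The StoredMessage shape, its field names, and shouldSkipAtAssembly are hypothetical stand-ins for whatever resolveMessageItem actually reads from the content column and message_parts; treating a null content column as empty is an assumption.

```typescript
// Hypothetical stored-row shape; the real LCM schema may differ.
interface StoredMessage {
  role: string;
  content: string | null; // flattened text content as stored
  parts: unknown[]; // associated message_parts rows (text, tool calls, ...)
}

// Drop an assistant row only when BOTH its stored text and its
// message_parts are empty. Tool-call-only turns have empty text but
// non-empty parts, so they are kept.
function shouldSkipAtAssembly(row: StoredMessage): boolean {
  const textEmpty = (row.content ?? "").length === 0;
  return row.role === "assistant" && textEmpty && row.parts.length === 0;
}

const rows: StoredMessage[] = [
  { role: "assistant", content: "", parts: [] }, // previously ingested empty error turn
  { role: "assistant", content: "", parts: [{ type: "tool_call" }] }, // tool-call-only turn
  { role: "assistant", content: "Done.", parts: [{ type: "text" }] }, // normal reply
];

const assembled = rows.filter((row) => !shouldSkipAtAssembly(row));
console.log(assembled.length); // 2
```

Checking parts as well as text is what keeps this filter safe for tool-call turns; a text-only check would drop them.
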
Accept both stopReason and stop_reason when filtering empty assistant error/aborted turns during ingest. Extend the engine regression test to cover the snake_case field so the guard matches the finish-reason normalization already used elsewhere in the codebase.

Regeneration-Prompt: |
  Review PR Martian-Engineering#172 after rebasing against origin/main and verify whether its empty-assistant ingest guard still misses any finish-reason spellings used elsewhere in this repository. Keep the fix narrow: preserve the PR's behavior, but make the ingest guard recognize both camelCase stopReason and snake_case stop_reason for assistant messages with empty content and error or aborted stop reasons. Add regression coverage in test/engine.test.ts for the snake_case variant and rerun the focused engine test file before pushing the result back to the contributor branch.
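
A sketch of the dual-spelling check this commit describes. RawMessage, readStopReason, and isEmptyFailedAssistant are hypothetical names, and preferring stopReason when both fields are present is an assumption rather than something the commit states.

```typescript
// Hypothetical raw message: producers may spell the finish reason as
// camelCase stopReason or snake_case stop_reason.
interface RawMessage {
  role: string;
  stopReason?: string;
  stop_reason?: string;
  content?: unknown[] | string | null;
}

// Read whichever spelling is present; camelCase wins if both exist (assumption).
function readStopReason(msg: RawMessage): string | undefined {
  return msg.stopReason ?? msg.stop_reason;
}

// Ingest guard that accepts both spellings of the finish reason.
function isEmptyFailedAssistant(msg: RawMessage): boolean {
  const reason = readStopReason(msg);
  const content = msg.content;
  const empty = content == null || content.length === 0;
  return msg.role === "assistant" && (reason === "error" || reason === "aborted") && empty;
}
```

Normalizing in one helper keeps the guard's logic identical for both spellings, so a regression test for the snake_case variant only has to exercise the field lookup.
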
@jalehman merged commit 8bf5e7f into Martian-Engineering:main on Apr 9, 2026
1 check passed

jalehman commented Apr 9, 2026

Thank you!

@github-actions (bot) mentioned this pull request on Apr 9, 2026

2 participants