Need overflow diagnostics showing top LCM prompt contributors and duplicate clusters #495

@Adam-Researchh

Summary

When LCM assembly causes a channel/user-facing session to fail with a provider context_length_exceeded error, the current diagnostics make it hard to identify the real contributors quickly. Operators need a compact, depersonalized “why is this prompt huge?” report: conversation token totals, duplicate counts, top message/summary contributors, the raw vs summary budget split, and whether bootstrap/reconcile has just imported messages.

This is a usability/operability issue surfaced while debugging an apparent channel outage that was actually LCM assembly failure.

Environment

  • OpenClaw: 2026.4.22
  • lossless-claw: 0.9.2
  • Active channel session and TUI session
  • Long-lived main conversation with duplicate large messages

Current useful logs

The existing logs are helpful but not enough on their own:

[lcm] assemble: done conversation=146 ... contextItems=901 hasSummaryItems=true inputMessages=805 outputMessages=828 tokenBudget=500000 estimatedTokens=499239
[lcm] assemble-debug ... evictableCount=814 ... freshTailSegmentCount=32 ... tailTokens=7965 ... evictableTotalTokens=563438 ... removedToolUseBlocks=27 touchedAssistantMessages=27
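
For triage, the key=value fields in these lines can be pulled out mechanically. The following is a minimal sketch, not part of lossless-claw: the helper names `parse_lcm_fields` and `budget_utilization` are hypothetical, and the sample line is the first log line above with its elided `...` portion left out.

```python
import re


def parse_lcm_fields(line: str) -> dict[str, str]:
    """Collect every key=value pair on an [lcm] log line (values kept as strings)."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def budget_utilization(fields: dict[str, str]) -> float:
    """Fraction of the token budget consumed by the assembled prompt."""
    return int(fields["estimatedTokens"]) / int(fields["tokenBudget"])


# Sample taken from the issue; the elided "..." part is omitted, not reconstructed.
line = (
    "[lcm] assemble: done conversation=146 contextItems=901 hasSummaryItems=true "
    "inputMessages=805 outputMessages=828 tokenBudget=500000 estimatedTokens=499239"
)
fields = parse_lcm_fields(line)
print(f"conversation {fields['conversation']}: {budget_utilization(fields):.2%} of budget")
```

This confirms that the assembly sits right at the budget, but it still says nothing about which items account for the tokens.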

They do not directly say which messages/summaries made the assembled prompt huge. A manual SQLite query was needed to discover:

- one archived conversation had 6,776 messages and ~4.17M stored tokens
- three repeated bootstrap messages were ~59,845 tokens each
- repeated media messages were ~34k-45k tokens each
- large tool/config/API outputs were ~23k-26k tokens each
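
The manual inspection amounted to roughly the following. This is a sketch under stated assumptions: the issue does not show the real lossless-claw schema, so the `messages` table and its `conversation_id`, `seq`, `role`, and `token_count` columns are hypothetical stand-ins, and the rows are made-up demo data loosely shaped like the numbers above.

```python
import sqlite3


def conversation_totals(conn: sqlite3.Connection, conversation_id: int) -> tuple[int, int]:
    """Return (stored message count, stored token total) for one conversation."""
    return conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(token_count), 0) "
        "FROM messages WHERE conversation_id = ?",
        (conversation_id,),
    ).fetchone()


def top_contributors(conn: sqlite3.Connection, conversation_id: int, limit: int = 20):
    """Return the largest stored messages as (seq, role, token_count) rows."""
    return conn.execute(
        "SELECT seq, role, token_count FROM messages "
        "WHERE conversation_id = ? "
        "ORDER BY token_count DESC, seq ASC LIMIT ?",
        (conversation_id, limit),
    ).fetchall()


# Hypothetical schema and demo rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (conversation_id INTEGER, seq INTEGER, role TEXT, token_count INTEGER)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        (146, 5046, "user", 59845),
        (146, 6258, "user", 59845),
        (146, 6789, "user", 45472),
        (146, 7000, "tool", 25000),
        (147, 1, "user", 99999),
    ],
)
print(conversation_totals(conn, 146))
for row in top_contributors(conn, 146, limit=3):
    print(row)
```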

Expected behavior

A built-in command or overflow diagnostic should report top contributors automatically, without requiring manual SQL.

Actual behavior

The operator had to inspect the SQLite DB manually to determine that duplicate bootstrap/media/tool messages were the root cause.

Impact

  • Slow incident response.
  • Operators may incorrectly blame the channel transport, provider, or model.
  • Harder to file precise bug reports.
  • Harder to decide whether to rotate/archive a session, repair summaries, or externalize large messages.

Suggested diagnostic output

Add something like /lossless explain-overflow or enrich existing overflow logs with:

  • active conversation id/session key (redacted)
  • stored message count and stored token total
  • assembled raw messages vs summaries counts/tokens
  • top 20 assembled contributors by token count
  • top 20 stored contributors by token count
  • duplicate identity/content clusters above threshold
  • recently imported count from bootstrap/reconcile
  • pending compaction debt status and whether current runtime can execute it
  • recommended safe next action: rotate session, doctor repair, externalize large messages, reduce budget, etc.
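
One possible way to compute the "duplicate content clusters above threshold" item is to group large messages by a content hash. The sketch below assumes exact-content duplicates only; the `StoredMessage` type, the `duplicate_clusters` function, and the `min_tokens` threshold are hypothetical names, not existing lossless-claw APIs. Reporting the hash rather than the content keeps the output depersonalized.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredMessage:
    seq: int
    role: str
    token_count: int
    content: str


def duplicate_clusters(messages, min_tokens=1000):
    """Group messages at or above min_tokens by (role, SHA-256 of content).

    Returns clusters with at least two members, each sorted by seq, with the
    most expensive cluster (by total tokens) first.
    """
    by_key = defaultdict(list)
    for m in messages:
        if m.token_count >= min_tokens:
            digest = hashlib.sha256(m.content.encode("utf-8")).hexdigest()
            by_key[(m.role, digest)].append(m)
    clusters = [sorted(g, key=lambda m: m.seq) for g in by_key.values() if len(g) > 1]
    clusters.sort(key=lambda g: sum(m.token_count for m in g), reverse=True)
    return clusters


# Made-up demo data; contents are placeholders.
messages = [
    StoredMessage(5046, "user", 59845, "bootstrap payload"),
    StoredMessage(5961, "user", 45472, "media payload"),
    StoredMessage(6258, "user", 59845, "bootstrap payload"),
    StoredMessage(6789, "user", 45472, "media payload"),
    StoredMessage(6790, "user", 12, "hi"),
]
clusters = duplicate_clusters(messages)
for cluster in clusters:
    first, *dupes = cluster
    wasted = sum(m.token_count for m in dupes)
    print(f"seq {first.seq}: {len(dupes)} duplicate(s), {wasted:,} redundant tokens")
```

Near-duplicates (for example, the same media with different metadata) would need a normalization step before hashing, which this sketch does not attempt.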

Example shape:

LCM overflow diagnosis:
conversation=146 stored=6776 msgs / 4.17M tokens
last bootstrap imported=788 messages
assembly=901 items / 499k estimated tokens / 500k budget
largest contributors:
  msg seq 6258 user 59,845 tokens duplicate_of seq 5046
  msg seq 6789 user 45,472 tokens duplicate_of seq 5961
  summary sum_x ... 9,657 tokens fallback-marker
recommended: quarantine active conversation and run duplicate repair offline
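
A renderer for the contributor lines in this shape could look like the sketch below. The `Contributor` type and both functions are hypothetical, and only the fields that appear in the example above are rendered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contributor:
    seq: int
    role: str
    token_count: int
    duplicate_of: Optional[int] = None  # seq of the earliest identical message, if any


def format_contributor(c: Contributor) -> str:
    """Render one contributor line in the example's layout."""
    line = f"  msg seq {c.seq} {c.role} {c.token_count:,} tokens"
    if c.duplicate_of is not None:
        line += f" duplicate_of seq {c.duplicate_of}"
    return line


def format_report(conversation_id: int, contributors: list[Contributor]) -> str:
    """Render the header plus contributors, largest first."""
    ranked = sorted(contributors, key=lambda c: c.token_count, reverse=True)
    lines = [
        "LCM overflow diagnosis:",
        f"conversation={conversation_id}",
        "largest contributors:",
    ]
    lines.extend(format_contributor(c) for c in ranked)
    return "\n".join(lines)


report = format_report(
    146,
    [
        Contributor(6789, "user", 45472, duplicate_of=5961),
        Contributor(6258, "user", 59845, duplicate_of=5046),
    ],
)
print(report)
```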
