[Bug]: pi-trajectory-flush timeout warning lacks queued writer state #82961

@galiniliev

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

pi-trajectory-flush cleanup can still time out at run end, and the timeout warning does not identify whether the flush is waiting on queued writer work, event-loop yield, or file append IO.

Steps to reproduce

NOT_ENOUGH_INFO

Expected behavior

When trajectory flush cleanup times out, the warning should include enough bounded, non-secret state to show what the flush is waiting on.

Actual behavior

The observed logs only report the cleanup timeout envelope:

agent cleanup timed out: runId=[redacted run id] sessionId=[redacted session id] step=pi-trajectory-flush timeoutMs=10000

The warning does not include the queued write count, queued bytes, active writer operation, or append size, so the next investigation cannot tell whether the flush was waiting on queued writer work, a pending event-loop yield, or file append IO.
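
To make the missing fields concrete, here is a minimal sketch of how a bounded writer snapshot could be appended to the existing warning line. Every name in it (TrajectoryWriterSnapshot, formatCleanupTimeoutWarning, the field names, the 1000000000 cap) is a hypothetical illustration and is not taken from the OpenClaw source; only the leading "agent cleanup timed out: ..." text mirrors the log line quoted above.

```typescript
// Hypothetical names throughout; none are taken from the OpenClaw source.
type WriterOperation = "idle" | "yielding" | "appending";

interface TrajectoryWriterSnapshot {
  queuedWrites: number; // writes accepted but not yet appended
  queuedBytes: number; // total payload size of those queued writes
  activeOperation: WriterOperation; // what the writer is doing right now
  activeAppendBytes: number; // size of the append in flight, 0 if none
}

interface CleanupTimeoutEnvelope {
  runId: string;
  sessionId: string;
  step: string;
  timeoutMs: number;
}

// Assumed cap so a corrupted counter cannot produce an unbounded log field.
const MAX_REPORTED_COUNT = 1000000000;

function boundedCount(value: number): number {
  if (Number.isNaN(value) || value < 0) return 0;
  return Math.min(Math.trunc(value), MAX_REPORTED_COUNT);
}

function formatCleanupTimeoutWarning(
  envelope: CleanupTimeoutEnvelope,
  snapshot?: TrajectoryWriterSnapshot,
): string {
  const base =
    `agent cleanup timed out: runId=${envelope.runId} ` +
    `sessionId=${envelope.sessionId} step=${envelope.step} ` +
    `timeoutMs=${envelope.timeoutMs}`;
  // Without a snapshot the line is identical to the current envelope.
  if (!snapshot) return base;
  return [
    base,
    `queuedWrites=${boundedCount(snapshot.queuedWrites)}`,
    `queuedBytes=${boundedCount(snapshot.queuedBytes)}`,
    `activeOperation=${snapshot.activeOperation}`,
    `activeAppendBytes=${boundedCount(snapshot.activeAppendBytes)}`,
  ].join(" ");
}
```

The sketch reports only counts and a fixed operation label, never file paths or trajectory content, which keeps the added state bounded and non-secret.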

OpenClaw version

NOT_ENOUGH_INFO

Operating system

NOT_ENOUGH_INFO

Install method

NOT_ENOUGH_INFO

Model

NOT_ENOUGH_INFO

Provider / routing chain

NOT_ENOUGH_INFO

Additional provider/model setup details

NOT_ENOUGH_INFO

Logs, screenshots, and evidence

Trace/proof:
- Representative log text:
  "agent cleanup timed out: runId=[redacted run id] sessionId=[redacted session id] step=pi-trajectory-flush timeoutMs=10000"
- Observed count: 12 matching log lines in one gateway log snapshot.
- Local evidence points to src/agents/run-cleanup-timeout.ts and src/agents/pi-embedded-runner/run/attempt.ts as the code that wraps the cleanup step "pi-trajectory-flush".

Impact and severity

Affected: embedded agent runs with trajectory capture enabled.
Severity: Medium.
Frequency: 12 matching timeout lines in the provided log snapshot.
Consequence: trajectory data may not flush promptly at run end, cleanup work can continue after the caller has moved on, and the warning lacks the state needed to triage the stalled flush path.

Additional information

Current source already has the trajectory-specific timeout override from #81622, but the timeout warning still lacks queue/IO state for the trajectory writer. The fix should preserve the existing cleanup timeout behavior and add bounded diagnostics only.
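
One way to add diagnostics without touching the timeout behavior is to let the cleanup wrapper accept an optional state provider that is consulted only after the timeout has already fired. The sketch below illustrates that idea under stated assumptions: runCleanupStep, buildTimeoutWarning, describeState, and the 200-character cap are all hypothetical, the real wrapper in src/agents/run-cleanup-timeout.ts may look quite different, and the runId/sessionId fields are omitted for brevity.

```typescript
// Hypothetical sketch; not the actual OpenClaw cleanup wrapper.
const MAX_DETAIL_CHARS = 200; // assumed bound on the diagnostic text

function safeDetail(describeState?: () => string): string {
  if (!describeState) return "";
  try {
    // Collapse whitespace so the warning stays on one log line.
    const raw = describeState().replace(/\s+/g, " ").trim();
    return raw.length > MAX_DETAIL_CHARS
      ? `${raw.slice(0, MAX_DETAIL_CHARS)}...`
      : raw;
  } catch {
    // A failing provider must never change the cleanup outcome.
    return "diagnostics=unavailable";
  }
}

function buildTimeoutWarning(
  step: string,
  timeoutMs: number,
  describeState?: () => string,
): string {
  const base = `agent cleanup timed out: step=${step} timeoutMs=${timeoutMs}`;
  const detail = safeDetail(describeState);
  return detail ? `${base} ${detail}` : base;
}

type CleanupResult =
  | { timedOut: false }
  | { timedOut: true; warning: string };

async function runCleanupStep(
  step: string,
  timeoutMs: number,
  work: () => Promise<void>,
  describeState?: () => string,
): Promise<CleanupResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    // Same race as a plain timeout wrapper: whichever settles first wins.
    const outcome = await Promise.race([
      work().then(() => "done" as const),
      timeout,
    ]);
    if (outcome === "done") return { timedOut: false };
    // The provider runs only after the timeout has fired.
    return {
      timedOut: true,
      warning: buildTimeoutWarning(step, timeoutMs, describeState),
    };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Because the provider is optional and its failures are swallowed, callers that pass no provider get the same warning text as before, and the timeout decision itself does not depend on the diagnostics.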

Labels

- P2: Normal backlog priority with limited blast radius.
- bug: Something isn't working
- clawsweeper:linked-pr-open: ClawSweeper found an open linked pull request for this issue.
- clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
- clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
- maintainer: Maintainer-authored PR
