
[NemoClaw][macOS][Agent&Skills] Agent abandons multi-step task and starts new conversation after partial response — task context lost mid-run #2620

@caroline-xuan

Description

Issue Summary

During a multi-step task, the agent abandons its in-progress plan partway through and emits a brand-new conversational turn asking the user what they want, and the task context is then lost. This is distinct from a simple hang or truncation: the agent actively resets to a new conversation while it still has the data and tools to finish.

Environment

  • Platform: macOS
  • NemoClaw: latest local install (verified 2026-04-28)
  • OpenShell: bundled
  • Model: moonshotai/kimi-k2.5 (NVIDIA Endpoints variant)
  • Surface: OpenClaw Web UI at 127.0.0.1:18789/chat?session=

Steps to Reproduce

  1. Do a fresh install, then run nemoclaw onboard.
  2. Open the dashboard URL, start a fresh chat session.
  3. Send this prompt:
    Run these 3 commands in the sandbox shell, one at a time, and after each command explain what its output means in 2 sentences:
      1. hostname
      2. date
      3. uptime

    When done, summarize all three findings in a single paragraph that ties them together.

  4. Watch the agent run.

Expected Behavior

  • 3 exec calls in order: hostname → date → uptime.
  • 2-sentence explanation after each.
  • 1 final summary paragraph.
  • Total: 3 tool calls + 4 text segments.
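
The expectation above can be written down as a small check over a recorded run. This is only a sketch: the `Event` record and its `kind` values are hypothetical and are not NemoClaw's actual transcript format. It checks the order of exec calls and the number of text segments, not how they interleave.

```python
from dataclasses import dataclass

# Hypothetical transcript record, invented for illustration only.
@dataclass
class Event:
    kind: str     # "exec" for a tool call, "text" for an assistant text segment
    payload: str  # the command for "exec", the rendered text for "text"

EXPECTED_COMMANDS = ["hostname", "date", "uptime"]
EXPECTED_TEXT_SEGMENTS = 4  # 3 explanations + 1 final summary

def check_transcript(events):
    """Return a list of problems; an empty list means the run matches."""
    problems = []
    commands = [e.payload for e in events if e.kind == "exec"]
    texts = [e.payload for e in events if e.kind == "text"]
    if commands != EXPECTED_COMMANDS:
        problems.append(f"exec calls were {commands}, expected {EXPECTED_COMMANDS}")
    if len(texts) != EXPECTED_TEXT_SEGMENTS:
        problems.append(
            f"got {len(texts)} text segments, expected {EXPECTED_TEXT_SEGMENTS}"
        )
    return problems
```

Fed the run described under Actual Behavior (two exec calls, one complete explanation, one truncated explanation, one new-conversation turn), this check would report both a missing exec call and a wrong text-segment count.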

Actual Behavior (timeline)

  • 8:40 PM — hostname exec ✓; full 2-sentence explanation rendered correctly.
  • 8:41–8:42 PM — agent emits several Tool not found / Unknown tool calls (separate UX bug, see Related).
  • 8:42 PM — date exec ✓; explanation begins streaming but is truncated mid-sentence at "...you sent the initial" (no period, no continuation).
  • 8:44 PM — agent emits a brand-new turn:
    Hey! You had me in the middle of running those three commands — I got through hostname and date, but hadn't run uptime yet. Want me to finish that up and give you the summary? Or if you're onto something else, what's up?
  • 8:46 PM — user types "hi" to test; agent responds "Hey! What's up?" — the original 3-command task context is fully gone.
  • uptime is never executed; the summary paragraph never appears.

Impact

  • Multi-step tasks become unreliable: agent may abandon any in-progress plan and ask the user to restart, even when it has the data and tools to finish.
  • Forces the user to manually resume / re-prompt, losing time and context.
  • Combined with the related streaming-truncation bug, makes long-form tool-using interactions effectively unusable on this build/model combo.

Suggested Fix Direction

  • Investigate whether the truncation event (incomplete final assistant message at 8:42 PM above) is being parsed by the agent loop as a successful turn end, causing the next iteration to "forget" it was in the middle of a plan.
  • Add an integration test: a 3-step exec plan must complete all 3 steps regardless of intermediate truncations or tool errors.
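
To make the two suggestions above concrete, here is a minimal sketch of a loop that does not treat a truncated reply as a turn end, plus a simulated 3-step run. Everything in it is an assumption for illustration: `PlanState`, `next_action`, `run_plan`, and the finish-reason strings (`"stop"`, `"length"`) are hypothetical and do not reflect the real agent loop or what the endpoint reports.

```python
from dataclasses import dataclass, field

COMPLETE = "stop"  # assumed finish reason for a fully delivered message

@dataclass
class PlanState:
    steps: list
    done: list = field(default_factory=list)

    def remaining(self):
        return [s for s in self.steps if s not in self.done]

def next_action(finish_reason, plan):
    """Decide what the loop does after a model reply."""
    if finish_reason != COMPLETE:
        # Truncated or failed reply: keep the plan and retry the same step.
        return "retry_step"
    if plan.remaining():
        return "continue_plan"
    return "end_turn"

def run_plan(steps, finish_reasons, max_iterations=20):
    """Simulate the loop; finish_reasons supplies one reason per model reply."""
    plan = PlanState(steps=list(steps))
    reasons = iter(finish_reasons)
    for _ in range(max_iterations):  # bounded so endless truncation cannot spin
        if not plan.remaining():
            break
        current = plan.remaining()[0]
        reason = next(reasons, COMPLETE)
        if reason == COMPLETE:
            plan.done.append(current)
        if next_action(reason, plan) == "end_turn":
            break
    return plan
```

An integration test along these lines would inject a truncation on the second step, for example `run_plan(["hostname", "date", "uptime"], ["stop", "length", "stop", "stop"])`, and assert that all three steps end up in `done`.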

Bug Details

  • Priority: Unprioritized
  • Action: Dev - Open - To fix
  • Disposition: Open issue
  • Module: Machine Learning - NemoClaw
  • Keyword: NemoClaw, NemoClaw_Agent&Skills, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw-SWQA-RelBlckr-Recommended, NemoClaw-SWQA-VDR

[NVB#6122540]

Metadata

Assignees

No one assigned

    Labels

    • NV QA: Bugs found by the NVIDIA QA Team
    • UAT: Issues flagged for User Acceptance Testing.
    • VDR: Linked to VDR finding
    • area: cli: Command line interface, flags, terminal UX, or output
    • area: sandbox: OpenShell sandbox lifecycle, runtime, config, or recovery
    • platform: macos: Affects macOS, including Apple Silicon
    • provider: nvidia: NVIDIA inference endpoint, NIM, or NVIDIA provider behavior
