feat(core): preserve task plan state in compaction summaries #163
When context compaction fires, the agent loses awareness of its task plan (completed, in-progress, pending work) and may re-plan already-done tasks.

Add extractTaskPlanSummary() that queries the TaskStore and produces a structured <task-plan> XML section with status markers ([x], [~], [ ], [-], [!]), priority labels, and parent-child indentation.

Extend compactMessages() to accept an optional taskStore and append the plan to the compaction summary. Wire the TaskStore into agent-core at the compaction call site.

Backward compatible: existing callers without taskStore remain unaffected.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
No actionable comments were generated in the recent review. 🎉
Walkthrough
Adds optional task-plan preservation to context compaction.
Sequence Diagram
sequenceDiagram
actor AgentRuntime as Agent Core Runtime
participant CM as compactMessages()
participant ETPS as extractTaskPlanSummary()
participant TS as TaskStore
AgentRuntime->>AgentRuntime: Check if masked > targetTokens
alt taskStore available
AgentRuntime->>AgentRuntime: Get taskStore from runtimeContext
AgentRuntime->>CM: Call compactMessages(history, target, { taskStore })
CM->>ETPS: Request task-plan extraction
ETPS->>TS: TS.list() -> fetch tasks
TS-->>ETPS: Return task list
ETPS-->>CM: Return formatted <task-plan> string
CM->>CM: Prepend task-plan to compact summary and assemble Content[]
CM-->>AgentRuntime: Return Promise<Content[]>
AgentRuntime->>AgentRuntime: Await and continue
else no taskStore
AgentRuntime->>CM: Call compactMessages(history, target)
CM-->>AgentRuntime: Return Content[] synchronously
end
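The two branches in the diagram can be sketched with TypeScript overloads. This is only an illustration under stated assumptions: `Content`, `TaskStoreLike`, `compact`, and the "keep the last N messages" rule are hypothetical stand-ins, not the PR's actual types or compaction logic (the real function works against a token target).

```typescript
// Hypothetical stand-ins; the real Content and TaskStore types are not shown in this PR page.
type Content = { role: string; text: string };
interface TaskStoreLike {
  list(): Promise<string[]>;
}

// Overloads: no store gives a synchronous result, a store gives a Promise.
function compact(history: Content[], keepRecent: number): Content[];
function compact(
  history: Content[],
  keepRecent: number,
  options: { taskStore: TaskStoreLike },
): Promise<Content[]>;
function compact(
  history: Content[],
  keepRecent: number,
  options?: { taskStore?: TaskStoreLike },
): Content[] | Promise<Content[]> {
  // Assumption: "compaction" here simply keeps the last `keepRecent` messages.
  const keep = Math.max(0, Math.min(keepRecent, history.length));
  const recent = history.slice(history.length - keep);
  const summary = `Summary of ${history.length - keep} earlier messages`;
  const assemble = (text: string): Content[] => [{ role: "user", text }, ...recent];

  const store = options?.taskStore;
  if (!store) return assemble(summary); // synchronous path, unchanged for old callers

  // Asynchronous path: fetch tasks, then attach the plan to the summary.
  return store.list().then((tasks) =>
    assemble(
      tasks.length > 0
        ? `${summary}\n<task-plan>\n${tasks.join("\n")}\n</task-plan>`
        : summary,
    ),
  );
}

// Caller side: pick the branch depending on whether a store is available.
async function runCompaction(
  history: Content[],
  keepRecent: number,
  store?: TaskStoreLike,
): Promise<Content[]> {
  return store
    ? await compact(history, keepRecent, { taskStore: store })
    : compact(history, keepRecent);
}
```

Keeping the no-store overload synchronous is what lets existing callers stay untouched; only the call site that passes a store needs to await.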
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/agents/runtime/compaction.ts`:
- Around line 122-137: The current async block calls
extractTaskPlanSummary(options.taskStore!) without handling rejections, which
can break compaction; wrap that await in a try/catch inside the async IIFE
(around extractTaskPlanSummary) and on error swallow or log the error and set
taskPlan to undefined (or otherwise use the plain summary), so the function
returns the fallback summaryContent + recent instead of propagating the
exception; refer to extractTaskPlanSummary and the async IIFE handling
options?.taskStore to locate where to add the try/catch and fallback logic.
- Around line 54-75: The code treats any task whose parent isn't a root as an
orphan and only renders one subtask level; fix by building an id->task map and
rendering tasks recursively: create a taskMap from tasks (id -> task), replace
knownParents with taskMap.keys() or a full set of all task IDs, implement a
recursive render function (e.g., renderTask(task, indent)) that uses
subtaskMap.get(task.id) to render children at increasing indentation and call
renderTask for each root in rootTasks; for the orphan pass, only emit a task as
orphan if its parentTaskId is not present in taskMap (i.e., parent chain does
not exist). Ensure you reference and update rootTasks, subtaskMap, taskMap (or
knownParents), tasks, and the rendering logic that builds lines.
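The recursive rendering suggested in the second comment could look roughly like the sketch below. It reuses the names from the review (`taskMap`, `subtaskMap`, `rootTasks`, `renderTask`), but the `Task` shape and the function `renderTaskTree` are assumptions for illustration, and orphans are simply rendered at the top level rather than in a separate pass.

```typescript
// Hypothetical task shape; only the fields needed for nesting are modelled.
interface Task {
  id: string;
  title: string;
  parentTaskId?: string;
}

function renderTaskTree(tasks: Task[]): string[] {
  // id -> task, so any task (not only roots) can be recognised as a parent.
  const taskMap = new Map<string, Task>();
  for (const task of tasks) taskMap.set(task.id, task);

  // parent id -> children, plus the list of top-level tasks.
  const subtaskMap = new Map<string, Task[]>();
  const rootTasks: Task[] = [];
  for (const task of tasks) {
    const parentId = task.parentTaskId;
    if (parentId !== undefined && taskMap.has(parentId)) {
      const siblings = subtaskMap.get(parentId) ?? [];
      siblings.push(task);
      subtaskMap.set(parentId, siblings);
    } else {
      // No parent, or the parent does not exist in the store (orphan):
      // this sketch renders both at the top level.
      rootTasks.push(task);
    }
  }

  const lines: string[] = [];
  // Depth-first render with two spaces of indentation per nesting level.
  // Tasks caught in a parent cycle are never reachable from a root, so they
  // are omitted here rather than causing infinite recursion.
  const renderTask = (task: Task, indent: number): void => {
    lines.push(`${"  ".repeat(indent)}- ${task.title}`);
    for (const child of subtaskMap.get(task.id) ?? []) {
      renderTask(child, indent + 1);
    }
  };
  for (const root of rootTasks) renderTask(root, 0);
  return lines;
}
```

Because the parent check uses the full `taskMap`, a grandchild whose parent is itself a subtask is nested under that parent instead of being misclassified as an orphan.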
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1171032f-c1fb-4f87-b4f3-727cdb903492
📒 Files selected for processing (3)
packages/core/src/agents/runtime/agent-core.ts
packages/core/src/agents/runtime/compaction.test.ts
packages/core/src/agents/runtime/compaction.ts
Address PR feedback from CodeRabbit:
- Wrap extractTaskPlanSummary call in try/catch so TaskStore failures don't break compaction
- Replace flat 2-level subtask rendering with recursive renderTask() that supports arbitrary nesting depth
- Add tests for multi-level nesting and error fallback

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
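The error fallback from the first item might be shaped like the following sketch. `TaskStoreLike`, `extractPlan`, and `summaryWithOptionalPlan` are hypothetical names chosen for the example, and appending the plan after the summary text is an assumption taken from the PR description; the actual code in `compaction.ts` is not shown on this page.

```typescript
// Hypothetical minimal store interface; the real TaskStore API may differ.
interface TaskStoreLike {
  list(): Promise<Array<{ title: string }>>;
}

// Stand-in for extractTaskPlanSummary(): may reject if the store fails.
async function extractPlan(store: TaskStoreLike): Promise<string> {
  const tasks = await store.list();
  if (tasks.length === 0) return "";
  const body = tasks.map((t) => `[ ] ${t.title}`).join("\n");
  return `<task-plan>\n${body}\n</task-plan>`;
}

async function summaryWithOptionalPlan(
  summary: string,
  store?: TaskStoreLike,
): Promise<string> {
  if (!store) return summary;

  let taskPlan: string | undefined;
  try {
    taskPlan = await extractPlan(store);
  } catch {
    // A failing store must not break compaction: fall back to the plain summary.
    taskPlan = undefined;
  }
  return taskPlan ? `${summary}\n\n${taskPlan}` : summary;
}
```

The design choice is that the task plan is a best-effort enrichment: any rejection is absorbed at this boundary so the caller always receives at least the plain summary.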
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
…ary (Phase 1 of #162) (#165)

* feat(core): preserve task plan state in compaction summaries (#163)

  * feat(core): preserve task plan state in compaction summaries

    When context compaction fires, the agent loses awareness of its task plan (completed, in-progress, pending work) and may re-plan already-done tasks.

    Add extractTaskPlanSummary() that queries the TaskStore and produces a structured <task-plan> XML section with status markers ([x], [~], [ ], [-], [!]), priority labels, and parent-child indentation.

    Extend compactMessages() to accept an optional taskStore and append the plan to the compaction summary. Wire the TaskStore into agent-core at the compaction call site.

    Backward compatible: existing callers without taskStore remain unaffected.

    Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

  * fix: add error handling and recursive nesting to compaction task plan

    Address PR feedback from CodeRabbit:
    - Wrap extractTaskPlanSummary call in try/catch so TaskStore failures don't break compaction
    - Replace flat 2-level subtask rendering with recursive renderTask() that supports arbitrary nesting depth
    - Add tests for multi-level nesting and error fallback

    Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

  ---------

  Co-authored-by: Automaker <automaker@localhost>
  Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* feat(telemetry,ui): capture reasoning on Langfuse span and collapse thoughts post-stream

  Phase 1 of the reasoning coordination tracked in #162. Captures delta.reasoning_content / delta.reasoning across stream chunks and surfaces it as gen_ai.response.thinking on the gen_ai chat span (gated on logPrompts, matching the completion event policy). Always emits gen_ai.usage.thinking_tokens when usage exposes it. Non-streaming responses get the same treatment by inspecting {thought:true} parts on the response, and the completion event no longer double-counts thoughts as content.

  Renders gemini_thought items as a compact "▸ thinking (N chars)" summary once the stream finalizes (live streaming render unchanged). Full text remains in ChatRecord, ACP agent_thought_chunk notifications, and Langfuse for downstream investigation. An in-TUI expand affordance is a follow-up.

  Once homelab-iac#31 (EMIT_REASONING_CONTENT) flips on, this also covers vLLM-served models that previously lost their <think> blocks at the gateway.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r nuke) - bumps to v0.28.0 (#169)

* fix(core): preserve tool history when building no-tools requests

  /recap (and any other caller without tools, e.g. /btw) was sending an empty conversation to the model. The no-tools branch in pipeline buildRequest dropped every assistant turn with tool_calls and every tool-role message wholesale, so in tool-heavy sessions the recap saw only bare user prompts and hallucinated context.

  - generateRecap now passes tools: [] so the strip path doesn't fire, matching cc-2.18's awaySummary pattern.
  - pipeline.ts no-tools branch now flattens instead of dropping: keeps assistant prose content and removes only the tool_calls field; tool results become [tool result] assistant notes truncated at 2000 chars.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): release.yml now fires on auto-release/v* PRs (#160)

  auto-release.yml opens version-bump PRs from `auto-release/v*` branches into main, but release.yml's job gate only matched `head.ref == 'dev'`. Result: every auto-release PR was merging cleanly but skipping publish (v0.26.25 had to be dispatched manually). This adds the auto-release/* prefix to the gate and refreshes the stale top-of-file comment.

  Co-authored-by: Automaker <automaker@localhost>
  Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: release v0.26.26 (#161)

  Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* feat(telemetry,ui): reasoning span attribute + collapsed thought summary (Phase 1 of #162) (#165)

  * feat(core): preserve task plan state in compaction summaries (#163)

    * feat(core): preserve task plan state in compaction summaries

      When context compaction fires, the agent loses awareness of its task plan (completed, in-progress, pending work) and may re-plan already-done tasks.

      Add extractTaskPlanSummary() that queries the TaskStore and produces a structured <task-plan> XML section with status markers ([x], [~], [ ], [-], [!]), priority labels, and parent-child indentation.

      Extend compactMessages() to accept an optional taskStore and append the plan to the compaction summary. Wire the TaskStore into agent-core at the compaction call site.

      Backward compatible: existing callers without taskStore remain unaffected.

      Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

    * fix: add error handling and recursive nesting to compaction task plan

      Address PR feedback from CodeRabbit:
      - Wrap extractTaskPlanSummary call in try/catch so TaskStore failures don't break compaction
      - Replace flat 2-level subtask rendering with recursive renderTask() that supports arbitrary nesting depth
      - Add tests for multi-level nesting and error fallback

      Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

    ---------

    Co-authored-by: Automaker <automaker@localhost>
    Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

  * feat(telemetry,ui): capture reasoning on Langfuse span and collapse thoughts post-stream

    Phase 1 of the reasoning coordination tracked in #162. Captures delta.reasoning_content / delta.reasoning across stream chunks and surfaces it as gen_ai.response.thinking on the gen_ai chat span (gated on logPrompts, matching the completion event policy). Always emits gen_ai.usage.thinking_tokens when usage exposes it. Non-streaming responses get the same treatment by inspecting {thought:true} parts on the response, and the completion event no longer double-counts thoughts as content.

    Renders gemini_thought items as a compact "▸ thinking (N chars)" summary once the stream finalizes (live streaming render unchanged). Full text remains in ChatRecord, ACP agent_thought_chunk notifications, and Langfuse for downstream investigation. An in-TUI expand affordance is a follow-up.

    Once homelab-iac#31 (EMIT_REASONING_CONTENT) flips on, this also covers vLLM-served models that previously lost their <think> blocks at the gateway.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: Automaker <automaker@localhost>
  Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
  Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: release v0.26.27 (#166)

  Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* chore(telemetry): rebrand to proto-cli, nuke qwen-logger Alibaba RUM ping (#167)

  * chore(telemetry): rebrand qwen-code identifiers to proto-cli

    Aligns telemetry / public-facing identifiers with the actual product name. Verified against the Langfuse instance: new spans land with service.name=proto-cli on scope=proto.openai-pipeline; existing proto.* tracers (proto.llm, proto.turn, proto.tools, proto.harness, etc.) were already correct.

    Changes:
    - SERVICE_NAME: qwen-code → proto-cli (resource attribute, the marquee label in Langfuse's service column)
    - All EVENT_* constants: qwen-code.* → proto.* (matches the existing proto.harness.* convention already in this file)
    - pipeline.ts tracer: qwen-code.openai-pipeline → proto.openai-pipeline (one straggler vs. the 9 other proto.* tracers in core/)
    - types.ts event.name literals (PromptSuggestion, Speculation): qwen-code.* → proto.*
    - acpAgent.ts agentInfo.name: qwen-code → proto-cli (visible to ACP clients like Zed when listing agents)
    - marketplace.ts User-Agent: qwen-code → proto-cli (extension fetch identifier sent to api.github.com / raw.githubusercontent.com)

    Out of scope (deliberately):
    - packages/core/src/telemetry/qwen-logger/*: separate analytics ping to gb4w8c3ygj-default-sea.rum.aliyuncs.com (Alibaba RUM, the upstream Qwen team's endpoint). Should be disabled rather than rebranded; tracking separately.
    - DEFAULT_SERVICE_NAME='qwen-code-oauth' in mcp/token-storage: renaming would orphan existing keychain entries.
    - Misc qwen-code-* file paths, tmp dir names, sandbox image tag, test fixtures: not telemetry / not user-visible labels.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

  * chore(telemetry): remove qwen-logger Alibaba RUM ping; keep useful events on Langfuse

    The qwen-logger system shipped usage telemetry to a fixed Alibaba RUM endpoint (gb4w8c3ygj-default-sea.rum.aliyuncs.com), the upstream Qwen Code team's analytics pipeline. We don't operate that endpoint, the data isn't visible to us, and it labelled traffic as qwen-code-cli / qwen-code@${version}. Confirmed unused on our deployment; nuking.

    What's removed:
    - packages/core/src/telemetry/qwen-logger/ (entire directory: logger, event-types, tests)
    - packages/core/src/telemetry/integration.test.circular.ts (was a qwen-logger-specific circular-reference proxy-agent test, no longer applicable)
    - ~30 QwenLogger.getInstance(config)?.logXxxEvent(event) callsites in loggers.ts
    - QwenLogger exports from telemetry/index.ts and core/index.ts
    - QwenLogger spies and assertions in config.test.ts and the describe('logHookCall', ...) block in loggers.test.ts that was exclusively QwenLogger-shaped

    What's kept and rerouted to OTel/Langfuse:
    - HookCallEvent type and the logHookCall function: hook execution data is genuinely useful telemetry (which hook fired, success, duration, exit code, captured stdout/stderr, error). Now emits a proto.hook_call OTel log record via logs.getLogger(SERVICE_NAME) instead of the Alibaba ping. Existing call site in hookEventHandler.ts:619 still fires per hook execution.
    - LoopDetectionDisabledEvent likewise: was an empty no-op after the qwen-logger pull; rerouted to a proto.loop_detection_disabled OTel log record so the signal still reaches Langfuse.
    - New tests in loggers.test.ts assert OTel emission shape for logHookCall (success, error, sdk-not-initialized branches).

    Renamed (per "all not used"; no existing keychain entries to invalidate):
    - DEFAULT_SERVICE_NAME 'qwen-code-oauth' → 'proto-cli-oauth'
    - FORCE_ENCRYPTED_FILE_ENV_VAR 'QWEN_CODE_…' → 'PROTO_CLI_…'
    - file-token-storage encryption salt prefix and scrypt key seed switched to proto-cli; only invalidates non-existent tokens

    Verified live: kimi-k2.6 turn through the rebuilt CLI lands a Langfuse trace with service=proto-cli, scope=proto.openai-pipeline, gen_ai.response.thinking present. No outbound traffic to aliyuncs.com.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: Automaker <automaker@localhost>
  Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: release v0.26.28 (#168)

  Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

---------

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Problem
When context compaction fires during long agent sessions, `summarizeHistory()` produces flat `- Called tool: result` summaries with no awareness of the agent's task plan. After compaction, the agent forgets which tasks are completed vs. pending and may re-plan already-done work.
This is P4 from the Benchmark Competitiveness PRD, the last remaining gap (P0–P3 and P5 are already implemented).
Solution
- `extractTaskPlanSummary(taskStore)`: Queries the TaskStore and produces a structured `<task-plan>` XML section with:
  - Status markers: `[x]` completed, `[~]` in_progress, `[ ]` pending, `[-]` cancelled, `[!]` blocked
  - Priority labels: `(${priority})`
- `compactMessages()`: Accepts optional `{ taskStore }` parameter; appends task plan to compaction summary
- `agent-core.ts`: Passes `this.runtimeContext.getTaskStore()` at the compaction call site with async/sync dual-path handling
Example Output
After compaction, the agent now sees:
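As an illustration only (this is not the PR's actual output), the line format described in the Solution section can be sketched as follows. The `Task` shape, `formatTaskLine`, and `formatTaskPlan` are hypothetical; only the status markers, the `(${priority})` label, and the `<task-plan>` wrapper come from the description.

```typescript
// Hypothetical task shape; the real TaskStore record type is not shown on this page.
type TaskStatus = "completed" | "in_progress" | "pending" | "cancelled" | "blocked";

interface Task {
  id: string;
  title: string;
  status: TaskStatus;
  priority?: string;
}

// Marker table as listed in the PR description.
const STATUS_MARKERS: Record<TaskStatus, string> = {
  completed: "[x]",
  in_progress: "[~]",
  pending: "[ ]",
  cancelled: "[-]",
  blocked: "[!]",
};

// One line per task: indentation, marker, title, optional "(priority)" suffix.
function formatTaskLine(task: Task, depth = 0): string {
  const indent = "  ".repeat(depth);
  const priority = task.priority ? ` (${task.priority})` : "";
  return `${indent}${STATUS_MARKERS[task.status]} ${task.title}${priority}`;
}

// Wrap the lines in a <task-plan> section; an empty plan yields an empty string.
function formatTaskPlan(tasks: Task[]): string {
  if (tasks.length === 0) return "";
  return ["<task-plan>", ...tasks.map((t) => formatTaskLine(t)), "</task-plan>"].join("\n");
}

console.log(
  formatTaskPlan([
    { id: "1", title: "Set up project", status: "completed" },
    { id: "2", title: "Write tests", status: "in_progress", priority: "high" },
  ]),
);
// <task-plan>
// [x] Set up project
// [~] Write tests (high)
// </task-plan>
```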
Changes
- `compaction.ts`: `CompactMessagesOptions` interface, `extractTaskPlanSummary()` function; extended `compactMessages()` signature
- `compaction.test.ts`: `compactMessages`, backward compat, early return
- `agent-core.ts`
Verification
- Existing callers without `taskStore` remain unaffected
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation