Skip to content

fix: add session loop diagnostics#204

Merged
Astro-Han merged 1 commit into
devfrom
codex/feat-i133-diagnostics
Apr 24, 2026
Merged

fix: add session loop diagnostics#204
Astro-Han merged 1 commit into
devfrom
codex/feat-i133-diagnostics

Conversation

@Astro-Han

@Astro-Han Astro-Han commented Apr 24, 2026

Copy link
Copy Markdown
Owner

Summary

Add lightweight session diagnostics for repeated tool behavior:

  • record per-tool loop diagnostics in tool part state metadata
  • detect repeated tool inputs and repeated tool error classes within the same user block
  • inject a one-time model-facing reminder after the third repeat
  • preserve diagnostics metadata across tool completion, errors, and shell output updates
  • store target summaries as non-readable hashes to avoid leaking tool input details

Why

Issue #133 identified sessions where models repeatedly called the same tool input or retried the same failing tool pattern. This keeps V1 small: it observes and nudges the model without blocking tools, asking the user, adding model-specific rules, or changing normal exploration behavior.

Related Issue

Closes #133

How To Verify

cd packages/opencode
bun test test/session/diagnostics.test.ts
bun test test/session/prompt-effect.test.ts --timeout 30000
bun test test/session/processor-effect.test.ts --timeout 30000
bun run typecheck
bun run test

Final full package result:

2053 pass
8 skip
1 todo
0 fail

Also ran:

git diff --check

Screenshots or Recordings

Not applicable. This is harness/session behavior with no visible UI change.

Checklist

  • I linked the related issue, or stated why there is no issue
  • This PR has type, scope, and priority labels, or I requested maintainer labeling
  • I listed the relevant verification steps, including tests when behavior changed
  • I manually checked visible UI or copy changes when needed, with screenshots or recordings
  • I considered macOS and Windows impact for desktop, packaging, updater, signing, paths, shell, or permissions changes
  • I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, or generated/local file changes when relevant
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English

Maintainer labeling requested: type fix, scope opencode/session, priority based on issue #133.

Summary by CodeRabbit

Release Notes

  • New Features
    • Enhanced session loop detection automatically identifies and tracks repeated tool calls with identical inputs and errors.
    • Diagnostic reminders are injected when repeated patterns are detected, helping prevent infinite loops and unnecessary operations.

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: aae710c3-36ca-416e-9e8e-0645c4fbe323

📥 Commits

Reviewing files that changed from the base of the PR and between d35885d and 3440a8e.

📒 Files selected for processing (5)
  • packages/opencode/src/session/diagnostics.ts
  • packages/opencode/src/session/processor.ts
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/test/session/diagnostics.test.ts
  • packages/opencode/test/session/prompt-effect.test.ts
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: smoke-macos-arm64
  • GitHub Check: unit-windows-opencode-config-project
  • GitHub Check: unit-windows-opencode-server-tools
  • GitHub Check: unit-windows-app
  • GitHub Check: unit-windows-opencode-session
  • GitHub Check: unit-windows-desktop
  • GitHub Check: unit-opencode
  • GitHub Check: unit-app
  • GitHub Check: unit-desktop
  • GitHub Check: typecheck
  • GitHub Check: analyze-js-ts
  • GitHub Check: e2e-artifacts
🧰 Additional context used
📓 Path-based instructions (2)
packages/opencode/**/*.ts

📄 CodeRabbit inference engine (packages/opencode/AGENTS.md)

packages/opencode/**/*.ts: Use Effect.gen(function* () { ... }) for Effect composition
Use Effect.fn("Domain.method") for named/traced effects and Effect.fnUntraced for internal helpers; these accept pipeable operators as extra arguments to avoid unnecessary outer .pipe() wrappers
Use Effect.callback for callback-based APIs
Prefer DateTime.nowAsDate over new Date(yield* Clock.currentTimeMillis) when you need a Date in Effect code
Use Schema.Class for multi-field data in Effect schemas
Use branded schemas (Schema.brand) for single-value types in Effect
Use Schema.TaggedErrorClass for typed errors in Effect schemas
Use Schema.Defect instead of unknown for defect-like causes in Effect code
In Effect.gen / Effect.fn, prefer yield* new MyError(...) over yield* Effect.fail(new MyError(...)) for direct early-failure branches
Use makeRuntime from src/effect/run-service.ts for all services; it returns { runPromise, runFork, runCallback } backed by a shared memoMap that deduplicates layers
Use InstanceState from src/effect/instance-state.ts for per-directory or per-project state that needs per-instance cleanup; do work directly in the InstanceState.make closure where ScopedCache handles run-once semantics
Use Effect.addFinalizer or Effect.acquireRelease inside the InstanceState.make closure for cleanup (subscriptions, process teardown, etc.)
Use Effect.forkScoped inside the InstanceState.make closure for background stream consumers — the fiber is interrupted when the instance is disposed
Prefer FileSystem.FileSystem instead of raw fs/promises for effectful file I/O in Effect services
Prefer ChildProcessSpawner.ChildProcessSpawner with ChildProcess.make(...) instead of custom process wrappers in Effect services
Prefer HttpClient.HttpClient instead of raw fetch in Effect services
Prefer Path.Path, Config, Clock, and DateTime services when those concerns are already inside Effect code
For backgroun...

Files:

  • packages/opencode/test/session/prompt-effect.test.ts
  • packages/opencode/src/session/diagnostics.ts
  • packages/opencode/test/session/diagnostics.test.ts
  • packages/opencode/src/session/processor.ts
  • packages/opencode/src/session/prompt.ts
packages/opencode/test/**/*.test.{ts,tsx}

📄 CodeRabbit inference engine (packages/opencode/test/AGENTS.md)

packages/opencode/test/**/*.test.{ts,tsx}: Use the tmpdir function from fixture/fixture.ts to create temporary directories for tests with automatic cleanup. Use await using syntax to ensure automatic cleanup when the variable goes out of scope.
When using the tmpdir function with git repository support, pass the git: true option to initialize a git repo with a root commit.
Use the config option in tmpdir to write an opencode.json config file during test setup by passing a partial Config.Info object.
Use the init option in tmpdir to define custom setup functions that can return extra data accessible via tmp.extra, and use the dispose option for custom cleanup logic.
Use testEffect(...) from test/lib/effect.ts for tests that exercise Effect services or Effect-based workflows.
Use it.effect(...) when the test should run with TestClock and TestConsole. Use it.live(...) when the test depends on real time, filesystem mtimes, child processes, git, locks, or other live OS behavior.
Prefer Effect-aware helpers from fixture/fixture.ts over building manual runtimes in tests: use tmpdirScoped() for scoped temp directories, provideInstance(dir)(effect) for low-level binding without directory creation, provideTmpdirInstance(...) for single temp instance binding, or provideTmpdirServer(...) for tests that also need the test LLM server.
Define const it = testEffect(...) near the top of the test file and keep the test body inside Effect.gen(function* () { ... }). Yield services directly with yield* MyService.Service or yield* MyTool.
Avoid custom ManagedRuntime, attach(...), or ad hoc run(...) wrappers in Effect tests when testEffect(...) already provides the runtime.
When a test needs instance-local state, prefer provideTmpdirInstance(...) or provideInstance(...) over manual Instance.provide(...) inside Promise-style tests.

Files:

  • packages/opencode/test/session/prompt-effect.test.ts
  • packages/opencode/test/session/diagnostics.test.ts
🧠 Learnings (22)
📓 Common learnings
Learnt from: Astro-Han
Repo: Astro-Han/pawwork PR: 193
File: packages/app/e2e/sidebar/sidebar-leading-slot.spec.ts:5-55
Timestamp: 2026-04-23T15:25:31.118Z
Learning: In Astro-Han/pawwork E2E tests, reaching a real "running" session state requires the `project` fixture (for model bootstrap) plus `llm.wait(1)` orchestration. The bare `sdk` fixture used by `withSession` does not trigger an LLM call even with `agent: "build"` + `system` prompt set via `sdk.session.promptAsync`, so simulating running state is not lightweight in the current test infrastructure.
📚 Learning: 2026-04-20T14:36:31.032Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/test/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:31.032Z
Learning: Applies to packages/opencode/test/**/*.test.{ts,tsx} : Define `const it = testEffect(...)` near the top of the test file and keep the test body inside `Effect.gen(function* () { ... })`. Yield services directly with `yield* MyService.Service` or `yield* MyTool`.

Applied to files:

  • packages/opencode/test/session/prompt-effect.test.ts
  • packages/opencode/test/session/diagnostics.test.ts
  • packages/opencode/src/session/processor.ts
📚 Learning: 2026-04-20T14:36:31.032Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/test/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:31.032Z
Learning: Applies to packages/opencode/test/**/*.test.{ts,tsx} : Prefer Effect-aware helpers from `fixture/fixture.ts` over building manual runtimes in tests: use `tmpdirScoped()` for scoped temp directories, `provideInstance(dir)(effect)` for low-level binding without directory creation, `provideTmpdirInstance(...)` for single temp instance binding, or `provideTmpdirServer(...)` for tests that also need the test LLM server.

Applied to files:

  • packages/opencode/test/session/prompt-effect.test.ts
  • packages/opencode/src/session/processor.ts
📚 Learning: 2026-04-20T14:36:31.032Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/test/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:31.032Z
Learning: Applies to packages/opencode/test/**/*.test.{ts,tsx} : Use `it.effect(...)` when the test should run with `TestClock` and `TestConsole`. Use `it.live(...)` when the test depends on real time, filesystem mtimes, child processes, git, locks, or other live OS behavior.

Applied to files:

  • packages/opencode/test/session/prompt-effect.test.ts
📚 Learning: 2026-04-20T14:36:31.032Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/test/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:31.032Z
Learning: Applies to packages/opencode/test/**/*.test.{ts,tsx} : Use `testEffect(...)` from `test/lib/effect.ts` for tests that exercise Effect services or Effect-based workflows.

Applied to files:

  • packages/opencode/test/session/prompt-effect.test.ts
  • packages/opencode/src/session/processor.ts
📚 Learning: 2026-04-20T14:36:31.032Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/test/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:31.032Z
Learning: Applies to packages/opencode/test/**/*.test.{ts,tsx} : Avoid custom `ManagedRuntime`, `attach(...)`, or ad hoc `run(...)` wrappers in Effect tests when `testEffect(...)` already provides the runtime.

Applied to files:

  • packages/opencode/test/session/prompt-effect.test.ts
📚 Learning: 2026-04-20T14:36:31.032Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/test/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:31.032Z
Learning: Applies to packages/opencode/test/**/*.test.{ts,tsx} : When a test needs instance-local state, prefer `provideTmpdirInstance(...)` or `provideInstance(...)` over manual `Instance.provide(...)` inside Promise-style tests.

Applied to files:

  • packages/opencode/test/session/prompt-effect.test.ts
📚 Learning: 2026-04-22T09:32:54.556Z
Learnt from: Astro-Han
Repo: Astro-Han/pawwork PR: 126
File: packages/opencode/test/provider/provider.test.ts:64-85
Timestamp: 2026-04-22T09:32:54.556Z
Learning: In `packages/opencode/test/provider/provider.test.ts`, the file intentionally uses AppRuntime-based async helpers (`run`, `list`, `getProvider`, etc.) rather than `testEffect(...)` for all tests. Converting individual tests to `testEffect` while leaving the rest on the async pattern would create internal inconsistency. A full harness migration of this file is the right approach if the pattern needs to change, but that should be a separate PR.

Applied to files:

  • packages/opencode/test/session/prompt-effect.test.ts
📚 Learning: 2026-04-20T14:36:31.032Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/test/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:31.032Z
Learning: Applies to packages/opencode/test/**/*.test.{ts,tsx} : Use the `tmpdir` function from `fixture/fixture.ts` to create temporary directories for tests with automatic cleanup. Use `await using` syntax to ensure automatic cleanup when the variable goes out of scope.

Applied to files:

  • packages/opencode/test/session/prompt-effect.test.ts
📚 Learning: 2026-04-20T14:36:04.113Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/app/e2e/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:04.113Z
Learning: Applies to packages/app/e2e/packages/app/src/testing/**/*.ts : Test-only hooks must be inert unless explicitly enabled and should not add normal-runtime listeners, reactive subscriptions, or per-update allocations

Applied to files:

  • packages/opencode/test/session/prompt-effect.test.ts
  • packages/opencode/test/session/diagnostics.test.ts
📚 Learning: 2026-04-23T15:25:31.118Z
Learnt from: Astro-Han
Repo: Astro-Han/pawwork PR: 193
File: packages/app/e2e/sidebar/sidebar-leading-slot.spec.ts:5-55
Timestamp: 2026-04-23T15:25:31.118Z
Learning: In Astro-Han/pawwork E2E tests, reaching a real "running" session state requires the `project` fixture (for model bootstrap) plus `llm.wait(1)` orchestration. The bare `sdk` fixture used by `withSession` does not trigger an LLM call even with `agent: "build"` + `system` prompt set via `sdk.session.promptAsync`, so simulating running state is not lightweight in the current test infrastructure.

Applied to files:

  • packages/opencode/test/session/prompt-effect.test.ts
📚 Learning: 2026-04-20T14:36:31.032Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/test/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:31.032Z
Learning: Applies to packages/opencode/test/**/*.test.{ts,tsx} : Use the `init` option in `tmpdir` to define custom setup functions that can return extra data accessible via `tmp.extra`, and use the `dispose` option for custom cleanup logic.

Applied to files:

  • packages/opencode/test/session/prompt-effect.test.ts
📚 Learning: 2026-04-23T08:51:00.819Z
Learnt from: Astro-Han
Repo: Astro-Han/pawwork PR: 186
File: packages/opencode/test/plugin/workspace-adaptor.test.ts:139-144
Timestamp: 2026-04-23T08:51:00.819Z
Learning: For pawwork tests under packages/opencode/test/**, auth.json teardown may intentionally combine `Filesystem.write` (from `packages/opencode/src/util/filesystem.ts`) with `node:fs/promises` `unlink` for cleanup. Do not flag this as inconsistent style; it is the established/intentional pattern because `Filesystem` does not provide a `remove`/`unlink` helper.

Applied to files:

  • packages/opencode/test/session/prompt-effect.test.ts
  • packages/opencode/test/session/diagnostics.test.ts
📚 Learning: 2026-04-20T14:36:04.113Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/app/e2e/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:04.113Z
Learning: Applies to packages/app/e2e/**/*.spec.ts : Call `project.trackSession(sessionID, directory?)` and `project.trackDirectory(directory)` for any resources created outside the fixture so teardown can clean them up

Applied to files:

  • packages/opencode/test/session/diagnostics.test.ts
📚 Learning: 2026-04-20T14:36:04.113Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/app/e2e/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:04.113Z
Learning: Applies to packages/app/e2e/**/*.spec.ts : Use fixture-managed cleanup with `withSession(sdk, title, callback)` for temporary sessions instead of calling `sdk.session.delete(...)` directly

Applied to files:

  • packages/opencode/test/session/diagnostics.test.ts
📚 Learning: 2026-04-20T14:36:21.288Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:21.288Z
Learning: Applies to packages/opencode/**/*.ts : Use `Effect.fn("Domain.method")` for named/traced effects and `Effect.fnUntraced` for internal helpers; these accept pipeable operators as extra arguments to avoid unnecessary outer `.pipe()` wrappers

Applied to files:

  • packages/opencode/src/session/processor.ts
📚 Learning: 2026-04-20T14:36:21.288Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:21.288Z
Learning: Applies to packages/opencode/**/*.ts : Prefer `Path.Path`, `Config`, `Clock`, and `DateTime` services when those concerns are already inside Effect code

Applied to files:

  • packages/opencode/src/session/processor.ts
📚 Learning: 2026-04-20T14:36:21.288Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:21.288Z
Learning: Applies to packages/opencode/**/*.ts : Use `Effect.gen(function* () { ... })` for Effect composition

Applied to files:

  • packages/opencode/src/session/processor.ts
📚 Learning: 2026-04-20T14:36:21.288Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:21.288Z
Learning: Applies to packages/opencode/**/*.ts : Use `Effect.cached` when multiple concurrent callers should share a single in-flight computation rather than storing `Fiber | undefined` or `Promise | undefined` manually

Applied to files:

  • packages/opencode/src/session/processor.ts
📚 Learning: 2026-04-20T14:36:21.288Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:21.288Z
Learning: Applies to packages/opencode/**/*.ts : Prefer `FileSystem.FileSystem` instead of raw `fs/promises` for effectful file I/O in Effect services

Applied to files:

  • packages/opencode/src/session/processor.ts
📚 Learning: 2026-04-20T14:36:21.288Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:21.288Z
Learning: Applies to packages/opencode/**/*.ts : Use `Effect.callback` for callback-based APIs

Applied to files:

  • packages/opencode/src/session/processor.ts
📚 Learning: 2026-04-20T14:36:21.288Z
Learnt from: CR
Repo: Astro-Han/pawwork PR: 0
File: packages/opencode/AGENTS.md:0-0
Timestamp: 2026-04-20T14:36:21.288Z
Learning: Applies to packages/opencode/**/*.ts : For background loops or scheduled tasks, use `Effect.repeat` or `Effect.schedule` with `Effect.forkScoped` in the layer definition

Applied to files:

  • packages/opencode/src/session/processor.ts
🔇 Additional comments (21)
packages/opencode/src/session/diagnostics.ts (5)

64-66: LGTM: Hash function is appropriate for diagnostics fingerprinting.

The 64-bit truncated SHA-256 hash provides sufficient collision resistance for tracking tool input/error patterns within a session context.


68-72: LGTM: Input normalization correctly handles semantic equivalence.

The function filters non-semantic keys (request IDs, trace IDs, nonces), sorts object keys for stable serialization, and trims strings. This ensures semantically equivalent inputs produce identical hashes.


80-97: LGTM: Error fingerprinting normalizes variable components effectively.

The regex-based normalization replaces URLs, quoted strings, paths, hex IDs, and numbers with placeholders, ensuring equivalent error classes produce the same fingerprint regardless of specific values.


231-287: LGTM: consumeReminders correctly marks pending reminders as injected and generates appropriate nudge text.

The function properly:

  • Filters for assistant messages matching the parentID
  • Transitions pending reminders to "injected" status with timestamp
  • Returns updated tool parts for persistence
  • Generates differentiated text for input vs error repeats

212-229: No action needed; current behavior is intentional.

The spread-based merge at the loop level does replace the entire reminders array, but this is not a bug. All usages follow a pattern where reminders are either freshly generated or explicitly managed before mergeMetadata is called. The function is not designed for accumulating reminders across multiple merges; responsibility for reminder management is delegated to the callers.

packages/opencode/test/session/prompt-effect.test.ts (1)

494-540: LGTM: Comprehensive integration test for diagnostics reminder injection.

The test correctly validates the end-to-end flow:

  1. Three identical tool calls trigger the repeat detection
  2. The diagnostics reminder text is injected into the next LLM request
  3. Tool metadata is persisted with proper inputRepeatCount and input_repeat reminder state

The test follows project conventions using it.live, Effect.gen, and provideTmpdirServer fixture.

packages/opencode/src/session/prompt.ts (4)

38-38: LGTM: Import added for SessionDiagnostics integration.


887-887: LGTM: Shell metadata updates now preserve existing diagnostics.

The spread pattern { ...part.state.metadata, output, description: "" } correctly preserves any existing diagnostic metadata while updating the output field. This aligns with the mergeMetadata contract from SessionDiagnostics.

Also applies to: 905-905


1399-1399: LGTM: Aborted shell completion preserves diagnostic metadata.

Consistent with the other shell metadata updates, the spread pattern ensures diagnostics are not lost when finalizing an aborted shell command.


1514-1529: LGTM: Diagnostics consumption correctly integrated into the run loop.

The implementation:

  1. Consumes pending reminders after other reminder insertions
  2. Appends synthetic diagnostic text to the user message for model input (in-memory only)
  3. Persists the updated tool parts with reminders marked as "injected"

The synthetic text part is correctly added only to the in-memory msgs array used for model input, not persisted, ensuring the nudge is model-facing only as intended by the PR objectives.

packages/opencode/src/session/processor.ts (5)

167-174: LGTM: Helper functions for extracting tool state metadata.

Clean extraction of metadata and diagnostics from tool parts with proper type guards using isRecord.


176-217: LGTM: Record collection functions properly scope diagnostics to user blocks.

Both loopRecords and errorRecords correctly filter by parentID to ensure repeat tracking is scoped to the current user message block, preventing false positives across different conversation turns.


230-243: LGTM: Tool completion preserves diagnostics metadata.

The completeToolCall function correctly extracts existing diagnostics from the running state and merges them into the completed state using SessionDiagnostics.mergeMetadata.


246-273: LGTM: Tool failure observes error patterns and preserves diagnostics.

The failToolCall function:

  1. Observes the error pattern via SessionDiagnostics.observeToolError when parentID exists
  2. Falls back to existing diagnostics when no parentID
  3. Merges diagnostics into the error state metadata

This ensures error repeat detection works correctly while preserving any existing diagnostic data.


350-383: LGTM: Tool call event handler integrates diagnostics observation.

The handler correctly:

  1. Updates the tool part to running state first
  2. Guards against missing parentID before observing
  3. Fetches session info to get parentSessionID for subagent tracking
  4. Observes the tool call and merges diagnostics into the running state

The sequential flow ensures the tool part exists before diagnostics are attached.

packages/opencode/test/session/diagnostics.test.ts (6)

1-17: LGTM: Well-structured test setup for SessionDiagnostics.

The test file properly imports dependencies, creates test fixtures, and defines a helpful loop() extraction function for cleaner assertions.


18-40: LGTM: Input normalization tests verify semantic stability.

The tests correctly verify:

  1. Reordered keys and filtered non-semantic fields produce stable hashes
  2. Semantic fields like cursor produce different hashes when values differ

42-144: LGTM: Comprehensive observeToolCall test coverage.

The tests verify:

  1. Reminder creation only on the third repeat (not second, not fourth)
  2. Cross-block isolation prevents false positives
  3. Different URLs in exploratory patterns don't trigger reminders

This covers the key behaviors of the repeat detection logic.


146-173: LGTM: Error normalization test validates fingerprinting.

The test correctly verifies that equivalent error messages with different numeric values (positions 12, 18, 44) are consolidated into a single error class, triggering one reminder on the third occurrence.


175-217: LGTM: Metadata helper tests verify merging and privacy.

The tests correctly verify:

  1. mergeMetadata preserves existing tool metadata while adding diagnostics
  2. targetSummary hashes sensitive values without exposing them in the summary string

219-291: LGTM: consumeReminders test verifies injection and idempotence.

The test correctly verifies:

  1. Pending reminders are marked as "injected" with the provided timestamp
  2. Reminder text is generated for the model
  3. Second consumption produces no reminders (idempotent)

📝 Walkthrough

Walkthrough

A new SessionDiagnostics module is introduced to track tool call and error patterns by computing stable input/error fingerprints and generating reminders when repeats are detected. The session processor and prompt handler are updated to integrate this diagnostics system, observing tool calls/errors, consuming pending reminders, and preserving metadata during state transitions.

Changes

Cohort / File(s) Summary
SessionDiagnostics Core
packages/opencode/src/session/diagnostics.ts
New module implementing stable fingerprint computation (SHA-256), metadata merging, tool-call/error observation with repeat tracking, reminder consumption from prior messages, and synthetic reminder text generation.
Integration Points
packages/opencode/src/session/processor.ts, packages/opencode/src/session/prompt.ts
Updated to instrument tool calls/errors via SessionDiagnostics, extract and merge diagnostic metadata into tool parts, and append diagnostic reminder text to message content when repeats are detected.
Test Coverage
packages/opencode/test/session/diagnostics.test.ts, packages/opencode/test/session/prompt-effect.test.ts
Unit tests validating fingerprinting, repeat tracking, metadata utilities, and reminder consumption; integration test confirming loop diagnostics reminders are injected into LLM requests with correct metadata.

Sequence Diagram

sequenceDiagram
    participant PromptLoop as Prompt Loop
    participant SessionProc as SessionProcessor
    participant Diagnostics as SessionDiagnostics
    participant AssistantMsg as Prior Messages
    participant MetadataMgr as Part Metadata

    PromptLoop->>SessionProc: Start/Resume tool call
    SessionProc->>AssistantMsg: Fetch prior assistant messages
    AssistantMsg-->>SessionProc: Return tool part records
    SessionProc->>Diagnostics: observeToolCall(sessionID, tool, input, records)
    Diagnostics->>Diagnostics: Compute input fingerprint<br/>Increment repeat count
    Diagnostics-->>SessionProc: Return observed diagnostics
    SessionProc->>MetadataMgr: Merge diagnostics into part metadata
    
    PromptLoop->>Diagnostics: consumeReminders(messages, parentID)
    Diagnostics->>AssistantMsg: Scan for matching pending reminders
    Diagnostics->>MetadataMgr: Mark reminders as injected
    Diagnostics-->>PromptLoop: Return reminder text + updated messages
    
    PromptLoop->>PromptLoop: Append reminder text to user message
    PromptLoop->>PromptLoop: Persist updated diagnostic parts
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

P2, app

Poem

🐰 A curious loop the rabbit did track,
With fingerprints hashed and reminders brought back,
When tools call and repeat in the same session's dance,
The diagnostics whisper, "Let's try something else, perchance!"
Round and round, yet aware—no more getting stuck,
Just diagnostics and metadata bringing us luck!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: add session loop diagnostics' clearly and concisely summarizes the main change—adding diagnostic functionality for session loop detection.
Description check ✅ Passed The PR description fully covers all required template sections: summary of changes, problem/goal explanation, related issue link, verification steps with test commands, and checklist completion.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/feat-i133-diagnostics

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new SessionDiagnostics module designed to detect and handle 'doom loops' by tracking repeated tool inputs and error fingerprints. It replaces the previous threshold logic in SessionProcessor with a more sophisticated hashing and normalization approach, injecting system reminders when repetitions are detected. Additionally, the PR refactors service dependencies in the session processor and includes new unit and integration tests. I have no feedback to provide as there were no review comments to assess.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request harness Model harness, prompts, tool descriptions, and session mechanics P1 High priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Add lightweight loop observation and session diagnostics

1 participant