[Feature] Detect low-yield repeated probing on the same investigation target #229

## What task are you trying to do?
We want PawWork to recognize when an agent has moved from a legitimate retry or fallback into low-yield repeated probing of the same investigation target, then summarize the blocker and ask the user instead of continuing to spin.

## What do you do today?
The current loop diagnostics added in PR #204 correctly detect repeated identical tool inputs and repeated identical tool error classes, and inject a one-time reminder after the third repeat. That worked at the start of a recent session: three `webfetch` calls hit the same GitHub Pages 404, and the `error_repeat` reminder was injected.

However, the model then escaped the current detector by changing tool family and slightly changing the command each time. In the same session, it switched into a long run of read-only `bash` probes against the same target, for example repeated `curl` plus different `grep` variants against `https://datawhalechina.github.io/`. These calls were technically successful and the command strings differed, so they no longer matched the current repeated-input or repeated-error heuristics, even though user-visible progress had effectively stalled.

## What would a good result look like?
PawWork tracks investigation progress at the target level, not only at the exact command or exact error level. When the agent keeps probing the same page, domain, repo, or resource family with low information gain across multiple turns or tool families, the harness should treat that as suspected stuck behavior.

A good result should:
- distinguish legitimate exploration from low-yield probing on the same target
- survive tool-family changes such as `webfetch` to `bash` when the underlying target is unchanged
- persist the stuck suspicion beyond a single one-shot reminder when behavior does not improve
- trigger a summarize-and-ask path when high-confidence stuck behavior is detected
- avoid forcing early interruption when the agent is still discovering genuinely new targets or evidence
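
As a rough illustration of target-level grouping, the sketch below reduces a tool call to a normalized host-plus-path key, so a `webfetch` and a `bash` `curl` against the same page land in the same group. The function name `target_keys` and the input shapes (a `url` field for `webfetch`, a `command` field for `bash`) are assumptions made for this example, not PawWork's actual API.

```python
import re
from urllib.parse import urlsplit

# Rough URL matcher; stops at whitespace, quotes, and common shell separators.
URL_RE = re.compile(r"""https?://[^\s'"<>|;)]+""")


def target_keys(tool_input: dict) -> set[str]:
    """Return normalized target keys (host + path) referenced by a tool call.

    Hypothetical input shapes: webfetch carries "url", bash carries "command".
    """
    text = tool_input.get("url") or tool_input.get("command") or ""
    keys = set()
    for raw in URL_RE.findall(text):
        parts = urlsplit(raw)
        host = (parts.hostname or "").lower()
        if not host:
            continue
        # Drop scheme, query, fragment, and trailing slash so variants collapse.
        keys.add(host + parts.path.rstrip("/"))
    return keys


webfetch_call = {"url": "https://datawhalechina.github.io/"}
bash_call = {"command": "curl -s https://datawhalechina.github.io/ | grep -i title"}
same_target = target_keys(webfetch_call) == target_keys(bash_call)
```

Here `same_target` is `True` even though the tool family and the command string both changed. A real implementation would probably also need to group repos, domains, and file paths, which this sketch does not attempt.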

## Which audience does this matter to most?
Both

## Extra context
A recent session showed the current boundary clearly: PR #204 did fire once on repeated `webfetch` 404s, so the problem is not that diagnostics were absent. The gap is that the model then switched to many slightly different `bash` reads on the same URL and continued for a long time until provider quota stopped the run.

This issue should stay practical and product-facing. It is not a request for a complex agent scheduler, broad model-specific patching, or a full harness rewrite. The smallest useful direction is likely some combination of:
- target-level investigation grouping
- low-information-progress signals for read-only probing
- a persistent suspected-stuck state instead of a one-time reminder only
- a stronger summarize-and-ask escalation path once the model has already ignored the earlier warning
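
One possible shape for the persistent suspected-stuck state is a small per-target counter that first reminds and then escalates if low-yield probing continues. This is a minimal sketch: the names `Suspicion` and `step`, the action strings, and both thresholds are hypothetical, and the `low_yield` flag is assumed to come from some separate progress signal.

```python
from dataclasses import dataclass

REMIND_AFTER = 3    # assumed threshold, mirroring the existing "third repeat" reminder
ESCALATE_AFTER = 6  # assumed threshold for the summarize-and-ask path


@dataclass
class Suspicion:
    """Per-target suspicion that survives across turns and tool families."""
    low_yield_streak: int = 0
    reminded: bool = False


def step(state: Suspicion, low_yield: bool) -> str:
    """Update the state for one probe and return the harness action."""
    if not low_yield:
        # Real progress clears the suspicion entirely.
        state.low_yield_streak = 0
        state.reminded = False
        return "continue"
    state.low_yield_streak += 1
    if state.reminded and state.low_yield_streak >= ESCALATE_AFTER:
        # The earlier warning was ignored: stop spinning and ask the user.
        return "summarize_and_ask"
    if not state.reminded and state.low_yield_streak >= REMIND_AFTER:
        state.reminded = True
        return "remind"
    return "continue"


state = Suspicion()
actions = [step(state, low_yield=True) for _ in range(6)]
```

With these thresholds, six consecutive low-yield probes yield two `continue` actions, a `remind`, two more `continue` actions, and finally `summarize_and_ask`. Whether real progress should also clear the `reminded` flag is a design choice; keeping it set would be a stricter variant.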

## Positive and negative examples
Positive example, should NOT be treated as stuck:
A `GLM-5.1` session asked how `https://datawhalechina.github.io/` was implemented. It saw two direct `webfetch` 404s on the GitHub Pages URL, then quickly pivoted to genuinely new targets and evidence: the GitHub organization page, the candidate Pages repo, the site headers, and finally `https://www.datawhale.cn`. After that it summarized the finding that the GitHub Pages URL was no longer the real site, explained the likely implementation options, asked one clarifying question, and finished normally. This is a good example of legitimate fallback and target expansion after an initial 404.

Negative example, SHOULD be treated as suspected stuck:
A `Kimi K2.6` session on the same user task also hit repeated `webfetch` 404s, which correctly triggered the current PR #204 reminder. But after that reminder it escaped the detector by switching tool family and issuing a large number of read-only `bash` probes against the same underlying target, such as repeated `curl` plus different `grep` variants over the GitHub Pages 404 HTML. The exact command strings changed and many calls technically succeeded, but user-visible progress did not. This is the failure mode we want to catch.
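
One cheap low-information-progress signal for this failure mode would be line-level novelty per target: how much of a probe's output has not already been seen for that target. The sketch below is illustrative only. The `novelty` helper is hypothetical, and the HTML strings are made-up stand-ins, not output captured from the session.

```python
def novelty(seen_lines: set[str], output: str) -> float:
    """Return the fraction of non-empty output lines that are new for this target.

    Also records the new lines in seen_lines, so later probes are compared
    against everything observed so far.
    """
    lines = {ln.strip() for ln in output.splitlines() if ln.strip()}
    if not lines:
        return 0.0  # empty output carries no new information
    new = lines - seen_lines
    seen_lines.update(new)
    return len(new) / len(lines)


# Made-up stand-in for a 404 page body.
page_404 = "<title>Page not found</title>\n<h1>404</h1>\n<p>File not found</p>"

seen: set[str] = set()
first_fetch = novelty(seen, page_404)                         # whole page is new
grep_title = novelty(seen, "<title>Page not found</title>")  # subset of the page
grep_h1 = novelty(seen, "<h1>404</h1>")                      # subset of the page
```

Under these assumptions the first fetch scores 1.0 and both `grep` variants score 0.0, although every call succeeded and no two command strings matched. Probes that surface genuinely new evidence, as in the positive example, would keep scoring above zero.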

## Acceptance criteria
- The harness can recognize repeated low-yield probing on the same investigation target even when exact commands differ.
- Detection can span more than one tool family when the underlying target is the same.
- The design distinguishes low-yield probing from real progress that introduces new targets or meaningful new evidence.
- The user sees a summarize-and-ask outcome instead of a long silent spin once high-confidence stuck behavior is reached.
- The implementation remains lightweight, local-first, and explainable during session debugging.

