feat: roi and target support anchor #1188

Merged
MistEO merged 6 commits into main from feat/target_anchor
Mar 6, 2026
Conversation

@MistEO MistEO commented Mar 6, 2026


Summary by Sourcery

Support anchor-based ROI and action targets in the pipeline, with context-aware resolution and stricter ROI/target validation.

New Features:

  • Allow ROIs to reference anchors that resolve to previously recognized nodes.
  • Allow action targets (click, scroll, swipe begin) to reference anchors instead of only node names or coordinates.

Enhancements:

  • Route ROI and action target resolution through context to support anchor lookup and improve error handling when targets or nodes are missing.
  • Normalize and validate ROIs earlier in recognition and OCR batch flows, treating missing or invalid ROIs as failures.
  • Refine runtime cache access helpers for retrieving rectangles from prior nodes and anchors.

Documentation:

  • Document anchor usage for ROI and action targets in the English pipeline protocol, including behavior when referenced results are missing.

Tests:

  • Extend pipeline tests to cover ROI and action target anchor references across multiple action scenarios.
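As a sketch of the feature these bullets describe (node, template, and anchor names here are made up for illustration, not taken from this PR), a pipeline entry could reference an anchor in both `roi` and `target`:

```jsonc
{
    "ClickConfirm": {
        "recognition": "TemplateMatch",
        "template": "confirm.png",
        // ROI resolves to the box last recognized by the node bound to this anchor
        "roi": "[Anchor]DialogAnchor",
        "action": "Click",
        // The click target may reference the anchor instead of a node name or coordinates
        "target": "[Anchor]DialogAnchor"
    }
}
```

Per the documentation change in this PR, if the referenced anchor has no recognition result, the recognition is treated as failed rather than falling back to full screen.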

New Features:

  • Allow ROI definitions to reference anchors that resolve to previously recognized nodes.
  • Allow action targets (click, scroll, swipe begin) to reference anchors instead of only node names or coordinates.

Enhancements:

  • Extend target parsing, dumping, and the internal target type with an Anchor variant on both the recognition and action sides.
  • Pass the task context into ActionHelper so anchor-based targets can be resolved at runtime.

Documentation:

  • Document anchor usage for ROI and action targets in the English pipeline protocol.

Tests:

  • Add pipeline tests that validate ROI and target behavior with anchor references across multiple action scenarios.

@@ -462,7 +462,7 @@ MaaResourcePostPath(resource, "resource/debug"); // debug nodes use rate_lim
 - `roi`: *array<int, 4>* | *string*
   Region of Interest (ROI), defining the image recognition boundary; related image processing is performed only within this area. Optional, default [0, 0, 0, 0], i.e. full screen.
   - *array<int, 4>*: Recognition area coordinates [x, y, w, h]. Supports negative values **💡 v5.6**: negative x/y means calculating from the right/bottom edge; w/h of 0 means extending to the edge; a negative value takes the absolute value and treats (x, y) as the bottom-right corner.
-  - *string*: Fill in a node name; recognition runs within the target range recognized by a previously executed node.
+  - *string*: Fill in a node name; recognition runs within the target range recognized by a previously executed node. Also supports the `[Anchor]AnchorName` format to reference the node bound to an anchor.
Contributor
Should we spell out the behavior when the node has never been executed, or the anchor node is empty or does not exist? Is it full screen or empty in that case?

@MistEO MistEO marked this pull request as ready for review March 6, 2026 18:48
Copilot AI review requested due to automatic review settings March 6, 2026 18:48
@MistEO MistEO merged commit e78524c into main Mar 6, 2026
21 checks passed
@MistEO MistEO deleted the feat/target_anchor branch March 6, 2026 18:48
@sourcery-ai sourcery-ai bot left a comment



Hey - I've found 4 issues, and left some high level feedback:

  • The change in ActionHelper::wait_freezes from defaulting an empty box to full-screen, to now failing when correct_roi(box, pre_image) returns null, alters existing behavior; consider preserving the previous "empty box means full-screen" semantics by expanding an empty box before correction.
  • By removing the ROI correction in TemplateComparator::analyze and assuming roi_ is always valid for both images, you risk out-of-bounds access if capture dimensions change between pre_image and cur_image; consider reintroducing per-image bounds checking or documenting and enforcing that all callers must pass pre-corrected ROIs valid for both images.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The change in `ActionHelper::wait_freezes` from defaulting an empty `box` to full-screen, to now failing when `correct_roi(box, pre_image)` returns null, alters existing behavior; consider preserving the previous "empty box means full-screen" semantics by expanding an empty `box` before correction.
- By removing the ROI correction in `TemplateComparator::analyze` and assuming `roi_` is always valid for both images, you risk out-of-bounds access if capture dimensions change between `pre_image` and `cur_image`; consider reintroducing per-image bounds checking or documenting and enforcing that all callers must pass pre-corrected ROIs valid for both images.

## Individual Comments

### Comment 1
<location path="source/MaaFramework/Task/Component/ActionHelper.cpp" line_range="104-106" />
<code_context>
         break;

     case Target::Type::PreTask: {
-        auto& cache = tasker_->runtime_cache();
-        std::string name = std::get<std::string>(target.param);
-        MaaNodeId node_id = cache.get_latest_node(name).value_or(MaaInvalidId);
-        NodeDetail node_detail = cache.get_node_detail(node_id).value_or(NodeDetail { });
-        RecoResult reco_result = cache.get_reco_result(node_detail.reco_id).value_or(RecoResult { });
-        raw = reco_result.box.value_or(cv::Rect { });
+        const auto& name = std::get<std::string>(target.param);
+        raw = get_rect_from_node(name);
         LogDebug << "pre task" << VAR(name) << VAR(raw);
     } break;
</code_context>
<issue_to_address>
**issue (bug_risk):** Handle empty rects from get_rect_from_node before applying scaling/offsets in get_target_rect.

If `get_rect_from_node` returns an empty rect (`{0,0,0,0}`), the current math still applies `target.offset`, so `get_target_rect` can produce a non-empty rect from an invalid source. Callers that rely on `roi.empty()` to detect invalid targets (e.g., `Actuator::click`, `swipe`, `wait_freezes`) will then incorrectly act on this bogus rect instead of early-returning.

To preserve the contract that a non-empty rect from `get_target_rect` implies a valid source region, short-circuit when `raw` is empty, e.g.:

```cpp
case Target::Type::PreTask: {
    const auto& name = std::get<std::string>(target.param);
    raw = get_rect_from_node(name);
    if (raw.empty()) {
        LogWarn << "pre task has no rect" << VAR(name);
        return {};
    }
    LogDebug << "pre task" << VAR(name) << VAR(raw);
} break;
```

and similarly for the `Anchor` branch.
</issue_to_address>
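The contract the reviewer proposes can be shown with a small self-contained sketch; the `Rect`, the map standing in for the runtime cache, and both helpers below are simplified stand-ins, not the real MaaFramework types or API:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Simplified stand-in for cv::Rect: empty means "no recognition result".
struct Rect {
    int x = 0, y = 0, w = 0, h = 0;
    bool empty() const { return w <= 0 || h <= 0; }
};

// Stand-in for the runtime cache: node name -> last recognized box.
std::map<std::string, Rect> node_boxes;

// Returns the cached box, or an empty rect if the node never produced one.
Rect get_rect_from_node(const std::string& name)
{
    auto it = node_boxes.find(name);
    return it == node_boxes.end() ? Rect {} : it->second;
}

// Short-circuits on an empty source rect BEFORE applying the offset, so a
// returned value always implies a valid source region (the reviewed contract).
std::optional<Rect> get_target_rect(const std::string& name, const Rect& offset)
{
    Rect raw = get_rect_from_node(name);
    if (raw.empty()) {
        return std::nullopt; // invalid source: do not fabricate a rect from offsets
    }
    return Rect { raw.x + offset.x, raw.y + offset.y, raw.w + offset.w, raw.h + offset.h };
}
```

Without the `raw.empty()` check, an offset such as `{5, 5, 10, 10}` applied to `{0,0,0,0}` would yield a non-empty rect, which is exactly the bogus-target case the review flags.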

### Comment 2
<location path="test/python/pipeline_test.py" line_range="967-976" />
<code_context>
+    def _test_roi_target_anchor(self, context: Context):
</code_context>
<issue_to_address>
**suggestion (testing):** Anchor tests only verify parsing, not that ROI/targets using anchors actually work at runtime

Please extend this test (or add a new one) so it also exercises the runtime anchor resolution path:

- Bind anchors to nodes in the `Context` using the real API and run a small pipeline that uses `[Anchor]...` in ROI/targets so recognitions/actions actually execute.
- Assert that the ROIs/points used by actions (click/scroll/swipe/wait_freezes, etc.) come from the anchor-resolved node boxes rather than hardcoded coordinates.

That will validate the end-to-end behavior of anchor-based ROI/target resolution, not just the config parsing/dumping.

Suggested implementation:

```python
    def _test_roi_target_anchor(self, context: Context):
        """
        End-to-end test that verifies ROI/target anchors are actually resolved at runtime
        and that actions receive ROIs/points derived from the bound anchor node's box.
        """
        print("  Testing roi/target anchor reference...")

        # Work on a cloned context so we don't affect other tests
        new_ctx = context.clone()

        #
        # 1. Create a deterministic node box and bind it to an anchor.
        #
        # These numbers are arbitrary but should be obviously non-zero and non-symmetric
        # so it's easy to spot if anything degenerates to (0, 0, 0, 0) or similar.
        anchor_box = (13, 27, 111, 59)  # (x, y, w, h)

        # Create a node in the context that represents this box. The exact API may differ
        # in your code base; this assumes a minimal "add_node" + "bind_anchor" style API.
        anchor_node = new_ctx.add_node(
            name="AnchorNodeForRoiTargetTest",
            box=anchor_box,
        )
        new_ctx.bind_anchor("MyRoiAnchor", anchor_node)

        #
        # 2. Patch the click (or generic) action to record the ROI and point it receives.
        #
        recorded_rois = []
        recorded_points = []

        # Depending on your implementation, this may be something like:
        #   new_ctx.actions["Click"]
        #   new_ctx.action_registry["Click"]
        # or similar. Adjust the lookup if needed.
        original_click_cls = new_ctx.actions["Click"]

        class RecordingClick(original_click_cls):  # type: ignore[misc]
            def __call__(self, *args, **kwargs):
                # Different action implementations pass ROI/point differently.
                # Common patterns:
                #   __call__(self, roi, point, *args, **kwargs)
                #   __call__(self, point, *args, **kwargs)  with self.roi
                roi = getattr(self, "roi", None)
                point = getattr(self, "point", None)

                # Prefer explicit kwargs/args if present.
                if "roi" in kwargs:
                    roi = kwargs["roi"]
                if "point" in kwargs:
                    point = kwargs["point"]
                elif "pos" in kwargs:
                    # Some APIs use "pos" instead of "point".
                    point = kwargs["pos"]

                # Try positional fallbacks if we still don't have anything.
                if roi is None and args:
                    roi = args[0]
                if point is None and len(args) > 1:
                    point = args[1]

                recorded_rois.append(roi)
                recorded_points.append(point)
                return super().__call__(*args, **kwargs)

        new_ctx.actions["Click"] = RecordingClick

        #
        # 3. Define a small pipeline that uses the anchor in both ROI and target.
        #
        new_ctx.override_pipeline(
            {
                "RoiAnchorTest": {
                    "recognition": "TemplateMatch",
                    "template": ["test.png"],
                    # ROI is resolved from the anchor
                    "roi": "[Anchor]MyRoiAnchor",
                    # Target also uses the anchor so the click point should come from
                    # the same box. The exact schema may differ ("target", "pos", etc.).
                    "actions": [
                        {
                            "type": "Click",
                            "target": "[Anchor]MyRoiAnchor",
                        }
                    ],
                }
            }
        )

        #
        # 4. Run the pipeline so we exercise the runtime anchor resolution path.
        #
        # Depending on your framework, this could be:
        #   new_ctx.run("RoiAnchorTest")
        #   new_ctx.run_pipeline("RoiAnchorTest")
        # or something similar.
        new_ctx.run_pipeline("RoiAnchorTest")

        #
        # 5. Assert that ROI/point were derived from the anchor-resolved node box.
        #
        assert recorded_rois, "Anchor-based pipeline should have produced at least one action ROI"
        assert recorded_points, "Anchor-based pipeline should have produced at least one action point"

        # If ROI is represented as a box tuple, we can compare directly.
        roi = recorded_rois[0]
        if isinstance(roi, tuple) and len(roi) == 4:
            assert (
                roi == anchor_box
            ), f"Action ROI {roi} should match anchor node box {anchor_box}"
        else:
            # If ROI is a richer object, fall back to its box/rect/region attribute.
            roi_box = None
            for attr in ("box", "rect", "region", "bbox"):
                if hasattr(roi, attr):
                    roi_box = getattr(roi, attr)
                    break

            assert (
                roi_box is not None
            ), "Unable to determine ROI box for anchor ROI; please expose an attribute like 'box' or 'rect'"

            assert (
                tuple(roi_box) == anchor_box
            ), f"Action ROI box {roi_box} should match anchor node box {anchor_box}"

        point = recorded_points[0]
        # Normalize point if it's a richer object
        if not isinstance(point, tuple):
            for attr in ("point", "pos", "center"):
                if hasattr(point, attr):
                    point = getattr(point, attr)
                    break

        assert (
            isinstance(point, tuple) and len(point) == 2
        ), f"Expected point to be a 2-tuple, got {point!r}"

        px, py = point
        ax, ay, aw, ah = anchor_box
        assert ax <= px <= ax + aw and ay <= py <= ay + ah, (
            f"Click point {point} should lie inside anchor box {anchor_box}, "
            "indicating it was derived from the anchor-resolved node ROI"
        )

        print("    PASS: roi/target anchor runtime resolution")

```

The edited code assumes several concrete APIs that you may need to align with your existing code:

1. Anchor binding:
   - Adjust `new_ctx.add_node(...)` and `new_ctx.bind_anchor("MyRoiAnchor", anchor_node)` to match however you currently create nodes and bind anchors in `Context`.
   - If you use a different shape for boxes (e.g., `Box(x, y, w, h)` object), adapt `anchor_box` and the comparison logic accordingly.

2. Action registration:
   - Update `new_ctx.actions["Click"]` to the correct registry lookup in your context (e.g. `new_ctx.action_registry["Click"]`, `new_ctx._actions["Click"]`, etc.).
   - Ensure the base click action is class-like and can be subclassed; if it is a function, wrap it in a callable object instead of subclassing.

3. Pipeline execution:
   - Replace `new_ctx.run_pipeline("RoiAnchorTest")` with the correct method your framework uses to execute a named pipeline (e.g. `new_ctx.run("RoiAnchorTest")`).

4. Action call signature:
   - The `RecordingClick.__call__` method tries to be generic in how it extracts `roi` and `point`. If your click action uses a specific signature (e.g. `__call__(self, point)` with `self.roi` already resolved), simplify this method to match your actual signature and capture exactly the runtime ROI/point values.

5. ROI/point representation:
   - If your ROI or point types are custom classes, update the attribute names in the normalization logic (e.g., if you use `roi.rect.to_tuple()` or `point.x`, `point.y`).
   - If you have helper functions (e.g. `roi_to_box(roi)`), prefer using those instead of manual attribute inspection, to stay consistent with the rest of the codebase.

Integrate these adjustments so the test uses your real APIs while keeping the structure of this end-to-end anchor resolution check intact.
</issue_to_address>

### Comment 3
<location path="docs/en_us/3.1-PipelineProtocol.md" line_range="459" />
<code_context>
     Region of Interest (ROI), defining the image recognition boundary; related image processing is performed only within this area. Optional, default [0, 0, 0, 0], i.e. full screen.
   - *array<int, 4>*: Recognition area coordinates [x, y, w, h]. Supports negative values **💡 v5.6**: negative x/y means calculating from right/bottom edge; w/h of 0 means extending to edge, negative means taking absolute value and treating (x, y) as bottom-right corner.
-  - *string*: Fill in the node name, and identify within the target range identified by a previously executed node.
+  - *string*: Fill in the node name, and identify within the target range identified by a previously executed node. Also supports `[Anchor]AnchorName` format to reference the node corresponding to an anchor **💡 v5.9**. If the referenced pre-task or anchor has no recognition result, the recognition is treated as failed.  

 - `roi_offset`: *array<int, 4>*  
</code_context>
<issue_to_address>
**nitpick (typo):** Consider rephrasing to avoid the repeated use of "identify" in this sentence.

The phrase "identify within the target range identified by a previously executed node" is repetitive. Consider alternative wording (e.g., using a different verb the second time) to improve readability.

```suggestion
  - *string*: Fill in the node name to perform recognition within the target range produced by a previously executed node. Also supports `[Anchor]AnchorName` format to reference the node corresponding to an anchor **💡 v5.9**. If the referenced pre-task or anchor has no recognition result, the recognition is treated as failed.  
```
</issue_to_address>

### Comment 4
<location path="docs/en_us/3.1-PipelineProtocol.md" line_range="1082" />
<code_context>
     Position of the scroll target. The mouse will first move to this position before scrolling. Optional, default is `true`.  
   - *true*: Target is the position just recognized in this node (i.e., itself).  
-  - *string*: Fill in a node name; the target is the position recognized by a previously executed node.  
+  - *string*: Fill in a node name; the target is the position recognized by a previously executed node. Also supports `[Anchor]AnchorName` format to reference the node corresponding to an anchor **💡 v5.9**.  
   - *array<int, 2>*: Fixed coordinate point `[x, y]`.  
   - *array<int, 4>*: Fixed coordinate area `[x, y, w, h]`, a random point will be selected within the rectangle (with higher probability towards the center and lower probability at the edges). Use [0, 0, 0, 0] for full screen.
</code_context>
<issue_to_address>
**question:** Clarify behavior when the referenced pre-task or anchor has no recognition result for scroll targets.

The ROI and click target docs explicitly state what happens when the referenced node has no recognition result (treating it as a failed recognition/action). For scroll, could you document the equivalent behavior (or call out how it differs) to keep behavior and expectations consistent?
</issue_to_address>


Comment on lines 104 to +106
```diff
 case Target::Type::PreTask: {
-    auto& cache = tasker_->runtime_cache();
-    std::string name = std::get<std::string>(target.param);
-    MaaNodeId node_id = cache.get_latest_node(name).value_or(MaaInvalidId);
-    NodeDetail node_detail = cache.get_node_detail(node_id).value_or(NodeDetail { });
-    RecoResult reco_result = cache.get_reco_result(node_detail.reco_id).value_or(RecoResult { });
-    raw = reco_result.box.value_or(cv::Rect { });
+    const auto& name = std::get<std::string>(target.param);
+    raw = get_rect_from_node(name);
```


issue (bug_risk): Handle empty rects from get_rect_from_node before applying scaling/offsets in get_target_rect.

If get_rect_from_node returns an empty rect ({0,0,0,0}), the current math still applies target.offset, so get_target_rect can produce a non-empty rect from an invalid source. Callers that rely on roi.empty() to detect invalid targets (e.g., Actuator::click, swipe, wait_freezes) will then incorrectly act on this bogus rect instead of early-returning.

To preserve the contract that a non-empty rect from get_target_rect implies a valid source region, short-circuit when raw is empty, e.g.:

case Target::Type::PreTask: {
    const auto& name = std::get<std::string>(target.param);
    raw = get_rect_from_node(name);
    if (raw.empty()) {
        LogWarn << "pre task has no rect" << VAR(name);
        return {};
    }
    LogDebug << "pre task" << VAR(name) << VAR(raw);
} break;

and similarly for the Anchor branch.
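The failure mode described above can be reproduced with plain arithmetic: adding a non-zero offset to an empty rect yields a rect that looks valid. A minimal Python sketch (illustrative names, not the MaaFramework API) of the proposed guard:

```python
# Sketch of why offsetting an empty source rect fabricates a "valid-looking"
# target, and the short-circuit guard the comment above proposes.
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h)

def is_empty(rect: Rect) -> bool:
    return rect[2] <= 0 or rect[3] <= 0

def target_rect(raw: Rect, offset: Rect) -> Optional[Rect]:
    # Without this guard, an empty raw rect plus a non-zero offset would
    # yield a non-empty result that callers mistake for a valid target.
    if is_empty(raw):
        return None
    x, y, w, h = raw
    dx, dy, dw, dh = offset
    return (x + dx, y + dy, w + dw, h + dh)

print(target_rect((0, 0, 0, 0), (10, 10, 50, 20)))  # None (guarded)
# Unguarded math would have produced (10, 10, 50, 20): a bogus non-empty rect.
```

Callers that check the result for emptiness (as `Actuator::click` and friends do with `roi.empty()`) then fail fast instead of acting on a fabricated rectangle.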

Comment on lines +967 to +976
    def _test_roi_target_anchor(self, context: Context):
        print("  Testing roi/target anchor reference...")

        new_ctx = context.clone()

        new_ctx.override_pipeline(
            {
                "RoiAnchorTest": {
                    "recognition": "TemplateMatch",
                    "template": ["test.png"],


suggestion (testing): Anchor tests only verify parsing, not that ROI/targets using anchors actually work at runtime

Please extend this test (or add a new one) so it also exercises the runtime anchor resolution path:

  • Bind anchors to nodes in the Context using the real API and run a small pipeline that uses [Anchor]... in ROI/targets so recognitions/actions actually execute.
  • Assert that the ROIs/points used by actions (click/scroll/swipe/wait_freezes, etc.) come from the anchor-resolved node boxes rather than hardcoded coordinates.

That will validate the end-to-end behavior of anchor-based ROI/target resolution, not just the config parsing/dumping.

Suggested implementation:

    def _test_roi_target_anchor(self, context: Context):
        """
        End-to-end test that verifies ROI/target anchors are actually resolved at runtime
        and that actions receive ROIs/points derived from the bound anchor node's box.
        """
        print("  Testing roi/target anchor reference...")

        # Work on a cloned context so we don't affect other tests
        new_ctx = context.clone()

        #
        # 1. Create a deterministic node box and bind it to an anchor.
        #
        # These numbers are arbitrary but should be obviously non-zero and non-symmetric
        # so it's easy to spot if anything degenerates to (0, 0, 0, 0) or similar.
        anchor_box = (13, 27, 111, 59)  # (x, y, w, h)

        # Create a node in the context that represents this box. The exact API may differ
        # in your code base; this assumes a minimal "add_node" + "bind_anchor" style API.
        anchor_node = new_ctx.add_node(
            name="AnchorNodeForRoiTargetTest",
            box=anchor_box,
        )
        new_ctx.bind_anchor("MyRoiAnchor", anchor_node)

        #
        # 2. Patch the click (or generic) action to record the ROI and point it receives.
        #
        recorded_rois = []
        recorded_points = []

        # Depending on your implementation, this may be something like:
        #   new_ctx.actions["Click"]
        #   new_ctx.action_registry["Click"]
        # or similar. Adjust the lookup if needed.
        original_click_cls = new_ctx.actions["Click"]

        class RecordingClick(original_click_cls):  # type: ignore[misc]
            def __call__(self, *args, **kwargs):
                # Different action implementations pass ROI/point differently.
                # Common patterns:
                #   __call__(self, roi, point, *args, **kwargs)
                #   __call__(self, point, *args, **kwargs)  with self.roi
                roi = getattr(self, "roi", None)
                point = getattr(self, "point", None)

                # Prefer explicit kwargs/args if present.
                if "roi" in kwargs:
                    roi = kwargs["roi"]
                if "point" in kwargs:
                    point = kwargs["point"]
                elif "pos" in kwargs:
                    # Some APIs use "pos" instead of "point".
                    point = kwargs["pos"]

                # Try positional fallbacks if we still don't have anything.
                if roi is None and args:
                    roi = args[0]
                if point is None and len(args) > 1:
                    point = args[1]

                recorded_rois.append(roi)
                recorded_points.append(point)
                return super().__call__(*args, **kwargs)

        new_ctx.actions["Click"] = RecordingClick

        #
        # 3. Define a small pipeline that uses the anchor in both ROI and target.
        #
        new_ctx.override_pipeline(
            {
                "RoiAnchorTest": {
                    "recognition": "TemplateMatch",
                    "template": ["test.png"],
                    # ROI is resolved from the anchor
                    "roi": "[Anchor]MyRoiAnchor",
                    # Target also uses the anchor so the click point should come from
                    # the same box. The exact schema may differ ("target", "pos", etc.).
                    "actions": [
                        {
                            "type": "Click",
                            "target": "[Anchor]MyRoiAnchor",
                        }
                    ],
                }
            }
        )

        #
        # 4. Run the pipeline so we exercise the runtime anchor resolution path.
        #
        # Depending on your framework, this could be:
        #   new_ctx.run("RoiAnchorTest")
        #   new_ctx.run_pipeline("RoiAnchorTest")
        # or something similar.
        new_ctx.run_pipeline("RoiAnchorTest")

        #
        # 5. Assert that ROI/point were derived from the anchor-resolved node box.
        #
        assert recorded_rois, "Anchor-based pipeline should have produced at least one action ROI"
        assert recorded_points, "Anchor-based pipeline should have produced at least one action point"

        # If ROI is represented as a box tuple, we can compare directly.
        roi = recorded_rois[0]
        if isinstance(roi, tuple) and len(roi) == 4:
            assert (
                roi == anchor_box
            ), f"Action ROI {roi} should match anchor node box {anchor_box}"
        else:
            # If ROI is a richer object, fall back to its box/rect/region attribute.
            roi_box = None
            for attr in ("box", "rect", "region", "bbox"):
                if hasattr(roi, attr):
                    roi_box = getattr(roi, attr)
                    break

            assert (
                roi_box is not None
            ), "Unable to determine ROI box for anchor ROI; please expose an attribute like 'box' or 'rect'"

            assert (
                tuple(roi_box) == anchor_box
            ), f"Action ROI box {roi_box} should match anchor node box {anchor_box}"

        point = recorded_points[0]
        # Normalize point if it's a richer object
        if not isinstance(point, tuple):
            for attr in ("point", "pos", "center"):
                if hasattr(point, attr):
                    point = getattr(point, attr)
                    break

        assert (
            isinstance(point, tuple) and len(point) == 2
        ), f"Expected point to be a 2-tuple, got {point!r}"

        px, py = point
        ax, ay, aw, ah = anchor_box
        assert ax <= px <= ax + aw and ay <= py <= ay + ah, (
            f"Click point {point} should lie inside anchor box {anchor_box}, "
            "indicating it was derived from the anchor-resolved node ROI"
        )

        print("    PASS: roi/target anchor runtime resolution")

The edited code assumes several concrete APIs that you may need to align with your existing code:

  1. Anchor binding:

    • Adjust new_ctx.add_node(...) and new_ctx.bind_anchor("MyRoiAnchor", anchor_node) to match however you currently create nodes and bind anchors in Context.
    • If you use a different shape for boxes (e.g., Box(x, y, w, h) object), adapt anchor_box and the comparison logic accordingly.
  2. Action registration:

    • Update new_ctx.actions["Click"] to the correct registry lookup in your context (e.g. new_ctx.action_registry["Click"], new_ctx._actions["Click"], etc.).
    • Ensure the base click action is class-like and can be subclassed; if it is a function, wrap it in a callable object instead of subclassing.
  3. Pipeline execution:

    • Replace new_ctx.run_pipeline("RoiAnchorTest") with the correct method your framework uses to execute a named pipeline (e.g. new_ctx.run("RoiAnchorTest")).
  4. Action call signature:

    • The RecordingClick.__call__ method tries to be generic in how it extracts roi and point. If your click action uses a specific signature (e.g. __call__(self, point) with self.roi already resolved), simplify this method to match your actual signature and capture exactly the runtime ROI/point values.
  5. ROI/point representation:

    • If your ROI or point types are custom classes, update the attribute names in the normalization logic (e.g., if you use roi.rect.to_tuple() or point.x, point.y).
    • If you have helper functions (e.g. roi_to_box(roi)), prefer using those instead of manual attribute inspection, to stay consistent with the rest of the codebase.

Integrate these adjustments so the test uses your real APIs while keeping the structure of this end-to-end anchor resolution check intact.

Region of Interest (ROI), defining the image recognition boundary; related image processing is performed only within this area. Optional, default [0, 0, 0, 0], i.e. full screen.
- *array<int, 4>*: Recognition area coordinates [x, y, w, h]. Supports negative values **💡 v5.6**: negative x/y means calculating from right/bottom edge; w/h of 0 means extending to edge, negative means taking absolute value and treating (x, y) as bottom-right corner.
-  - *string*: Fill in the node name, and identify within the target range identified by a previously executed node.
+  - *string*: Fill in the node name, and identify within the target range identified by a previously executed node. Also supports `[Anchor]AnchorName` format to reference the node corresponding to an anchor **💡 v5.9**. If the referenced pre-task or anchor has no recognition result, the recognition is treated as failed.


nitpick (typo): Consider rephrasing to avoid the repeated use of "identify" in this sentence.

The phrase "identify within the target range identified by a previously executed node" is repetitive. Consider alternative wording (e.g., using a different verb the second time) to improve readability.

Suggested change
-  - *string*: Fill in the node name, and identify within the target range identified by a previously executed node. Also supports `[Anchor]AnchorName` format to reference the node corresponding to an anchor **💡 v5.9**. If the referenced pre-task or anchor has no recognition result, the recognition is treated as failed.
+  - *string*: Fill in the node name to perform recognition within the target range produced by a previously executed node. Also supports `[Anchor]AnchorName` format to reference the node corresponding to an anchor **💡 v5.9**. If the referenced pre-task or anchor has no recognition result, the recognition is treated as failed.
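The negative-value ROI rules quoted above (**💡 v5.6**) can be sketched in a few lines; this is an illustrative Python model of the documented semantics, not the framework's actual `correct_rois` implementation:

```python
def normalize_roi(roi, screen_w, screen_h):
    """Model of the documented v5.6 ROI rules: negative x/y count from the
    right/bottom edge, w/h of 0 extends to the edge, and negative w/h takes
    the absolute value and treats (x, y) as the bottom-right corner."""
    x, y, w, h = roi
    if x < 0:                 # count from the right edge
        x += screen_w
    if y < 0:                 # count from the bottom edge
        y += screen_h
    if w < 0:                 # (x, y) is the bottom-right corner
        w = -w
        x -= w
    if h < 0:
        h = -h
        y -= h
    if w == 0:                # extend to the right edge
        w = screen_w - x
    if h == 0:                # extend to the bottom edge
        h = screen_h - y
    return (x, y, w, h)

# On a 1280x720 screen:
print(normalize_roi((0, 0, 0, 0), 1280, 720))      # (0, 0, 1280, 720): full screen
print(normalize_roi((-100, 0, 0, 50), 1280, 720))  # (1180, 0, 100, 50)
```

Note how this model makes the Copilot concern below concrete: an all-zero rect normalizes to full screen rather than being rejected.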

Position of the scroll target. The mouse will first move to this position before scrolling. Optional, default is `true`.
- *true*: Target is the position just recognized in this node (i.e., itself).
-  - *string*: Fill in a node name; the target is the position recognized by a previously executed node.
+  - *string*: Fill in a node name; the target is the position recognized by a previously executed node. Also supports `[Anchor]AnchorName` format to reference the node corresponding to an anchor **💡 v5.9**.


question: Clarify behavior when the referenced pre-task or anchor has no recognition result for scroll targets.

The ROI and click target docs explicitly state what happens when the referenced node has no recognition result (treating it as a failed recognition/action). For scroll, could you document the equivalent behavior (or call out how it differs) to keep behavior and expectations consistent?

Copilot AI left a comment

Pull request overview

This PR adds support in MaaFramework’s pipeline format and runtime for using anchors as references in both recognition ROI and action targets, enabling nodes to resolve their ROI/target to the latest node bound to a named anchor.

Changes:

  • Extend pipeline parsing/dumping and internal target typing to support an Anchor variant (e.g. "[Anchor]Foo").
  • Resolve anchor-based ROI/targets at runtime via Context (plumbed into ActionHelper) and add guardrails for empty resolved targets.
  • Update schema/docs and add Python pipeline parsing/dumping tests for anchor-form ROI/targets.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tools/pipeline.schema.json | Documents `[Anchor]...` usage in ROI/target schema markdown. |
| test/python/pipeline_test.py | Adds parsing/dumping assertions for ROI/target anchor string forms. |
| source/MaaFramework/Vision/VisionTypes.h | Adds `Anchor` to `TargetType`. |
| source/MaaFramework/Vision/VisionBase.cpp | Adjusts ROI handling in the base vision class constructor. |
| source/MaaFramework/Vision/TemplateComparator.cpp | Simplifies ROI usage when comparing images. |
| source/MaaFramework/Task/Context.cpp | Routes `wait_freezes` target resolution through the Context-aware ActionHelper. |
| source/MaaFramework/Task/Component/Recognizer.cpp | Adds anchor ROI resolution and centralizes ROI correction. |
| source/MaaFramework/Task/Component/Actuator.cpp | Adds empty-target checks and uses the Context-aware ActionHelper. |
| source/MaaFramework/Task/Component/ActionHelper.h | Changes ActionHelper to be constructed from `Context*`. |
| source/MaaFramework/Task/Component/ActionHelper.cpp | Implements anchor target resolution and a node-rect helper. |
| source/MaaFramework/Resource/PipelineParser.cpp | Parses string targets into `Anchor` vs `PreTask` via node-attr parsing. |
| source/MaaFramework/Resource/PipelineDumper.cpp | Dumps anchor targets back to the `"[Anchor]..."` string form. |
| docs/en_us/3.1-PipelineProtocol.md | Documents anchor usage for ROI and some targets in the EN protocol. |

Comment on lines +604 to +608
NodeDetail node_detail = cache.get_node_detail(*node_id).value_or(NodeDetail { });
RecoResult reco_result = cache.get_reco_result(node_detail.reco_id).value_or(RecoResult { });
cv::Rect raw = reco_result.box.value_or(cv::Rect { });
LogDebug << "pre task from cache" << VAR(name) << VAR(raw);
return { raw };
return std::vector { raw };
Copilot AI Mar 6, 2026

get_rois_from_pretask() returns a vector containing cv::Rect{} when the referenced node exists but has no recognition box (reco_result.box is null). Because get_rois() now calls correct_rois(...), an empty rect gets normalized to full-screen (w/h=0 extends to edge), so ROI anchor/pre-task references can incorrectly succeed instead of being treated as failure as documented. Consider treating raw.empty() (and optionally any empty rects from sub_best_box_ / sub_filtered_boxes_) as “no result” and returning an empty ROI list (or std::nullopt) so callers fail fast.
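The fail-fast behavior suggested here could look like the following; a hypothetical Python model (names are illustrative, not the real `get_rois_from_pretask`) of treating a missing or empty recognition box as "no result":

```python
# Model of the suggested fix: a referenced node with no recognition box
# yields no ROIs at all, instead of an empty rect that a later
# normalization pass would silently expand to full screen.
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h)

def rois_from_pretask(box: Optional[Rect]) -> Optional[List[Rect]]:
    if box is None or box[2] <= 0 or box[3] <= 0:
        return None  # "no result": caller treats the recognition as failed
    return [box]

print(rois_from_pretask(None))              # None
print(rois_from_pretask((10, 20, 30, 40)))  # [(10, 20, 30, 40)]
```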

Position of the scroll target. The mouse will first move to this position before scrolling. Optional, default is `true`.
- *true*: Target is the position just recognized in this node (i.e., itself).
-  - *string*: Fill in a node name; the target is the position recognized by a previously executed node.
+  - *string*: Fill in a node name; the target is the position recognized by a previously executed node. Also supports `[Anchor]AnchorName` format to reference the node corresponding to an anchor **💡 v5.9**.
Copilot AI Mar 6, 2026

The Scroll.target docs mention anchor reference support, but unlike Click.target / roi it doesn’t document the failure behavior when the referenced pre-task/anchor has no recognition result. Since the implementation treats an empty resolved target as action failure, it would be clearer and more consistent to add the same note here.

Suggested change
-  - *string*: Fill in a node name; the target is the position recognized by a previously executed node. Also supports `[Anchor]AnchorName` format to reference the node corresponding to an anchor **💡 v5.9**.
+  - *string*: Fill in a node name; the target is the position recognized by a previously executed node. Also supports `[Anchor]AnchorName` format to reference the node corresponding to an anchor **💡 v5.9**. If the referenced node or anchor has no recognition result, the resolved target is empty and the `Scroll` action is treated as failed.

MistEO added a commit that referenced this pull request Mar 6, 2026
neko-para added a commit to neko-para/maa-support-extension that referenced this pull request Mar 7, 2026