
[data] feat: show more detail info for dataset.explain#57798

Open
my-vegetable-has-exploded wants to merge 6 commits into ray-project:master from my-vegetable-has-exploded:explain-verbose

@my-vegetable-has-exploded
Contributor

@my-vegetable-has-exploded my-vegetable-has-exploded commented Oct 16, 2025

Description

This PR adds an explain method to the PhysicalOperator class so that individual PhysicalOperators can customize their explain output by overriding it. Operators such as Source and Filter are converted into MapTransformFns inside a MapOperator, and those MapOperators may also be fused, so the PR provides a detailed explain function for every MapTransformFn. To support this, MapTransformFn now accepts a custom explain_fn that can be passed in and invoked. The PR prototypes this functionality with the Range and Parquet sources as initial examples.
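The design described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the classes are simplified stand-ins that only borrow the names PhysicalOperator, MapOperator, and MapTransformFn, and the constructor signatures, the operator names, and the example explain text are all assumptions made for the sketch.

```python
from typing import Callable, List, Optional


class MapTransformFn:
    """Simplified stand-in for a transform step (not Ray Data's real class)."""

    def __init__(self, name: str, explain_fn: Optional[Callable[[], str]] = None):
        self._name = name
        # Hypothetical hook: a callable that returns a detailed description.
        self._explain_fn = explain_fn

    def explain(self) -> str:
        # Use the custom explain_fn when one was passed in, else just the name.
        if self._explain_fn is not None:
            return self._explain_fn()
        return self._name


class PhysicalOperator:
    """Simplified base class: the default explain output is the operator name."""

    def __init__(self, name: str):
        self._name = name

    def explain(self) -> str:
        return self._name


class MapOperator(PhysicalOperator):
    """Overrides explain to list every (possibly fused) transform fn."""

    def __init__(self, name: str, transform_fns: List[MapTransformFn]):
        super().__init__(name)
        self._transform_fns = transform_fns

    def explain(self) -> str:
        lines = [self._name]
        lines += [f"  {fn.explain()}" for fn in self._transform_fns]
        return "\n".join(lines)


# A fused operator: a source with a custom explain_fn plus a plain filter.
read_range = MapTransformFn(
    "ReadRange", explain_fn=lambda: "ReadRange(n=100, parallelism=4)"
)
filter_fn = MapTransformFn("Filter")
fused = MapOperator("ReadRange->Filter", [read_range, filter_fn])
plan_text = fused.explain()
print(plan_text)
```

Keeping the hook on the transform fn rather than on the operator means the detail survives fusion, since a fused MapOperator still holds each original transform fn.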

Related issues

Part of #55052

Types of change

  • Bug fix 🐛
  • New feature ✨
  • Enhancement 🚀
  • Code refactoring 🔧
  • Documentation update 📖
  • Chore 🧹
  • Style 🎨

Checklist

Does this PR introduce breaking changes?

  • Yes ⚠️
  • No

Testing:

  • Added/updated tests for my changes
  • Tested the changes manually
  • This PR is not tested ❌ (please explain why)

Code Quality:

  • Signed off every commit (git commit -s)
  • Ran pre-commit hooks (setup guide)

Documentation:

  • Updated documentation (if applicable) (contribution guide)
  • Added new APIs to doc/source/ (if applicable)

Additional context

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>
@my-vegetable-has-exploded changed the title from "[WIP] Explain verbose" to "feat: show more detail info for dataset.explain" Oct 17, 2025
Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>
@my-vegetable-has-exploded changed the title from "feat: show more detail info for dataset.explain" to "[data] feat: show more detail info for dataset.explain" Oct 18, 2025
@my-vegetable-has-exploded
Contributor Author

But currently I don't have a good initial example for customized operator metric (Ray Data invokes numerous third-party libraries, which abstract away many low-level metric details). Do you have any suggestions? @richardliaw

@my-vegetable-has-exploded my-vegetable-has-exploded marked this pull request as ready for review October 19, 2025 12:01
@my-vegetable-has-exploded my-vegetable-has-exploded requested a review from a team as a code owner October 19, 2025 12:01
raise ValueError(f"Unexpected operator type: {type(op)}")

prefix = "" if depth == 0 else " " * ((depth - 1) * 3) + "+- "
curr_str += textwrap.indent(op_str + "\n", prefix)

Bug: Incorrect Indentation in Operator Explanations

The textwrap.indent() call in generate_plan_string applies the tree-structure prefix to every line of multi-line operator explanations. This incorrectly indents all continuation lines with the prefix, breaking the physical plan's formatting. The prefix should only apply to the first line of an operator's explanation.
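The behaviour described in this comment can be reproduced in isolation. The sketch below is one possible fix, not the change adopted in the PR: indent_first_line is a hypothetical helper, and the sample operator string is made up. It applies the tree prefix to the first line only and pads continuation lines with spaces of the same width.

```python
import textwrap


def indent_first_line(op_str: str, depth: int) -> str:
    """Hypothetical helper: tree prefix on the first line, spaces on the rest."""
    # Same prefix expression as in the reviewed snippet.
    prefix = "" if depth == 0 else " " * ((depth - 1) * 3) + "+- "
    lines = op_str.splitlines() or [""]
    # Continuation lines are aligned under the first line's text.
    continuation = " " * len(prefix)
    out = [prefix + lines[0]] + [continuation + line for line in lines[1:]]
    return "\n".join(out) + "\n"


# Made-up multi-line explanation for a single operator.
op_str = "ReadParquet\n  columns: [a, b]"

# textwrap.indent prefixes every non-blank line, including the continuation.
naive = textwrap.indent(op_str + "\n", "+- ")

# The helper prefixes only the first line.
fixed = indent_first_line(op_str, depth=1)

print(naive)
print(fixed)
```

textwrap.indent also accepts a predicate argument, but the predicate only sees each line's text and not its position, so selecting just the first line still needs explicit handling like the above.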


self._per_block_limit = per_block_limit

def explain(self, mode: str = "default") -> str:
    return f"{self._name}"

Bug: Inconsistent Default Mode in Explain Methods

The AbstractMap.explain() method defaults its mode parameter to "default". This is inconsistent with other explain methods throughout the codebase, which use "simple" as their default, potentially causing unexpected behavior.
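One way to keep defaults from drifting between overrides is a shared constant. The sketch below assumes, as this comment states, that "simple" is the intended default. The class names BaseOperator and MapLikeOperator, the constant DEFAULT_EXPLAIN_MODE, and the explain_defaults helper are hypothetical and are not taken from the Ray codebase.

```python
import inspect

# Assumed default, taken from the review comment rather than from the codebase.
DEFAULT_EXPLAIN_MODE = "simple"


class BaseOperator:
    """Hypothetical base class whose explain method defines the default mode."""

    def explain(self, mode: str = DEFAULT_EXPLAIN_MODE) -> str:
        return type(self).__name__


class MapLikeOperator(BaseOperator):
    """Hypothetical override that reuses the shared constant as its default."""

    def __init__(self, name: str):
        self._name = name

    def explain(self, mode: str = DEFAULT_EXPLAIN_MODE) -> str:
        return f"{self._name}"


def explain_defaults(*classes):
    """Collect the default value of `mode` from each class's explain signature."""
    return {
        cls.__name__: inspect.signature(cls.explain).parameters["mode"].default
        for cls in classes
    }


defaults = explain_defaults(BaseOperator, MapLikeOperator)
print(defaults)
```

A check like explain_defaults could run in a unit test so that a new override with a different default fails early.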


@ray-gardener ray-gardener bot added the docs, data, and community-contribution labels Oct 19, 2025
@github-actions

github-actions bot commented Nov 3, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale label Nov 3, 2025
@github-actions github-actions bot added the unstale label and removed the stale label Nov 13, 2025
@iamjustinhsu
Contributor

iamjustinhsu commented Nov 19, 2025

Hey @my-vegetable-has-exploded, thank you for the contribution!

But currently I don't have a good initial example for customized operator metric (Ray Data invokes numerous third-party libraries, which abstract away many low-level metric details).

Can you elaborate on what you mean by this?

