[Data] Add contributing guide to Ray Data documentation by bveeramani · Pull Request #58589 · ray-project/ray

bveeramani · 2025-11-13T07:33:17Z

Description

This PR adds a new contributing guide to the Ray Data documentation to help contributors write better code and tests.

This change adds three new documentation files to help Ray Data contributors:

1. How to contribute an improvement - Guidelines for finding work, getting early feedback, writing clear PRs, keeping changes small, and writing simple code that follows Ray Data's design principles.
2. How to write tests - Ray-specific and general best practices for writing tests that are fast, reliable, and maintainable. Includes guidance on avoiding common pitfalls like assuming output order, using `shutdown_only` unnecessarily, and testing against implementation details.
3. Contributing section - A new top-level section in the documentation that organizes these guides.

The documentation follows principles from "A Philosophy of Software Design" and adapts them specifically for Ray Data's codebase. It aims to help both new and experienced contributors write code that is simple, maintainable, and aligned with the project's design philosophy.
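As an illustration of the "assuming output order" pitfall named in the description, here is a minimal sketch in plain Python. It does not use Ray: `read_rows_unordered` and `same_rows` are hypothetical helpers invented for this example, and the shuffle merely simulates a parallel read whose output order is not guaranteed.

```python
import random
from collections import Counter


def read_rows_unordered(rows, seed):
    """Hypothetical stand-in for a parallel read with no ordering guarantee."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def same_rows(actual, expected):
    """Compare two row collections as multisets, ignoring order."""
    return Counter(actual) == Counter(expected)


expected = [1, 2, 2, 3]
for seed in range(5):
    actual = read_rows_unordered(expected, seed)
    # Brittle: `assert actual == expected` would depend on the order.
    # Robust: compare contents only, keeping duplicates significant.
    assert same_rows(actual, expected)
```

Comparing with `Counter` rather than `set` keeps duplicate rows significant, so a read that drops or repeats a row would still fail the check.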

This PR adds a new contributing guide to the Ray Data documentation to help contributors write better code and tests. The guide includes best practices for submitting improvements and writing maintainable tests.

## Description

This change adds three new documentation files to help Ray Data contributors:

1. **How to contribute an improvement** - Guidelines for finding work, getting early feedback, writing clear PRs, keeping changes small, and writing simple code that follows deep module design principles.
2. **How to write tests** - Ray-specific and general best practices for writing tests that are fast, reliable, and maintainable. Includes guidance on avoiding common pitfalls like assuming output order, using `shutdown_only` unnecessarily, and testing against implementation details.
3. **Contributing section** - A new top-level section in the documentation that organizes these guides.

## Related issues

Related to ongoing efforts to improve code quality and contributor experience in Ray Data.

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>

gemini-code-assist

Code Review

This PR adds a valuable contributing guide to the Ray Data documentation. The guides for contributing improvements and writing tests are well-structured and provide clear best practices. I've identified a few issues, including a broken link in the table of contents, a syntax error in a code example, and some minor typos and formatting inconsistencies. Addressing these will ensure the documentation is accurate and easy for contributors to follow.

edoakes

This is fantastic!

owenowenisme · 2025-11-14T01:55:39Z

doc/source/data/contributing/how-to-write-tests.md

```python
ds2 = ds.repartition(5)
assert sum(len(bundle.blocks) for bundle in ds.iter_internal_ref_bundles()) == 5
# Assertion about the number of rows in each block has been removed.
```


Is this a good example?
I think we really should check the number of rows in every block here.
Maybe a better version would add comments explaining these magic numbers.

How would you decide how many rows go in each block? Both [10, 10, 0, 0, 0] and [2, 2, 2, 2, 2] satisfy the API's contract.

Ah you're right.
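The point of this exchange, that a test should assert only what the API contract guarantees, can be sketched without Ray. Everything below is hypothetical: `repartition_even` and `repartition_front_loaded` are toy stand-ins for two valid implementations, not Ray Data's `repartition`, and `check_repartition_contract` is an invented helper.

```python
from typing import List


def repartition_even(rows: List[int], num_blocks: int) -> List[List[int]]:
    """Hypothetical implementation: spread rows as evenly as possible."""
    base, extra = divmod(len(rows), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        size = base + (1 if i < extra else 0)
        blocks.append(rows[start:start + size])
        start += size
    return blocks


def repartition_front_loaded(rows: List[int], num_blocks: int) -> List[List[int]]:
    """Hypothetical implementation: put every row in the first block."""
    return [list(rows)] + [[] for _ in range(num_blocks - 1)]


def check_repartition_contract(blocks, original_rows, num_blocks):
    # Assert only what the contract promises: the requested number of
    # blocks, and the same rows overall. Rows per block are unspecified.
    assert len(blocks) == num_blocks
    flattened = [row for block in blocks for row in block]
    assert sorted(flattened) == sorted(original_rows)


rows = list(range(10))
for implementation in (repartition_even, repartition_front_loaded):
    check_repartition_contract(implementation(rows, 5), rows, 5)
```

Both toy implementations pass the contract check even though their per-block row counts differ, which is why an assertion on rows per block would tie the test to one implementation.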

raulchen · 2025-11-14T22:17:33Z

doc/source/data/contributing/how-to-write-tests.md

```python
ds = ray.data.read_parquet_bulk(paths + [txt_path], filesystem=fs)
# Assertion has been removed.
```


But this one doesn't assert anything, so it won't catch any issues.
If people do want to test against `initial_num_blocks`, it's okay to add the assertion.
Just document it clearly.

The actual test that this is based on has additional assertions, but I didn't originally include them.

Added an additional assertion so this looks a bit less weird.

> If people do want to test against initial_num_blocks, it's okay to add the assertion. Just document it clearly.

I think we should discourage this for a couple of reasons:

It requires testing against internal attributes. If we refactor (e.g., remove ExecutionPlan), the test will break even though nothing is broken.

The number of blocks isn't guaranteed by the interface and can change depending on our specific implementation. For example, if there are two small files, it could be reasonable to produce either two blocks (to ensure parallelism) or one block (to make blocks larger).

I mean, sometimes we do want to test against an internal implementation, and that's fine for unit tests.
In this case, if this assertion isn't helpful, we should just remove the entire test.

Yeah, I agree we do (and should) test lower levels of abstraction. I guess it'd be more accurate to say that the test breaks abstraction barriers: it tests across multiple layers of abstraction (user-facing APIs and ExecutionPlan) and makes assumptions about which attributes a Dataset has.
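The refactoring concern raised in this thread can be sketched with two toy classes. These are hypothetical and are not Ray Data's `Dataset` or `ExecutionPlan`: `DatasetV1` keeps its state in an internal `_plan` attribute, and `DatasetV2` is the same public behavior after an imagined refactor that removes `_plan`.

```python
class DatasetV1:
    """Hypothetical dataset that keeps its state in an internal plan."""

    def __init__(self, rows):
        self._plan = {"rows": list(rows), "initial_num_blocks": 2}

    def take_all(self):
        return list(self._plan["rows"])


class DatasetV2:
    """Same public behavior after a hypothetical refactor removing `_plan`."""

    def __init__(self, rows):
        self._rows = tuple(rows)

    def take_all(self):
        return list(self._rows)


def check_public_behavior(dataset_cls):
    # Uses only the public method, so it survives the refactor.
    ds = dataset_cls([1, 2, 3])
    assert sorted(ds.take_all()) == [1, 2, 3]


def check_internal_detail(dataset_cls):
    # Reaches into a private attribute, crossing an abstraction barrier.
    ds = dataset_cls([1, 2, 3])
    assert ds._plan["initial_num_blocks"] == 2


def passes(check, dataset_cls):
    """Return True if the check runs without failing."""
    try:
        check(dataset_cls)
        return True
    except (AssertionError, AttributeError):
        return False


for cls in (DatasetV1, DatasetV2):
    print(cls.__name__, passes(check_public_behavior, cls), passes(check_internal_detail, cls))
```

In this sketch the public-behavior check passes for both classes, while the internal-detail check fails on `DatasetV2` even though nothing user-visible is broken.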

github-actions · 2025-11-29T00:38:01Z

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

owenowenisme · 2025-12-09T02:05:47Z

Ah you're right.

owenowenisme

Some nits, but up to you. It might also be good to add this somewhere:
https://docs.ray.io/en/master/ray-contribute/getting-involved.html

owenowenisme · 2025-12-09T12:49:18Z

doc/source/data/contributing/contributing.rst

@@ -0,0 +1,9 @@
+============
+Contributing


Contributing to Ray Data

owenowenisme · 2025-12-09T12:50:27Z

doc/source/data/contributing/how-to-contribute-an-improvement.md

@@ -0,0 +1,74 @@
+# How to contribute an improvement


"Contribution Guide" sounds more natural to me.

…ntributing guide (#59320) This is a follow-up PR to address these review comments: #58589 (review). --------- Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>

bveeramani added 2 commits November 12, 2025 23:28

Update stuff (cb3f0a2)

bveeramani requested a review from a team as a code owner November 13, 2025 07:33

Fix typo (b05b619)

gemini-code-assist bot reviewed Nov 13, 2025

FIx typo (a4c88a1)

bveeramani commented Nov 13, 2025

ray-gardener bot added the docs (An issue or change related to documentation) and data (Ray Data-related issues) labels Nov 13, 2025

Add section (15ced55)

edoakes reviewed Nov 13, 2025

owenowenisme reviewed Nov 14, 2025

owenowenisme reviewed Nov 14, 2025

raulchen reviewed Nov 14, 2025

github-actions bot added the stale label (The issue is stale. It will be closed within 7 days unless there are further conversation) Nov 29, 2025

owenowenisme added the unstale label (A PR that has been marked unstale. It will not get marked stale again if this label is on it.) and removed the stale label Nov 29, 2025

bveeramani added 3 commits December 8, 2025 13:56

Address some review comments (1e31f64)

Address more review comments (021becd)

Address more review comments (581ccdc)

bveeramani added the go label (add ONLY when ready to merge, run all tests) Dec 8, 2025

bveeramani enabled auto-merge (squash) December 8, 2025 22:51

bveeramani disabled auto-merge December 8, 2025 22:51

Appease lint (57a1324)

owenowenisme reviewed Dec 9, 2025

bveeramani and others added 2 commits December 9, 2025 01:48

Update doc/source/data/contributing/how-to-contribute-an-improvement.md (f83e889)
Co-authored-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>

Address review comments (008bb6a)

owenowenisme reviewed Dec 9, 2025

Update doc/source/data/contributing/how-to-contribute-an-improvement.md (70134f8)
Co-authored-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>

bveeramani added 3 commits December 9, 2025 02:02

Update formatting (6aea90f)

Resolve merge conflicts (a7e8455)

Merge branch 'add-contributing-guide' of https://github.com/ray-proje… (bba5e8c)
…ct/ray into add-contributing-guide

owenowenisme approved these changes Dec 9, 2025

raulchen approved these changes Dec 9, 2025

bveeramani merged commit 00b1f9d into master Dec 9, 2025
5 of 6 checks passed

bveeramani deleted the add-contributing-guide branch December 9, 2025 21:10

bveeramani mentioned this pull request Dec 9, 2025

[Data] Rename Ray Data contributing guides and link to general Ray contributing guide #59320

Merged
