[Data][Docs] Add job-level checkpointing documentation #60921

Merged
bveeramani merged 5 commits into ray-project:master from yuhuan130:new-pr-60289
Feb 11, 2026
Conversation

@yuhuan130
Contributor

This PR documents Ray Data job-level checkpointing functionality added in #59409.

Adds documentation explaining job-level checkpointing and its application to offline batch inference, including configuration examples.

Sections modified:

  • Execution Configurations
  • End-to-end: Offline Batch Inference

Supersedes #60289

Fixes #60250

anonihunter and others added 2 commits January 19, 2026 15:25
…ointing

Signed-off-by: Abhishek Kumar <anonyomoushunter@gmail.com>
Signed-off-by: “Alex <alexchien130@gmail.com>
@yuhuan130 yuhuan130 requested a review from a team as a code owner February 10, 2026 10:49
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This PR adds valuable documentation for the new job-level checkpointing feature. The explanations are clear and the example is helpful. I've suggested a few minor wording improvements for clarity and proposed adding a note about an important requirement for the id_column to ensure users can use this feature correctly.

id_column="id",
checkpoint_path="s3://my-bucket/ray-data-checkpoints", # Must be accessible by all nodes
delete_checkpoint_on_success=False, # Preserves checkpoints after successful runs
)
Contributor

high

The id_column has a critical requirement: its values must be unique for each row, and the column must be present throughout the entire pipeline. This information is crucial for users to correctly use checkpointing. I recommend adding a .. note:: block after the example to highlight this.

For example:

.. note::
    The ``id_column`` must contain unique values for each row across the entire dataset. This column must also be present throughout all transformations in the pipeline.
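To make the uniqueness requirement concrete, here is a minimal sketch of a pre-flight check that could run before enabling checkpointing. It is not part of Ray: `find_id_column_problems` is a hypothetical helper, and it assumes a sample of rows is available as plain Python dictionaries with hashable ID values.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def find_id_column_problems(
    rows: Iterable[Dict[str, Any]], id_column: str = "id"
) -> List[str]:
    """Hypothetical helper (not a Ray API): list problems with the ID column.

    An empty list means every row has the column and all values are unique.
    """
    problems: List[str] = []
    counts: Counter = Counter()
    for index, row in enumerate(rows):
        if id_column not in row:
            # The column must be present in every row.
            problems.append(f"row {index} is missing column {id_column!r}")
        else:
            counts[row[id_column]] += 1
    for value, count in counts.items():
        if count > 1:
            # Duplicate IDs violate the uniqueness requirement.
            problems.append(f"id {value!r} appears {count} times")
    return problems


if __name__ == "__main__":
    sample = [{"id": 1, "text": "a"}, {"id": 1, "text": "b"}]
    for problem in find_id_column_problems(sample):
        print(problem)
```

A check like this only covers the rows it is given; it does not verify that later transformations keep the column.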

Member

Sounds reasonable, want to add?

@ray-gardener ray-gardener bot added the "community-contribution" label (Contributed by the community) Feb 10, 2026
bveeramani and others added 2 commits February 10, 2026 14:50
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Member

@bveeramani bveeramani left a comment

LGTM

@bveeramani bveeramani added the "go" label (add ONLY when ready to merge, run all tests) Feb 11, 2026
@bveeramani bveeramani enabled auto-merge (squash) February 11, 2026 01:57
@bveeramani bveeramani merged commit 2ab4760 into ray-project:master Feb 11, 2026
8 checks passed
preneond pushed a commit to preneond/ray that referenced this pull request Feb 15, 2026
…0921)

This PR documents Ray Data job-level checkpointing functionality added
in ray-project#59409.

Adds documentation explaining job-level checkpointing and its
application to offline batch inference, including configuration
examples.

**Sections modified:**
- Execution Configurations
- End-to-end: Offline Batch Inference

Supersedes ray-project#60289

Fixes ray-project#60250

---------

Signed-off-by: Abhishek Kumar <anonyomoushunter@gmail.com>
Signed-off-by: “Alex <alexchien130@gmail.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: Abhishek Kumar <anonyomoushunter@gmail.com>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: Ondrej Prenek <ondra.prenek@gmail.com>
limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Feb 17, 2026
preneond pushed a commit to preneond/ray that referenced this pull request Feb 17, 2026
MuhammadSaif700 pushed a commit to MuhammadSaif700/ray that referenced this pull request Feb 17, 2026
Kunchd pushed a commit to Kunchd/ray that referenced this pull request Feb 17, 2026
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
preneond pushed a commit to preneond/ray that referenced this pull request Mar 23, 2026
Labels

community-contribution (Contributed by the community); go (add ONLY when ready to merge, run all tests)

Development

Successfully merging this pull request may close these issues.

[Data] Add documentation for Ray Data checkpointing

3 participants