[Data][Docs] Add job-level checkpointing documentation #60921
bveeramani merged 5 commits into ray-project:master
…ointing Signed-off-by: Abhishek Kumar <anonyomoushunter@gmail.com>
Signed-off-by: “Alex <alexchien130@gmail.com>
Code Review
This PR adds valuable documentation for the new job-level checkpointing feature. The explanations are clear and the example is helpful. I've suggested a few minor wording improvements for clarity and proposed adding a note about an important requirement for the id_column to ensure users can use this feature correctly.
    id_column="id",
    checkpoint_path="s3://my-bucket/ray-data-checkpoints",  # Must be accessible by all nodes
    delete_checkpoint_on_success=False,  # Preserves checkpoints after successful runs
)
The id_column has a critical requirement: its values must be unique for each row, and the column must be present throughout the entire pipeline. This information is crucial for users to correctly use checkpointing. I recommend adding a .. note:: block after the example to highlight this.
For example:
.. note::
    The ``id_column`` must contain unique values for each row across the entire dataset. This column must also be present throughout all transformations in the pipeline.
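
To make the requirement concrete, here is a minimal sketch of a pre-flight check a user could run on a sample of rows before enabling checkpointing. The helper `check_id_column` is hypothetical and written in plain Python; it is not part of Ray Data and only illustrates the two conditions named above (the column is present in every row, and its values are unique).

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def check_id_column(rows: Iterable[Mapping[str, Any]], id_column: str) -> list:
    """Hypothetical helper: return the sorted duplicated IDs found in `rows`.

    Raises KeyError if any row is missing `id_column`.
    """
    counts = Counter()
    for index, row in enumerate(rows):
        if id_column not in row:
            raise KeyError(f"row {index} has no {id_column!r} column")
        counts[row[id_column]] += 1
    # An ID that appears more than once violates the uniqueness requirement.
    return sorted(value for value, n in counts.items() if n > 1)


rows = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}, {"id": 2, "text": "c"}]
duplicates = check_id_column(rows, "id")
print(duplicates)  # [2]
```

An empty result means the sampled rows satisfy the uniqueness condition; it says nothing about rows outside the sample.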
Sounds reasonable, want to add?
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
…0921) This PR documents Ray Data job-level checkpointing functionality added in ray-project#59409. Adds documentation explaining job-level checkpointing and its application to offline batch inference, including configuration examples. **Sections modified:** - Execution Configurations - End-to-end: Offline Batch Inference Supersedes ray-project#60289 Fixes ray-project#60250 --------- Signed-off-by: Abhishek Kumar <anonyomoushunter@gmail.com> Signed-off-by: “Alex <alexchien130@gmail.com> Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: Abhishek Kumar <anonyomoushunter@gmail.com> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Balaji Veeramani <balaji@anyscale.com> Signed-off-by: Ondrej Prenek <ondra.prenek@gmail.com>
…0921) This PR documents Ray Data job-level checkpointing functionality added in ray-project#59409. Adds documentation explaining job-level checkpointing and its application to offline batch inference, including configuration examples. **Sections modified:** - Execution Configurations - End-to-end: Offline Batch Inference Supersedes ray-project#60289 Fixes ray-project#60250 --------- Signed-off-by: Abhishek Kumar <anonyomoushunter@gmail.com> Signed-off-by: “Alex <alexchien130@gmail.com> Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: Abhishek Kumar <anonyomoushunter@gmail.com> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Balaji Veeramani <balaji@anyscale.com>
…0921) This PR documents Ray Data job-level checkpointing functionality added in ray-project#59409. Adds documentation explaining job-level checkpointing and its application to offline batch inference, including configuration examples. **Sections modified:** - Execution Configurations - End-to-end: Offline Batch Inference Supersedes ray-project#60289 Fixes ray-project#60250 --------- Signed-off-by: Abhishek Kumar <anonyomoushunter@gmail.com> Signed-off-by: “Alex <alexchien130@gmail.com> Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: Abhishek Kumar <anonyomoushunter@gmail.com> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Balaji Veeramani <balaji@anyscale.com> Signed-off-by: Muhammad Saif <2024BBIT200@student.Uet.edu.pk>
…0921) This PR documents Ray Data job-level checkpointing functionality added in ray-project#59409. Adds documentation explaining job-level checkpointing and its application to offline batch inference, including configuration examples. **Sections modified:** - Execution Configurations - End-to-end: Offline Batch Inference Supersedes ray-project#60289 Fixes ray-project#60250 --------- Signed-off-by: Abhishek Kumar <anonyomoushunter@gmail.com> Signed-off-by: “Alex <alexchien130@gmail.com> Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: Abhishek Kumar <anonyomoushunter@gmail.com> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Balaji Veeramani <balaji@anyscale.com> Signed-off-by: Adel Nour <ans9868@nyu.edu>
…0921) This PR documents Ray Data job-level checkpointing functionality added in ray-project#59409. Adds documentation explaining job-level checkpointing and its application to offline batch inference, including configuration examples. **Sections modified:** - Execution Configurations - End-to-end: Offline Batch Inference Supersedes ray-project#60289 Fixes ray-project#60250 --------- Signed-off-by: Abhishek Kumar <anonyomoushunter@gmail.com> Signed-off-by: “Alex <alexchien130@gmail.com> Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: Abhishek Kumar <anonyomoushunter@gmail.com> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Balaji Veeramani <balaji@anyscale.com> Signed-off-by: peterxcli <peterxcli@gmail.com>
This PR documents Ray Data job-level checkpointing functionality added in #59409.
Adds documentation explaining job-level checkpointing and its application to offline batch inference, including configuration examples.
Sections modified:
- Execution Configurations
- End-to-end: Offline Batch Inference
Supersedes #60289
Fixes #60250
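
For readers unfamiliar with the idea, the following is a conceptual sketch of why a unique ID column matters when a job resumes from a checkpoint. It is plain Python and makes no claim about how Ray Data implements the feature: the function `run_with_checkpoint`, the JSON file of finished IDs, and the `process` callback are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def run_with_checkpoint(rows, id_column, checkpoint_file, process):
    """Illustrative only: process rows, skipping IDs already recorded as done."""
    path = Path(checkpoint_file)
    done = set(json.loads(path.read_text())) if path.exists() else set()
    results = []
    for row in rows:
        if row[id_column] in done:
            continue  # Already handled in an earlier run.
        results.append(process(row))
        done.add(row[id_column])
        # Persist progress after each row so an interrupted run can resume.
        path.write_text(json.dumps(sorted(done)))
    return results


rows = [{"id": i, "value": i * 10} for i in range(4)]
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "done.json"
    # First run handles only part of the data, as if the job had stopped early.
    first = run_with_checkpoint(rows[:2], "id", ckpt, lambda r: r["value"] + 1)
    # Second run sees the full data and skips the two rows already recorded.
    second = run_with_checkpoint(rows, "id", ckpt, lambda r: r["value"] + 1)
print(first, second)  # [1, 11] [21, 31]
```

If two rows shared an ID, the second would be skipped as already done, which is one way to see why the review asks for unique values.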