Support autocheckpointing in CheckpointManager #5753

Merged
jonb377 merged 1 commit into master from jonbolin/autochkpt
Nov 8, 2023

@jonb377 jonb377 commented Nov 1, 2023

When a preemption is detected, the CheckpointManager should automatically take a checkpoint. This change enables that behavior by querying the new _sync_point_reached APIs, introduced in #5733, on each should_save call.
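
To illustrate the behavior described above, here is a minimal, self-contained sketch. It is not the code from this PR: `SketchCheckpointManager`, its `save_interval` parameter, and the injected `sync_point_reached` callable are hypothetical stand-ins, and the callable only imitates the `_sync_point_reached` APIs from #5733, whose real signature is not shown here.

```python
from typing import Callable, List


class SketchCheckpointManager:
    """Toy stand-in for CheckpointManager; not the torch_xla implementation."""

    def __init__(self, save_interval: int,
                 sync_point_reached: Callable[[int], bool]):
        if save_interval <= 0:
            raise ValueError("save_interval must be positive")
        self.save_interval = save_interval
        # Hypothetical hook standing in for the _sync_point_reached APIs.
        self._sync_point_reached = sync_point_reached

    def should_save(self, step: int) -> bool:
        # Query the preemption hook on every call, then fall back to the
        # regular interval-based schedule.
        preempted = self._sync_point_reached(step)
        return preempted or step % self.save_interval == 0


def simulate(preempt_at: int, steps: int = 10,
             save_interval: int = 4) -> List[int]:
    """Return the steps at which the sketch manager would save."""
    mgr = SketchCheckpointManager(save_interval,
                                  lambda step: step == preempt_at)
    return [s for s in range(1, steps + 1) if mgr.should_save(s)]
```

With a save interval of 4 and a simulated preemption at step 6, `simulate(6)` reports a save at step 6 in addition to the scheduled saves at steps 4 and 8.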

@jonb377 jonb377 requested a review from alanwaketan November 1, 2023 19:06
@jonb377 jonb377 self-assigned this Nov 1, 2023
@jonb377 jonb377 force-pushed the jonbolin/autochkpt branch from cf4861f to d4c757a Compare November 6, 2023 03:59
@jonb377 jonb377 requested a review from yeounoh November 7, 2023 01:28

@yeounoh yeounoh left a comment

LGTM


@alanwaketan alanwaketan left a comment

LGTM.


jonb377 commented Nov 7, 2023

Thanks @alanwaketan and @yeounoh! I'll merge after TPU CI.

@jonb377 jonb377 merged commit 4664380 into master Nov 8, 2023
@jonb377 jonb377 deleted the jonbolin/autochkpt branch November 8, 2023 04:50
mbzomowski pushed a commit to mbzomowski-test-org/xla that referenced this pull request Nov 16, 2023
ManfeiBai pushed a commit that referenced this pull request Nov 29, 2023
ManfeiBai pushed a commit that referenced this pull request Nov 29, 2023
chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023
golechwierowicz pushed a commit that referenced this pull request Jan 12, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024
3 participants