This repository was archived by the owner on Nov 3, 2023. It is now read-only.

Fix for fractional GPU #125

Merged
amogkam merged 16 commits into ray-project:main from amogkam:fractional-gpu-fix on Apr 11, 2022

@amogkam
Collaborator

@amogkam amogkam commented Feb 19, 2022

Closes #124.

Fixes the device calculation to account for fractional GPUs. It also raises a warning advising against fractional GPUs in the multi-worker case, since sharing a GPU across workers will often lead to failures with NCCL training.
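For readers skimming the thread, here is a rough sketch of the idea, not the code merged in this PR. The helper names `device_index_for_worker` and `warn_if_gpu_shared` are hypothetical, and the sketch assumes that a fractional allocation divides a GPU evenly (for example 0.5 or 0.25 GPUs per worker).

```python
import warnings


def device_index_for_worker(local_rank: int, gpus_per_worker: float) -> int:
    """Return the index of the first GPU a worker should use (hypothetical helper)."""
    if gpus_per_worker <= 0:
        raise ValueError("gpus_per_worker must be positive")
    if gpus_per_worker < 1:
        # Fractional case: several workers share one GPU.
        # Assumes 1 / gpus_per_worker is a whole number, e.g. 0.5 -> 2 workers per GPU.
        workers_per_gpu = round(1 / gpus_per_worker)
        return local_rank // workers_per_gpu
    # Whole-GPU case: each worker owns a contiguous block of GPUs.
    return local_rank * int(gpus_per_worker)


def warn_if_gpu_shared(num_workers: int, gpus_per_worker: float) -> bool:
    """Warn when several workers would share a GPU; return whether a warning was raised."""
    if num_workers > 1 and gpus_per_worker < 1:
        warnings.warn(
            "Sharing a GPU across workers often fails with NCCL training; "
            "consider using at least 1 GPU per worker.",
            UserWarning,
        )
        return True
    return False


if __name__ == "__main__":
    # With 0.5 GPUs per worker, local ranks 0 and 1 share GPU 0, ranks 2 and 3 share GPU 1.
    for rank in range(4):
        print(rank, device_index_for_worker(rank, 0.5))
```

A single worker with a fractional GPU does not trigger the warning in this sketch, since no other worker shares its device.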

The test was run manually and passes.

@amogkam amogkam mentioned this pull request Feb 19, 2022
Member

@bveeramani bveeramani left a comment

Minor nitpick but otherwise LGTM

@amogkam amogkam requested a review from bveeramani April 7, 2022 01:50
Member

@bveeramani bveeramani left a comment

LGTM

@amogkam amogkam merged commit c8bcae7 into ray-project:main Apr 11, 2022

Development

Successfully merging this pull request may close these issues.

Fractional GPU training

3 participants