
Use cu118 with cudnn >= 8.6 in docker file #23339

Merged: ydshieh merged 2 commits into main from new_cudnn on May 12, 2023


@ydshieh (Collaborator) commented May 12, 2023

What does this PR do?

Since #22759 and #23293, we use TF 2.12. However, TF 2.12 requires CUDA 11.8 and cuDNN 8.6 (or higher) to work.
Currently, our CI fails with errors such as

Loaded runtime CuDNN library: 8.5.0 but source was compiled with: 8.6.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.

and `UNIMPLEMENTED: DNN library is not found.`
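The first error message states the cuDNN compatibility rule: the runtime library must have the same major version as the one TensorFlow was compiled against, and an equal or higher minor version. A minimal Python sketch of that rule follows. The function names are hypothetical helpers for illustration (not part of TensorFlow or of the CI scripts), and the sketch only models the major/minor check quoted in the log; patch versions are ignored.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a dotted version string such as "8.6.0" into integer parts."""
    parts = [int(p) for p in version.split(".")]
    # Pad missing components with zeros so "8.6" is treated as "8.6.0".
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def cudnn_is_compatible(runtime: str, compiled: str) -> bool:
    """Hypothetical check mirroring the rule in the error message:
    same major version, and runtime minor >= compiled minor."""
    rt_major, rt_minor, _ = parse_version(runtime)
    ct_major, ct_minor, _ = parse_version(compiled)
    return rt_major == ct_major and rt_minor >= ct_minor


if __name__ == "__main__":
    # The situation from the CI log: cuDNN 8.5.0 loaded, TF built against 8.6.0.
    print(cudnn_is_compatible("8.5.0", "8.6.0"))  # False
    # A newer 8.x runtime satisfies the rule.
    print(cudnn_is_compatible("8.9.2", "8.6.0"))  # True
    # A different major version does not.
    print(cudnn_is_compatible("9.0.0", "8.6.0"))  # False
```

Under this rule, the 8.5.0 library in the old base image can never satisfy a TF build compiled against 8.6.0, which is why the base image itself has to change.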

This PR switches some Docker files to a new base image. With this new base image, we also have to install the cu118 build of torch.
The other Docker files (the DeepSpeed ones) are not changed in this PR; it is better to see how this change behaves first and then apply it to those files.
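To confirm which CUDA build of torch ended up in an image (for example, that the cu118 wheel was installed), one option is to print the versions torch reports at runtime. The sketch below uses `torch.version.cuda` and `torch.backends.cudnn.version()`. The helper name `describe_torch_cuda_build` is hypothetical and is not part of the PR's Docker files.

```python
def describe_torch_cuda_build() -> str:
    """Return a one-line summary of the CUDA/cuDNN versions torch reports."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    # CUDA version the wheel was built against, e.g. "11.8" for a cu118 wheel;
    # None for CPU-only builds.
    cuda = torch.version.cuda
    # cuDNN version as an integer (for cuDNN 8.x, 8600 means 8.6.0);
    # None if cuDNN is not available.
    cudnn = torch.backends.cudnn.version()
    return f"torch {torch.__version__}, CUDA {cuda}, cuDNN {cudnn}"


if __name__ == "__main__":
    print(describe_torch_cuda_build())
```

Running this inside the built image is a quick sanity check before launching the full test suite.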

I ran some of the previously failing tests and they pass now. I still need to check whether the whole (doctest) suite passes on Monday.

ydshieh requested a review from @amyeroberts on May 12, 2023 16:15
@HuggingFaceDocBuilderDev commented May 12, 2023

The documentation is not available anymore as the PR was closed or merged.

@amyeroberts (Contributor) left a comment

Thanks for updating!

Given we're fixing TF and PT versions here, I'm assuming this is fine. I just have a question about future support - do we make any promises about maintaining support for cu117?

@ydshieh (Collaborator, Author) commented May 12, 2023

We have the so-called Past CI, which runs previous torch and TensorFlow versions together with the environments we set up for them.

In this particular case, since the torch version is unchanged but we now install the cu118 build, after this PR we no longer have any CI for torch 2.0 with cu117. So there is no real promise. But this has always been the case for our CI: we fix a CUDA and cuDNN environment and keep it until we really have to change it 🙂

ydshieh merged commit cf11493 into main May 12, 2023
ydshieh deleted the new_cudnn branch May 12, 2023 19:58
gojiteji pushed a commit to gojiteji/transformers that referenced this pull request Jun 5, 2023
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>