Conversation
@@ -1,19 +0,0 @@
- upstream CI will fail without this
Do you know why we were able to remove this patch? Is it because we updated the compiler in the CI?
I think we need to kick off an upstream CI build targeting this branch and see whether CI will pass.
Yeah, it turns out I do still need those patches... otherwise the training job hangs.
@@ -1,14 +0,0 @@
- diff --git a/xla/service/gpu/gpu_executable.cc b/xla/service/gpu/gpu_executable.cc
Same question as above
| "//openxla_patches:gpu_build_file.diff", | ||
| ], | ||
| strip_prefix = "xla-97a5f819faf9ff793b7ba68ff1f31f74f9459c18", | ||
| strip_prefix = "xla-7a19856d74569fd1f765cd03bdee84e3b1fdc579", |
Can you also update the libtpu dependency in setup.py to the same date as this commit?
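For context, here is a minimal sketch of how such a pin is typically expressed in the WORKSPACE via http_archive; the URL, attribute set, and patch list below are assumptions for illustration, not this repo's exact contents:

```python
# Sketch only: the real WORKSPACE entry may differ in patches, checksums, and URL.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "xla",
    patches = [
        "//openxla_patches:gpu_build_file.diff",  # patch name taken from the diff above
    ],
    strip_prefix = "xla-7a19856d74569fd1f765cd03bdee84e3b1fdc579",
    urls = [
        "https://github.com/openxla/xla/archive/7a19856d74569fd1f765cd03bdee84e3b1fdc579.tar.gz",
    ],
)
```

Keeping the libtpu date in setup.py aligned with this commit date avoids mixing a newer XLA pin with an older TPU runtime.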
Tested on v4-8 with command; result: Old: ===
| "@tsl//tsl/platform:casts", | ||
| "@tsl//tsl/platform:errors", | ||
| - ] + if_cuda([ | ||
| + ] + if_cuda_or_rocm([ |
Thanks!
This patch looks like it's for openxla/xla@9938bdb, so I'm curious why the modification of load("//xla/stream_executor:build_defs.bzl", "if_cuda_or_rocm", "if_gpu_is_configured") was skipped.
Since GPU CI failed with the same issue, RuntimeError: torch_xla/csrc/device.cpp:72 : Invalid device specification: CUDA:0, are they related too?
No particular reason.
I started importing on Oct 3 and this change landed on Oct 4.
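To illustrate the shape of the change in that patch, here is a sketch of guarding GPU-only deps with if_cuda_or_rocm instead of if_cuda so ROCm builds pick them up as well; the target and GPU-only dependency names are made up for illustration, not the actual BUILD contents:

```python
# Hypothetical BUILD fragment; only the if_cuda -> if_cuda_or_rocm switch mirrors the patch above.
load("//xla/stream_executor:build_defs.bzl", "if_cuda_or_rocm")

cc_library(
    name = "gpu_executable",  # hypothetical target name
    deps = [
        "@tsl//tsl/platform:casts",
        "@tsl//tsl/platform:errors",
    ] + if_cuda_or_rocm([
        ":gpu_runtime_support",  # hypothetical GPU-only dependency
    ]),
)
```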
Force-pushed from 6c59c2c to 3f57cd1
Force-pushed from b97aa10 to 2dc72ab
Force-pushed from 2dc72ab to af8bb2f
| "//openxla_patches:gpu_topk_rewriter.diff", | ||
| ], | ||
| strip_prefix = "xla-97a5f819faf9ff793b7ba68ff1f31f74f9459c18", | ||
| strip_prefix = "xla-51b59cfb1999c6f1b3ec59851675044b2c502aae", |
Thanks for moving the head to this commit!
base_dir = os.path.dirname(os.path.abspath(__file__))

- _libtpu_version = '0.1.dev20230825'
+ _libtpu_version = '0.1.dev20231009'
I suspect this should be 0.1.dev20231010 in order to include the OpenXLA commit you specified.
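For reference, a sketch of how a dated _libtpu_version typically feeds into the nightly wheel pin; the URL template and the surrounding variable name are assumptions, not necessarily what this setup.py does:

```python
# Sketch: the nightly libtpu wheel is usually selected by the date embedded in the version string.
_libtpu_version = '0.1.dev20231010'
_libtpu_storage_path = (
    'https://storage.googleapis.com/cloud-tpu-tpuvm-artifacts/wheels/libtpu-nightly/'
    f'libtpu_nightly-{_libtpu_version}-py3-none-any.whl'
)
```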
alanwaketan left a comment:
LGTM. Let me enable TPU CI and wait until it finishes.
Open XLA pin update - updated to 20231010