Conversation
I was able to compile by adding the following patch to OpenXLA:

Thank you @ysiraichi! I added this patch for now.
The persistent cache test is failing on GPU due to a deserialization issue. Skipping the test for now; will file a GitHub issue for this.
ManfeiBai left a comment
Thanks for the amazing work, it's a really huge change to adopt in one PR. LGTM.
Thanks @lsy323 for updating the pin. Regarding the paged_attention hang, could you update this line (`xla/torch_xla/experimental/custom_kernel.py`, line 1212 at c044c69): `step = torch.ones((1,), dtype=torch.int32).to("xla")`? It should make the test pass. I tested locally.
Thanks @vanbasten23! Updated the PR. Also, do you mind elaborating a bit on this?
#8908 accidentally enabled some pallas tests on CPU, which is not supported
Yeah, jax-ml/jax@8c73799 made a change (it's not a bug but a valid change). As a result, the torch_xla wrapper needs to change accordingly.
Accommodate the following changes:
- `xla::Shape::rank()` is renamed to `xla::Shape::dimensions_size()`
- `xla::Shape` ctor