Skip to content

Error Handling: refactor XlaCoordinator to use status types.#9386

Merged
ysiraichi merged 22 commits intomasterfrom
ysiraichi/status-qol-functions
Jul 1, 2025
Merged

Error Handling: refactor XlaCoordinator to use status types.#9386
ysiraichi merged 22 commits intomasterfrom
ysiraichi/status-qol-functions

Conversation

@ysiraichi
Copy link
Copy Markdown
Collaborator

@ysiraichi ysiraichi commented Jun 18, 2025

This PR does 3 different things:

  1. Adds functions and macros to be used with Status and StatusOr<T> constructs, so as to make it easier to propagate non-ok status.
  2. Introduces XLA_SHOW_CPP_ERROR_CONTEXT environment variable for toggling error context.
  3. Refactors XlaCoordinator to use the new status propagation constructs as an example

Mainly, inspired by tensorflow implementation, it introduces the following status propagation functions:

  • XLA_ASSIGN_OR_RETURN(LHS, REXPR, ...) , which either assigns the value held by REXPR to LHS if it holds a non-ok status code, or return the non-ok status.
  • XLA_RETURN_IF_ERROR(EXPR, ...), which early-returns if EXPR is a non-ok status, propagating the error to the caller.
  • XLA_ERROR_WITH_LOCATION(STATUS), which builds a new absl::Status from STATUS, by maybe appending the current file location

C++ Error Handling

Idea: by default, print only the user targeted message, with no extra C++ details. Then, by setting the XLA_SHOW_CPP_ERROR_CONTEXT environment variable, also print extra context information, such as C++ source location, and other contextual messages.

  • Callers may overwrite the error message by specifying the last optional argument of XLA_ASSIGN_OR_RETURN and XLA_RETURN_IF_ERROR. Rationale: they might have more context to create a more user-friendly message.
  • If XLA_SHOW_CPP_ERROR_CONTEXT is set, the callee overwritten error messages are shown in the following lines

Example errors:

  • When XLA_SHOW_CPP_ERROR_CONTEXT=0 (default):
RuntimeError: <message>
  • When XLA_SHOW_CPP_ERROR_CONTEXT=1:
RuntimeError: <message> (at <file>:<line>)
From Error: <previous-message> (at <file>:<line>)
...

@ysiraichi
Copy link
Copy Markdown
Collaborator Author

Should be merged only after #9384.

@ysiraichi ysiraichi marked this pull request as draft June 18, 2025 20:45
@ysiraichi ysiraichi force-pushed the ysiraichi/status-qol-functions branch 3 times, most recently from 3bd1d2c to 1b2ebca Compare June 19, 2025 16:15
@ysiraichi ysiraichi force-pushed the ysiraichi/status-for-getcomputationclient branch from a351a69 to 7e0da02 Compare June 19, 2025 16:34
@ysiraichi ysiraichi force-pushed the ysiraichi/status-qol-functions branch 5 times, most recently from eaac1c3 to 87d2d12 Compare June 21, 2025 16:08
@zhanyong-wan
Copy link
Copy Markdown
Collaborator

This PR is a draft? Is it ready for review? If yes, please take it out of draft. Thanks!

@ysiraichi
Copy link
Copy Markdown
Collaborator Author

This is a draft. I'm still working on it.

@ysiraichi ysiraichi force-pushed the ysiraichi/status-qol-functions branch 3 times, most recently from 510dcce to a1b26a4 Compare June 23, 2025 18:31
@ysiraichi ysiraichi force-pushed the ysiraichi/status-qol-functions branch 2 times, most recently from a78bc1a to 4b71836 Compare June 25, 2025 14:56
@ysiraichi ysiraichi changed the base branch from ysiraichi/status-for-getcomputationclient to master June 25, 2025 14:56
@ysiraichi ysiraichi changed the title Add Status and StatusOr<T> QOL functions. Error Handling: refactor XlaCoordinator to use status types. Jun 25, 2025
@ysiraichi ysiraichi force-pushed the ysiraichi/status-qol-functions branch from 4b71836 to 55214ac Compare June 25, 2025 15:38
Comment thread torch_xla/csrc/runtime/xla_coordinator.cpp
@ysiraichi ysiraichi marked this pull request as ready for review June 25, 2025 16:51
Copy link
Copy Markdown
Collaborator

@zhanyong-wan zhanyong-wan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Comment thread torch_xla/csrc/status.h Outdated
Comment thread torch_xla/csrc/runtime/env_vars.h Outdated
Comment thread torch_xla/csrc/runtime/xla_coordinator.h
Comment thread torch_xla/csrc/runtime/xla_coordinator.h
Comment thread torch_xla/csrc/runtime/xla_coordinator.h Outdated
Comment thread torch_xla/csrc/status.h
Comment thread torch_xla/csrc/status.cpp Outdated
Comment thread torch_xla/csrc/status.cpp Outdated
Comment thread torch_xla/csrc/status.cpp Outdated
Comment thread torch_xla/csrc/status.cpp Outdated
@ysiraichi ysiraichi force-pushed the ysiraichi/status-qol-functions branch from 61b33fe to 83c3e27 Compare June 27, 2025 20:26
Copy link
Copy Markdown
Collaborator

@zhanyong-wan zhanyong-wan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Yukio, in the future, to make it easier for reviewers to track comments, could you leave the comment threads open so that the reviewers can resolve them after verifying that they are addressed properly? Thanks!

Copy link
Copy Markdown
Collaborator

@zhanyong-wan zhanyong-wan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment thread torch_xla/csrc/status.h Outdated
Comment thread torch_xla/csrc/status.h
Comment thread torch_xla/csrc/status.h Outdated
Comment thread torch_xla/csrc/status.h Outdated
Comment thread torch_xla/csrc/status.h Outdated
Comment thread torch_xla/csrc/status.cpp Outdated
Comment thread torch_xla/csrc/runtime/xla_coordinator.h Outdated
Comment thread torch_xla/csrc/status.h Outdated
Comment thread test/cpp/test_status.cpp Outdated
Comment thread test/cpp/test_status.cpp Outdated
Comment thread test/cpp/test_status.cpp Outdated
Comment thread test/cpp/BUILD Outdated
Copy link
Copy Markdown
Collaborator

@zhanyong-wan zhanyong-wan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment thread .github/scripts/run_tests.sh Outdated
@ysiraichi
Copy link
Copy Markdown
Collaborator Author

@zhanyong-wan @ghpvnist Could you resolve the comments, so that I can merge this once the CI is green?

@zhanyong-wan
Copy link
Copy Markdown
Collaborator

@zhanyong-wan @ghpvnist Could you resolve the comments, so that I can merge this once the CI is green?

@ysiraichi , when the reviewer approves the PR, it means that they trust the author can address the open comments satisfactorily. In that case, you can just resolve open comments yourself after addressing them. Thanks!

@ysiraichi ysiraichi merged commit 5b8ddb1 into master Jul 1, 2025
23 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants