Error Handling: propagate status for ReleaseGilAndTransferData and XlaDataToTensors. #9431

Merged: ysiraichi merged 2 commits into master from ysiraichi/propagate-status-for-oom on Jul 29, 2025

@ysiraichi
Collaborator

This PR refactors our error handling by replacing GetValueOrThrow with proper status propagation using absl::StatusOr<T> and XLA_ASSIGN_OR_RETURN macros.

Key Changes:

  • ReleaseGilAndTransferData Function:

    • Updated the function signature to return absl::StatusOr<std::vector<xla::Literal>>.
    • Replaced GetComputationClientOrDie() with GetComputationClient().
    • Utilized XLA_ASSIGN_OR_RETURN for client acquisition and TransferFromDevice calls.
    • Updated callers in tensor_util.cpp and xla_graph_executor.cpp to handle the new StatusOr<T> return type.
  • XlaDataToTensors Function:

    • Modified the function signature to return absl::StatusOr<std::vector<at::Tensor>>.
    • Replaced GetValueOrThrow with XLA_ASSIGN_OR_RETURN for the ReleaseGilAndTransferData call.
    • Updated all callers (including XLATensor::ToTensor, test_xla_sharding.cpp, init_python_bindings.cpp, and xla_backend_impl.cpp) to correctly handle the StatusOr<T> return type.
    • Added necessary status.h includes to xla_backend_impl.cpp and test_xla_sharding.cpp.
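The propagation pattern described in the bullets above can be sketched roughly as follows. This is a self-contained illustration, not the code from this PR: `Status`, `StatusOr`, `SKETCH_ASSIGN_OR_RETURN`, `TransferFromDevice`, `DataToTensors`, and `ValueOrThrowSketch` are all hypothetical stand-ins written for this example, and the real `absl::StatusOr<T>` and `XLA_ASSIGN_OR_RETURN` have richer interfaces. The simplified macro here takes a bare variable name.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for absl::Status: an empty message means OK.
struct Status {
  std::string message;
  bool ok() const { return message.empty(); }
};

// Hypothetical stand-in for absl::StatusOr<T>: a value or an error status.
template <typename T>
class StatusOr {
 public:
  StatusOr(T value) : value_(std::move(value)) {}
  StatusOr(Status status) : status_(std::move(status)) {}
  bool ok() const { return value_.has_value(); }
  const Status& status() const { return status_; }
  T& value() {
    assert(ok());  // Callers must check ok() before reading the value.
    return *value_;
  }

 private:
  std::optional<T> value_;
  Status status_;
};

// Simplified stand-in for an assign-or-return macro: evaluates `expr`,
// returns its status to the caller on failure, otherwise binds `lhs`.
#define SKETCH_ASSIGN_OR_RETURN(lhs, expr)      \
  auto lhs##_or = (expr);                       \
  if (!lhs##_or.ok()) return lhs##_or.status(); \
  auto lhs = std::move(lhs##_or.value())

// Hypothetical transfer step: fails when `count` exceeds `capacity`.
StatusOr<std::vector<int>> TransferFromDevice(std::size_t count,
                                              std::size_t capacity) {
  if (count > capacity) return Status{"RESOURCE_EXHAUSTED: out of memory"};
  return std::vector<int>(count, 1);
}

// The caller returns StatusOr as well, so the error propagates upward
// instead of being thrown in the middle of the pipeline.
StatusOr<std::vector<std::string>> DataToTensors(std::size_t count,
                                                 std::size_t capacity) {
  SKETCH_ASSIGN_OR_RETURN(literals, TransferFromDevice(count, capacity));
  std::vector<std::string> tensors;
  for (int v : literals) tensors.push_back("tensor:" + std::to_string(v));
  return tensors;
}

// At an outer boundary, a status can still be turned into an exception.
template <typename T>
T ValueOrThrowSketch(StatusOr<T> result) {
  if (!result.ok()) throw std::runtime_error(result.status().message);
  return std::move(result.value());
}
```

In this sketch, inner functions only return statuses, and the decision to throw is left to the outermost caller, which is one way an exception-based public API could stay unchanged while the internals switch to status propagation.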

These modifications align with existing status propagation patterns in the codebase, as seen in pjrt_registry.cpp, and maintain API-level backward compatibility while improving internal error handling within the tensor conversion pipeline.

@ysiraichi force-pushed the ysiraichi/propagate-status-for-oom branch from 9d505e7 to 5d4742b on July 1, 2025 16:44
@ysiraichi force-pushed the ysiraichi/status-for-oom-errors branch from 247fdf5 to b390a61 on July 1, 2025 18:11
@ysiraichi force-pushed the ysiraichi/propagate-status-for-oom branch from 5d4742b to 40a75d7 on July 1, 2025 18:11
@ysiraichi force-pushed the ysiraichi/status-for-oom-errors branch from b390a61 to 821c384 on July 1, 2025 18:15
@ysiraichi force-pushed the ysiraichi/propagate-status-for-oom branch 2 times, most recently from b0e25da to 97ef4c1 on July 3, 2025 14:41
@ysiraichi force-pushed the ysiraichi/status-for-oom-errors branch from 821c384 to 08c5ecd on July 3, 2025 14:41
@ysiraichi force-pushed the ysiraichi/propagate-status-for-oom branch from 97ef4c1 to de09876 on July 3, 2025 15:42
@ysiraichi force-pushed the ysiraichi/status-for-oom-errors branch 3 times, most recently from 34abde6 to 103cd0f on July 24, 2025 12:40
@ysiraichi force-pushed the ysiraichi/propagate-status-for-oom branch from de09876 to bfeb70a on July 25, 2025 14:51
@ysiraichi changed the base branch from ysiraichi/status-for-oom-errors to master on July 25, 2025 15:11
@ysiraichi marked this pull request as ready for review on July 28, 2025 12:46
@ysiraichi force-pushed the ysiraichi/propagate-status-for-oom branch from bfeb70a to f64ae9e on July 28, 2025 12:48
Collaborator

@zhanyong-wan left a comment

Thanks!

@ysiraichi merged commit 1ed6b46 into master on Jul 29, 2025
23 of 24 checks passed