[torch] Preload rocprofiler-sdk to fix nightly smoketests #4065

Merged
darren-amd merged 2 commits into main from users/darren-amd/torch-add-rocprofsdk
Mar 19, 2026

@darren-amd
Contributor

Motivation

Fixes #3962

Technical Details

  • Adds rocprofiler-sdk to LINUX_LIBRARY_PRELOADS in build_prod_wheels.py so that librocprofiler-sdk.so is loaded.
  • Registers rocprofiler-sdk as a LibraryEntry in _dist_info.py so the rocm_sdk package can resolve the name to the actual .so file.
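
For readers unfamiliar with the preload mechanism, here is a minimal sketch of the idea: a registry maps a logical library name to a `.so` file name, and a preload list names the libraries to load eagerly. The `LibraryEntry` fields, the `REGISTRY` dict, and the `resolve`/`preload` helpers below are hypothetical stand-ins for illustration only; the actual definitions in `_dist_info.py` and `build_prod_wheels.py` may look quite different. Only `ctypes.CDLL` and `ctypes.RTLD_GLOBAL` are real standard-library APIs.

```python
import ctypes
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    """Hypothetical registry record; not the real class from _dist_info.py."""
    shortname: str  # logical name used in the preload list
    so_name: str    # file name of the shared object to load


# Hypothetical registry: logical name -> entry.
REGISTRY = {
    "rocprofiler-sdk": LibraryEntry("rocprofiler-sdk", "librocprofiler-sdk.so"),
}

# Hypothetical preload list, mirroring the name used in the PR.
LINUX_LIBRARY_PRELOADS = ["rocprofiler-sdk"]


def resolve(shortname):
    """Return the .so file name for a logical name, or raise KeyError."""
    if shortname not in REGISTRY:
        raise KeyError(f"library {shortname!r} is not registered")
    return REGISTRY[shortname].so_name


def preload(names, loader=ctypes.CDLL):
    """Try to load each named library globally; return the names that loaded."""
    loaded = []
    for name in names:
        so_name = resolve(name)
        try:
            # RTLD_GLOBAL makes the library's symbols visible to later loads.
            loader(so_name, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            continue  # library not present on this machine
        loaded.append(so_name)
    return loaded
```

The sketch shows why both changes are needed together: listing a name in the preload list is not enough unless the registry can also resolve that name to a file.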

Test Plan

  • Verify that ROCm builds, that the nightly smoke tests pass, and that running the torch tests does not crash

Test Result

Submission Checklist

@darren-amd
Contributor Author

Thanks Scott! Waiting on CI to pass before merging.

@darren-amd darren-amd merged commit 6a55012 into main Mar 19, 2026
158 checks passed
@darren-amd darren-amd deleted the users/darren-amd/torch-add-rocprofsdk branch March 19, 2026 21:13
@github-project-automation github-project-automation bot moved this from TODO to Done in TheRock Triage Mar 19, 2026
chiranjeevipattigidi pushed a commit that referenced this pull request Mar 23, 2026
## Motivation

Fixes #3962

- The `rocprofiler-sdk` shared library is not being preloaded, so
`librocprofiler-sdk.so.1` is missing at runtime. It is now needed because
the PyTorch `kineto` submodule bump switched from `roctracer` to
`rocprofiler-sdk`: pytorch/pytorch#177101
- `test_mempool_expandable` was enabled on ROCm by
pytorch/pytorch#173330. This test requires the rocm[devel] packages and,
rather than just failing, was causing a crash:
https://github.com/ROCm/TheRock/actions/runs/23164829934/job/67321547840.
It is already skipped for other torch versions.
- Also skip `test_mempool_empty_cache_inactive`,
`test_mempool_limited_memory_with_allocator`,
`test_deleted_mempool_not_used_on_oom`, and
`test_mempool_ctx_multithread` as these also require building
`dummy_allocator` and are skipped in other torch versions.

## Technical Details

- Adds `rocprofiler-sdk` to `LINUX_LIBRARY_PRELOADS` in
`build_prod_wheels.py` so that `librocprofiler-sdk.so` is loaded.
- Registers `rocprofiler-sdk` as a `LibraryEntry` in `_dist_info.py` so
the `rocm_sdk` package can resolve the name to the actual `.so` file.

## Test Plan

- Verify that ROCm builds, that the nightly smoke tests pass, and that
running the torch tests does not crash

## Test Result

- ROCm builds successfully:
https://github.com/ROCm/TheRock/actions/runs/23152017500
- Smoke tests pass for torch nightly and the runner is not crashing:
https://github.com/ROCm/TheRock/actions/runs/23253453219

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Development

Successfully merging this pull request may close these issues.

[Issue] Linux PyTorch smoketests failing due to Missing librocprofiler-sdk.so.1 library during GPU info retrieval
