[dashboard] Fix retrieving IP address from the GPUProfilingManager on the dashboard agent#53807
Conversation
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
There was a problem hiding this comment.
Pull Request Overview
This PR resolves an issue with retrieving the IP address on MacOS by changing the implementation to pass the IP address directly from the dashboard agent instead of determining it locally. Key changes include:
- Passing a new ip_address parameter to GpuProfilingManager in the tests.
- Updating reporter_agent.py to construct GpuProfilingManager with the ip_address.
- Removing the socket-based IP address resolution logic from gpu_profile_manager.py in favor of the passed ip_address.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/ray/dashboard/modules/reporter/tests/test_gpu_profiler_manager.py | Tests updated to pass a mock IP address instead of relying on local resolution. |
| python/ray/dashboard/modules/reporter/reporter_agent.py | GpuProfilingManager construction updated to include the ip_address parameter. |
| python/ray/dashboard/modules/reporter/gpu_profile_manager.py | init modified to accept ip_address and removed socket usage for IP retrieval. |
Comments suppressed due to low confidence (1)
python/ray/dashboard/modules/reporter/gpu_profile_manager.py:63
- [nitpick] Update the constructor's docstring to document the new ip_address parameter for improved clarity for future maintainers.
def __init__(self, profile_dir_path: str, ip_address: str):
| MOCK_IP_ADDRESS = "127.0.0.1" | ||
|
|
There was a problem hiding this comment.
[nitpick] Consider refactoring the use of MOCK_IP_ADDRESS into a dedicated fixture to reduce repetition and make the tests more maintainable.
| MOCK_IP_ADDRESS = "127.0.0.1" | |
| @pytest.fixture | |
| def mock_ip_address(): | |
| yield "127.0.0.1" |
edoakes
left a comment
There was a problem hiding this comment.
@justinvyu separately, the naming in these files is sloppy/inconsistent, suggest to clean it up:
- Should be
GPUProfilingManager, notGpuProfilingManagerbecauseGPUis an initialism - The file is called
gpu_profile_manager.py, the class isGpuProfilingManager, and the test isgpu_profiler_manager.py. Stick to "profile" or "profiling" and make it consistent.
python/ray/dashboard/modules/reporter/tests/test_gpu_profiler_manager.py
Outdated
Show resolved
Hide resolved
|
@justinvyu please also include details about the error this fixes (including the traceback) and any manual testing steps you took to verify that it fixes the issue. |
|
@edoakes I'm not able to reproduce this on my machine -- I tried the |
Ah I'm realizing now I was running it against the version of that test in this branch: #53802 Not sure the difference, but it only failed on that branch for me as well. |
|
Prior to this change, it failed ~1/10 times with this traceback: With the change in this PR, I haven't observed any failures after >100 runs. |
|
|
||
|
|
||
| class GpuProfilingManager: | ||
| class GPUProfilingManager: |
There was a problem hiding this comment.
this renaming seems to not related to the bug fixing?
There was a problem hiding this comment.
Yeah it's unrelated to make some naming more conventional
There was a problem hiding this comment.
maybe consider splitting the rename to another PR? or at least when cherrypicking, only cherrypick the fix without the renaming?
There was a problem hiding this comment.
Sounds good.
cc @edoakes Going to revert the naming changes for this PR and do them in a followup to have a minimal fix for cherry-pick.
There was a problem hiding this comment.
yep, better PR hygiene to split behavior changes and refactoring
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
5073a7a to
3343023
Compare
…on the dashboard agent (ray-project#53807) The current code to get the current IP address errors for some versions of MacOS. Instead, we can pass in the IP address from the dashboard agent directly. --------- Signed-off-by: Justin Yu <justinvyu@anyscale.com>
…on the dashboard agent (#53807) The current code to get the current IP address errors for some versions of MacOS. Instead, we can pass in the IP address from the dashboard agent directly. --------- Signed-off-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
…on the dashboard agent (#53807) The current code to get the current IP address errors for some versions of MacOS. Instead, we can pass in the IP address from the dashboard agent directly. --------- Signed-off-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
…ingManager on the dashboard agent (ray-project#53817) This is a cherry-pick of ray-project#53807 Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Summary
The current code to get the current IP address errors for some versions of MacOS. Instead, we can pass in the IP address from the dashboard agent directly.
This code flakily fails on some Mac versions with this error: