[Core] Don't try to gather check_parent_task on Windows. #13700

Merged
simon-mo merged 1 commit into ray-project:master from clarkzinzow:dashboard/hotfix/windows-check-parent-task
Jan 27, 2021

@clarkzinzow clarkzinzow commented Jan 25, 2021

Added Windows check to the site of the check_parent_task gather.

Why are these changes needed?

check_parent_task is only defined on non-Windows systems due to this commit, but it is still passed to asyncio.gather on Windows here, resulting in an UnboundLocalError:

Traceback (most recent call last):
  File "d:\a\ray\ray\python\ray\new_dashboard/agent.py", line 311, in <module>
    loop.run_until_complete(agent.run())
  File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\asyncio\base_events.py", line 587, in run_until_complete
    return future.result()
  File "d:\a\ray\ray\python\ray\new_dashboard/agent.py", line 187, in run
    await asyncio.gather(check_parent_task,
UnboundLocalError: local variable 'check_parent_task' referenced before assignment

This PR adds that Windows check to the site of the check_parent_task gather as well.
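
The failure mode and the guard can be illustrated with a minimal sketch. This is not the actual diff: `_check_parent`, `_serve`, `run_buggy`, and `run_fixed` are hypothetical stand-ins, and the platform check is passed in as a flag (rather than read from `sys.platform`) so that both paths can be exercised on any OS.

```python
import asyncio


async def _check_parent():
    # Hypothetical stand-in for the agent's parent-process check.
    await asyncio.sleep(0)
    return "parent-ok"


async def _serve():
    # Hypothetical stand-in for the agent's other awaited work.
    await asyncio.sleep(0)
    return "served"


async def run_buggy(is_windows):
    if not is_windows:
        check_parent_task = asyncio.ensure_future(_check_parent())
    # On Windows the name above is never bound, so evaluating it here
    # raises UnboundLocalError before gather is even called.
    return await asyncio.gather(check_parent_task, _serve())


async def run_fixed(is_windows):
    # Build the awaitable list conditionally, applying the same platform
    # check at the gather site as at the definition site.
    tasks = [_serve()]
    if not is_windows:
        tasks.insert(0, asyncio.ensure_future(_check_parent()))
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(run_fixed(is_windows=True)))
    print(asyncio.run(run_fixed(is_windows=False)))
```

Collecting the awaitables in a list keeps the definition and the gather behind the same condition, so the two cannot drift apart again.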

Related issue number

Hotfix, no issue.

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@clarkzinzow clarkzinzow changed the title from "[Core] Don't gather check_parent_task on Windows, since it's undefined." to "[Core] Don't try to gather check_parent_task on Windows." Jan 25, 2021
@amogkam amogkam linked an issue Jan 26, 2021 that may be closed by this pull request
@simon-mo

Hmm, this seems to be creating even more spam:


(pid=None) --- Logging error ---
(pid=None) Traceback (most recent call last):
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\logging\handlers.py", line 69, in emit
(pid=None)     if self.shouldRollover(record):
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\logging\handlers.py", line 183, in shouldRollover
(pid=None)     self.stream = self._open()
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\logging\__init__.py", line 1116, in _open
(pid=None)     return open(self.baseFilename, self.mode, encoding=self.encoding)
(pid=None) NameError: name 'open' is not defined
(pid=None) Call stack:
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\site-packages\aiohttp\client.py", line 320, in __del__
(pid=None)     self._loop.call_exception_handler(context)
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\asyncio\base_events.py", line 1645, in call_exception_handler
(pid=None)     self.default_exception_handler(context)
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\asyncio\base_events.py", line 1619, in default_exception_handler
(pid=None)     logger.error('\n'.join(log_lines), exc_info=exc_info)
(pid=None) Message: 'Unclosed client session\nclient_session: <aiohttp.client.ClientSession object at 0x000002AADBACD788>'
(pid=None) Arguments: ()
(pid=None) --- Logging error ---
(pid=None) Traceback (most recent call last):
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\logging\handlers.py", line 69, in emit
(pid=None)     if self.shouldRollover(record):
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\logging\handlers.py", line 183, in shouldRollover
(pid=None)     self.stream = self._open()
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\logging\__init__.py", line 1116, in _open
(pid=None)     return open(self.baseFilename, self.mode, encoding=self.encoding)
(pid=None) NameError: name 'open' is not defined
(pid=None) Call stack:
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\site-packages\aiohttp\client.py", line 320, in __del__
(pid=None)     self._loop.call_exception_handler(context)
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\asyncio\base_events.py", line 1645, in call_exception_handler
(pid=None)     self.default_exception_handler(context)
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\asyncio\base_events.py", line 1619, in default_exception_handler
(pid=None)     logger.error('\n'.join(log_lines), exc_info=exc_info)
(pid=None) Message: 'Unclosed client session\nclient_session: <aiohttp.client.ClientSession object at 0x000002AADBACD788>'
(pid=None) Arguments: ()
(pid=None) --- Logging error ---
(pid=None) Traceback (most recent call last):
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\logging\handlers.py", line 69, in emit
(pid=None)     if self.shouldRollover(record):
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\logging\handlers.py", line 183, in shouldRollover
(pid=None)     self.stream = self._open()
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\logging\__init__.py", line 1116, in _open
(pid=None)     return open(self.baseFilename, self.mode, encoding=self.encoding)
(pid=None) NameError: name 'open' is not defined
(pid=None) Call stack:
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\site-packages\aiohttp\client.py", line 320, in __del__
(pid=None)     self._loop.call_exception_handler(context)
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\asyncio\base_events.py", line 1645, in call_exception_handler
(pid=None)     self.default_exception_handler(context)
(pid=None)   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\asyncio\base_events.py", line 1619, in default_exception_handler
(pid=None)     logger.error('\n'.join(log_lines), exc_info=exc_info)
(pid=None) Message: 'Unclosed client session\nclient_session: <aiohttp.client.ClientSession object at 0x000002AADBACD788>'
(pid=None) Arguments: ()

2021-01-26 00:57:37,125	WARNING worker.py:1107 -- The agent on node fv-az177-56 failed with the following error:
Traceback (most recent call last):
  File "d:\a\ray\ray\python\ray\new_dashboard/agent.py", line 313, in <module>
    loop.run_until_complete(agent.run())
  File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\asyncio\base_events.py", line 587, in run_until_complete
    return future.result()
  File "d:\a\ray\ray\python\ray\new_dashboard/agent.py", line 137, in run
    modules = self._load_modules()
  File "d:\a\ray\ray\python\ray\new_dashboard/agent.py", line 91, in _load_modules
    c = cls(self)
  File "d:\a\ray\ray\python\ray\new_dashboard\modules\reporter\reporter_agent.py", line 72, in __init__
    self._metrics_agent = MetricsAgent(dashboard_agent.metrics_export_port)
  File "d:\a\ray\ray\python\ray\metrics_agent.py", line 76, in __init__
    namespace="ray", port=metrics_export_port)))
  File "d:\a\ray\ray\python\ray\prometheus_exporter.py", line 334, in new_stats_exporter
    options=option, gatherer=option.registry, collector=collector)
  File "d:\a\ray\ray\python\ray\prometheus_exporter.py", line 266, in __init__
    self.serve_http()
  File "d:\a\ray\ray\python\ray\prometheus_exporter.py", line 321, in serve_http
    port=self.options.port, addr=str(self.options.address))
  File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\site-packages\prometheus_client\exposition.py", line 79, in start_wsgi_server
    httpd = make_server(addr, port, app, ThreadingWSGIServer, handler_class=_SilentHandler)
  File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\wsgiref\simple_server.py", line 153, in make_server
    server = server_class((host, port), handler_class)
  File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\socketserver.py", line 452, in __init__
    self.server_bind()
  File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\wsgiref\simple_server.py", line 50, in server_bind
    HTTPServer.server_bind(self)
  File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\http\server.py", line 137, in server_bind
    socketserver.TCPServer.server_bind(self)
  File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\socketserver.py", line 466, in server_bind
    self.socket.bind(self.server_address)
OSError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions

fyrestone commented Jan 27, 2021

OSError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions

This error may be caused by a port conflict. It is not related to check_parent_task.
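
One way to test the port-conflict hypothesis is to probe whether the metrics export port can be bound at all before the agent starts. The sketch below uses only the standard `socket` module. `can_bind` is a hypothetical helper, not part of Ray, and the loopback host is an assumption.

```python
import socket


def can_bind(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Covers "address already in use" as well as permission
            # errors such as WinError 10013.
            return False
    return True


if __name__ == "__main__":
    # Hold a port open, then show that the probe reports it as unavailable.
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    holder.listen(1)
    held_port = holder.getsockname()[1]
    print(held_port, can_bind(held_port))
    holder.close()
```

A probe like this only narrows things down: a `False` result means the bind failed, without saying whether another process holds the port or the OS has reserved it.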

@clarkzinzow clarkzinzow force-pushed the dashboard/hotfix/windows-check-parent-task branch from 1ae59bb to a33b688 Compare January 27, 2021 06:18
@clarkzinzow clarkzinzow force-pushed the dashboard/hotfix/windows-check-parent-task branch from a33b688 to ec76a37 Compare January 27, 2021 06:19
@clarkzinzow clarkzinzow requested a review from fyrestone January 27, 2021 06:20
@fyrestone fyrestone left a comment

LGTM. Hope we can fix the root cause as soon as possible.

@simon-mo simon-mo merged commit 2d34e95 into ray-project:master Jan 27, 2021
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021

Development

Successfully merging this pull request may close these issues.

new_dashboard metrics agent crashed in Windows CI