Closed
Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Redhat 7.6
- Ray installed from (source or binary): binary
- Ray version: 0.8.0.dev4 (whl from 9/4/2019)
- Python version: 3.7.3
- Exact command to reproduce:
Describe the problem
The log monitor (log_monitor.py) on a node fails intermittently. I can run the same code a dozen times without hitting this error, but I have now seen it twice, on separate runs of identical code.
Source code / logs
The driver's stdout shows the following TypeError:
2019-09-06 12:11:34,299 WARNING worker.py:1797 -- The log monitor on node machine23 failed with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/ray/log_monitor.py", line 300, in <module>
    log_monitor.run()
  File "/opt/conda/lib/python3.7/site-packages/ray/log_monitor.py", line 250, in run
    self.open_closed_files()
  File "/opt/conda/lib/python3.7/site-packages/ray/log_monitor.py", line 139, in open_closed_files
    self.close_all_files()
  File "/opt/conda/lib/python3.7/site-packages/ray/log_monitor.py", line 92, in close_all_files
    os.kill(file_info.worker_pid, 0)
TypeError: an integer is required (got type str)
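The traceback suggests that file_info.worker_pid held a string when it reached os.kill, which accepts only an integer PID. How the string got there is not established in this report. The sketch below reproduces the same class of error outside Ray and shows one defensive way to probe a PID. It is an illustration only, not Ray's code: pid_is_alive is a hypothetical helper, and the "unknown" placeholder value is an assumption made for the example.

```python
import os


def pid_is_alive(pid):
    """Probe a PID with signal 0 (hypothetical helper, not part of Ray).

    Returns True or False when liveness can be determined, or None when
    `pid` cannot be interpreted as a positive integer.
    """
    try:
        pid = int(pid)
    except (TypeError, ValueError):
        # e.g. a placeholder string rather than a real PID
        return None
    if pid <= 0:
        # 0 and negative values address process groups, not one process
        return None
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user
        return True
    return True


# Passing a str straight to os.kill raises TypeError, as in the traceback.
try:
    os.kill(str(os.getpid()), 0)
    raised_type_error = False
except TypeError:
    raised_type_error = True

print(raised_type_error)
print(pid_is_alive(str(os.getpid())))
print(pid_is_alive("unknown"))  # assumed placeholder value
```

Returning None for an unparseable PID lets the caller decide whether to skip the liveness check for that file instead of crashing the whole monitor loop.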