Description
What is the problem?
Ray: latest master
Python: 3.7.6
OS: Red Hat 7.7
I have a script that runs fine the first time and runs to completion the second time (with no changes to parameters), but on the third or fourth run I get the following segfault:
$ /opt/conda/bin/python test_program.py
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0602 18:09:25.494333 1188864 1188864 global_state_accessor.cc:25] Redis server address = 192.1681.1.183:6379, is test flag = 0
I0602 18:09:25.495889 1188864 1188864 redis_client.cc:141] RedisClient connected.
I0602 18:09:25.503971 1188864 1188864 redis_gcs_client.cc:88] RedisGcsClient Connected.
I0602 18:09:25.504889 1188864 1188864 service_based_gcs_client.cc:75] ServiceBasedGcsClient Connected.
E0602 18:09:25.512972839 1188864 server_chttp2.cc:40] {"created":"@1591121365.512899346","description":"No address added out of total 1 resolved","file":"external/com_github_grpc_grpc/src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":394,"referenced_errors":[{"created":"@1591121365.512896692","description":"Failed to add any wildcard listeners","file":"external/com_github_grpc_grpc/src/core/lib/iomgr/tcp_server_posix.cc","file_line":341,"referenced_errors":[{"created":"@1591121365.512886167","description":"Unable to configure socket","fd":15,"file":"external/com_github_grpc_grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":208,"referenced_errors":[{"created":"@1591121365.512882261","description":"Address already in use","errno":98,"file":"external/com_github_grpc_grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":181,"os_error":"Address already in use","syscall":"bind"}]},{"created":"@1591121365.512896152","description":"Unable to configure socket","fd":15,"file":"external/com_github_grpc_grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":208,"referenced_errors":[{"created":"@1591121365.512894081","description":"Address already in use","errno":98,"file":"external/com_github_grpc_grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":181,"os_error":"Address already in use","syscall":"bind"}]}]}]}
*** Aborted at 1591121365 (unix time) try "date -d @1591121365" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 1188864 (TID 0x7f27706b6740) from PID 0; stack trace: ***
@ 0x7f2770297630 (unknown)
@ 0x7f2768732002 grpc::ServerInterface::RegisteredAsyncRequest::IssueRequest()
@ 0x7f27683e0c09 ray::rpc::CoreWorkerService::WithAsyncMethod_AssignTask<>::RequestAssignTask()
@ 0x7f276841a1ab ray::rpc::ServerCallFactoryImpl<>::CreateCall()
@ 0x7f276868ddc1 ray::rpc::GrpcServer::Run()
@ 0x7f276842ea92 ray::CoreWorker::CoreWorker()
@ 0x7f27684327f4 ray::CoreWorkerProcess::CreateWorker()
@ 0x7f2768432d7f ray::CoreWorkerProcess::CoreWorkerProcess()
@ 0x7f27684333cb ray::CoreWorkerProcess::Initialize()
@ 0x7f276839d3c4 __pyx_pw_3ray_7_raylet_10CoreWorker_1__cinit__()
@ 0x7f276839e4a5 __pyx_tp_new_3ray_7_raylet_CoreWorker()
@ 0x563da5958dc9 _PyObject_FastCallKeywords
@ 0x563da59a8e8f _PyEval_EvalFrameDefault
@ 0x563da58fe030 _PyEval_EvalCodeWithName
@ 0x563da5943917 _PyFunction_FastCallKeywords
@ 0x563da59a50a6 _PyEval_EvalFrameDefault
@ 0x563da58fd6f9 _PyEval_EvalCodeWithName
@ 0x563da5943917 _PyFunction_FastCallKeywords
@ 0x563da59a50a6 _PyEval_EvalFrameDefault
@ 0x563da58fd6f9 _PyEval_EvalCodeWithName
@ 0x563da58fe5f4 PyEval_EvalCodeEx
@ 0x563da58fe61c PyEval_EvalCode
@ 0x563da59ff974 run_mod
@ 0x563da5a09cf1 PyRun_FileExFlags
@ 0x563da5a09ee3 PyRun_SimpleFileExFlags
@ 0x563da5a0af95 pymain_main
@ 0x563da5a0b0bc _Py_UnixMain
@ 0x7f276fedc545 __libc_start_main
@ 0x563da59b3990 (unknown)
Segmentation fault
At this point Ray is in a bad state and I need to restart the cluster. Any help would be appreciated.
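For reference, the `Address already in use` (errno 98) entries in the gRPC error are what `bind` returns when another socket already holds the port. Here is a minimal sketch, independent of Ray, that reproduces only that error with plain Python sockets. The helper name `try_bind` is hypothetical and not part of Ray or gRPC, and this does not reproduce the segfault itself or show which port Ray tried to bind.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Try to bind and listen on (host, port).

    Returns (socket, None) on success, or (None, errno_code) if bind fails.
    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


# Port 0 asks the OS for any free port; read back the one it picked.
first, _ = try_bind(0)
port = first.getsockname()[1]

# A second bind to the same port fails while the first socket is still open.
second, err = try_bind(port)
print("second bind failed with EADDRINUSE:", err == errno.EADDRINUSE)

first.close()
```

If a process from an earlier run were still holding a port, a later run that tried to bind the same port would hit this same error; whether that is what happens here is an assumption I have not verified.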
thanks,
Luke