Skip to content

Commit d85ed28

Browse files
authored
[core] upgrade grpc to 1.58.0 to fix getenv races (#61195)
## Description grpc 1.57.1 will call `GetEnv("GRPC_EXPERIMENTAL_PICKFIRST_LB_CONFIG")` on every grpc channel establishment for parsing load-balancing policy. This causes race conditions between user tasks as they are allowed to do setenv at anytime. This PR upgrades the grpc lib to 1.58.0 to get rid of the `GetEnv("GRPC_EXPERIMENTAL_PICKFIRST_LB_CONFIG")`. ``` (gdb) bt #0 __pthread_kill_implementation (no_tid=0, signo=11, threadid=129183804413504) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=11, threadid=129183804413504) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=129183804413504, signo=signo@entry=11) at ./nptl/pthread_kill.c:89 #3 0x00007580a7545476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26 #4 <signal handler called> #5 __pthread_kill_implementation (no_tid=0, signo=11, threadid=129183804413504) at ./nptl/pthread_kill.c:44 #6 __pthread_kill_internal (signo=11, threadid=129183804413504) at ./nptl/pthread_kill.c:78 #7 __GI___pthread_kill (threadid=129183804413504, signo=signo@entry=11) at ./nptl/pthread_kill.c:89 #8 0x00007580a7545476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26 #9 <signal handler called> #10 __GI_getenv (name=0x7580a6a078c2 "PC_EXPERIMENTAL_PICKFIRST_LB_CONFIG") at ./stdlib/getenv.c:84 #11 0x00007580a67e8b8a in grpc_core::GetEnv(char const*) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #12 0x00007580a649601f in grpc_core::ShufflePickFirstEnabled() () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #13 0x00007580a64960ed in grpc_core::json_detail::FinishedJsonObjectLoader<grpc_core::(anonymous namespace)::PickFirstConfig, 1ul, void>::LoadInto(grpc_core::experimental::Json const&, grpc_core::JsonArgs const&, void*, grpc_core::ValidationErrors*) const () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #14 0x00007580a6787384 in grpc_core::json_detail::LoadWrapped::LoadInto(grpc_core::experimental::Json const&, grpc_core::JsonArgs const&, void*, grpc_core::ValidationErrors*) const () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #15 0x00007580a6497b07 in grpc_core::(anonymous namespace)::PickFirstFactory::ParseLoadBalancingConfig(grpc_core::experimental::Json const&) const () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #16 0x00007580a67c18a7 in grpc_core::LoadBalancingPolicyRegistry::ParseLoadBalancingConfig(grpc_core::experimental::Json const&) const () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #17 0x00007580a66ad9b8 in grpc_core::ClientChannel::OnResolverResultChangedLocked(grpc_core::Resolver::Result) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #18 0x00007580a66ae452 in grpc_core::ClientChannel::ResolverResultHandler::ReportResult(grpc_core::Resolver::Result) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #19 0x00007580a63bc603 in grpc_core::PollingResolver::OnRequestCompleteLocked(grpc_core::Resolver::Result) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #20 0x00007580a63bcb2d in std::_Function_handler<void (), grpc_core::PollingResolver::OnRequestComplete(grpc_core::Resolver::Result)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #21 0x00007580a67cbf46 in grpc_core::WorkSerializer::WorkSerializerImpl::Run(std::function<void ()>, grpc_core::DebugLocation const&) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #22 0x00007580a67cc0ea in grpc_core::WorkSerializer::Run(std::function<void ()>, grpc_core::DebugLocation const&) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #23 0x00007580a63bd117 in grpc_core::PollingResolver::OnRequestComplete(grpc_core::Resolver::Result) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #24 0x00007580a63b3f86 in grpc_core::(anonymous namespace)::AresClientChannelDNSResolver::AresRequestWrapper::OnHostnameResolved(void*, absl::lts_20230802::Status) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #25 0x00007580a67c44c4 in grpc_core::ExecCtx::Flush() () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #26 0x00007580a63408a2 in grpc_core::ExecCtx::~ExecCtx() () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #27 0x00007580a6740343 in grpc_call_start_batch () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #28 0x00007580a5e281e9 in grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallOpRecvInitialMetadata, grpc::internal::CallOpRecvMessage<google::protobuf::MessageLite>, grpc::internal::CallOpClientSendClose, grpc::internal::CallOpClientRecvStatus>::ContinueFillOpsAfterInterception() () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #29 0x00007580a5e2d809 in grpc::internal::BlockingUnaryCallImpl<google::protobuf::MessageLite, google::protobuf::MessageLite>::BlockingUnaryCallImpl(grpc::ChannelInterface*, grpc::inte--Type <RET> for more, q to quit, c to continue without paging--c rnal::RpcMethod const&, grpc::ClientContext*, google::protobuf::MessageLite const&, google::protobuf::MessageLite*) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #30 0x00007580a62d76ea in opentelemetry::proto::collector::metrics::v1::MetricsService::Stub::Export(grpc::ClientContext*, opentelemetry::proto::collector::metrics::v1::ExportMetricsServiceRequest const&, opentelemetry::proto::collector::metrics::v1::ExportMetricsServiceResponse*) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #31 0x00007580a62ca40c in opentelemetry::v1::exporter::otlp::OtlpGrpcClient::DelegateExport(opentelemetry::proto::collector::metrics::v1::MetricsService::StubInterface*, std::unique_ptr<grpc::ClientContext, std::default_delete<grpc::ClientContext> >&&, std::unique_ptr<google::protobuf::Arena, std::default_delete<google::protobuf::Arena> >&&, opentelemetry::proto::collector::metrics::v1::ExportMetricsServiceRequest&&, opentelemetry::proto::collector::metrics::v1::ExportMetricsServiceResponse*) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #32 0x00007580a62c23ed in opentelemetry::v1::exporter::otlp::OtlpGrpcMetricExporter::Export(opentelemetry::v1::sdk::metrics::ResourceMetrics const&) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #33 0x00007580a62c0334 in (anonymous namespace)::OpenTelemetryMetricExporter::Export(opentelemetry::v1::sdk::metrics::ResourceMetrics const&) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #34 0x00007580a62e5fdf in opentelemetry::v1::sdk::metrics::PeriodicExportingMetricReader::CollectAndExportOnce()::{lambda()#1}::operator()() const::{lambda(opentelemetry::v1::sdk::metrics::ResourceMetrics&)#1}::operator()(opentelemetry::v1::sdk::metrics::ResourceMetrics&) const () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #35 0x00007580a62ee7a6 in opentelemetry::v1::sdk::metrics::MetricReader::Collect(opentelemetry::v1::nostd::function_ref<bool (opentelemetry::v1::sdk::metrics::ResourceMetrics&)>) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #36 0x00007580a62e5085 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<opentelemetry::v1::sdk::metrics::PeriodicExportingMetricReader::CollectAndExportOnce()::{lambda()#1}> > >::_M_run() () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #37 0x00007580a6997be0 in execute_native_thread_routine () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so #38 0x00007580a7597ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442 #39 0x00007580a76298d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 ``` Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
1 parent 8ed1a05 commit d85ed28

1 file changed

Lines changed: 5 additions & 5 deletions

File tree

bazel/ray_deps_setup.bzl

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -277,8 +277,8 @@ def ray_deps_setup():
277277
auto_http_archive(
278278
name = "com_github_grpc_grpc",
279279
# NOTE: If you update this, also update @boringssl's hash.
280-
url = "https://github.com/grpc/grpc/archive/refs/tags/v1.57.1.tar.gz",
281-
sha256 = "0762f809b9de845e6a7c809cabccad6aa4143479fd43b396611fe5a086c0aeeb",
280+
url = "https://github.com/grpc/grpc/archive/refs/tags/v1.58.0.tar.gz",
281+
sha256 = "ec64fdab22726d50fc056474dd29401d914cc616f53ab8f2fe4866772881d581",
282282
patches = [
283283
"@io_ray//thirdparty/patches:grpc-cython-copts.patch",
284284
"@io_ray//thirdparty/patches:grpc-zlib-fdopen.patch",
@@ -321,11 +321,11 @@ def ray_deps_setup():
321321
auto_http_archive(
322322
# This rule is used by @com_github_grpc_grpc, and using a GitHub mirror
323323
# provides a deterministic archive hash for caching. Explanation here:
324-
# https://github.com/grpc/grpc/blob/1ff1feaa83e071d87c07827b0a317ffac673794f/bazel/grpc_deps.bzl#L189
324+
# https://github.com/grpc/grpc/blob/v1.58.0/bazel/grpc_deps.bzl
325325
# Ensure this rule matches the rule used by grpc's bazel/grpc_deps.bzl
326326
name = "boringssl",
327-
sha256 = "0675a4f86ce5e959703425d6f9063eaadf6b61b7f3399e77a154c0e85bad46b1",
328-
url = "https://github.com/google/boringssl/archive/342e805bc1f5dfdd650e3f031686d6c939b095d9.tar.gz",
327+
sha256 = "b21994a857a7aa6d5256ffe355c735ad4c286de44c6c81dfc04edc41a8feaeef",
328+
url = "https://github.com/google/boringssl/archive/2ff4b968a7e0cfee66d9f151cb95635b43dc1d5b.tar.gz",
329329
)
330330

331331
# The protobuf version we use to auto generate python and java code.

0 commit comments

Comments
 (0)