Crash on iOS when GRPC_ARG_DNS_ENABLE_SRV_QUERIES is enabled #40141

@jdv85

What version of gRPC and what language are you using?

gRPC 1.69.0 and 1.73.1, C++

What operating system (Linux, Windows,...) and version?

iPhone simulator iOS 18.1, iPhone 16 running iOS 18.5

What runtime / compiler are you using (e.g. python version or version of gcc)

Xcode 16.1:

zsh 5853 % c++ -v
Apple clang version 16.0.0 (clang-1600.0.26.4)
Target: arm64-apple-darwin24.5.0
Thread model: posix
InstalledDir: /Applications/Xcode-16.1.0.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

What did you do?

After upgrading gRPC from 1.48.4, our application started crashing on a nullptr dereference in grpc_event_engine::experimental::DNSServiceResolver::LookupSRV().

As far as I can tell, this is caused by setting GRPC_ARG_DNS_ENABLE_SRV_QUERIES to 1 before trying to open a gRPC channel. Because c-ares is not supported on iOS [1], this triggers an error path that uses a moved-from std::shared_ptr.

I have reproduced the issue in a new Xcode project created with the iOS application template, renaming main.m to main.mm (making it Objective-C++) and using this code:

#import <UIKit/UIKit.h>
#import "AppDelegate.h"

#include <grpcpp/grpcpp.h>

int main(int argc, char * argv[]) {

    grpc::ChannelArguments args{};
    args.SetInt(GRPC_ARG_DNS_ENABLE_SRV_QUERIES, 1);
    
    grpc::SslCredentialsOptions opts{};
    auto creds          = grpc::SslCredentials(opts);
    auto chan           = grpc::CreateCustomChannel("grpc.io", creds, args);
    
    chan->WaitForConnected(gpr_time_add(gpr_now(GPR_CLOCK_MONOTONIC), gpr_time_from_millis(60'000, GPR_TIMESPAN)));
    
    if (chan->GetState(false) != grpc_connectivity_state::GRPC_CHANNEL_READY) {
        NSLog(@"not ready");
    } else {
        NSLog(@"ready");
    }

    NSString * appDelegateClassName;
    @autoreleasepool {
        // Setup code that might create autoreleased objects goes here.
        appDelegateClassName = NSStringFromClass([AppDelegate class]);
    }
    
    return UIApplicationMain(argc, argv, nil, appDelegateClassName);
}

It happens both in the iPhone simulator and on a physical phone.

What did you expect to see?

gRPC should not crash.

What did you see instead?

gRPC crashes on a background thread in grpc_event_engine::experimental::DNSServiceResolver::LookupSRV().

As far as I can tell, the problem is that the DNSServiceResolver constructor moves engine_ into the DNSServiceResolverImpl constructor, which leaves the engine_ member empty. DNSServiceResolver::LookupSRV() then dereferences that nullptr.
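To illustrate the pattern outside of gRPC, here is a minimal sketch. The names Engine, ResolverImpl, BuggyResolver and ReproducesBug are hypothetical stand-ins, not the real gRPC classes; the sketch only assumes that the outer class keeps a std::shared_ptr member and moves from that same member while constructing its impl.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for the event engine; NOT the real gRPC type.
struct Engine {
  int Run() const { return 42; }
};

// Hypothetical stand-in for the impl class that takes ownership of the engine.
class ResolverImpl {
 public:
  explicit ResolverImpl(std::shared_ptr<Engine> engine)
      : engine_(std::move(engine)) {}
  bool HasEngine() const { return engine_ != nullptr; }

 private:
  std::shared_ptr<Engine> engine_;
};

class BuggyResolver {
 public:
  explicit BuggyResolver(std::shared_ptr<Engine> engine)
      : engine_(std::move(engine)),
        // Moving from the member itself leaves engine_ empty from here on.
        impl_(std::make_unique<ResolverImpl>(std::move(engine_))) {}

  // True if a later engine_->Run() would dereference a null pointer.
  bool EngineIsNull() const { return engine_ == nullptr; }
  bool ImplHasEngine() const { return impl_->HasEngine(); }

 private:
  std::shared_ptr<Engine> engine_;      // declared first, so initialized first
  std::unique_ptr<ResolverImpl> impl_;
};

// Returns true when the outer object has lost its engine to the impl.
bool ReproducesBug() {
  BuggyResolver resolver(std::make_shared<Engine>());
  return resolver.EngineIsNull() && resolver.ImplHasEngine();
}
```

A moved-from std::shared_ptr is guaranteed to be empty, so any method on the outer class that later calls through engine_ hits a null pointer, which matches the crash in the stack trace below.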

Here's an example stack trace of the crashing thread:

Thread 28 Crashed:
0   dc                            	0x000000010509c6e8 grpc_event_engine::experimental::DNSServiceResolver::LookupSRV(absl::lts_20240116::AnyInvocable<void (absl::lts_20240116::StatusOr<std::__1::vector<grpc_event_engine::experimental::EventEngine::DNSResolver::SRVRecord, std::__1::allocator<grpc_event_engine::experimental::EventEngine::DNSResolver::SRVRecord>>>)>, std::__1::basic_string_view<char, std::__1::char_traits<char>>) (in dc) + 132 + 1984232
1   dc                            	0x000000010521d228 grpc_core::(anonymous namespace)::EventEngineClientChannelDNSResolver::StartRequest() (in dc) + 1024 + 3559976
2   dc                            	0x000000010522adc0 grpc_core::PollingResolver::StartResolvingLocked() (in dc) + 36 + 3616192
3   dc                            	0x0000000104fb52e4 grpc_core::ClientChannelFilter::CreateResolverLocked() (in dc) + 560 + 1037028
4   dc                            	0x0000000104fc29c8 std::__1::__function::__func<grpc_core::ClientChannelFilter::CheckConnectivityState(bool)::$_11, std::__1::allocator<grpc_core::ClientChannelFilter::CheckConnectivityState(bool)::$_11>, void ()>::operator()() (in dc) + 68 + 1092040
5   dc                            	0x00000001052a1900 grpc_core::WorkSerializer::DispatchingWorkSerializer::Run() (in dc) + 212 + 4102400
6   dc                            	0x00000001050ad22c grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::Step() (in dc) + 448 + 2052652
7   dc                            	0x00000001050acc3c grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::ThreadBody() (in dc) + 376 + 2051132
8   dc                            	0x00000001050ad6a4 grpc_event_engine::experimental::WorkStealingThreadPool::WorkStealingThreadPoolImpl::StartThread()::$_0::__invoke(void*) (in dc) + 20 + 2053796
9   dc                            	0x000000010535eefc grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::__invoke(void*) (in dc) + 140 + 4878076
10  libsystem_pthread.dylib       	0x0000000223e6e7d0 _pthread_start + 136 (pthread.c:931)
11  libsystem_pthread.dylib       	0x0000000223e6e480 thread_start + 8 (:-1)

Proposed fix

I suggest copying instead of moving from engine_:
0001-Fix-nullptr-dereference-on-iOS.patch
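The patch itself is attached rather than inlined. As a rough sketch of the idea only, using hypothetical stand-in classes (Engine, ResolverImpl, FixedResolver) rather than the real gRPC code, copying the std::shared_ptr lets both objects share ownership of the engine:

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for the event engine; NOT the real gRPC type.
struct Engine {
  int Run() const { return 42; }
};

// Hypothetical stand-in for the impl class.
class ResolverImpl {
 public:
  explicit ResolverImpl(std::shared_ptr<Engine> engine)
      : engine_(std::move(engine)) {}

 private:
  std::shared_ptr<Engine> engine_;
};

class FixedResolver {
 public:
  explicit FixedResolver(std::shared_ptr<Engine> engine)
      : engine_(std::move(engine)),
        // Copy instead of move: the member stays valid and the impl
        // receives its own reference to the same engine.
        impl_(std::make_unique<ResolverImpl>(engine_)) {}

  // Stand-in for an error path that needs the engine after construction.
  int LookupSRV() const { return engine_->Run(); }

  // Number of shared_ptr owners of the engine (outer object plus impl).
  long EngineUseCount() const { return engine_.use_count(); }

 private:
  std::shared_ptr<Engine> engine_;      // declared first, so initialized first
  std::unique_ptr<ResolverImpl> impl_;
};
```

The cost is one extra reference-count increment per resolver, which should be negligible next to a DNS lookup.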

Another approach would be to implement LookupSRV() and LookupTXT() on DNSServiceResolverImpl and let them return the error codes.
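A rough sketch of that alternative, again with hypothetical stand-ins (Engine, SrvResult, SrvCallback, ResolverImpl, Resolver) and a made-up error string rather than the real gRPC types and messages: the outer class holds no engine at all and forwards to the impl, which reports the unsupported lookup through the callback.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the event engine. It runs the task inline;
// a real engine would schedule it asynchronously.
struct Engine {
  void Run(std::function<void()> task) { task(); }
};

// Hypothetical result type standing in for a StatusOr of SRV records.
struct SrvResult {
  bool ok = false;
  std::string error;
};

using SrvCallback = std::function<void(SrvResult)>;

class ResolverImpl {
 public:
  explicit ResolverImpl(std::shared_ptr<Engine> engine)
      : engine_(std::move(engine)) {}

  // The impl owns the engine, so it can report the error itself.
  void LookupSRV(SrvCallback on_resolve) {
    engine_->Run([cb = std::move(on_resolve)]() {
      cb(SrvResult{false, "SRV lookups are not supported"});
    });
  }

 private:
  std::shared_ptr<Engine> engine_;
};

class Resolver {
 public:
  explicit Resolver(std::shared_ptr<Engine> engine)
      : impl_(std::make_unique<ResolverImpl>(std::move(engine))) {}

  // Pure forwarding: there is no engine_ member here that could be left empty.
  void LookupSRV(SrvCallback on_resolve) {
    impl_->LookupSRV(std::move(on_resolve));
  }

 private:
  std::unique_ptr<ResolverImpl> impl_;
};
```

With this shape the engine has a single owner, so moving it into the impl is safe.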

Footnotes

  1. See https://github.com/grpc/grpc/blob/9192ddf70f55c04cc93067c1dd3aaada1bc894c7/include/grpc/support/port_platform.h#L286-L288
