[Fixit][CI] Retry external DNS lookups on transient NOT_FOUND in cf_engine_test#42185

Closed
pawbhard wants to merge 4 commits into grpc:master from pawbhard:retry_dns

Conversation

@pawbhard

Contributor

…ngine_test

TestResolveRemote, TestResolveIPv4Remote, and TestResolveIPv6Remote depend on external DNS services (localtest.me, nip.io, sslip.io). On Mac CI pool machines these lookups occasionally fail with kNotFound when the upstream resolver cannot reach the authoritative DNS servers, causing flaky test failures.

Add a LookupWithRetry helper that retries up to 3 times on kNotFound. If all attempts fail, the test is skipped rather than failed, since the failure reflects infrastructure unavailability, not a code regression. Retrying only on kNotFound is safe: that status code is produced by DNSServiceResolverImpl only when the DNS server responds with NXDOMAIN for both A and AAAA; bugs in the resolver itself map to kUnknown and will still surface as failures.
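
The retry policy described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code from the PR: `StatusCode`, `LookupResult`, `LookupFn`, the `delay` parameter, and the helper's signature are all assumptions made for the example, and "up to 3 times" is read here as three attempts in total. Only the name `LookupWithRetry` and the retry-on-kNotFound rule come from the description.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the status codes named in the PR description.
enum class StatusCode { kOk, kNotFound, kUnknown };

// Hypothetical result type: a status code plus any resolved addresses.
struct LookupResult {
  StatusCode code = StatusCode::kUnknown;
  std::vector<std::string> addresses;
};

// Hypothetical lookup signature; the real test drives the resolver under test.
using LookupFn = std::function<LookupResult(const std::string& name)>;

// Calls `lookup` until it returns something other than kNotFound, or until
// `max_attempts` attempts have been made. Any other status (including
// kUnknown) is returned immediately, so resolver bugs are never retried.
LookupResult LookupWithRetry(
    const LookupFn& lookup, const std::string& name, int max_attempts = 3,
    std::chrono::milliseconds delay = std::chrono::milliseconds(0)) {
  LookupResult result;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    result = lookup(name);
    if (result.code != StatusCode::kNotFound) return result;
    // Pause between attempts, but not after the last one.
    if (attempt < max_attempts) std::this_thread::sleep_for(delay);
  }
  return result;  // Still kNotFound after exhausting all attempts.
}
```

With a lookup that returns kNotFound twice and then succeeds, this helper returns the successful result on the third call. A kUnknown result comes back after a single call, so retries do not mask resolver errors.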

@pawbhard pawbhard requested a review from rishesh007 April 20, 2026 11:06
@pawbhard pawbhard self-assigned this Apr 20, 2026
@pawbhard pawbhard added the "release notes: no" label (Indicates if PR should not be in release notes) Apr 20, 2026
…_engine_test

After exhausting retries, fail the test with ASSERT_TRUE rather than silently
skipping it, so persistent DNS infrastructure issues are surfaced rather than
hidden.
@pawbhard pawbhard requested a review from rishesh007 April 20, 2026 12:22
asheshvidyut pushed a commit to asheshvidyut/grpc that referenced this pull request Apr 23, 2026
…ngine_test (grpc#42185)

Closes grpc#42185

COPYBARA_INTEGRATE_REVIEW=grpc#42185 from pawbhard:retry_dns 36bfcfd
PiperOrigin-RevId: 902619277
asheshvidyut pushed a commit to a-detiste/grpc that referenced this pull request Jun 10, 2026
…ngine_test (grpc#42185)
