Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dns: default to verbatim=true in dns.lookup() #37931

Open
wants to merge 8 commits into
base: master
from

Conversation

@treysis
Copy link

@treysis treysis commented Mar 26, 2021

Switch the default from false (reorder the result so that IPv4 addresses come before IPv6 addresses) to true (return them exactly as the resolver sent them to us.)

After many tests failed due to differently resolving localhost to ::1 or 127.0.0.1 and after it became impossible to keep track of what worked and what didn't in the CI jobs during running the test suite, this is a fresh start from what we learned in, and effectively a reissue of #37681.

Fixes: #31566
Refs: #6307
Refs: #20710
Reissue of #31567

Switch the default from false (reorder the result so that IPv4 addresses
come before IPv6 addresses) to true (return them exactly as the resolver
sent them to us.)

Fixes: #31566
Refs: #6307
Refs: #20710
Reissue of #31567
Reissue of #37681
@treysis treysis force-pushed the treysis:verbatim-true-reissue branch from 5bef0de to 90a5e93 Mar 26, 2021
@aduh95 aduh95 added the semver-major label Mar 26, 2021
@treysis
Copy link
Author

@treysis treysis commented Mar 27, 2021

So, I went again through any single test that failed on my machine and tried to fix it. I found the following tests to fail and did the following to fix them. Let's also have a CI job to see what problems exist on the other setups.

Explanations:

  • test/parallel/test-cluster-message.js: add family: 4 to connect because without it will default to localhost which could resolve to ::1 if IPv6 is available, while server listen address is hardcoded legacy IPv4.

  • test/parallel/test-http-localaddress.js: add family: 4, to connect because localhost could resolve to ::1 if IPv6 is available, while server listen address is hardcoded legacy IPv4.

  • test/parallel/test-http-upgrade-client.js: add family: 4, to http.get because http.get without specified host uses localhost which could resolve to ::1 if IPv6 is available, while server listen address is hardcoded legacy IPv4.

  • test/parallel/test-http2-connect-options.js: add family: 4 to http2.connect options because localhost might resolve to ::1 if IPv6 is available, causing EINVAL when binding to 127.0.0.2, and server listen address is hardcoded legacy IPv4.

  • test/parallel/test-https-localaddress.js: add family: 4 to https.request options because localhost might resolve to ::1 if IPv6 is available, causing EINVAL when binding to 127.0.0.2, and server listen address is hardcoded legacy IPv4.

  • test/common/inspector-helper.js: add family: 4 to http.get options to make sure to connect via legacy IPv4 localhost, because inspector by default listens on legacy IPv4 localhost. Fixes:

    • test/parallel/test-inspect-async-hook-setup-at-inspect.js
    • test/parallel/test-inspector-esm.js
    • test/parallel/test-inspector-inspect-brk-node.js
    • test/parallel/test-inspector-multisession-ws.js
    • test/parallel/test-inspector-reported-host.js
    • test/parallel/test-inspector-wait-for-connection.js
    • test/parallel/test-inspector-waiting-for-disconnect.js
    • test/sequential/test-inspector.js
    • test/sequential/test-inspector-async-hook-setup-at-inspect-brk.js
    • test/sequential/test-inspector-async-hook-setup-at-signal.js
    • test/sequential/test-inspector-async-stack-traces-promise-then.js
    • test/sequential/test-inspector-async-stack-traces-set-interval.js
    • test/sequential/test-inspector-break-e.js
    • test/sequential/test-inspector-break-when-eval.js
    • test/sequential/test-inspector-console.js
    • test/sequential/test-inspector-debug-brk-flag.js
    • test/sequential/test-inspector-debug-end.js
    • test/sequential/test-inspector-exception.js
    • test/sequential/test-inspector-not-blocked-on-idle.js
    • test/sequential/test-inspector-scriptparsed-context.js
    • test/sequential/test-inspector-stop-profile-after-done.js
    • test/sequential/test-inspector-stress-http.js
  • test/parallel/test-net-dns-lookup.js: listen on common.localhostIPv4 and add family: 4 to connect to force legacy IPv4, because it is required by the assertion tests. Could be changed to DualStack compatibility by using assert.match(ip, /^(127\.0\.0\.1|::1)$/); and assert.match(type.toString, /^(4|6)$/);.

  • test/parallel/test-net-remote-address-port.js: add ::1 to remoteAddrCandidates. Why does this fix it?

  • test/parallel/test-tcp-wrap-listen.js: add ternary test with hasIPv6 and use bind or bind6 accordingly.

  • test/parallel/test-timers-socket-timeout-removes-other-socket-unref-timer.js: add family: 4 to connect because it will default to localhost being ::1 if IPv6 is available, and listen address is hardcoded to be legacy IPv4 address for localhost.

  • test/parallel/test-tls-client-getephemeralkeyinfo.js: add family: 4 to connect because without it will default to localhost which could resolve to ::1 if IPv6 is available, while server listen address is hardcoded legacy IPv4.

  • test/parallel/test-tls-client-mindhsize.js: add family: 4 to connect because without it will default to localhost which could resolve to ::1 if IPv6 is available, while server listen address is hardcoded legacy IPv4.

  • test/parallel/test-tls-wrap-econnreset-localaddress.js: add family: 4 to connect because without it will default to localhost which could resolve to ::1 if IPv6 is available, while server listen address is hardcoded legacy IPv4.

  • test/sequential/test-https-connect-localport.js: remove family: 4 from connect because listen uses localhost which maybe resolves to ::1? NEEDS TESTING! If it fails --> force legacy IPv4 with common.localhostIPv4.

  • test/sequential/test-inspector-open.js: add family: 4 to connect because without it will default to localhost which could resolve to ::1 if IPv6 is available, while inspector is listening on legacy IPv4 localhost by default.

  • test/sequential/test-net-connect-local-error.js: add family: 4 to optionsIPv4 and family: 6 to optionsIPv6, because it seems connecting on IPv4 will default to localhost which then resolves to ::1 if IPv6 is available and will cause an error.

Explanations:
-------------
test/parallel/test-cluster-message.js: add `family: 4` to `connect` because without it will default to `localhost` which could resolve to `::1` if IPv6 is available, while server listen address is hardcoded legacy IPv4.
test/parallel/test-http-localaddress.js: add `family: 4,` to `connect` because `localhost` could resolve to `::1` if IPv6 is available, while server listen address is hardcoded legacy IPv4.
test/parallel/test-http-upgrade-client.js: add `family: 4,` to `http.get` because `http.get` without specified host uses `localhost` which could resolve to `::1` if IPv6 is available, while server listen address is hardcoded legacy IPv4.
test/parallel/test-http2-connect-options.js: add `family: 4` to `http2.connect` `options` because `localhost` might resolve to `::1` if IPv6 is available, causing `EINVAL` when binding to `127.0.0.2`, and server listen address is hardcoded legacy IPv4.
test/parallel/test-https-localaddress.js: add `family: 4` to `https.request` `options` because `localhost` might resolve to `::1` if IPv6 is available, causing `EINVAL` when binding to `127.0.0.2`, and server listen address is hardcoded legacy IPv4.

test/common/inspector-helper.js: add `family: 4` to `http.get` options to make sure to connect via legacy IPv4 `localhost`, because inspector by default listens on legacy IPv4 `localhost`.
	Fixes:
		test/parallel/test-inspect-async-hook-setup-at-inspect.js
		test/parallel/test-inspector-esm.js
		test/parallel/test-inspector-inspect-brk-node.js
		test/parallel/test-inspector-multisession-ws.js
		test/parallel/test-inspector-reported-host.js
		test/parallel/test-inspector-wait-for-connection.js
		test/parallel/test-inspector-waiting-for-disconnect.js
		test/sequential/test-inspector.js
		test/sequential/test-inspector-async-hook-setup-at-inspect-brk.js
		test/sequential/test-inspector-async-hook-setup-at-signal.js
		test/sequential/test-inspector-async-stack-traces-promise-then.js
		test/sequential/test-inspector-async-stack-traces-set-interval.js
		test/sequential/test-inspector-break-e.js
		test/sequential/test-inspector-break-when-eval.js
		test/sequential/test-inspector-console.js
		test/sequential/test-inspector-debug-brk-flag.js
		test/sequential/test-inspector-debug-end.js
		test/sequential/test-inspector-exception.js
		test/sequential/test-inspector-not-blocked-on-idle.js
		test/sequential/test-inspector-scriptparsed-context.js
		test/sequential/test-inspector-stop-profile-after-done.js
		test/sequential/test-inspector-stress-http.js

test/parallel/test-net-dns-lookup.js: listen on `common.localhostIPv4` and add `family: 4` to `connect` to force legacy IPv4, because it is required by the assertion tests. Could be changed to DualStack compatibility by using `assert.match(ip, /^(127\.0\.0\.1|::1)$/);` and `assert.match(type.toString, /^(4|6)$/);`.
test/parallel/test-net-remote-address-port.js: add `::1` to `remoteAddrCandidates`. Why does this fix it?
test/parallel/test-tcp-wrap-listen.js: add ternary test with `hasIPv6` and use `bind` or `bind6` accordingly.
test/parallel/test-timers-socket-timeout-removes-other-socket-unref-timer.js: add `family: 4` to `connect` because it will default to localhost being `::1` if IPv6 is available, and listen address is hardcoded to be legacy IPv4 address for `localhost`.
test/parallel/test-tls-client-getephemeralkeyinfo.js: add `family: 4` to `connect` because without it will default to `localhost` which could resolve to `::1` if IPv6 is available, while server listen address is hardcoded legacy IPv4.
test/parallel/test-tls-client-mindhsize.js: add `family: 4` to `connect` because without it will default to `localhost` which could resolve to `::1` if IPv6 is available, while server listen address is hardcoded legacy IPv4.
test/parallel/test-tls-wrap-econnreset-localaddress.js: add `family: 4` to `connect` because without it will default to `localhost` which could resolve to `::1` if IPv6 is available, while server listen address is hardcoded legacy IPv4.
test/sequential/test-https-connect-localport.js: remove `family: 4` from connect because listen uses `localhost` which maybe resolves to `::1`? NEEDS TESTING! If it fails --> force legacy IPv4 with `common.localhostIPv4`.
test/sequential/test-inspector-open.js: add `family: 4` to `connect` because without it will default to `localhost` which could resolve to `::1` if IPv6 is available, while inspector is listening on legacy IPv4 `localhost` by default.
test/sequential/test-net-connect-local-error.js: add `family: 4` to `optionsIPv4` and `family: 6` to `optionsIPv6`, because it seems connecting on IPv4 will default to `localhost` which then resolves to `::1` if IPv6 is available and will cause an error.
doc/api/dns.md Outdated Show resolved Hide resolved
@treysis treysis force-pushed the treysis:verbatim-true-reissue branch from 5f24bef to bdedf3a Mar 27, 2021
@treysis
Copy link
Author

@treysis treysis commented Mar 27, 2021

Alright, so the preliminary results confirm something that I was suspecting. While we could replace 127.0.0.1 in probably all instances with localhost, this will break many tests on SmartOS. The reason is what we have found out and discussed in the previous PR: SmartOS can resolve localhost to ::1 when trying to open a listening socket. But it will refuse to resolve localhost to ::1 when opening a connection if there is no configured IPv6 address other than ::1, and instead resolves to 127.0.0.1.

Thus, I see two options:

  1. Apply a more general approach using localhost in favor of 127.0.0.1 and mark all those tests flaky for SmartOS, or
  2. Force IPv4 on all systems where SmartOS otherwise fails.

In my opinion, the way to go currently is approach 2. Approach 1 should be something implemented in a separate PR that tries to be more DualStack-aware. In approach 1 it would probably also make sense to review the implementation of DNS resolution on listening and connecting and to make sure they resolve to the same values.

@treysis
Copy link
Author

@treysis treysis commented Mar 27, 2021

SmartOS fixes:

  • test/parallel/test-net-connect-options-port.js: add family: 4 at various places to force localhost or default localhost to be resolved to legacy IPv4 addresses.
  • test/parallel/test-net-pingpong.js: change localhost to 127.0.0.1 because test is aimed at legacy IPv4, so it can be forced to literal IPv4 address without problem. NEEDS TESTING: does pingPongTest(0); pass?
  • test/parallel/test-net-remote-address-port.js: change localhost to 127.0.0.1 and force IPv4 by adding family: 4 because SmartOS will resolve localhost or no address automatically to legacy IPv4 address (127.0.0.1).
  • test/parallel/test-net-writable.js: listen on common.localhostIPv4 and add family: 4 to connect to force IPv4. because SmartOS will resolve localhost or no address automatically to legacy IPv4 address (127.0.0.1).
  • test/sequential/test-https-connect-localport.js: revert previous change, i.e. readd family: 4 to https.get. Change localhost on listen to common.localhostIPv4 to force legacy IPv4.
Explanations:
test/parallel/test-net-connect-options-port.js: add `family: 4` at various places to force `localhost` or default `localhost` to be resolved to legacy IPv4 addresses.
test/parallel/test-net-pingpong.js: change `localhost` to `127.0.0.1` because test is aimed at legacy IPv4, so it can be forced to literal IPv4 address without problem. NEEDS TESTING: does `pingPongTest(0);` pass?
test/parallel/test-net-remote-address-port.js: change `localhost` to `127.0.0.1` and force IPv4 by adding `family: 4` because SmartOS will resolve `localhost` or no address automatically to legacy IPv4 address (`127.0.0.1`).
test/parallel/test-net-writable.js: listen on `common.localhostIPv4` and add `family: 4` to connect to force IPv4. because SmartOS will resolve `localhost` or no address automatically to legacy IPv4 address (`127.0.0.1`).
test/sequential/test-https-connect-localport.js: revert previous change, i.e. readd `family: 4` to `https.get`. Change `localhost` on listen to `common.localhostIPv4` to force legacy IPv4.
@treysis treysis force-pushed the treysis:verbatim-true-reissue branch from 130edc7 to a45f066 Mar 27, 2021
@aduh95
Copy link
Contributor

@aduh95 aduh95 commented Mar 27, 2021

FWIW I don't think modifying the tests just so they can pass on SmartOS is OK. Depending on the cause of the failure:

  1. If this is an issue with our CI configuration, we should mark the tests as flaky on SmartOS and try to fix them later.
  2. If this is an issue with SmartOS itself, we should report the issue to them and hold on landing this until a fix is available.

Either way, I don't think modifying failing tests is the right thing to do here. @nodejs/platform-smartos wdyt?

The changes in bdedf3a LGTM though (thanks for the detailed explanation BTW, that's really good work).

@treysis
Copy link
Author

@treysis treysis commented Mar 27, 2021

  1. If this is an issue with our CI configuration, we should mark the tests as flaky on SmartOS and try to fix them later.
  2. If this is an issue with SmartOS itself, we should report the issue to them and hold on landing this until a fix is available.

I think it's both. Refusing to use ::1 when connecting to localhost while no other IPv6 address configured seems to be inherent to SmartOS. I don't know if this might even be a general thing on Unix>BSD>Solaris. I created a VM with SmartOS and ping -s localhost would use 127.0.0.1, while ping -s -A inet6 localhost would fail with unknown host despite having ::1 localhost in /etc/hosts, and ifconfig showing ::1 being assigned to lo. It would work as soon as there's another IPv6 address configured. I don't think anyone is really going to change this.

On the other hand, I don't understand why it is binding fine to ::1 when specifying localhost in this situation. In my opinion this shouldn't be happening, and I am not sure yet if this is related to NodeJS, or SmartOS.

The changes in bdedf3a LGTM though (thanks for the detailed explanation BTW, that's really good work).

Without the explanations I probably would have lost track myself. It was necessary to have an overview over what needs to be addressed. But thanks for the acknowledgement.

@treysis
Copy link
Author

@treysis treysis commented Mar 28, 2021

Do the Linux CIs have IPv6 connectivity? Even though I don't really want to admit it...if I remove the public IPv6 address on my system, I observe the same problem as on SmartOS: listening on localhost means binding to ::1, connecting to localhost means connecting to 127.0.0.1.

I am not sure if this is related to the DNS module or to the resolver of the OS. If I do ping localhost it will always do ping ::1 on my system (Linux Mint), no matter if public IPv6 configured or not.

Edit: Thinking a bit about it I think we should adjust the tests. There is the chance (in theory) that having an IPv6 address configured or not might change during runtime. Since we can't listen DualStack on localhost, it makes sense to force the few tests to IPv4. I actually also don't see any reason why those tests use localhost instead of the hardcoded 127.0.0.1 that most of the other tests use anyways.

Edit2: It's related to getaddrinfo() and is the expected behavior if loopback is the only IPv6 address on the system: https://man7.org/linux/man-pages/man3/getaddrinfo.3.html

@treysis
Copy link
Author

@treysis treysis commented Mar 28, 2021

I think we should ditch AF_ADDRCONFIG. This was already suggested by @bnoordhuis in the previous discussions, and has been suggested in other projects not related to NodeJS as well.

On another node, @bnoordhuis also suggested ditching common.localhostIPv4 and replace it by hardcoded 127.0.0.1, as this method is just unnecessary.

A commit will follow shortly.

@treysis
Copy link
Author

@treysis treysis commented Mar 28, 2021

Can we get another CI job?

@treysis
Copy link
Author

@treysis treysis commented Mar 28, 2021

Great! So, what else needs to be discussed now in order to merge this?

What should be mentioned in the documentation: usage of localhost should be discouraged. Use 127.0.0.1 or ::1.

@aduh95
aduh95 approved these changes Mar 29, 2021
test/sequential/test-https-connect-localport.js Outdated Show resolved Hide resolved
Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
@aduh95
Copy link
Contributor

@aduh95 aduh95 commented Mar 29, 2021

What should be mentioned in the documentation: usage of localhost should be discouraged. Use 127.0.0.1 or ::1.

nit: usage of localhost without specifying the family option (until Dual Stack Network is implemented in Node.js)

@aduh95
Copy link
Contributor

@aduh95 aduh95 commented Mar 29, 2021

Ping @nodejs/dns @nodejs/tsc for reviews

@guybedford guybedford force-pushed the nodejs:master branch from dc5a5da to 8e46568 Mar 29, 2021
@Trott
Trott approved these changes Apr 2, 2021
@Trott Trott added the needs-citgm label Apr 2, 2021
@treysis
Copy link
Author

@treysis treysis commented Apr 3, 2021

CITGM? Sure, why not!

@Trott
Copy link
Member

@Trott Trott commented Apr 5, 2021

CITGM: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/2666/

Because many failures are expected and can be ignored, here's a run against master for comparison:

CITGM on master: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/2667/ (queued, will 404 until the job starts, which might not be for a while)

@Trott
Copy link
Member

@Trott Trott commented Apr 5, 2021

@nodejs/tsc This needs another TSC review. It's a straightforward (but arguably breaking) change. Please take a look?

@treysis
Copy link
Author

@treysis treysis commented Apr 5, 2021

CITGM: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/2666/

Yeah, because tests were crafted in a hacky way by assuming localhost == 127.0.0.1 in all circumstances.

I wonder if it would be better to change the PR in a way to always assume 127.0.0.1 for localhost.

However, in general I feel like this task has grown to an extent where I feel like I can't solve this on my own anymore.

If it was for me, I'd simply merge the PR as it is now and let all the maintainers of modules figure out how to make their modules compatible with the internet of today (which is IPv6). There is still LTS and they have plenty of time to adjust.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

5 participants