
DnsNameResolver: Allways call bind() during bootstrap#13817

Merged
normanmaurer merged 1 commit into 4.1 from dns_address
Jan 31, 2024
Conversation

@normanmaurer
Member

Motivation:

We should always call `bind(...)` during bootstrap to make debugging easier; if we do not bind explicitly, the OS will do it anyway. Without an explicit bind, we have limited visibility when logging:

```
10:18:45.466 [main] DEBUG i.netty.resolver.dns.DnsQueryContext - [id: 0x798ced86] WRITE: UDP, [10844: /127.0.0.1:49443], DefaultDnsQuestion(somehost.netty.io. IN TXT)
```

After an explicit bind, the local address / port is included as well:

```
10:21:30.366 [nioEventLoopGroup-2-1] DEBUG i.netty.resolver.dns.DnsQueryContext - [id: 0xa118f365, L:/[0:0:0:0:0:0:0:0]:54165] WRITE: UDP, [15418: /127.0.0.1:56439], DefaultDnsQuestion(lastname.com. IN AAAA)
```

Modifications:

  • Always explicitly call bind(...)

Result:

Better debuggability
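To illustrate the difference the PR describes, here is a small stdlib-only sketch (not Netty code, and the class name `BindDemo` is just for illustration): an unbound `DatagramChannel` has no local address to include in a log line until the OS binds it on first send, whereas an explicit `bind` with port 0 still lets the OS pick the port but makes it visible immediately:

```java
import java.net.InetSocketAddress;
import java.nio.channels.DatagramChannel;

public class BindDemo {
    public static void main(String[] args) throws Exception {
        DatagramChannel ch = DatagramChannel.open();
        // Before an explicit bind there is no local address to include in logs.
        System.out.println("before bind: " + ch.getLocalAddress());
        // Port 0 asks the OS to pick an ephemeral port, but the bind happens
        // now, so the chosen address/port can be logged right away.
        ch.bind(new InetSocketAddress(0));
        System.out.println("after bind:  " + ch.getLocalAddress());
        ch.close();
    }
}
```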

@normanmaurer
Member Author

@idelpivnitskiy PTAL... I think we talked about this before.

@normanmaurer normanmaurer added this to the 4.1.107.Final milestone Jan 30, 2024
@normanmaurer normanmaurer merged commit 394a20f into 4.1 Jan 31, 2024
@normanmaurer normanmaurer deleted the dns_address branch January 31, 2024 09:19
normanmaurer added a commit that referenced this pull request Jan 31, 2024
Member

@idelpivnitskiy idelpivnitskiy left a comment


Yes, thanks a lot for adding this!

franz1981 pushed a commit to franz1981/netty that referenced this pull request Feb 9, 2024
@llchry

llchry commented May 11, 2024

Before this modification, no port was bound when a new DnsNameResolver object was created; a random UDP port was bound only after resolve was invoked. After the modification, a UDP port is bound as soon as a DnsNameResolver object is created. However, in software such as vert.x, many new DnsNameResolver objects find that the address has already been resolved without ever triggering resolve. I think this modification leads to a lot of unnecessary UDP listening in practice. Is this reasonable?

@idelpivnitskiy
Member

I like that the address is bound by default, but maybe there should be a way to opt out for use-cases like the one mentioned above

@normanmaurer
Member Author

@idelpivnitskiy @llchry I am happy to review a PR to disable it as opt-out.

@llchry

llchry commented May 12, 2024

@idelpivnitskiy @llchry I am happy to review a PR to disable it as opt-out.

how about reverting this PR?

@idelpivnitskiy
Member

my2c: the default behavior should be to bind to an address, because it significantly complicates debuggability of any Netty-based library that uses the DNS resolver if the port is not visible in logs. Also, per my observations, most libs create a single resolver (and reuse it to take advantage of a shared cache) and therefore bind only a single local port. Projects that behave differently may have a way to opt out.

@llchry

llchry commented May 15, 2024

If I'm using Netty directly, Netty provides various configurations that allow various cases to be handled. When using some other library that depends on it, it's possible that these configurations are hidden.

fluetm added a commit to fluetm/netty that referenced this pull request Aug 17, 2024
@fluetm

fluetm commented Aug 20, 2024

This caused an odd breakage for us on a very specific platform: when running Redisson in a JDK 8 container in Kubernetes, Redisson attempts to resolve the Redis hostname to make a connection and the channel to the DNS name server wasn't open...then this future gets added and hangs forever: https://github.com/netty/netty/blob/netty-4.1.108.Final/resolver-dns/src/main/java/io/netty/resolver/dns/DnsQueryContext.java#L253-L267

We're not sure why, but running the same code in a JDK 17 container in Kubernetes works fine. It also works outside of Kubernetes on both JDK 8 and 17.

@amitlpande

amitlpande commented Sep 12, 2024

Hello,

When we upgraded the Netty DNS resolver (to 4.1.108), we also noticed that our Java process was using multiple UDP ports.
In our application, we use ReactorNettyTcpClient to connect to a message queue broker using STOMP.
While debugging, I confirmed that the number of UDP ports in use equals the number of DnsNameResolver instances.
However, it wasn't very clear how these instances are created. In our application we have only one instance of ReactorNettyTcpClient, yet the number of UDP ports in use varies. So I wanted to know if there is any way to determine how these DnsNameResolver instances are created. Is there any setting to control the number of UDP ports created?
How long will these UDP ports stay in use?

Here is the stack trace:

```
"tcp-client-loop-nio-2@40650" tid=0xb8 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at io.netty.resolver.dns.DnsNameResolver.<init>(DnsNameResolver.java:440)
      at io.netty.resolver.dns.DnsNameResolverBuilder.build(DnsNameResolverBuilder.java:594)
      at io.netty.resolver.dns.DnsAddressResolverGroup.newNameResolver(DnsAddressResolverGroup.java:114)
      at io.netty.resolver.dns.DnsAddressResolverGroup.newResolver(DnsAddressResolverGroup.java:92)
      at io.netty.resolver.dns.DnsAddressResolverGroup.newResolver(DnsAddressResolverGroup.java:77)
      at io.netty.resolver.AddressResolverGroup.getResolver(AddressResolverGroup.java:70)
      - locked <0xa2e1> (a java.util.IdentityHashMap)
      at reactor.netty.transport.TransportConnector.doResolveAndConnect(TransportConnector.java:303)
      at reactor.netty.transport.TransportConnector.lambda$connect$6(TransportConnector.java:165)
      at reactor.netty.transport.TransportConnector$$Lambda/0x000000080248a500.apply(Unknown Source:-1)
      at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132)
      at reactor.netty.transport.TransportConnector$MonoChannelPromise._subscribe(TransportConnector.java:638)
      at reactor.netty.transport.TransportConnector$MonoChannelPromise.lambda$subscribe$0(TransportConnector.java:550)
      at reactor.netty.transport.TransportConnector$MonoChannelPromise$$Lambda/0x000000080248a740.run(Unknown Source:-1)
      at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
      at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
      at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
      at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
      at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      at java.lang.Thread.runWith(Unknown Source:-1)
      at java.lang.Thread.run(Unknown Source:-1)
```

Security scanners are reporting the UDP port usage, which needs explanation.
One thing to note is that prior to 4.1.107 our application didn't see any UDP port usage. So, is there some setting that can be used to not use UDP ports at all?

Appreciate your help.
Thanks,
Amit

@idelpivnitskiy
Member

idelpivnitskiy commented Sep 12, 2024

So, wanted to know if there is any way to determine how these DNSResolver instances are created.

That will be a question for the reactor-netty maintainers. Not sure what they do at that layer, but typically it's enough to have a single DnsNameResolver instance for the entire application, because that way all libraries can benefit from the same DNS cache. The only need for more resolvers (that comes to my mind) would be if users either need different settings for different use-cases, or if they have more than 65k DNS queries in flight and need more instances to avoid exhausting the available IDs.

Is there any setting to control the number of UDP ports created?

Each DnsNameResolver will have only one UDP port, and potentially one TCP port if TCP fallback is enabled.

How long these UDP ports will stay in use?

For the lifetime of the DnsNameResolver instance.

@amitlpande

Thank you so much @idelpivnitskiy. I will reach out to the reactor-netty maintainers and will update this thread as well.

@violetagg
Member

@idelpivnitskiy

If we have the configuration below, the DnsAddressResolverGroup will create a DnsAddressResolver per event loop.
When you say

but typically it's enough to have a single DnsNameResolver instance for the entire application

what kind of configuration do you mean, given that Bootstrap can be configured only with a DnsAddressResolverGroup?

```java
EventLoopGroup group = new NioEventLoopGroup();
DnsNameResolverBuilder dnsNameResolverBuilder = new DnsNameResolverBuilder(group.next());
dnsNameResolverBuilder.channelType(NioDatagramChannel.class);
Bootstrap b = new Bootstrap();
try {
	b.group(group)
		.channel(NioSocketChannel.class)
		.handler(new ChannelInitializer<SocketChannel>() {

			@Override
			protected void initChannel(SocketChannel socketChannel) {
				socketChannel.pipeline().addLast(new HttpClientCodec(), new HttpSnoopClientHandler());
			}
		})
		.resolver(new DnsAddressResolverGroup(dnsNameResolverBuilder));

	sendRequest(b);
	sendRequest(b);
}
finally {
	group.shutdownGracefully();
}
```

(the code snippet is extracted from https://gist.github.com/violetagg/ec457f300be46f457c30f358855ca13b)

When connect is invoked, Bootstrap will invoke resolve with a concrete event loop

```java
resolver = ExternalAddressResolver.getOrDefault(externalResolver).getResolver(eventLoop);
```

DnsAddressResolverGroup will check its cache for a resolver for this event loop and, if there is none, it will create a new resolver.

As a result, DnsAddressResolverGroup will hold a cache with one resolver per event loop.
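The per-event-loop caching described here can be sketched with plain JDK types. This is a simplified, hypothetical stand-in for what `AddressResolverGroup.getResolver` does internally, not the actual Netty implementation; `ResolverCache` is an illustrative name:

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Simplified stand-in for AddressResolverGroup's per-event-loop cache:
// one resolver object per executor identity, created lazily and reused.
class ResolverCache<R> {
    private final Map<Executor, R> resolvers = new IdentityHashMap<>();
    private final Function<Executor, R> factory;

    ResolverCache(Function<Executor, R> factory) {
        this.factory = factory;
    }

    // Returns the cached resolver for this event loop, creating it on first use.
    // Each distinct event loop gets its own resolver, hence its own bound port.
    synchronized R getResolver(Executor eventLoop) {
        return resolvers.computeIfAbsent(eventLoop, factory);
    }
}
```

Under this model, the number of resolvers (and bound UDP ports) grows with the number of distinct event loops that ever ask for one, which matches the behavior observed above.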

@idelpivnitskiy
Member

@violetagg thanks for clarifying. If you are using an AddressResolverGroup as part of the Bootstrap configuration, it will create a new resolver per EventLoop thread. This is another good use-case for multiple resolvers: it helps to avoid thread hops and keeps everything running on the same EventLoop for every TCP connection.

@amitlpande then in this setup it's expected to have a number of DNS resolvers equal to the total number of EventLoop (I/O) threads used by reactor. As a result, you will have the same number of bound UDP ports.

@amitlpande

amitlpande commented Sep 24, 2024

@idelpivnitskiy Yes, I agree. However, is there scope for refactoring like I mentioned in #14338 (comment)?

Also, as suggested on the thread, it might be an acceptable stopgap solution to downgrade if there are no public vulnerabilities in the downgraded version.

@idelpivnitskiy
Member

idelpivnitskiy commented Sep 25, 2024

I think either (1) Netty should have a way to opt out from this behavior when users have a reason for that, or (2) whoever has access to configure Bootstrap.resolver(AddressResolverGroup) may supply a special AddressResolverGroup implementation that ignores the key and always returns a single instance of the underlying resolver. However, each approach has side-effects:

  1. Harder to debug DNS issues.
  2. More thread hops -> potential perf impact for high-throughput use-cases: from the connection event loop to the DNS-resolver event loop and then back to the connection event loop.
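Option (2) can be sketched in a few lines with plain JDK types. The name `SingleResolverGroup` and the untyped key are illustrative only; a real implementation would extend Netty's `AddressResolverGroup` and manage the resolver's lifecycle:

```java
// Hypothetical sketch of option (2): unlike a per-event-loop cache, this
// wrapper ignores which event loop is asking and always hands out the same
// shared resolver instance.
class SingleResolverGroup<R> {
    private final R shared;

    SingleResolverGroup(R shared) {
        this.shared = shared;
    }

    // The requesting event loop is deliberately ignored; every caller shares
    // one resolver, so only one UDP port is bound, at the cost of thread hops
    // when the caller's event loop differs from the resolver's.
    R getResolver(Object ignoredEventLoop) {
        return shared;
    }
}
```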

@idelpivnitskiy
Member

Also, as suggested on the thread, It might be an acceptable stopgap solution to downgrade if there are no public vulnerabilities in the downgraded version.

That's not a good solution. Downgrading is highly inconvenient and unpredictable (someone else can upgrade again later), exposes you to potential future vulnerabilities, and can prevent you from using new features directly or via other dependencies that use Netty on the same classpath.

idelpivnitskiy added a commit to idelpivnitskiy/netty that referenced this pull request Sep 26, 2024
Motivation:

We prefer to bind by default for better debug logging (see netty#13817).
However, some users expressed concerns with this behavior change
because it eagerly binds a UDP port even when allocated resolver
is not used.

Modifications:

- Keep default behavior to bind, but allow configuring `null` for
`DnsNameResolverBuilder.localAddress(...)` to preserve pre-existing
behavior.

Result:

Users can opt-in for pre-existing behavior of lazy port binding on
resolve.
normanmaurer pushed a commit that referenced this pull request Sep 26, 2024
Motivation:

We prefer to bind by default for better debug logging (see #13817).
However, some users expressed concerns with this behavior change because
it eagerly binds a UDP port even when allocated resolver is not used.

Modifications:

- Keep default behavior to bind, but allow configuring `null` for
`DnsNameResolverBuilder.localAddress(...)` to preserve pre-existing
behavior.

Result:

Users can force pre-existing behavior of lazy port binding on resolve.
normanmaurer pushed a commit that referenced this pull request Sep 26, 2024
normanmaurer pushed a commit that referenced this pull request Sep 27, 2024
@xupy521

xupy521 commented Oct 11, 2024

Also, as suggested on the thread, it might be an acceptable stopgap solution to downgrade if there are no public vulnerabilities in the downgraded version.

That's not a good solution. Downgrading is highly inconvenient and unpredictable (someone else can upgrade again later), exposes you to potential future vulnerabilities, and can prevent you from using new features directly or via other dependencies that use Netty on the same classpath.

Is there a solution without these drawbacks?

@idelpivnitskiy
Member

#14375 made it possible for users to use DnsNameResolverBuilder.localAddress(null) to skip default behavior of binding to a local port. This was released in 4.1.114.Final: https://netty.io/news/2024/10/01/4-1-114-Final.html

@yzfeng2020

Hi, I understand that:

  • The intention is to improve debuggability.

  • There’s already a mechanism to opt out of this behavior.

  • It is best practice to share one resolver (or one per event loop) across the application, a pattern many libraries already follow.

My concerns are specifically about why this behavior is now default:

  • Constructing a resolver reserves a UDP port for the lifetime of the resolver. Users might have valid reasons to create and use resolvers as needed, but with this default behavior, they could run out of ports even if those resolvers are not continuously needed.

  • In cases where end users have poorly optimized code that inadvertently creates many resolvers, this default behavior could exacerbate resource exhaustion. It might not be ideal from an application standpoint.

  • The opt-out option may be obscured by other libraries that use Netty internally, making it harder for developers to disable this behavior when necessary.

Based on my past experience with libraries that use Netty, which tend to allocate resources more lazily (though I may be missing the bigger picture), I'm curious: is proactive resource allocation a new pattern in Netty development?

Thanks for your consideration!

8 participants