Fix infinite recursion bug when NS record points to CNAME#1422
Conversation
When resolving A and AAAA records for a name server name found in an NS
resource record, if that name happens to be an alias, then there is a
chance that we recurse indefinitely.
This happens because when Zonemaster does a follow-up query to try
resolving the alias target, it forgets that it was actually in the
process of resolving the name server name. It will try resolving the
name server name again, hit the alias again, then follow it again until
the heat death of the universe.
This commit ensures that the appropriate context is retained when
chasing an alias. It also modifies the loop detection slightly in order
to act on that previous context. Testing shows that there is no need, in
this particular case, to look at the CNAME chain itself because the
$state->{in_progress} hash contains the same data.
|
For MethodsV2 there are five test scenarios where the name server name is an alias via CNAME:
We have not seen any issues with those. @marc-vanderwal, can you see what is missing to trigger the bug fixed in this PR? Test scenario specification: https://github.com/zonemaster/zonemaster/blob/master/docs/public/specifications/test-zones/MethodsV2/methodsv2.md |
|
Thanks for looking into that! The scenario that triggers the bug is:
I’ll see if I can make a new scenario for that and reproduce the bug. For the time being, given the severity of the bug that this PR fixes, I propose that this PR is merged as is; I’ll create an issue for the missing unit test and see if I can work on that. |
|
I managed to reproduce the bug in a new scenario that I named |
To ensure that this problem will not happen ever again, I’ve added a test scenario (currently under review in zonemaster/zonemaster) and implemented it here.
|
@matsduf I’ve added unit tests for this fix based on the scenario that I’d like to add (see zonemaster/zonemaster#1348). Can you review again, please? |
tgreenx
left a comment
There was a problem hiding this comment.
LGTM. Note that I've changed the base branch to release-v2024.2.1.
|
Okay, here we go then! |
Purpose
This PR fixes an infinite recursion bug that could be triggered when, somewhere in the delegation chain, an NS record points to an alias.
Problem description
In a nutshell: suppose that
a.exampleis a zone whose authoritative nameserver isns.a.example. Butns.a.exampleis an alias forreal-ns.a.example. When the recursor tries to resolvens.a.exampleto IP addresses, it chases the alias: this causes it to restart a query forreal-ns.a.example. In the process, it needs to resolvens.a.exampleto an IP address so it can query that name server forreal-ns.a.example. However, the recursor threw away the context that led it to chasereal-ns.a.examplein the first place, that is: resolvingns.a.example.Because of that, Zonemaster tries to resolve
ns.a.example, which is aliased toreal-ns.a.example, but to resolvereal-ns.a.example, it needs to resolvens.a.example, which is aliased toreal-ns.a.example, and to resolve that name, it needs to resolvens.a.example, which is aliased toreal-ns.a.example, and so on. It’s an infinite loop that isn’t technically a CNAME chain nor an NS chain, but some hybrid of both.Solution
All we need to do is to keep the original query’s context around when performing a subordinate query caused by an alias (CNAME record). That way, when chasing
real-ns.a.example, the recursor notices that it needs to resolvens.a.example, which is a resolution that it is already attempting to perform, and therefore bails out instead of looping indefinitely.Context
Fixes #1421.
Changes
in_progresstable from the original query’s state to the subordinate query’s.CNAME_LOOP_OUTERmessages from being generated when detecting a CNAME loop across zones. We need to change the condition for generating those messages so that it uses thein_progresstable.How to test this PR
Unit tests for MethodsV2 have been updated and should complete without error.
Another option is to run
zonemaster-cli --test basic02 chi-eureseine.fr. The program should complete withLooks OK., instead of printing a cascade ofDeep recursionwarnings and requiring user intervention to kill the program.