Skip to content

Fix infinite recursion bug when NS record points to CNAME#1422

Merged
marc-vanderwal merged 2 commits into
zonemaster:release-v2024.2.1from
marc-vanderwal:bugfix/#1421
Feb 4, 2025
Merged

Fix infinite recursion bug when NS record points to CNAME#1422
marc-vanderwal merged 2 commits into
zonemaster:release-v2024.2.1from
marc-vanderwal:bugfix/#1421

Conversation

@marc-vanderwal

@marc-vanderwal marc-vanderwal commented Jan 28, 2025

Copy link
Copy Markdown
Contributor

Purpose

This PR fixes an infinite recursion bug that could be triggered when, somewhere in the delegation chain, an NS record points to an alias.

Problem description

In a nutshell: suppose that a.example is a zone whose authoritative nameserver is ns.a.example. But ns.a.example is an alias for real-ns.a.example. When the recursor tries to resolve ns.a.example to IP addresses, it chases the alias: this causes it to restart a query for real-ns.a.example. In the process, it needs to resolve ns.a.example to an IP address so it can query that name server for real-ns.a.example. However, the recursor threw away the context that led it to chase real-ns.a.example in the first place, that is: resolving ns.a.example.

Because of that, Zonemaster tries to resolve ns.a.example, which is aliased to real-ns.a.example, but to resolve real-ns.a.example, it needs to resolve ns.a.example, which is aliased to real-ns.a.example, and to resolve that name, it needs to resolve ns.a.example, which is aliased to real-ns.a.example, and so on. It’s an infinite loop that isn’t technically a CNAME chain nor an NS chain, but some hybrid of both.

Solution

All we need to do is to keep the original query’s context around when performing a subordinate query caused by an alias (CNAME record). That way, when chasing real-ns.a.example, the recursor notices that it needs to resolve ns.a.example, which is a resolution that it is already attempting to perform, and therefore bails out instead of looping indefinitely.

Context

Fixes #1421.

Changes

  • When initializing the state structure for a subordinate query caused by chasing an alias, copy the in_progress table from the original query’s state to the subordinate query’s.
  • The first change prevents CNAME_LOOP_OUTER messages from being generated when detecting a CNAME loop across zones. We need to change the condition for generating those messages so that it uses the in_progress table.

How to test this PR

Unit tests for MethodsV2 have been updated and should complete without error.

Another option is to run zonemaster-cli --test basic02 chi-eureseine.fr. The program should complete with Looks OK., instead of printing a cascade of Deep recursion warnings and requiring user intervention to kill the program.

When resolving A and AAAA records for a name server name found in an NS
resource record, if that name happens to be an alias, then there is a
chance that we recurse indefinitely.

This happens because when Zonemaster does a follow-up query to try
resolving the alias target, it forgets that it was actually in the
process of resolving the name server name. It will try resolving the
name server name again, hit the alias again, then follow it again until
the heat death of the universe.

This commit ensures that the appropriate context is retained when
chasing an alias. It also modifies the loop detection slightly in order
to act on that previous context. Testing shows that there is no need, in
this particular case, to look at the CNAME chain itself because the
$state->{in_progress} hash contains the same data.
@marc-vanderwal marc-vanderwal added T-Bug Type: Bug in software or error in test case description V-Patch Versioning: The change gives an update of patch in version. labels Jan 28, 2025
@marc-vanderwal marc-vanderwal added this to the v2024.2.1 milestone Jan 28, 2025
@matsduf

matsduf commented Jan 31, 2025

Copy link
Copy Markdown
Contributor

For MethodsV2 there are five test scenarios where the name server name is an alias via CNAME:

  • CHILD-NS-CNAME-1
  • CHILD-NS-CNAME-2
  • CHILD-NS-CNAME-3
  • PARENT-NS-CNAME-1
  • PARENT-NS-CNAME-2

We have not seen any issues with those. @marc-vanderwal, can you see what is missing to trigger the bug fixed in this PR?

Test scenario specification: https://github.com/zonemaster/zonemaster/blob/master/docs/public/specifications/test-zones/MethodsV2/methodsv2.md

matsduf
matsduf previously approved these changes Jan 31, 2025
@marc-vanderwal

marc-vanderwal commented Feb 3, 2025

Copy link
Copy Markdown
Contributor Author

Thanks for looking into that!

The scenario that triggers the bug is:

  • the zone being tested must have at least one in-bailiwick name server name pointing to an in-zone alias, similar to CHILD-NS-CNAME-1;
  • for that name server, no glue records must be returned by the parent zone;
  • for that same name server, the target of the alias must have either A records only, or AAAA records only. This is contrary to CHILD-NS-CNAME-1, where the alias targets have both A and AAAA records.

I’ll see if I can make a new scenario for that and reproduce the bug. For the time being, given the severity of the bug that this PR fixes, I propose that this PR is merged as is; I’ll create an issue for the missing unit test and see if I can work on that.

@marc-vanderwal

marc-vanderwal commented Feb 3, 2025

Copy link
Copy Markdown
Contributor Author

I managed to reproduce the bug in a new scenario that I named CHILD-NS-CNAME-4. And I can verify that the problem disappears when I apply the fix in this PR.

To ensure that this problem will not happen ever again, I’ve added a
test scenario (currently under review in zonemaster/zonemaster) and
implemented it here.
@marc-vanderwal

Copy link
Copy Markdown
Contributor Author

@matsduf I’ve added unit tests for this fix based on the scenario that I’d like to add (see zonemaster/zonemaster#1348). Can you review again, please?

@tgreenx tgreenx changed the base branch from develop to release-v2024.2.1 February 3, 2025 13:08

@tgreenx tgreenx left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Note that I've changed the base branch to release-v2024.2.1.

@marc-vanderwal

Copy link
Copy Markdown
Contributor Author

Okay, here we go then!

@marc-vanderwal marc-vanderwal merged commit e617a27 into zonemaster:release-v2024.2.1 Feb 4, 2025
@marc-vanderwal marc-vanderwal linked an issue Feb 4, 2025 that may be closed by this pull request
@marc-vanderwal marc-vanderwal deleted the bugfix/#1421 branch February 5, 2025 08:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

T-Bug Type: Bug in software or error in test case description V-Patch Versioning: The change gives an update of patch in version.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NS record pointing to alias (CNAME) can cause infinite recursion

3 participants