Introduce a domain-specific language for scenario-based unit testing of Zonemaster-Engine by marc-vanderwal · Pull Request #1467 · zonemaster/zonemaster-engine

marc-vanderwal · 2025-07-30T13:03:50Z

Purpose

In order to support the ongoing effort of testing Zonemaster’s test cases with a scenario-based approach, this PR introduces a domain-specific language (DSL) aimed at easing the writing and maintenance of the corresponding .t files.

The design goals of the DSL are as follows:

Be easy to read and understand for both experienced developers and occasional contributors;
Be concise, by eliminating as much boilerplate and internal repetition as possible;
Provide opportunities for the .t files to be self-documenting;
Allow for extensibility should the need arise;
Stay in the same spirit as Test::More;
Provide “escape hatches” by letting unit test writers use custom callbacks when built-in facilities are insufficient;
Provide good and useful diagnostics when a test fails.

I’ve converted a handful of scenario-based .t files in the new DSL in order to test some edge cases. However, some features of the DSL are currently untested:

Testing for message arguments (no scenario-based .t file currently does that);
Fake DS records (no scenario-based .t file needs undelegated DS records).

Because this PR introduces no change visible to the end user, I haven’t set any of the versioning tags (V-Major, V-Minor or V-Patch).

Main differences with TestUtil.pm

The DSL strives to stay compatible with TestUtil.pm in terms of features. However, the DSL proved to be a good opportunity to improve on two areas.

Firstly, the DSL allows to test not only for the presence or absence of a message, but also the presence of a message satisfying certain constraints on its arguments. Diagnostic output in case such a test fails is hopefully detailed enough for pinpointing the cause of the failure.

Secondly, while TestUtil.pm only has a boolean property stating whether or not a scenario is testable, the DSL makes a distinction between two cases that I believe should be handled differently:

scenarios marked not_testable, that cannot be tested because the infrastructure (e.g. authoritative name servers configured to exhibit erroneous behavior or zones badly configured on purpose) is not yet available;
scenarios marked todo, that cannot be tested out because of a bug in Zonemaster-Engine that is yet to be fixed.

The difference mainly lies in what happens when the test suite is run, and what happens if ZONEMASTER_RECORD=1 is set in the environment. Scenarios marked not_testable are entirely skipped (as when todo_skip in Test::More is used) and therefore are not recorded in the .data file. Thus, the data file is not polluted with packets that do not correspond to the scenario’s specification. Scenarios marked todo are treated as if they were in a TODO block in Test::More: they are executed, but failures are ignored, while the corresponding packets are saved in the data file, so that someone wishing to fix the implementation can test their fix without having to rerecord the .data file.

Diagnostics

When testing for the presence of a message of a certain tag and with certain arguments, it can be tricky to narrow down what exactly happened. The DSL compiles to unit tests that strive to provide helpful diagnostics in this case, as in this sample output (with a test that was made to fail intentionally):

t/Test-basic02.t .. 13/?
    #   Failed test 'Messages of tag 'B02_AUTH_RESPONSE_SOA' exist with specified arguments'
    #   at t/Test-basic02.t line 112.
    # Looked for a message whose tag is 'B02_AUTH_RESPONSE_SOA'
    #   where 'domain' equals 'mixed-1.basic02.xa'
    #     and 'ns_list' equals 'ns1.mixed-1.basic02.xa/127.12.2.31;ns1.mixed-1.basic02.xa/fda1:b2:c3:0:127:12:2:31'
    #     and contains no argument other than those listed above
    # Here are all messages that unsuccessfully matched:
    #   B02_AUTH_RESPONSE_SOA domain="mixed-1.basic02.xa"; ns_list=ns1.mixed-1.basic02.xa/127.12.2.31;ns1.mixed-1.basic02.xa/fda1:b2:c3:0:127:12:2:31
#   Failed scenario 'MIXED-1'
#   at t/Test-basic02.t line 117.
# Looks like you failed 1 test of 26.
t/Test-basic02.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/26 subtests

Test Summary Report
-------------------
t/Test-basic02.t (Wstat: 256 Tests: 26 Failed: 1)
  Failed test:  16
  Non-zero exit status: 1
Files=1, Tests=26,  1 wallclock secs ( 0.03 usr  0.00 sys +  1.06 cusr  0.08 csys =  1.17 CPU)
Result: FAIL

Context

Hackathon session during last face-to-face meeting, and zonemaster/zonemaster#1401.

Changes

Introduce a DSL
Port a subset of scenario-based *.t files (i.e. using TestUtil::perform_testcase_testing()) to the DSL

Advice for reviewers

This PR contains new code (for the DSL implementation, compiler and documentation) and rewrites a few .t files. For the .t files, it is better to ignore the diff and look at the end result only. The original and the DSL rewrite of the .t files should be semantically equivalent.

If so far the DSL looks fine, is it worth the effort to convert the rest of the scenario-based .t files in this format? In the current state, this PR has converted about half of the scenario-based .t files so far in order to try and explore edge cases.

How to test this PR

Unit tests should pass.

If wishing to test this PR manually, you should run the test suite with a tool such as prove and enable the -v (verbose) option in order to obtain the full diagnostic output. Otherwise, try modifying a test so that it fails and look at the diagnostics output: it should point to the line in the .t file where the error occurred.

matsduf · 2025-07-30T17:47:25Z

* Fake DS records (no scenario-based `.t` file needs undelegated DS records).

No, not currently, but we should soon be there.

matsduf · 2025-07-30T17:49:49Z

Because this PR introduces no change visible to the end user, I haven’t set any of the versioning tags (V-Major, V-Minor or V-Patch).

It should still be a V-Patch as all changes requires increase in version number.

marc-vanderwal · 2025-08-07T06:23:27Z

I’ve done a rather large refactoring of the DSL implementation. All scenarios are now run in the order they were defined in the .t file. This lets me change the root hints while a .t file is running, which I’ll need for the unit tests I’m planning for Zone11 when I update its implementation.

marc-vanderwal · 2025-08-07T06:28:44Z

Tests fail on Perl 5.26 because, for some reason, a test marked as TODO fails as expected but that failure isn’t squelched by the todo keyword in the DSL. How odd.

EDIT: turns out that older versions of Test::Builder do not set TODO status properly if todo_start() is passed undef as reason string.

MichaelTimbert

Tested and working. "make test" and "prove" work, all unit tests pass.
Typos and errors are catch up, and the error is clearly explained.
Unless we forget "no_more_scenarios;", in which case we only get the error "No subtests run" error.

matsduf

perldoc t/TestUtil/DSL/Compiler.pm gives

POD ERRORS
       Hey! The above document had some coding errors, which are explained
       below:

       Around line 25:
           Non-ASCII character seen before =encoding in 'passing C<-v>'.
           Assuming UTF-8

matsduf

Also perldoc t/TestUtil/DSL.pm :

POD ERRORS
       Hey! The above document had some coding errors, which are explained
       below:

       Around line 392:
           Non-ASCII character seen before =encoding in 'address…>]'. Assuming
           UTF-8

matsduf

This seems to be wrong. The scenario is run when running without the variable.

$ ZONEMASTER_SELECTED_SCENARIOS="ZONE-ERR-GRANDPARENT-3" prove -v t/Test-basic01.t 
t/Test-basic01.t .. 
ok 1 - use Zonemaster::Engine::Nameserver;
ok 2 - use Zonemaster::Engine::Test::Basic;
ZONE-ERR-GRANDPARENT-3: no such scenario at /usr/home/matsd/git/zonemaster-engine/t/TestUtil/DSL.pm line 163.
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 255 just after 2.
Dubious, test returned 255 (wstat 65280, 0xff00)
All 2 subtests passed 

Test Summary Report
-------------------
t/Test-basic01.t (Wstat: 65280 (exited 255) Tests: 2 Failed: 0)
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output
Files=1, Tests=2,  1 wallclock secs ( 0.02 usr  0.00 sys +  0.27 cusr  0.12 csys =  0.41 CPU)
Result: FAIL

matsduf · 2025-10-09T15:01:36Z

I see pros and cons with the new model. It is maybe more cumbersome to write, but it is easier to read, and then probably easier to update and correct.

Is it possible to change root hints within a t file with the new model? If so it make make it easier to handle test cases that use different root hints for different scenarios. With the current model you have to create separate files.

In today's model there is a shared cache between all scenarios in the same t file. If root hints are changed, the cache should probably be emptied.

marc-vanderwal · 2025-10-15T15:02:10Z

Main changes since the previous iteration:

addressed POD errors, caused by a missing =encoding utf8 statement;
fixed ZONEMASTER_SELECTED_SCENARIOS environment variable not working as intended, due to a regression introduced while refactoring;
make sure root_hints also calls Zonemaster::Engine::Resolver::clear_cache() when used;
added a new keyword, clear_cache, that calls Zonemaster::Engine::Resolver::clear_cache(), then used it in t/Test-connectivity04.t in order to merge t/Test-connectivity04-A.t into the former.

Reviews welcome!

matsduf · 2025-10-15T16:07:35Z

* make sure `root_hints` also calls `Zonemaster::Engine::Resolver::clear_cache()` when used;

* added a new keyword, `clear_cache`, that calls `Zonemaster::Engine::Resolver::clear_cache()`, then used it in `t/Test-connectivity04.t` in order to merge `t/Test-connectivity04-A.t` into the former.

Will clear_cache() also clear blacklisting?

matsduf

Could you give an example of how the following could be used:

       The fourth form takes a tag name and a coderef: it searches for all
       messages whose tag matches, evaluates the coderef with @_ set to the
       list of messages matching the tag and expects the coderef to return a
       true value. This form is useful for situations where the third form
       falls short.

           expect <tag> => sub { ... };

##########

Repeated uses of "expect" are cumulative.

Does that mean like OR between the expect statements?

marc-vanderwal · 2025-10-16T05:50:51Z

Will clear_cache() also clear blacklisting?

The cache in Zonemaster::Engine::Recursor caches all packets, including failure to respond. So yes, even that is cleared. Otherwise, I would not have been able to merge t/Test-connectivity04-A.t into t/Test-connectivity04.t.

There is a proto-DSL already, in t/TestUtil.pm, that helps us test Zonemaster test cases with respect to predefined scenarios. But we need to go further. Scenarios are defined in terms of a zone name (which names a zone that is configured in a predetermined way), some possible extra configuration for the test case (such as “fake NS” and “fake DS” records), and a list of messages that we expect Zonemaster to emit when testing the zone name. Currently, when using t/TestUtil.pm, we list scenarios in a array of arrays. It has been working fine for now, but it’s lacking a feature: testing message arguments in addition to message tags. How do we specify predicates that message *arguments* should satisfy? This is where the current method shows its limit. The DSL is meant to be easy to read, not only for those who routinely write such unit tests, but also for more casual readers. Its system of keywords leaves room for extensibility when the need for new features arises. It provides shorthands that aid writing unit tests that look similar, without compromising readability. And last but not least, the syntax allows for specifying properties that message arguments are to satisfy when looking for messages of a certain tag (which isn’t fully implemented yet). Care has been taken to retain maximum compatibility with the features of TestUtil.pm’s perform_testcase_testing() function in most areas. In addition for its improved facilities for testing message arguments, the DSL introduces another extension: a distinction between “todo” tests (i.e. not testable because of a bug in the implementation, but otherwise the scenario is fine) and “not_testable” tests (i.e. not testable because the scenario is not implemented or cannot be implemented with current means).

Functions like Test::More’s ok() automatically generate diagnostic messages pointing to the caller’s location when a test fails. It works fine in simple cases where Test::More is used directly in a .t file, but it stops working here because of the DSL’s design. As a result, test failures were always traced to some line in either t/TestUtil/DSL.pm or t/TestUtil/DSL/Compiler.pm, which makes no sense. In order to remedy this, we need to augment the AST with context information about the place where scenario blocks and expect/forbid keywords are used. Then, at execution time, we employ some rather inelegant hacks in order to print the appropriate context information in lieu of Test::More’s automatically generated context information should a test fail.

Some refactoring was needed in order to make sure that todo tests were handled appropriately. The previous implementation did not call todo_start() at the right moment. Because of that, a test marked todo which did pass was not reported at the end of the test harness’s output.

An issue, zonemaster/zonemaster#1401, proposes to “[u]pdate a test scenario with requirements for message arguments”. While as of writing this commit for the first time (2025-07-30), no scenario has been updated yet, the tooling is implemented. Message arguments can be tested as it is specified in the DSL’s documentation. If such a test fails, we try our best to print as helpful a message as possible: the test prints what exactly it was looking for and which messages it has examined. This debugging output could be improved by explaining why a given message has failed a match, but the current level of detail is hopefully plenty enough for debugging purposes. Below is a sample TAP output detailing a failure of such a test. Long lines are folded in order not to exceed 78 columns (the offendings lines end with a backslash): t/Test-basic02.t .. 10/? # Failed test 'Messages of tag 'B02_AUTH_RESPONSE_SOA' exist with\ # specified arguments' # at t/Test-basic02.t line 112. # Looked for a message whose tag is 'B02_AUTH_RESPONSE_SOA' # where 'domain' equals 'mixed-1.basic02.xa' # and 'ns_list' equals 'ns1.mixed-1.basic02.xa/127.12.2.31;\ # ns1.mixed-1.basic02.xa/fda1:b2:c3:0:127:12:2:31' # and contains no argument other than those listed above # Here are all messages that unsuccessfully matched: # B02_AUTH_RESPONSE_SOA domain="mixed-1.basic02.xa"; \ # ns_list=ns1.mixed-1.basic02.xa/127.12.2.31;\ # ns1.mixed-1.basic02.xa/fda1:b2:c3:0:127:12:2:31 # Failed scenario 'MIXED-1' # at t/Test-basic02.t line 117. # Looks like you failed 1 test of 26. t/Test-basic02.t .. Dubious, test returned 1 (wstat 256, 0x100) Failed 1/26 subtests

To avoid surprises in the semantics of the DSL, the scenarios should be run in the order they were defined in the .t file. Doing this might let us simplify the implementation later.

Refactor the way subtsets are selected or disabled using environment variables. We also enforce the existence of scenario names listed in those environment variables, which we previously didn’t.

Re-engineer the AST data structure generated by TestUtil::DSL so that multiple root_hints are allowed. Subsequent root_hints entirely override previous ones.

In order to avoid surprises for the end user, it’s better to respect the order in which the expect and forbid keywords were defined in the scenarios too. It simplifies the DSL’s implementation as well.

Add a keyword to tell Zonemaster::Engine::Resolver to clear the cache between tests. That can be useful in some corner cases where caching of resolution failures interferes with the correct operation of the unit test, which is the case of t/Test-connectivity04.t, among others.

This one is interesting because it has a scenario, PTR-IS-ILLEGAL-CNAME, which expects one tag, forbids one tag and leaves two others hanging.

Rewriting this test file to use the DSL helped me find a typo in the original t/Test-basic02.t file that caused the GOOD-2 scenario to point to good-1.basic02.xa instead of good-2.basic02.xa (see PR zonemaster#1465). It’s a good demonstration of why it can be important to design data structures and formats in such a way that internal repetition is avoided unless absolutely necessary: when done right, it makes subtle errors like these more obvious.

In this specific case, the unit test was broken in two files because caching of resolution failures (“blacklisting”) caused issues with one of the scenarios. Thanks to the DSL’s “clear_cache” keyword, however, we can put the affected scenario in the same file.

This one is interesting because it has a few scenarios that are marked as not testable. We do the same here.

Another rewrite of a unit test that led me to uncover a typo in the original unit test. Actually, this time, the specification was incorrect (see zonemaster/zonemaster#1414).

This one is fairly straightforward, but many scenarios are marked as not testable.

marc-vanderwal · 2025-10-16T06:14:03Z

I’ve reworded the documentation for the expect keyword and provided some additional examples.

matsduf · 2025-10-20T14:06:36Z

@MichaelTimbert, please re-review.

MichaelTimbert

Good for me.

matsduf · 2025-10-28T08:41:32Z

@marc-vanderwal, please merge.

marc-vanderwal added this to the v2025.2 milestone Jul 30, 2025

marc-vanderwal requested review from MichaelTimbert, matsduf, mattias-p and tgreenx July 30, 2025 13:03

marc-vanderwal force-pushed the feature/unit-test-dsl branch from faf92cb to 0bfcd28 Compare July 30, 2025 13:30

matsduf added the V-Patch Versioning: The change gives an update of patch in version. label Jul 30, 2025

marc-vanderwal commented Jul 31, 2025

View reviewed changes

Comment thread t/Test-consistency06.t Outdated

marc-vanderwal commented Aug 5, 2025

View reviewed changes

Comment thread t/TestUtil/DSL.pm Outdated

Comment thread t/TestUtil/DSL.pm Outdated

marc-vanderwal force-pushed the feature/unit-test-dsl branch from 0bfcd28 to f38d969 Compare August 7, 2025 06:15

marc-vanderwal force-pushed the feature/unit-test-dsl branch 2 times, most recently from 11fadfe to 04fef77 Compare August 7, 2025 08:08

matsduf added the RC-None Release category: Not to be included in Changes file. label Aug 10, 2025

MichaelTimbert previously approved these changes Oct 9, 2025

View reviewed changes

matsduf requested changes Oct 9, 2025

View reviewed changes

marc-vanderwal dismissed MichaelTimbert’s stale review via 1db250b October 15, 2025 14:50

marc-vanderwal force-pushed the feature/unit-test-dsl branch 2 times, most recently from 1db250b to 141f600 Compare October 15, 2025 14:58

marc-vanderwal requested a review from matsduf October 15, 2025 15:04

matsduf requested changes Oct 15, 2025

View reviewed changes

Comment thread t/Test-connectivity04.t

marc-vanderwal added 16 commits October 16, 2025 08:12

Refactor: run scenarios in definition order

e2c96dc

To avoid surprises in the semantics of the DSL, the scenarios should be run in the order they were defined in the .t file. Doing this might let us simplify the implementation later.

Refactor: selecting and disabling subtests

33ae3db

Refactor the way subtsets are selected or disabled using environment variables. We also enforce the existence of scenario names listed in those environment variables, which we previously didn’t.

Refactor: allow multiple root_hints

721b579

Re-engineer the AST data structure generated by TestUtil::DSL so that multiple root_hints are allowed. Subsequent root_hints entirely override previous ones.

Refactor: run expect/forbid in definition order

c81dd13

In order to avoid surprises for the end user, it’s better to respect the order in which the expect and forbid keywords were defined in the scenarios too. It simplifies the DSL’s implementation as well.

Port t/Test-address03.t to DSL

1854e25

This one is interesting because it has a scenario, PTR-IS-ILLEGAL-CNAME, which expects one tag, forbids one tag and leaves two others hanging.

Port t/Test-basic01.t to DSL

345c17b

Port t/Test-consistency05.t to DSL

2d181a4

This one is interesting because it has a few scenarios that are marked as not testable. We do the same here.

Port t/Test-consistency06.t to DSL

1afe3a0

Another rewrite of a unit test that led me to uncover a typo in the original unit test. Actually, this time, the specification was incorrect (see zonemaster/zonemaster#1414).

Port t/Test-delegation01.t to DSL

22c65a5

This one is fairly straightforward, but many scenarios are marked as not testable.

marc-vanderwal force-pushed the feature/unit-test-dsl branch from 141f600 to 22c65a5 Compare October 16, 2025 06:13

matsduf approved these changes Oct 16, 2025

View reviewed changes

MichaelTimbert approved these changes Oct 28, 2025

View reviewed changes

marc-vanderwal merged commit 2caef2f into zonemaster:develop Oct 28, 2025
3 checks passed

This was referenced Oct 29, 2025

Update DNSSEC05 implementation #1473

Merged

Implement new MethodV2: Get-Parent-NS-Names-and-IPs #1475

Merged

marc-vanderwal deleted the feature/unit-test-dsl branch October 29, 2025 13:53

marc-vanderwal mentioned this pull request Nov 6, 2025

Update Zone11 implementation #1477

Merged

matsduf mentioned this pull request Nov 7, 2025

Update Address01 implementation #1453

Merged

Uh oh!

Conversation

marc-vanderwal commented Jul 30, 2025

Purpose

Main differences with TestUtil.pm

Diagnostics

Context

Changes

Advice for reviewers

How to test this PR

Uh oh!

matsduf commented Jul 30, 2025

Uh oh!

matsduf commented Jul 30, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

marc-vanderwal commented Aug 7, 2025

Uh oh!

marc-vanderwal commented Aug 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MichaelTimbert left a comment

Choose a reason for hiding this comment

Uh oh!

matsduf left a comment

Choose a reason for hiding this comment

Uh oh!

matsduf left a comment

Choose a reason for hiding this comment

Uh oh!

matsduf left a comment

Choose a reason for hiding this comment

Uh oh!

matsduf commented Oct 9, 2025

Uh oh!

marc-vanderwal commented Oct 15, 2025

Uh oh!

matsduf commented Oct 15, 2025

Uh oh!

matsduf left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

marc-vanderwal commented Oct 16, 2025

Uh oh!

marc-vanderwal commented Oct 16, 2025

Uh oh!

matsduf commented Oct 20, 2025

Uh oh!

MichaelTimbert left a comment

Choose a reason for hiding this comment

Uh oh!

matsduf commented Oct 28, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

marc-vanderwal commented Aug 7, 2025 •

edited

Loading