Skip to content

Introduce a domain-specific language for scenario-based unit testing of Zonemaster-Engine#1467

Merged
marc-vanderwal merged 16 commits into
zonemaster:developfrom
marc-vanderwal:feature/unit-test-dsl
Oct 28, 2025
Merged

Introduce a domain-specific language for scenario-based unit testing of Zonemaster-Engine#1467
marc-vanderwal merged 16 commits into
zonemaster:developfrom
marc-vanderwal:feature/unit-test-dsl

Conversation

@marc-vanderwal

Copy link
Copy Markdown
Contributor

Purpose

In order to support the ongoing effort of testing Zonemaster’s test cases with a scenario-based approach, this PR introduces a domain-specific language (DSL) aimed at easing the writing and maintenance of the corresponding .t files.

The design goals of the DSL are as follows:

  • Be easy to read and understand for both experienced developers and occasional contributors;
  • Be concise, by eliminating as much boilerplate and internal repetition as possible;
  • Provide opportunities for the .t files to be self-documenting;
  • Allow for extensibility should the need arise;
  • Stay in the same spirit as Test::More;
  • Provide “escape hatches” by letting unit test writers use custom callbacks when built-in facilities are insufficient;
  • Provide good and useful diagnostics when a test fails.

I’ve converted a handful of scenario-based .t files in the new DSL in order to test some edge cases. However, some features of the DSL are currently untested:

  • Testing for message arguments (no scenario-based .t file currently does that);
  • Fake DS records (no scenario-based .t file needs undelegated DS records).

Because this PR introduces no change visible to the end user, I haven’t set any of the versioning tags (V-Major, V-Minor or V-Patch).

Main differences with TestUtil.pm

The DSL strives to stay compatible with TestUtil.pm in terms of features. However, the DSL proved to be a good opportunity to improve on two areas.

Firstly, the DSL allows to test not only for the presence or absence of a message, but also the presence of a message satisfying certain constraints on its arguments. Diagnostic output in case such a test fails is hopefully detailed enough for pinpointing the cause of the failure.

Secondly, while TestUtil.pm only has a boolean property stating whether or not a scenario is testable, the DSL makes a distinction between two cases that I believe should be handled differently:

  • scenarios marked not_testable, that cannot be tested because the infrastructure (e.g. authoritative name servers configured to exhibit erroneous behavior or zones badly configured on purpose) is not yet available;
  • scenarios marked todo, that cannot be tested out because of a bug in Zonemaster-Engine that is yet to be fixed.

The difference mainly lies in what happens when the test suite is run, and what happens if ZONEMASTER_RECORD=1 is set in the environment. Scenarios marked not_testable are entirely skipped (as when todo_skip in Test::More is used) and therefore are not recorded in the .data file. Thus, the data file is not polluted with packets that do not correspond to the scenario’s specification. Scenarios marked todo are treated as if they were in a TODO block in Test::More: they are executed, but failures are ignored, while the corresponding packets are saved in the data file, so that someone wishing to fix the implementation can test their fix without having to rerecord the .data file.

Diagnostics

When testing for the presence of a message of a certain tag and with certain arguments, it can be tricky to narrow down what exactly happened. The DSL compiles to unit tests that strive to provide helpful diagnostics in this case, as in this sample output (with a test that was made to fail intentionally):

t/Test-basic02.t .. 13/?
    #   Failed test 'Messages of tag 'B02_AUTH_RESPONSE_SOA' exist with specified arguments'
    #   at t/Test-basic02.t line 112.
    # Looked for a message whose tag is 'B02_AUTH_RESPONSE_SOA'
    #   where 'domain' equals 'mixed-1.basic02.xa'
    #     and 'ns_list' equals 'ns1.mixed-1.basic02.xa/127.12.2.31;ns1.mixed-1.basic02.xa/fda1:b2:c3:0:127:12:2:31'
    #     and contains no argument other than those listed above
    # Here are all messages that unsuccessfully matched:
    #   B02_AUTH_RESPONSE_SOA domain="mixed-1.basic02.xa"; ns_list=ns1.mixed-1.basic02.xa/127.12.2.31;ns1.mixed-1.basic02.xa/fda1:b2:c3:0:127:12:2:31
#   Failed scenario 'MIXED-1'
#   at t/Test-basic02.t line 117.
# Looks like you failed 1 test of 26.
t/Test-basic02.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/26 subtests

Test Summary Report
-------------------
t/Test-basic02.t (Wstat: 256 Tests: 26 Failed: 1)
  Failed test:  16
  Non-zero exit status: 1
Files=1, Tests=26,  1 wallclock secs ( 0.03 usr  0.00 sys +  1.06 cusr  0.08 csys =  1.17 CPU)
Result: FAIL

Context

Hackathon session during last face-to-face meeting, and zonemaster/zonemaster#1401.

Changes

  • Introduce a DSL
  • Port a subset of scenario-based *.t files (i.e. using TestUtil::perform_testcase_testing()) to the DSL

Advice for reviewers

This PR contains new code (for the DSL implementation, compiler and documentation) and rewrites a few .t files. For the .t files, it is better to ignore the diff and look at the end result only. The original and the DSL rewrite of the .t files should be semantically equivalent.

If so far the DSL looks fine, is it worth the effort to convert the rest of the scenario-based .t files in this format? In the current state, this PR has converted about half of the scenario-based .t files so far in order to try and explore edge cases.

How to test this PR

Unit tests should pass.

If wishing to test this PR manually, you should run the test suite with a tool such as prove and enable the -v (verbose) option in order to obtain the full diagnostic output. Otherwise, try modifying a test so that it fails and look at the diagnostics output: it should point to the line in the .t file where the error occurred.

@matsduf

matsduf commented Jul 30, 2025

Copy link
Copy Markdown
Contributor
* Fake DS records (no scenario-based `.t` file needs undelegated DS records).

No, not currently, but we should soon be there.

@matsduf

matsduf commented Jul 30, 2025

Copy link
Copy Markdown
Contributor

Because this PR introduces no change visible to the end user, I haven’t set any of the versioning tags (V-Major, V-Minor or V-Patch).

It should still be a V-Patch as all changes requires increase in version number.

@matsduf matsduf added the V-Patch Versioning: The change gives an update of patch in version. label Jul 30, 2025
Comment thread t/Test-consistency06.t Outdated
Comment thread t/TestUtil/DSL.pm Outdated
Comment thread t/TestUtil/DSL.pm Outdated
@marc-vanderwal marc-vanderwal force-pushed the feature/unit-test-dsl branch from 0bfcd28 to f38d969 Compare August 7, 2025 06:15
@marc-vanderwal

Copy link
Copy Markdown
Contributor Author

I’ve done a rather large refactoring of the DSL implementation. All scenarios are now run in the order they were defined in the .t file. This lets me change the root hints while a .t file is running, which I’ll need for the unit tests I’m planning for Zone11 when I update its implementation.

@marc-vanderwal

marc-vanderwal commented Aug 7, 2025

Copy link
Copy Markdown
Contributor Author

Tests fail on Perl 5.26 because, for some reason, a test marked as TODO fails as expected but that failure isn’t squelched by the todo keyword in the DSL. How odd.

EDIT: turns out that older versions of Test::Builder do not set TODO status properly if todo_start() is passed undef as reason string.

@marc-vanderwal marc-vanderwal force-pushed the feature/unit-test-dsl branch 2 times, most recently from 11fadfe to 04fef77 Compare August 7, 2025 08:08
@matsduf matsduf added the RC-None Release category: Not to be included in Changes file. label Aug 10, 2025
MichaelTimbert
MichaelTimbert previously approved these changes Oct 9, 2025

@MichaelTimbert MichaelTimbert left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested and working. "make test" and "prove" work, all unit tests pass.
Typos and errors are catch up, and the error is clearly explained.
Unless we forget "no_more_scenarios;", in which case we only get the error "No subtests run" error.

@matsduf matsduf left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perldoc t/TestUtil/DSL/Compiler.pm gives

POD ERRORS
       Hey! The above document had some coding errors, which are explained
       below:

       Around line 25:
           Non-ASCII character seen before =encoding in 'passing C<-v>'.
           Assuming UTF-8

@matsduf matsduf left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also perldoc t/TestUtil/DSL.pm :

POD ERRORS
       Hey! The above document had some coding errors, which are explained
       below:

       Around line 392:
           Non-ASCII character seen before =encoding in 'address…>]'. Assuming
           UTF-8

@matsduf matsduf left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be wrong. The scenario is run when running without the variable.

$ ZONEMASTER_SELECTED_SCENARIOS="ZONE-ERR-GRANDPARENT-3" prove -v t/Test-basic01.t 
t/Test-basic01.t .. 
ok 1 - use Zonemaster::Engine::Nameserver;
ok 2 - use Zonemaster::Engine::Test::Basic;
ZONE-ERR-GRANDPARENT-3: no such scenario at /usr/home/matsd/git/zonemaster-engine/t/TestUtil/DSL.pm line 163.
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 255 just after 2.
Dubious, test returned 255 (wstat 65280, 0xff00)
All 2 subtests passed 

Test Summary Report
-------------------
t/Test-basic01.t (Wstat: 65280 (exited 255) Tests: 2 Failed: 0)
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output
Files=1, Tests=2,  1 wallclock secs ( 0.02 usr  0.00 sys +  0.27 cusr  0.12 csys =  0.41 CPU)
Result: FAIL

@matsduf

matsduf commented Oct 9, 2025

Copy link
Copy Markdown
Contributor

I see pros and cons with the new model. It is maybe more cumbersome to write, but it is easier to read, and then probably easier to update and correct.

Is it possible to change root hints within a t file with the new model? If so it make make it easier to handle test cases that use different root hints for different scenarios. With the current model you have to create separate files.

In today's model there is a shared cache between all scenarios in the same t file. If root hints are changed, the cache should probably be emptied.

@marc-vanderwal

Copy link
Copy Markdown
Contributor Author

Main changes since the previous iteration:

  • addressed POD errors, caused by a missing =encoding utf8 statement;
  • fixed ZONEMASTER_SELECTED_SCENARIOS environment variable not working as intended, due to a regression introduced while refactoring;
  • make sure root_hints also calls Zonemaster::Engine::Resolver::clear_cache() when used;
  • added a new keyword, clear_cache, that calls Zonemaster::Engine::Resolver::clear_cache(), then used it in t/Test-connectivity04.t in order to merge t/Test-connectivity04-A.t into the former.

Reviews welcome!

@matsduf

matsduf commented Oct 15, 2025

Copy link
Copy Markdown
Contributor
* make sure `root_hints` also calls `Zonemaster::Engine::Resolver::clear_cache()` when used;

* added a new keyword, `clear_cache`, that calls `Zonemaster::Engine::Resolver::clear_cache()`, then used it in `t/Test-connectivity04.t` in order to merge `t/Test-connectivity04-A.t` into the former.

Will clear_cache() also clear blacklisting?

@matsduf matsduf left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give an example of how the following could be used:

       The fourth form takes a tag name and a coderef: it searches for all
       messages whose tag matches, evaluates the coderef with @_ set to the
       list of messages matching the tag and expects the coderef to return a
       true value. This form is useful for situations where the third form
       falls short.

           expect <tag> => sub { ... };

##########

Repeated uses of "expect" are cumulative.

Does that mean like OR between the expect statements?

Comment thread t/Test-connectivity04.t
@marc-vanderwal

Copy link
Copy Markdown
Contributor Author

Will clear_cache() also clear blacklisting?

The cache in Zonemaster::Engine::Recursor caches all packets, including failure to respond. So yes, even that is cleared. Otherwise, I would not have been able to merge t/Test-connectivity04-A.t into t/Test-connectivity04.t.

There is a proto-DSL already, in t/TestUtil.pm, that helps us test
Zonemaster test cases with respect to predefined scenarios. But we need
to go further.

Scenarios are defined in terms of a zone name (which names a zone that
is configured in a predetermined way), some possible extra configuration
for the test case (such as “fake NS” and “fake DS” records), and a list
of messages that we expect Zonemaster to emit when testing the zone
name.

Currently, when using t/TestUtil.pm, we list scenarios in a array of
arrays. It has been working fine for now, but it’s lacking a feature:
testing message arguments in addition to message tags. How do we specify
predicates that message *arguments* should satisfy? This is where the
current method shows its limit.

The DSL is meant to be easy to read, not only for those who routinely
write such unit tests, but also for more casual readers. Its system of
keywords leaves room for extensibility when the need for new features
arises. It provides shorthands that aid writing unit tests that look
similar, without compromising readability. And last but not least, the
syntax allows for specifying properties that message arguments are to
satisfy when looking for messages of a certain tag (which isn’t fully
implemented yet).

Care has been taken to retain maximum compatibility with the features of
TestUtil.pm’s perform_testcase_testing() function in most areas.

In addition for its improved facilities for testing message arguments,
the DSL introduces another extension: a distinction between “todo”
tests (i.e. not testable because of a bug in the implementation, but
otherwise the scenario is fine) and “not_testable” tests (i.e. not
testable because the scenario is not implemented or cannot be
implemented with current means).
Functions like Test::More’s ok() automatically generate diagnostic
messages pointing to the caller’s location when a test fails. It works
fine in simple cases where Test::More is used directly in a .t file, but
it stops working here because of the DSL’s design.

As a result, test failures were always traced to some line in either
t/TestUtil/DSL.pm or t/TestUtil/DSL/Compiler.pm, which makes no sense.

In order to remedy this, we need to augment the AST with context
information about the place where scenario blocks and expect/forbid
keywords are used. Then, at execution time, we employ some rather
inelegant hacks in order to print the appropriate context information in
lieu of Test::More’s automatically generated context information should
a test fail.
Some refactoring was needed in order to make sure that todo tests were
handled appropriately. The previous implementation did not call
todo_start() at the right moment. Because of that, a test marked todo
which did pass was not reported at the end of the test harness’s output.
An issue, zonemaster/zonemaster#1401, proposes to “[u]pdate a test
scenario with requirements for message arguments”.

While as of writing this commit for the first time (2025-07-30), no
scenario has been updated yet, the tooling is implemented.

Message arguments can be tested as it is specified in the DSL’s
documentation. If such a test fails, we try our best to print as helpful
a message as possible: the test prints what exactly it was looking for
and which messages it has examined.

This debugging output could be improved by explaining why a given
message has failed a match, but the current level of detail is hopefully
plenty enough for debugging purposes.

Below is a sample TAP output detailing a failure of such a test. Long
lines are folded in order not to exceed 78 columns (the offendings lines
end with a backslash):

    t/Test-basic02.t .. 10/?
        #   Failed test 'Messages of tag 'B02_AUTH_RESPONSE_SOA' exist with\
        #   specified arguments'
        #   at t/Test-basic02.t line 112.
        # Looked for a message whose tag is 'B02_AUTH_RESPONSE_SOA'
        #   where 'domain' equals 'mixed-1.basic02.xa'
        #     and 'ns_list' equals 'ns1.mixed-1.basic02.xa/127.12.2.31;\
        #     ns1.mixed-1.basic02.xa/fda1:b2:c3:0:127:12:2:31'
        #     and contains no argument other than those listed above
        # Here are all messages that unsuccessfully matched:
        #   B02_AUTH_RESPONSE_SOA domain="mixed-1.basic02.xa"; \
        #     ns_list=ns1.mixed-1.basic02.xa/127.12.2.31;\
        #     ns1.mixed-1.basic02.xa/fda1:b2:c3:0:127:12:2:31
    #   Failed scenario 'MIXED-1'
    #   at t/Test-basic02.t line 117.
    # Looks like you failed 1 test of 26.
    t/Test-basic02.t .. Dubious, test returned 1 (wstat 256, 0x100)
    Failed 1/26 subtests
To avoid surprises in the semantics of the DSL, the scenarios should be
run in the order they were defined in the .t file. Doing this might
let us simplify the implementation later.
Refactor the way subtsets are selected or disabled using environment
variables. We also enforce the existence of scenario names listed in
those environment variables, which we previously didn’t.
Re-engineer the AST data structure generated by TestUtil::DSL so that
multiple root_hints are allowed. Subsequent root_hints entirely override
previous ones.
In order to avoid surprises for the end user, it’s better to respect the
order in which the expect and forbid keywords were defined in the
scenarios too. It simplifies the DSL’s implementation as well.
Add a keyword to tell Zonemaster::Engine::Resolver to clear the cache
between tests. That can be useful in some corner cases where caching of
resolution failures interferes with the correct operation of the unit
test, which is the case of t/Test-connectivity04.t, among others.
This one is interesting because it has a scenario, PTR-IS-ILLEGAL-CNAME,
which expects one tag, forbids one tag and leaves two others hanging.
Rewriting this test file to use the DSL helped me find a typo in the
original t/Test-basic02.t file that caused the GOOD-2 scenario to point
to good-1.basic02.xa instead of good-2.basic02.xa (see PR zonemaster#1465).

It’s a good demonstration of why it can be important to design data
structures and formats in such a way that internal repetition is avoided
unless absolutely necessary: when done right, it makes subtle errors
like these more obvious.
In this specific case, the unit test was broken in two files because
caching of resolution failures (“blacklisting”) caused issues with one
of the scenarios.

Thanks to the DSL’s “clear_cache” keyword, however, we can put the
affected scenario in the same file.
This one is interesting because it has a few scenarios that are marked
as not testable. We do the same here.
Another rewrite of a unit test that led me to uncover a typo in the
original unit test. Actually, this time, the specification was
incorrect (see zonemaster/zonemaster#1414).
This one is fairly straightforward, but many scenarios are marked as not
testable.
@marc-vanderwal

Copy link
Copy Markdown
Contributor Author

I’ve reworded the documentation for the expect keyword and provided some additional examples.

@matsduf

matsduf commented Oct 20, 2025

Copy link
Copy Markdown
Contributor

@MichaelTimbert, please re-review.

@MichaelTimbert MichaelTimbert left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good for me.

@matsduf

matsduf commented Oct 28, 2025

Copy link
Copy Markdown
Contributor

@marc-vanderwal, please merge.

@marc-vanderwal marc-vanderwal merged commit 2caef2f into zonemaster:develop Oct 28, 2025
3 checks passed
@marc-vanderwal marc-vanderwal deleted the feature/unit-test-dsl branch October 29, 2025 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

RC-None Release category: Not to be included in Changes file. V-Patch Versioning: The change gives an update of patch in version.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants