Introduce a domain-specific language for scenario-based unit testing of Zonemaster-Engine#1467
Conversation
faf92cb to
0bfcd28
Compare
No, not currently, but we should soon be there. |
It should still be a |
0bfcd28 to
f38d969
Compare
|
I’ve done a rather large refactoring of the DSL implementation. All scenarios are now run in the order they were defined in the |
|
Tests fail on Perl 5.26 because, for some reason, a test marked as TODO fails as expected but that failure isn’t squelched by the EDIT: turns out that older versions of Test::Builder do not set TODO status properly if |
11fadfe to
04fef77
Compare
MichaelTimbert
left a comment
There was a problem hiding this comment.
Tested and working. "make test" and "prove" work, all unit tests pass.
Typos and errors are catch up, and the error is clearly explained.
Unless we forget "no_more_scenarios;", in which case we only get the error "No subtests run" error.
matsduf
left a comment
There was a problem hiding this comment.
perldoc t/TestUtil/DSL/Compiler.pm gives
POD ERRORS
Hey! The above document had some coding errors, which are explained
below:
Around line 25:
Non-ASCII character seen before =encoding in 'passing C<-v>'.
Assuming UTF-8
matsduf
left a comment
There was a problem hiding this comment.
Also perldoc t/TestUtil/DSL.pm :
POD ERRORS
Hey! The above document had some coding errors, which are explained
below:
Around line 392:
Non-ASCII character seen before =encoding in 'address…>]'. Assuming
UTF-8
matsduf
left a comment
There was a problem hiding this comment.
This seems to be wrong. The scenario is run when running without the variable.
$ ZONEMASTER_SELECTED_SCENARIOS="ZONE-ERR-GRANDPARENT-3" prove -v t/Test-basic01.t
t/Test-basic01.t ..
ok 1 - use Zonemaster::Engine::Nameserver;
ok 2 - use Zonemaster::Engine::Test::Basic;
ZONE-ERR-GRANDPARENT-3: no such scenario at /usr/home/matsd/git/zonemaster-engine/t/TestUtil/DSL.pm line 163.
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 255 just after 2.
Dubious, test returned 255 (wstat 65280, 0xff00)
All 2 subtests passed
Test Summary Report
-------------------
t/Test-basic01.t (Wstat: 65280 (exited 255) Tests: 2 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
Files=1, Tests=2, 1 wallclock secs ( 0.02 usr 0.00 sys + 0.27 cusr 0.12 csys = 0.41 CPU)
Result: FAIL
|
I see pros and cons with the new model. It is maybe more cumbersome to write, but it is easier to read, and then probably easier to update and correct. Is it possible to change root hints within a In today's model there is a shared cache between all scenarios in the same |
1db250b to
141f600
Compare
|
Main changes since the previous iteration:
Reviews welcome! |
Will |
matsduf
left a comment
There was a problem hiding this comment.
Could you give an example of how the following could be used:
The fourth form takes a tag name and a coderef: it searches for all
messages whose tag matches, evaluates the coderef with @_ set to the
list of messages matching the tag and expects the coderef to return a
true value. This form is useful for situations where the third form
falls short.
expect <tag> => sub { ... };
##########
Repeated uses of "expect" are cumulative.
Does that mean like OR between the expect statements?
The cache in Zonemaster::Engine::Recursor caches all packets, including failure to respond. So yes, even that is cleared. Otherwise, I would not have been able to merge t/Test-connectivity04-A.t into t/Test-connectivity04.t. |
There is a proto-DSL already, in t/TestUtil.pm, that helps us test Zonemaster test cases with respect to predefined scenarios. But we need to go further. Scenarios are defined in terms of a zone name (which names a zone that is configured in a predetermined way), some possible extra configuration for the test case (such as “fake NS” and “fake DS” records), and a list of messages that we expect Zonemaster to emit when testing the zone name. Currently, when using t/TestUtil.pm, we list scenarios in a array of arrays. It has been working fine for now, but it’s lacking a feature: testing message arguments in addition to message tags. How do we specify predicates that message *arguments* should satisfy? This is where the current method shows its limit. The DSL is meant to be easy to read, not only for those who routinely write such unit tests, but also for more casual readers. Its system of keywords leaves room for extensibility when the need for new features arises. It provides shorthands that aid writing unit tests that look similar, without compromising readability. And last but not least, the syntax allows for specifying properties that message arguments are to satisfy when looking for messages of a certain tag (which isn’t fully implemented yet). Care has been taken to retain maximum compatibility with the features of TestUtil.pm’s perform_testcase_testing() function in most areas. In addition for its improved facilities for testing message arguments, the DSL introduces another extension: a distinction between “todo” tests (i.e. not testable because of a bug in the implementation, but otherwise the scenario is fine) and “not_testable” tests (i.e. not testable because the scenario is not implemented or cannot be implemented with current means).
Functions like Test::More’s ok() automatically generate diagnostic messages pointing to the caller’s location when a test fails. It works fine in simple cases where Test::More is used directly in a .t file, but it stops working here because of the DSL’s design. As a result, test failures were always traced to some line in either t/TestUtil/DSL.pm or t/TestUtil/DSL/Compiler.pm, which makes no sense. In order to remedy this, we need to augment the AST with context information about the place where scenario blocks and expect/forbid keywords are used. Then, at execution time, we employ some rather inelegant hacks in order to print the appropriate context information in lieu of Test::More’s automatically generated context information should a test fail.
Some refactoring was needed in order to make sure that todo tests were handled appropriately. The previous implementation did not call todo_start() at the right moment. Because of that, a test marked todo which did pass was not reported at the end of the test harness’s output.
An issue, zonemaster/zonemaster#1401, proposes to “[u]pdate a test scenario with requirements for message arguments”. While as of writing this commit for the first time (2025-07-30), no scenario has been updated yet, the tooling is implemented. Message arguments can be tested as it is specified in the DSL’s documentation. If such a test fails, we try our best to print as helpful a message as possible: the test prints what exactly it was looking for and which messages it has examined. This debugging output could be improved by explaining why a given message has failed a match, but the current level of detail is hopefully plenty enough for debugging purposes. Below is a sample TAP output detailing a failure of such a test. Long lines are folded in order not to exceed 78 columns (the offendings lines end with a backslash): t/Test-basic02.t .. 10/? # Failed test 'Messages of tag 'B02_AUTH_RESPONSE_SOA' exist with\ # specified arguments' # at t/Test-basic02.t line 112. # Looked for a message whose tag is 'B02_AUTH_RESPONSE_SOA' # where 'domain' equals 'mixed-1.basic02.xa' # and 'ns_list' equals 'ns1.mixed-1.basic02.xa/127.12.2.31;\ # ns1.mixed-1.basic02.xa/fda1:b2:c3:0:127:12:2:31' # and contains no argument other than those listed above # Here are all messages that unsuccessfully matched: # B02_AUTH_RESPONSE_SOA domain="mixed-1.basic02.xa"; \ # ns_list=ns1.mixed-1.basic02.xa/127.12.2.31;\ # ns1.mixed-1.basic02.xa/fda1:b2:c3:0:127:12:2:31 # Failed scenario 'MIXED-1' # at t/Test-basic02.t line 117. # Looks like you failed 1 test of 26. t/Test-basic02.t .. Dubious, test returned 1 (wstat 256, 0x100) Failed 1/26 subtests
To avoid surprises in the semantics of the DSL, the scenarios should be run in the order they were defined in the .t file. Doing this might let us simplify the implementation later.
Refactor the way subtsets are selected or disabled using environment variables. We also enforce the existence of scenario names listed in those environment variables, which we previously didn’t.
Re-engineer the AST data structure generated by TestUtil::DSL so that multiple root_hints are allowed. Subsequent root_hints entirely override previous ones.
In order to avoid surprises for the end user, it’s better to respect the order in which the expect and forbid keywords were defined in the scenarios too. It simplifies the DSL’s implementation as well.
Add a keyword to tell Zonemaster::Engine::Resolver to clear the cache between tests. That can be useful in some corner cases where caching of resolution failures interferes with the correct operation of the unit test, which is the case of t/Test-connectivity04.t, among others.
This one is interesting because it has a scenario, PTR-IS-ILLEGAL-CNAME, which expects one tag, forbids one tag and leaves two others hanging.
Rewriting this test file to use the DSL helped me find a typo in the original t/Test-basic02.t file that caused the GOOD-2 scenario to point to good-1.basic02.xa instead of good-2.basic02.xa (see PR zonemaster#1465). It’s a good demonstration of why it can be important to design data structures and formats in such a way that internal repetition is avoided unless absolutely necessary: when done right, it makes subtle errors like these more obvious.
In this specific case, the unit test was broken in two files because caching of resolution failures (“blacklisting”) caused issues with one of the scenarios. Thanks to the DSL’s “clear_cache” keyword, however, we can put the affected scenario in the same file.
This one is interesting because it has a few scenarios that are marked as not testable. We do the same here.
Another rewrite of a unit test that led me to uncover a typo in the original unit test. Actually, this time, the specification was incorrect (see zonemaster/zonemaster#1414).
This one is fairly straightforward, but many scenarios are marked as not testable.
141f600 to
22c65a5
Compare
|
I’ve reworded the documentation for the |
|
@MichaelTimbert, please re-review. |
|
@marc-vanderwal, please merge. |
Purpose
In order to support the ongoing effort of testing Zonemaster’s test cases with a scenario-based approach, this PR introduces a domain-specific language (DSL) aimed at easing the writing and maintenance of the corresponding
.tfiles.The design goals of the DSL are as follows:
.tfiles to be self-documenting;Test::More;I’ve converted a handful of scenario-based
.tfiles in the new DSL in order to test some edge cases. However, some features of the DSL are currently untested:.tfile currently does that);.tfile needs undelegated DS records).Because this PR introduces no change visible to the end user, I haven’t set any of the versioning tags (V-Major, V-Minor or V-Patch).
Main differences with TestUtil.pm
The DSL strives to stay compatible with TestUtil.pm in terms of features. However, the DSL proved to be a good opportunity to improve on two areas.
Firstly, the DSL allows to test not only for the presence or absence of a message, but also the presence of a message satisfying certain constraints on its arguments. Diagnostic output in case such a test fails is hopefully detailed enough for pinpointing the cause of the failure.
Secondly, while TestUtil.pm only has a boolean property stating whether or not a scenario is testable, the DSL makes a distinction between two cases that I believe should be handled differently:
not_testable, that cannot be tested because the infrastructure (e.g. authoritative name servers configured to exhibit erroneous behavior or zones badly configured on purpose) is not yet available;todo, that cannot be tested out because of a bug in Zonemaster-Engine that is yet to be fixed.The difference mainly lies in what happens when the test suite is run, and what happens if
ZONEMASTER_RECORD=1is set in the environment. Scenarios markednot_testableare entirely skipped (as when todo_skip in Test::More is used) and therefore are not recorded in the.datafile. Thus, the data file is not polluted with packets that do not correspond to the scenario’s specification. Scenarios markedtodoare treated as if they were in a TODO block in Test::More: they are executed, but failures are ignored, while the corresponding packets are saved in the data file, so that someone wishing to fix the implementation can test their fix without having to rerecord the.datafile.Diagnostics
When testing for the presence of a message of a certain tag and with certain arguments, it can be tricky to narrow down what exactly happened. The DSL compiles to unit tests that strive to provide helpful diagnostics in this case, as in this sample output (with a test that was made to fail intentionally):
Context
Hackathon session during last face-to-face meeting, and zonemaster/zonemaster#1401.
Changes
*.tfiles (i.e. usingTestUtil::perform_testcase_testing()) to the DSLAdvice for reviewers
This PR contains new code (for the DSL implementation, compiler and documentation) and rewrites a few
.tfiles. For the.tfiles, it is better to ignore the diff and look at the end result only. The original and the DSL rewrite of the.tfiles should be semantically equivalent.If so far the DSL looks fine, is it worth the effort to convert the rest of the scenario-based
.tfiles in this format? In the current state, this PR has converted about half of the scenario-based.tfiles so far in order to try and explore edge cases.How to test this PR
Unit tests should pass.
If wishing to test this PR manually, you should run the test suite with a tool such as
proveand enable the-v(verbose) option in order to obtain the full diagnostic output. Otherwise, try modifying a test so that it fails and look at the diagnostics output: it should point to the line in the.tfile where the error occurred.