Skip to content

Run several Test Agents against the same queue#1115

Closed
matsduf wants to merge 3 commits into
zonemaster:developfrom
matsduf:multiple-testagent-one-queue
Closed

Run several Test Agents against the same queue#1115
matsduf wants to merge 3 commits into
zonemaster:developfrom
matsduf:multiple-testagent-one-queue

Conversation

@matsduf

@matsduf matsduf commented Jun 30, 2023

Copy link
Copy Markdown
Contributor

Purpose

When a large batch is to be tested, adding several Test Agents increases performance if the server has large capacity. Several servers could also be used. One limitation is that current code only allows one Test Agent per queue. All tests in a batch always belong to the same queue.

This PR makes it possible for several Test Agents to fetch un-run tests from the same queue without the risk of two or more Test Agents testing the same item.

This PR does not handle possible race condition between multiple Test Agents when it comes to process_unfinished_tests(), but that could be worked-around by setting a longer ZONEMASTER_max_zonemaster_execution_time on all Test Agents except one.

This PR also ensures that the progress counter is never set to 0% or 1% during the testing phase.

Changes

  • lib/Zonemaster/Backend/DB.pm
  • lib/Zonemaster/Backend/TestAgent.pm

How to test this PR

  • Run a large batch with one Test Agent and nothing should be changed.
  • Run a large batch with two Test Agents (the same queue) and there should be no cases when the same test is run by more than one worker.

@matsduf matsduf added T-Feature Type: New feature in software or test case description V-Minor Versioning: The change gives an update of minor in version. labels Jun 30, 2023
$self->format_time( time() ),
$test_id,
)) {
return undef;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a little bit hacky to return undef here. To me test_progress is both a getter and a setter and returns the progress value stored in database. Here it would extend its logic to something new. Would it be possible to move this new logic to another method (or maybe rethink the whole test_progress method)?

@matsduf matsduf Jul 3, 2023

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If test_progress() is split into set_test_progress() and get_test_progress() with the following logic, do you think that could work?

set_test_progress() returns

  • 1 if setting succeeded.
  • undef if setting failed (including test_id does not exist)

get_test_progress() returns

  • Actual value if test_id exists
  • undef if test_id does not exist

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that test_progress() is overloaded with meanings and we should clean it up. Having a combined getter and setter isn't too bad IMO. That's idiomatic in Perl anyways. The problem is that in addition to getting/setting progress we're using it to make state transitions. I also find it problematic that it's trying to paper over bugs elsewhere in the code.

Upon closer inspection I came up with these suggestions:

  • Make get_test_request() execute UPDATE SET progress=1 WHERE progress=0 directly instead of it calling test_progress(). That's the only place we want to bump progress from 0 to 1 anyway.
  • Remove the special handling of $progress==1 from test_progress(). Just make it set progress to roughly one percent.
  • Instead of clamping of $progress to [1, 99] in test_progress(), make it die unless 0 <= $progress < 100, and add a special case just for if $progress==0 { $progress=1 }. Clamping 0 up to 1 allows us to have more than 99 test cases. But if we're trying to set progress to any value outside [0, 99] then clearly there are bugs and we can't trust the results.
  • Give the remaining UPDATE in test_progress() a more stringent guard clause making sure that 1 <= old_progress <= new_progress, and make it die if no rows are affected by the UPDATE. If we're trying to update progress for a job that isn't running or if we're trying to reverse progress, then clearly there are bugs and we can't trust the results.
  • Make get_test_request() loop until SELECT returns nothing or until UPDATE succeeds, whichever happens first. I believe spinning is the right thing to do here for as long as there are jobs to allocate.
    • Instead of a loop both queries could be performed in a transaction that locks the selected row for update. This would be more efficient.

@matsduf matsduf requested a review from a user July 3, 2023 15:11
@matsduf

matsduf commented Nov 2, 2023

Copy link
Copy Markdown
Contributor Author

Replaced by #1121

@matsduf matsduf closed this Nov 2, 2023
@matsduf matsduf deleted the multiple-testagent-one-queue branch May 21, 2026 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

T-Feature Type: New feature in software or test case description V-Minor Versioning: The change gives an update of minor in version.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants