Run several Test Agents against the same queue by matsduf · Pull Request #1115 · zonemaster/zonemaster-backend

matsduf · 2023-06-30T16:12:50Z

Purpose

When a large batch is to be tested, adding several Test Agents increases performance, provided the server has enough capacity. Several servers could also be used. One limitation is that the current code only allows one Test Agent per queue, and all tests in a batch always belong to the same queue.

This PR makes it possible for several Test Agents to fetch un-run tests from the same queue without the risk of two or more Test Agents testing the same item.

This PR does not handle the possible race condition between multiple Test Agents in process_unfinished_tests(), but that could be worked around by setting a longer ZONEMASTER_max_zonemaster_execution_time on all Test Agents except one.

This PR also ensures that the progress counter is never set to 0% or 1% during the testing phase.

Changes

lib/Zonemaster/Backend/DB.pm
lib/Zonemaster/Backend/TestAgent.pm

How to test this PR

Run a large batch with one Test Agent; behavior should be unchanged.
Run a large batch with two Test Agents on the same queue; no test should be run by more than one worker.

ghost · 2023-07-03T13:41:06Z

+                        $self->format_time( time() ),
+                        $test_id,
+                    )) {
+                return undef;


Returning undef here seems a little hacky. To me, test_progress is both a getter and a setter, and it returns the progress value stored in the database. This change would extend its logic to something new. Would it be possible to move the new logic to another method (or maybe rethink the whole test_progress method)?

matsduf · 2023-07-03 (edited)

If test_progress() is split into set_test_progress() and get_test_progress() with the following logic, do you think that could work?

set_test_progress() returns

1 if setting succeeded.

undef if setting failed (including when test_id does not exist)

get_test_progress() returns

Actual value if test_id exists

undef if test_id does not exist
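The split proposed above could be sketched roughly as follows. This is an illustration in Python against an in-memory SQLite database, not the Zonemaster Backend code: the table name test_results and the columns hash_id and progress are assumptions made for the example, and Python's None stands in for Perl's undef.

```python
import sqlite3


def set_test_progress(conn, test_id, progress):
    """Return 1 if exactly one row was updated, None otherwise.

    A missing test_id matches no row, so it is reported as a failure too.
    """
    cur = conn.execute(
        "UPDATE test_results SET progress = ? WHERE hash_id = ?",
        (progress, test_id),
    )
    conn.commit()
    return 1 if cur.rowcount == 1 else None


def get_test_progress(conn, test_id):
    """Return the stored progress, or None if test_id does not exist."""
    row = conn.execute(
        "SELECT progress FROM test_results WHERE hash_id = ?",
        (test_id,),
    ).fetchone()
    return row[0] if row is not None else None


# Hypothetical schema and data, only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_results (hash_id TEXT PRIMARY KEY, progress INTEGER NOT NULL)"
)
conn.execute("INSERT INTO test_results VALUES ('abc', 0)")
conn.commit()
```

With this shape the caller can tell a failed write apart from a successful one without the getter taking on any state-transition logic.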

mattias-p · 2023-07-10

I agree that test_progress() is overloaded with meanings and we should clean it up. Having a combined getter and setter isn't too bad IMO. That's idiomatic in Perl anyways. The problem is that in addition to getting/setting progress we're using it to make state transitions. I also find it problematic that it's trying to paper over bugs elsewhere in the code.

Upon closer inspection I came up with these suggestions:

Make get_test_request() execute UPDATE SET progress=1 WHERE progress=0 directly instead of calling test_progress(). That's the only place we want to bump progress from 0 to 1 anyway.

Remove the special handling of $progress==1 from test_progress(). Just make it set progress to roughly one percent.

Instead of clamping $progress to [1, 99] in test_progress(), make it die unless 0 <= $progress < 100, and add a special case just for if $progress==0 { $progress=1 }. Clamping 0 up to 1 allows us to have more than 99 test cases. But if we're trying to set progress to any value outside [0, 99], then clearly there are bugs and we can't trust the results.

Give the remaining UPDATE in test_progress() a more stringent guard clause making sure that 1 <= old_progress <= new_progress, and make it die if no rows are affected by the UPDATE. If we're trying to update progress for a job that isn't running or if we're trying to reverse progress, then clearly there are bugs and we can't trust the results.

Make get_test_request() loop until SELECT returns nothing or until UPDATE succeeds, whichever happens first. I believe spinning is the right thing to do here for as long as there are jobs to allocate.

Instead of a loop, both queries could be performed in a transaction that locks the selected row for update. This would be more efficient.
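The suggestions above (a conditional UPDATE to claim a job, a spin loop around SELECT and UPDATE, and a strict guard on later progress updates) can be sketched as follows. This is a minimal sketch in Python with SQLite, not the project's Perl code: the table test_results, its columns id, hash_id and progress, and the function names claim_next_job and update_progress are all hypothetical.

```python
import sqlite3


def claim_next_job(conn):
    """Loop until a waiting job is claimed or no waiting job remains."""
    while True:
        row = conn.execute(
            "SELECT hash_id FROM test_results WHERE progress = 0 "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing left to allocate
        # The WHERE clause makes the claim conditional: only one agent
        # can move a given row from progress 0 to progress 1.
        cur = conn.execute(
            "UPDATE test_results SET progress = 1 "
            "WHERE hash_id = ? AND progress = 0",
            (row[0],),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0]  # this agent won the job
        # Otherwise another agent claimed it first; look for the next one.


def update_progress(conn, test_id, progress):
    """Set progress for a running job; raise on anything suspicious."""
    if not 0 <= progress < 100:
        raise ValueError(f"progress out of range: {progress}")
    if progress == 0:
        progress = 1  # special case: never report 0 for a running job
    # Guard clause: 1 <= old_progress <= new_progress.
    cur = conn.execute(
        "UPDATE test_results SET progress = ? "
        "WHERE hash_id = ? AND progress >= 1 AND progress <= ?",
        (progress, test_id, progress),
    )
    conn.commit()
    if cur.rowcount != 1:
        raise RuntimeError(f"job {test_id} is not running or progress would reverse")


# Hypothetical schema and two waiting jobs, only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_results ("
    "id INTEGER PRIMARY KEY, hash_id TEXT UNIQUE, "
    "progress INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO test_results (hash_id) VALUES (?)", [("a",), ("b",)])
conn.commit()
```

The transactional alternative would typically rely on row locking such as SELECT ... FOR UPDATE in PostgreSQL or MySQL. SQLite has no such clause, so that variant is not shown here.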

matsduf · 2023-11-02T14:17:44Z

Replaced by #1121

matsduf added 2 commits June 30, 2023 17:04

Makes it possible to run several test agents on the same queue without risk of race condition

4506c22

Prevents test agent to try to set progress to 0 or 1

0fcae55

matsduf added the T-Feature (Type: New feature in software or test case description) and V-Minor (Versioning: The change gives an update of minor in version.) labels Jun 30, 2023

matsduf requested review from a user, hannaeko, marc-vanderwal, matsstralbergiis, mattias-p and tgreenx June 30, 2023 16:12

ghost reviewed Jul 3, 2023

matsduf requested a review from a user July 3, 2023 15:11

Fixes so that non-updates are captured

4bd1030

tgreenx added this to the v2023.2 milestone Jul 19, 2023

This was referenced Jul 19, 2023

Race condition in garbage collector #1120

Open

Synchronized claiming of jobs for processing #1121

Merged

matsduf closed this Nov 2, 2023

matsduf deleted the multiple-testagent-one-queue branch May 21, 2026 08:47

matsduf wants to merge 3 commits into zonemaster:develop from matsduf:multiple-testagent-one-queue
