Run several Test Agents against the same queue#1115
Conversation
| $self->format_time( time() ), | ||
| $test_id, | ||
| )) { | ||
| return undef; |
There was a problem hiding this comment.
This seems a little bit hacky to return undef here. To me test_progress is both a getter and a setter and returns the progress value stored in database. Here it would extend its logic to something new. Would it be possible to move this new logic to another method (or maybe rethink the whole test_progress method)?
There was a problem hiding this comment.
If test_progress() is split into set_test_progress() and get_test_progress() with the following logic, do you think that could work?
set_test_progress() returns
- 1 if setting succeeded.
undefif setting failed (including test_id does not exist)
get_test_progress() returns
- Actual value if test_id exists
undefif test_id does not exist
There was a problem hiding this comment.
I agree that test_progress() is overloaded with meanings and we should clean it up. Having a combined getter and setter isn't too bad IMO. That's idiomatic in Perl anyways. The problem is that in addition to getting/setting progress we're using it to make state transitions. I also find it problematic that it's trying to paper over bugs elsewhere in the code.
Upon closer inspection I came up with these suggestions:
- Make
get_test_request()executeUPDATE SET progress=1 WHERE progress=0directly instead of it callingtest_progress(). That's the only place we want to bump progress from 0 to 1 anyway. - Remove the special handling of
$progress==1fromtest_progress(). Just make it set progress to roughly one percent. - Instead of clamping of $progress to [1, 99] in
test_progress(), make it die unless0 <= $progress < 100, and add a special case just forif $progress==0 { $progress=1 }. Clamping 0 up to 1 allows us to have more than 99 test cases. But if we're trying to set progress to any value outside [0, 99] then clearly there are bugs and we can't trust the results. - Give the remaining
UPDATEintest_progress()a more stringent guard clause making sure that1 <= old_progress <= new_progress, and make it die if no rows are affected by the UPDATE. If we're trying to update progress for a job that isn't running or if we're trying to reverse progress, then clearly there are bugs and we can't trust the results. - Make
get_test_request()loop until SELECT returns nothing or until UPDATE succeeds, whichever happens first. I believe spinning is the right thing to do here for as long as there are jobs to allocate.- Instead of a loop both queries could be performed in a transaction that locks the selected row for update. This would be more efficient.
|
Replaced by #1121 |
Purpose
When a large batch is to be tested, adding several Test Agents increases performance if the server has large capacity. Several servers could also be used. One limitation is that current code only allows one Test Agent per queue. All tests in a batch always belong to the same queue.
This PR makes it possible for several Test Agents to fetch un-run tests from the same queue without the risk of two or more Test Agents testing the same item.
This PR does not handle possible race condition between multiple Test Agents when it comes to
process_unfinished_tests(), but that could be worked-around by setting a longerZONEMASTER_max_zonemaster_execution_timeon all Test Agents except one.This PR also ensures that the progress counter is never set to 0% or 1% during the testing phase.
Changes
How to test this PR