Ensure cancellation reason visible when task is canceled (#142079)#144629
Ensure cancellation reason visible when task is canceled (#142079)#144629michalborek merged 16 commits intoelastic:mainfrom
Conversation
|
💚 CLA has been signed |
) Although, the cancellation reason is set in a synchronised block the access to the variable is not synchronized so there is no atomicity ensured there. We could synchronize the getters but that would be slow but since reason is always read after the isCancelled, this operation is safe and doesn't bring additional synchronization cost.
|
Pinging @elastic/es-distributed (Team:Distributed) |
DaveCTurner
left a comment
There was a problem hiding this comment.
Hmm, todays order of assignments is deliberate, according to the javadoc for getReasonCancelled:
May also be null if the task was just cancelled since we don't set the reason and the cancellation flag atomically.
Instead, the expectation is that callers don't rely on the reason being set when the cancelled flag is, instead using something like ensureNotCancelled or notifyIfCancelled which should avoid the race.
Can you instead fix up the caller that's trying to access the reason in a racy way?
The isCancelled field was not necessary to determine whether the task was cancelled. Equally, we can check whether reason field is null. This change will be slightly less readable but allows us to remove the synchronized block and simplify the cancel() method.
DaveCTurner
left a comment
There was a problem hiding this comment.
Could you adjust CancellableTasksTests to demonstrate the bug that this fixes (i.e. that getReasonCancelled cannot return null after isCancelled returns true)?
server/src/main/java/org/elasticsearch/tasks/CancellableTask.java
Outdated
Show resolved
Hide resolved
|
Hi @michalborek, I've created a changelog YAML for you. |
...r/src/test/java/org/elasticsearch/action/admin/cluster/node/tasks/CancellableTasksTests.java
Outdated
Show resolved
Hide resolved
...r/src/test/java/org/elasticsearch/action/admin/cluster/node/tasks/CancellableTasksTests.java
Outdated
Show resolved
Hide resolved
...r/src/test/java/org/elasticsearch/action/admin/cluster/node/tasks/CancellableTasksTests.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/tasks/CancellableTask.java
Outdated
Show resolved
Hide resolved
DaveCTurner
left a comment
There was a problem hiding this comment.
LGTM thanks @michalborek !
szybia
left a comment
There was a problem hiding this comment.
nice work! TIL more about VarHandle and Thread.onSpinWait :)
| while (task.isCancelled() == false) { | ||
| Thread.onSpinWait(); | ||
| } | ||
| assertThat( | ||
| "toString should consistently render status and reason", | ||
| task.toString(), | ||
| anyOf(endsWith("reason='null', isCancelled=false}"), endsWith("reason='test-reason', isCancelled=true}")) | ||
| ); |
There was a problem hiding this comment.
am i right in saying that after the while loop we can only get reason='test-reason', isCancelled=true}?
and within the while loop we can get both
| while (task.isCancelled() == false) { | |
| Thread.onSpinWait(); | |
| } | |
| assertThat( | |
| "toString should consistently render status and reason", | |
| task.toString(), | |
| anyOf(endsWith("reason='null', isCancelled=false}"), endsWith("reason='test-reason', isCancelled=true}")) | |
| ); | |
| while (task.isCancelled() == false) { | |
| assertThat( | |
| "toString should consistently render status and reason", | |
| task.toString(), | |
| anyOf(endsWith("reason='null', isCancelled=false}"), endsWith("reason='test-reason', isCancelled=true}")) | |
| ); | |
| } | |
| assertThat( | |
| "toString should consistently render status and reason", | |
| task.toString(), | |
| endsWith("reason='test-reason', isCancelled=true}") | |
| ); |
basically same comment as David: #144629 (comment)
i guess not critical but IIUC stronger assertion to have
There was a problem hiding this comment.
Hm, the intention was slightly different.
"Until the task isn't cancelled, I don't care. When it gets cancelled, it should be consistent."
There was a problem hiding this comment.
Ah yeah sorry I missed this. We do care while the task is (recently) not cancelled that we do not see an inconsistent result, and once we know the task is definitely cancelled we must not see isCancelled=false again. So yes I'd rather we adjusted it as Szymon suggests.
There was a problem hiding this comment.
Since I prematurely hit Merge, I've prepared a PR for this: #144716 :)
) Although the cancellation reason is set in a synchronized block, access to the reason variable is not synchronized; therefore, the operation is not atomic. By removing the additional isCancelled field and replacing its usage with a null-check on reason, we achieve the same behavior without the risk of the reason field being null when a task is canceled. Additionally, we can get rid of all the synchronization blocks in this class. To efficiently compare and set the reason, java.lang.invoke.VarHandle was used. IMHO, this is a genuine bug caught by a test that is currently muted. This PR fixes elastic#142079, where a test was randomly failing.
Although, the cancellation reason is set in a synchronized block the access to the
reasonvariable is not synchronized so there operation is not atomic.By removing an additional field (
isCancelled) and replacing its usage by comparing thereasontonullwe can achieve the same behavior without a risk ofreasonfield being null when the task was canceled.To efficiently compare and set the
reason,java.lang.invoke.VarHandlewas used.IMHO this is a genuine bug caught by a test that is currently muted. This PR fixes #142079 issue.