Set local task exit status when time limit is exceeded#6592
Set local task exit status when time limit is exceeded#6592bentsherman merged 2 commits intomasterfrom
Conversation
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
✅ Deploy Preview for nextflow-docs-staging canceled.
|
| */ | ||
| if( elapsedTimeMillis() > wallTimeMillis ) { | ||
| destroy() | ||
| task.exitStatus = process.exitValue() |
There was a problem hiding this comment.
Would not make sense to use a predictable exit status (we event is known). Otherwise it can entirely depend on "destroy" implementation, and I'm not even sure that's going to a deterministic value.
There was a problem hiding this comment.
It's an interesting question. I think you could argue either way -- to make it consistent at Nextflow level, or make it consistent with the underlying OS, which the JVM implementation is likely to imitate
I figure that if a user has a pipeline with a mix of e.g. local and SLURM jobs, they would prefer to use the same error codes for both. That's why I lean towards deferring to the JVM
When I tested locally, it returned 143 which corresponds to SIGTERM. This is probably what most implementations do. Some might use 137 (SIGKILL) but that seems less likely since it's more aggressive
There was a problem hiding this comment.
I think the current approach is correct and coherent with the other executors. The UNIXProcess implementation in most of JVM sends the signal TERM. Moreover, the .command.run has a on_term trap for TERM INT USR2 and another on_exit trap that unifies the kill of children processes with signal TERM, and the on_exit that manages the final exit code. So, if for some reason there is a JVM, Queue system or Cloud provider that uses signals INT or USR2 instead of TERM to cancel a task, we should get the same exit code.
There was a problem hiding this comment.
However guessing approach and I don't see what's preventing using a static code, so it's entirely predictable.
There was a problem hiding this comment.
It's not a big deal either way. This is a very minor point. Let's see how the JVM behaves and if any interesting cases come about
| */ | ||
| if( elapsedTimeMillis() > wallTimeMillis ) { | ||
| destroy() | ||
| task.exitStatus = process.exitValue() |
There was a problem hiding this comment.
I think the current approach is correct and coherent with the other executors. The UNIXProcess implementation in most of JVM sends the signal TERM. Moreover, the .command.run has a on_term trap for TERM INT USR2 and another on_exit trap that unifies the kill of children processes with signal TERM, and the on_exit that manages the final exit code. So, if for some reason there is a JVM, Queue system or Cloud provider that uses signals INT or USR2 instead of TERM to cancel a task, we should get the same exit code.
Close #6549
This PR fixes an issue where local tasks that exceed their time limit do not have an exit status, making it difficult to apply common error strategies such as retrying with more time based on the exit code.
The exit status is inferred from the sub-process.