feat(task): enforce per-task timeout configuration #8250
Conversation
Summary of Changes

Hello @tvararu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request completes the implementation of per-task timeout configurations by introducing native enforcement at the execution level.
Code Review
This pull request correctly implements per-task timeouts by adding the necessary logic to `CmdLineRunner` and `TaskExecutor`. The end-to-end tests have also been updated to cover the new functionality, which is great. I've identified a potential race condition in the timeout handling logic within `CmdLineRunner` concerning spurious wakeups and memory ordering. I've provided specific suggestions to address these correctness issues. Overall, this is a solid implementation of the feature.
This review was AI-generated.
Nice first Rust PR! The overall approach is sound — adding a timer thread with condvar cancellation for per-task timeouts and the SIGTERM→SIGKILL escalation. A few issues to address though:
Blocking Issues
1. Mutex deadlock during SIGKILL grace period (up to 5s hang)
After sending SIGTERM, the timer thread holds the mutex while doing a second cvar.wait_timeout(guard, 5s):
```rust
let _ = nix::sys::signal::kill(pid, nix::sys::signal::Signal::SIGTERM);
let (_guard, result) = cvar.wait_timeout(guard, Duration::from_secs(5)).unwrap();
```

Meanwhile, the main thread (after the process exits from SIGTERM) tries to acquire the same mutex to set the cancel flag:

```rust
*lock.lock().unwrap() = true; // blocks — timer thread holds the lock
cvar.notify_one();            // never reached
```

So the timer thread is waiting on a condvar notification that can't be sent because the main thread is blocked on the mutex. This is a deadlock for up to 5 seconds. The timer thread's `wait_timeout` will eventually expire, but the main thread is unnecessarily stalled.
Fix: either (a) don't hold the mutex during the SIGKILL grace period (use a simple thread::sleep), or (b) restructure so the cancel flag uses an AtomicBool checked after each wait_timeout instead of holding the mutex across both waits.
2. Race condition: process completes at timeout boundary → false "timed out" error
If the timeout expires at nearly the same instant the process exits successfully:
- Timer thread: `wait_timeout` returns, `timed_out() == true`
- Timer thread: `timed_out.store(true, ...)`
- Main thread: receives `ExitStatus(success)`, exits rx loop
- Main thread: sets cancel (too late)
- Main thread: checks `timed_out` → true → bails with "timed out" even though the task succeeded
The window is narrow but real. Consider checking the exit status before checking `timed_out`, or using a single source of truth for the outcome.
3. execute_raw doesn't get timeout support
When `raw = true` (either via the `--raw` flag or `settings.raw`), tasks go through `execute_raw()`, which has no timeout logic. Tasks in raw mode will silently ignore their timeout configuration.
What looks good
- SIGTERM→5s grace→SIGKILL escalation is the right pattern
- Condvar-based cancellation (vs busy-polling or `thread::sleep`) is the right approach, just needs the mutex issue fixed
- E2e tests now exercise real timeout behavior instead of TODO placeholders
- The `with_timeout` builder API is clean and consistent with the rest of `CmdLineRunner`
@jdx thanks for the review! I did find that same race condition in my back-and-forth reviews with Claude and Codex but decided to YAGNI it, because the fixes they proposed ended up quite complex and I wasn't sure it was worth it. I'll take the feedback and plug away + push up some more commits. I'll ping on Discord when it's ready for another review.
@jdx this is ready for another look 👍
This review was AI-generated.
Nice first Rust PR! The architecture is solid. A couple of notes before looking at issues:
Re: jdx's blocking issues — both appear to be already fixed in the current diff:
- Deadlock: The timer thread calls `drop(guard)` before `thread::sleep(5s)`, so the main thread can freely acquire the lock and set the cancel flag during the grace period. No deadlock.
- `execute_raw`: The diff does add `TimeoutGuard::new` to `execute_raw`. Both paths are covered.
jdx's reviews were likely based on an earlier draft. The race condition (#2) is still real — see below.
Issues
1. Spurious wakeups can silently swallow the timeout
`Condvar::wait_timeout` can return early with `timed_out() == false` due to a spurious wakeup. The current code treats a spurious wakeup identically to a genuine cancellation — it exits the timer thread and the process never gets killed:

```rust
let (guard, wait_result) = cvar.wait_timeout(guard, timeout).unwrap();
if !wait_result.timed_out() {
    return; // BUG: spurious wakeup exits here; timeout never fires
}
```

The correct pattern is to loop, track remaining time, and check the cancel predicate explicitly:
```rust
let mut guard = lock.lock().unwrap();
let deadline = std::time::Instant::now() + timeout;
loop {
    if *guard { return; } // cancelled
    let remaining = deadline.saturating_duration_since(std::time::Instant::now());
    if remaining.is_zero() { break; }
    let (g, result) = cvar.wait_timeout(guard, remaining).unwrap();
    guard = g;
    if result.timed_out() { break; }
    // spurious wakeup: loop and re-check
}
```

2. Race condition at timeout boundary (confirming jdx's #2)
If the timeout fires at nearly the same instant the process exits successfully:
- Timer sets `timed_out = true`, sends SIGTERM
- Process was already exiting — SIGTERM is moot
- Main thread receives `ExitStatus(success)`, exits rx loop
- Main thread calls `g.cancel()` (too late — already set)
- Main thread checks `timed_out()` → `true` → bails "timed out" even though the task succeeded
The fix is simpler than it may seem — just rearrange the existing checks so `timed_out()` is only checked when the exit status is non-success (in both `execute` and `execute_raw`):
```rust
// Before:
if timeout_guard.as_ref().is_some_and(|g| g.timed_out()) {
    bail!("timed out after {:?}", self.timeout.unwrap());
}
if !status.success() {
    self.on_error(combined_output, status)?;
}

// After:
if !status.success() {
    if timeout_guard.as_ref().is_some_and(|g| g.timed_out()) {
        bail!("timed out after {:?}", self.timeout.unwrap());
    }
    self.on_error(combined_output, status)?;
}
```

If the process exited successfully, we don't care whether the timer also fired — the task completed its work. If it exited with failure and the timer fired, SIGTERM killed it, so we report "timed out". A ~2-line change in two places.
3. Kill by PID, not PGID — child processes may survive the timeout
The existing `kill_all` and signal forwarding use `killpg()` to kill the entire process group. The timeout guard only does `kill(pid, SIGTERM)`.

If a task spawns sub-processes (e.g. `run = "npm run build"`), only the top-level process is killed — children survive.

Note: `setpgid` is only called when stdin is not a TTY (and not at all in `execute_raw`), so `killpg()` won't always work. Should try `killpg()` first with a fallback to `kill()`, matching the existing `kill_all` pattern:
```rust
let pgid = nix::unistd::Pid::from_raw(pid as i32);
if nix::sys::signal::killpg(pgid, Signal::SIGTERM).is_err() {
    let _ = nix::sys::signal::kill(pgid, Signal::SIGTERM);
}
```

4. SIGKILL grace period can't be cancelled (minor)
After SIGTERM kills the process, the main thread calls `g.cancel()`, but `cancel()` only notifies the condvar — and at that point the timer thread is inside `thread::sleep(5s)`, not waiting on the condvar. The notification is missed; the timer thread always sleeps the full 5 seconds before checking the cancel flag.

Result: a zombie timer thread lingers for up to 5 seconds after every timed-out task. There's no correctness problem (SIGKILL on a dead PID returns `ESRCH`, silently ignored), just wasted resources. Could be fixed by replacing `thread::sleep` with another `cvar.wait_timeout` so the grace period is also cancellable.
What looks good
- The `let _ = tx.send(...)` change throughout `execute()` is necessary and correct — with timeout, the receiver may be dropped before all output is drained, and `.unwrap()` would panic.
- `with_timeout` builder API fits naturally with the existing `CmdLineRunner` pattern.
- `task_executor.rs`: `t.min(g)` for the effective timeout (task vs. global) is clean.
- SIGTERM → 5s grace → SIGKILL escalation is the right pattern for Unix process cleanup.
@jdx fixed that + squashed the commits on this branch into a single one so the log is a bit cleaner
Did another force push to sneak in a:

```rust
impl Drop for TimeoutGuard {
    fn drop(&mut self) {
        self.cancel();
    }
}
```

As per my own Claude's review: Add Drop impl for TimeoutGuard
Context
If `execute()` bails early (error, panic, or the tokio-level timeout in `run.rs` dropping the future) before reaching `g.cancel()`, the timer thread is orphaned and will eventually send signals to a potentially-recycled PID. Adding a `Drop` impl ensures `cancel()` is always called.
File to modify: `src/cmd.rs` — add the `Drop` impl after the existing `TimeoutGuard` impl block (after line 171).

```rust
impl Drop for TimeoutGuard {
    fn drop(&mut self) {
        self.cancel();
    }
}
```
Code Review

Overview: Adds a …

Issues

1. Double timeout enforcement (design concern): The global … For a single task, both fire at roughly the same time, producing a race between two different error messages (…). This should probably be rationalized — either remove the global timeout from …
2. Race between cancel and timed_out (minor bug): In … If the process exited with failure for a non-timeout reason (e.g., runtime error) right at the boundary, the error would be misreported as "timed out". The fix would be to check the cancel flag before setting …
3. No Windows grace period: On Windows, …
4. The guard is created after the child process and stdout/stderr reader threads spawn. If the timeout fires while signal forwarding is also active (e.g., user hits Ctrl+C at the same moment), both the signal forwarder and the timeout guard will send signals to the process. This shouldn't cause incorrect behavior, but the child could receive SIGTERM twice.

Nits

What looks good

This review was AI-generated using Claude Opus 4.6.
CmdLineRunner had no mechanism to enforce per-task timeouts. Tasks could specify a timeout duration but it was never applied during execution — the process would run indefinitely regardless. Add a TimeoutGuard that spawns a timer thread using a condvar with deadline-tracking loop (handling spurious wakeups correctly). On expiry it escalates SIGTERM → 5s grace → SIGKILL using killpg with fallback to kill, matching the existing signal-forwarding pattern so child process trees are terminated. The grace period is itself cancellable via the same condvar, so the timer thread exits promptly when the process dies from SIGTERM. Both execute and execute_raw paths use the guard. Timeout is only reported when exit status is non-success, so a task that finishes its work at the boundary isn't falsely treated as timed out. An AtomicU8 state machine (RUNNING/CANCELLED/TIMED_OUT) with CAS arbitrates the cancel-vs-timeout race so non-timeout failures cannot be misreported. Per-task timeouts are kept separate from the whole-run tokio timeout in cli/run.rs — each layer serves a distinct purpose.
@jdx done, also pulled out …
jdx left a comment
This review was AI-generated.
The earlier AI reviews covered most issues and the current diff addresses the spurious wakeup and race condition concerns they raised. One thing that hasn't been called out clearly:
Global task_timeout vs per-task timeout interaction is misleading
The PR description says:
```
mise run --timeout=1m deploy # killed after 30s (global setting still wins)
```

But this "lower wins" behavior isn't actually implemented. The global `task_timeout` / `--timeout` flag is a tokio wall-clock timeout wrapping the entire `parallelize_tasks` call in `src/cli/run.rs:337-340`. The new per-task timeout is a per-process `TimeoutGuard` in `CmdLineRunner`. These are completely independent timers at different levels — they don't compare against each other.

In `task_executor.rs`, only the task-level timeout is applied:

```rust
let effective_timeout = task.timeout.as_ref().and_then(|s| match duration::parse_duration(s) { ... });
```

There's no `min(task_timeout, per_task_timeout)` logic. Either the PR description should be updated to reflect the actual behavior, or the code should implement the advertised semantics.
jdx left a comment
This review was AI-generated.
The earlier AI reviews covered most issues and the current diff addresses the spurious wakeup and race condition concerns they raised. Reiterating one point that was raised before:
Global task_timeout vs per-task timeout interaction is misleading
The PR description says:
```
mise run --timeout=1m deploy # killed after 30s (global setting still wins)
```

But this "lower wins" behavior isn't actually implemented. The global `task_timeout` / `--timeout` flag is a tokio wall-clock timeout wrapping the entire `parallelize_tasks` call in `src/cli/run.rs:337-340`. The new per-task timeout is a per-process `TimeoutGuard` in `CmdLineRunner`. These are completely independent timers at different levels — they don't compare against each other.

In `task_executor.rs`, only the task-level timeout is applied:

```rust
let effective_timeout = task.timeout.as_ref().and_then(|s| match duration::parse_duration(s) { ... });
```

There's no `min(task_timeout, per_task_timeout)` logic. Either the PR description should be updated to reflect the actual behavior, or the code should implement the advertised semantics.
jdx left a comment
This review was AI-generated.
Correction to my previous comment — the "lower wins" behavior described in the PR does work in practice, just not via explicit `min()` logic. The global `task_timeout` / `--timeout` fires as a tokio wall-clock timeout on the entire `mise run` invocation (`run.rs:337`), while the per-task timeout fires as a per-process `TimeoutGuard`. They're independent timers that race, and whichever is shorter naturally fires first.

So the described behavior is correct. It might be worth documenting that these are two different mechanisms (whole-run vs per-process), but functionally there's no bug here. Apologies for the noise.
Amazing! Thanks so much @jdx for helping me get this in, I can't wait to use this feature 🎉
### 🚀 Features

- **(conda)** replace custom backend with rattler crates by @jdx in [#8325](#8325)
- **(task)** enforce per-task timeout configuration by @tvararu in [#8250](#8250)
- **(vsix)** added vsix archives to http backend by @sosumappu in [#8306](#8306)
- add core dotnet plugin for .NET SDK management by @jdx in [#8326](#8326)

### 🐛 Bug Fixes

- **(conda)** preserve conda_packages on locked install and fix temp file race by @jdx in [#8335](#8335)
- **(conda)** deduplicate repodata records to fix solver error on Linux by @jdx in [#8337](#8337)
- **(env)** include watch_files in fast-path early exit check by @jdx in [#8317](#8317)
- **(env)** clear fish completions when setting/unsetting shell aliases by @jdx in [#8324](#8324)
- **(lockfile)** prevent lockfile writes when --locked is set by @jdx in [#8308](#8308)
- **(lockfile)** prune orphan tool entries on mise lock by @mackwic in [#8265](#8265)
- **(lockfile)** error on contradictory locked=true + lockfile=false config by @jdx in [#8329](#8329)
- **(regal)** Update package location by @charlieegan3 in [#8315](#8315)
- **(release)** strip markdown heading prefix from communique release title by @jdx in [#8303](#8303)
- **(schema)** enforce additionalProperties constraint for env by @adamliang0 in [#8328](#8328)

### 📚 Documentation

- Remove incorrect oh-my-zsh plugin ordering comment by @bvosk in [#8323](#8323)
- require AI disclosure on GitHub comments by @jdx in [#8330](#8330)

### 📦 Registry

- add `oxfmt` by @taoufik07 in [#8316](#8316)

### New Contributors

- @adamliang0 made their first contribution in [#8328](#8328)
- @tvararu made their first contribution in [#8250](#8250)
- @bvosk made their first contribution in [#8323](#8323)
- @taoufik07 made their first contribution in [#8316](#8316)
- @charlieegan3 made their first contribution in [#8315](#8315)
- @sosumappu made their first contribution in [#8306](#8306)

## 📦 Aqua Registry Updates

#### New Packages (3)

- [`Tyrrrz/FFmpegBin`](https://github.com/Tyrrrz/FFmpegBin)
- [`elixir-lang/expert`](https://github.com/elixir-lang/expert)
- [`erikjuhani/basalt`](https://github.com/erikjuhani/basalt)

#### Updated Packages (5)

- [`caarlos0/fork-cleaner`](https://github.com/caarlos0/fork-cleaner)
- [`firecow/gitlab-ci-local`](https://github.com/firecow/gitlab-ci-local)
- [`jackchuka/mdschema`](https://github.com/jackchuka/mdschema)
- [`kunobi-ninja/kunobi-releases`](https://github.com/kunobi-ninja/kunobi-releases)
- [`peco/peco`](https://github.com/peco/peco)
## Summary

Per-task `timeout` fields were added in jdx#6216 but never enforced at execution time. This PR completes the implementation by adding native timeout support to `CmdLineRunner::execute()`.

## Example Usage

```toml
[settings]
task_timeout = "30s"

[tasks.deploy]
run = "npm run deploy"
timeout = "5m"

[tasks.quick]
run = "echo 'done'"
timeout = "10s"
```

```bash
mise run deploy              # killed after 30s (global wins, it's lower)
mise run quick               # killed after 10s (task wins, it's lower)
mise run --timeout=1m deploy # killed after 30s (global setting still wins)
```

## Key changes

- Added `timeout` field and timer thread to `CmdLineRunner::execute()`
- Modified `exec_program` to compute effective timeout from task config and global setting
- Enabled all TODO-skipped e2e timeout test scenarios

## Testing

- [x] `mise test:e2e test_task_timeout` — all 6 scenarios pass
- [x] `cargo clippy --all-features` — clean

## LLM use for this PR

I initially built this towards a different idea (that `timeout`s win, not the global one) but scrapped it after a chat on Discord. All implementation was done with Opus 4.6 and Codex 5.3, with them reviewing each other's code a few times. Eventually I settled on this implementation, trying to keep the changeset as minimal as possible for review.

This is my first public Rust PR; I don't feel super confident in the implementation, but I'm happy to take feedback and go back to the drawing boards with the LLMs if there's a better way to do this. 👍

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>