Conversation
|
I'm also forgetting whether I have to do something to have this listed in the sidebar under "Developer Documentation", or whether that will happen automatically. |
|
Other thoughts:
|
|
Might it be worth mentioning using the pid with https://docs.julialang.org/en/v1/stdlib/Profile/#Triggered-During-Execution which will print the profile output into the Or is profiling this kind of lingering task not very informative? |
|
I'm not certain that profiling would capture a timer since it fires rarely, but honestly I don't know. One issue is that getting the PID does not seem straightforward. The problem is that the detailed message doesn't show until you hit Ctrl-C, and at that point the process exits so there's nothing left to profile. The only way I've found to grab the PID while it's still running is to use |
That is a separate issue, which Ian already has a couple PRs up to fix. |
|
|
Update: I interpret JuliaLang/Pkg.jl#3583 as suggesting that adding the PID doesn't really help much.
If folks think this would be a good idea, I'd be happy to help with the C part of this effort (it's pretty trivial), but the Pkg logic is something that I could use some assistance with. |
The C part is just calling |
|
Sorry, what I meant was "triggering |
|
To be precise though, it fails on the first time the process calls |
|
I see, so you're saying this could even be an Aqua test with no Julia changes required? |
|
Potentially, yeah. That function just needs to be exposed somehow (which indeed will require a few small C changes) |
|
6284153 now has the C infrastructure I think we'd need. I've verified that ~/src/julia/julia --startup=no --project -e 'using NoTimer; exit(; wait_task=true)'exits cleanly for a fake package that does not start a timer in its Alternately, @vtjnash, It occurs to me that you once surprised me by saying that we probably could run |
|
If I may ask: Since this is a very user-facing problem, why is the documentation buried in the devdocs? Shouldn't there be at least an explanation in the main manual and/or the Pkg docs as well? |
|
Care to make a concrete suggestion of exactly where it should go? |
base/initdefs.jl
Outdated
| """ | ||
| exit(n) = ccall(:jl_exit, Cvoid, (Int32,), n) | ||
| exit() = exit(0) | ||
| exit(n; wait_task::Bool=false) = ccall(:jl_exit, Cvoid, (Int32, Int32), n, wait_task) |
There was a problem hiding this comment.
naming bikeshedding time: should we say this is exit(0, process=true), since it specifies whether this call just kills this task or all tasks?
There was a problem hiding this comment.
I went with kill_tasks=true.
vtjnash
left a comment
There was a problem hiding this comment.
SGTM. Just made a couple small comments
|
Hmm, I realized one issue with the One thought would be to encourage people to write it function __init__()
if ccall(:jl_generating_output, Cint, ()) == 1 || parse(Bool, get(ENV, "JULIA_AQUA_TEST_INIT_TASKS", "false"))
return nothing
end
....
endbut that's less than elegant since it requires a new environment variable simply to avoid a false positive. |
|
Given that the |
|
|
||
| """ | ||
| exit(code=0) | ||
| exit(code=0; kill_tasks=true) |
There was a problem hiding this comment.
It doesn't kill the tasks, just waits for them?
There was a problem hiding this comment.
The default is to kill all tasks (as before). The new optional behavior (kill_tasks=false) is to run the rest of the program to completion and only kill this task.
| JL_DLLEXPORT void jl_task_wait_empty(void); | ||
| JL_DLLEXPORT void jl_postoutput_hook(void); | ||
| JL_DLLEXPORT void JL_NORETURN jl_exit(int status); | ||
| JL_DLLEXPORT void JL_NORETURN jl_exit(int status, int kill_tasks); |
There was a problem hiding this comment.
This is probably a breaking change; I think we should avoid it.
|
Maybe doesn't need to be on the milestone? |
One issue with this is that if you want to run a precompile workload involving that package as a dependency, the package will likely be non-functional. So I am not sure that is the best idea. |
It somewhat only fixes the symptom, rather than the cause. The cause is that the user failed to clean up some resource when they should have, and so there is a leak in the code. In some cases, this code should (in addition to the precompile step to avoid making each and every precompile process expensive and slow) also have an explicit |
|
I think "this" (meaning, a way of addressing the I/O hang, not actually this PR) does need to be on the milestone:
IMO it's the worst error in the language by a very large margin. But there seems to be very little we can do about it from a technical standpoint: for example, we don't have any tools for discovering where the task was launched from. So the only options seem to be documentation and, in difficult cases, encourage bisection-by-commenting. That said, I no longer think the
In JuliaTesting/Aqua.jl#174 I allude to that by saying "In more complex cases, you may need to set up independently-callable functions to launch the tasks and cleanly shut them down." But perhaps that section should be expanded. |
This is #50914, with only the documentation changes, plus an improvement to the warning message. --------- Co-authored-by: Tim Holy <tim.holy@gmail.com> Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>
As promised, CC @MilesCranmer
I should say that these devdocs are not very satisfying. I didn't find it very useful to extract the PIDs, because I think the warning message from Pkg does the core job of telling you which causative package is hanging, and AFAICT that's the only useful thing you get from the PID. So the advice basically boils down to "comment half your code at a time and bisect until you've found the problematic lines, then stare at them until you figure out what's going on." There's a little more guidance than that, as you'll see if you review this PR, but I think we just don't have decent diagnostics to pinpoint what is going wrong.