Alternative to #11437: warn, do not fail, if sandbox cannot be deleted#11566
Alternative to #11437: warn, do not fail, if sandbox cannot be deleted#11566nojb wants to merge 2 commits intoocaml:mainfrom
Conversation
…eleted Signed-off-by: Nicolás Ojeda Bär <n.oje.bar@gmail.com>
|
I think this fix makes sense for the time being. It's not really the users fault if a sandbox cannot be deleted, and I don't think it is important enough for a build to fail. If you look above, we already try to delete it when we create a sandbox. If for whatever reason that fails on Windows on the off chance that the sandbox wasn't cleared in a previous build and the virus checker has locked the file, then we can error out. The user should in that case just clear the I don't know if it is worth adding some more sophisticated logic in the future for Windows to avoid this situation entirely, but from reading the previous discussion it appears to be rare and non-fatal. |
|
Before you update the test and merge, let's give a headsup to the JS people since this patch will affect them more directly. |
|
Thanks for the quick review @Alizter. @MSoegtropIMC: would this fix be OK for you? |
|
Reading the comment of @Alizter I would think that avoiding issues with sandbox clean up would be better. "Manually clean up and restart the build" is not helpful for CI, while my solution fixed all CI issues I had. How about having both changes - the retry on file deletes from my PR and this? |
|
So what happens if a previous sandbox could not be cleaned up and then another sandbox needs to be created? Will this lead to a build failure? Or do sandbox folders have unique names? |
The sandbox directories contain a digest of their contents, so unless a rule is being rerun, which won't happen in a single build, there should be no conflicts. The behaviour I am mentioning would only happen on a rebuild, and seems too unlikely to cause major issues. We could perhaps have a better error message in that case however. I can have a look later today at improving that. |
|
I still don't see advantages of this solution over my suggestion. As discussed it is more of a hack because it ignores errors and leaves things in an unclean state while mine actually fixes the issue. Regearding the critics on my solution - that it might slow down builds - I was about to say that this is purely theoretical, but thinking about this it is not even this. So I would prefer my solution - which is unlike this solution also heavily tested in CI since > 1 month. |
Signed-off-by: Nicolás Ojeda Bär <n.oje.bar@gmail.com>
I think this is a good point, and personally I would be ready to merge #11437 in its current state. The only small issue I see with the way the implementation of #11437 is that if the user is bitten by this problem, the logic kicks in completely silently without reporting what is going on (which may be useful information). So as a last proposal, I pushed 9a539d8 which implements the same retry logic as in #11437 but in |
|
To be clear, I am just expressing my opinion on @MSoegtropIMC will you be able to check that this works for you, or is this something that is difficult to reproduce over a short timescale? |
|
The question is if this issue can also happen in non sandbox mode. As far as I can see there is another call to this function in Dune/dune/src/dune_engine/build_system.ml in "execute_action_for_rule", a failing unlink could have strange effects there as well. Since the main issue appears to be logging and the function is called in only few places, how about passing a logging function to the unlink function? Another point: there are quite a few occurrences of Unix.unlink in the dune source tree. Shouldn't these be replaced by the safe unlink function? |
Putting it into Coq Platform CI is easy, but there it would take a week or so to be 99.9% conclusive. Local tests would be faster (a few hours), but more manual work. |
| ~warn | ||
| t.dir | ||
| (Unix_error.Detailed.pp (Unix_error.Detailed.create error ~syscall ~arg)); | ||
| loop (cnt - 1) |
There was a problem hiding this comment.
We should sleep here a bit before looping, to avoid trying to delete the sandbox in a tight loop
| let warn = error = Unix.EACCES && cnt > 0 in | ||
| failed_to_delete_sandbox | ||
| ~warn | ||
| t.dir | ||
| (Unix_error.Detailed.pp (Unix_error.Detailed.create error ~syscall ~arg)); |
There was a problem hiding this comment.
The PR description says "simply warn if the sandbox cannot be deleted, instead of failing" but this part will fail after retrying 30 times.
My understanding is that we don't want the build to fail in case the sandbox files can't be deleted.
If we have retried the maximum number of times then simply continue without doing anything. Also, sleep before retrying to avoid retrying in a tight loop. Implements my own change suggestions in ocaml#11566
|
@nojb FYI, I have implemented my above review comments in rusv-simcorp@9f114b4 |
|
Let's go with #11437 and keep these alternative approaches in mind for the future if needed. |
An alternative to #11437: simply warn if the sandbox cannot be deleted, instead of failing.
Yet another possibility could be to warn and retry (as is done in #11437) the
Path.rm_rfoperation.For discussion.