criu checkpoint/restore: print errors from criu log#3816
lifubang merged 4 commits into opencontainers:main
@adrianreber @avagin PTAL
That is a good idea. A bit wild maybe. When CRIU is used through Podman or CRI-O, the engine usually takes the error message from runc and passes it to the user together with the location of the log file. It would be important to know whether users of runc's checkpoint functionality (such as containerd, Podman and CRI-O) can handle this changed runc behaviour. Your log scanner only seems to run on failure, so the extra time spent scanning the log file should normally not be a problem. I like this idea of better error reporting to the user, but I am not sure how well runc's users (container engines) can handle multiline error messages.
I've seen quite a few cases where runc emits multiple errors, warnings, etc., so this should not be something entirely new. Surely, we'll have 1.2.0rc released to test this.
LGTM
No code change, only added periods to some comments to make godot happy.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Use "switch t" since we only check t.
2. Remove unneeded t assignment.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When criu fails, it does not give us much context to understand what was the cause of an error -- for that, we need to take a look into its log file. This is somewhat complicated to do (as you can see in parts of checkpoint.bats removed by this commit), and not very user-friendly.

Add a function to find and log errors from criu logs, together with some preceding context, in case either checkpoint or restore has failed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
As we now log the log file name in logCriuErrors. While at it, there is no need to use var.String() with %s as it is done by the runtime.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
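The point about var.String() and %s can be illustrated with a small sketch. In Go, the fmt package calls the String() method itself when a %s operand implements fmt.Stringer, so the explicit call is redundant. The `state` type and `describe` helper below are hypothetical names invented for this illustration; they are not taken from the runc code.

```go
package main

import "fmt"

// state is a hypothetical type standing in for any value that
// implements fmt.Stringer.
type state int

// String makes state satisfy fmt.Stringer.
func (s state) String() string {
	if s == 0 {
		return "stopped"
	}
	return "running"
}

// describe formats the same value twice: once with an explicit
// String() call, once letting fmt call String() for the %s verb.
func describe(s state) (explicit, implicit string) {
	explicit = fmt.Sprintf("state: %s", s.String())
	implicit = fmt.Sprintf("state: %s", s)
	return explicit, implicit
}

func main() {
	explicit, implicit := describe(state(1))
	fmt.Println(explicit)
	fmt.Println(implicit)
	fmt.Println(explicit == implicit)
}
```

Both calls produce the same string, which is why dropping the explicit .String() is a safe cleanup.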
LGTM
The alternative is to make criu report extended errors. This is somewhat complicated because when debugging criu some context is usually needed (to see what happened just before the error) and
@lifubang PTAL
When criu fails, it does not give us much context to understand what was the cause of an error -- for that, we need to take a look into its log file.
This is somewhat complicated to do (as you can see in parts of checkpoint.bats removed by this commit), and not very convenient.
Add a function to find and log errors from criu logs, in case either checkpoint or restore has failed.
Fixes: #3711