stages/org.osbuild.tar: implement disk-full test (RHEL-4653)#1909
stages/org.osbuild.tar: implement disk-full test (RHEL-4653)#1909achilleas-k merged 2 commits intoosbuild:mainfrom
Conversation
|
What is in the original issue linked here? It is not public so it's not clear what this PR is following up on. |
|
@supakeen Oh, sorry, you are right, thanks! |
2acbd74 to
54922e4
Compare
|
In it's current state this PR isn't very useful. Capturing the output actually makes it more difficult to troubleshoot failed builds, and doesn't seem to change anything about how the error is handled so I don't see how it relates to the Jira ticket. The ticket itself should probably be closed since the issue appears on quite an old version of osbuild-composer, and it doesn't currently exhibit the reported behaviour. We could bisect some history to figure out which change affected the behaviour (my guess is something in osbuild-composer's handling of failed builds), and maybe fork off another ticket about detecting out-of-space errors and other common stage failures and reporting the error up the stack (instead of crashing). Is this PR the first step towards this kind of error reporting? If that were the case I'd expect to see some error string detection in the stage itself, not just in a test. Right now, as far as production is concerned, the PR is hiding the stdout and stderr of the tar call and not changing any of the behaviour. Why not the opposite? Changing the call to |
@achilleas-k yes, thanks! That's exactly the discussion I was looking for. If that's a feasible way forward I'd be happy to enrich this PR to really care about the output in the stage and make it non-draft then :-D |
There was a problem hiding this comment.
I won't review the code until I understand what we are trying to solve here.
In the linked issue the bug is that osbuild-composer's workers don't set jobs to FAILED when the disk is full and it is established that this is not reproducible (which I assume to mean; jobs do get failed).
In this PR you're testing that stages fail when the disk is full? It sounds like either the scope has to be larger (ensure that osbuild exits non-succesfully on any stage failure) or ensure that osbuild-composer always flags jobs as failed and reports the underlying cause (if it knows).
I'd also note that 'disk full' would never be useful to end-users of the Image Builder service, of course and is only relevant to those who debug (and they probably already look at logs).
A few code bits:
Aside from that; can we not do the mount/umount/unshare in a fixture? This test also needs skips if non-root or no-/tmp is available. I also dislike a hardcoded /tmp and /tmp/disk_full_tests I'd prefer to generate entire name with mks/mkdtemp.
I tried to summarize in the first message. The error of "disk full" is not handled and just exiting is bad to debug, So the Idea is to start with a setup to be able to reproduce "disk full" for all stages and implement the behavior we want to have for each separately. |
mvo5
left a comment
There was a problem hiding this comment.
A quick suggestion about simplify the test setup inline.
89bcef1 to
6e3dbcd
Compare
this should be an example environment for more stages to test if they return a proper error in a "disk full scenario"
6e3dbcd to
fdb2e81
Compare
supakeen
left a comment
There was a problem hiding this comment.
Thank you, this looks good now.
For usecases where for example selinux is not supported, we should expect more errors from tar so we should also accept this when matching the string. Kudos go to Achilleas Koutsou <achilleas@koutsou.net> for this hint
Add support for the new [install.ostree] configuration section with the bls-append-except-default option, which was introduced in bootc PR osbuild#1909 [1] This option allows setting the ostree sysroot.bls-append-except-default config key during installation, which appends kernel arguments to Boot Loader Spec entries except for the default entry. A common use case is setting grub_users="" to bypass GRUB password prompts for non-default deployments. Example usage in manifest: { "type": "org.osbuild.bootc.install.config", "options": { "filename": "10-custom.toml", "config": { "install": { "ostree": { "bls-append-except-default": "grub_users=\"\"" } } } } } [1] bootc-dev/bootc#1909 Signed-off-by: Joel Capitao <jcapitao@redhat.com>
Add support for the new [install.ostree] configuration section with the bls-append-except-default option, which was introduced in bootc PR osbuild#1909 [1] This option allows setting the ostree sysroot.bls-append-except-default config key during installation, which appends kernel arguments to Boot Loader Spec entries except for the default entry. A common use case is setting grub_users="" to bypass GRUB password prompts for non-default deployments. Example usage in manifest: { "type": "org.osbuild.bootc.install.config", "options": { "filename": "10-custom.toml", "config": { "install": { "ostree": { "bls-append-except-default": "grub_users=\"\"" } } } } } [1] bootc-dev/bootc#1909 Signed-off-by: Joel Capitao <jcapitao@redhat.com>
Add support for the new [install.ostree] configuration section with the bls-append-except-default option, which was introduced in bootc PR #1909 [1] This option allows setting the ostree sysroot.bls-append-except-default config key during installation, which appends kernel arguments to Boot Loader Spec entries except for the default entry. A common use case is setting grub_users="" to bypass GRUB password prompts for non-default deployments. Example usage in manifest: { "type": "org.osbuild.bootc.install.config", "options": { "filename": "10-custom.toml", "config": { "install": { "ostree": { "bls-append-except-default": "grub_users=\"\"" } } } } } [1] bootc-dev/bootc#1909 Signed-off-by: Joel Capitao <jcapitao@redhat.com>
As a followup of https://issues.redhat.com/browse/RHEL-4653 it seems that the stages don't return proper
error messages back if the disk is full.
Probably this is known and there is an issue for this, but I didn't find it.
This should be an example environment for more stages to test if they return a proper error in a "disk full scenario"
This PR is ment for a proof of concept for a test environment and basis of discussion
as the error of "disk full" is not properly handled not transformed to a proper error code and not handed over to the stack above in this code.
I think at least stages that have high risk of "disk full", like
tar,rpm,cp… should handle "disk full" more gracefully and return a proper error message.