Skip to content

stages/org.osbuild.tar: implement disk-full test (RHEL-4653)#1909

Merged
achilleas-k merged 2 commits intoosbuild:mainfrom
schuellerf:implement-tmpfs-disk-full-tests
Oct 26, 2024
Merged

stages/org.osbuild.tar: implement disk-full test (RHEL-4653)#1909
achilleas-k merged 2 commits intoosbuild:mainfrom
schuellerf:implement-tmpfs-disk-full-tests

Conversation

@schuellerf
Copy link
Contributor

As a followup of https://issues.redhat.com/browse/RHEL-4653 it seems that the stages don't return proper
error messages back if the disk is full.

Probably this is known and there is an issue for this, but I didn't find it.

This should be an example environment for more stages to test if they return a proper error in a "disk full scenario"

This PR is ment for a proof of concept for a test environment and basis of discussion
as the error of "disk full" is not properly handled not transformed to a proper error code and not handed over to the stack above in this code.

I think at least stages that have high risk of "disk full", like tar, rpm, cp … should handle "disk full" more gracefully and return a proper error message.

@supakeen
Copy link
Member

supakeen commented Oct 22, 2024

What is in the original issue linked here? It is not public so it's not clear what this PR is following up on.

@schuellerf
Copy link
Contributor Author

@supakeen Oh, sorry, you are right, thanks!
The original issue states that osbuild-composer stops working when the disk is full but stays in the RUNNING state.
This is not reproducible anymore, still there is no clear error message, what the failure is.

@schuellerf schuellerf force-pushed the implement-tmpfs-disk-full-tests branch 2 times, most recently from 2acbd74 to 54922e4 Compare October 23, 2024 07:53
@achilleas-k
Copy link
Member

achilleas-k commented Oct 23, 2024

In it's current state this PR isn't very useful. Capturing the output actually makes it more difficult to troubleshoot failed builds, and doesn't seem to change anything about how the error is handled so I don't see how it relates to the Jira ticket.

The ticket itself should probably be closed since the issue appears on quite an old version of osbuild-composer, and it doesn't currently exhibit the reported behaviour. We could bisect some history to figure out which change affected the behaviour (my guess is something in osbuild-composer's handling of failed builds), and maybe fork off another ticket about detecting out-of-space errors and other common stage failures and reporting the error up the stack (instead of crashing).

Is this PR the first step towards this kind of error reporting? If that were the case I'd expect to see some error string detection in the stage itself, not just in a test. Right now, as far as production is concerned, the PR is hiding the stdout and stderr of the tar call and not changing any of the behaviour. Why not the opposite? Changing the call to check=False (or wrapping it in an exception handler), checking for an exit code, and exiting gracefully (instead of raising an exception) would solve a lot more problems, especially if the captured stdout and stderr were printed by the stage since those messages would end up in the osbuild-composer log.

@schuellerf
Copy link
Contributor Author

Is this PR the first step towards this kind of error reporting?

@achilleas-k yes, thanks! That's exactly the discussion I was looking for. If that's a feasible way forward I'd be happy to enrich this PR to really care about the output in the stage and make it non-draft then :-D

Copy link
Member

@supakeen supakeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't review the code until I understand what we are trying to solve here.

In the linked issue the bug is that osbuild-composer's workers don't set jobs to FAILED when the disk is full and it is established that this is not reproducible (which I assume to mean; jobs do get failed).

In this PR you're testing that stages fail when the disk is full? It sounds like either the scope has to be larger (ensure that osbuild exits non-succesfully on any stage failure) or ensure that osbuild-composer always flags jobs as failed and reports the underlying cause (if it knows).

I'd also note that 'disk full' would never be useful to end-users of the Image Builder service, of course and is only relevant to those who debug (and they probably already look at logs).

A few code bits:

Aside from that; can we not do the mount/umount/unshare in a fixture? This test also needs skips if non-root or no-/tmp is available. I also dislike a hardcoded /tmp and /tmp/disk_full_tests I'd prefer to generate entire name with mks/mkdtemp.

@schuellerf
Copy link
Contributor Author

@supakeen

I won't review the code until I understand what we are trying to solve here.

I tried to summarize in the first message. The error of "disk full" is not handled and just exiting is bad to debug,
as the messages of the underlying tools like tar are strange and it takes a while to see that actually the disk is full. I think some stages would still somehow continue, leading to very odd behavior.
…and yes, in the service this should never happen anyway, but osbuild should also generate useful error messages for the "users" or for us in the other use cases

So the Idea is to start with a setup to be able to reproduce "disk full" for all stages and implement the behavior we want to have for each separately.

Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick suggestion about simplify the test setup inline.

@schuellerf schuellerf force-pushed the implement-tmpfs-disk-full-tests branch from 89bcef1 to 6e3dbcd Compare October 24, 2024 17:52
@schuellerf schuellerf marked this pull request as ready for review October 24, 2024 18:06
this should be an example environment
for more stages to test if they return a proper error
in a "disk full scenario"
@schuellerf schuellerf force-pushed the implement-tmpfs-disk-full-tests branch from 6e3dbcd to fdb2e81 Compare October 24, 2024 18:09
supakeen
supakeen previously approved these changes Oct 25, 2024
Copy link
Member

@supakeen supakeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this looks good now.

For usecases where for example selinux is not supported,
we should expect more errors from tar so we should also accept this
when matching the string.

Kudos go to Achilleas Koutsou <achilleas@koutsou.net> for this hint
@achilleas-k achilleas-k enabled auto-merge (rebase) October 25, 2024 11:29
@achilleas-k achilleas-k merged commit 2d1e855 into osbuild:main Oct 26, 2024
joelcapitao added a commit to joelcapitao/osbuild that referenced this pull request Jan 22, 2026
Add support for the new [install.ostree] configuration section
with the bls-append-except-default option, which was introduced
in bootc PR osbuild#1909 [1]

This option allows setting the ostree sysroot.bls-append-except-default
config key during installation, which appends kernel arguments to
Boot Loader Spec entries except for the default entry. A common use
case is setting grub_users="" to bypass GRUB password prompts for
non-default deployments.

Example usage in manifest:
  {
    "type": "org.osbuild.bootc.install.config",
    "options": {
      "filename": "10-custom.toml",
      "config": {
        "install": {
          "ostree": {
            "bls-append-except-default": "grub_users=\"\""
          }
        }
      }
    }
  }

[1] bootc-dev/bootc#1909

Signed-off-by: Joel Capitao <jcapitao@redhat.com>
supakeen pushed a commit to joelcapitao/osbuild that referenced this pull request Jan 27, 2026
Add support for the new [install.ostree] configuration section
with the bls-append-except-default option, which was introduced
in bootc PR osbuild#1909 [1]

This option allows setting the ostree sysroot.bls-append-except-default
config key during installation, which appends kernel arguments to
Boot Loader Spec entries except for the default entry. A common use
case is setting grub_users="" to bypass GRUB password prompts for
non-default deployments.

Example usage in manifest:
  {
    "type": "org.osbuild.bootc.install.config",
    "options": {
      "filename": "10-custom.toml",
      "config": {
        "install": {
          "ostree": {
            "bls-append-except-default": "grub_users=\"\""
          }
        }
      }
    }
  }

[1] bootc-dev/bootc#1909

Signed-off-by: Joel Capitao <jcapitao@redhat.com>
supakeen pushed a commit that referenced this pull request Jan 27, 2026
Add support for the new [install.ostree] configuration section
with the bls-append-except-default option, which was introduced
in bootc PR #1909 [1]

This option allows setting the ostree sysroot.bls-append-except-default
config key during installation, which appends kernel arguments to
Boot Loader Spec entries except for the default entry. A common use
case is setting grub_users="" to bypass GRUB password prompts for
non-default deployments.

Example usage in manifest:
  {
    "type": "org.osbuild.bootc.install.config",
    "options": {
      "filename": "10-custom.toml",
      "config": {
        "install": {
          "ostree": {
            "bls-append-except-default": "grub_users=\"\""
          }
        }
      }
    }
  }

[1] bootc-dev/bootc#1909

Signed-off-by: Joel Capitao <jcapitao@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants