Skip to content

Add chattr stage; support stacked ostree.deployment mounts#1535

Merged
dustymabe merged 5 commits intoosbuild:mainfrom
dustymabe:osbuild-chattr-stage
Jan 31, 2024
Merged

Add chattr stage; support stacked ostree.deployment mounts#1535
dustymabe merged 5 commits intoosbuild:mainfrom
dustymabe:osbuild-chattr-stage

Conversation

@dustymabe
Copy link
Contributor

Here we add two main changes:

  1. a chattr stage to set file/directory attributes
  2. a rework of the ostree.deployment mount to support the OSTree musical chairs on mounted filesystems too.

The ostree.deployment rework is needed for the chattr stage to work (i.e. we lose the immutable bit when going through a org.osbuild.copy stage).

@dustymabe dustymabe force-pushed the osbuild-chattr-stage branch 4 times, most recently from f4fd67c to 97e013f Compare January 16, 2024 16:41
@dustymabe
Copy link
Contributor Author

dustymabe commented Jan 16, 2024

ok I think this is ready for review now and will hopefully pass tests

@dustymabe dustymabe requested review from achilleas-k and mvo5 January 16, 2024 16:43
@dustymabe
Copy link
Contributor Author

@mvo5 I imagine the chattr stage will be desirable for bootc work too.

dustymabe added a commit to dustymabe/coreos-assembler that referenced this pull request Jan 16, 2024
@dustymabe
Copy link
Contributor Author

hmm. I'm not sure what the test_schema_validation_containers_storage_conf[test_data0-storage_test_data0-does not have enough properties] IndexError: list index out of range error is in CI

I might need help figuring that one out.

dustymabe added a commit to dustymabe/coreos-assembler that referenced this pull request Jan 16, 2024
dustymabe added a commit to dustymabe/coreos-assembler that referenced this pull request Jan 16, 2024
@dustymabe
Copy link
Contributor Author

The schutzbot tests are failing with:

build:    	dc6e3a66fef3ebe7c815eb24d348215b9e5e2ed0cd808c15ebbe85fc73181a86
tree:     	75bf377b55bffc73c9ef867b7e002b85465867df4071152b92b1ca2e010e010f
raw-image:	3446c9f287e30dd9d57e788637e9815220a9eca9549724490c97fae1e21d0d14
raw-4k-image:	987e6e5b4199eea2ec36bd18df8cbde9931a0babb8d3ed2451711413636b081a
raw-metal-image:	a64eacdd7ae038e1e74fb56d08a03a32065541ff0f7718d42d651449396a55b9
raw-metal4k-image:	46880e686f35543e99ff5c58e9b0bc61d4207f96bfb12001c231d8cd8ac58579
raw-qemu-image:	76844fbd54fe8934cab3507e38d648e82e67484da531cfed9a970420c91af082
metal:    	0e792149ba2fb9309249ed4e51adaccb1154e7349e69a7f8a5f5ffd066fa683b
metal4k:  	edd4b425246e2b20c1705513eb11eaeaaf473436ec0cdb8f3e1cd9f3c1149bb7
qemu:     	42b97585a1637a1332e26cd78a82c27482802a574fdfeb2812e65e633dce5355
success
success
success
Test duration: 159.7584178699999
Traceback (most recent call last):
  File "test/cases/ostree-images", line 200, in <module>
    main()
  File "test/cases/ostree-images", line 186, in main
    results = run_tests(args, tmp)
  File "/usr/lib64/python3.6/tempfile.py", line 809, in __exit__
    self.cleanup()
  File "/usr/lib64/python3.6/tempfile.py", line 813, in cleanup
    _shutil.rmtree(self.name)
  File "/usr/lib64/python3.6/shutil.py", line 486, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/usr/lib64/python3.6/shutil.py", line 424, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib64/python3.6/shutil.py", line 424, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib64/python3.6/shutil.py", line 424, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  [Previous line repeated 1 more time]
  File "/usr/lib64/python3.6/shutil.py", line 428, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())
  File "/usr/lib64/python3.6/shutil.py", line 426, in _rmtree_safe_fd
    os.rmdir(name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: 'tree'

but locally I don't see any error related to removing files when running sudo test/cases/ostree-images. @achilleas-k do you think you could help me figure out what's going on?

⏱  Duration: 0s
build:          dc6e3a66fef3ebe7c815eb24d348215b9e5e2ed0cd808c15ebbe85fc73181a86
tree:           75bf377b55bffc73c9ef867b7e002b85465867df4071152b92b1ca2e010e010f
raw-image:      3446c9f287e30dd9d57e788637e9815220a9eca9549724490c97fae1e21d0d14
raw-4k-image:   987e6e5b4199eea2ec36bd18df8cbde9931a0babb8d3ed2451711413636b081a
raw-metal-image:        a64eacdd7ae038e1e74fb56d08a03a32065541ff0f7718d42d651449396a55b9
raw-metal4k-image:      46880e686f35543e99ff5c58e9b0bc61d4207f96bfb12001c231d8cd8ac58579
raw-qemu-image: 76844fbd54fe8934cab3507e38d648e82e67484da531cfed9a970420c91af082
metal:          0e792149ba2fb9309249ed4e51adaccb1154e7349e69a7f8a5f5ffd066fa683b
metal4k:        edd4b425246e2b20c1705513eb11eaeaaf473436ec0cdb8f3e1cd9f3c1149bb7
qemu:           42b97585a1637a1332e26cd78a82c27482802a574fdfeb2812e65e633dce5355
success
success
success
Test duration: 266.9270973540006
tests/ok/failed: 1/1/0

@achilleas-k
Copy link
Member

@achilleas-k do you think you could help me figure out what's going on?

Nothing obvious pops out but it's interesting that it's happening for the old behaviour (deploying to tree). Something in the cleanup logic not unmounting everything maybe? Could be a race, but it's interesting that it's failing on all the runners and a race would be less deterministic.

@dustymabe
Copy link
Contributor Author

dustymabe commented Jan 17, 2024 via email

@dustymabe dustymabe force-pushed the osbuild-chattr-stage branch from 97e013f to fc9f0b4 Compare January 17, 2024 19:09
@dustymabe
Copy link
Contributor Author

ok I added my key in #1546 so now I should be able to ssh in to the CI runner

dustymabe added a commit to dustymabe/coreos-assembler that referenced this pull request Jan 17, 2024
dustymabe added a commit to coreos/coreos-assembler that referenced this pull request Jan 17, 2024
@dustymabe dustymabe force-pushed the osbuild-chattr-stage branch from fc9f0b4 to 000c447 Compare January 18, 2024 03:17
@dustymabe
Copy link
Contributor Author

ok I think I've fixed the schutzbot CI failure building ostree images (we'll see if it works in this CI run).

@lukewarmtemp is going to work on the change requested in #1535 (comment) and we'll do a new upload when that is ready.

dustymabe added a commit to dustymabe/coreos-assembler that referenced this pull request Jan 19, 2024
@dustymabe dustymabe force-pushed the osbuild-chattr-stage branch from 000c447 to 98f472f Compare January 22, 2024 15:32
@dustymabe
Copy link
Contributor Author

ok we've now made the requested changes by @achilleas-k in #1535 (comment)

dustymabe added a commit to coreos/coreos-assembler that referenced this pull request Jan 22, 2024
dustymabe added a commit to dustymabe/coreos-assembler that referenced this pull request Jan 24, 2024
Copy link
Member

@achilleas-k achilleas-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Thanks for the work.
I have one comment about the deployment mount's source mount schema, about keeping that consistent with how mounts are defined in stages. Other than that, this LGTM!

dustymabe added a commit to coreos/coreos-assembler that referenced this pull request Jan 26, 2024
@dustymabe dustymabe force-pushed the osbuild-chattr-stage branch from 98f472f to b98e4c7 Compare January 29, 2024 16:07
Copy link
Member

@achilleas-k achilleas-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM.

@dustymabe
Copy link
Contributor Author

dustymabe commented Jan 29, 2024

any ideas on the failing -k stages/test unittest failures? It passes locally for me and the error message doesn't seem to give much insight:

py312: commands[0]> bash -c 'python -m pytest --pyargs --rootdir=. -k stages/test '
============================= test session starts ==============================
platform linux -- Python 3.12.0rc1, pytest-8.0.0, pluggy-1.4.0
cachedir: .tox/py312/.pytest_cache
rootdir: /osb/workdir
plugins: xdist-3.5.0
collected 612 items / 612 deselected / 0 selected

=========================== 612 deselected in 1.21s ============================
py312: exit 5 (1.43 seconds) /osb/workdir> bash -c 'python -m pytest --pyargs --rootdir=. -k stages/test ' pid=78
.pkg: _exit> python /usr/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
  py312: FAIL code 5 (13.55=setup[12.12]+cmd[1.43] seconds)
  evaluation failed :( (13.60 seconds)
Error: Command failed: /usr/bin/docker run --net=host --privileged --rm --volume=/:/osb/host --volume=/home/runner/work/osbuild/osbuild:/osb/workdir --volume=/lib/modules/:/lib/modules/ --volume=/var/run/docker.sock:/var/run/docker.sock ghcr.io/osbuild/osbuild-ci:latest-202308241910 /bin/bash -o errexit -c # Note that only "test.run.test_stages" runs in parallel because

Though.. my local test (host is Fedora 39) does show:

platform linux -- Python 3.12.1, pytest-7.3.2, pluggy-1.2.0

so maybe pytest-8.0.0 ?

@dustymabe dustymabe enabled auto-merge (rebase) January 29, 2024 17:42
@dustymabe
Copy link
Contributor Author

looks like schutzbot is failing with:

Error: creating EC2 Spot Fleet Request: empty result

dustymabe and others added 5 commits January 31, 2024 17:22
This unwinds part of a25ae2b. The way the code ended up both
self.tree and self.mountpoint ended up pointing to the exactly
same path and so we'd end up doing two `umount -R` operations
on the same path. This ended up being a duplicate unmount.

On Fedora 39 this yields an error like:

```
mount/ostree.deployment (org.osbuild.ostree.deployment): umount: /var/osbuild/store/stage/uuid-efaac9370d25455d9e8df6d847ecb5b3/data/tree: not mounted
mount/ostree.deployment (org.osbuild.ostree.deployment): Traceback (most recent call last):
mount/ostree.deployment (org.osbuild.ostree.deployment):   File "/var/b/shared/code/github.com/osbuild/osbuild/mounts/org.osbuild.ostree.deployment", line 136, in <module>
mount/ostree.deployment (org.osbuild.ostree.deployment):     main()
mount/ostree.deployment (org.osbuild.ostree.deployment):   File "/var/b/shared/code/github.com/osbuild/osbuild/mounts/org.osbuild.ostree.deployment", line 132, in main
mount/ostree.deployment (org.osbuild.ostree.deployment):     service.main()
mount/ostree.deployment (org.osbuild.ostree.deployment):   File "/var/b/shared/code/github.com/osbuild/osbuild/osbuild/host.py", line 252, in main
mount/ostree.deployment (org.osbuild.ostree.deployment):     self.stop()
mount/ostree.deployment (org.osbuild.ostree.deployment):   File "/var/b/shared/code/github.com/osbuild/osbuild/osbuild/mounts.py", line 126, in stop
mount/ostree.deployment (org.osbuild.ostree.deployment):     self.umount()
mount/ostree.deployment (org.osbuild.ostree.deployment):   File "/var/b/shared/code/github.com/osbuild/osbuild/mounts/org.osbuild.ostree.deployment", line 125, in umount
mount/ostree.deployment (org.osbuild.ostree.deployment):     subprocess.run(["umount", "-R", self.tree],
mount/ostree.deployment (org.osbuild.ostree.deployment):   File "/usr/lib64/python3.12/subprocess.py", line 571, in run
mount/ostree.deployment (org.osbuild.ostree.deployment):     raise CalledProcessError(retcode, process.args,
mount/ostree.deployment (org.osbuild.ostree.deployment): subprocess.CalledProcessError: Command '['umount', '-R', '/var/osbuild/store/stage/uuid-efaac9370d25455d9e8df6d847ecb5b3/data/tree']
' returned non-zero exit status 1.

⏱  Duration: 103s
```

I think this was necessary because of a bug in util-linux that
mean some of the accounting information got out of date when
doing a `mount --move` operation, which we use here. I think this
bug (or bugs) is now fixed [1][2] in util-linux v2.39 (in Fedora 39),
which is now causing the above pasted error on F39.

Let's just add code here that mentions the problem and workaround
it with a loop to keep unmounting (essentially what the umount -R
should have done to overmounted filesystems if the mountinfo/utab
was correct) and also mention when we should be able to drop this
workaround.

[1] util-linux/util-linux@a04149f
[2] util-linux/util-linux@8cf6c50
It makes things a little more clear to know the variable is pointing
to the path of the deployment.
We still target the tree here, but we open ourselves up to be able
to target something other than the tree in the future. This mostly
exchanges the `tree` variable for `target`.

We also update the comment to try to enhance clarity.
Instead of operating directly on the tree for a stage we can operate
on a mount too. This is useful in the case where operating on the
directory tree of files isn't sufficient and the modifications need
to be made directly to the filesystems on the disk image that we are
creating.

One such example of this is we are having a problem right now where
the immutable bit being set on an OSTree deployment root doesn't
survive the `cp -a --reflink=auto` in the org.osbuild.copy stage when
being copied from the directory tree into the mounted XFS filesystem
we created on the disk image. Thus we have to workaround this loss
of attribute by applying the attribute directly on the mounted
filesystem from the disk.

In this change here we also add a check in osbuild/mounts.py to not
attempt a umount of the root of the mounts directory if that path
is no longer a mountpoint, which can happen when the umount -R
from the mounts/org.osbuild.ostree.deployment also removes the
overmount.

Here is an example of how this would be used:

```
  - type: org.osbuild.chattr
    options:
      immutable: true
      path: mount://root/
    devices:
      disk:
        type: org.osbuild.loopback
        options:
          filename: disk.img
          partscan: true
    mounts:
      - name: root
        type: org.osbuild.xfs
        source: disk
        partition:
          mpp-format-int: '{image.layout[''root''].partnum}'
        target: /
      - name: ostree.deployment
        type: org.osbuild.ostree.deployment
        options:
          source: mount
          deployment:
            osname: fedora-coreos
            ref: ostree/1/1/0
```

The initial mount on `/` is the filesystem from the root partition
on the disk. The second mount (of type org.osbuild.ostree.deployment)
then reconfigures things similar to how an OSTree system is set up.
Add or remove the immutable bit to the specified mount directory.

The need we have for this right now is for the CoreOS builds where
the immutable bit being set on an OSTree deployment root doesn't
survive the `cp -a --reflink=auto` in the org.osbuild.copy stage when
being copied from the directory tree into the mounted XFS filesystem
we created on the disk image. Thus we have to workaround this loss
of attribute by applying the attribute directly on the mounted
filesystem from the disk.
@dustymabe dustymabe merged commit 477a210 into osbuild:main Jan 31, 2024
dustymabe added a commit to dustymabe/coreos-assembler that referenced this pull request Feb 1, 2024
The PR finally merged but didn't make it into the v106
release so we still need to carry the patch for now.
osbuild/osbuild#1535
dustymabe added a commit to coreos/coreos-assembler that referenced this pull request Feb 1, 2024
The PR finally merged but didn't make it into the v106
release so we still need to carry the patch for now.
osbuild/osbuild#1535
mvo5 added a commit to mvo5/images that referenced this pull request Apr 5, 2024
Add the "Default" and "Source" options to OSTreeMountDeployment.
See
- "default" osbuild/osbuild#1553
- "source"  osbuild/osbuild#1535
mvo5 added a commit to mvo5/images that referenced this pull request Apr 5, 2024
Add the "Default" and "Source" options to OSTreeMountDeployment.
See
- "default" osbuild/osbuild#1553
- "source"  osbuild/osbuild#1535
github-merge-queue bot pushed a commit to osbuild/images that referenced this pull request Apr 7, 2024
Add the "Default" and "Source" options to OSTreeMountDeployment.
See
- "default" osbuild/osbuild#1553
- "source"  osbuild/osbuild#1535
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants